The best AI chatbots of 2025: ChatGPT, Copilot, and others worth trying
Free AI chatbots deliver more power than ever before.
ChatGPT, Copilot, and Grok top our performance rankings.
Image generation and storytelling now rival premium AIs.
The arrival of the first widely successful AI chatbot in 2022 was a tech quake on the scale of the internet itself, or the smartphone. Its mere existence changed how we work, learn, and communicate.
Also: You’re reading more AI-generated content than you think
You know the story since then. AI chatbots have become hugely popular, often saving folks a lot of work, while also putting jobs at risk. They have transformed education, writing, coding, and more.
What is the best AI chatbot right now?
ChatGPT is the OG chatbot. This is the AI that shook up the world. The company has been innovating ever since, and its latest free offering shows that. Also, because ChatGPT is the market leader, there are many resources available for it, including tons of articles, many books, courses, free training videos, and more.
Also: I’m an AI tools expert, and these are the 4 I pay for now (plus 2 I’m eyeing)
With the top overall score, ChatGPT is our winner. We'll first explain our hands-on approach, note a few surprises, and then get into why ChatGPT earned the top spot. We're also looking at Copilot, Grok, Gemini, Perplexity, Claude, DeepSeek, and Meta AI.
Hands-on with the best free chatbots
Here at ZDNET, we publish plenty of articles on the impact of AI. This one is meant to be more practical. It’s our hands-on, chatbot-by-chatbot comparison to help you decide which to use. We put each chatbot’s free tier to the test (14 prompts per chatbot across eight chatbots, for a total of 112 individual tests), proving you don’t need to spend anything to gain access to billions of dollars of compute capability.
Rather than taking the easy way out and spewing a bunch of specs and model names at you, we approached the ranking process by running each chatbot through a series of real-world tests.
We’re also avoiding AI model mentions (like GPT-5 vs. GPT-5-mini) here because the AI companies treat their free AI tiers like gumbo. Gumbo is often a restaurant offering made of whatever meat, poultry, or seafood leftovers are available. While it’s almost always tasty, there’s never a guarantee that the exact same gumbo experience will be repeated from day to day. Likewise, AI companies tend to give their free-tier users whatever less resource-intensive models are available at the time, and those models may change at any time.
Also: 10 ChatGPT prompt tricks I use – to get the best results, faster
Our tests consist of ten text-based questions encompassing summarization and web access, academic concept explanation, math and analysis, cultural discussion, literary analysis, travel itinerary, emotional support, translation and cultural relevance, a coding test, and a long-form story test. On one test, we ask the AIs to explain the academic concept to a five-year-old. There are also four image tests that include generating a flying aircraft carrier, a giant robot, a young baseball player in a medieval court, and an homage to the movie Back to the Future.
The details of the tests and the exact questions we asked are provided at the end of this article. That way, you can try our tests with any or all of the chatbots in your own browser window. If you do, let us know what you think of the results in the comments below.
Each chatbot is ranked on a 100-point scale for text-related prompts and a 20-point scale for image-related prompts. The overall score is the sum of the two categories, for a maximum of 120 points.
Big surprises
Doing the hands-on tests netted a number of fairly big surprises. We were particularly surprised by just how much value is being provided by the AI vendors for free.
We experienced almost no throttling through our series of 10 back-to-back prompts.
The second surprise was how much the AIs let you do without requiring you to create an account or log in.
The third big surprise was just the overall quality of the responses.
While some responses from bottom-of-the-list AIs seemed somewhat phoned in, the overall quality across the board has improved drastically since the last time we took a comprehensive look at free AI chatbot use.
We used each chatbot for a few hours straight, with little or no throttling. But if you want to use them constantly, all day every day, it’s likely you’ll hit some resource usage limits enforced by the AI vendors.
Most of the AIs have premium plans in addition to the free plans. These plans offer deeper thinking and more powerful models capable of solving bigger, more complex problems, plus added features like more autonomous capabilities and in-depth programming support. Where appropriate, we’ve mentioned those plans and their prices.
And with that, let’s dive into our overall winner, ChatGPT.
The best AI chatbots of 2025

OpenAI’s ChatGPT: Best AI chatbot overall
Overall score: 109
One thing we noticed is that about half of our text-based prompts were handled nearly perfectly by almost all of the chatbots we tested. These included the ability to explain a basic academic concept to a child, do math and analysis, provide a cultural discussion with context, perform a quick literary analysis, and translate text and provide context. ChatGPT aced all of these.
(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
Also: How to use ChatGPT: A beginner’s guide to the most popular AI chatbot
Where ChatGPT fell down was its ability to locate and summarize a current event. Our test sends the AIs to a Yahoo News article about the flu and asks for a summary. Perhaps because I was running it in an incognito window and hadn’t logged in, ChatGPT sent me to Yahoo’s Taiwanese news portal and presented its results in Traditional Chinese (the script used in Taiwan).
ChatGPT constructed a good tour for the travel itinerary test. It included many of the appropriate stops. It also included pictures for each day’s itinerary, and some clothing recommendations for March in the Northeast.
ChatGPT also aced my basic coding test. We’ll subject the chatbots to a comprehensive set of coding tests in a different article, but coding is worth ten points of the one hundred text points awarded in this evaluation.
For the long-context story assignment, ChatGPT lost a few points because it didn’t produce the 1,500 words required. Also, while it told a story with the right tone and style for the assignment, it presented much of the story as almost an outline, with headings for each main character.
While the image quality is subjective, ChatGPT did a good job with the image assignments. The character produced for the Back to the Future assignment is just a random kid, but it did show the correct text logo, a DeLorean, and the kid holding a skateboard.
Also: Is ChatGPT Plus still worth $20 when the free version offers so much – including GPT-5?
Overall, ChatGPT is the OG AI chatbot, and its free tier is a solid offering with a bunch of added features like standalone apps, a recently announced browser, and plenty of room to grow as you scale into its higher tiers.
Text score: 91 out of 100
Image score: 18 out of 20
Premium offerings: ChatGPT offers a Plus plan for $20-per-month and a Pro plan for $200-per-month. Both offer most of ChatGPT’s higher-end model features, but scale up the resource availability based on which plan you use.
(Image gallery: images generated using ChatGPT)
Pros: Image generation, good code results, large ecosystem
Cons: Gets very naggy asking for login, wrong language response from web lookup
Microsoft Copilot: Best for Microsoft users
Overall score: 97
Copilot (formerly part of Bing) integrates with Microsoft products. While that’s the above-the-fold headline, the free version of Copilot is also a rather good standalone chatbot offering. Running logged out, in an incognito/private browsing mode, Copilot was the least naggy of all the AIs. It asked me to log in just once, and allowed me to proceed completely through my tests without either requiring me to log in or asking me again.
Also: How to remove Copilot from your Microsoft 365 plan – before you have to pay for it
Copilot’s free tier successfully accessed the web and looked up a current news story about the flu, although it pulled data from other articles, including an article about bird flu in Canada and one about an Australian woman who had an asthma flare-up. Both were related stories, but the AI did deviate from the assignment and lost points there.
It competently handled explaining an academic concept, identified a math sequence, discussed a cultural issue with context, and analyzed the key themes from a well-known book.
When it came to our vacation travel itinerary test, it not only pointed out appropriate stops and points of interest, but picked up on the prompt’s mention of going in March and identified some events happening in Boston in March. However, it did not recommend visiting the USS Constitution, which is a top-line historical point of interest, and it didn’t recommend anything regarding weather or clothing for the windy, cold month.
For our emotional support job interview jitters test, the chatbot gave a number of constructive suggestions, including doing your homework and thoroughly researching the company before the interview.
Copilot lost some points in coding. It not only missed edge cases, it also had some string handling errors and wrote code that had notable performance issues. For the company that produces the VS Code development environment, it’s a bit of a disappointment.
Copilot wrote a charming, engaging long-form story, fully meeting the requirements of the prompt, except for being 187 words short of our specified minimum. Still, it was a complete story that was well written and absolutely appropriate to the style implied by the prompt.
Image generation took a loooong time, more than five minutes each. The quality of the images was good. The baseball picture got the kid’s uniform quite right, including the logo on the cap and even properly spelling “New York” on the shirt (something AIs have had difficulty with). Copilot failed on the fourth test, our Back to the Future-themed challenge, with an “I can’t generate that image because it would violate copyright policies” message. It did, however, create a substitute fourth image (of a techno-witch), meaning we didn’t hit any resource limitation walls on the free tier.
Also: College students can get Microsoft Copilot free for a year – here’s how
Our take is that if you’re an active Microsoft user, you shouldn’t hesitate to use Copilot. If you’re just interested in a free AI chatbot, Copilot will serve you well too. It’s our second-ranked AI chatbot overall.
Text score: 87 out of 100
Image score: 10 out of 20
Premium offerings: Copilot has a $20-per-month Pro plan that provides access to more capabilities and provides AI features inside Microsoft 365 applications. There are also business plans, a $10-per-month Pro plan for developers, and an ever-increasing set of tiers and options for business users.
(Image gallery: images generated using Copilot)
Pros: Deep Microsoft integration on paid plans, good responses overall, web access
Cons: Slow image generation, blocked image topics, too many premium plans and options
xAI Grok: Best travel itinerary
Overall score: 96
Grok was definitely an underdog on our list. We certainly didn’t expect it to earn the third-place position on the winner’s podium. But it did.
Grok’s free offering absolutely aced our travel itinerary test question. It didn’t include images, but gave the most personal and usable itinerary of all of the chatbots. It included general pricing for various attractions, a very good mix of attractions and eating (mentioning my personal favorite, the Union Oyster House), discussed planning for the weather, and explained why certain items were chosen for each day. The response just felt the most “human” of all the itineraries I’ve seen.
Grok also displayed an interesting quirk that was kind of charming. The second test question in our series of ten asks the AI to explain educational constructivism to a five-year-old. AIs are often told to assume a style, and a classic test is “explain it like you would to a five-year-old.” In this test, Grok gave a short but usable answer to that question, but then went on to append explanations for five-year-olds to most of the other questions asked, including coding.
Its coding response is worth taking an extra moment to discuss. The AI generated code, but it had a few minor bugs, including a whitespace bug, a leading-zero bug, and a decimal-handling bug. However, it added an explanation of the problems it was trying to fix, aimed at a five-year-old, which made the issues quite clear.
Also: Why xAI is giving you ‘limited’ free access to Grok 4
I still can’t decide whether continuing the explain-to-a-five-year-old theme throughout the session was good conversational awareness or overdone. For example, it correctly identified the Fibonacci sequence, and then went on to explain it at a five-year-old level. It did the same when it analyzed the themes in A Song of Ice and Fire (the book series behind Game of Thrones), which was somewhat strange considering how dark those themes are.
Grok skipped the kid-friendly discussion when it translated a sentence to Latin. It gave a very good explanation of the relevance of Latin in today’s society.
Grok was the only AI to report word count (1,512) for the long-form story project. It also hit on the proper themes, but it lost points because it seemed to try a little too hard to incorporate the prompt elements without truly integrating them into the story. At the end, it gave a summary of what it was about for a five-year-old.
When running in incognito mode and logged out, the image generator refused to do any image generation at all, saying it couldn’t. When I tried using Grok from my Twitter/X account, it produced all four, but they could have been better. The baseball player looked like he was in a Medieval Times restaurant rather than in actual medieval times. And while the Back to the Future test produced a kid in a puffy vest with a DeLorean and skateboard (and a Doc Brown peeking out from behind), it was placed in front of a house right out of 1980s Bergen County, New Jersey, rather than 1950s Hill Valley, California.
Also: X’s Grok did surprisingly well in my AI coding tests
Still, we can declare Grok to be a fully competitive AI chatbot. Can you grok it? Which famous author originated the term “grok”? Comment with your answer below.
Text score: 86 out of 100
Image score: 10 out of 20
Premium offerings: Some of Grok’s premium features are tied to premium X/Twitter plans. But there’s also a SuperGrok service with access to more powerful models that comes in at either $30-per-month or $300-per-month depending on how far you want to go (the $300-per-month plan provides a preview of Grok 4 Heavy, a “heavier” model).
(Image gallery: images generated using Grok)
Pros: Excellent itinerary, conversational tone, no nagging
Cons: No images outside of Twitter/X, buggy code, old web stories
Google Gemini: Best for image generation
Overall score: 95
Google Gemini (formerly Bard) is showing up all over Google’s offerings, including inside Chrome. In this ranking, we’re not looking at the various implementations and delivery modes. Instead, we’re sticking to our approach of doing hands-on testing of actual AI performance with actual questions.
Gemini’s test results were another surprise, but not for a good reason. Going into our testing process, I fully expected Gemini’s free tier to come in at #2, right after ChatGPT. But it landed at #4, below even Grok. That’s just embarrassing.
I have to start by telling you where Gemini lost points, because it’s amusing. Well, amusing to me. I’m sure there’s a product manager at Google who will be anything but amused. For each chatbot, one of my tests is translating a sentence into Latin. Since I don’t do Latin, I feed the results of each translation to Google Translate for translation back to English. Do you know which chatbot translation Google Translate couldn’t translate? The only one? Yep. Google Gemini.
Beyond the precious irony, the AI did quite well on questions that required factual results, but it seemed to struggle a bit whenever it was asked for subjective recommendations, like building a travel itinerary or explaining an academic concept to a child. For the latter, it did provide a solid enough answer, but it went very much overboard on analogies. Worse, the analogies didn’t quite fit the examples it used.
It scored 10 out of 10 on the math sequencing prompt, on the Game of Thrones theme analysis, and on our test prompt about the impact of social media on society. It also did quite well in our job interview question. Gemini was far more practical in its advice than ChatGPT, offering tangible tips for interview success and for increasing confidence going into the interview.
Also: Gemini arrives in Chrome – here’s everything it can do now
Gemini provided a difficult-to-read table for the seven days of travel. The prompt asked for a Boston itinerary built around tech and history themes, but Gemini decided that history was always in the morning and tech always in the afternoon, regardless of the location or distance between points of interest.
On our current-events web-access question, Gemini not only failed to pull information from the site we requested, but also pulled information from sites we didn’t request. When I asked for a summary of a specific article, it did not give a synopsis of that article; instead, it gathered information from other, tangentially related articles. It clearly did not do what I asked. Many of the AIs seemed to miss the basic point when asked to summarize a specific article.
The Gemini test code was generally solid, although it missed some issues that are quite mainstream and could hardly be considered edge cases. This would likely have caused some failures for users.
Also: Gemini Pro 2.5 is a stunningly capable coding assistant – and a big threat to ChatGPT
For our long-form story request, the AI first thought I was asking for an image. I corrected it and gave it the prompt again. Weirdly, the AI boldfaced random words throughout the story. I found the 3,379-word story good enough, but a little hard to follow. The story also seemed to try to force-fit random concepts into the overall narrative, as if the AI wasn’t entirely sure how to knit the whole piece together.
Image generation itself was good, but there were complications. The AI insisted I sign in to test images. I tried to sign in using my test account, but the AI wouldn’t even spin up the chatbot prompt interface. I tried in both incognito mode and a regular window, without success. I even tried Safari instead of Chrome.
Also: Google’s Gemini 2.5 Flash Image ‘nano banana’ model is generally available
I finally decided to try with my personal account. I’m not paying for Gemini on that account, but my personal account does have some Google paid features attached to it. That was the only way I could get Gemini to produce images. It also wouldn’t continue my previous session, so there was no way to tell whether adding image requests would have worn out my welcome.
That said, once I got it working, it took far less time than ChatGPT to generate images, maybe five or six seconds all told. Gemini created all four images. The Back to the Future image looked very much like Marty McFly with a skateboard, with a DeLorean ripped from the movie set. Gemini used the new Nano Banana image model, which is quite good.
Overall, Gemini is convenient because it’s right there in all you do with Google. If you do a Google search, it’s usually at the top of the search results, ready to siphon off traffic from the sites it scraped for its answers. Image generation is first-rate, but overall performance could and should be better from Google.
Text score: 77 out of 100
Image score: 18 out of 20
Premium offerings: The $19.99-per-month Google AI Pro plan gives you access to its higher-end AI models, along with access to a whole host of additional AI features, including expanded use of Google’s enormously helpful NotebookLM tool. The $249-per-month Google AI Ultra plan gives you far more resource usage, plus free YouTube Premium.
(Image gallery: images generated using Gemini)
Pros: Great images, solid answers to factual questions, many Google-centric integrations
Cons: Wouldn’t generate images in test account, couldn’t translate Latin, unhelpful itinerary
Perplexity: Best for verified search
Overall score: 93
Rounding out our top five is Perplexity, which bills itself as an AI search engine. Our first test should have been Perplexity’s core competency, but it didn’t do what was asked of it.
Perplexity did explain the flu story on the Yahoo News site, but it also went considerably beyond what was requested, to discuss Japan’s early flu epidemic and someone who almost died after the flu put him in a coma. Neither was part of the main story Perplexity was asked to summarize.
I did like how Perplexity presents sources in front of its answers. That helps you get a better feel for what it’s using to formulate its answers, and gives you places to go for more research.
Perplexity did a fine job explaining an academic concept, identified a math sequence, discussed a cultural issue with context, and analyzed the key themes from a well-known book. Having the sources up front and visible was nice, too.
Also: Want Perplexity Pro for free? 4 ways to get a year of access for $0 (a $200 value)
When it came time to construct a travel itinerary, Perplexity showed a few images at the beginning of its response, but the answers almost seemed phoned in. The first day, it suggested a few smaller museums, but never got to recommending visiting the USS Constitution. By Day 4, it seemed to lose the will to live, suggesting just one museum. On Day 5, it suggested visiting Google’s offices in Cambridge.
For our job interview support question, it did say, literally, “You’ve got this!” There were a few basic suggestions, but they were simplistic guidelines like “prepare thoroughly” and “focus on your body language and voice.” Interestingly, all the chatbots below our top five also used the phrase “You’ve got this!” in their answers to this question.
Also: Inbox swamped? Perplexity’s new Email Assistant works for Gmail and Outlook
Latin translation and cultural context were good. Perplexity also did a good job coding. It left out some rare edge cases, but what it generated was good enough to ship.
Our large-context story test resulted in 925 words, well under the number requested. Perplexity returned less of a story and more of a scene-setting exercise. There was no conflict beyond a bit of regurgitation of the character descriptions. The AI even described the story as “out of Diagon Alley,” almost word-for-word from the prompt. It produced some elements that might have come together into a nice tale, but it definitely came across as a not-completely-finished student assignment.
Image generation without signing in resulted in Perplexity returning images it found on the web, with no AI generation at all. Once I signed in, I was allowed three images, which it counted as three Pro searches.
The Back to the Future test was definitely evocative of the movie, except the kid was dressed differently and the bottom of the skateboard had a giant “McFly” painted on it. The DeLorean wasn’t movie-perfect, but it fit the theme. The kid in King Arthur’s court was pretty much perfect. The giant robot was very cool, although some of the text on the signage was indecipherable.
Also: How to get Perplexity AI Pro for free on your Samsung TV – and what it can do
I was not all that impressed with Perplexity. I know some of our editors prefer Perplexity for searching, but I was underwhelmed. Its web search (both in my tests and in other random searches I’ve done in the past) just didn’t seem any better than a typical Google search. Other AI features were adequate, but I didn’t find anything that made this stand out better than the tools that scored higher. You can play with it for free, so give it a try and let me know if you agree in the comments below.
Text score: 81 out of 100
Image score: 12 out of 20
Premium offerings: Perplexity offers a range of plans, starting at $20-per-month for Perplexity Pro. Unlike the free tier, the Pro plan offers “practically unlimited” Pro searches, among other resource boosts. There’s also a Max plan for $200-per-month that provides access to early AI models and lots more resources. One nice option: Perplexity offers its Pro plan for $5-per-month to students who can prove they’re students.
(Image gallery: images generated using Perplexity)
Pros: Good sourcing at the top of each answer; nice images, but limited runs
Cons: No account password, lots of nags, phoned-in itinerary planning
Other contenders
I tested eight of the most well-known chatbots equally, but three of them didn’t produce strong enough results to be in our top five.
Anthropic Claude
Overall score: 89
The free Claude tier immediately lost 20 points because it won’t generate images. It also refused to work without a sign-in. It did fine on factual questions and did a great job on the long-form story generation.
Also: Claude’s latest model is cheaper and faster than Sonnet 4 – and free
Claude was weak on web search and on coding. Given the popularity of Claude Code, this was a definite shocker. It suffered from leading-zero removal that could mangle decimals, poor error management, some code redundancy, and a lack of type safety.
DeepSeek
Overall score: 78
DeepSeek also won’t run without an account and a login. Responses took a little longer than with the other chatbots. DeepSeek failed to access Yahoo, but it was able to access one of my own sites, so it’s possible that Yahoo is blocking DeepSeek’s region.
Also: DeepSeek claims its new AI model can cut the cost of predictions by 75% – here’s how
It also did well on the large-context story challenge, returning 2,344 words. It was a good story, darker and more violent than the others, but still a fun read.
DeepSeek did fine on the basic factual questions, but did poorly on the travel itinerary and job interview support prompts. It also returned buggy code on the coding challenge. For image generation, it produced a link to a Google URL that doesn’t exist.
Meta AI
Overall score: 77
As with the other bottom-of-our-list also-rans, Meta AI required a login. With the exception of its answers to the math challenge and explaining constructivism to a child, Meta AI’s answers were, to use a technical term, feh. Most of the answers seemed very shallow and phoned in, with little detail or elaboration.
Also: Your embarrassing Meta AI prompts might be public – here’s how to check
The coding test returned buggy code, and the large-context story started to generate, but failed completely with a “Something went wrong” error that I was able to repeat across sessions and browsers.
Image generation wasn’t bad. Instead of generating just one image, it generated four. Most were fairly generic, but it made a reasonable attempt. I wouldn’t advise using Meta AI for text-based prompts, but you might get a nice image or two out of it.
FAQs

How did I choose these AI chatbots?
It’s fairly obvious to anyone tracking this field what the top AI chatbots are. So I pulled together a list of the eight best-known chatbots, with the intention of choosing the five best.
Because AI is moving so fast, I wanted to go beyond my and my editors’ expectations and objectively subject each of them to a wide range of quality and performance tests. Those are documented below.
The ranking of the chatbots came directly from the results of those tests, and some of them challenged my expectations. For example, I fully expected Grok to be near the bottom of our results, but it wound up at #3, beating out even Google’s Gemini. That’s why we did testing, rather than just sharing chatbots based on our expectations or personal usage.
What is an AI chatbot?
Imagine you’re talking to a friend or colleague through a texting interface or something like Slack. That’s called chatting. Talking to an AI is very similar, in that you type in your statement or question and you get back an answer. The only difference is a big one: there’s no person on the other end, just a piece of software.
How do AI chatbots work?
Chatbots use large language models (LLMs) to produce conversational responses. These LLMs are trained on insanely huge amounts of information: books, documents, websites, and more, all of which builds up their knowledge base. Because everything should be reduced to a car analogy, let’s do that here. Think of the LLM as the engine of a car. Think of the chatbot interface as the cabin of the car, where the driver controls the vehicle.
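To make the car analogy a little more concrete, here is a minimal sketch of how a chatbot "cabin" wraps an LLM "engine," written in TypeScript against OpenAI's official Node.js SDK. The model name and prompts are purely illustrative, and real chatbots like the ones reviewed here layer memory, safety filters, and web access on top of this basic loop.

```typescript
import OpenAI from "openai";

// Minimal sketch: the "cabin" (this function) passes your message to the
// "engine" (a hosted LLM) and hands back the reply. Assumes an API key in
// the OPENAI_API_KEY environment variable; the model name is illustrative.
const client = new OpenAI();

async function chat(userMessage: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: userMessage },
    ],
  });
  return completion.choices[0].message.content ?? "";
}

chat("Explain educational constructivism to a five-year-old.").then(console.log);
```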
If you want to delve deeper, here’s my explainer: How ChatGPT actually works (and why it’s been so game-changing).
How much do AI chatbots cost?
All of the ones we’re spotlighting here are free. That said, depending on what you do with them, you could spend anything from free to hundreds of dollars a month. Personally, I pay for four tools, each of which ranges from $10 to $20 per month. But keep in mind my job is to use AI. I also paid $200 for a single month of OpenAI’s ChatGPT Pro, but that was because I wanted its help producing software at warp speed.
What is the difference between an AI chatbot and an AI writer?
Let’s first establish something we haven’t discussed before. AI can be used in a lot of different applications, not just chatting back and forth. AI is used to make video game characters smart, and it’s used to keep self-driving cars on the road (and just about everything in between).
An AI chatbot is really a general-purpose interface to an AI language model. An AI writer is an AI used mostly to generate writing output rather than to participate in a general discussion. All of the chatbots shown in this article can function as AI writers.
Testing methodology
Testing the chatbots consisted of ten questions that resulted in text output, along with four prompts intended to produce images. I started with the following eight questions designed to produce a wide variety of answers.
Summarization and web access: This is designed to test an AI’s ability to access the web, retrieve current information, follow directions by limiting what it reports, and then summarize the results. “Summarize the flu story by visiting the Yahoo News site.”
Academic concept explanation: This test is designed to do two things: prove an AI’s ability to research and report on a concept, and then repackage that concept so it is understandable by a child, thereby also showcasing that the AI is able to refactor information for a particular audience. “Explain educational constructivism to a five-year-old.”
Math and analysis: This test is designed to evaluate an AI’s ability to do pattern recognition, to use that pattern to extrapolate additional answers, and then to demonstrate its reasoning. The sequence shown is a classic math sequence called the Fibonacci sequence, although the name is never provided to the AIs. (A short code sketch after this list shows the expected values.) “Fill in the blanks: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, __, 89, 144, ___, 377, ___, ___, ___. Explain your reasoning.”
Cultural discussion: This tests an AI’s ability to make a case, form a coherent argument, argue a side, and postulate an opinion where there is no clear right answer. “Do you think social media has improved or worsened communication in society? Provide two reasons for your view.”
Literary analysis: This tests an AI’s knowledge base for contemporary literature, and its ability to identify and articulate themes while staying relevant to the original source material. “What are the main themes of the novel ‘A Song of Ice and Fire’ and why are they important?”
Travel itinerary: This tests an AI’s knowledge of geographic regions, its ability to find relevant information on the web, to construct a helpful plan, to organize the results, and to make recommendations. I used Boston because it’s a city I’m quite familiar with, so I could more easily evaluate answers. “Imagine you are a travel advisor. I want a week-long vacation in Boston in March focused on technology and history. What itinerary would you recommend?”
Emotional support: This test balances an AI’s ability to provide some emotional support with a practical challenge, a job interview. It looks to see whether the AIs provide tangible tips that can help a candidate get through an interview, or just fall back on “You’ve got this.” “I’m feeling very nervous about an upcoming job interview. Can you give me some advice or words of encouragement?”
Translation and cultural relevance: This tests an AI’s ability to translate from one language to another. It also asks the AI to blend the language with a discussion of cultural relevance. Since Latin is not a mainstream spoken language, it challenges the AI to find the reasons for the ongoing survival of the language and talk about where it’s actively used. “Translate the following English sentence into Latin, and then explain Latin’s use in today’s culture: ‘The celebration will take place tomorrow in the town square.'”
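For reference, the math prompt has exactly one correct continuation, since each Fibonacci term is the sum of the two before it. Here is a minimal TypeScript sketch (not part of the test itself) that generates the sequence and shows the values that belong in the blanks:

```typescript
// Generate the first `count` Fibonacci numbers, starting 0, 1, 1, 2, ...
function fibonacci(count: number): number[] {
  const seq = [0, 1];
  while (seq.length < count) {
    seq.push(seq[seq.length - 1] + seq[seq.length - 2]);
  }
  return seq.slice(0, count);
}

// The prompt lists 18 terms in all, including the five blanks.
console.log(fibonacci(18).join(", "));
// 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597
// So the blanks are 55, 233, 610, 987, and 1597.
```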
Next up was a coding test. Although we already have a long-running set of AI coding tests, it’s important when evaluating a chatbot to see if it can code, even in the free tier. For this test, I turned to Test 2 in my evaluation suite, which is a test of JavaScript regular expression code. I read each response from the AIs carefully to identify where each AI was strong and weak. Over the years, I’ve graded hundreds of college-level coding assignments, so this evaluation was nothing new to me.
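We don’t reproduce the exact Test 2 prompt here, but based on the failure modes called out above (stray whitespace, mangled leading zeros, decimal handling), here is a hypothetical sketch, in TypeScript rather than plain JavaScript, of the kind of regular-expression string handling the chatbots had to get right. The function name and validation rules are our own illustration, not the actual test:

```typescript
// Hypothetical example: validate and normalize a user-entered dollar amount.
// This illustrates the problem class, not the actual ZDNET test prompt.
function normalizeAmount(input: string): string | null {
  const trimmed = input.trim(); // whitespace handling
  // Accept either comma-grouped thousands or plain digits, with an optional decimal part.
  if (!/^\d{1,3}(,\d{3})*(\.\d+)?$|^\d+(\.\d+)?$/.test(trimmed)) {
    return null; // reject malformed input such as bad comma grouping
  }
  const plain = trimmed.replace(/,/g, ""); // drop thousands separators
  const [whole, frac] = plain.split(".");
  const wholeNorm = whole.replace(/^0+(?=\d)/, ""); // strip leading zeros, but keep "0.50" intact
  return frac !== undefined ? `${wholeNorm}.${frac}` : wholeNorm;
}

console.log(normalizeAmount("  007.50 ")); // "7.50"
console.log(normalizeAmount("1,234.5"));   // "1234.5"
console.log(normalizeAmount("12,34"));     // null
```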
The last text-based test was taken from my 10 prompt tricks article, and was arguably the most fun. Trick number 2 asks the AI to write a short story about a bookshop and its back room. In the article, I told the AI to use no more than 500 words, but in these comparative tests, I tell the AIs to use no fewer than 1,500 words. The idea is to see whether an AI can sustain a longer context for an answer and how creative it can get. Some of the responses were fairly weak, but some were truly fun reads.
Each of the above tests was worth 10 points, for a total of 100 points.
We also wanted to see if you could get quality image generation from a free AI. With a few limited exceptions from our also-ran contenders, the answer is yes. For test prompts, I pulled the four image prompts shown in my comparison of image generators article. This is particularly interesting, because the last test asks for a representation of the movie Back to the Future and is meant to test how the AIs respond to potential guardrails about copyrighted content. Even though it’s very old, I chose Back to the Future because its imagery is iconic and known to almost everyone.
The image tests were worth five points each, for a total of 20 points.
What will you use?
Which free AI chatbot impressed you the most? Have you tried any of the eight chatbots we tested, or did your results differ from ours? Do you value accuracy, creativity, or personality most in your AI assistant? Are you sticking with one chatbot or switching depending on the task? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.