Human teens beat AI at an international math competition
For the first time ever, AI models achieved prestigious gold-level scores at the International Mathematical Olympiad, one of the world's premier math competitions. Their success is an undeniable bragging right for the technology's biggest supporters. But as it stands, Google's and OpenAI's most cutting-edge, experimental AI programs still can't beat an extremely smart teenager.
It may seem ironic, but complex mathematics is still one of AI's biggest hurdles. There are many analyses of why this remains such an issue, but generally speaking, it has to do with how the technology works. After receiving a prompt, AI models like ChatGPT and Google Gemini break the words and letters down into "tokens," then parse and predict an appropriate response. To an AI, an answer is just the most likely string of tokens. Humans, however, process them as words, sentences, and complete thoughts.
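To make that idea concrete, here is a minimal, purely illustrative Python sketch: the toy_tokenize function and its splitting rule are invented for this example and do not reflect how any real chatbot's tokenizer actually works.

```python
# Purely illustrative sketch: a prompt gets chopped into small pieces
# ("tokens") before the model predicts what should come next. The splitting
# rule below is invented; real tokenizers are learned from data and split
# text into subword pieces, not tidy whole words.
def toy_tokenize(text: str) -> list[str]:
    return text.replace("?", " ?").split()

print(toy_tokenize("What is 4596 times 4859?"))
# -> ['What', 'is', '4596', 'times', '4859', '?']
```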
Given these parameters, AI doesn't possess the "logic" capabilities required to handle complex mathematical prompts. This is largely because math prompts usually don't have multiple possible answers, only a single, correct solution. Today, a pocket calculator will invariably give you the objectively true answer to multiplying 4596 by 4859 (22,331,964). Meanwhile, ChatGPT might still offer you an answer like 22,325,364.
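As a rough sketch of that contrast, the snippet below computes the product deterministically and then picks the "most likely" answer from a set of made-up candidate strings; the probabilities are invented purely to illustrate how a plausible-looking wrong answer can win out.

```python
# A calculator-style computation is deterministic: the same inputs always
# produce the same, correct product.
print(4596 * 4859)  # 22331964

# A chat model instead emits whichever continuation it scores as most likely.
# The candidates and probabilities below are invented for illustration only.
candidates = {"22,331,964": 0.45, "22,325,364": 0.55}
print(max(candidates, key=candidates.get))  # "22,325,364" -- plausible, but wrong
```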
Since 1959, the International Mathematical Olympiad (IMO) has served as one of the world's premier events for young (human) math whizzes. Many mathematicians would need longer than the allotted time to answer just one of the IMO's problems, and most people wouldn't be able to solve any of them.
Australia most recently hosted the 66th annual IMO competition in Queensland, where 641 teenagers from 112 countries met beginning on July 15 to tackle six questions across two 4.5-hour sessions. This time, however, they had some additional competition: a pair of experimental AI reasoning models from Google and OpenAI.
The bots fared well. Both companies have since announced that their programs scored high enough on this year's IMO test to earn gold medals. Each AI managed to solve five of the six problems within the time limit, earning 35 out of the maximum 42 possible points. This year, only about 10 percent of human entrants received a gold-level score.
It marked a major improvement over Google's last showing at the IMO. In 2024, a version of its DeepMind reasoning AI reached a silver-medal score after solving four of six problems, although it required two to three days of computation instead of the competition's time limit. According to IMO president Gregor Dolinar, one of the most striking points of this year's results wasn't just the AI programs' calculations, but the ways in which they explained their "thought" process to arrive at each answer.
"Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow," Dolinar said in Google's announcement.
There's at least one last IMO milestone for both companies: a perfect score. This year, five teens pulled off that accomplishment. And even if Google's or OpenAI's models match it in the coming years, the victory may still require context. As AFP noted, IMO organizers couldn't confirm how much computing power either AI model required, or whether there was any additional human oversight during the calculations.
And while AI's latest technological leap forward is impressive, it still likely required disconcertingly massive amounts of energy and water. Companies like Google, OpenAI, and Microsoft are all investing heavily in data centers to support their AI efforts, all of which need power sources. In some cases, that may even include expanding the use of fossil fuels. Watchdogs have previously estimated that, at this rate, the AI industry may consume as much energy as Argentina, if not multiple nations combined. That's a problem neither AI nor its makers have yet solved.
Andrew Paul is a staff writer for Popular Science.