A.I. Is Coming for Mathematics, Too – The New York Times

Supported by
For thousands of years, mathematicians have adapted to the latest advances in logic and reasoning. Are they ready for artificial intelligence?

In the collection of the Getty museum in Los Angeles is a portrait from the 17th century of the ancient Greek mathematician Euclid: disheveled, holding up sheets of “Elements,” his treatise on geometry, with grimy hands.
For more than 2,000 years, Euclid’s text was the paradigm of mathematical argumentation and reasoning. “Euclid famously starts with ‘definitions’ that are almost poetic,” Jeremy Avigad, a logician at Carnegie Mellon University, said in an email. “He then built the mathematics of the time on top of that, proving things in such a way that each successive step ‘clearly follows’ from previous ones, using the basic notions, definitions and prior theorems.” There were complaints that some of Euclid’s “obvious” steps were less than obvious, Dr. Avigad said, yet the system worked.
But by the 20th century, mathematicians were no longer willing to ground mathematics in this intuitive geometric foundation. Instead they developed formal systems — precise symbolic representations, mechanical rules. Eventually, this formalization allowed mathematics to be translated into computer code. In 1976, the four-color theorem — which states that four colors are sufficient to fill a map so that no two adjacent regions are the same color — became the first major theorem proved with the help of computational brute force.
Now mathematicians are grappling with the latest transformative force: artificial intelligence.
In 2019, Christian Szegedy, a computer scientist formerly at Google and now at a start-up in the Bay Area, predicted that a computer system would match or exceed the problem-solving ability of the best human mathematicians within a decade. Last year he revised the target date to 2026.
Akshay Venkatesh, a mathematician at the Institute for Advanced Study in Princeton and a winner of the Fields Medal in 2018, isn’t currently interested in using A.I., but he is keen on talking about it. “I want my students to realize that the field they’re in is going to change a lot,” he said in an interview last year. He recently added by email: “I am not opposed to thoughtful and deliberate use of technology to support our human understanding. But I strongly believe that mindfulness about the way we use it is essential.”
In February, Dr. Avigad attended a workshop about “machine-assisted proofs” at the Institute for Pure and Applied Mathematics, on the campus of the University of California, Los Angeles. (He visited the Euclid portrait on the final day of the workshop.) The gathering drew an atypical mix of mathematicians and computer scientists. “It feels consequential,” said Terence Tao, a mathematician at the university, winner of a Fields Medal in 2006 and the workshop’s lead organizer.
Dr. Tao noted that only in the last couple years have mathematicians started worrying about A.I.’s potential threats, whether to mathematical aesthetics or to themselves. That prominent community members are now broaching the issues and exploring the potential “kind of breaks the taboo,” he said.
One conspicuous workshop attendee sat in the front row: a trapezoidal box named “raise-hand robot” that emitted a mechanical murmur and lifted its hand whenever an online participant had a question. “It helps if robots are cute and nonthreatening,” Dr. Tao said.
These days there is no shortage of gadgetry for optimizing our lives — diet, sleep, exercise. “We like to attach stuff to ourselves to make it a little easier to get things right,” Jordan Ellenberg, a mathematician at the University of Wisconsin-Madison, said during a workshop break. A.I. gadgetry might do the same for mathematics, he added: “It’s very clear that the question is, What can machines do for us, not what will machines do to us.”
One math gadget is called a proof assistant, or interactive theorem prover. (“Automath” was an early incarnation in the 1960s.) Step-by-step, a mathematician translates a proof into code; then a software program checks whether the reasoning is correct. Verifications accumulate in a library, a dynamic canonical reference that others can consult. This type of formalization provides a foundation for mathematics today, said Dr. Avigad, who is the director of the Hoskinson Center for Formal Mathematics (funded by the crypto entrepreneur Charles Hoskinson), “in just the same way that Euclid was trying to codify and provide a foundation for the mathematics of his time.”
Of late, the open-source proof assistant system Lean is attracting attention. Developed at Microsoft by Leonardo de Moura, a computer scientist now with Amazon, Lean uses automated reasoning, which is powered by what is known as good old-fashioned artificial intelligence, or GOFAI — symbolic A.I., inspired by logic. So far the Lean community has verified an intriguing theorem about turning a sphere inside out as well as a pivotal theorem in a scheme for unifying mathematical realms, among other gambits.
But a proof assistant also has drawbacks: It often complains that it does not understand the definitions, axioms or reasoning steps entered by the mathematician, and for this it has been called a “proof whiner.” All that whining can make research cumbersome. But Heather Macbeth, a mathematician at Fordham University, said that this same feature — providing line-by-line feedback — also makes the systems useful for teaching.
In the spring, Dr. Macbeth designed a “bilingual” course: She translated every problem presented on the blackboard into Lean code in the lecture notes, and students submitted solutions to homework problems both in Lean and prose. “It gave them confidence,” Dr. Macbeth said, because they received instant feedback on when the proof was finished and whether each step along the way was right or wrong.
Since attending the workshop, Emily Riehl, a mathematician at Johns Hopkins University, used an experimental proof-assistant program to formalize proofs she had previously published with a co-author. By the end of a verification, she said, “I’m really, really deep into understanding the proof, way deeper than I’ve ever understood before. I’m thinking so clearly that I can explain it to a really dumb computer.”
Another automated-reasoning tool, used by Marijn Heule, a computer scientist at Carnegie Mellon University and an Amazon scholar, is what he colloquially calls “brute reasoning” (or, more technically, a Satisfiability, or SAT, solver). By merely stating, with a carefully crafted encoding, which “exotic object” you want to find, he said, a supercomputer network churns through a search space and determines whether or not that entity exists.
Just before the workshop, Dr. Heule and one of his Ph.D. students, Bernardo Subercaseaux, finalized their solution to a longstanding problem with a file that was 50 terabytes in size. Yet that file hardly compared with a result that Dr. Heule and collaborators produced in 2016: “Two-hundred-terabyte maths proof is largest ever,” a headline in Nature announced. The article went on to ask whether solving problems with such tools truly counted as math. In Dr. Heule’s view, this approach is needed “to solve problems that are beyond what humans can do.”
Another set of tools uses machine learning, which synthesizes oodles of data and detects patterns but is not good at logical, step-by-step reasoning. Google’s DeepMind designs machine-learning algorithms to tackle the likes of protein folding (AlphaFold) and winning at chess (AlphaZero). In a 2021 Nature paper, a team described their results as “advancing mathematics by guiding human intuition with A.I.
Yuhuai “Tony” Wu, a computer scientist formerly at Google and now with a start-up in the Bay Area, has outlined a grander machine-learning goal: to “solve mathematics.” At Google, Dr. Wu explored how the large language models that empower chatbots might help with mathematics. The team used a model that was trained on internet data and then fine-tuned on a large math-rich data set, using, for instance, an online archive of math and science papers. When asked in everyday English to solve math problems, this specialized chatbot, named Minerva, was “pretty good at imitating humans,” Dr. Wu said at the workshop. The model obtained scores that were better than an average 16-year-old student on high school math exams.
Ultimately, Dr. Wu said, he envisioned an “automated mathematician” that has “the capability of solving a mathematical theorem all by itself.”
Mathematicians have responded to these disruptions with varying levels of concern.
Michael Harris, at Columbia University, expresses qualms in his “Silicon Reckoner” Substack. He is troubled by the potentially conflicting goals and values of research mathematics and the tech and defense industries. In a recent newsletter, he noted that one speaker at a workshop, “A.I. to Assist Mathematical Reasoning,” organized by the National Academies of Sciences, was a representative from Booz Allen Hamilton, a government contractor for intelligence agencies and the military.

Dr. Harris lamented the lack of discussion about the larger implications of A.I. on mathematical research, particularly “when contrasted with the very lively conversation going on” about the technology “pretty much everywhere except mathematics.”
Geordie Williamson, of the University of Sydney and a DeepMind collaborator, spoke at the N.A.S. gathering and encouraged mathematicians and computer scientists to be more involved in such conversations. At the workshop in Los Angeles, he opened his talk with a line adapted from “You and the Atom Bomb,” a 1945 essay by George Orwell. “Given how likely we all are to be profoundly affected within the next five years,” Dr. Williamson said, “deep learning has not roused as much discussion as might have been expected.”
Dr. Williamson considers mathematics a litmus test of what machine learning can or cannot do. Reasoning is quintessential to the mathematical process, and it is the crucial unsolved problem of machine learning.
Early during Dr. Williamson’s DeepMind collaboration, the team found a simple neural net that predicted “a quantity in mathematics that I cared deeply about,” he said in an interview, and it did so “ridiculously accurately.” Dr. Williamson tried hard to understand why — that would be the makings of a theorem — but could not. Neither could anybody at DeepMind. Like the ancient geometer Euclid, the neural net had somehow intuitively discerned a mathematical truth, but the logical “why” of it was far from obvious.
At the Los Angeles workshop, a prominent theme was how to combine the intuitive and the logical. If A.I. could do both at the same time, all bets would be off.
But, Dr. Williamson observed, there is scant motivation to understand the black box that machine learning presents. “It’s the hackiness culture in tech, where if it works most of the time, that’s great,” he said — but that scenario leaves mathematicians dissatisfied.
He added that trying to understand what goes on inside a neural net raises “fascinating mathematical questions,” and that finding answers presents an opportunity for mathematicians “to contribute meaningfully to the world.”