AI Tackles Decades-Old Math Puzzles with Long Game

A game of chess requires its players to think several moves ahead, a skill that computer programs have mastered over the years. Back in 1997, IBM's Deep Blue supercomputer famously beat then world chess champion Garry Kasparov. Later, in 2017, an artificial intelligence (AI) program developed by Google DeepMind, called AlphaZero, triumphed over the best computerized chess engines of the time after training itself to play the game in a matter of hours.

More recently, some mathematicians have begun to actively pursue the question of whether AI programs can also help in cracking some of the world's toughest math problems. But, whereas an average game of chess lasts about 30 to 40 moves, these research-level math problems require solutions that take a million or more steps, or moves.

In a preprint paper, a team led by Caltech's Sergei Gukov, the John D. MacArthur Professor of Theoretical Physics and Mathematics, describes developing a new type of machine-learning algorithm that can solve math problems requiring extremely long sequences of steps. They used their new algorithm to solve families of problems related to an overarching decades-old math problem called the Andrews-Curtis conjecture. In essence, the algorithm can think farther ahead than even advanced programs like AlphaZero.

"Our program aims to find long sequences of steps that are rare and hard to find," says study first author Ali Shehper, a postdoctoral scholar at Rutgers University who will soon join Caltech as a research scientist. "It's like trying to find your way through a maze the size of Earth. These are very long paths that you have to test out, and there's only one path that works."

The use of AI to solve math problems has become increasingly popular. Google DeepMind's AlphaProof performed at the level of a silver medalist in the 2024 International Mathematical Olympiad, a math competition for high school students. And OpenAI's o3 program recently reasoned its way through benchmark problems in math, science, and computer programming.

The Caltech-led mathematicians are focusing not on routine problems but rather on the toughest in their field. In the new study, they used AI to solve two families of problems within the Andrews-Curtis conjecture, which was first proposed 60 years ago.

While they did not solve the main conjecture itself, they disproved families of problems, referred to as potential counterexamples, which had remained open for roughly 25 years; they also made significant progress on another family of counterexamples that has been open for 44 years. Counterexamples are basically mathematical cases that would disprove an original conjecture. If the counterexamples themselves are disproved, then the original conjecture may still be true.

"Ruling out some of the counterexamples gives us confidence in the validity of the original conjecture and helps build our intuition about the main problem. It gives us new ways to think about it," Shehper says.

Gukov says that navigating these math problems is like "getting from A to B" via convoluted routes that require thousands, millions, or even billions of steps. He compares the problems to solving an incredibly complex Rubik's Cube.

"Can you take this scrambled, complicated Rubik's Cube and get it back to its original state? You have to test out these very long sequences of moves, and you won't know if you are on the right path until the very end," says Gukov, who is also the director of Caltech's new Richard N. Merkin Center for Pure and Applied Mathematics.

The team's AI program learned to come up with long sequences of moves, which the researchers termed "super moves," that are unexpected, or what they call outliers. This contrasts with how AI programs like ChatGPT operate.

"If you ask ChatGPT to write a letter, it will come up with something typical. It's unlikely to come up with anything unique and highly original. It's a good parrot," Gukov says. "Our program is good at coming up with outliers."

To train their AI program, the researchers used a machine-learning model known as reinforcement learning. First, the team showed the AI easy problems to solve, and then progressively gave it harder and harder problems. "It tries various moves and gets rewarded for solving the problems," Shehper explains. "We encourage the program to do more of the same while still keeping some level of curiosity. In the end, it develops new strategies that are better than what humans can do. That's the magic of reinforcement learning."
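The training recipe Shehper describes, starting with easy problems, rewarding solutions, and keeping "some level of curiosity," is the classic reinforcement-learning loop with a curriculum. As a rough illustration only (the team's actual agent, environment, and reward design are far more sophisticated, and every name below is hypothetical), here is a toy tabular Q-learning sketch on a simple "unscramble" task, where the agent is first trained on states a few moves from the goal and then on progressively harder ones:

```python
import random

def train(levels=(2, 4, 8), episodes=300, eps=0.2, alpha=0.5, gamma=0.9):
    """Tabular Q-learning on a toy task: the state is the number of moves
    away from the solved state; action 0 moves closer, action 1 moves away.
    Curriculum: train on easy starting states first, then harder ones."""
    max_n = max(levels) + 2
    Q = {(s, a): 0.0 for s in range(max_n + 1) for a in (0, 1)}
    for level in levels:                       # easy -> hard curriculum
        for _ in range(episodes):
            s = level                          # start 'level' moves from solved
            for _ in range(4 * level):         # step budget per episode
                # epsilon-greedy: mostly exploit, but keep some "curiosity"
                a = random.choice((0, 1)) if random.random() < eps \
                    else max((0, 1), key=lambda x: Q[(s, x)])
                s2 = max(0, s - 1) if a == 0 else min(max_n, s + 1)
                r = 1.0 if s2 == 0 else 0.0    # reward only on solving
                Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
                s = s2
                if s == 0:
                    break                      # episode solved
    return Q

def greedy_solves(Q, start, budget):
    """Follow the learned greedy policy; True if it reaches the solved state."""
    s = start
    for _ in range(budget):
        if s == 0:
            return True
        a = max((0, 1), key=lambda x: Q[(s, x)])
        s = max(0, s - 1) if a == 0 else min(max(k[0] for k in Q), s + 1)
    return s == 0

random.seed(0)
Q = train()
print(greedy_solves(Q, 8, 10))  # the learned greedy policy should solve the hardest level
```

The curriculum matters because rewards are sparse: an agent dropped straight onto a hard instance almost never stumbles onto the goal, so it gets no learning signal. Starting close to the solution lets reward information propagate backward to ever harder starting states.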

At present, AI programs are typically not very good at predicting outlying, rare events that have dramatic consequences, such as financial market crashes. The team's new algorithm cannot make predictions like this either, but it may contain the seeds of what would be required to make intelligent predictions of this nature. "Basically, our program knows how to learn to learn," Gukov says. "It's thinking outside the box."

The team's new algorithm has already made a big splash in the math community. "We made a lot of improvements in an area of math that was decades old," Gukov says. "Progress had been relatively slow, but now it's hustling and bustling." In fact, three new mathematicians have joined the team-Lucas Fagan and Zhenghan Wang of UC Santa Barbara and Yang Qiu of Nankai University in Tianjin, China-and the group has posted another preprint paper that reports solving even more families of potential counterexamples belonging to the Andrews-Curtis conjecture.

Rather than scale up the AI models, the team's approach has been to find new clever tricks and strategies that do not require large amounts of computing power. "We try to demonstrate good performance on small-scale computers, easily accessible to a small academic collaboration, so that any of our colleagues around the globe can easily reproduce these results."

This project was made possible by support from private donors. The donations also helped establish a new math and AI group at Caltech focused on developing AI systems that can tackle hard research-level math problems.

Other authors of the first preprint study, titled "What makes math problems hard for reinforcement learning: a case study," are Anibal M. Medina-Mardones of Western University in Canada, Bartłomiej Lewandowski and Piotr Kucharski of the University of Warsaw in Poland, and Angus Gruen (PhD '23) of Polygon Zero.

/Public Release. This material from the originating organization/author(s) may be of a point-in-time nature, and edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).