Marcus on AI · 5 min read

Checking the math behind OpenAI and Anthropic’s latest headlines

Mirrored from Marcus on AI for archival readability. Support the source by reading on the original site.

OpenAI scored a big result yesterday on an Erdos problem.

Clearly impressive. But as with so much else, it should be viewed with skepticism.

In an email to me this morning, Cal Newport made a number of good points that he said I could share, summarizing both what was found and some limitations:

OpenAI used a new reasoning model (not yet released) to help identify a counterexample that disproved a conjecture from discrete geometry first proposed by Paul Erdos 80 years ago. The model was tuned for so-called chain-of-thought reasoning where the model endlessly “thinks out loud” about whatever it is trying to solve, an approach that lets you approximate something like memory and dynamic computation using LLMs, which are otherwise static and feed-forward. Professional mathematicians identified the counterexample from within a long transcript of the model’s reasoning, and then extracted the key parts and rewrote it as a more succinct proof in a more standard style.

How was the model able to solve something that human mathematicians had failed to do? In a companion article released by OpenAI, the mathematician Thomas Bloom, who reviewed the full model output, identified the factors that came together to make this counterexample ripe for LLM-aided discovery. He noted that though the conjecture is old, those who have worked on it have largely shared Erdos’s original belief that it was true and therefore focused on trying to solve it. What the LLM-based tool did instead was to systematically apply and extend existing techniques in search of evidence that the conjecture was false. Here’s Bloom: “[the AI’s] success here echoes previous achievements: it often produces the most surprising results by persevering down the paths that a human may have dismissed as not worth their time to explore, combining superhuman levels of patience with familiarity with a vast array of technical machinery.”

A few observations of my own:

(1) Non-mathematicians might not be familiar with the degree to which LLM-technology has been combined with existing computer-aided math tools in recent years to seek new math results through the systematic and patient exploration of techniques and corners of problem spaces that are too exhausting to interest most human mathematicians. The real technical headline of the new OpenAI result, therefore, is that chain-of-thought reasoning was able to accomplish this type of systematic solving without the much more intricate scaffolding used in most of these existing tools. That being said, the internal model used here, which many assume is OpenAI’s response to the truly massive Mythos LLM, is likely similarly massively expensive to prompt. The future of AI-assisted math will likely focus on smaller, cheaper, math-tuned LLMs combined with more powerful scaffolding. So, this experiment might be more about marketing the power of their new model than trying to actually advance computer-aided math.

(2) I don’t think it’s accurate to say these examples of AI-supported mathematics mean the models are somehow “smarter” than human mathematicians. I think a better analogy might be how computer tools helped architects produce much more daring and complicated designs (like the Frank Gehry-designed Stata Center where I did my CS doctoral and postdoctoral work at MIT). These tools weren’t better architects than humans but made humans more capable architects.

(3) From a business perspective, I actually think this announcement isn’t necessarily good news for OpenAI. There are few markets smaller and less lucrative than professional academic mathematics. The fact that this is the area where OpenAI is dedicating some of their top technical talent (like Noam Brown) underscores the degree to which, like the drunk searching for their keys under the streetlight, their most impressive results are limited to the smaller number of areas that are well-suited to LLMs (i.e., math + computer coding). If this model was brilliant in some more general way, obviously the better examples would be solving problems or automating processes that directly and obviously generate massive revenue or savings for the specific types of companies they hope to make their customers.

In conclusion: AI’s role in math is genuinely important and exciting. I can think of any number of results I’ve worked on in my career where I could have moved faster or been more comprehensive if I had access to the latest generation of tools. But this intersection of AI and math is also very specific to this field and more nuanced and complicated than simply imagining AI systems as standalone mathematicians who are becoming increasingly brilliant. One should be wary of making ambitious generalizations from fields like math and coding to other potential applications of these models.

Kareem Carr also gave a tempered view of what the mathematician Tim Gowers had written, converging in the same general direction.

Beyond that, we don’t really know how the model worked, how it was trained, or how general the result is, either within mathematics or outside it, in the more open-ended everyday world. We have exactly zero data on how it performs on other benchmarks, whether it solves the hallucination problem, or how much it costs to run.

We also don’t know how many prompts they tried, and how many didn’t work; we have a numerator but not a denominator.

It is definitely an interesting result; what it actually means in the real world is anybody’s guess.

§

The other big news, per a scoop from Berber Jin at the WSJ, is that Anthropic is projecting its first (slightly) profitable quarter ever.

That is amazing, assuming it actually happens, but if it does, it will be in no small part because (as revealed yesterday in SpaceX’s S-1 IPO filing) Anthropic is getting a one-time (nonrecurring) discount on compute from SpaceX for that quarter. The exact number is not given, but that discount may well be bigger than the projected $559M profit. Context matters, and it is unclear whether subsequent quarters will be profitable.

Ed Zitron expresses even more skepticism and raises some questions about the accounting.

§

The wildest stat I saw this morning was this: Nvidia is spending so much on circular financing that its cash flow is headed towards zero.

It’s so hard to know what’s really going on, factoring all that out.

Caveat emptor, IPO buyers.
