The Information — AI · 4 min read

OpenAI Could Release Internal Tool That Would Weaken Nvidia’s Software Advantage


Morning! Anissa here. 

OpenAI is open to the idea of publicly sharing software it’s been developing to make its AI run on chips from different providers, a senior executive said, a move that could weaken one of Nvidia’s biggest advantages. 

The revealing comments came from Sachin Katti, who leads compute and infrastructure at OpenAI, during a discussion with me, Amp founder Anjney Midha and SemiAnalysis data center guru Jeremie Eliahou Ontiveros. (If you missed it, you can watch the recording here.)

Our conversation centered on AI developers’ choice to run workloads across different AI server chips, not just Nvidia’s, as well as multiple cloud providers. OpenAI and its rivals Anthropic and Meta don’t want to depend on a single vendor for such a core part of their business, and no one provider can supply them with enough compute capacity anyway.

“We are going to end up in a very heterogeneous world,” Katti said, referring to companies using AI chips from multiple suppliers.

OpenAI, which for years depended almost completely on Nvidia chips, recently signed deals to use Amazon’s AI chips as well as those from Cerebras and AMD. It’s also developing its own custom AI chips. (Katti declined to say whether OpenAI would consider using Google’s custom AI chips the way Anthropic and Meta do.)

Developing and running advanced AI across different types of chips isn’t easy, requiring engineers who know the ins and outs of hardware systems.

OpenAI is developing software that allows its researchers and product teams to run workloads without needing to think about which servers are powering the work, Katti said. Google scaled its products by developing such “abstractions,” or software that allowed it to run computing workloads across different hardware, Katti said in a nod to Google’s famous Borg compute management system. “That’s the journey we are on” with AI, he said.

Midha said if developers like OpenAI release that kind of internal software for AI to run efficiently across AI chips from Nvidia, Google, AMD and others, it would be disruptive to Nvidia.

When asked whether OpenAI would consider doing that, Katti said it was on the table. 

“We want to make [that] capability available to the whole world,” he said, describing it as “agentic optimization capability.”

Katti didn’t elaborate on why or how OpenAI might distribute such tools. 

The remarks are notable because Nvidia’s dominance has long been aided by CUDA, its proprietary system of software compilers, libraries and optimization tools that major developers use when running software on its chips. Katti suggested AI itself could eventually reduce some of those proprietary advantages by helping others generate optimized code for multiple hardware systems.

“We do expect that we will be producing optimized kernels using AI that can actually enable all of these different silicon options,” he said. 

New software tools and AI are already weakening CUDA to some extent. Many AI developers use software called PyTorch, originally made by Meta, that makes it easier to write code for AI workloads that run on several types of AI chips, not just Nvidia’s. And some startups sell AI to translate code written with PyTorch into more complex, lower-level code that runs directly on the chips.

Vera Rubins 

The panel also offered a glimpse into OpenAI’s preparations for Nvidia’s next-generation Vera Rubin chip systems, which are expected to begin rolling out later this year.

Katti said OpenAI already has early samples of the chips and expects to bring them online for AI training toward the end of this year. He also suggested Nvidia had learned from some of the operational headaches surrounding the rollout of its initial Blackwell systems, which many cloud providers struggled to use at scale because of networking, firmware and cabling complexity. (Newer Blackwell systems haven’t faced as many problems.)

“All credit to Nvidia,” Katti said. “They actually learned from many of the growing pains.”

Katti declined to say which cloud providers would host OpenAI’s first Vera Rubin clusters, but he noted there is “healthy competition” among them over who will go first. (OpenAI’s primary providers are Microsoft, Oracle and Amazon.)

He suggested that the biggest bottlenecks to getting more computing capacity are less about the chips themselves and more about the power and the complexity of bringing brand-new hardware online.

“At this point, we are more gated by power and our engineering capability rather than anything else,” he said.

In Other News…

SoftBank said it plans to develop and operate five gigawatts of AI data center capacity in France, an investment of about $87.5 billion.

Nvidia unveiled a new chip for personal computers with Microsoft on Monday, a major step into the PC chip market long led by Intel, Advanced Micro Devices and Apple.

The Financial Times reported that Intel plans to ship a new AI chip by the end of this year that would compete with Nvidia’s. 

Base Power, a three-year-old home-battery startup that is also considering working with data center developers, is in talks to raise funds at a $12 billion valuation, The Information reported.

