AI + a16z

How GPU Access Helps Startups Be Agile

Anjney Midha and Derrick Harris

Posted October 23, 2024

In this episode of AI + a16z, General Partner Anjney Midha explains the forces that lead to GPU shortages and price spikes, and how the firm mitigates these concerns for portfolio companies by supplying them with the GPUs they need through a program called Oxygen. 

The TL;DR version of the problem is that competition for GPU access favors large incumbents, who can afford to outbid startups and commit to long contracts. Startups that commit to the same types of deals can be stuck paying tens of millions of dollars for GPUs that, absent heavy customer demand for inference workloads, they don’t need. With Oxygen — which Anjney leads along with fellow a16z Infra partner Guido Appenzeller — we can provide portfolio companies with the resources they need, when they need them, and at a cost that makes sense given their budgets.

But because this is such an important program for our founders, we wanted to share some of the rationale behind it in a written format, as well. So here is a brief and edited Q&A based on the podcast discussion. Listen to the entire episode for much more color and anecdotes from Anjney, as well as his thoughts on other topics related to the GPU market.

What is Oxygen?

Oxygen is a program at a16z through which we help startup founders and our portfolio companies navigate their compute challenges. Whether it’s helping them find the capacity they need for training or for inference, we have a number of options for startup founders — particularly those working on large-scale AI infrastructure efforts who might have very capital-intensive, GPU-hungry business plans.

What’s great about a16z is we’re quite used to building products like this. For example, we’ve built out an operating platform to help companies with functions like marketing or recruiting, which are universal needs, but ones where most early-stage companies can’t invest too heavily. Because we have around 550 portfolio companies, we can pass on economies of scale to each individual company much earlier in its life. With Oxygen, our goal is to give early-stage companies the kind of pricing, short contract durations, and flexibility on compute that only much later-stage companies — often big tech — can get without the help of somebody like us.

What was the catalyst for the program?

This started from the realization that a number of the AI founders we serve every day had a common problem: We were in the middle of a supply crunch, where Nvidia H100 capacity was in short supply and, as startups, they were being deprioritized by the large clouds in favor of larger customers, which was really tough for them. The hyperscale cloud providers have very sensitive margins tied to the occupancy or utilization rates of their clusters, so they were basically prioritizing long-term contracts over short-term contracts.

That’s the rational thing to do, but if you were a startup, it meant that to access the same hourly GPU price you could have gotten just a few months earlier on a six-month contract, you now had to commit to a three-year contract. From about late 2022 to mid-2023, the market rate for short-term GPU capacity increased by 3-4x. Startups were being asked to commit more capital than they’d raised, or even planned to raise in the next year, in order to get access to those rates.

Last summer, we had a portfolio company that had a signed contract for delivery of a set number of GPUs, and at the last minute it was told it would need to wait another three months. It turned out a bigger customer had come in and offered 3x more than the startup had agreed to pay. It wasn’t some mass conspiracy against startups, just natural market forces in action.

However, that’s a really hard thing for a startup, for three reasons. One is that, early on, you haven’t raised much capital. It’s very daunting to commit more capital than you’ve even raised. 

The second thing is that it’s very difficult to do capacity planning when you don’t know what your inference needs are going to be. In the normal scheme of things, you’d buy some short-term capacity for, let’s say, six months. You’d then train your foundation model over that timeframe. Next, you’d release the model, start getting customers, and have a pretty good sense at that point of your inference demand — which days of the week it spikes, regions you’re getting the most inference demand from, what the queue times are like when you release new features — and use that to inform your purchasing for inference. 

Whereas if you have to do all of that capacity planning upfront, you’re basically guessing in the dark, and that often makes you overcommit to a chip set or a capacity level that might not be at all what you need later on. 

The third thing it did was really put a lot of pressure on these companies to try and raise at higher valuations than they should have, because if you need to raise more to pay for these GPUs, then the only way to prevent yourself from getting diluted down is to raise the valuation.
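
To make the second point — capacity planning — concrete, here is a minimal back-of-envelope sketch. It is not from the episode, and every number in it is hypothetical; it just shows why sizing an inference fleet is simple arithmetic once you have real demand telemetry, and pure guesswork before launch.

```python
import math

def gpus_needed(peak_requests_per_sec: float,
                requests_per_gpu_per_sec: float,
                headroom: float = 1.3) -> int:
    """GPUs required to serve a measured peak, with some safety headroom."""
    return math.ceil(peak_requests_per_sec / requests_per_gpu_per_sec * headroom)

# After launch: inputs come from real telemetry (weekly peak traffic,
# measured per-GPU throughput). All figures here are hypothetical.
print(gpus_needed(400.0, 5.0))  # -> 104 GPUs

# Before launch: the same formula, but every input is a guess. If demand
# could plausibly land anywhere in a 10x range, a multi-year commitment
# sized to the high end leaves most of the fleet sitting idle.
for guessed_peak in (100.0, 400.0, 1000.0):
    print(guessed_peak, gpus_needed(guessed_peak, 5.0))  # 26, 104, 260
```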

We were in a bit of a lose-lose situation. Founders were having to pay exorbitant prices to hyperscalers, and they were having to do long-term planning when they should have just been focused on being agile and nimble about the short term.

So it helps startups focus on product instead of capital expenditures?

Right. We were able to construct a win-win situation for all parties involved. We are able to tell founders that they have guaranteed capacity at prices they can’t get anywhere else, while giving our compute partners direct access to the world’s best foundation model startups and AI startups. The most sophisticated cloud partners we work with realize the value in that — if you can build a relationship with the best foundation model companies early on as their training provider, you have a really good shot at becoming their compute supplier for inference, as well. And for the best companies it’s clear that, over the long run, the bulk of their needs come from inference, not training.

For the founders, it’s very clear what they get. They’re able to raise less and take on less long-term risk, while still being able to train great models on Day 1. Our goal is always to try to give startups unfair advantages compared to big tech companies, and that was the goal here, as well. Just by resetting compute to rational, normal market rates, we were able to do that.

It actually was one of our founders who came up with the name “Oxygen.” He said, “If I don’t have that kind of compute on Day 1, I can’t breathe. We literally don’t have anything for our researchers to do yet.”

What were some of the underlying causes of the GPU crunch?

One reason is there was just net-new demand that nobody had done capacity planning for. And data centers take time to build — six months to a year, depending on how prepared you were. As a result, most of the forecasts for data center demand were pretty low and outdated by about a year to a year-and-a-half. 

When ChatGPT came out in late 2022, all capacity forecasts were wrong by a factor of 5 to 10, because OpenAI basically introduced a research preview to the world with no expectation that it would become this incredibly popular consumer app. I would say that most data center providers had started planning for increased demand from training runs, but nobody had planned for an explosive killer app driving inference demand from the consumer market. That’s when things really exploded.

Starting in January 2023, it was OpenAI trying to buy inference capacity wherever they could get it, and then every other foundation-model lab going, “Wow, we need to catch up. We need to build our own comparable frontier language model.” As a result, some of them were paying absurd rates to buy out existing contracts that clouds had already committed to.

What does the near-term future look like for access to GPUs?

It really depends on any given company’s workloads at any given time.

Earlier this year, Nvidia announced the Blackwell line. We have yet to see live production benchmarks but, based on early tests, the B200 has roughly two-and-a-half times the horsepower of the H100. As a result, we’re seeing a number of companies that made long-term commitments to H100s feeling really nervous about when the Blackwells hit next year, because competitors who didn’t make long-term commitments will get access to Blackwell chips and be able to train things two-and-a-half times faster.
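
As a rough illustration of what that speedup means in practice (the 2.5x figure is the early-test number mentioned above; the run length is hypothetical):

```python
h100_run_days = 90   # a hypothetical three-month training run on H100s
speedup = 2.5        # early-test B200 vs. H100 throughput ratio cited above

# A competitor starting on B200s finishes the same run in ~36 days,
# roughly two months of lead time that the H100-committed team gives up.
print(h100_run_days / speedup)  # -> 36.0
```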

On the other hand, if you put out a product that a lot of people want to use — you have a lot of inference demand — you can essentially swap out your training workloads for your inference workloads on those H100s. They’re incredibly powerful chips that you can run really good inference workloads on. 

What are the cost differences between training and inference workloads?

While training is expensive, it’s also quite predictable. It might have cost around $20 million to train a model for three months at the height of the GPU shortage, but that number was not going to move much. 

Inference is cheaper because the workload is so much smaller — it might cost about one-tenth of 1 cent to generate a single image — but it’s much more unpredictable because it’s a variable cost: you have to serve it as and when your customers want it. If you haven’t even launched your model yet and don’t have product-market fit, you’re basically guessing in the dark about your inference demand. Buying inference capacity and then having it sit idle is just wasted money.
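
Here is a rough sketch of that fixed-versus-variable asymmetry, using the two figures quoted above plus hypothetical demand levels:

```python
TRAINING_COST = 20_000_000  # ~$20M for a three-month run (figure quoted above)
COST_PER_IMAGE = 0.001      # ~one-tenth of 1 cent per image (figure quoted above)

def monthly_inference_cost(images_per_day: int) -> float:
    """Variable inference spend: scales directly with customer demand."""
    return images_per_day * 30 * COST_PER_IMAGE

# Training is one known, fixed number; inference depends entirely on demand
# you can't observe before launch (these demand levels are hypothetical):
for images_per_day in (10_000, 1_000_000, 100_000_000):
    print(f"{images_per_day:>11,} images/day -> "
          f"${monthly_inference_cost(images_per_day):>12,.0f}/month")
# Output spans $300/month to $3,000,000/month, a 10,000x range. Committing
# to capacity for the top line and landing at the bottom is the idle-GPU
# waste described above.
```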

How does the availability of open source models affect GPU demand?

The availability of high-quality open source models that have permissive licenses creates massive deflation for downstream developers, because they can piggyback off of all the millions of dollars that companies like Meta, Mistral, and Black Forest Labs spent on their open source models. Those are all compute cycles or FLOPs that are being given away for free to the developer community. The more open source there is, the more FLOPs are amortized across the entire community rather than having to be respent or repurchased by every individual developer. 
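
A toy example of that amortization effect, with entirely hypothetical numbers:

```python
training_cost = 50_000_000  # hypothetical cost of one open-source training run
developers = 10_000         # hypothetical developers building on the weights

# Open weights: one set of training FLOPs is paid for once and amortized
# across everyone downstream, instead of being respent per developer.
print(f"amortized cost per developer: ${training_cost / developers:,.0f}")
# -> amortized cost per developer: $5,000 (vs. $50,000,000 to retrain alone)
```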

Is a program like Oxygen an ongoing necessity? 

Oxygen will be a pretty core value proposition for our companies as long as AI is growing and remains an important part of our world. What has changed is that it has become clearer to the world’s clouds and data center providers that AI is here to stay, so they’re doing much better forecasting than they were when this whole wave got started.

What hasn’t changed is that even as a tiny, seed-stage startup — as a little guy — you still need help to be treated like a big guy. The reality of market forces is that, as long as there are bigger customers who get better treatment because they’re buying in bulk, there will always be a need for us to help our companies be treated that way. And that’s really what the Oxygen program was designed to do all along: help the little folks get the same treatment as big tech on compute pricing, contract duration, and chip offerings.

More About This Podcast

Artificial intelligence is changing everything from art to enterprise IT, and a16z is watching all of it with a close eye. This podcast features discussions with leading AI engineers, founders, and experts, as well as our general partners, about where the technology and industry are heading.