AI + a16z

How GPU Access Helps Startups Be Agile

Anjney Midha and Derrick Harris

Posted October 23, 2024

In this episode of AI + a16z, General Partner Anjney Midha explains the forces that lead to GPU shortages and price spikes, and how the firm mitigates these concerns for portfolio companies by supplying them with the GPUs they need through a program called Oxygen. 

The TL;DR version of the problem is that competition for GPU access favors large incumbents, who can afford to outbid startups and commit to long contracts. Startups that commit to the same types of deals can be stuck paying tens of millions of dollars for GPUs that, absent heavy customer demand for inference workloads, they don’t need. With Oxygen — which Anjney leads along with fellow a16z Infra partner Guido Appenzeller — we can provide portfolio companies with the resources they need, when they need them, and at a cost that makes sense given their budgets.

But because this is such an important program for our founders, we wanted to share some of the rationale behind it in a written format, as well. So here is a brief and edited Q&A based on the podcast discussion. Listen to the entire episode for much more color and anecdotes from Anjney, as well as his thoughts on other topics related to the GPU market.

What is Oxygen?

Oxygen is a program at a16z through which we help startup founders and our portfolio companies navigate their compute challenges. Whether it’s helping them find the capacity they need for training or for inference, we have a number of options for startup founders — particularly those working on large-scale AI infrastructure efforts who might have very capital-intensive, GPU-hungry business plans.

What’s great about a16z is we’re quite used to building products like this. For example, we’ve built out an operating platform to help companies with functions like marketing or recruiting, which are universal needs, but ones where most early-stage companies can’t invest too heavily. Because we have around 550 portfolio companies, we can pass on economies of scale to each individual company much earlier in its life. With Oxygen, our goal is to give early-stage companies the kind of pricing, short contract durations, and flexibility on compute that only much later-stage companies — often big tech — can get without the help of somebody like us.

What was the catalyst for the program?

This started from the realization that a number of the AI founders we serve every day had a common problem: We were in the middle of a supply crunch, where Nvidia H100 capacity was in short supply and, as startups, they were being deprioritized by the large clouds in favor of larger customers, which was really tough for them. The hyperscale cloud providers have very sensitive margins tied to the occupancy or utilization rates of their clusters, so they were basically prioritizing long-term contracts over short-term contracts.

That’s the rational thing to do, but if you were a startup, it meant that to access the same hourly GPU price you could have gotten just a few months earlier on a six-month contract, you now had to commit to a three-year contract. From about late 2022 to mid-2023, the market rate for short-term GPU capacity increased by 3-4x. Startups were being asked to commit more capital than they’d raised, or even planned to raise in the next year, in order to get access to those rates.

Last summer, we had a portfolio company that had a signed contract for delivery of a set number of GPUs, and at the last minute it was told it would need to wait another three months. It turned out a bigger customer had come in and offered 3x more than the startup had agreed to pay. It wasn’t some mass conspiracy against startups, just natural market forces in action.

However, that’s a really hard thing for a startup, for three reasons. One is that, early on, you haven’t raised much capital. It’s very daunting to commit more capital than you’ve even raised. 

The second thing is that it’s very difficult to do capacity planning when you don’t know what your inference needs are going to be. In the normal scheme of things, you’d buy some short-term capacity for, let’s say, six months. You’d then train your foundation model over that timeframe. Next, you’d release the model, start getting customers, and have a pretty good sense at that point of your inference demand — which days of the week it spikes, regions you’re getting the most inference demand from, what the queue times are like when you release new features — and use that to inform your purchasing for inference. 

Whereas if you have to do all of that capacity planning upfront, you’re basically guessing in the dark, and that often makes you overcommit to a chip set or a capacity level that might not be at all what you need later on. 

The third thing it did was really put a lot of pressure on these companies to try and raise at higher valuations than they should have, because if you need to raise more to pay for these GPUs, then the only way to prevent yourself from getting diluted down is to raise the valuation.
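
To make the second point — capacity planning — concrete, here is a minimal back-of-envelope sketch. It is not from the episode, and every number in it is hypothetical; it just shows why sizing an inference fleet is simple arithmetic once you have real demand telemetry, and pure guesswork before launch.

```python
import math

def gpus_needed(peak_requests_per_sec: float,
                requests_per_gpu_per_sec: float,
                headroom: float = 1.3) -> int:
    """GPUs required to serve a measured peak, with some safety headroom."""
    return math.ceil(peak_requests_per_sec / requests_per_gpu_per_sec * headroom)

# After launch: inputs come from real telemetry (weekly peak traffic,
# measured per-GPU throughput). All figures here are hypothetical.
print(gpus_needed(400.0, 5.0))  # -> 104 GPUs

# Before launch: the same formula, but every input is a guess. If demand
# could plausibly land anywhere in a 10x range, a multi-year commitment
# sized to the high end leaves most of the fleet sitting idle.
for guessed_peak in (100.0, 400.0, 1000.0):
    print(guessed_peak, gpus_needed(guessed_peak, 5.0))  # 26, 104, 260
```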

We were in a bit of a lose-lose situation. Founders were having to pay exorbitant prices to hyperscalers, and they were having to do long-term planning when they should have just been focused on being agile and nimble about the short term.

So it helps startups focus on product instead of capital expenditures?

Right. We were able to construct a win-win situation for all parties involved. We are able to tell founders that they have guaranteed capacity at prices they can’t get anywhere else, while giving our compute partners direct access to the world’s best foundation model startups and AI startups. The most sophisticated cloud partners we work with realize the value in that — if you can build a relationship with the best foundation model companies early on as their training provider, you have a really good shot at becoming their compute supplier for inference, as well. And for the best companies it’s clear that, over the long run, the bulk of their needs come from inference, not training.

For the founders, it’s very clear what they get. They’re able to raise less and take on less long-term risk, while still being able to train great models on Day 1. Our goal is always to try to give startups unfair advantages compared to big tech companies, and that was the goal here, as well. Just by resetting compute to rational, normal market rates, we were able to do that.

It actually was one of our founders who came up with the name “Oxygen.” He said, “If I don’t have that kind of compute on Day 1, I can’t breathe. We literally don’t have anything for our researchers to do yet.”

What were some of the underlying causes of the GPU crunch?

One reason is there was just net-new demand that nobody had done capacity planning for. And data centers take time to build — six months to a year, depending on how prepared you were. As a result, most of the forecasts for data center demand were pretty low and outdated by about a year to a year-and-a-half. 

When ChatGPT came out in late 2022, all capacity forecasts were wrong by a factor of 5 to 10, because OpenAI basically introduced a research preview to the world with no expectation that it would become this incredibly popular consumer app. I would say that most data center providers had started planning for increased demand from training runs, but nobody had planned for an explosive killer app driving inference demand from the consumer market. That’s when things really exploded.

Starting in January 2023, it was OpenAI trying to buy inference capacity wherever they could get it, and then every other foundation-model lab going, “Wow, we need to catch up. We need to build our own comparable frontier language model.” As a result, some of them were paying absurd rates to buy out existing contracts that clouds had already committed to.

What does the near-term future look like for access to GPUs?

It really depends on any given company’s workloads at any given time.

Earlier this year, Nvidia announced the Blackwell line. We have yet to see live production benchmarks but, based on early tests, the B200 has roughly two-and-a-half times the horsepower of the H100. As a result, we’re seeing a number of companies that made long-term commitments to H100s feeling really nervous about when the Blackwells hit next year, because competitors who didn’t make long-term commitments will get access to Blackwell chips and be able to train things two-and-a-half times faster.
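
As a rough illustration of what that speedup means in practice (the 2.5x figure is the early-test number mentioned above; the run length is hypothetical):

```python
h100_run_days = 90   # a hypothetical three-month training run on H100s
speedup = 2.5        # early-test B200 vs. H100 throughput ratio cited above

# A competitor starting on B200s finishes the same run in ~36 days,
# roughly two months of lead time that the H100-committed team gives up.
print(h100_run_days / speedup)  # -> 36.0
```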

On the other hand, if you put out a product that a lot of people want to use — you have a lot of inference demand — you can essentially swap out your training workloads for your inference workloads on those H100s. They’re incredibly powerful chips that you can run really good inference workloads on. 

What are the cost differences between training and inference workloads?

While training is expensive, it’s also quite predictable. It might have cost around $20 million to train a model for three months at the height of the GPU shortage, but that number was not going to move much. 

Inference is cheaper because the workload is so much smaller — it might cost about one-tenth of 1 cent to generate a single image — but it’s much more unpredictable because it’s a variable cost: you have to serve it as and when your customers want it. If you haven’t even launched your model yet and don’t have product-market fit, you’re basically guessing in the dark about your inference demand. Buying inference capacity and then having it sit idle is just wasted money.
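
Here is a rough sketch of that fixed-versus-variable asymmetry, using the two figures quoted above plus hypothetical demand levels:

```python
TRAINING_COST = 20_000_000  # ~$20M for a three-month run (figure quoted above)
COST_PER_IMAGE = 0.001      # ~one-tenth of 1 cent per image (figure quoted above)

def monthly_inference_cost(images_per_day: int) -> float:
    """Variable inference spend: scales directly with customer demand."""
    return images_per_day * 30 * COST_PER_IMAGE

# Training is one known, fixed number; inference depends entirely on demand
# you can't observe before launch (these demand levels are hypothetical):
for images_per_day in (10_000, 1_000_000, 100_000_000):
    print(f"{images_per_day:>11,} images/day -> "
          f"${monthly_inference_cost(images_per_day):>12,.0f}/month")
# Output spans $300/month to $3,000,000/month, a 10,000x range. Committing
# to capacity for the top line and landing at the bottom is the idle-GPU
# waste described above.
```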

How does the availability of open source models affect GPU demand?

The availability of high-quality open source models that have permissive licenses creates massive deflation for downstream developers, because they can piggyback off of all the millions of dollars that companies like Meta, Mistral, and Black Forest Labs spent on their open source models. Those are all compute cycles or FLOPs that are being given away for free to the developer community. The more open source there is, the more FLOPs are amortized across the entire community rather than having to be respent or repurchased by every individual developer. 
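
A toy example of that amortization effect, with entirely hypothetical numbers:

```python
training_cost = 50_000_000  # hypothetical cost of one open-source training run
developers = 10_000         # hypothetical developers building on the weights

# Open weights: one set of training FLOPs is paid for once and amortized
# across everyone downstream, instead of being respent per developer.
print(f"amortized cost per developer: ${training_cost / developers:,.0f}")
# -> amortized cost per developer: $5,000 (vs. $50,000,000 to retrain alone)
```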

Is a program like Oxygen an ongoing necessity? 

Oxygen will be a pretty core value proposition for our companies as long as AI is growing and remains an important part of our world. What has changed is that it has become clearer to the world’s clouds and data center providers that AI is here to stay, so they’re doing much better forecasting than they were when this whole wave got started.

What hasn’t changed is that even as a tiny, seed-stage startup — as a little guy — you still need help to be treated like a big guy. The reality of market forces is that, as long as there are bigger customers who get better treatment because they’re buying in bulk, there will always be a need for us to help our companies be treated that way. And that’s really what the Oxygen program was designed to do all along: help the little folks get the same treatment as big tech on compute pricing, contract duration, and chip offerings.

More About This Podcast

Artificial intelligence is changing everything from art to enterprise IT, and a16z is watching all of it with a close eye. This podcast features discussions with leading AI engineers, founders, and experts, as well as our general partners, about where the technology and industry are heading.