Just over a year ago, we highlighted 16 changes to the way enterprises approached building and buying gen AI. Since then, the landscape has continued to evolve quickly—so we revisited our conversations with over two dozen enterprise buyers and surveyed 100 CIOs across 15 industries to help founders understand how these leaders are using, buying, and budgeting for gen AI in 2025 and beyond.1
Even in a field where the only constant is change, the gen AI market structure has evolved significantly beyond our expectations since we ran our last survey over a year ago.
To give founders a more nuanced look at what’s top of mind for enterprise buyers today, we’ll dig into these shifts in resourcing, models, procurement, and application usage below.
LLM budgets have grown ahead of enterprises’ (already high) expectations from a year ago, and there are no signs of this slowing down. Enterprise leaders expect an average of ~75% growth over the next year. As one CIO noted, “what I spent in 2023 I now spend in a week.”
Spend growth is driven partially by enterprises discovering more relevant internal use cases and increasing employee adoption. On top of this, we’re beginning to see more customer-facing use cases—especially for tech-forward companies—that have the potential to drive exponential spend growth. One large technology company said, “we’ve been mostly focused on internal use cases so far, but this year we’re focused on customer-facing gen AI where spend will be significantly larger.”
Last year, innovation budgets still made up a quarter of LLM spending; this has now dropped to just 7%. Enterprises are increasingly paying for AI models and apps via centralized IT and business unit budgets, reflecting the growing sentiment that gen AI is no longer experimental but essential to business operations. One CTO noted that, “more of our products are adding AI enablement, so our spending growth will rise across all of these products”—suggesting this shift toward core budgets will only accelerate.
With several highly capable LLMs now available, it’s become the norm to have multiple models deployed in production use cases. While one reason for this is certainly to avoid vendor lock-in, model differentiation by use case has become increasingly pronounced and is the main reason enterprises buy models from multiple vendors. In this year’s survey, 37% of respondents are now using 5 or more models, up from 29% last year.
While in some cases models appear to have comparable scores on general purpose evaluations, it’s clear that the enterprise model layer has not become commoditized. It’s well known, for instance, that Anthropic’s models excel in coding-related tasks, but there’s more nuance to this claim. Within coding, some users report that Claude performs better for fine-grained code completion, while Gemini is stronger in higher-level system design and architecture. In other domains, such as text-based applications, one customer observed that “Anthropic is a bit better at writing tasks—language fluency, content generation, brainstorming—while OpenAI models are better for more complex question-answering.” These differences have made it best practice to use multiple models, and we expect this strategy to continue as customers optimize applications for performance while keeping an eye toward remaining vendor-agnostic.
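To make this multi-model pattern concrete, below is a minimal sketch of task-based routing across vendors. It assumes the OpenAI and Anthropic Python SDKs with API keys set in the environment; the task categories and model IDs are illustrative placeholders, not recommendations.

```python
# Illustrative sketch only: route each task category to the vendor/model a team
# has validated for it. Task names and model IDs below are hypothetical placeholders.
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

ROUTES = {
    "code_completion": ("anthropic", "claude-sonnet-4-20250514"),
    "writing":         ("anthropic", "claude-sonnet-4-20250514"),
    "complex_qa":      ("openai", "gpt-4o"),
}

def complete(task_type: str, prompt: str) -> str:
    provider, model = ROUTES[task_type]
    if provider == "openai":
        resp = openai_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    resp = anthropic_client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text
```

Keeping the routing table in one place also preserves the vendor-agnostic posture described above: swapping the model behind a task type becomes a configuration change rather than an application rewrite.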
While enterprises continue to use different models across both experimental and production use cases, as explored above, a few players have taken the lead on overall adoption: OpenAI maintained its overall market share leadership, while Google and Anthropic made considerable strides over the last year. Market share also differed somewhat by enterprise scale, with more open source adoption at the larger end of the market, where on-prem deployment is still a major consideration.
Double-clicking further into usage:
As we’ve previously discussed, model costs are coming down by an order of magnitude every 12 months. Against this backdrop, we’ve also seen the price-to-performance ratio of closed source become much more compelling for small and medium models, with xAI’s Grok 3 mini and Google’s Gemini 2.5 Flash taking the lead on this count. Given this shift, along with other ecosystem benefits, customers are increasingly opting for closed source models in these cases. As one customer said, “The pricing has gotten appealing and we’re already embedded with Google: we use everything from G Suite to databases, and their enterprise expertise is attractive.” Or more concisely put by another: “Gemini is cheap.”
Improved model capabilities—chiefly higher intelligence and longer context windows—have made fine-tuning less critical to achieving strong model performance for a specific use case. Instead, companies have found that prompt engineering can drive similar or better results, often at much lower cost. As one enterprise observed, “instead of taking the training data and parameter-efficient fine-tuning, you just dump it into a long context and get almost equivalent results.”
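A minimal sketch of what that looks like in practice, using a hypothetical query-rewriting task: the examples that would previously have been fine-tuning data are instead packed into the prompt as in-context demonstrations.

```python
# Illustrative sketch only: in-context examples stand in for fine-tuning.
# `domain_examples` is a hypothetical placeholder for what would have been a training set.
domain_examples = [
    {"query": "term sheet red flags",
     "rewritten": "common unfavorable clauses in venture term sheets"},
    {"query": "q3 churn drivers",
     "rewritten": "factors contributing to customer churn in the third quarter"},
]

def build_prompt(user_query: str) -> str:
    # Dump the "training data" into the context window as worked examples,
    # then ask the model to handle the new query the same way.
    shots = "\n\n".join(
        f"Query: {ex['query']}\nRewritten: {ex['rewritten']}" for ex in domain_examples
    )
    return (
        "Rewrite search queries into precise, domain-specific form.\n\n"
        f"{shots}\n\n"
        f"Query: {user_query}\nRewritten:"
    )

print(build_prompt("enterprise ai budget trends"))
```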
This move away from fine-tuning also helps companies avoid model lock-in, as fine-tuned models require high upfront costs and engineering work while prompts can be more easily ported from one model to another. This is important in a world where models are rapidly improving and companies want the benefits of staying on the leading edge.
That said, companies with hyper-specific use cases are still fine-tuning models. For instance, one streaming service fine-tunes open source models for query augmentation in video search “where you need more domain adaptation.” We might also see a rise in fine-tuning if newer methods, like reinforcement fine-tuning, become more widely adopted beyond the labs.
As model capabilities improve, most enterprises aren’t seeing as much ROI on fine-tuning as last year and mainly opt for open source models for highly cost-sensitive use cases.
By allowing LLMs to complete more complex tasks more accurately, reasoning models have expanded the range of use cases that LLMs can tackle. Enterprises are still early in their testing of reasoning models and few have deployed them in production, but companies are very optimistic about their potential. One executive we interviewed captured this well: “[reasoning models] allow us to solve newer, more complex use cases, so I anticipate a big jump in our usage. But we’re still early and testing today.”
Among early adopters, OpenAI’s reasoning models have seen the greatest traction. Despite significant industry buzz around DeepSeek, enterprises are overwhelmingly adopting OpenAI, with 23% of enterprises surveyed already using OpenAI’s o3 model in production compared to just 3% for DeepSeek. DeepSeek’s adoption was notably higher among startups than among enterprises, where pickup remained low.
Companies now approach model selection with disciplined evaluation frameworks, and factors such as security—which was heavily emphasized in our interviews—and cost have gained ground on overall accuracy and reliability. This shift underscores the increased trust enterprises have in model performance and the confidence that LLMs will be deployed at scale. As one leader succinctly summarized, “for most tasks, all the models perform well enough now—so pricing has become a much more important factor.”
As we mentioned in the “Models” section, enterprises are also becoming more sophisticated in matching specific use cases with the right model. For highly visible or performance-critical applications, companies typically prefer leading-edge models with strong brand recognition. In contrast, for simpler or internal tasks, model choice often comes down purely to cost. See below for how these LLM KPCs (key purchasing criteria) have changed over time.
While there is still some preference for existing cloud relationships (similar to other infra purchases), more enterprises are hosting either directly with model providers or via Databricks, particularly in cases where the model of choice is not hosted by their main cloud provider (e.g., OpenAI for AWS customers). This is typically because leaders “want direct access to the latest model with the best performance as soon as it’s available. Early access previews are important too.” The increased trust in going direct with model providers including OpenAI and Anthropic is a significant shift from what we heard in last year’s interviews with enterprises: many opted to access models via a cloud provider whenever possible, sometimes even if it wasn’t via their primary cloud provider.
Last year, we found that most enterprises were designing their applications to minimize switching costs and make models as interchangeable as possible. As a result, many enterprises treated models as “easy come, easy go.” That might have worked well for simple, one-shot use cases, but the rise of agentic workflows has started making it more difficult to switch between models.
As companies invest the time and resources into building guardrails and prompting for agentic workflows, they’re more hesitant to switch to other models in the event that their results won’t be replicable or that they’ll need to invest significant time into engineering the reliability of a different model. Agentic workflows often require multiple steps to complete a task, so changing one part of a model’s workflow could impact all downstream dependencies. As one leader told us, “all the prompts have been tuned for OpenAI. Each one of them has their own set of instructions and prompts and details. How LLMs get instructions to do agentic processing—it takes lots of pages of instruction. Also, quality assurance of agents is not super easy, so changing models is now a task that can take a lot of engineering time.”
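A simplified sketch of why switching gets harder: in a multi-step workflow, each step’s prompt has been tuned against one vendor’s model, and each step consumes the previous step’s output, so a model swap at any step ripples through everything downstream. The step names and prompts are hypothetical, and the OpenAI Python SDK is assumed.

```python
# Illustrative sketch only: a multi-step agentic workflow in which every prompt
# was iterated against a single vendor's model. Steps and prompts are hypothetical.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder; every instruction below was tuned against this model

STEPS = [
    ("plan",   "Break the user's request into numbered sub-tasks, one per line."),
    ("draft",  "Complete each sub-task in order, quoting the sub-task before each answer."),
    ("verify", "Check the draft against the original request and list any gaps."),
]

def run_workflow(request: str) -> str:
    context = request
    for _step, instructions in STEPS:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": instructions},
                {"role": "user", "content": context},
            ],
        )
        # Each step consumes the previous step's output, so swapping the model
        # at any single step can change every downstream result.
        context = resp.choices[0].message.content
    return context
```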
As models proliferate, external evaluations offer a practical, Gartner-like filter that enterprises recognize from their traditional software procurement processes.
While internal benchmarks, golden datasets, and developer feedback are still critical parts of assessing LLM performance more deeply, the maturation of the LLM market has driven companies to increasingly reference external benchmarks like LM Arena. Though these external benchmarks help enterprise buyers sort the market, leaders also noted that these benchmarks are just one factor in a broader evaluation process: “we definitely look at the external benchmarks. But you still need to assess yourself. It’s hard to pick without really trialing things and getting employee feedback.”
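For illustration, here is a bare-bones version of the internal half of that process: scoring candidate models against a small golden dataset before trusting any leaderboard. The dataset, scoring rule, and model IDs are hypothetical placeholders, and the OpenAI Python SDK is assumed.

```python
# Illustrative sketch only: exact-match scoring of candidate models against a
# golden dataset. Dataset contents and model IDs are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()

GOLDEN_SET = [
    {"prompt": "Classify the ticket: 'I was charged twice this month.'", "expected": "billing"},
    {"prompt": "Classify the ticket: 'The app crashes on login.'", "expected": "bug"},
]

CANDIDATE_MODELS = ["gpt-4o", "gpt-4o-mini"]

def exact_match_score(model: str) -> float:
    hits = 0
    for case in GOLDEN_SET:
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "Answer with a single lowercase category."},
                {"role": "user", "content": case["prompt"]},
            ],
        )
        answer = resp.choices[0].message.content.strip().lower()
        hits += int(answer == case["expected"])
    return hits / len(GOLDEN_SET)

for m in CANDIDATE_MODELS:
    print(m, exact_match_score(m))
```

In practice, teams layer this kind of internal scoring on top of external benchmarks, much as the leader quoted above describes.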
Early in the AI product cycle, enterprises largely opted to work directly with AI models and build their own applications. However, we’ve seen a marked shift towards buying third party applications over the last twelve months as the ecosystem of AI apps has started to mature. This is particularly true because the dynamic performance and cost differentiation across models means that incremental ROI gains come from constant evaluation and optimization by use case, work that is often best handled by a dedicated third-party application vendor rather than an internal team.
Moreover, in a space as dynamic as AI, companies are finding that internally developed tools are difficult to maintain and frequently don’t give them a business advantage—which further cements their interest in buying instead of building apps.
As more application categories mature, we’d expect this trend to swing harder toward third-party applications in the future; one leading indicator is that leaders are already weighing third-party apps more heavily when testing new use cases. In the case of customer support, for instance, over 90% of survey respondents noted that they were testing third-party apps. One public fintech noted that while they had started to build customer support internally, a recent review of third-party solutions on the market convinced them to buy instead of continuing the build. The one area where we haven’t seen this trend play out is in regulated or high-risk industries like healthcare, where data privacy and compliance are more top of mind.
While there’s a lot of hype around outcome-based pricing for AI, CIOs are still uncomfortable with how outcome metrics are set, measured, and billed.
Some of the top concerns with outcome-based pricing were lack of clear outcomes that map to business goals, unpredictable costs, and attribution—but there was no consensus on how vendors could mitigate these issues. This isn’t surprising, as AI is a relatively new technology and it’s not yet clear how to implement it so it drives real value for businesses. Buyers don’t know how much they’re going to be charged and don’t want to be left holding the bag. Given this, most CIOs still prefer paying by usage for AI applications.
While we’ve seen progressive adoption of AI use cases across the board—especially internal enterprise search, data analysis, and customer support—software development has seen a step change in adoption, driven by a perfect storm of extremely high-quality off-the-shelf apps, a significant increase in model capabilities, relevance for a broad set of companies and industries, and a no-brainer ROI use case.
One CTO at a high-growth SaaS company reported that nearly 90% of their code is now AI-generated through Cursor and Claude Code, up from 10–15% 12 months ago with GitHub Copilot. This level of adoption still represents the bleeding edge, but is likely a strong leading indicator for the enterprise.
Strong consumer brands are translating into strong enterprise demand.
Like some of the early platform shifts (e.g., the internet), much of the early growth across leading enterprise AI apps has been driven by the prosumer market. This was kicked off by ChatGPT and underscored by coding apps and creator tools like ElevenLabs. Many CIOs noted their decision to purchase enterprise ChatGPT was driven by “employees loving ChatGPT. It’s the brand name they know.” This dual market pull has led to much faster growth in the next generation of AI companies than we’ve seen in the past.
Incumbents have always benefited from established trust and existing distribution, but in the AI era, they’re increasingly outperformed by AI-native competitors from a product quality and velocity perspective.
Unsurprisingly, the primary reason buyers prefer AI-native vendors is their faster innovation rate. The second reason is the recognition that companies built around AI from the ground up deliver fundamentally better products with superior outcomes compared to incumbents retrofitting AI into existing solutions.
This gap is especially clear in software development today, where one public security company CIO highlighted a stark difference in capabilities between first-generation and second-generation AI coding tools as coding becomes more agentic. The shift is also echoed in user satisfaction data: users who have adopted Cursor, a gen AI-native coding solution, show notably lower satisfaction with previous-gen tools like GitHub Copilot, underscoring how quickly innovation fundamentally reshapes the outcomes buyers can and should expect from AI.
The enterprise AI landscape is no longer defined by experimentation: it’s shaped by strategic deployment, budget commitment, and maturing vendor ecosystems. As model choice diversifies, fragmentation by use case is not only expected but embraced, and a few key leaders are emerging. Enterprises are adopting structured procurement processes and increasingly turning to off-the-shelf applications to accelerate adoption. The result is a market that looks more like traditional software—yet moves with the speed and complexity unique to AI.
Sarah Wang is a general partner on the Growth team at Andreessen Horowitz, where she focuses on enterprise technology companies.
Shangda Xu is a partner on the Growth investing team, focused on enterprise technology companies.
Justin Kahl is a partner on the Growth investing team.
Tugce Erten is a partner on the Growth team, focused on pricing and packaging.