Enterprise

Avoiding Death on the Yellow Brick Road

Joe Schmidt Posted May 27, 2026

The question I keep getting from founders and prospective employees: is there any AI application layer left to build, or are OpenAI and Anthropic going to kill everything?

There’s a particular flavor of AI psychosis behind the question. Some people have concluded the only durable places to avoid the permanent underclass are inside a big lab or out on the frontier building in robotics, hardtech, or similar – theoretically anything “the labs can’t touch.” If every piece of software is about to be eaten, either by Codex or Claude absorbing the work directly, or by a future model that will make whatever you’ve built unnecessary, then run!

Listen, I’m as much of an AI maximalist as almost anyone, and I think they’re half right. The labs really are coming for a huge swath of the application surface. But “the application layer” isn’t one homogeneous opportunity. The right framing is whether you’re on the Yellow Brick Road or somewhere else in Oz.

The Yellow Brick Road is our shorthand for the path the labs are walking, where they’re committing extraordinary resources. The labs are best suited for problems like code generation, writing, or image creation because these problems improve with raw model capability: every dollar spent on pre-training and post-training improves product quality. Meanwhile, the rest of Oz is inhabited by more complex, often vertical problems that can’t be solved by handing a business user a horizontal product with access to standard tools and computer use. There, the value comes less from the underlying model’s raw capability (though that’s still important!) than from the scaffolding around it that makes the output trustworthy, compliant, and operational inside a specific industry.

We’re seeing this play out in real time as OpenAI and Anthropic are effectively telling the market they can’t solve every problem with a generic AI coworker. They’ve announced massive forward-deployed joint ventures to build whole companies around configuring and customizing their models for the enterprise. You don’t pour billions into those programs if you think the next model release is going to take care of it.

So if you want to get rich building AI apps, avoid the Yellow Brick Road and build somewhere else in Oz. Here’s what we’ve learned, and what some of our portfolio founders have learned, about what works.

The Yellow Brick Road

If you’re starting a company, the Yellow Brick Road is the most obvious path to go down, but it’s also the most dangerous. Take a high-performing model, plug in some off-the-shelf connectors (like Google Drive, Slack, Salesforce, Notion, GitHub), and ship some sort of agentic orchestration layer on top of that. Magic!

The problem is that this is exactly what the labs are doing with Cowork and Codex. Obviously, they own the model, which gives them better margins, control, and the ability to exert pricing power on anyone who’s downstream from them. But maybe most importantly, they also own the architectural choices that define what their products are built to solve well. They’ve been deliberate so far about the model-plus-tool-calls pattern, and this is exactly what horizontal, low-step-count work on the road requires. Even if a startup could somehow outperform Codex or Claude Code, the labs have massive distribution arms and the biggest brand halo in AI.

If you’re an AI app company running that playbook with the same connectors, no sub-agents or configuration below it, and no distribution, you’re likely walking down the road to nowhere.

The Rest of Oz

It’s not all doom and gloom for startups. There’s an enormous opportunity outside the Yellow Brick Road, where startups have a clear path to own their customer and solve complex problems.

These businesses are building agentic experiences where the model is woven through a complex web of tools, automations, and integrations (read: software), leading most of these startups to be vertical by default. They can focus on multi-step and multi-player work, with sub-agents for role- and vertical-specific tasks, that Anthropic and OpenAI can’t reach with horizontal platforms: gathering context across systems, then routing through multiple humans who have to approve at different stages. It often involves one or more legacy systems, tends toward needing deterministic outcomes where ambiguity isn’t acceptable, and is at times tied to some valuable business outcome. The labs understand how valuable these problems are: that’s why they’re building their own outsourced configuration shops, and why an entire upmarket class of reinforcement learning businesses exists.

Why the rest of Oz won’t be owned by the Wizard

The response to the above would be that to date, it’s been a pretty bad trade to bet against the models/labs improving. They’ll likely just keep getting better and eventually eat into the market served by these application layer businesses.

The labs will certainly improve, but I’d argue there are a few ways the rest of Oz can defend themselves over time:

Data and learning flywheels:

A lot of what you internalize isn’t in any training set — unwritten industry norms, undocumented standards, the tribal knowledge that lives in practitioners’ heads. None of it is on the public web. No amount of training compute substitutes for being inside the workflows where this knowledge actually lives. There are two flywheels stacked on top of each other here: an across-customer one — patterns that compound as you see more variants of the same problem — and a within-customer one — the why behind specific decisions, the unsaid exceptions, the firm’s own rules of thumb that only surface through real interaction with the system.

Even if customer data can’t be used across customers, application companies will be able to leverage pattern recognition across customer problem types, and use that to inform the right architecture for future problems. A company that has run its agents through a hundred legal redlines, a thousand insurance underwriting cycles, or ten thousand SDR campaigns has internalized the shape of the problem in a way the next entrant cannot replicate by spinning up a fresh agent for the first time.

A horizontal agent could in principle build the same learning infrastructure. The reason it doesn’t, beyond pure focus, is UX: capturing this kind of knowledge depends entirely on the workflow surfaces you give the user, and vertical players can shape those surfaces around exactly what their workflow needs to surface. Horizontal tools can’t. Eval sets, labeled outputs, and edge-case taxonomies can compound into a vertical-specific data flywheel, which can fuel fine-tuning that the next entrant can’t replicate without comparable production exposure. Whether that fine-tuning is possible depends on data rights, the volume of production exposure accumulated, and the structure of customer contracts, but pattern recognition accrues regardless.

Managing model variability and complexity: The labs are already routing internally — different model classes for different requests, ensembles under the hood. What they can’t do is route across vendors, or evaluate a competitor’s model for a specific sub-task, or use an open-source fine-tune for the narrow piece where it’s actually best. The Rest of Oz company picks the right model for each sub-task across the entire model market, not just what its parent lab ships. It also does the work nobody wants to do — re-running evals on upgrades, recalibrating prompts for the customer’s edge cases, rolling out without breaking production — every time a new model lands. The labs aren’t doing this on the customer’s behalf; they sell you their next model and tell you to migrate. The Rest of Oz company absorbs the migration. What the customer gets is the best intelligence available across the whole market, plus continuity through every upgrade.
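The upgrade chore described above (re-running evals before a new model reaches production) can be sketched as a simple regression gate. This is a minimal illustration, not any company's actual pipeline: `EvalCase`, `pass_rate`, `should_promote`, the exact-match scoring, and the stand-in models are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: a "model" is any callable that maps a prompt to a text answer.
Model = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


def pass_rate(model: Model, cases: List[EvalCase]) -> float:
    """Fraction of eval cases where the model's output matches exactly."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for case in cases if model(case.prompt).strip() == case.expected)
    return passed / len(cases)


def should_promote(current: Model, candidate: Model, cases: List[EvalCase],
                   max_regression: float = 0.0) -> bool:
    """Promote the candidate only if it does not regress beyond the tolerance."""
    return pass_rate(candidate, cases) >= pass_rate(current, cases) - max_regression


# Stand-in eval set and models, invented for illustration only.
CASES = [
    EvalCase("classify: water damage claim", "property"),
    EvalCase("classify: rear-end collision", "auto"),
    EvalCase("classify: slip and fall in store", "liability"),
]


def current_model(prompt: str) -> str:
    known = {CASES[0].prompt: "property", CASES[1].prompt: "auto"}
    return known.get(prompt, "unknown")


def candidate_model(prompt: str) -> str:
    known = {case.prompt: case.expected for case in CASES}
    return known.get(prompt, "unknown")


if __name__ == "__main__":
    print(should_promote(current_model, candidate_model, CASES))
```

In practice the scoring function would be task-specific (rubrics, graders, customer edge cases) rather than exact match, and the gate would run per sub-task so a new model can be adopted for one step without being forced on the rest.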

Cost optimization: Running every query through Opus 4.7 is the fastest path to negative gross margins. The best Rest of Oz companies route across tiers of models — frontier models for the hardest tasks, mid-tier for the bulk, smaller custom or fine-tuned models where they’ve earned the right to use them. Some are now post-training their own models on top of that, optimizing them for the narrow slice of work their customer cares about and serving them at a fraction of the cost of a frontier API call. The labs price the floor: the least intelligence available at $X. The Rest of Oz company sells the inverse — the lowest dollar cost for the specific level of intelligence the workflow actually requires. That’s only possible if you know exactly what level each sub-task needs, which the labs structurally can’t know across every vertical. It translates directly into lower, controlled prices for outcomes.
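The tiered routing idea can be made concrete with a small sketch: given the capability level each sub-task needs, pick the cheapest model that meets it. Everything here is assumed for illustration, including the model names, the 1-to-3 capability scale, the per-call prices, and the `REQUIRED_LEVEL` table. The hard part in reality is knowing those required levels, which is the vertical knowledge the paragraph describes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelOption:
    name: str
    capability: int       # hypothetical scale: 1 = small, 2 = mid-tier, 3 = frontier
    cost_per_call: float  # illustrative prices, not real vendor pricing


CATALOG: List[ModelOption] = [
    ModelOption("small-finetuned", 1, 0.001),
    ModelOption("mid-tier", 2, 0.01),
    ModelOption("frontier", 3, 0.10),
]

# Assumed capability level each sub-task needs; learned from production evals.
REQUIRED_LEVEL: Dict[str, int] = {
    "lead_enrichment": 1,
    "message_writing": 2,
    "deep_account_research": 3,
}


def route(subtask: str, catalog: List[ModelOption] = CATALOG,
          required: Dict[str, int] = REQUIRED_LEVEL) -> ModelOption:
    """Return the cheapest model whose capability meets the sub-task's need."""
    need = required[subtask]
    eligible = [m for m in catalog if m.capability >= need]
    if not eligible:
        raise LookupError(f"no model is capable enough for {subtask!r}")
    return min(eligible, key=lambda m: m.cost_per_call)


def blended_cost(subtasks: List[str]) -> float:
    """Total cost of one pass over the sub-tasks with routing applied."""
    return sum(route(s).cost_per_call for s in subtasks)


if __name__ == "__main__":
    for task in REQUIRED_LEVEL:
        print(task, "->", route(task).name)
```

With these made-up prices, routing one call of each sub-task costs roughly a third of sending all three to the frontier model, which is the gross-margin argument in miniature.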

Governance: There is considerable value in a Rest of Oz company becoming the control plane for how its customers run AI in that vertical: the place where permissions, auditing, what-the-agent-is-allowed-to-do, and what-the-agent-actually-did all converge. That control plane is built out of use-case-specific guardrails that look completely different across industries and job types. Because these companies own the tools, the workflows, and the data the agent touches end-to-end, they can provide deterministic outcomes in ways horizontal tools will struggle to. They are also the entity that absorbs the regulatory complexity for the end buyer: FRCP and bar rules in legal, HIPAA in healthcare, SEC and FINRA in finance, state insurance regulations, and so on. A horizontal player can’t credibly do that without becoming a hundred different verticals at once. CIOs want a partner that contractually commits to handling compliance for the agents it provides.
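As a rough sketch of what "what the agent is allowed to do" and "what the agent actually did" converging in one place might look like, the snippet below checks each attempted action against an allow-list and records every attempt, permitted or not. The `ControlPlane` and `AuditEntry` names, the agent name, and the action strings are hypothetical; a real control plane would also cover identity, data scopes, retention, and regulator-specific rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class AuditEntry:
    agent: str
    action: str
    allowed: bool


@dataclass
class ControlPlane:
    # Hypothetical allow-list: agent name -> actions it may take.
    allowed_actions: Dict[str, Set[str]]
    audit_log: List[AuditEntry] = field(default_factory=list)

    def authorize(self, agent: str, action: str) -> bool:
        """Decide whether the action is permitted and log the attempt either way."""
        allowed = action in self.allowed_actions.get(agent, set())
        self.audit_log.append(AuditEntry(agent, action, allowed))
        return allowed


if __name__ == "__main__":
    plane = ControlPlane({"intake_agent": {"read_submission", "draft_email"}})
    print(plane.authorize("intake_agent", "draft_email"))
    print(plane.authorize("intake_agent", "bind_policy"))
    print(plane.audit_log)
```

Logging denied attempts alongside permitted ones is deliberate here: the audit trail is only useful to a compliance team if it shows what the agent tried as well as what it did.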

All of these come back to the same thing: focus. That could be a vertical (insurance, legal, accounting) or a function done deeply (sales, customer support, finance). Either way, the work needs a team that’s heads-down on one customer set — its workflows, its edge cases, its regulations. The labs aren’t built for that. They have to be everywhere, for everyone, which is how they built the Yellow Brick Road in the first place. The same trade-off keeps them out of the rest of Oz — you can be everywhere at once, or you can be great at one thing. Not both.

Sales as an example – practical tips from 11x’s technical CEO

How should you think about this in practice? Here are some practical tips from Prabhav Jain, the CEO of 11x.

Focus on outcomes

A tactical path to building a company that is resilient to the labs is to start from a specific outcome that your customers really care about. For us, that was helping companies generate more pipeline. From there the questions get tactical. Which activities that actually drive pipeline do we want to own end-to-end? Decompose each activity into tasks. Which tasks are agentic and which aren’t? Which require intricate domain insight and which don’t? The labs will ship workflows too, but when the workflow has many steps, messy inputs, hard-to-interpret state, or real-world constraints, a better model alone won’t get you there. The work falls to good old-fashioned software engineering, and the labs hold no edge over a focused application company on that surface. For example, here are some of the tasks that we handle, some agentic and some not: lead prospecting based on custom signals, lead enrichment, deep account research, a context fetcher from the CRM, a channel-specific message writer, a lead qualification agent, and an email deliverability system. These aren’t tasks you can just one-shot; they require deep engineering.

The critical insight in the Oz analogy is that the roughly half of any real workflow that is non-agentic carries no lab advantage. The labs are no better than you are at writing the deterministic software underneath the model layer. And the half that is agentic still requires you to tune, train, and constrain the models against the result you actually want. Domain knowledge often doesn’t sit in general training data. Those skills get built from the ground up for the vertical or function, and fed into the model at the right moment in the workflow. When our agents are qualifying an inbound lead on the phone, they have to be trained on what a good sales conversation is for that specific industry and that persona. That is application company work, and it compounds.

More importantly, those skills become outdated all the time because businesses evolve, so your ability to evolve those workflows and context becomes a competitive advantage. As an example, when we started our scaled email outreach product, AI-written emails were just starting to come into play. Fast forward to today, and people have a tuned sense of which emails are AI-written versus human-written. Crucially, that sense changes every few months. Our agents have to constantly adapt to this market dynamic, but this is where the moat is built. In fact, despite this dynamic, our positive reply rates have gone up 4x in the last few months and we’ve generated hundreds of millions in pipeline for our customers.

Work on problems where complexity is high

Complex problems are where real business value gets unlocked. Otherwise you’ll find yourself building a thin wrapper.

Decompose any sufficiently complex business problem and messiness shows up quickly. Here’s an example from the GTM world that sounds trivial: you shouldn’t reach out to a contact at a company if that company is already a customer. It’s anything but. Maybe you have the domain associated with the company in your CRM. What about companies with dozens of subsidiaries? What if the CRM record has the parent’s domain? What if a stale matching field in Salesforce sends a cold pitch to a current customer’s CRO? Real-world data is messy. Humans struggle with it. Models don’t magically clear that bar. Driving order out of that mess requires purpose-built agents engineered for the specific shape of the problem, not a general-purpose copilot pointed at a CRM. In fact, we have found that the quality and freshness of our own data is much higher than that of our customers’ data, so by default, we anchor on our own.
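To make the "is this company already a customer?" problem concrete, here is a minimal sketch of one slice of it: normalizing domains and resolving subsidiaries to a parent before checking a suppression list. The function names, the `parent_of` mapping, and the example domains are invented for illustration. The sketch also assumes a deliberately conservative policy (treat the whole corporate family as a customer) and assumes the keys of `parent_of` are already normalized. It says nothing about how 11x actually does this.

```python
from typing import Dict, Set


def normalize_domain(raw: str) -> str:
    """Lowercase and strip scheme, path, and a leading 'www.' from a domain or URL."""
    domain = raw.strip().lower()
    for prefix in ("https://", "http://"):
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
    domain = domain.split("/")[0]
    if domain.startswith("www."):
        domain = domain[len("www."):]
    return domain


def root_company(domain: str, parent_of: Dict[str, str]) -> str:
    """Follow subsidiary -> parent links to the top of the corporate family."""
    seen: Set[str] = set()
    current = normalize_domain(domain)
    # The 'seen' set guards against accidental cycles in messy CRM data.
    while current in parent_of and current not in seen:
        seen.add(current)
        current = normalize_domain(parent_of[current])
    return current


def is_existing_customer(prospect_domain: str, customer_domains: Set[str],
                         parent_of: Dict[str, str]) -> bool:
    """True if the prospect shares a corporate family with any current customer."""
    customer_roots = {root_company(d, parent_of) for d in customer_domains}
    return root_company(prospect_domain, parent_of) in customer_roots


if __name__ == "__main__":
    parent_of = {"sub.example": "parent.example"}
    customers = {"parent.example"}
    print(is_existing_customer("https://www.sub.example/contact", customers, parent_of))
```

Even this toy version depends on data nobody hands you cleanly: the subsidiary map has to come from somewhere, and it goes stale. That upkeep is the engineering work the paragraph is pointing at.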

Guardrails aren’t just to prevent bad stuff from happening. That’s what your customers are paying you for.

Guardrails are severely underestimated. Even inside the same product, every use case needs its own. For us, a regulated financial services prospect demands different guarantees than a mid-market SaaS customer, and those guarantees roll down into how the agent is allowed to write, who it can contact, what data it can touch, what it can say on a call, and how every decision gets logged.

A one-size-fits-all system collapses under that variance. Guardrails have to be built per use case, configured per customer, and audited continuously, and that work sits squarely with the application company. This is why we have FDEs and technical deployment strategists who tune for each customer’s requirements. As an example, we worked with an F1000 institution to do consented outbound via voice to their large SMB customer base. The initial few iterations had low pickup rates, so we had to quickly iterate and learn how to get this specific type of audience to engage in the first 10 seconds of the call. SMB business owners behave very differently from larger B2B buyers or consumers. We now generate more sales opportunities for them in a day than their entire sales team for that segment generates in a month.

Insurance as an example – practical tips from FurtherAI’s CEO

Sales is one example. Insurance is another, and it makes the same point from a different angle. Here’s how Aman Gour, CEO of FurtherAI, thinks about building off the road:

When we started deploying AI inside real insurance operations, we kept hearing a particular assumption: the model is the intelligence, and the workflow is just scaffolding around it.

The more carriers we worked with, the more convinced we became that this is backwards.

In insurance, a lot of the intelligence lives inside the workflow itself. Two carriers can run a submission through what looks like the same path: submission, review, quote, bind. But the path is the easy part. What separates the two carriers is everything inside it: which risks get escalated, which loss signals matter, which appetite rule wins when two of them conflict, when a human has to sign off, which external data gets pulled in, and how the final decision gets documented.

That logic does not live in one clean rules engine. It is spread across SOPs, manager reviews, underwriting philosophy, carrier-specific appetite, and years of operational experience. A lot of it is not written down in a form a model can simply read.

This is why we do not believe in a pure agent that reasons from scratch every time, and we do not believe in a rigid workflow that breaks the moment reality gets messy. Instead, we have been building agentic workflows. The workflow gives you repeatability, auditability, and cost control. The agent handles variability and recovers when the happy path breaks. The human stays in the loop for the judgment calls where accountability matters.
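One way to picture that division of labor is the sketch below: each step has a deterministic happy path, an optional agent-style recovery hook for when it fails, and an escalation to a human when recovery is not possible. It is an illustration under stated assumptions, not FurtherAI's implementation. `Step`, `Result`, `run_workflow`, and the toy `extract`, `agent_fill`, and `quote` functions are all hypothetical, and the "agent" here is plain Python standing in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Submission = Dict[str, object]


@dataclass
class Step:
    name: str
    run: Callable[[Submission], Submission]  # deterministic happy path
    # Agent fallback: returns repaired data, or None if it cannot recover.
    recover: Optional[Callable[[Submission, Exception], Optional[Submission]]] = None


@dataclass
class Result:
    status: str  # "completed" or "needs_human"
    data: Submission
    trail: List[str] = field(default_factory=list)  # audit trail of what happened


def run_workflow(steps: List[Step], submission: Submission) -> Result:
    data = dict(submission)
    trail: List[str] = []
    for step in steps:
        try:
            data = step.run(data)
            trail.append(f"{step.name}: ok")
        except Exception as exc:
            recovered = step.recover(data, exc) if step.recover else None
            if recovered is None:
                trail.append(f"{step.name}: escalated ({exc})")
                return Result("needs_human", data, trail)
            data = recovered
            trail.append(f"{step.name}: recovered by agent")
    return Result("completed", data, trail)


# Toy steps, invented for illustration.
def extract(d: Submission) -> Submission:
    if "revenue" not in d:
        raise ValueError("missing revenue")
    return d


def agent_fill(d: Submission, exc: Exception) -> Optional[Submission]:
    text = d.get("revenue_text")
    if isinstance(text, str):
        return {**d, "revenue": float(text.replace("$", "").replace(",", ""))}
    return None


def quote(d: Submission) -> Submission:
    return {**d, "premium": float(d["revenue"]) / 100}


STEPS = [Step("extract", extract, agent_fill), Step("quote", quote)]

if __name__ == "__main__":
    print(run_workflow(STEPS, {"revenue_text": "$2,000"}))
```

The `trail` list is the piece that matters for the argument: every recovery and escalation is recorded, which is the raw material for later tightening the runbook.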

On day one, this automates manual work. But over time, every escalation becomes a signal, every exception becomes feedback, and every human correction shows where the runbook was incomplete. Eventually, the workflow stops being a script and starts becoming the carrier’s operating memory. This is the part the labs will find hard to reach. They will keep shipping better models and better general agents, and they should. But they do not sit inside a carrier’s production workflows long enough to learn why one account was escalated, why one risk was declined, or why an underwriter overrode the appetite guide and was right to do so.

That understanding only comes from running the workflow, in production, many thousands of times. The workflow you ship on day one is not the moat. The loop that production usage creates over time is.

For us, that is what it means to build off the road.

How do you decide if you are in the rest of Oz or not?

The tools-and-steps test: How many steps does the work take, and how complex are the tools you have to build to support it? Compare a horizontal AI search across Google Drive — one step against one tool with a forgiving outcome, the user reads the summary and re-asks if it’s wrong — to a multi-step legal redline against three years of firm precedent: dozens of steps across many tools, output that has to clear partner review and may need to be argued in court. Both look like “an agent doing work,” but only one of them requires the kind of deep software a focused team takes years to build.

The system test: Are you building a system the customer runs their work through, or a tool that sits on top of a system they already have? Systems own the workflow end-to-end (the data capture, the governance, the records of what got done), and they’re what the customer points to when describing how the actual work happens. Tools, on the other hand, just add intelligence to a workflow the customer already runs. The tool case can generate real revenue, but the labs can take it because the customer isn’t depending on you as the orchestration layer. High ACV is usually a signal of a system, since systems replace real headcount and get paid accordingly, but it isn’t a guarantee. Ask yourself whether the customer would still need your product if a lab shipped something that ostensibly competes with you directly. If yes, you’re building a system. If no, you’re a tool, even if your ACV is high.

The hedge fund / P&L test: While lab performance is judged against benchmarks, rest of Oz performance is judged against your customer’s P&L. Your customer doesn’t care that your model scored well on SWE-Bench or MMLU — they care whether your agent closed the deal, redlined the contract correctly, or bound the right policy. If they’re fixated on their workflow-specific outcome, not on a generic capability score, you’re in the rest of Oz. If they’re paying for generic capability, you’re selling them something they can get with a Claude or Codex seat. The best agent businesses are going to need to execute like hedge funds — winning on alpha measured in customer P&L, not in benchmark scores.

Both can (and will) win

We’re going to see massive winners on and off the Yellow Brick Road. The labs will continue to win because they own the model and they own the distribution for the horizontal tools they have designed.

The rest of Oz can win if they own the system of work — the surface where the work of the company actually executes and the data that flows from it gets captured. These companies own the data capture, the workflow system of action, and the governance. As more complex workflows mature in a vertical, they compound into one core experience the customer comes to depend on. As new model generations ship from incumbents and new entrants, the company becomes the layer that integrates and delivers them to the customer. The model is fungible underneath; the system of work is not.

The next generation of enterprise software is going to be built off the road.

If you’re building it, reach out: jschmidt@a16z.com.

About the Contributor

Joe Schmidt is a partner at Andreessen Horowitz, where he focuses on software, fintech, and insurtech investments.
