To Help AI Startups Compete, Use Revenue-based Thresholds

Policymakers want a thriving national AI market—but model thresholds based on compute or training costs will harm startups.

Guido Appenzeller and Matt Perault

Policy proposals that impose steep compliance costs on AI model development will harm Little Tech, the startups and entrepreneurs that are so critical to American innovation and competitiveness. Compared to larger platforms with deeper pockets and vastly more employees, Little Tech will struggle to wade through complex compliance regimes, which would require diverting resources away from product and business development. The ultimate result will be a more concentrated AI market and fewer choices for consumers.

Many policymakers are sympathetic to this concern. Often, the intended targets of their proposals are large platforms, not startups. They typically do not intend to require startups to hire thousands of lawyers, to force them to divert huge sums to retaining high-priced accounting firms, or to make it difficult for Little Tech to develop or use open source AI software. When we raise these concerns, they often ask, “how can we carve out Little Tech?”

That question is simple, but the answer is complex. The most common tool for carving out Little Tech is to add a threshold to a policy proposal. Companies that exceed the threshold must comply, while those that fall below it need not. The idea is to ensure that startups with limited resources are not subject to the same regulatory burdens as deep-pocketed platforms.

To date, policymakers have introduced a variety of thresholds in AI policy proposals in an effort to carve out smaller AI developers. Options have included compute-based thresholds, training cost thresholds, and revenue thresholds, among others.

But despite the good intentions behind these proposals, many of them fail in their primary objective of exempting Little Tech from obligations intended for others. Some fail because they will not keep pace with rapidly evolving AI technology, while others are so challenging to calculate that they are unworkable in practice. The unique nature of the nascent AI industry—where small teams build costly models that are the basis for successful products—means that some thresholds that may seem feasible at first glance will not successfully delineate between small and large AI developers.

These proposals also face challenges for a more fundamental reason: they focus on regulating AI development, rather than regulating AI use. If a policymaker imposes a threshold based on the cost of training a model or the compute power of a model, the threshold is rooted in the development phase. Because both Little Tech and its larger competitors are seeking to build successful models, thresholds that target AI development will likely not succeed in separating out large from small. Thresholds that target the application of AI tools, like revenue-based thresholds, are more likely to properly identify companies that are well-positioned to bear more substantial regulatory burdens.

The choices that policymakers make about which threshold to use ultimately determine the scope and impact of AI regulation, which will in turn determine whether the AI market becomes heavily concentrated with only a small number of powerful players. If our goal is an innovative, diverse AI market that positions the United States as a competitive, global leader in AI, then policymakers must ensure that Little Tech does not bear compliance burdens that inhibit competition with larger model developers in the United States and abroad. If designed and implemented successfully, thresholds can help policymakers to achieve that goal.

In this post, we explore the technical reasons that thresholds based on training costs or compute power are unlikely to succeed in their objective of insulating AI startups from regulatory burdens intended for larger platforms. We then explain one approach that might be more successful: thresholds based on revenue.

The problem: Thresholds focused on AI development will not carve out Little Tech

Some AI policy proposals include thresholds based on AI model development. The argument is that a developer capable of building a more expensive or powerful model is, by definition, not Little Tech. But the nature of AI development and competition in the AI market makes this assumption incorrect. Thresholds focused on model development often will fail to exempt Little Tech from regulatory burdens intended for larger platforms, even though they are typically introduced precisely for that purpose.

Cost-based thresholds

Consider, as an example, cost-based thresholds, which are based on the idea that if a model is expensive to build, then it must have been built by a large developer. In many domains, cost correlates with size, resources, and market power. When that is the case, regulating based on cost makes sense. Take the construction of a large building: measuring the cost of the building is relatively straightforward, an expensive project is very likely backed by a large developer, and construction costs can be accurately assigned to the developer responsible. Thus, a policymaker seeking to differentiate between large and small builders could do so by setting a threshold based on construction costs.

But building an AI model is not like constructing a building. The differences between physical construction and software development mean that policy tools that can be used to distinguish between small and large building developers may fail to distinguish between small and large AI developers.

To understand why, it’s important to distinguish the three phases of training a modern AI model:

  1. Pre-training involves unsupervised training of the model on extremely large datasets to build general capabilities. At this phase, humans have a limited role because manual labeling is not required. Traditionally, pre-training was the main cost of building a model.
  2. Supervised fine-tuning (SFT) refines the model on curated data, aligning it more closely with human intent. For example, an AI developer may train a model on question-answer pairs to improve its functionality as a chatbot. Developers frequently rely on human labor to provide those pairs.
  3. Reinforcement learning (RL), the final phase of training, optimizes models through feedback on their outputs. For example, a model generates a number of replies, and a human or a reward model then ranks the replies from best to worst. Like SFT, this phase may involve human labor.

Implementing a cost-based threshold might be simple if one developer performed all three phases of the training process. To estimate cost for a specific model type and training data set, a regulator could start by calculating the total number of training operations. For example, once it knows the number of pre-training operations, it could divide that figure by the throughput of the GPUs used for training and multiply the resulting GPU-hours by their hourly cost to yield an approximate total training cost. (For a more complete explanation of the math, see Navigating the High Cost of AI Compute.)
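
As a rough illustration of that arithmetic, the sketch below estimates a single training run’s cost using the common rule of thumb that training a transformer takes roughly six floating point operations per parameter per training token. The model size, token count, GPU throughput, and hourly price are illustrative assumptions, not figures for any specific model.

```python
# Back-of-envelope estimate of a single training run's cost.
# All inputs are illustrative assumptions, not figures from a real model.

def training_cost_usd(params: float, tokens: float,
                      gpu_flops_per_sec: float, gpu_hourly_usd: float) -> float:
    """Estimate run cost using the ~6 FLOPs per parameter per token rule of thumb."""
    total_flops = 6 * params * tokens
    gpu_hours = total_flops / gpu_flops_per_sec / 3600
    return gpu_hours * gpu_hourly_usd

# Assumed: a 70B-parameter model trained on 2 trillion tokens, on GPUs sustaining
# ~4e14 FLOP/s after utilization losses, rented at $2.50 per GPU-hour.
cost = training_cost_usd(params=70e9, tokens=2e12,
                         gpu_flops_per_sec=4e14, gpu_hourly_usd=2.50)
print(f"Estimated cost of the final training run: ${cost:,.0f}")  # roughly $1.5 million
```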

But increasingly, models are trained incrementally by multiple parties. The calculation method above is predicated on the idea that the end result of training is a monolithic model, created in one pass by one entity that bears all the costs. In reality, one AI developer may bear the costs of pre-training, another may bear the costs of supervised fine-tuning, and another might bear the costs of reinforcement learning.

The picture becomes even more complicated because open source AI model development facilitates downstream innovation: other developers can fine-tune their own models by using an open-source base model. For example, Nous Research offers fine-tuned models that are based on Meta’s Llama base models. Similarly, DeepSeek built small models that were based on both Llama and Alibaba’s Qwen model.

Open source model development is good for competition because it enables small developers to enter the market, but it creates a confusing puzzle for cost allocation. If Developer A spends $10 million on supervised fine-tuning, using pre-training data that cost Developer B $40 million to generate, how much did Developer A spend developing the model? When a model is not monolithic, it can be hard to properly assign costs to individual model developers or to assess the total costs of training a model.

The future of model development is likely to get even more complicated, with developers increasingly taking a “remix” approach that will make it even harder to determine whether true training costs exceed a statutory threshold. For image models, for instance, the dominant architecture today combines CLIP, an image understanding model from OpenAI, with a denoising model. When Developer A uses two separate models from Developers B and C to build an image generation model, would regulators add all three training costs together to determine whether Developer A’s model exceeds the threshold? If so, and if the obligation falls to Developer A to assess whether it is exempt from a law that includes this threshold, how would it get definitive training cost data from the other developers?

Similarly, LLM developers may use a variety of techniques to build systems out of multiple models that work together. The models can be trained jointly (an approach commonly called Mixture of Experts), a “judge” model can be used to pick the best answer from a set of different LLMs built by different developers, or pieces of different models can even be combined and remixed into a new model (e.g., with Mergekit). The more a model is remixed, the harder it becomes to attribute training costs to any one model developer.
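
To make the last of these techniques concrete, here is a minimal sketch of one simple merging strategy: averaging the weights of two checkpoints that share an architecture. This is a deliberately simplified illustration, not how Mergekit itself works, and the tiny stand-in models are hypothetical placeholders for real LLM checkpoints.

```python
import torch.nn as nn

# Two stand-in "models" with identical architectures, imagined as checkpoints
# from different developers. Real merges operate on full LLM state dicts.
model_a = nn.Linear(16, 16)
model_b = nn.Linear(16, 16)

# Simple linear merge: average corresponding parameters from the two checkpoints.
state_b = model_b.state_dict()
merged_state = {name: 0.5 * param + 0.5 * state_b[name]
                for name, param in model_a.state_dict().items()}

merged_model = nn.Linear(16, 16)
merged_model.load_state_dict(merged_state)
# The merged weights come from both source models, so neither developer's
# training spend cleanly "belongs" to the resulting model.
```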

New techniques for model development make cost allocation even fuzzier. Now, the most common technique to train a small LLM is called distillation: a small developer uses a large teacher model to train a new model. When this happens, the developer of the teacher model bears the bulk of the training costs, but the developer of the smaller model relies on those training costs in building its own model.
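
A minimal sketch of the classic form of distillation appears below: the student is trained to match the teacher’s softened output distribution via a KL-divergence loss. (Distillation can also be done by fine-tuning the student on text the teacher generates, as in the DeepSeek example discussed next.) The temperature and tensor shapes here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between the teacher's and student's softened output distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so the gradient magnitude stays comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Illustrative shapes: a batch of 4 positions over a 32,000-token vocabulary.
student_logits = torch.randn(4, 32000, requires_grad=True)
teacher_logits = torch.randn(4, 32000)  # in practice, produced by the large teacher model
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```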

An accurate assessment of the true cost of the model would need to account for both developers’ costs, but in some cases the combined costs would exceed the statutory threshold, making it impossible for a startup to avoid regulatory requirements intended for larger firms. Ironically, regulatory thresholds based on training costs could mean that innovations in AI development that should have the effect of lowering barriers to entry for startups—like the availability of distillation and high-quality open source models—might instead result in cost calculations that subject startups to new regulatory barriers.

Training costs may also serve as a poor proxy for whether a model developer is large or small because the cost of a final training run may represent only a small fraction of the total research and development costs that went into building the model. The final model that a consumer uses may be the product of extensive research and many failed attempts. By our estimates, as much as 90% of the true costs of model development come before the final run. These costs may include data curation, software development, experimentation, test runs, and failed training attempts.

To give an example of how this looks in practice, let’s try to define the cost of DeepSeek-R1-Distill-Llama-70B. Oversimplifying slightly, this model was created by taking an open source model (DeepSeek-V3), applying automated RL to generate reasoning traces, curating those traces, and then using them to fine-tune a different open source model (Llama-70B). In the strict sense, the cost of the final training run that produced this distilled model was likely under $1,000: DeepSeek fine-tuned a model with 70 billion parameters on approximately one million reasoning traces.
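
A back-of-envelope calculation shows why that marginal cost is so small. The average trace length, GPU throughput, and rental price below are our illustrative assumptions, not disclosed figures:

```python
# Rough cost of the final fine-tuning run (all inputs are illustrative assumptions).
params = 70e9              # 70-billion-parameter student model
traces = 1e6               # roughly one million reasoning traces
tokens_per_trace = 1_000   # assumed average length; real traces vary widely
total_flops = 6 * params * traces * tokens_per_trace      # ~4.2e20 FLOPs
gpu_hours = total_flops / 4e14 / 3600                      # at ~4e14 FLOP/s per GPU
print(f"~{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * 2.50:,.0f} at $2.50 per GPU-hour")
# ~292 GPU-hours, ~$729
```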

But in practice, the critical work for this specific model was the research that went into generating those traces, which very likely cost millions of dollars. A more inclusive assessment that incorporates all of Meta’s and DeepSeek’s research and development costs associated with these models would likely be closer to $1 billion.

These significant aggregate research and development costs mean that even cost thresholds set at a high dollar figure could capture smaller developers trying to keep pace. Some proposals have included a training cost threshold of $100 million, which might seem high. What startup could spend $100 million training a model? But the reality is that when research and development costs are considered alongside the costs of the final training run, many models will exceed that threshold. Small developers receive significant investments precisely because they need to devote substantial resources to the training process in order to compete with larger platforms. Treating them as “Big Tech” solely because of the amount of money they spend on training will put them at an even more significant competitive disadvantage.

Compute-based thresholds

Like cost-based thresholds, compute-based thresholds focus on AI model development and are likely to sweep in Little Tech alongside larger developers. Compute-based thresholds were prevalent in policy proposals that followed the launch of ChatGPT in November 2022. The Biden administration’s AI executive order in October 2023 imposed certain obligations on models trained using “a quantity of computing power greater than 10^26 integer or floating-point operations.” Similarly, the European Union’s AI Act used a threshold of 10^25 floating point operations to define “general-purpose AI models with systemic risk” that were subject to the act’s primary regulatory obligations.
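
For a sense of how such a threshold would be evaluated, the sketch below checks a hypothetical training run against both compute figures using the same rule of thumb of roughly six operations per parameter per training token; the model size and token count are invented for illustration.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute: ~6 FLOPs per parameter per training token."""
    return 6 * params * tokens

# Hypothetical run: a 400B-parameter model trained on 15 trillion tokens.
flops = training_flops(params=400e9, tokens=15e12)
print(f"{flops:.1e} total operations")                           # 3.6e+25
print("Exceeds EU AI Act 1e25 threshold:", flops > 1e25)         # True
print("Exceeds executive order 1e26 threshold:", flops > 1e26)   # False
```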

In prior work, we have detailed several reasons that compute-based thresholds could be harmful for Little Tech. Compute-based thresholds quickly become outdated, since computers become faster over time. What today requires a tremendous amount of effort and a huge data center may at some point be done easily on a single laptop. In 2023, Pause AI proposed requiring government approval for any model above 1 billion parameters. At the time, that threshold might have seemed high and might not have captured many models. But today, there are widely used open source models from inside and outside the United States with up to 670 billion parameters, and most commercial models have more than 1 billion parameters. So now, only two years after Pause AI’s proposal, regulating models based on whether they exceed 1 billion parameters would capture almost any developer seeking to offer a compelling model. When every car on the market is capable of going over 100 mph, an exemption for cars that can’t go faster than 20 mph has little value (or, to map it more accurately to the scale of AI models, an exemption for cars going 0.2 mph, which is about one-third as fast as the world’s fastest tortoise).

Regulators often suggest they will be able to address this issue by shifting thresholds to account for new technological developments. The EU AI Act states that the FLOPs threshold will be “adjusted over time to reflect technological and industrial changes.” But the pace of change in AI development is fast, even by technology standards. It is unlikely that a regulator could keep pace.

The technical realities of AI development mean that thresholds focused on classifications related to the development phase are likely to be an ineffective policy tool. Instead, regulators should consider thresholds rooted in the application layer.

The solution: Thresholds focused on the application layer, like revenue, can successfully carve out Little Tech

AI development is similar to classic software development. For example, Linux was developed by a myriad of developers and organizations over many years, and the vast number of additions and deletions from third parties makes it difficult to attribute precise contributions to any individual developer.

The similarities between software development and AI development suggest that regulators should consider similar approaches to governance. Just as with traditional software, where the nature of software development has led to governance focused on the application layer rather than the development layer, the nature of AI development suggests that AI governance will be more successful if it focuses on the application layer as well.

A revenue-based threshold is rooted in the application layer—it focuses on how the AI system is used, not how it was built—and will likely be more successful in separating Little Tech from larger developers. It necessarily focuses on a later stage in a company’s journey: to bring in a significant amount of revenue, an AI developer needs not only to build a model but also deploy it and demonstrate some level of success in the market. By definition, companies that hit a certain revenue threshold have more resources they can devote to compliance costs: if a company has sufficient revenue, it can hire a legal team, retain external law firms, and retain accounting firms to assist with audits or assessments. Using a revenue-based approach to shift from development-oriented regulation to use-oriented regulation leaves space for Little Tech to compete in AI development, rather than saddling startups with extensive compliance obligations.

Of course, a revenue threshold that is set too low could still capture AI startups that are in the early stages of bringing a product to market. At this point in a company’s development, costs may still be extremely high, as the company tries to maximize the resources it can devote to building its product. Because AI training can be so expensive, even substantial revenue numbers could send an inaccurate signal about the amount of resources a company could devote to compliance expenses without dramatically undermining its ability to compete with large AI developers.

One approach might be to use the thresholds that members of Congress have used in proposals to address market power in search, social media, and e-commerce markets, such as the $550 million cutoff in the American Innovation and Choice Online Act. One Texas legislator introduced a proposal with a $100 million revenue threshold, which would capture many more companies but would still likely differentiate between Little Tech and larger providers. In contrast, a Colorado proposal would exempt developers from certain requirements based on whether their revenue exceeds $5 million. This low revenue threshold would capture many startups and largely fail to delineate between Little Tech and its larger competitors.

Regulating AI development requires tools that match the reality of how AI models are built and deployed today. In some circumstances, thresholds may be a useful policy tool for ensuring that Little Tech is not subject to regulatory obligations intended for larger platforms with deeper pockets. But unlike traditional industries, where inputs and cost closely correlate with market power, AI model development is fragmented, iterative, and collaborative. Small developers often build on the work of others, remixing and refining models in ways that defy clean cost attribution.

Given this reality of AI development, many thresholds proposed by policymakers, such as training cost and compute thresholds, are hard to calculate and likely to inadvertently capture many startups. These approaches mistake input proxies—math and money—for the actual harms regulation seeks to prevent.

A better path forward lies in aligning policy design with the application layer rather than the development layer. Revenue-based thresholds reflect this more traditional approach to regulating software—where obligations are triggered by actual market success, not mathematical abstraction. They also provide a more administrable, accurate, and scalable mechanism to differentiate between large, established tech platforms and emerging innovators.

If we want to foster a vibrant, competitive AI ecosystem that empowers Little Tech, we must resist the urge to apply blunt policy tools at the development layer and instead pursue policies that reflect an understanding of the nuances of AI development and application.