Investing in LMArena: The Reliability Layer for AI

Anjney Midha
Posted May 28, 2025

The AI industry has an evaluation crisis. Static benchmarks are contaminated the moment they’re published. Models overfit to metrics rather than utility. And no enterprise will bet its business on systems evaluated by their own creators.

So when frontier labs need to know whether their latest model actually works, they release it on LMArena and watch millions of real users vote with their preferences. When OpenAI evaluates chat performance, Google evaluates Gemini, or xAI tests Grok, or when teams need to evaluate code generation, we believe LMArena’s growing body of evaluations, such as Web Dev Arena, has become the de facto standard.

What started as a Berkeley research project has quickly become essential infrastructure: the continuous integration pipeline for intelligence. This isn’t because of marketing or sales. It’s because the platform solved a problem everyone had but no one had addressed.

We believe the companies that make AI boring will create some of the most value. Not boring as in unimpressive, but boring as in reliable, predictable, and trustworthy. LMArena is building the infrastructure to make AI as boring as databases. 

That’s why we’re thrilled to be founding investors in LMArena’s seed round alongside UC Investments (University of California) and partners who share the team’s commitment to open science. 

What excites me most about LMArena is its north star: solving AI reliability at scale. The platform’s power comes from a simple flywheel: more models attract more users, who generate more preferences, which in turn attract more models. With more than 400 models and millions of monthly users creating novel prompts daily, LMArena has built the largest living dataset of human preferences on AI outputs.

When models become reliable enough for hospitals to trust diagnoses, for courts to trust analysis, or for infrastructure to trust automation, that’s a generational transformation for the economy. Government agencies are already engaging. Regulated industries are piloting private arena deployments. The demand signal is clear: neutral, continuous evaluation isn’t optional for mission-critical AI.

Incorporating as a company lets LMArena go further than it could as a research project. It already plans to expand into areas such as:

  • Platform scale: Supporting billions of evaluations as AI goes mainstream.
  • Enterprise infrastructure: Private arenas for regulated industries with compliance requirements.
  • Ecosystem tools: APIs and SDKs that embed continuous testing into every AI application.
  • New evaluation frontiers: Multimodal, agentic, and safety-critical assessments.

We envision a world where “Arena-tested” becomes the Good Housekeeping seal for AI: a signal that a system has been validated by millions of real users, not just by cherry-picked benchmarks. Where every AI interaction contributes to a shared understanding of what works. Where reliability isn’t promised by vendors but proven through transparent, continuous evaluation.

The challenges are substantial: maintaining neutrality under commercial pressure, scaling infrastructure for billions of users, and evolving evaluation methods as AI capabilities expand. But this team has already achieved something remarkable: they’ve gotten the entire ecosystem collectively invested in human preference at scale. In the race to build more capable AI, LMArena is on a mission to ensure those capabilities actually serve the people who use them. If that’s the future you want to build, they’re hiring.

Views expressed in “posts” (including podcasts, videos, and social media) are those of the individual a16z personnel quoted therein and are not the views of a16z Capital Management, L.L.C. (“a16z”) or its respective affiliates. a16z Capital Management is an investment adviser registered with the Securities and Exchange Commission. Registration as an investment adviser does not imply any special skill or training. The posts are not directed to any investors or potential investors, and do not constitute an offer to sell — or a solicitation of an offer to buy — any securities, and may not be used or relied upon in evaluating the merits of any investment.

The contents in here — and available on any associated distribution platforms and any public a16z online social media accounts, platforms, and sites (collectively, “content distribution outlets”) — should not be construed as or relied upon in any manner as investment, legal, tax, or other advice. You should consult your own advisers as to legal, business, tax, and other related matters concerning any investment. Any projections, estimates, forecasts, targets, prospects and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Any charts provided here or on a16z content distribution outlets are for informational purposes only, and should not be relied upon when making any investment decision. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, posts may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein. All content speaks only as of the date indicated.

Under no circumstances should any posts or other information provided on this website — or on associated content distribution outlets — be construed as an offer soliciting the purchase or sale of any security or interest in any pooled investment vehicle sponsored, discussed, or mentioned by a16z personnel. Nor should it be construed as an offer to provide investment advisory services; an offer to invest in an a16z-managed pooled investment vehicle will be made separately and only by means of the confidential offering documents of the specific pooled investment vehicles — which should be read in their entirety, and only to those who, among other requirements, meet certain qualifications under federal securities laws. Such investors, defined as accredited investors and qualified purchasers, are generally deemed capable of evaluating the merits and risks of prospective investments and financial matters.

There can be no assurances that a16z’s investment objectives will be achieved or investment strategies will be successful. Any investment in a vehicle managed by a16z involves a high degree of risk including the risk that the entire amount invested is lost. Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by a16z is available here: https://a16z.com/investments/. Past results of a16z’s investments, pooled investment vehicles, or investment strategies are not necessarily indicative of future results. Excluded from this list are investments (and certain publicly traded cryptocurrencies/ digital assets) for which the issuer has not provided permission for a16z to disclose publicly. As for its investments in any cryptocurrency or token project, a16z is acting in its own financial interest, not necessarily in the interests of other token holders. a16z has no special role in any of these projects or power over their management. a16z does not undertake to continue to have any involvement in these projects other than as an investor and token holder, and other token holders should not expect that it will or rely on it to have any particular involvement.

With respect to funds managed by a16z that are registered in Japan, a16z will provide to any member of the Japanese public a copy of such documents as are required to be made publicly available pursuant to Article 63 of the Financial Instruments and Exchange Act of Japan. Please contact compliance@a16z.com to request such documents.

For other site terms of use, please go here. Additional important information about a16z, including our Form ADV Part 2A Brochure, is available at the SEC’s website: http://www.adviserinfo.sec.gov.