Modern AI has allowed us to build the world’s first creative computers. Fields that once seemed uniquely human — like visual art, writing, music, or even the art of conversation — are now accessible to software in ways we couldn’t previously imagine. For many consumer AI apps, this is a groundbreaking feature.
For business applications, though, creativity poses a problem. Lack of reliability in AI systems (i.e., their tendency to say things that might offend customers, violate compliance rules, or otherwise damage the business) is one of the biggest barriers to widespread adoption in the enterprise.
There is probably no closed-form solution to AI reliability. Building perfectly well-behaved applications would defeat the purpose of using AI in the first place (i.e., its creative nature). But it is possible to put bounds around acceptable model outputs, if developers have the right tools and primitives to express their intentions. And it is certainly valuable to at least have a better understanding of whether AI behavior has changed. Building this framework is a massive opportunity.
That’s why we’re thrilled to announce our seed investment in Distributional. Distributional is building a platform for robust and repeatable testing of AI and machine learning models. The company’s core observation is that rigorous testing, as part of the CI/CD process, has been critical to achieving reliable performance in traditional software development. Good software teams don’t push to prod without passing their build tests first.
The Distributional team believes the same approach will work in AI, but that the right tools don’t yet exist. Traditional testing frameworks only provide static checks — confirming that a piece of code produces the same result every time — while most AI/ML tools are based on bespoke evaluation functions or manual “vibe checks.” In a sense, Distributional combines these approaches. It provides a standard set of tests (e.g., Python functions) designed to run in the CI/CD pipeline, which work natively with dynamic systems (e.g., checking that code produces results within a certain acceptable range, or distribution, with each run).
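To make the idea concrete, here is a minimal sketch of what a distribution-aware CI check might look like. This is our own illustrative example, not Distributional’s actual product or API: a hypothetical `distribution_shift` function compares a new run’s per-request quality scores against a baseline run, flagging a failure when the mean drifts too far (measured in baseline standard deviations) or the spread changes too much.

```python
import statistics

def distribution_shift(baseline, current, max_mean_drift=0.5, max_std_ratio=1.5):
    """Hypothetical CI check: has `current` drifted beyond bounds vs. `baseline`?

    Instead of asserting exact equality (a static check), we assert that the
    new run's output distribution stays close to the baseline distribution.
    """
    b_mean, b_std = statistics.mean(baseline), statistics.stdev(baseline)
    c_mean, c_std = statistics.mean(current), statistics.stdev(current)
    mean_drift = abs(c_mean - b_mean) / b_std          # drift in baseline sigmas
    std_ratio = max(c_std, b_std) / min(c_std, b_std)  # spread change, always >= 1
    return mean_drift > max_mean_drift or std_ratio > max_std_ratio

# Example: per-request quality scores from three runs of the same AI pipeline.
baseline_scores = [0.82, 0.79, 0.85, 0.81, 0.83, 0.80, 0.84, 0.78]
stable_scores   = [0.81, 0.83, 0.80, 0.84, 0.79, 0.82, 0.85, 0.80]
drifted_scores  = [0.55, 0.60, 0.52, 0.58, 0.57, 0.61, 0.54, 0.59]

assert not distribution_shift(baseline_scores, stable_scores)  # build passes
assert distribution_shift(baseline_scores, drifted_scores)     # build fails
```

A real implementation would use a proper statistical test and far larger samples, but the shape of the check is the point: it tolerates run-to-run variation while still catching meaningful behavioral change.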
The Distributional team has met with more than 25 AI leaders from Fortune 100 companies, AI startups, and financial institutions, and has received early validation of its approach. Each leader identified testing as one of the top barriers to AI adoption within their organization, and many agreed to become early beta customers, provide ongoing product feedback, or otherwise support the company’s effort to solve the problem. It’s easy to see why users are so eager to find a solution when you consider what’s at stake: one CTO from a Fortune 100 technology company believes a lack of reliability in AI systems costs their company $10 million a day.
We first met Scott Clark, the CEO of Distributional, back in 2016 through our investment in his prior company, SigOpt. SigOpt was an MLops company in the supervised learning era, and was ultimately acquired by Intel in 2020, where Scott became a VP and GM for AI. He lived the AI-reliability problem every day through the customers he supported at those companies. He’s a zealot for solving hard statistical problems in the real world, and he’s now applying his passion and experience to tackle AI reliability.
We couldn’t be more excited to work again with Scott, Nick Payton, David Rosales, Michael McCourt, and the rest of the team to help make creative computers a bit more well-behaved.