Most things about developing software products that use large language models are just different. And we as an industry are only now grappling with those implications, which are as significant as the technology is disruptive.
I’ll give some examples.
For starters, unlike formal languages, with generative AI there is no systematic way to construct a prompt that’ll give you exactly what you want. Most product developers using LLMs hand-tweak prompts until things mostly seem to work. Of course, those prompts don’t produce the same results across different models, or even across other instances of the same model. Further, wrangling the output of models is hard: if, for example, the response from the model is used to run a tool, the output must be mapped to structured data. Correctness is a perennial concern.
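To make that concrete, here’s a minimal TypeScript sketch of what “mapping output to structured data” tends to look like in practice. The `callModel` function is a hypothetical stand-in for whatever model client you use, and the schema uses zod; the point is how many places a slightly different response can break the pipeline:

```ts
import { z } from "zod";

// Hypothetical stand-in for your model client; swap in a real SDK call.
async function callModel(_prompt: string): Promise<string> {
  // For illustration, return a canned response shaped the way we hope the model answers.
  return JSON.stringify({ city: "Boston", units: "celsius" });
}

// The shape the downstream tool actually needs.
const ToolArgs = z.object({
  city: z.string(),
  units: z.enum(["celsius", "fahrenheit"]),
});

async function getToolArgs(userMessage: string) {
  const raw = await callModel(
    `Extract the city and units from this request as JSON with keys "city" and "units": ${userMessage}`
  );

  // The model returns free text; it may or may not be valid JSON...
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error(`Model did not return JSON: ${raw}`);
  }

  // ...and the JSON may or may not match the schema the tool expects.
  const checked = ToolArgs.safeParse(parsed);
  if (!checked.success) {
    throw new Error(`Model output failed validation: ${checked.error.message}`);
  }
  return checked.data;
}
```

Every branch in that function is a correctness concern you end up measuring and logging around.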
In addition to all of this, there are dozens of models already available — with new models arriving weekly — all with different strengths along dimensions like context-window length, latency, price, and code-generation ability (and far more).
So, while it’s clear that developing products around LLMs requires a new approach to software, there is still a lot of confusion about how exactly to get there. Prompt evaluations for LLMs, for example, are a foundation of this type of product development, but they alone aren’t sufficient to ensure things work as intended. Good tooling for LLM development also includes logging, prompt serving, agentic tool support, and model proxying.
Fortunately, this sort of tooling is finally starting to make its way out of the leading LLM product companies and into the general public. That brings us to our investment in Braintrust, which has built a remarkable set of tools for AI product development.
The best way to think about Braintrust is that it’s a devtool platform for any product effort based around LLMs. Because its focus is on product development, it natively supports JS/TS as a first-class concern. And while Braintrust includes features that are table stakes for LLM development, such as evals and logging, its facilitation of world-class LLM product development is what sets it apart. This includes support for prompt templating and management; real-time prompt serving; proxy support across all relevant model APIs; tool support; and much more.
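To give a flavor of what that looks like in code, here’s a minimal eval modeled on Braintrust’s public TypeScript SDK quickstart (the project name and toy task are placeholders, and the SDK details may have evolved since this was written). An eval is just data, a task, and scorers:

```ts
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";

// A minimal prompt eval: run a task over a small dataset and score each output.
Eval("greeting-bot", {
  // A handful of hand-written cases; in practice these would come from logged traffic.
  data: () => [
    { input: "Alice", expected: "Hi Alice" },
    { input: "Bob", expected: "Hi Bob" },
  ],
  // The task would normally call your prompt/model; a trivial stand-in keeps the sketch short.
  task: async (input: string) => `Hi ${input}`,
  // Scorers grade output against expected; autoevals ships string- and LLM-based scorers.
  scores: [Levenshtein],
});
```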
Braintrust is simple enough for even the smallest projects built on LLMs, yet has the breadth of tooling needed for the most sophisticated projects. This is evident in the number of marquee companies using Braintrust, such as Notion, Stripe, Vercel, Airtable, Instacart, Zapier, Coda, The Browser Company, and many others.
Much of the magic within Braintrust comes from its people. Founder and CEO Ankur Goyal and his team have built AI development platforms before, including at Figma and at Ankur’s previous company, Impira. And unlike most teams in this space, who come from a traditional Python data science and machine learning background, they have remained plugged into modern product development.
Further, Ankur has a deep background in systems, and in databases in particular. Wrangling LLMs is a profoundly complicated systems problem, particularly when it comes to state management, and that deep systems background shines through in the way Braintrust handles these problems.
In addition to leading Braintrust’s Series A round, I am also a very happy user. When I have the time, I enjoy building LLM-based apps. I’ve tried a number of frameworks to do so, but until I came across Braintrust, none of them felt like they were built for product developers. So, as an investor and a user, I’d encourage you to try Braintrust if you haven’t already.