Bringing Production-Ready GenAI to the Enterprise

Edo Liberty, Harrison Chase, and Sarah Wang

Posted August 7, 2024

As genAI expands through the enterprise, many leaders are figuring out how to evolve their genAI prototypes into production-ready tools. Pinecone CEO Edo Liberty and LangChain CEO Harrison Chase sit down with a16z Growth General Partner Sarah Wang to discuss which parts of the stack to build or buy, how to improve out-of-the-box models by helping customers select and ingest the right data, and how to pick the right partners to scale genAI applications.

[00:02:16] Navigating the gap from prototype to production
[00:07:03] How to educate partners on genAI tools and capabilities
[00:11:38] Deciding whether to build or buy
[00:17:36] Successful implementations of genAI
[00:21:20] Balancing enterprise and open-source community needs
[00:23:06] Balancing short-term revenue gains with long-term vision
[00:25:18] Picking the right partners to scale

Sarah: So the topic today is how to deliver on the hype for today’s demand and tomorrow’s promise. This is a topic very near and dear to me, as I lead growth investing at the intersection of enterprise and AI for Andreessen Horowitz. 

We noticed a key change when we surveyed about 70 enterprise leaders this year, and we were very surprised to hear that folks were tripling their budgets on AI spend. They were shifting away from just OpenAI toward open-source models, and, in a major way, moving from purely experimental workloads to production workloads.

With this has come an entire new set of challenges and opportunities, and that’s what we’re going to discuss today. So let’s talk about some of the hype. What are some of the common misconceptions or overhyped expectations regarding genAI that you guys have seen, and how can businesses navigate those?

Harrison: The main one that naturally comes to mind is that it’s easy to build with genAI, and in some sense it is. It’s really easy to get a prototype up and running, and that’s what we see. But I think it’s much harder to actually turn that into something that’s production ready. That’s where we see a lot of enterprises, or just a lot of people in general, getting hung up. 

I think of all the hype around AutoGPT and agents when it came out in March of last year. There was such an explosion of hype and interest around agents and genAI then, and from around the summer to December it cooled off a good amount, in my mind, in terms of agents and building with agents. I think people tried out AutoGPT, they tried out other projects that were super easy to get started with and that they expected to just work magically out of the box, and they didn’t, because there’s still a lot of hard engineering that goes into building these things.

We’re starting to see the tide turn, as you mentioned. As of this new year, I think we’re starting to see more of these agentic and interesting applications come into production in enterprises. It’s really exciting, but it’s still a lot of work. The general solution that you can get in five lines of code from a snippet of LangChain or wherever generally isn’t enough for high-quality production apps. There’s still a good amount of engineering that goes into it. It’s not quite magic yet. There’s still work that needs to be put in.

Sarah: Interesting.

Edo: I agree 100%. I tell people all the time that with AI impossible things are possible, but easy things are still hard. It’s 100% descriptive. 

Sarah: What are some of the challenges you guys see in building and delivering tangible results? And then, obviously, how have you seen folks overcome those challenges?

Edo: You can tell that a company is going to be successful at building something by the commitment they have in setting out on that journey. If they think it’s going to be easy—“Oh, we’re going to just connect something here and here and here and that’s going to be great”—it’s not going to work. If they understand that this is an undertaking, that this is a new stack, a new set of technologies, new sets of services, infrastructures, nouns and objects and behaviors that they need to get used to, that they need to hire the talent, give it time, fail a few times… if they are signed up for that, they’re going to be successful.

So it’s a kind of meta point: it’s not specifically what they do to succeed, but rather their commitment to success that is pretty much the main indicator of whether they’re going to be successful. And then eventually, they figure out how to add knowledge and data and models and LangChain and chunking and persisting. They figure it out, but it takes time, it takes commitment.

Harrison: Yeah. The gap from prototype to production is one that we think about a lot. There has to be commitment to bridge that gap. And what that gap is also depends a little bit on the types of applications people are building. For some types of applications, they work out of the box, but then scaling them up is when you run into [challenges]. I can easily do RAG over, like, 10 documents, but what if I need millions of documents? Performance, speed, and cost really come into play.

There are other types of applications where the gap is such that they don’t work out of the box: the AutoGPT-style agent doesn’t work out of the box. What helps bridge the gap are the things that make it work, and that’s where we spend a lot of time trying to add controllability, observability, testing, and things like that. We’re basically bringing these very general and broad contraptions under a bit more control. And so I think it kind of depends on the type of application you’re building in terms of what’s needed. First, you gotta make it work, then you gotta make it fast and cheap so that you can actually, hopefully, service all the users that you have.

Sarah: In that survey I mentioned, the challenges all happened to start with C: control, customization, and cost came up over and over again, both as challenges and as what people are looking for. So yeah, that resonates.

This is a new stack, so it’s a good segue, maybe, to talking more about Pinecone and LangChain specifically. I’ll start with you, Edo. In many ways, getting genAI right is about having the right data. So how do you ensure that your customers have access to the right data and maintain high standards of accuracy in their AI models?

Edo: Well, sadly we don’t do enough. That’s true. We really focus on what we call knowledgeable AI. At the end of the day, whatever language model you’re using, it doesn’t know anything about your company, your contracts, your customers, your code base, whatever it is that you’re trying to be intelligent about. People tell me, “Oh, the models are getting better.” It doesn’t matter how good they are. It’s a question of what they know.

So now the question is if you’re trying to build a knowledgeable agent, chat, assistant, whatever, you’re going to need to figure out what data… what it needs to know to be useful. You’re going to have to ingest it somehow, maybe using LangChain. You’re going to have to figure out how to chunk it and break it up, how to embed it, and so on. There are many options on how to do that. LangChain is doing a phenomenal job at being opinionated and bringing people in through that journey. What Pinecone really specializes in is being able to scale that, run that efficiently, cheaply, scale in production, and so on.

So when that thing starts working, you can just run with it. You can start operating it and, at the same time, scale it up. What we’ve seen is that the amount of data actually makes a huge difference. In some sense, ingesting 10 documents doesn’t really help; it’s sort of like a large context. With 1,000 documents, you start seeing stuff you wouldn’t otherwise see, but you still might not see the “aha!” moment. When people start ingesting tens of thousands, millions, sometimes literally billions of documents, suddenly something changes qualitatively.

We published a blog post about what happens when you ingest the internet into an index in Pinecone. Some pretty wild sh*t happens. It ends up being significantly more accurate. Even on question answering over web data, it ends up outperforming top models. That whole pipeline of what data needs to go in, how you bring it in, and how you ingest it into Pinecone is incredibly important. Again, unfortunately, today we don’t do nearly enough to educate our customers on how to do that. [Referring to Harrison] You can tell them, because you’re doing it.
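
To make the pipeline Edo describes a bit more concrete, here is a minimal ingestion sketch: chunk a document, embed the chunks, and upsert them into an index. It assumes the Pinecone Python client and an OpenAI embedding model purely for illustration; the index name, chunk sizes, and model choice are hypothetical placeholders rather than recommendations, and a production pipeline would batch requests and handle failures.

```python
# Minimal ingestion sketch (illustrative only): chunk -> embed -> upsert.
# Assumes the Pinecone v3+ Python client and the OpenAI Python SDK; the index
# name, chunk sizes, and embedding model below are hypothetical placeholders.
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                          # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("company-docs")                  # hypothetical index name

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with overlap; real pipelines often split on structure."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(doc_id: str, text: str, metadata: dict) -> None:
    chunks = chunk(text)
    # One embedding per chunk; batch and retry in a real pipeline.
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=chunks)
    vectors = [
        {
            "id": f"{doc_id}-{i}",
            "values": item.embedding,
            "metadata": {**metadata, "text": chunks[i]},
        }
        for i, item in enumerate(resp.data)
    ]
    index.upsert(vectors=vectors)                 # Pinecone caps request sizes, so batch large corpora
```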

Harrison: I was actually going to ask you… We talk about education a bunch because I do think that’s super important, but I actually don’t think we do enough of that either. The main thing we believe is that it’s just super early on and super fast-moving in this space, and stuff is changing pretty rapidly. I actually think it’s slowed down a little bit over the past few months, but I think as a whole it’s still extremely early on in this space, and education is so important. So I’d echo that sentiment and we want to do a better job of that as well. 

Edo: I will say, maybe just as a funny aside, to explain how much people crave guidance: we have plenty of notebooks and examples on our website that are pretty much examples of how to use Pinecone, kind of like tutorials for the API and the SDKs. And we wanted to make them a little bit more fun, so each one is a little micro application: you load data, and it does semantic search or recommendation or something.

The number of customers who run those in production is scary. These aren’t applications, they’re not open source; they’re how-to guides. So that’s a funny aside.

Sarah: The build-versus-buy decision has always been a quintessential one for companies, but the skew toward the decision to build, maybe just because we’re so early on, is fascinating to see. I think a lot of people are choosing to build, probably folks in this room, and as a result the education piece is really important. There are folks out there like the Andrej Karpathys of the world, whose videos are getting more views than unboxing videos or whatever people watch on the internet. I think that reflects exactly what you’re both saying: this desire, this hunger, to learn and adopt this new technology in a first-hand way versus just buying.

Harrison: On build versus buy, what is reasonable to expect people to build versus buy? You want to own the things that are core to your application logic. I’m going to use “flow engineering” as a term to describe the flow of your application. That is really important to the actual business logic of your application, and that’s a competitive advantage. What’s actually not as important to own is the infrastructure around it.

We think about testing and monitoring, and Pinecone thinks about retrieval and those sorts of things. A lot of genAI is great because you can use the logic, the reasoning engines of these LLMs, to imitate, augment, and supplement these human processes. That’s the core bit that people should be building and focusing on. I think that’s why you see a lot of people using tools like LangChain and Pinecone, but it’s also why there’s not a really good chatbot on every website out there: every website needs a different cognitive architecture to explain its services, but the infrastructure underneath is the same. I was literally having this conversation today, and it really reminded me of these points.

Edo: There are many good examples of this. One example I give often: even if you’re just doing question answering over documents, it’s very different if those documents are Wikipedia articles or news, right? If it’s news, the news from an hour ago absolutely makes the news from three hours ago obsolete, right? But if it’s religious texts, well, the text from 1,000 years ago supersedes the text from two weeks ago, right?

So take just time ordering: the AI wouldn’t know how much you care about time ordering, right? If you publish two contradicting facts, which one takes precedence? That’s an application-level thing.

That’s just one example. Every application that each and every one of you builds has a bunch of those, right? It’s funny, a lot of services that we use are built on Pinecone. Our documentation question answering actually uses Pinecone, but we didn’t build that application. We actually tried to build something ourselves, and it wasn’t great. And they came to us: “Hey, we built a bunch of stuff around it, all the logic and everything.” So that application level, that logic, that sense of how you delight your customers with all the components, is absolutely critical.
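
Edo’s time-ordering example is worth making concrete. In a news corpus the application, not the model, has to decide that fresher documents win; in another corpus older sources might take precedence. A rough sketch of that kind of application-level re-ranking, assuming each chunk was stored with a `published_at` timestamp in its metadata and reusing the hypothetical `index` handle from the earlier sketch, might look like this:

```python
# Illustrative application-level recency weighting layered on top of vector retrieval.
# Assumes each chunk was upserted with a "published_at" Unix timestamp in its
# metadata, and reuses the hypothetical `index` handle from the sketch above.
import time

def retrieve_news(query_vector: list[float], half_life_hours: float = 3.0, top_k: int = 20):
    results = index.query(vector=query_vector, top_k=top_k, include_metadata=True)
    now = time.time()

    def blended_score(match):
        age_hours = (now - match.metadata["published_at"]) / 3600
        recency = 0.5 ** (age_hours / half_life_hours)   # newer articles dominate
        return match.score * recency

    return sorted(results.matches, key=blended_score, reverse=True)

# For a corpus where older sources take precedence (Edo's religious-texts example),
# you would invert the weighting; no retrieval stack can guess that policy for you.
```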

Sarah: Maybe just to extend that a little bit, I’ll tie it to Harrison’s point on the infrastructure piece maybe being a different layer that folks may not choose to build. LangChain obviously plays a pivotal role in the AI stack on the infrastructure side. How is LangChain helping your customers improve on large language models? Because I think this is also a theme of like, hey, what comes out of the box, even though it’s very magical, doesn’t get you all the way there. So we’d love for you to share more on the improvements you’ve seen. And then after that, I’d love actually some inspiring examples of successful implementations from both of you guys.

Harrison: Yeah, absolutely. So LangChain started off as an open source package. I think the initial pieces in there were individual components, building blocks. So mainly models and vector databases were the two big things there, and then really high-level entry points into building applications. But what I think we lacked at the beginning was a really flexible runtime for constructing your own logic and your own applications. And as we’ve just discussed, that’s the part that people really want to own and should be owning and should be thinking about.

So the evolution of LangChain over the past year, year and a half, has been toward controllable orchestration logic. We’ve kept the same components, that’s a key part, but we’ve moved away from the high-level entry points that make it super easy to get started and toward low-level, highly controllable frameworks for creating chains, or agents if you’re using LangGraph. And the other piece of going from prototype to production and controllability has been LangSmith, a platform we’ve built for logging, testing, and debugging LLM applications.
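
As a rough illustration of the low-level, controllable orchestration Harrison describes, here is a toy graph: a typed state, a couple of nodes, and a conditional edge the application owns. The node bodies are trivial stand-ins, and the exact API surface can differ between LangGraph versions, so treat it as a sketch rather than canonical usage.

```python
# Toy controllable agent graph (a sketch, not canonical LangGraph usage).
# The node implementations are deliberately trivial stand-ins.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    documents: list[str]
    answer: str

def retrieve(state: State) -> dict:
    # Stand-in for a real retriever (e.g., a Pinecone-backed vector search).
    return {"documents": [f"doc about {state['question']}"]}

def generate(state: State) -> dict:
    # Stand-in for an LLM call that drafts an answer from the retrieved documents.
    return {"answer": f"Based on {len(state['documents'])} doc(s): ..."}

def should_retry(state: State) -> str:
    # Application-owned control flow: loop back to retrieval if nothing was found.
    return "done" if state["documents"] else "retry"

graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_conditional_edges("generate", should_retry, {"retry": "retrieve", "done": END})
app = graph.compile()

result = app.invoke({"question": "How do I debug a failing ingestion job?", "documents": [], "answer": ""})
```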

As for favorite applications we see being built, a great example is Elastic. They have an assistant that goes along with their logs and helps you debug and troubleshoot. It’s built on top of LangGraph and LangSmith. I love that example because it’s basically an agent, a highly controllable, highly specialized agent. I think it came out in February or March, right around the time things started really clicking. It perfectly uses the highly controllable aspects of some of the newer things we’re working on, as well as the debugging and testing tools that help people iterate at speed.

So, yeah, that was maybe a bit long-winded, but that’s how I view the trajectory of LangChain and the stuff we’ve done over time. And all of it has really been in the name of making it as easy as possible to build these types of applications that connect language models to data.

Sarah: I love the Elastic example, actually. Any interesting examples of successful implementations that you’d add?

Edo: Sure. I’ll give one of our larger customers, Notion, which built its AI Q&A over a company’s own data. And as with Inkeep on our docs, we’re also Notion users ourselves. This is another situation where we use our own product and see how people use it, which is great.

The amazing thing is seeing their journey and what they’re grappling with. It fits exactly what Harrison was saying. In the beginning, there’s a period of figuring out how to make this thing say the right things, okay? That might take two, three months of experimentation. But then you’re like, “Wait a second, we’re going to deploy this to how many users? At what volume, and how much is this going to cost? What security guarantees do we have? And who’s going to wake up at night when this thing starts coughing?”

Then you start having real production-grade issues. It’s no longer a theoretical question of whether you’ll have to figure out costs at some point; now the difference between seven cents and three cents is a big difference, right? That took several months (and we still engage with them, of course), but now I’m actually consuming the product. It’s so delightful. All the stuff I used to bug my team with on Slack, I’m like, “Wait a second, I don’t have to go to Slack. I can just ask the Q&A in Notion: ‘What happened with this and this customer? Did we tell them that XYZ?’” And the answer is yes or no, and here it is.

A, I got my answer immediately, and B, I didn’t have to bug somebody on my team, right? And because of that, it created this virtuous cycle where my sales team is now actually loading Salesforce data into Notion just so they can ask questions about their accounts, right? Because you’ve added value to the data, people bring more data into your platform. And funny enough, that comes back into Pinecone. So it’s like a big, weird flywheel.

Sarah: Wow, that’s really neat. There’s the famous line from, what was it, 2011, when Marc said software is eating the world; now AI is eating software. And if you’re not an AI company—well, what does that even mean, right? I would consider Notion, with that AI product, an AI company. And I think you’re kind of dead in the water if you don’t offer enhancements like that to your existing product.

Edo: I think [genAI agents are] an interface that people just expect at this point. It’s the same way that my kids just go around the house like talking to stuff, you know? Or try to swipe my Mac screen, and it’s like, “What the hell is wrong with this computer?” Right?

Sarah: Totally.

Edo: They grew up in a world where you speak to stuff and get answers, you know? And they understand every screen as a touch screen; if it’s not, it’s just weird, right? And it’s the same thing with software. If your software doesn’t understand you when you talk to it and doesn’t reply back in language, it’s just going to look antiquated. So you’re not going to want to miss that.

Sarah: On the topic of building, just because in addition to helping your customers build genAI, you both are building fantastic companies with a whole set of trade-offs that come with the success that you’ve had. I’ll start with you, Harrison, on LangChain. LangChain is well known for being one of the fastest-growing open-source projects in history. How are you balancing that engagement and excitement with the demands of enterprise customers who maybe have different needs than your community?

Harrison: So I think, to some extent, a lot of the needs are common. Because it’s so early on, people are trying to figure out how to build these things, and a lot of what we want to do is enable that as much as possible. The main point where those needs diverged was, I would say, early last year, when the space was really fast-moving. A lot of the community wanted the newest features right away, no matter what, while enterprises wanted stability and assurance that what they were building wouldn’t change underneath them.

That period, say spring of last year, was probably the toughest for us to navigate, because there was this pressure to move very, very quickly, but at the same time we were hearing from enterprise customers that they wanted more support and more stability. Those two things were very much at odds. Luckily, that’s gotten easier. As I said, things have slowed down a little bit, and we released our first stable version of LangChain this past January. That moment of trying to stay up to speed with the fastest-moving things while also providing stability to enterprises was the biggest mismatch in what people were asking for.

Sarah: I know you have to think about this on a daily basis, but how do you balance that trade-off between the here and now that could translate into near-term revenue, and building for the longer-term vision that guides the company to success and leaves a bigger legacy beyond the here and now?

Edo: There are a lot of sharp edges, questions, requests, and so on that our customers bring to us, and oftentimes they call for easy solutions. I talked about the people who are pretty successful, the ones who have commitment. You don’t hear those requests from the people who are committed, because they figure out the problems themselves until they bump up against a brick wall. Then they’re like, “Okay, fine. This is something we really can’t do. We’re going to figure out how to chunk a document, but we’re not going to go build a vector database, for the love of God.” And what I focus on is breaking those brick walls.

I’m going to hire the right people. I’m going to build a culture and I’m going to build a company that has the ability to break brick walls. That’s what we’re going to focus on. We’re going to try to make it nice to use as a product and so on, but we’re going to focus on really hard problems and make them easy enough that they just become a non-issue, right? That’s how I think about it.

I will say that the user experience is also incredibly important. A lot of that isn’t hard in the same way, but it’s still extremely hard. I had a conversation here with somebody, I forget who, about the difference between easy and simple, right? Making something simple is not easy. It’s incredibly difficult, right? So we really focus heavily on simplicity, which makes our lives a lot harder sometimes, but makes things a lot more delightful for our customers.

Sarah: Great. Right before this, we were just talking about partnership and the value of it within genAI. In particular, partnering with the right startups and established companies has been crucial to advancing genAI. Obviously you see that on the foundation model side with the hyperscalers, and potentially with you guys as well in terms of distribution, etc.

Can you guys share more about how you think about setting up successful partnerships and how you pick who those partners should be? And what are some of the watch-outs? I don’t know if there’s anyone from a hyperscaler here, but how do you make sure they don’t eat your lunch by building the same tools and trying to commoditize folks?

Harrison: I think partnerships in a space like this are super important because there are so many different components that go into building a production-ready application. For us in particular, the two main partnerships that we need to have are with model providers and with vector databases.

Edo: With Pinecones?

Harrison: Yes, and Pinecones. You need those two things to build an application, in addition to whatever LangChain is providing. So how do you choose which ones to work with? To some degree, it’s led by the community: we see a lot of people using OpenAI, so we invest a lot in the OpenAI integration. To some degree, there’s also a difference between what’s easy to get started with and what can carry through to production. It’s really easy to get started with a NumPy in-memory vector store, but I don’t even know if we have one of those in LangChain, because we wouldn’t recommend it for anyone, right? That’s not a good path for people in the long run. And so it’s a combination of those two things.
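
For context on the prototype-versus-production point, this is the kind of throwaway in-memory “vector store” Harrison is referring to: a few lines of NumPy that work fine for a demo over a handful of documents, with none of the indexing, filtering, persistence, or scaling a production system needs. It is a sketch for illustration, not something shipped in LangChain.

```python
# A toy in-memory vector store: trivial to prototype with, unsuitable for production.
import numpy as np

class InMemoryVectorStore:
    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.texts: list[str] = []

    def add(self, embedding: np.ndarray, text: str) -> None:
        self.vectors = np.vstack([self.vectors, embedding.astype(np.float32)])
        self.texts.append(text)

    def search(self, query: np.ndarray, k: int = 3) -> list[str]:
        # Brute-force cosine similarity over every stored vector: O(n) per query,
        # no metadata filtering, nothing persisted, nothing distributed.
        norms = np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(query)
        sims = (self.vectors @ query) / np.clip(norms, 1e-9, None)
        return [self.texts[i] for i in np.argsort(-sims)[:k]]
```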

LangChain has, I think, 700 different integrations. The main thing we’ve tried to do recently is make it clear which of those we both think are good and whose teams we believe in. We have dedicated partner packages for a bunch of integrations; Pinecone obviously is one. I think there are 20 or 30, mostly model providers and vector databases. That’s how we’ve tried to align our engineering teams with engineering teams at other companies, because you need that joint story to succeed at building genAI applications, and it’s really, really hard to do it without that.

Edo: I couldn’t agree more. I can summarize: let your customers speak, and listen to them, right? If they’re using your tool with somebody else’s, listen to that and go make that better. And once you decide to invest in a partnership, make it really good. Really ask for the bad feedback all the time. I just did that an hour ago.

Harrison: I’ll check my Slack to see if you responded. [laughs]

Sarah: Harrison, you alluded to progress. It’s all relative, right? Progress has been lightning fast over the last 18 months, but over the last few, it seems to have slowed a little. I’m curious what you guys are most excited about in terms of what’s around the corner with genAI. What are you looking out for?

Harrison: I think the two main things we’re spending a lot of time thinking about are evals and agents. I’d say evals are more of a gap currently for people building, and we’re building a bunch of tooling around that. The thing I always like to think about with all of this is: what’s actually new with all these LLM things? There are a few interesting things that are new about evals for LLMs, even compared to traditional ML models, and definitely compared to software engineering. We’re building some tools for that. And then agents.
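
For a sense of what “evals” means in practice, here is a bare-bones harness, deliberately generic rather than LangSmith-specific: run the application under test over a small labeled dataset and score the outputs. The `answer_question` stub and the substring grader are placeholders; real setups typically use larger datasets and often LLM-as-judge grading.

```python
# Bare-bones eval harness (a generic sketch, not LangSmith's API).
examples = [
    {"question": "What is Pinecone used for?", "expected": "vector"},
    {"question": "What does LangGraph help with?", "expected": "agent"},
]

def answer_question(question: str) -> str:
    # Placeholder for the chain or agent under test.
    return "LangGraph helps build controllable agent workflows."

def grade(prediction: str, expected: str) -> float:
    # Trivial substring grader; production evals often use an LLM-as-judge instead.
    return float(expected.lower() in prediction.lower())

def run_evals() -> float:
    scores = [grade(answer_question(ex["question"]), ex["expected"]) for ex in examples]
    return sum(scores) / len(scores)   # fraction of examples that pass

print(f"pass rate: {run_evals():.0%}")
```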

So yeah, I think progress on the core model front has definitely slowed a little bit from where it was a year ago. I don’t think there’s any debate about that. But we’re seeing a lot of effort being put into the systems around the models and into building up these agentic applications. That’s a fantastic path forward. We’re starting to see more and more applications like this and more and more success, and we’re extremely excited about it and doubling down on our investments there.

Edo: I’m unashamedly very excited about our mission to make AI knowledgeable. People think about us as a vector database because that’s the main missing component we had to build to get there, right? You really have to be able to do that at large scale to make it a reality, but it’s not enough on its own to be really knowledgeable. When we meet somebody who is very intelligent, we often conflate two things as humans, right? One is raw IQ: how snappy and insightful and thoughtful they are. The other is how knowledgeable they are: what they know, what information they bring to bear, how much they remember and can bring up in a conversation.

When you think about the set of skills that requires, it’s actually incredibly deep. It’s not only a database. You have to encode information, store it correctly, process it correctly, retrieve from it correctly, figure out what’s relevant and what’s not, and so on. This entire part of the stack is something the community doesn’t understand nearly well enough, right? In the hardware between your ears, that’s more than 50% of what goes on, and we know so little about it. It’s sort of scary and at the same time super exciting, because it’s a whole new frontier of knowledge that we’ve barely started chipping away at.

Sarah: Maybe to echo both of you: Harrison, we were at that dinner a month ago where I think we talked about evals half the time. What’s exciting at a meta level is that the industry has evolved from prototype to production, and the demands are changing, right? If you take the analogy to software development, no one needed Datadog in the beginning. Only later on, when what you were building was touching more users and you actually wanted observability, did that kind of need become really important. So it’s exciting to see that evolution in AI development, and I think we’re just at the beginning.

Can we have one more round of applause for our two panelists?

Harrison: And thank you, Sarah.

Edo: Thank you so much.