Posted April 27, 2023

With large language models (LLMs) hitting an inflection point, we’re witnessing a paradigm shift in software development and the computing industry as a whole. AI is happening, and a new stack is forming before our eyes. It’s like the Internet all over again, calling for new infrastructure components built for the new way of doing things.

There is growing recognition that LLMs are, in some sense, a new form of computer. They can run “programs” written in natural language (i.e., prompts), execute arbitrary computing tasks (e.g., writing Python code or searching Google), and return the results to the user in a human-readable form. This is a big deal, for two reasons:

  1. A new class of applications around summarization and generative content is now possible, changing consumer behavior around software consumption.
  2. A new class of developers is now able to write software. Computer programming now requires only mastery of English (or another human language), not training in a traditional programming language like Python or JavaScript. 

One of our top priorities at Andreessen Horowitz is to identify the companies building the key components of this new AI stack. We’re thrilled to announce that we’re leading a $100 million Series B round in Pinecone, to support their vision of becoming the memory layer for AI applications.

The problem: LLMs hallucinate and are stateless

A huge challenge with current LLMs is hallucination. They give very confident answers that are factually and sometimes logically incorrect. For example, asking an LLM for the gross margin of Apple for the last quarter can result in a confident answer of $63 billion. The model can even back up its answer by explaining that by subtracting $25 billion in the cost of goods from $95 billion in revenue, you get a gross margin of $63 billion. Of course, it’s wrong on several dimensions:

  • First, the revenue number is wrong, as the LLM doesn’t have real-time data. It is working off stale training data that is months or probably years old.
  • Second, it picked up those revenue and cost of goods numbers randomly from another fruit company’s financial statements.
  • Third, its gross margin calculation is not even mathematically correct: $95 billion minus $25 billion is $70 billion, not $63 billion.

Imagine giving that answer to the CEO of a Fortune 500 company. 

All of this happens because, at the end of the day, LLMs are prediction machines trained on vast amounts of third-party internet data. Often, the information the user needs simply isn’t in the training set, so the model gives the most probable, linguistically well-formed answer it can based on its stale training data. We can already see a potential solution to this problem: feeding contextually relevant, private enterprise data to the LLMs in real time.

The general form of this problem is that, from a systems perspective, LLMs and most other AI models are stateless at the inference step. Each time you make a call to the GPT-4 API, the output depends only on the data and parameters you send in the payload. The model has no built-in way to incorporate contextual data or remember what you’ve asked before. Model fine-tuning is possible, but it’s expensive and relatively inflexible (i.e., the model can’t respond to new data in real time). Since the models don’t manage state or memory on their own, it’s up to developers to fill the gap. 
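To make that concrete, here’s a minimal sketch of what that gap-filling looks like, assuming the openai Python client available around the time of this post (the keys, model name, and messages are placeholders): the second call only “remembers” the first because the developer replays the earlier exchange in the payload.

```python
import openai

openai.api_key = "YOUR_OPENAI_KEY"  # placeholder credential

# Call 1: the model sees only what is in this payload.
first = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Our Q1 revenue was $95 billion."}],
)

# Call 2: the API keeps no state between calls, so the developer must
# resend the earlier exchange for the model to "remember" it.
second = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Our Q1 revenue was $95 billion."},
        {"role": "assistant", "content": first["choices"][0]["message"]["content"]},
        {"role": "user", "content": "What was our revenue again?"},
    ],
)
print(second["choices"][0]["message"]["content"])
```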

The solution: Vector databases are the storage layer for LLMs

This is where Pinecone comes in.

Pinecone is an external database where developers can store relevant contextual data for LLM apps. Rather than sending large document collections back and forth with every API call, developers can store them in a Pinecone database, then pick only the few most relevant to any given query — an approach called in-context learning. It’s a must-have for enterprise use cases to truly bloom.
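A rough sketch of that pattern is below, assuming the pinecone-client and openai Python SDKs of this era. The keys, the environment, and the index name `company-docs` are placeholders, and the index is assumed to already hold embedded document chunks with their raw text stored as metadata (the embedding step itself is explained in the next paragraph and sketched after it).

```python
import openai
import pinecone

openai.api_key = "YOUR_OPENAI_KEY"                                       # placeholder
pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-east1-gcp")   # placeholders
index = pinecone.Index("company-docs")  # hypothetical index of embedded document chunks

def embed(text: str) -> list[float]:
    """Turn text into a vector with an embedding model."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return resp["data"][0]["embedding"]

question = "What was our gross margin last quarter?"

# 1. Retrieve only the few most relevant chunks instead of shipping every document.
results = index.query(vector=embed(question), top_k=3, include_metadata=True)
context = "\n\n".join(match.metadata["text"] for match in results.matches)

# 2. Hand those chunks to the LLM as context for this one call (in-context learning).
answer = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer["choices"][0]["message"]["content"])
```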

In particular, Pinecone is a vector database, which means data is stored in the form of semantically meaningful embeddings. While a technical explanation of embeddings is beyond the scope of this post, the important part to understand is that LLMs also operate on vector embeddings — so by storing data in Pinecone in this format, part of the AI work has effectively been pre-processed and offloaded to the database.
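Continuing the hypothetical sketch above, that pre-processing is a one-time ingestion step: each document chunk is embedded once and stored as a vector, so at query time only the question needs to be embedded.

```python
# One-time ingestion: embed each document chunk and store it in Pinecone.
# `index` and `embed()` are the hypothetical helpers from the sketch above.
documents = {
    "q1-report-p1": "First page of the Q1 financial report...",
    "q1-report-p2": "Second page of the Q1 financial report...",
}

index.upsert(
    vectors=[
        (doc_id, embed(text), {"text": text})  # (id, embedding vector, metadata)
        for doc_id, text in documents.items()
    ]
)
```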

Unlike existing databases, which are designed for atomic transactional or exhaustive analytic workloads, Pinecone’s vector database is designed for eventually consistent, approximate nearest-neighbor (ANN) search, the right database paradigm for high-dimensional vectors. It also provides developer APIs that integrate with other key components of AI applications, such as OpenAI, Cohere, and LangChain. Such a well-thought-out design makes developers’ lives much easier. Simple AI tasks like semantic search, product recommendations, or feed ranking can also be modeled directly as vector search problems and run on the vector database without a final model inference step, something existing databases can’t do.
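For those simpler tasks, the vector query itself is the whole computation. Continuing the same hypothetical sketch, a semantic search is just a nearest-neighbor lookup, with no LLM call at the end:

```python
# Semantic search: the nearest neighbors are the answer; no model inference step follows.
hits = index.query(
    vector=embed("policies on travel reimbursement"),
    top_k=5,
    include_metadata=True,
)
for hit in hits.matches:
    print(round(hit.score, 3), hit.metadata["text"])
```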

Pinecone is the emerging standard for managing state and contextual enterprise data in LLM applications. We think it’s an important infrastructure component, providing the storage, or “memory,” layer to a brand new AI application stack.

Incredible progress for Pinecone to date

Pinecone is not the only vector database, but we believe it’s the leading vector database — ready now for real-world adoption — by a significant margin. Pinecone has seen 8x growth in paid customers (approximately 1,600) in just three months, including forward-looking tech companies like Shopify, Gong, Zapier, and more. It’s used across a wide range of industries, including enterprise software, consumer apps, e-commerce, fintech, insurance, media, and AI/ML.

We attribute this success not only to the team’s deep understanding of the user, market, and technology, but also, critically, to their cloud-native product approach from the start. One of the hardest parts of building this service is providing a reliable, highly available cloud backend that meets a wide range of customer performance targets and SLAs. Through multiple iterations of the product architecture, and by managing many high-scale paid customers in production, this team has shown the operational excellence expected of a production database.

Pinecone was founded by Edo Liberty, who has been a long-time, diehard advocate for the importance of vector databases in machine learning, including how they can enable every enterprise to build use cases on top of LLMs. As an applied mathematician, he spent his career studying and implementing cutting-edge vector search algorithms. At the same time, he was a pragmatist, building core ML tools like SageMaker at AWS and translating applied ML research into practical products that customers can use. It’s rare to see such a combination of deep research and pragmatic product thinking.

Edo is joined by Bob Wiederhold, an experienced CEO and operator (formerly of Couchbase), who partners with him on the operations side as President and COO. Pinecone also has a fantastic team of executives and engineers with deep cloud-systems expertise from places like AWS, Google, and Databricks. We’re impressed by the team’s deep engineering expertise, focus on developer experience, and efficient go-to-market execution, and we are privileged to partner with them to build the memory layer for AI applications.