Posted February 21, 2025

In this episode of AI + a16z, Aatish Nayak, head of product at Harvey, sits down with a16z partner Kimberly Tan to share his experience building AI products for enterprises — including the legal profession — and how to address areas like UX, trust, and customer engagement. Importantly, Aatish explains, industries like law don’t need AGI or even the latest and greatest models; they need products that augment their existing workflows so they can better serve clients and still make it home for dinner.

Aatish speaks to the various areas of legal work that Harvey is targeting, as well as critical aspects of any successful vertical application, such as working closely with customers, integrating with their existing tools and workflows, and having industry expertise in-house. More broadly, he touches on how Harvey thinks about expanding into additional fields of knowledge work, as well as its strategy for innovating on top of today’s best foundation models.

Transcript

Aatish: I’ve been at Harvey for a year and a half now. I joined when we were around 30 people, and since then we’ve scaled to 250 people. So it’s been quite a journey going through that growth. And generally my background, and where I’ve spent most of my career, is actually in hyper-growth AI startups. Before this, I was at Scale for four and a half years as a product leader. And before that I was at Shield AI, also an a16z portfolio company. And really it’s a privilege to go through hyper-growth for a third time at Harvey, because it’s such a pivotal moment in human history. I think a lot of things are gonna change, and I’m excited to kind of be at the forefront.

Kimberly: Maybe for people who aren’t familiar in the audience or people who are listening online, what exactly does Harvey do? Because I think a lot of people know that Harvey exists but we might not know in extreme detail what the product offering is.

Aatish: So Harvey is domain-specific AI for legal and professional services. Our product basically helps users and lawyers automate drafts, synthesis, strategic advice, memos, and more.

Kimberly: Got it. And before we get just a little bit deeper into the practice of building applied AI, are there specific use cases that Harvey tackles the most knowing that there are a lot of different workflows one could theoretically do for legal or professional services?

Aatish: Broadly for legal, there are maybe two or three types of legal work. There’s transactional work, which is essentially mergers, acquisitions, venture funding: large transactions that involve tremendous amounts of money. Then there’s litigation, which is if someone sues someone, if there’s a case in court, also often involving a lot of money. And then probably the third is really focused on in-house: enterprise counsel and enterprise in-house teams. So those are the three larger buckets, and we serve all of them in various ways. If you think about what you need in a merger or acquisition, you need to do due diligence. You need to understand all the liabilities, understand the financials, understand where the gotchas are for the target and the acquirer. Each due diligence can be broken up into almost 10 to 12 different workflows, and we help in different ways in those workflows. And then same thing in litigation. So there’s that high level, and we’re really focused on different steps in that journey.

Kimberly: For a long time in Silicon Valley circles, people believed that selling to law firms or selling to professional services just wasn’t the most fruitful area given they weren’t known for adopting technology quickly. A lot of people thought the billing model wasn’t aligned to increasing efficiency or using technology. I’m curious, what has Harvey seen in that regard?

Aatish: So I think there are two things: the market, and what Harvey has specifically done. I think market timing for any startup is incredibly important. If you look overall, when ChatGPT came out in November 2022, it really unleashed the power of GenAI for a lot of people. So lawyers, in-house counsel, managing partners, CIOs really started understanding this technology and saying, “Oh, wow, this actually can change a lot of things.” It wasn’t really a hidden thing anymore. Before ChatGPT, AI was maybe this hidden thing that we didn’t really know how to apply. But because ChatGPT put it in people’s hands, the cat was out of the bag: the practice of law was gonna change.

And so the cat was out of the bag. Everyone knew this was gonna happen. And because everyone knew it was gonna happen, a lot of enterprises were saying, “Hey, law firm, we use AI, I’ve seen AI in action. You all should use AI to become more efficient, do more work, etc.” Law firms started feeling pressure from clients. And then the law firm market, and the legal market in general, is very competitive. In any region, there are four or five major players really going tooth and nail at each other. So it’s important for a law firm to signal that they’re innovative and efficient, because they’ll get more clients. And because of this competitive dynamic, everyone really wanted to adopt technology. I think it’s this perfect storm of market timing and where Harvey was. So there were these macro movements and pressures from the market.

And then I think what Harvey did early on, and still does really well, is embedding legal expertise across all different functions. What that meant was, early on we actually had lawyers selling the product: lawyers as account executives. Our CEO is a lawyer. And our head of legal research is also a lawyer, which I’ll go into in a second. But yeah, we had lawyers selling the product. They would go to a law firm and speak the language, speak the lingo, be super empathetic. And they would actually come from a lot of the customers that we were serving, so they knew exactly how things worked. That really allowed us to get the distribution and really get the GTM going.

And then on the product and AI side, we also have lawyers embedded in our product and AI teams. We have a legal research function that works hand in hand with product managers and AI engineers, and what they really do is convert legal process into algorithms. So we have an agentic, or compound, AI system that basically functions how a law firm would function. In a law firm, if a partner gets a deal or a litigation, they break it up into multiple different pieces, maybe give it to the junior partners, the junior partners break it up further and give it to associates, and it’s kind of passed down the chain. And then, because law firms are fairly hierarchical organizations, the associates do the work, then they pass it up for approvals and checks, and ultimately the partner delivers the end product to the client. Our lawyers who work with our engineers basically replicate that same model for different types of tasks: they literally whiteboard out the different processes so that AI engineers can convert them into model systems.

Kimberly: Do you consider these to be agentic workflows, then? And do you view them as replacing labor that people were previously doing, or do you view it more through the classic labor-replacement-versus-copilot framing?

Aatish: That’s a good question. I think it’s a bit of a narrow take. The legal landscape overall is very complex, getting even more complex, and honestly very costly to navigate. You know, globalization, the internet, and AI have increased legal work exponentially over the last few decades. So you have basically infinite demand for legal work, because companies want to do different transactions, litigations, etc. And what that means is that the supply is very constrained. The unfortunate human cost of supply constraints is very long hours, often spent doing very mundane, kind of boring tasks.

We talk to lawyers we’ve hired, and to our customers, and they didn’t become lawyers to write the fifth draft of the same document, or to answer the same legal research question, right? They became lawyers to apply the law in creative ways, publish opinions, shape the fabric of society. And we hear this from customers all the time: Harvey gives 30%, 40% of their time back because it really helps them automate that mundane rote work. Actually, the other day, one of our customers said Harvey allows them to go home to their family in time because it’s been able to accelerate a lot of things. So: infinite demand, a lot of supply constraints, and it’s a great place for AI to help.

Kimberly: Can you talk more about that a little bit? What would that interaction pattern actually look like?

Aatish: So this is a general question with, I think, generative AI: what is the human component? How much is fully automated? I think the reality is, let’s say you’re drafting an S-4 or an S-1. An S-1 is what you file when you go public. You’re not gonna one-shot that with the biggest reasoning model and say, “Hey, write me an S-1,” and be done, right? It requires…

Kimberly: So all the bankers are safe.

Aatish: Yeah, all the bankers are safe. o1 is not gonna one-shot your S-1. The process of doing an S-1, or the process of doing a merger, is really interactive with multiple parties: the law firm, the client, and anyone else involved. So we think these agents have to collaborate well with humans to get the work done, because humans may have some particular intent that they haven’t told the agent, or they may have some data that the agent doesn’t actually have. We think about building these agents in a nice, kind of AI-native UX way so that they can actually collaborate with different organizations to get the work done and say, “Hey, I wrote this draft. Am I on the right track? Give me more information, because I don’t know what to do about this decision here.” So I think we’ll start to see more proactive agents that really ping different people at each firm and collaborate effectively to get something done.

Kimberly: And maybe circling back a little to the question I asked earlier, about how the market has changed overall and how ChatGPT really was this moment for enterprises to realize the cat was out of the bag. Has that changed how legal or law firms think about charging, etc.? Because one of the things people believed about legal for a long time was that, because of the billable-hour model, it actually didn’t matter from a profit perspective how many hours you spent on something, even if maybe people just wanted to go home to their kids.

Aatish: I think this goes back to, again, the market dynamic where you have infinite demand. You just have to get more efficient to service all that demand. We started in a seat-based model; we charge basically on a per-seat basis. And it’s not because we don’t believe outcome-based pricing, or paying for the work, is the future. It’s really just because we want to make it understandable for enterprise buyers. There’s this VC statement that outcome-based pricing is the future, or that it’s already happening. I think it will happen, but what people have to understand is that enterprises don’t really know how to reason about buying outcome-based work, especially for such an experimental product as AI. So I think it’ll happen over time.

Kimberly: One thing about deploying AI into the enterprise, maybe for the first time ever with some of these customers, is that people might not know how to use it. It’s sort of a new UI/UX experience, and people don’t really know how to prompt agents a lot of the time. How do you guys think about the things you need to do to actually get an enterprise to meaningfully get value out of an AI product?

Aatish: So our utilization has grown from 40% early last year to 70% now across our user base.

Kimberly: What is the metric?

Aatish: So it’s active users over the number of seats, on a monthly basis, basically. I think a lot of that growth has been driven by good old-fashioned discipline across different functions. So maybe starting with the GTM and sales team: as I mentioned, we have lawyers embedded in the sales team. And because they come from this field, because they come from a lot of our customer archetypes, they put a lot of emphasis on a very specific kind of onboarding program and use-case building, where they speak the lingo and know exactly how to accomplish a certain use case. And so it makes it a lot more approachable for our users. That’s one, on the sales and GTM side.
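As an aside, the utilization metric Aatish describes has a simple shape. Here is a minimal sketch; the function name and numbers are illustrative, not Harvey’s actual code:

```python
# Sketch of the utilization metric described above: monthly active users
# divided by the number of provisioned seats. Numbers are illustrative.

def utilization(monthly_active_users: int, total_seats: int) -> float:
    """Fraction of provisioned seats that were active in a given month."""
    if total_seats == 0:
        return 0.0
    return monthly_active_users / total_seats

# e.g. 70 active users on 100 seats corresponds to the 70% figure above
print(utilization(70, 100))
```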

On the customer success side, we’ve really tried to gamify a lot of deployments internally. Our customer success team often does big launches or use-case contests, and law firms love to post on LinkedIn. So if we say, “Hey, this person is the best AI prompt engineer,” or whatever, they love to talk about that on LinkedIn, and it creates a nice, kind of healthy competitive mentality.

Kimberly: Yeah. And then the other question is about expanding to other industries. You’re two years or so into the company now, and you actually want to expand beyond legal. So we’d love to first understand the rationale behind doing that, versus maybe going deeper into legal. And then, how applicable do you think the product set, as well as the go-to-market strategy, will be for the new verticals?

Aatish: Good question. First of all, we have a lot of legal customers, but we don’t want to rest on our laurels and become complacent. We actually have a cultural principle that says “job’s not finished.” It’s referencing the Kobe quote. I don’t know if you’re aware of it. And so we don’t…

Kimberly: I wasn’t, but now I am.

Aatish: We never want to be complacent, and a lot of our effort is still focused on legal. But overall, if you look at transactions, litigation, and legal work generally, there are oftentimes a lot of professions involved beyond legal. If you’re doing an M&A, there are tax people involved, there are financial people involved, there are HR people involved to combine the two teams. So I think it would be a disservice to say only lawyers can use Harvey and take advantage of it inside of a transaction. The way we think about it is, as we’re doing these larger project-based workflows, we use that to expand: “Hey, maybe the tax professional needs to know the same thing as the legal person, with maybe one incremental thing on top.” So we’re really using the lawyers, and the projects they work on, to expand naturally into those verticals.

And there are a few ways to do that. Generally we take a very customer-driven approach. A lot of our enterprise customers actually already have their compliance and HR teams on Harvey, because if you’re reviewing employment contracts, the HR team is obviously gonna be very involved. So one avenue is organically expanding inside of enterprises. The other is being very customer-driven and partnering with leading firms. We work with PwC to build custom tax and financial diligence systems, because, especially internationally, they’re the experts in tax law, they’re the experts in financial diligence. They’ve really helped us learn a lot about those domains and pushed us in that direction. So we’ve been laying the seeds for that expansion for a bit, and over the next two, three years we’re really gonna naturally expand into those areas.

Kimberly: What do you mean when you say custom models or custom workflows for those domains? Is that custom as in PwC-specific, and therefore something you actively don’t wanna bring to similar customers, or?

Aatish: Particularly for the tax work, tax attorneys across the world ask a lot of questions about certain tax laws and how they can be applied to their clients. A lot of that knowledge actually just lives inside PwC. The world’s leading experts in UK tax law are actually at PwC. So when we say we’re building custom systems there, we’re actually using a lot of the data that they’ve curated, as well as the expertise and evals from their experts, to improve the system. We build various fine-tuned models and RAG systems that incorporate that data and those evals. I think PwC is unique in that sense, but over time we may start to work with other professional service providers as well.

Kimberly: So I do wanna talk a little bit more about the product building and how you guys think about evals, how you think about selecting model providers, etc. But maybe one last point on this is, you talk about how PwC is a great partner in designing some of these more custom projects that you guys didn’t have previously. I imagine that that required first a lot of trust because they are giving you very sensitive data. And then a lot of open questions that I think anyone building for the enterprise or any enterprise buyers have around, “How is my data actually being used, both in this context? Is it getting fed back to the models? Is it gonna go to some of my competitors, etc.?” So I’m curious how you guys think about those questions.

Aatish: I think this is an under-discussed topic in enterprise software in general, not just AI. Enterprise readiness goes way beyond just SOC 2. It is, I think, a culture you have to build, particularly in your product and engineering teams, really from the beginning. And the examples of what we’ve done stem from the fact that we started with the hardest customers first. They work on extremely sensitive matters across the world, and it’s a big thing for them to trust a relatively small startup with that.

So, a few things we implemented from the beginning. One is a strict no-training policy for data that’s sent to us. By default, all our paperwork doesn’t allow Harvey to train on that data; in fact, people at Harvey can’t even look at the data. We call this “eyes off”: no one at Harvey can access most of our customer data, because it’s such a sensitive set of data. Another part of it is that we have a very strict external vendor list. We’re only allowed to use, for example, Azure-deployed models to improve our system and power our product. And that’s because, again, Azure has a lot of trust in the enterprise. All our customers are on huge Azure deployments, and so they trust Azure a lot.

What that also means, though, is that if a new model comes out from Google or Anthropic, or a new fancy tool shows up on Twitter or something, we can’t use it right away. We have to be very strict about that. And again, this goes back to product and engineering culture. We really have to make sure our engineers understand that they can’t just pull a new model or tool into the product and deploy it. We are really strict about that.

And I think the last thing is that we hired a security team very early on. Our head of security was, I think, one of the first 15 employees or so, and he’s really helped us develop a robust security program. When he goes in front of a CIO or a C-suite-level person, they know we are legitimate; we don’t sound like a startup, basically. So I think that mix of things has been really crucial to gaining that trust.

Kimberly: What is your philosophy around building applied AI products? So on the one hand, you get to own the customer and that’s great. And on the other hand, there’s new fun things coming out on Twitter every single day. There’s new models basically every month nowadays. And I imagine that’s a very tough foundation to be able to build a consistent product on top of.

Aatish: Yeah, so I think there are a few ways. There’s another question we also often get: how much do you focus on existing workflows and existing surface areas for lawyers, versus a net-new AI-native UX? The one thing maybe to highlight is that there is no IDE for lawyers. There’s no VS Code or Cursor for lawyers. The two tools they use the most are Word and email, basically Outlook, and we are intertwining with both of those. But ultimately we didn’t really have a choice about building on top of existing tools or existing software, because there really isn’t one. And so we’ve really opted for an AI-native UX.

Kimberly: Yeah, what does that mean? Like what is an AI native UX?

Aatish: Ultimately, one of the main principles is that we want Harvey to feel like a coworker, and not just an AI or a piece of software. We want it to feel like a human. If you’re working with a human at a law firm or an enterprise, you can basically talk to them and go back and forth a lot when you give them work. So let’s say I go up to someone and say, “Hey, can you draft me this one-on-one disclosure?” If they’re a good coworker, they will ask you, “Hey, I need more information. What’s the information source? What should I base the format and the tone on? What deal are we even doing?” Then they may write a draft of it and say, “Hey, can you check my work? Am I on the right track?” And I think that’s really how we want Harvey to feel: you’re going back and forth, and you’re being guided to do that work.

Kimberly: Is it like a chatbot UI still? Or what is the actual UI that people are using here?

Aatish: It’s kind of like a chat UI, with dynamic UI components that are surfaced. And the other principle we really want to take into account here is what’s called the IKEA effect, which is basically the idea that people feel a lot more responsible for something if they help build it. IKEA really took advantage of this, right? They’ve made the process of assembling their furniture really delightful and fun, and they invest a lot in the manuals and everything. There’s a cult-like following for IKEA because people assemble it themselves, though maybe nowadays they don’t as much.

Kimberly: They used to.

Aatish: They used to, yeah. And so for us, it just goes back to: you can’t one-shot an S-1 with o1. There’s so much back and forth that goes into actual legal work. It’s complex, you need humans, and there are unique data sets. If we were just like, “Hey, draft this disclosure schedule,” and Harvey did it, no one would trust it, because they’d have no idea what actually went into creating it. So we want to bake in these nudges, which we call shoulder taps, so that Harvey asks for feedback, asks for data, asks for intent before actually producing the outcome.

Kimberly: Can you talk through what that looks like in practice if I’m an individual lawyer? I know one of the UX experiments a lot of people are trying to figure out is that while the agent is doing work, it’ll expose what it’s doing, but there’s also some level of downtime that happens there. Does the lawyer get a little notification, like, “Oh, come back, I have a question”? How do they integrate that with their day-to-day work so that they’re not just sitting there monitoring the agent?

Aatish: I think one interesting thing about our user base and our product is that we’re not very latency-constrained. For a lot of chat products or consumer AI products, most people expect an instant answer. But because the quality of the output that Harvey produces is so good and so human-like, people are okay waiting two, three, four minutes to actually get an outcome. And because of that, we’re able to shove more intelligence into every single pass: more model calls, more algorithms. We’re also starting to add asynchronous agents that will email you or ping you when they’re done. So latency is just not a big constraint for us, which gives us a lot of freedom. As long as the agent is providing some transparency into what it’s doing, and it’s not just endlessly spinning, it works out for our user base.

Kimberly: Do you think we’ve arrived at the point that we know what is the best AI native UI or UX experience yet? And if the answer is yes, I’d love to know what it is. And if the answer is no, what do you think are the experiments still being run? Or what are the types of workflows you think people haven’t quite figured out yet?

Aatish: Yeah, so the short answer is no. Chat is the command line of AI. When MS-DOS first came out, you were just typing into a terminal to move things around. That’s where we are with AI. And actually, hopefully in 2025 we’ll see a lot more innovation on AI-native UX, dynamic UX, ways to interact with the model that are not just text-based. That’s the first thing. The second thing people have to realize is that most users, and certainly our users, have very underspecified queries or prompts. It’s interesting how comfortable people have gotten with AI, where they just assume the AI knows everything. We get a lot of support tickets saying, “Go into my email and search up this thing and produce this outcome.” Or, “Do you remember when I talked about this last time? Use that to come up with the answer.” I think it’s partly an educational thing, but also the AI really has to work collaboratively with the individual to actually extract their intent, versus just relying on a one-shot prompt to get it exactly right.

So I’m hoping to see more unique back-and-forths and guidance that the agent can provide, instead of just a text-based prompt. With enterprises, you actually need this AI-native UX even more, because the work is so complex and difficult, and oftentimes it’s being done by teams of people. So you do need a more full-fledged, natural UX. In consumer, I think, because the use cases are so varied and there are so many ways to use AI, maybe the best UI is chat: because it’s so open-ended, you can capture the whole market with just an open-ended UI, and that’s kind of what we’re seeing. So I do think that for enterprises there should be more experimentation with AI-native UXes, because the workflows are specific and the work is so difficult. And again, never one-shot.

Kimberly: Yeah, makes sense. Maybe switching gears slightly: I’d love to know, to the extent that you guys can talk about it, how you think about the infrastructure under the hood. Are you primarily using one model, and if so, which? How do you think about swapping out models as new capabilities come out, etc.?

Aatish: So as I mentioned previously, Harvey consists of hundreds of different model calls in an agentic, or compound, AI system that produces the output. Currently, we primarily use OpenAI models in production, either OpenAI directly or OpenAI through Azure. And that’s because, one, the models are really good; two, both OpenAI’s and Azure’s infrastructure is really good and fast; and three, security and customer trust. As I mentioned earlier, our customers really trust Azure, which is why it’s the default cloud choice for us, and that’s really how we’ve been able to gain trust. But in general, we’re not tied to OpenAI. We actually already work with all the major labs to eval their products and provide guidance on how they should think about legal reasoning, sharing data sets and insights that we’ve gleaned. So we are certainly open to using all sorts of different models; it’s just that business, security, and infrastructure constraints mean we haven’t gotten there yet.

Kimberly: Yeah, how easy is it for you guys to swap a model? Because they’re non-deterministic, you can imagine something weird happening. How do you run evals afterwards to make sure the experience is still consistent if you do swap out a model?

Aatish: So from an AI infrastructure perspective, again, early on we really tried to emphasize modularity, so that we can swap model strings and endpoints in and out. The more difficult thing is actually the eval, as you mentioned. Each model has a different personality, different characteristics and behavior, and the same prompts or fine-tuning data may not work the same way for different models. So swapping a model in and out does require a lot of eval, because we want to make sure it doesn’t degrade quality.
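The modularity Aatish mentions (swapping model strings and endpoints in and out) can be sketched as a registry that binds each pipeline step to a model by name. Every identifier below is hypothetical, not Harvey’s actual architecture:

```python
# Sketch: each pipeline step is bound to a model endpoint by name, so a
# model swap is a registry update rather than a pipeline rewrite.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ModelEndpoint:
    name: str
    call: Callable[[str], str]  # prompt -> completion

# Central registry mapping pipeline steps to endpoints.
REGISTRY: Dict[str, ModelEndpoint] = {}

def register(step: str, endpoint: ModelEndpoint) -> None:
    """Bind (or rebind) a pipeline step to a model endpoint."""
    REGISTRY[step] = endpoint

def run_step(step: str, prompt: str) -> str:
    """Route a prompt through whatever model the step is currently bound to."""
    return REGISTRY[step].call(prompt)
```

With this shape, an eval harness can run the old and new endpoints over the same prompts before a swap ships, which is the quality check Aatish describes.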

Kimberly: So have you guys built out internal eval infrastructure to do this?

Aatish: Eval is a big focus for us. I come from Scale; I know human expert data is extremely important to building AI systems. I think there are two aspects of eval that we think about. One is internal eval, to improve our AI systems; the other is external eval, to communicate the value. On the internal side, we have a mix of human experts, in-house or contracted: lawyers in all different countries and practice areas, doing all sorts of absolute or relative evals. Absolute is like, “Look at this piece of content and rate it against this rubric.” Side-by-side is like, “Okay, look at two different versions of our algorithm’s output, and rank them against each other.” We’ve really invested a lot in that and have scaled it up as we’ve grown.

On the external side, the difficulty is that a lot of legal work is actually applying subjective opinions to objective facts, and judging subjective opinions is very hard. There’s certainly no objective truth: did you apply the law in this way, or is your interpretation worse or better than another? So external eval, and communicating it, is really hard. And then there are just so many legal tasks. If you look at the legal taxonomy of tasks out there, which lawyers have actually mapped out, there are almost 10,000 leaf nodes.

So part of the challenge is: how do you communicate to customers that Harvey is good or accurate, or doesn’t hallucinate, or whatever it is? We’ve spent a lot of time on this. We released a benchmark called BigLaw Bench early last year, which presents tasks that represent real billable work lawyers do on a daily basis. It’s the first benchmark of its kind: all public legal benchmarks so far have been multiple choice. I would love it if legal were multiple choice, but legal is not multiple choice. It’s very open-ended and messy. So the benchmark we produced really says, “Here’s real work that we know lawyers do, and here’s how Harvey performs.” And one other unique thing we did is that we’re not necessarily measuring accuracy. We’re measuring the percent of the work the model does compared to a 100% human response.

Kimberly: You mean like time is the metric?

Aatish: More like the total work. So maybe it gets you 85%, 90% of the way to drafting a disclosure schedule, and the human just takes it the last 10%. The reason is that if you frame things purely in terms of accuracy, no one wants a 90%-accurate agentic system, right? It’s not quite the right framework for communicating value, because even a 90%-complete product is still far more helpful than starting from zero.
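That “percent of work done” framing can be sketched as rubric coverage: score a draft by the fraction of required elements it contains, rather than as binary accuracy. The rubric items below are invented for illustration and are not drawn from BigLaw Bench:

```python
# Sketch of a work-completion score: fraction of required rubric elements a
# model draft covers, relative to a complete human response. Items are invented.

def work_completion(draft_elements: set, rubric: set) -> float:
    """Fraction of rubric elements the draft covers, from 0.0 to 1.0."""
    if not rubric:
        return 0.0
    return len(draft_elements & rubric) / len(rubric)

rubric = {"parties", "definitions", "liabilities", "schedules", "signatures"}
draft = {"parties", "definitions", "liabilities", "schedules"}
# The draft covers 4 of 5 rubric elements, i.e. most of the way there;
# a human then supplies what remains.
```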

Kimberly: Yeah. And then one last question on this front, which is a little bit of a tangent, but I was thinking about it as you were talking about the infrastructure for swapping out models and doing evals to make sure the experience is consistent and the product doesn’t degrade. What are your thoughts on the new OpenAI reasoning models? Because I imagine legal is probably on the more reasoning-heavy end of the spectrum of use cases. Have you seen a dramatic difference, and how has that shaped your thinking about which models you’d actually want to use?

Aatish: I think that’s been a huge unlock for our product and our customers. One nice thing, as I mentioned earlier, is that latency isn’t a big constraint for our customers. The one downside of these reasoning models is that they take time to think and to show their thought process and chain of thought, and our customers are already used to that. So slotting in these reasoning models has felt very natural given how we’ve designed our product. And then on the AI side, the models are actually really, really good at long-form drafting and long-form reasoning. Drafting a whole motion-to-dismiss argument by pulling from various different facts wouldn’t have been possible before these reasoning models.

Kimberly: Maybe this is getting a little too in the weeds, but one of the nice things about seat-based pricing, like you said, is that it’s a very clean metric, and usage-based is a fairly clean metric too. For support tickets, a ticket is the unit of measurement. How are you defining the unit of work being done for one of these eval sets? Because I imagine people have a hard time grokking exactly what that means, given that this is all relatively new.

Aatish: Yeah, it’s incredibly difficult in general, and it varies a lot based on the task, but also based on our customers. The way you’d create a chronology for a case might be very different from law firm to law firm. So we’ve thought about it like, let’s standardize the names and taxonomies of these tasks first, and then devise rubrics, like, “Okay, maybe law firm A and law firm B put the date column of a chronology in a different place, but it at least has a date column,” right? We’ve developed a whole rubric for each major task we evaluate, and this is where a lot of our internal legal expertise comes in. Each rubric is unique to its task. We’ve tried to standardize, but there is so much variance.
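The rubric idea, requiring that an element exists without pinning down where a given firm puts it, can be sketched as a position-agnostic check. This is a hypothetical example: the `has_date_column` check and the CSV chronologies below are invented for illustration and are not Harvey’s actual rubrics.

```python
# Hypothetical rubric check for a "chronology" task: firms place the date
# column differently, so the check only requires that a date column exists.
import csv
import io

def has_date_column(chronology_csv: str) -> bool:
    """Pass if any header names a date column, regardless of its position."""
    reader = csv.reader(io.StringIO(chronology_csv))
    header = next(reader, [])
    return any("date" in col.strip().lower() for col in header)

# Two firms, same substance, different layouts: both should pass.
firm_a = "Date,Event,Source\n2024-01-05,Filing,Docket 12\n"
firm_b = "Event,Source,Event Date\nFiling,Docket 12,2024-01-05\n"

print(has_date_column(firm_a), has_date_column(firm_b))  # True True
```

A full rubric would be a list of such checks per task; the scoring then rewards the shared substance while tolerating per-firm formatting variance.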

Kimberly: Makes sense. Has Harvey built its own foundation model or do you guys have any plans to?

Aatish: The short answer is no, we have not built our own foundation model. Instead, we’ve worked really closely with OpenAI to fine-tune, post-train, prompt engineer, do RAG, and build up these agentic, compound AI systems.
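The retrieval-augmented (RAG) piece of such a compound system can be sketched minimally. This is an assumption-laden illustration, not Harvey’s actual pipeline: the keyword-overlap `retrieve` function stands in for real embedding-based vector search, and the documents and query are invented.

```python
# Minimal sketch of a RAG-style compound system. Illustrative only: the
# retrieval here is naive keyword overlap, standing in for vector search.
def _tokens(text: str) -> set[str]:
    """Lowercase, lightly de-punctuated word set for overlap scoring."""
    return {w.strip("?.,;:").lower() for w in text.split()}

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most terms with the query."""
    terms = _tokens(query)
    return sorted(documents, key=lambda d: -len(terms & _tokens(d)))[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the model by prepending retrieved passages to the question."""
    sources = "\n".join(f"- {c}" for c in context)
    return f"Use only these sources:\n{sources}\n\nQuestion: {query}"

docs = [
    "Clause 4.2 caps indemnification at the purchase price.",
    "The lease term runs through December 2026.",
    "Clause 7.1 governs termination for convenience.",
]
query = "What caps indemnification?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt.splitlines()[1])  # the best-matching clause is cited first
```

In a production system, the prompt would go to a fine-tuned or post-trained model, and an agentic layer would chain several retrieve-and-generate steps; the grounding pattern stays the same.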

Kimberly: Do you guys want to build your own foundation model eventually? I’m just curious, whatever the answer is, like what was your rationale behind either yes or no?

Aatish: So short answer is no, we don’t want to build our own foundation model. I think the compute stats are out there, but it’s extremely expensive. And we’d rather leave it to…

Kimberly: You’ve raised a lot of money.

Aatish: Yeah. They did raise a lot of money, billions. And we’d rather leave that to the experts, and really focus on delivering our own customer value and kind of the products around that.

Kimberly: Okay, so you guys don’t want to build your own foundation model. I’m curious then, as you think about the foundation models getting better and better, a lot of people say AGI is three to five years away, or whatever. Do you view the foundation models as ultimately competitors as they get better at reasoning and gain the ability to do more domain-specific things?

Aatish: We have to assume that the models just get better and better. So what does that mean for us? We have to accumulate different types of advantages beyond just the model itself. A few of those advantages are product, data, network, and brand. So there’s UX and the enterprise platform. I think most people, again, underestimate what it takes to actually deploy products in the enterprise. I think even AGI is probably going to underestimate what it takes to get through security checks at a bank. So, again, we’ve built a lot of these security checks, permissions, audit logging, usage dashboards, all this enterprise and admin functionality that’s really required. Companies like SAP, ServiceNow, and Workday have invested decades in this stuff, and this is why enterprises rely on them. So I think investing in the enterprise platform is important.

UX is also extremely important, as I mentioned. The UX that whole organizations are going to use to collaborate with AI is not gonna be a chat-based product. So we need to really innovate on the UX, on workflow-specific UX that lets you collaborate with AI. That’s another one. And then data sets, I think, are really important. AGI is not gonna have the data that is sitting on some on-prem server at a law firm, and that happens: a lot of law firms have on-prem servers. What really makes a law firm unique is the historic deals, cases, and data they actually have. So we’re starting to have Harvey use that data and tailor outputs and workflows based on it. Overall, these are the product and UX advantages we need to accumulate.

Kimberly: How much has all the AI zeitgeist, all the things that we hear about coming out weekly, how much has that actually permeated in the enterprise? And what do you think is the latency there of us hearing about it versus something actually getting deployed there?

Aatish: Yeah, that’s a good question. I think in a similar way to how Silicon Valley gets information through X now, a lot of our law firm customers get information through LinkedIn. So the best way for me to understand our personas is actually to look at a lot of their LinkedIn posts, and see what they’re liking and who they’re following, because that’s really where the zeitgeist and conversation happens. I think overall, maybe this time last year, we would go to customers and they would have never heard of ChatGPT. Sure, they had heard of AI, but never ChatGPT.

Kimberly: End of 2023, beginning of 2024?

Aatish: No, beginning of 2024, yeah. They had never heard of ChatGPT, and they had never used it. That was a wake-up call for me because, again, coming from Scale, I had been surrounded by AI for a long time. I was like, “Okay, wow, this has not actually permeated as much as I thought.” Fast-forward to now: most people have heard of ChatGPT, but oftentimes they don’t use it. In tech, if you don’t use ChatGPT, people will tell you you’re at a disadvantage. But most of our law firm customers often don’t use it, even though they’ve at least heard of it.

And then for the enterprises: it’s been two-ish, two and a half years since ChatGPT, and they have at least deployed some internal chatbot or purchased Copilot, and maybe use it to draft emails or whatever. But we haven’t really seen, even in leading enterprises and not just law firms, workflow-specific adoption of AI in the way Harvey is trying to push. This goes back to the bottleneck question: I just think we’re so, so early. An AGI takeoff could happen and the law firms on LinkedIn wouldn’t hear about it for five years. So that’s honestly been a good empathy test for a lot of our team: most people don’t know this is happening. And it’s another reason for a lot more applied-AI startups to go into these “hidden markets,” because they are just wide open.

Kimberly: So then I guess my next question, which you may have already answered: have they thought about how their business model or staffing model needs to adapt as a result of AI? Maybe the answer is no, because on LinkedIn you’re not seeing people talk about impending AGI. But at least in Silicon Valley, people talk about that a lot when it comes to professional services or billing-based models.

Aatish: Yeah, I think the mindset actually changes maybe every three to six months, and that’s probably the lead or lag time of information. Six months ago, clients of law firms would have said, “Don’t use AI on my projects,” because of trust concerns, risk concerns, X, Y, and Z. But by the end of last year, they were saying, “You have to use AI on our projects because it’s gonna be more efficient.” So this understanding is evolving quite a bit. Some of the more bleeding-edge companies and customers we’ve partnered with have really leaned in: “Hey, we think AI is gonna fully change how our practice works. We should get on it and try to drive it and control it.” So there are visionaries who are thinking about this, but in general, people know something is gonna happen; they just don’t know what, and they don’t know how it’s gonna change things.

Kimberly: Neither do we.

Aatish: Yeah, neither do we.

Kimberly: AI gets better seemingly every single day and there’s new capabilities, there’s new companies popping up all the time now. How do you guys think about, or how do you think about the next couple of years? Like if you have any predictions on where do you think most people are actually gonna find value in the enterprise in particular in AI? Like what do you think are still the unlocks that need to happen such that more places can actually see ROI, etc.?

Aatish: I think, in Silicon Valley, we talk a lot about AI takeoff or AGI takeoff, where the models get so good that it runs away, and everyone’s gonna live happily and never have to work again.

Kimberly: Yeah, retire in two years.

Aatish: And just retire. I just think intelligence isn’t the only thing you need. You run into human bottlenecks deploying this stuff. You run into “software bottlenecks,” like trust, like the ability to work well with the model. So I would encourage, and hopefully we see this more in 2025, more enterprise AI companies to get really, really deep with their customers and understand their workflows at a deep level, so they can bring AI to them in very specific ways, build the product and UX around it, and establish that enterprise trust. I don’t believe we’re gonna reach AGI heaven in the next two or three years. It’s going to be customer-focused builders applying AI in unique ways through enterprise workflows.

 

More About This Podcast

Artificial intelligence is changing everything from art to enterprise IT, and a16z is watching all of it with a close eye. This podcast features discussions with leading AI engineers, founders, and experts, as well as our general partners, about where the technology and industry are heading.
