AI + a16z

MCP Co-Creator on the Next Wave of LLM Innovation

David Soria Parra and Yoko Li

Posted May 2, 2025

In this episode of AI + a16z, Anthropic’s David Soria Parra — who created MCP (Model Context Protocol) along with Justin Spahr-Summers — sits down with a16z’s Yoko Li to discuss the project’s inception, exciting use cases for connecting LLMs to external sources, and what’s coming next for the project. If you’re unfamiliar with the wildly popular MCP project, this edited passage from their discussion is a great starting point to learn:

David: “MCP tries to enable building AI applications in such a way that they can be extended by everyone else that is not part of the original development team through these MCP servers, and really bring the workflows you care about, the things you want to do, to these AI applications. It’s a protocol that just defines how whatever you are building as a developer for that integration piece, and that AI application, talk to each other.

“It’s a very boring specification, but what it enables is hopefully … something that looks like the current API ecosystem, but for LLM interactions.”

Yoko: “I really love the analogy with the API ecosystem, because they give people a mental model of how the ecosystem evolves … Before, you may have needed a different spec to query Salesforce versus query HubSpot. Now you can use similarly defined API schema to do that.

“And then when I saw MCP earlier in the year, it was very interesting in that it almost felt like a standard interface for the agent to interface with LLMs. It’s like, ‘What are the set of things that the agent wants to execute on that it has never seen before? What kind of context does it need to make these things happen?’ When I tried it out, it was just super powerful and I no longer have to build one tool per client. I now can build just one MCP server, for example, for sending emails, and I use it for everything on Cursor, on Claude Desktop, on Goose.”

And for more on MCP, you can read Yoko’s MCP primer here.

Transcript

David: So MCP is…first and foremost, it’s an open protocol and does not say much yet, but what it really tries to do, it tries to enable building AI applications in such a way that they can be extended by everyone else that is not part of the original development team through these MCP servers and really bring the workflows you care about, the things you want to do, to these AI applications. And for that, it’s like a protocol that just defines how whatever you are building as a developer for that integration piece and that AI application talk to each other. And that’s really what it is. It’s a very boring specification. But then, what it enables is, hopefully, at least in my best-case scenario, something that looks like the current API ecosystem, but for LLM interactions, with some form of context providers or agents in any form or shape.

Yoko: Yeah. I really love the analogy with the API ecosystem, just because they give people a mental model of how the ecosystem evolves. It feels like API, when it first came out, was an abstraction on top of a set of things you can do on a different set of servers and services. Before, you may need a different spec to query Salesforce versus query HubSpot. Now, you can use a similarly defined API schema to do that. Not exactly the same, because everyone defines query parameters differently. And then, when I saw MCP earlier in the year, when I was building something with it, it’s very interesting that it almost felt like a standard interface for the agent to interface with LLMs. It’s like, what are the set of things that the agent wants to execute on that it has never seen before? What kind of context does it need to make these things happen?

When I tried it out, it was just super powerful, and I no longer have to build one tool per client. I now can build just one MCP server, for example, for sending emails, and I use it for everything, on Cursor, on Claude Desktop, on Goose. So I’m curious about, I guess, what’s the behind-the-scenes story? What inspired you when you first realized, “Oh, we need a protocol for this?” And how did you create it?
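To make that “one server, many clients” point concrete, here is a minimal sketch of what such an email-sending MCP server could look like, assuming the official MCP Python SDK and its FastMCP helper; the SMTP host, credentials, and sender address are placeholders, not real endpoints.

```python
# Minimal sketch of a "send email" MCP server (assumes the official MCP
# Python SDK, `pip install mcp`). The SMTP host, login, and sender address
# below are placeholders.
import smtplib
from email.message import EmailMessage

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("email")

@mcp.tool()
def send_email(to: str, subject: str, body: str) -> str:
    """Send a plain-text email to a single recipient."""
    msg = EmailMessage()
    msg["From"] = "me@example.com"          # placeholder sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP("smtp.example.com", 587) as smtp:   # placeholder host
        smtp.starttls()
        smtp.login("me@example.com", "app-password")       # placeholder credentials
        smtp.send_message(msg)
    return f"Email sent to {to}"

if __name__ == "__main__":
    mcp.run()   # stdio transport by default, so any MCP client can launch it
```

The same process can then be registered in Cursor, Claude Desktop, or Goose without any per-client code, which is the point Yoko is making.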

David: Yeah. Thank you. I think that’s an interesting question. And with all of these types of ideas, I think they never happen in a vacuum. So I joined Anthropic about a year ago, pretty much actually a year ago, and I was working mostly on how we can use Claude more internally to accelerate ourselves. And as part of that, one of these original ideas I was thinking through is I cannot be the person who builds for everyone their specific workflow, their specific things. But I need to enable them to build for themselves because they know best what they need and how their workflow, and their agentic bits that they want to build, fit into, you know, their system in the ecosystem that they’re working in. And so that was one aspect of that.

The second aspect of that was that, at the time, I was using both Claude Desktop, which was amazing with its artifacts that really allow you to greatly visualize things, but it had this limitation that, basically, there was no interaction with anything outside of the text box. You couldn’t add, you know, Google Files yet or anything like that. And at the same time, I was using a code editor, which was amazing because it had access to all my code and had access to all these cool things, but it couldn’t visualize anything as nicely as Claude Desktop. I mean, I was very frustrated by just copying things from Claude Desktop back into the editor, and back and forth. I’m like, “There needs to be a better way across these two applications.” And then you take these two things together: in some way enabling people to build something, so some form of API, but at the same time wanting this to work across multiple applications, like a code editor.

You know, for me, there was the Zed code editor, which I really like, and Claude Desktop, which is, you know, obviously, my favorite desktop application. You look at, how do I solve the classic M times N problem of, I have, you know, M clients and I need N providers. And the answer is a protocol for that. And there’s always been protocols for this type of thing, and there are many patterns in the past that match that. And that’s how I came to go, “Hey, I would really love to have some form of protocol that enables me to tell Claude Desktop to tell, you know, Zed, to tell Cursor, the workflow I care about and the things that I miss from it,” because I’m a developer, and I want to build for this, and I know how to build for this, just let me do it. That was really the origin in that regard.

And that was just the idea. And I took this idea to a person called Justin Spahr-Summers, who is the co-creator of MCP with me. And he took a real liking to that idea and thought it was a good idea. And he was really one of the key people to, you know, prototype the initial version, really make it work within part of the product side of Anthropic and played a really big role in making this a rather big thing in Anthropic, initially. Yeah. And so we both basically co-created this together until, you know, we released it into the open in November of 2024.

Yoko: I love that. I love the creative partnership here. And then, with the framework or protocol, I will have to ask, like, it’s kind of a chicken and egg problem. Do you create, you know, a concrete instance that implements the protocol first, or do you have the protocol in mind? And if you created some, you know, concrete instances or examples of it, what was the first MCP server or client you created?

David: Yeah. That’s a very good observation, a very classic chicken and egg problem. And the way we usually do this internally, and Justin is amazing at this, is very rapid prototyping. So we had just a very intense few weeks of writing prototypes, very simple things that just demo it, for the most part, initially. One of the first ones that we wrote was the Puppeteer server. It’s, like, the ability to control a Chrome instance. And one of the reasons you do this is because it is a very active process. There’s something happening on the screen, and it makes people go, wow, which is the effect you want to get. You want to be like, “How do I convince people that there’s a lot of possibility here?” “Hey, I can control your browser, and I can do things you couldn’t do before.” And Claude is the one doing it, not you manually doing it.

But while we’re doing this, we’re refining the concept, and we spend a lot of time discussing, I wouldn’t say fighting, but definitely having an interesting discourse over certain primitives you want to put in and leave out. And there were a lot of changes to the way things were working in the first few weeks. And again, the first MCP client that I think Justin was writing was…he wrote it into Claude Desktop. I wrote it into Zed. So that happened kind of in parallel. And then I think the real use case MCP server that we really had for ourselves was one of these very boring ones, maybe a GitHub integration or something like that, to just help me in doing my work better, like a Postgres server, nothing super fun, nothing super creative, just the most obvious thing you would go and want to do, I’d still have to pay for.

Yoko: I would say the Puppeteer example is super creative, you know, because you can really have the agent do anything for you. Recently, I’ve seen an example of, you know, the Ghiblification process. So there’s someone who has an MCP server that controls the browser to ask models to generate Ghiblified images so they don’t have to, you know, implement API endpoints. That blew my mind.

David: That’s very cool. That is very good.

Yoko: Yeah. I guess, outside of your first initial use cases, since you’ve probably seen every MCP client server out there in the community, what are some top interesting implementations you have seen for MCP servers or clients?

David: I like it when people get creative. I think it’s great that people build a lot of these integrations that are very sensible and are quite straightforward, again, like the Postgres servers, the GitHub servers, the Asana servers of the world. But what I really like is when people get creative. I think one of the things that made me just laugh was this person, very early on, around Christmas, just hooked up a Claude and Claude Desktop to their Amazon account and just had Claude buy their Christmas gifts. And I always just thought, “This is hilarious.” It’s, like, so funny, and it’s so creative.

Yoko: That is amazing. How is that implemented? Does it have payments?

David: I forgot the exact details, but I think it was some combination of, basically, Playwright or Puppeteer controlling the browser, but it was deliberately built around something from Amazon that I want to buy, and also the set of gifts. And so I love these types of things a lot. And I like when, you know…and I’ve seen your Morse Code MCP server. I love these kinds of things. It’s, like, playful engagement with technology. Years and years ago, I was a pretty active member of local hacker spaces and these types of things in Germany, and I love the creative way people interact with technology and try to build things. And so, every time I see these kinds of combinations, they’re just beautiful. And we talk about this a bit later, right, when people deal with synthesizers and Unity and Blender. But then there’s obviously, like, a fun, interesting technology part to that. I thought JetBrains did a really good job of having an MCP server that can control their IDE. And that is a bit more of a complex setup. And I love that part. And then there’s fields that I didn’t even think about. There’s a somewhat famous YouTuber called Laurie, who’s a reverse engineer, and they used Claude to help with reverse engineering some files and using MCP. I thought that was very cool…

Yoko: Oh, interesting.

David: …because it’s like, some of these things, nobody would ever build a first-party reverse engineering tool into their desktop. That person can just go and build it themselves because, of course, they have the ability and they have the skill to do that. And so that’s the kind of stuff I love.

Yoko: I just love it when a protocol kind of unlocks the long tails, when a long tail is really long, because, as you said, no one else will build it as first party. But now, everyone can build the software for one.

David: Yeah. I’m actually a little bit curious. Like, what are one or two examples that you have that you found quite funny and interesting for yourself?

Yoko: Yeah. So there was one I built. It’s actually a very practical use case where sometimes I’m so into coding, I skip dinner. So, obviously, my husband will be texting me. He’s like, “Where are you? Are you home for dinner?” And then I just use the recent MCP I built. This is, like, another beauty of it, because with the same MCP server, you can unlock very different experiences by entering different prompts. So instead of sending him an email, I ask Cursor Agent, “Can you text my husband at this number and explain why we’re late for dinner?” Because Cursor Agent has done most of the coding, I was just reviewing, and it texts my husband. And it’s, like, a number my husband can reply to, too. So it’s, like, a very practical use case.

David: That is such a good use case and so fun.

Yoko: Right. Even explain, like, “We got stuck here. I couldn’t debug this.” I feel really bad for the agent.

David: I love this. This is so creative. This is, like, exactly the kind of little bit of magic that people get out of using it.

Yoko: Yeah. And then, so the Morse code example, it was so much fun to build. It was more…there was someone on Twitter who asked, like, “I want the coding agent to notify me when it finishes the task, because sometimes it takes 5, 10 minutes.” So I thought, what’s a really funny way for it to communicate with a human? Obviously, like, you can text. You can play some music. But then we have a lot of Philips Hue light bulbs at home. So I thought, what does it take for the agent to get access to my local network? Because it’s, you know, under the same IP, and I just control my lights. And how do you speak with the lights through Morse code?

So I kind of picked up Morse code that week. It’s a lot to debug, like, what’s long, what’s short, what’s the interval. So, in the end, the experience with Cursor or, like, Claude Desktop is, when it finishes the task, it will start a Morse code sequence for whatever it has to say, and now you just need to listen or watch very closely. So that was a lot of fun to build, and that week…
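A rough sketch of the kind of tool Yoko is describing, assuming a Philips Hue bridge reachable over its local REST API; the bridge address, API username, light ID, and the abbreviated Morse table are illustrative placeholders.

```python
# Rough sketch of a "notify me in Morse code" MCP tool. Assumes a Philips Hue
# bridge on the local network and its v1 REST API; the bridge IP, API
# username, and light ID are placeholders.
import time

import requests
from mcp.server.fastmcp import FastMCP

LIGHT_STATE_URL = "http://192.168.1.2/api/hue-username/lights/1/state"  # placeholder
MORSE = {"D": "-..", "O": "---", "N": "-.", "E": "."}  # abbreviated table; extend as needed
UNIT = 0.3  # seconds: a dot is one unit, a dash is three

def set_light(on: bool) -> None:
    requests.put(LIGHT_STATE_URL, json={"on": on}, timeout=5)

mcp = FastMCP("morse-lights")

@mcp.tool()
def blink_morse(message: str) -> str:
    """Blink a short message in Morse code on a Hue light."""
    for char in message.upper():
        for symbol in MORSE.get(char, ""):
            set_light(True)
            time.sleep(UNIT if symbol == "." else 3 * UNIT)
            set_light(False)
            time.sleep(UNIT)        # gap between dots and dashes
        time.sleep(3 * UNIT)        # gap between letters
    return f"Blinked {message!r} in Morse code"

if __name__ == "__main__":
    mcp.run()
```

An agent can call this tool when it finishes a long task, for example with the message “DONE”.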

David: That’s amazing.

Yoko: We have three cats at home, and they’re all freaking out because the lights are just on and off. Turning on and off. Another one, you know, since I started using MCP as a developer, I was going back to my previous projects that I really just built for fun, and thinking about, “How can I rewrite this, like, as an MCP client, so I can plug in any MCP server on it?” So, as an example, last year I built this Raspberry Pi cat narration project where I use the Raspberry Pi camera to detect if my cat is jumping on the kitchen counter, and it will narrate what the cat is doing or yell at the cat. So I’m actually in the process of converting that agent loop into an MCP client so it can use the ElevenLabs MCP server to actually yell at the cat. And then it just unlocks net new examples like this. I just love, you know, building and playing on the side.

David: I need that version for my dog.

Yoko: I’ll send you a Raspberry Pi later. Most of the LLMs nowadays are still too big to run on a device. So I still have to call, you know, Claude or some other models to make that happen. But the fact that, now, I can make the cat detector extensible, it’s very interesting to me. So now, not only can I call the ElevenLabs MCP to, like, you know, yell at the cat. One, I think, undersold feature of MCP I’ve found is that the client can chain together different tool calls. So not only can it, you know, use ElevenLabs to yell at the cat, it can also send me an email to say what the cat is doing. So I guess, like, speaking of underutilized protocol features, most people today are implementing MCP servers with tool calls. But we know that there are so many other features to be unlocked. So, curious about your thoughts here. Like, what are some underutilized features that you feel like people should start experimenting with?

David: Yeah, this is an interesting one, because when you’re creating a specification, you have all these use cases in mind, and you think about it in a very principled way. And out of that comes a set of primitives that you want people to use. And then, you know, reality hits you, and people use it very differently. And I think, obviously, people use it, as you said, for tools. But there’s, I think, two or three things that I really think that are quite underutilized, and I wish people would use it more. But I think there’s a problem, particularly around client support initially. But the one thing that I really love in the protocol, actually, is a very poorly named feature called sampling. Because it’s quite confusing what it does, I think, when you read the name.

Yoko: Do you want to explain? What is that?

David: Yeah. So when you really think about what you’re trying to do, it makes a lot of sense. What it is, what sampling is, it’s a way for the MCP server to say, “I want to call an LLM. But because I’m an MCP server, I don’t know what is the LLM that the client is using. And I could bring my own SDK, but then I’m binding myself to that SDK.” And, you know, that might be an Anthropic SDK. That could be an OpenAI SDK. But now, I’m expecting an OpenAI API key or a Claude API key from the user, and that’s really not great. And maybe they use a different model in Cursor.

And so sampling is a way for the MCP server to go back to the client and ask the client, “Hey, can you give me a completion from the currently selected model, like, a sample from the LLM?” That’s where the name comes from. “And give that back to me.” And that way, I can build MCP servers that would go and summarize a Reddit post or summarize whatever I might want to do, or even have their own agentic loops themselves. But the controller of the LLM inference is still the client. And so there’s a lot of…I think that’s the really cool bit that you can build these MCP servers that are very rich, that go way beyond tool calling, and have them all completely model-independent. And that’s really what it’s for.

And we can talk later about how, combined in the right way, it has a lot of cool properties. But that’s one of the features I would love to see more people use. But again, this is a matter of clients not supporting this very well, or at all, and so I wish more clients would support it, and then more people can build these richer things that go beyond just tool calling, be it agent loops, be it summarization bits, and so on and so forth. So that’s one of these features.
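Here is a rough sketch of what sampling looks like from the server side, assuming the MCP Python SDK, where the request context exposes the client session’s create_message call; the tool name and the summarization prompt are illustrative only.

```python
# Rough sketch of an MCP server that uses sampling: it asks the *client* to run
# an LLM completion instead of bundling its own SDK or API key. Assumes the
# official MCP Python SDK; the tool and prompt are illustrative.
from mcp.server.fastmcp import Context, FastMCP
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("summarizer")

@mcp.tool()
async def summarize(text: str, ctx: Context) -> str:
    """Summarize text with whatever model the client currently has selected."""
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=f"Summarize this:\n\n{text}"),
            )
        ],
        max_tokens=300,
    )
    # The server never chooses the model and never holds an API key; the client
    # controls inference and simply returns the completion.
    if isinstance(result.content, TextContent):
        return result.content.text
    return str(result.content)

if __name__ == "__main__":
    mcp.run()
```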

Yoko: Yeah. This is so interesting. I guess one very concrete example I always wanted to build with this model you mentioned, with sampling, is actually a code review agent. So in this case, I will want to build, you know, a server that does code review, but it may want an LLM to complete, “Is this valid syntax?” since it doesn’t want to, you know, bring its own LLM. So it feels like a very natural jump-off point. What does it take for clients to support this?

David: They just need to do it. There are obviously reasons why certain clients wouldn’t want to do it; particularly, clients with fixed subscriptions might prefer not to do this because it suddenly becomes an API. But other than that, I think it’s just a matter of client support and priorities. Obviously, clients support what people do, and so they are mostly focused on tool calling. There’s so much going on with the spec that needs to be added. And the heavy lift in all of MCP land is very deliberately on the client side, because we expect that there are way fewer clients than servers, and so we want to make it very trivial to build a server. And so, every complexity that we could shift to the client, we put on the client. As a result, like, it’s just hard to build a really good, fully spec-compatible MCP client, whereas it’s very trivial to use any feature you want on the MCP server side.

And so they’re just a little bit behind, and it just will probably take time. And for some of them, it might just not make sense out of the way they deal with inference in general. But at the end of the day, it’s just a matter of, like, just waiting and seeing that some people, like, implement it. That’s the end of it.

Yoko: Right. Sampling is such an interesting concept, too. Kind of, at least when I first saw it, I was like, “Oh, this is so powerful,” because the divide between client and server is less of a physical one but more of a logical one. So technically, you could write a server that’s sampling with another client that’s also a server. I know it’s kind of complex when you describe it, but can you give us an example of how to best use this kind of chained server-client combo, and how does that relate to sampling?

David: Yeah, I think that’s a very interesting piece. Interestingly enough, very early in the process, we built this ourselves. We had prototypes of what you’re describing, I’ll go into detail in a second, actually before we even released it to the public. But what you’re describing is, you know, you take an application that is an MCP server that exposes tools to an MCP client, but also, within that MCP server, you are using an MCP client. And so you can also use other MCP servers downwards. And so you have this little program, which is an MCP client and an MCP server at the same time, and I think about this as upstream and downstream connections. And now, you can chain these things indefinitely. It’s probably not very practical to do them indefinitely, but you can definitely think about a few chains.

And you can even go as far as create whole graphs out of this, and you can very quickly envision worlds where, you know, there’s an MCP server that has an agentic loop that orchestrates two or three other MCP servers, their tools, do a really, you know, good agentic loop, and then you can have this entity out of three or four servers and give them to a client, like a Cursor. And I think that’s a very interesting concept that feels very agentic, particularly if you then use additional primitives to go beyond tool calling, such as resources or prompts where there’s additional data streams, basically, that MCP servers can expose or data that they can expose upwards and downwards. And I think, then, you can actually model quite rich interactions.

And I would love to see people play around with more of that and use, for example, an AI framework, like agentic AI or LangChain, whatever, to build a connection of client upwards, client downwards, server upwards, and then chain these things and see what happens. And then you are suddenly free, and you can go to a user and say, “Hey, which five MCP servers do you want this agent to control?” And you might have a very general agent loop, and people can go and experiment. And they can suddenly have, you know, cat monitoring software connected to an agent that also speaks, you know, email, WhatsApp, whatever it might be. And as you mentioned before, there’s a lot of power in using LLMs for these orchestration tasks, and so you can build these complex systems, these complex agent graphs using that technique you described quite quickly.
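A rough sketch of that upstream/downstream pattern, assuming the MCP Python SDK: one process acts as an MCP server toward whatever client launches it, and as an MCP client toward a downstream server it launches itself; the downstream command and tool name are placeholders.

```python
# Rough sketch of a process that is an MCP server upstream and an MCP client
# downstream. Assumes the official MCP Python SDK; the downstream server
# command ("some-mcp-server") and its "search" tool are placeholders.
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orchestrator")

DOWNSTREAM = StdioServerParameters(command="uvx", args=["some-mcp-server"])  # placeholder

@mcp.tool()
async def lookup(query: str) -> str:
    """Relay a query to a downstream MCP server and return its result upstream."""
    async with stdio_client(DOWNSTREAM) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("search", {"query": query})  # placeholder tool
            return str(result.content)

if __name__ == "__main__":
    mcp.run()  # upstream clients such as Cursor or Claude Desktop launch this over stdio
```

Chaining a few of these gives exactly the graph shape described above, with each node free to add its own logic, sampling calls, or agentic loop in the middle.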

Yoko: Yeah. Since you also mentioned resource and prompts, which are the other two very powerful and underutilized functionalities today in the spec, I really think these are the sleeper hits of MCP. Do you want to briefly explain, you know, how does the developer leverage resource, and what is prompts as a concept within MCP?

David: Yeah, I can. Yeah. I’m happy to do that. One of the things to understand when we think about MCP is that MCP is focused on how the primitive that you’re exposing interacts with the other side, usually the user, but it could be an agent, and prompts are meant to be driven by the user. For example, the user explicitly adds it to the context of a call. And so prompts are templates that people can insert. But the interesting bit is, on one side, they can be very static templates, you know, an example of how to use this MCP server, but they can also be very dynamic. They can be, just as much, API calls under the hood. So we had, for example, you know, an MCP server that exposed prompts that download a stack trace from, like, a Sentry API, and so now that goes into the prompt. But I, as a human, on the other side, I say, “I want this in the context now.” I don’t let the model decide it. I decide it. And that’s the difference between a prompt, for example, and a tool.

And so resources, on the other hand, they are quite unique because resources are just blobs of data, and they, for example, can be very easily used to model something like a file system towards the MCP client. And in this interaction model I described, user-driven versus model-driven, tools are model-driven and prompts are user-driven. Resources sit in between by being application-driven, whatever that might mean. And so an application, for example, Cursor, could choose to say a resource can be added to an agent, similar to how you can add a file to an agent. But it could also, for example, do things like ingest a resource into a RAG system first and do retrieval over it, right? Because these resources could be arbitrarily long.

So one of the things we thought about, like, you know, very early on in MCP is, do you actually need to build something for retrieval into this? And we came to the conclusion, “Hey, if the client controls the retrieval bit, resources can just go into this retrieval system and be used that way.” And if you wanted to do it on the server side, you would use a tool. So those are the distinctions that I think people have not really caught on to yet. These things are also fairly rich. You know, both tools and resources can be audio in the new spec, they can be images. So there’s a lot people could do, you know. You could expose your current screenshot as a resource, these types of things that I think leave a lot more use cases open to explore that MCP has to offer. But I understand, people do tools because it’s the most obvious thing to do.
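To make the primitives easier to compare, here is a rough sketch of a prompt and a resource alongside each other, assuming the MCP Python SDK’s FastMCP decorators; the Sentry-style stack-trace helper and the notes directory are illustrative placeholders.

```python
# Rough sketch of the prompt (user-driven) and resource (application-driven)
# primitives, next to the more familiar tool (model-driven). Assumes the
# official MCP Python SDK; the trace fetch and notes folder are placeholders.
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("context-provider")

def fetch_stack_trace(issue_id: str) -> str:
    # Placeholder standing in for, e.g., a Sentry API call.
    return f"(stack trace for issue {issue_id} would be fetched here)"

@mcp.prompt()
def debug_issue(issue_id: str) -> str:
    """User-driven: the human explicitly inserts this into the conversation."""
    return f"Here is the stack trace for {issue_id}:\n\n{fetch_stack_trace(issue_id)}\n\nHelp me debug it."

@mcp.resource("notes://{name}")
def read_note(name: str) -> str:
    """Application-driven: the client decides whether to attach, index, or retrieve it."""
    return (Path.home() / "notes" / f"{name}.md").read_text()

if __name__ == "__main__":
    mcp.run()
```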

Yoko: This is such an interesting point. When I first looked at resources, I almost felt like it’s a mind shift. Like, traditionally, as a developer, I always thought resources would be on the side of clients. So the client would expose resources and query them locally. But in this case, it’s almost like the MCP server is exposing a file system that a client can query. Curious about your thoughts behind, like, how did you think of the model? Like, how did you decide that it’s going to be, like, a server-side versus a client-side thing? And what was the intent for the transport layer?

David: I think the initial model of MCP was, how do I provide context in these different user interaction models? And so, for that, resources came quite naturally, actually, out of the need of, like, “How do I actually enable an MCP client that doesn’t have access to the local file system by itself? But I want to give it access to the local file system, too.” And now, a bit of history, looking back to July, August 2024, Claude Desktop would not have that access. You could upload files and these types of things, but it’s not as natural to add a file system to this. And similar for some agents that we might have internally. And so it felt very natural to have something like that. That was really the genesis of this: how are these servers supposed to provide context? And so there are some of these.

And now, for the transport layer, MCP, at the end of the day, is just transport-independent, which was quite important for us. So, initially, that came out of the local use case where I wanted to use standard I/O, which has a lot of niceties: you know, the lifecycle of the MCP server is controlled by the client automatically, and there’s a lot of things it can do. But it also means you can’t really speak HTTP. You could technically, but realistically you’re speaking something that is line-based. And so you’re speaking something like JSON-RPC, and that’s very heavily inspired by how the language server protocol does this, which is very, very similar.

It has an interesting property that I’m somewhat ambivalent about nowadays, because it has some drawbacks and requires certain things that would probably be better done in a more classic API-like way on the HTTP layer, but at the same time it still enables people to implement MCP over other transports. You can, if you like, you know. I used to work at Facebook for 10 years, and there, you use these Thrift/RPC mechanisms internally, and there’s a whole security infrastructure built around them. And you could just build MCP over this, and there would be no change required. You just do a different transport, and both sides are still happy. I mean, so that’s one of the reasons we chose it, for that flexibility, and partially also because it was an evolution from standard I/O to HTTP.
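For readers curious what the stdio transport actually looks like on the wire, here is a rough sketch, assuming a newline-delimited JSON-RPC 2.0 exchange as the spec describes; the server command is a placeholder, and a real client should use an SDK rather than doing this by hand.

```python
# Rough sketch of the stdio transport: the client launches the server process
# and exchanges newline-delimited JSON-RPC 2.0 messages over stdin/stdout.
# The server command is a placeholder; real clients should use an MCP SDK.
import json
import subprocess

proc = subprocess.Popen(
    ["uvx", "some-mcp-server"],                 # placeholder server command
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def send(msg: dict) -> None:
    proc.stdin.write(json.dumps(msg) + "\n")    # one JSON-RPC message per line
    proc.stdin.flush()

# The handshake starts with an initialize request (followed, in a full client,
# by a notifications/initialized notification before any tool calls).
send({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "demo-client", "version": "0.1"},
    },
})
print(proc.stdout.readline())                   # the server's initialize result
```

Swapping this layer out, for HTTP or an internal RPC mechanism that can carry the same JSON-RPC messages, is exactly the transport independence described above.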

Yoko: Yeah, that’s so interesting. One of the top questions, just from talking to a lot of developers who are building with MCP, is, how do I authenticate MCP, both from client to server and also from server to tools? I know there are so many different great ways to make it happen. There’s also spec involvement. So I guess, what are your thoughts around, you know, how auth will shape up around MCP in general?

David: Oh, that’s such an interesting and deep topic. I think the interesting bit is that, you know, everybody wants authorization. I think it’s clear that the current approach people effectively use with local MCP servers, which is just “give me an API key” or some form of token via an environment variable, is usable, but it’s not exactly great. And particularly for the case where servers will be remote, it’s impossible. And so we have an early part of the specification around authorization, which just uses OAuth. There are some caveats to that, and we are working very closely with the original OAuth authors and experts in the field to make this really go well. But I think there will be…there is an initial focus on how the user authenticates, which is potentially different, not sure yet, from how agents will interact with each other and authenticate with each other.

And for now, we want to solve the user and human-to-server problem. And for that, we just use the OAuth spec in the best possible way, because it turns out, when you innovate on, like, the level of primitives and other things, you want to stay as boring as possible for everything else. But what authorization does, of course, is enable a very different set of MCP servers, because it enables MCP servers that are remote, that are, you know, bound to a company account, that are really driven by a professional service offering something for you that you have a subscription to. You can envision, you know…I think PayPal has an MCP server, for example. You can see, you know, I want to use this MCP server, I log in with my PayPal account. Now, I can use this MCP server, and now I’m authorized. And it opens, like, this company and corporate ecosystem that I think will be super important in our day-to-day lives, while at the same time, you know, MCP still retains, like, the bottom-up hacker mentality that it had originally for developers. But authorization is the key step to this much, much richer ecosystem of professionally developed MCP servers at the end of the day.

Yoko: I guess, like, when we talk about auth, there are two layers. One is authentication: do you get access to this, you know, thing? And then authorization, which is, what are you scoped to get access to? It’s very interesting because I see these concepts sprinkled into different layers of MCP. So, for example, you know, you could scope access to certain resources. Say, I can only access resources in this specific folder. And then there’s also, obviously, like, third-party auth from the server to, you know, all the API providers. How do you think about it when it comes to authentication versus authorization? And what would you want to see from auth providers? You know, what needs the most help when it comes to making developers’ lives easier?

David: That’s a good distinction, good question. I think what I’m probably focusing on at the moment is mostly authorization. So, like, am I allowed to access this resource? Because that’s what people want. We don’t necessarily…and I have yet to see a lot of use cases for the authentication part, which is, what identity am I, who am I, and who are…there are other parts to that. I think the question of who is acting on behalf of whom will come later. Particularly in an agentic world, this will be important. But for now, we’re tackling one thing at a time, with the biggest boulder in the way first, right, which is, for the moment, how can I get access to something that is behind some form of authorization when I need to do this? And so that’s what we’re tackling. And so I think, at the moment, the focus is 100% on authorization. And then I think, from there, we will, potentially in the future, go on to authentication and identity and these types of aspects.

Now, for auth providers, the thing that…and luckily, a lot of the big ones are doing this, is just engaging with us and telling us what the common denominator is that everyone has and that we can build upon, so that developers feel they have some safety and it’s not like, “Oh, you can only use this with this provider.” And talk to us about what you’re willing to implement, and where there are, you know, things potentially missing in the authorization pieces for this agentic world. And luckily, they do this, right? The authorization specification development that’s currently going on is driven by a combination of very engaged people on the security and identity side from Microsoft, from Okta, from AWS, and so it’s the right people in the room already, who are, in many ways, way better suited to help me make these decisions than I am myself, because I’m not an identity and authorization expert. And so I just want more people that are experts in that field to tell me what’s the right way to do this so that we can all figure this out together.

Yoko: Amazing.

David: That’s really what I want to learn from people, right?

Yoko: Yeah, amazing. I love this community-driven development and iteration on the spec, too. Every time I check out the MCP spec, there are, like, hundreds of issues. So a lot of respect for how you guys kind of groom the issues day in and day out. Another topic I want to dive into, which we kind of talked about a little bit, is, in the creative field, how does MCP work, and what are some use cases? Because today, most of the clients we have seen are very developer-focused. It’s very natural when it comes to a new technology’s adoption cycle, because developers, we know how to configure it. We know, like, how to put in a JSON blob. But then, recently, I started to see very interesting use cases with more creative tools like Blender. You can now use words to create a 3D model. And then you can use MCP servers from, like, a Unity instance. You can have your own synthesizer, you know, so on and so forth. What are some top creative use cases you have seen or are most excited about and want people to build more of?

David: I’m actually curious about your take later because you’re a very creative person. But for me, it goes back to what I love about MCP, this ability to bridge gaps between what you care about in the world and what you care about in your life. And so, when I saw, for example, the Blender MCP server, which I think was one of the first original big ones, or there’s one where a person connects Claude to Ableton, I just find it so fascinating and really cool because, on one side, I’m just astonished that LLMs are really good at this, actually, and you’re surprised because it’s a side of LLMs you would have never seen before without MCP. But on the other side, I just love the creativity of connecting these tools and then actually getting something useful out of it.

Of course, you know, having your own creative process and using it better than others, as a creative person yourself, is an aspect of control that every artist wants to have. And LLMs and MCP don’t give that to you. But they give you a different set of interfaces to something, and I think it’s very interesting and creative to play around with how you can describe, for example, a 3D environment. And I think that’s a very unique thing. Because an artist that’s an environmental artist in Blender has probably never had the ability to really express themselves in words. And maybe, you know, how can you write a poem and have it translated to a 3D environment? That’s super fun. And then, of course, you want to go back into Blender because you need control. But I think it’s a great, fun exercise and experimentation bit that I think helps creatives look at it in different ways, if anything.

And then, of course, you know, I love synthesizers. I’m a terrible musician myself, but I love them. And I mean, I love this idea where people use, for example, Claude to program patches onto physical synthesizers. And that’s just, like, fascinating to me that Claude can do it, but also just cool to see that people have thought about connecting the LLM to something that’s a physical thing in the world that makes a sound afterwards. And so I love that part. But I’m curious about what you think about this, because you are a very creative person. How do you think about this aspect?

Yoko: You know, I’ve been thinking a lot kind of along the lines of what you mentioned, the input of, you know, clients. So today, the input is mostly words. So we describe what we want to see. But then we know that words and actions or visuals are never one-to-one. So it’s very cool to have a starter template described by words, but then the later iteration has to be dictated by, you know, the artist’s choice. So, for example, I’m a huge user of Procreate. And then, in Procreate, you don’t really describe what you want to see. You just draw what you want to see. And then so much of that is controlled by the latent space in my brain. Like, my brain is not describing what I should be drawing, right? That’s not how, like, my model works. It’s more controlling the muscles to kind of decide, like, how to draw this curve of a line. What’s the color that looks good to me?

So, to some extent, I almost feel like the MCP client really severely dictates what the whole experience will be. For example, if the client sends, like, some Bézier curve to the server and then has the server decide, “Is this something that looks good to you?” Like, that’s not something we have seen very often yet. So today, input that’s either code or, you know, language, that’s very common. But later, I wonder what kind of experience we’ll have if every design tool becomes an MCP client.

David: I don’t know. I have no clue what this is going to look like, but I think it’s a very interesting thought exercise.

Yoko: Yeah. Here comes a philosophical question on agents, just based on everything we’ve talked about, which is, what do you think is the ultimate communication mechanism or modality for agents? On the one side, we have natural language. On the other side, we have programming language. I mean, technically, we could frame all the problems in the world into a programming language if that language supports it. And then, on the other side, we have, you know, input modality as pixels, screenshots, you know, sometimes videos. What do you think, based on what you’ve seen on MCP servers and client interaction, would be the ultimate, like, abstraction layer that you’ll be like, “This is the right way or a great way to provide all the necessary context for agents?”

David: I think, yeah, it’s such an interesting bit. I think “I don’t know” is one part of the answer, but I think the real answer is there’s probably a set of…something in between having either of them. I think programming languages are a very good interaction pattern between agents because there’s a lot to be said for a dense, mathematical, you know, slightly different form of syntax that is very, very clear about its intent and very constrained, in a way, which programming languages are. And then there’s the very free form of, you know, natural language. I think natural language alone will not be good enough, personally. That’s a personal opinion I have. A combination of them might be the right thing.

So I don’t have really an answer because I feel it’s a bit too early to tell, and I think I want to see this space explored a little bit more. So I’m like, when I look at the development in this field, I’m like, I feel it’s a bit too early to really tell what’s the right abstraction there. But I think things like MCP enable people to experiment with different things. And then, of course, other frameworks that exist and a bunch of other things in the space enable people to experiment. But I think there’s a lot more experimentation to be done to really understand what the actual general abstraction should look like.

And if you think about MCP, you know, under the assumption that MCP sticks around, as I hope it will, you know, MCP came two, three years into tool calling’s existence already. So we had seen a lot of these interactions before we had a somewhat general abstraction. And I think that, you know, we’re a bit too early with agents to see what this is going to look like. But I think your observation is right that there are so many different modalities and different options, and I just talked about the text side of things, right? You already had pixels and other bits in there. And so I think there’s so much interesting space to communicate in. Who knows? Maybe models really like to talk about things over video streams, and we don’t know. Maybe that’s, at the end of the day, the modality we end up with, just, like, video streams everywhere because they just like watching pictures of things.

Yoko: That’s so interesting. You know, like, these modalities bleed into each other, as I do a lot of, like, random projects on the side. One of them is called AI-Tamagotchi. So it’s basically like an AI-driven seed for Tamagotchi. So instead of just eating one thing, the Tamagotchi can request 10, 20, 50 things, whatever the LLM state will let it do. One thing I realized is that I could use most of the models today to generate ASCII art and even ASCII animation. And then, when I was thinking about it, it almost felt like a visual task. But, like, a language model still generates a sequence of tokens. And then, if I give the task to, say, like, a diffusion model, it doesn’t really generate a token. It generates pixels. So the question is, what is a better way to generate sequence of images or sequence of ASCII characters to animate something like this? So it really…

David: What have you found? What do you think this is?

Yoko: I actually think ASCII. I actually am more on the language model side today for these stateful, like, very predictable animation sequences. It almost felt like this is, you know, a modality I didn’t think would have worked, but it did, because predicting the next token, it turns out, also works for predicting the next ASCII character.

David: Like a lot of things, if you think about, you know, transformer models and attention, it would fit them, since it’s a sequential thing, so it’s probably somewhat good to generate with.

Yoko: Yeah.

David: Smart observation.

Yoko: Yeah. And then the funny thing is I tried out a lot of different generation tasks. It’s best at generating cats. That’s why I searched the internet: ASCII cats are really well represented in the dataset. This actually brought me, like, our agent chat brought me, to this other question, kind of at a high level. When you think about the future of MCP, what do you want to solve, and what do you want to keep evolving, and what do you not want to solve? Because it does feel like a lot can become the MCP spec’s problem, right? You could implement RAG. You could implement a database. You could implement everything and anything in the world. So I guess, how do you think about it? You know, what kind of things do you want to keep executing on, and what kind of tasks do you feel are just not something the spec should take care of?

David: Yeah, that’s such an interesting question. I think everyone who builds a spec is faced with this type of problem: you need to stick to your guns, so to speak, and focus on the area you want to be good at and not try to boil the ocean and try everything. I think, for MCP, there are a few things. I think there’s evolution of the current part of MCP. I think there’s a very clear path for evolution around, like, authorization, around, you know, other parts of that. But then I think there’s potentially still a place for a bit more abstraction regarding agents, but that’s a very low-conviction opinion yet, because, again, back to, I need to see this a little bit longer, and I feel I really want to explore that space, but I feel that I might…

Yoko: I have to ask a question here. How do you define an agent?

David: Oh, I’m not going to get into that. What do you think? What do you think is an agent? I don’t know.

Yoko: I think it’s a multi-step LLM reasoning chain. It’s very simple for me.

David: Okay.

Yoko: Yeah.

David: Okay. I think I can get behind that. I can get behind that. I think, there, for me, agents is potentially more, it’s in the word, agency: something that does some form of autonomous orchestration or autonomous task solving. And usually, anything that’s a multi-step thing is, for me, already like an agent, right? The moment it does two steps and it reacts to the first step, it’s basically an agent because it now has some agency over what it’s doing. And so I think that’s, at the end of the day, what it adds up to for me, for the most part. But there are a lot of definitions of agents out there.

So I think there’s a potential there to think about this. I think, there, MCP is somewhat in a good position in the sense that it allows for these graphs. So I think some of these graph pieces that MCP inherently, indirectly enables can also be dynamic, which I think is a very interesting and unique part of it. So maybe there’s a little bit around agents I’m not fully sure about yet, but it’s something that I’m definitely taking a look at. And beyond that, I think that, again, the rest at the moment is just evolution, like streaming and other bits, modalities. I think there are other interesting bits to MCP of, like, how does something like that potentially fit into other model types that are not just pure text-based models? I think that’s an interesting long-term question. So, what does this look like for video, audio, images, whatever it might be?

I don’t know if there is a use case for this or something like it, and that does not have to be MCP, but I think it’s an interesting question to think about different modalities. But yeah, again, I think, for the most part, it’s modalities. It’s evolution. Then, maybe. There’s a big maybe, big question mark next to, “Do we need more for agents, or can agents be already very well formulated in MCP abstraction?” Again, that’s back to experimentation.

Yoko: That sounds like such a fun experimentation.

David: That is a lot of fun.

Yoko: Yeah. I often try to refactor my code and then try to refactor the single agent into multiple agents, like five agents, where I just need multiple LLM calls to make decisions along the chain. And then, interestingly, most of the time for the tasks I’m trying to do, which are very simple, send an email or ping someone, and very long transaction workloads, a single agent worked just fine. So I haven’t really come across a use case myself that requires multiple-agent collaboration, like, a very complex task. But what’s your view there? Do you feel like we’re going to kind of go pretty deep on a single agent? It’s almost like a technical detail, a single call graph with an LLM. Or do you feel like it will be multiple processes working together?

David: For me, one of these observations is that I think agents are less a function of how different the task is and more a matter of trust boundaries. If you have a travel agent that needs to have access to your bank or whatever it might be, there might be interesting bits where there are trust boundaries, which is, like, where a protocol wants to be used in between rather than it all just being the same framework or whatever it might be. And so I do feel there will be some form of composability based on these trust boundaries, because you will probably eventually want to use whatever interface your bank gives you for agents and nothing else. And so there’s a boundary where this needs to interact with something else. And so these things will happen in the parts of the world that require a bit more trust.

I think, you know, beyond that, it’s a bit tricky to see how these are going to work out, and I can totally see a single agent or agent framework being quite powerful. But again, composability, the ability to switch things out for users that are not developers, I think, can be very useful in a way. And I think there’s also a question of, will there be two or three meta agents that drive other pieces that are MCP-shaped, or will everything be very specialized? And then, you know, you have developers build these different agents. It’s a bit of a complex question that comes back to experimentation. But for my use cases, at the moment, a lot of these single-agent, few-interaction setups do all the things I need to do, which is, I think, similar to what you’re saying, but then we’re also very early. Very, very early, right?

You know, really early on in exploring agents, and the models are just at the spot where these things become very powerful. So, you know, we’ll see what this is going to look like in a year. And again, I think that trust boundary is an interesting bit that I look at. And then, you know, how does an agent act on behalf of another agent, these types of aspects? And I think there might be protocol needed there.

Yoko: That’s awesome. Amazing. Well, last question. I guess I just love that MCP has been an open protocol from day one. So, as a result, you know, you’ve amassed, like, a huge community kind of contributing, giving suggestions. When you think about where you need the help the most in the next phase of MCP development, can you talk more about where you’d want more contributors? How do people reach you? How do people collaborate on the spec or other things related to the spec?

David: Yeah. I think, for contributions, at the moment, we run this as a very traditional open-source project. So what we’re looking for is people maintaining, helping, writing issues, reviewing issues, reviewing PRs, writing PRs, building trust with us as the maintainers so they can hopefully help us longer term. And so we’re looking for people who just want to be active in the community, be it driven by companies, be it driven by individuals, it really doesn’t matter to us. So a big part of that is just going through the Python SDK issues, helping people there, going and reproducing some of the bugs there to see if they’re actually a problem and get more detailed information, reviewing PRs when necessary, or, probably better, writing PRs and fixing bugs. I think those are great starting points.

When it comes to the specification itself, the lift is a bit higher, and the bar is a bit higher. So there, it’s probably good if you either address a very specific need or write a very detailed RFC for it that might sit there for a while. You might rally up support if you like; a company might rally up some support for it and come to us together. I think that helps quite a bit. And so I think those are good starting points. It works very much like a traditional project, again. We’re looking into governance models that are a bit more sustainable in the long run, that are a bit more consensus-driven. And so we’re going to work towards that. But, yeah, besides that, just come help out on the code. The specification is a bit hard to work with. But other than that, if you feel strongly, just go for it as well. And, yeah, build trust.

We have a lot of people helping us. The Pydantic people, for example, do a great job with the Python SDK. The Microsoft people do a great job with the authorization specification. Same with the Okta people and the AWS people. So there’s a lot of things already happening. We have people helping us so much with the inspector. There’s some, you know, just community contributors that I really highly appreciate. So, yeah, just go and help and work with us. That’s really what we need at the moment, for the most part.

Yoko: This is awesome. I really enjoyed the conversation. This has been so fun, chatting about everything from Tamagotchi to cat monitoring apps to the MCP protocol and its future. Thank you so much for making the time, David. And till next time.

More About This Podcast

Artificial intelligence is changing everything from art to enterprise IT, and a16z is watching all of it with a close eye. This podcast features discussions with leading AI engineers, founders, and experts, as well as our general partners, about where the technology and industry are heading.
