a16z Podcast

The Present Future of Audio: Talk, Music, Video, Interactivity

Connie Chan, Gustav Söderström, and Sonal Chokshi

Posted October 14, 2020

We’ve already talked a lot about podcasting, both the evolution of the industry and of the form, but where are we going with the future of audio more broadly? Can we borrow from the present and future of video (e.g., TikTok) to see what’s next in audio (more layers, more interactivity)? Can we borrow from the past of audio (i.e., radio) to see what’s next for audio experiences (more blending of music, talk, podcasting)? Where do all these mediums converge and where do they diverge when it comes to user experience, product design, recommendations, discovery?

Gustav Söderström, chief R&D officer (who oversees the product, design, data, and engineering teams) at Spotify, the world’s most popular audio streaming subscription service, joins this episode of the a16z Podcast for a deep dive on all things audio with a16z general partner Connie Chan and editor in chief Sonal Chokshi. They cover the past, present, and future of audio, going high level into the big trends and also dipping down into the trenches, especially given the increased blending of talk/podcasting, music, and more. What are the challenges to designing for different mediums, on both front end and back end (including machine learning and different graphs), when listeners want everything in one place when and where they want it… yet their contexts shift?

But the conversation more broadly is really more about what happens when we give creators (of all kinds!) tools — not just for expression but for fan engagement and monetization too. We also discuss the themes of super apps and full-stack approaches when it comes to innovating on top of a protocol, as well as how innovation happens in practice: How do mediums — and organizations — evolve, prioritize, “disrupt themselves”? All this and more in this episode.

Show Notes

How audio and podcasting have changed [2:30] and the merging of music and video through TikTok [5:01]

The importance of mobile devices [10:21]

Augmented audio and interacting with creators [13:04]

How Spotify designs products [16:19], and building “super-apps” [17:58]

Technical challenges to integrating media types [24:07], how audio is defined as it merges into new forms [26:54], and licensing issues [32:26]

Recommendations and discovery algorithms [36:54], challenges platforms face [43:11], and the importance of subscriptions [47:13]

Thoughts about the future of audio [53:12]

Transcript

Sonal: Hi, everyone. Welcome to the “a16z Podcast.” I’m Sonal. And today we are talking about one of my many, but actually probably [my] most favorite topics: the future of audio. Our special guest is Gustav Söderström, the chief R&D officer of Spotify, which is the world’s most popular audio streaming subscription service. As a reminder, none of the following should be taken as investment advice. Please see a16z.com/disclosures for more information.

Also joining this episode is a16z general partner Connie Chan, who covers consumer, writes a lot about tech trends and product in China and beyond, alternative monetization models, and more. And she and I have actually done a couple of podcasts on podcasting. One, a podcast about podcasting with Nick Quah. And the other, on how we, at a16z, podcast. You can find both of those episodes as well as other resources on the topic at a16z.com/podcasting.

Note, also, that Spotify actually got into podcasting in 2015. We were included as one of their launch partners for that, among select others, and they said they were huge fans of the pod.

Gustav: We still are, so it’s still true.

Sonal: Thank you. Anyway, in this episode, we actually go beyond podcasting to talk about the broader category of audio — past, present, and future. So we chat about the parallels and differences in audio and video including referencing an episode I recently did with Eugene Wei on TikTok, which you can also catch in this feed. We discuss the trend of interactivity as well as augmented audio, and where we are right now. What’s possible, what are the challenges? We talk about where podcasting and music converge and diverge, both on user experience and design, as well as technically in machine learning. And, finally, we go deep on recommender systems. The idea of “hearing” like an algorithm and where subscription models come into machine learning.

But we also talk throughout this episode about the trade-offs of full-stack approaches, regardless of what kind of company you are, and the topic of super apps as well. And we’re also really talking about how innovation happens in practice. Whether it’s having an opinionated point of view about the future, or listening to users, disrupting oneself — and how to change an organization, and much more.

We begin, however, with a super quick debate on how much things have or haven’t changed in the podcasting world, at least since we did our last podcasting episode over a year and a half ago.

The current state of audio

Connie: I actually personally think that audio hasn’t changed that much yet. A lot of things are still — I don’t know if broken is the right word. But just — problems that are not solved yet. Discovery is still difficult, search is still difficult. It’s really like a one-way listening experience. You aren’t interacting with other listeners, you aren’t interacting with the creators. Creators still have to rely on very old business models for monetization that ultimately don’t work for a lot of long-tail creators. A lot of those big problems still exist. But I do have this optimistic feel that we’re on the cusp of change that’s going to come to the broader audio market.

Sonal: You’re right, those things actually haven’t changed very much. I was thinking of the fact that the content landscape in podcasting has super exploded. In the last year, two years alone, Spotify itself has led a number of content acquisitions, which is such an interesting evolution.

Gustav: Yes. It’s both very much the same, but very much more of the same, right? So, like, the forklifting of your time into your AirPods, that just keeps increasing.

Sonal: Right.

Gustav: There’ve certainly been shifts in listening behavior due to COVID. A lot of listening was in the car, that shifted to speakers in the home — so, overall, there’s much more listening. And to your point, certainly, we’ve invested aggressively in content and exclusives. The creator side of this landscape has changed in a direction that we wanted to change.

But I would also agree that we’re on the cusp on the consumer experience. What’s so interesting about audio is, it feels like you have this cheat sheet, which is what happened in video. We just haven’t done monetization in a 21st century way yet. We have no interactivity. You can really just look at the other media industries and see what’s missing, in a sense.

Sonal: So, Edison Research, which publishes a lot of the leading work studying podcasting behavior, argued a few things last year. One, that one of the major inflection points in podcasting, interestingly, came through Spotify because of the streaming, and that brought in, kind of, a new generation of users. Two, the other argument they made (and this is, of course, pre a lot of the content acquisitions) is that for a new generation, the medium of audio is really not that different than video. That, in fact, for a lot of people, their default podcast player is often a video app, or just turning off the visuals and listening. And so, I’m curious for your guys’ thoughts on where audio and video, which is another big trend, do and don’t intersect? Both from a trend perspective and a product development perspective, and then we can dig in deeper on other aspects.

Audio vs. visual

Connie: I mean, video is really just the combination of using your ears and your eyes. It’s the audio plus the visual. Which means the stakes are actually higher for audio, because I can’t have, like, a 20-second gap of silence in a podcast and expect you to be okay with it. But in a video, you can go quiet and there might just be some visual distraction, and you don’t have to be “on” as much every second. And so it’s still a different medium. But I do think that the stakes in audio are higher.

Gustav: So I think that when you talk about audio, it’s different things, depending on the type of audio, actually. So you have, kind of, foreground audio, which is more similar to video. It is the main activity you’re doing. You’re really concentrating. It requires most of your attention. Then you have background audio. Like, you’re listening to music, and you’re actually paying attention to something completely different. You’re working out, or you’re studying or something, right? So there are these different modes of audio that don’t really exist in video. Video is mostly all your attention, or you’re doing something else, right?

This is also the benefit of audio. That’s why it’s so much engagement, because you have both foreground moments and background moments. But even in the foreground moments, when you’re paying full attention, you can still do other things. You can drive, you can do dishes, you can walk around the house, right? So, it is this other mode that video doesn’t cover. That’s why we think it is almost as much engagement as foreground video, but it’s not nearly valued the same yet. And that’s not because it’s less valuable — we think that’s because it’s undervalued.

And you can think about it the other way as well. You have some video that actually works quite well as audio, that you can background, that you watch every now and then. Joe Rogan, for example — it certainly has video, right? And that actually does help the user experience. But it is what we call backgroundable video or foregroundable audio, if you want to call it that.

Sonal: I just wanted to comment, Gustav, on your point about the modes. That’s a phrase that I use when I think about describing people’s behaviors. And I actually describe it less as foreground and background, and more as passive versus active mode. And so, I really believe strongly that audio has different modes. Sometimes you’re just in “hanging out in chill” mode, sometimes I’m in passive mode, which means I just want to listen to other people. Other times I’m in active mode, which means I want to talk, or super active mode which means I want to lead a discussion. So I just think it’s really interesting to think in terms of modes.

I’d love to hear your initial thoughts on just the differences between audio and video as mediums. What do you make of the differences and similarities with TikTok, and what can and can’t we learn from TikTok when it comes to product in audio? Do you guys have any thoughts on that? I mean, Connie, you’ve written so many posts about TikTok since very early on.

Connie: Yeah. Like, TikTok is an extreme example. If you don’t look at the screen and you just listen, none of the videos make sense. You’ll miss the punchline, like, the whole video.

Sonal: Yeah.

Gustav: Exactly.

Connie: Value prop is also within the visual for TikTok.

Gustav: So, I think there are at least two similarities. What they do really well is — they take, to Connie’s point, commodity music — that if you just listen to it in the background, you miss the whole point. But then they let their users uniquify that commodity music, right, by adding uniqueness to it with their video.

Sonal: I think you just made up a word, by the way, uniquify.

Gustav: Yeah.

Sonal: Keep going.

Gustav: And I think that’s a great pattern, right? You have something that is commodity. You can use your user base to turn that into something that is non-commodity. It’s this engine that takes these clips and creates unique content around them. So I think that’s a really interesting pattern that you could probably copy to other businesses that have commodity content. Let your audience do something with it to make it unique.

The other analogy that I see to audio is specifically music. If you think about Eugene Wei’s post on seeing like an algorithm, what he said was that the medium itself is built to be understood by an algorithm. You’re presented with one item at a time, and you either consume or you swipe. So it’s built for the algorithm to understand what you’re paying attention to versus, for example, a scrolling feed, where the algorithm has no idea which item your eyes are actually looking at.

Sonal: Right, isolating the specific variables so that the product developer knows what is working or not working, essentially, for the user.

Gustav: Exactly. And if you think about music, actually, it’s the exact same thing. You present one audio track at a time. You either listen to it or you skip. So, in that sense, you can say it’s a similar sort of UI, but in audio.

Connie: The tricky part is actually just the length of the song versus the length of the TikTok video. Because you get to a very quick decision if you like that TikTok video or not, literally within, like, two, three seconds. For a song, as many of you know, the first couple of seconds don’t sound anything like the chorus or the ending, so you just have to go further into the song before you really gauge if someone truly likes it or not. But to me, that’s the only difference.

Gustav: Yeah. On TikTok, you have more evaluations per minute because they’re shorter clips. But it’s also more direct. But it is interesting that you mentioned this, because this is what is happening in the label industry. It is super clear that the intro matters more and more, so you do have the TikTok effect in music. You know, songs used to start slow; they don’t anymore, because people skip within the first 10 seconds.

Sonal: Oh, that’s so fascinating. So the TikTok effect, where people are now creating different kinds of music.

Gustav: I would say one more thing on TikTok. So, while there are some similarities between evaluating audio one track at a time, and evaluating video one track at a time, there’s a big difference which is — TikTok has your full attention. If you’re at full screen and you’re paying full attention, then it’s a pretty good signal. But if you’re washing dishes and listening on a speaker, you get very poor signal. So it depends on the context and you have to take that into account when you look at the signal.

Sonal: I’d love to probe briefly on this part. Which is, you both have talked a lot. Connie, you, in particular, have written so much about how mobile is literally the thing that made a lot of China’s apps work the way they do, because everything was mobile first. And we talked about mobile leapfrogging in our posts from what now, five years ago?

Connie: Right, right.

Sonal: Wow, that’s been a long time. So, where does that come in when you think about innovation in audio? And then, Gustav, I’d love your thoughts on this as well. Because when you said that in the pandemic, a lot of the listening behavior has shifted to home speakers, I’m curious how that changes your views, given [the] initially mobile default interface?

Connie: So, if I just break down what a phone is and the different components of it. Like, you have the touch screen, which means whatever you’re doing on the phone, you can have more interactivity, ideally. But you also have camera and GPS. And, you know, the camera is the unlock for TikTok, and the microphone could be the unlock for a bunch of audio platforms. Because, now it means that I don’t just have to be listening. I’m not just leveraging the speaker on the phone, but I’m leveraging the microphone and I’m giving back. The microphone, in particular, for audio and video, I think is dramatic.

Gustav: Yeah. That is one of the sensors that is super interesting and under-leveraged for audio, I would say. So, one of the benefits of being a streaming service is that we understand the consumption situation. We understand if you’re listening on a speaker but inputting on an Apple Watch or a phone; we understand if you’re in your car, for example, because the phone is connected and so forth. So we actually think that’s a very important signal, and we try to think of them as, kind of, different jobs to be done. And what we want to try to understand is the situation that you’re in. And it’s obviously a combination of your play history, your time, and your taste. But a device is actually a really good signal.

So there are two levels. One is the UI and the hardware that you can leverage. And that changes when you go from a phone to a connected speaker, for example. You have much less control. You actually still do have a feedback channel, in terms of a microphone, as Connie mentioned. But you have less UI, right? So we’re thinking about multimodal consumption quite a lot, where you have some devices that are really good for input on your body, but they’re not that good for output — you actually want the sound in your speakers. That’s why we built this remote-control protocol so that you don’t have to interact in the same place that you’re listening — you can interact on one device and so forth.

The other way to think about it is on the content level. So one of the things that happened during COVID, when a lot of consumption shifted from the car to the home, was that we have this very successful playlist called The Daily Drive, where we mix music and talk — and create, literally, your daily drive. Now people stopped driving, right? So then we tried to pivot and we create [the] same job to be done, but not while driving — it’s different. So these are the two levels — kind of, the content level and the pure UX interactivity level.

Augmented audio possibilities

Sonal: Okay. So we can shift into discovery and recommendations in a bit. But before we close this thread, what do you guys think of this trend and phrase — augmented audio? Which means different things to different people. But the idea that you can actually, to your point, Connie — much like video has many layers, you can actually bring more and more layers into audio as well. Do you guys have any quick thoughts on that?

Connie: Oh, so many. But that really just leads me to the belief that audio today is still this more “sit-back” experience. It’s very much like a one-way consumption experience, the same way that we consume television, or the same way that we consume movies. And, kind of, like, the more YouTube, live-streaming kind of format hasn’t really arrived in mainstream audio yet. And so even just capturing the comments, the feedback to podcasts, like, that kind of content is not well harnessed today. So there’s so many more layers around the listener feedback, or interacting with other listeners, or interacting with the creator. A lot of fun should be added on and layered on into audio that, right now, at least, doesn’t exist.

Sonal: It doesn’t have to even necessarily be fun. I mean, as a creator, I found the news — when you guys rolled out your polls feature — to be quite interesting. Because we just had the debates here in the United States, and I literally was like, “I wonder if a lot of the political news shows should do, like, their own polling as part of their audio experience?”

Connie: I mean, it’s not just fun, it’s instant feedback.

Gustav: Yeah. I agree. We started with polls, which is both a safe and constructive way to bring feedback. You mentioned the consumers or the listeners talking to each other. You mentioned the creator talking to the listener. We try to focus on the creator, and what tools does the creator want? And, actually, not just for having fun but, to your point, Sonal, to be a better creator. What information do you want from your fans, and what would make it easier for a creator to produce another episode, for example? And so we started with polls, which is one way to get clear answers on questions you have. And we want to continue in this way, focusing not really on listener-to-listener conversations. I mean, you have Instagram, Facebook, Twitter; there’s lots of places to go and talk to other users, but there aren’t a lot of places to have good conversations with the creators.

Connie: And I think if you focus on creators, there’s also a huge opportunity to expand the funnel of creators. If you look at trends in video, lots of the top trending YouTube videos are actually reaction videos, where people are watching a video and showcasing a reaction. And TikTok is all about remixing. There’s a lot of great audio content out there today, that if you talk about augmented audio — you could take a podcast and then have another person share their thoughts directly, just like a sports broadcaster, even — commenting directly on what’s happening in the audio, whether it’s music or even another podcast.

Gustav: Yeah. You have these two extremes like the old-world broadcast, one-way media. And then on the other extreme, I would put gaming, where the interactivity is the experience. You’re not being broadcasted anything, you’re actually creating it. And then you have this thing in between. And I think audio needs to move towards interactivity. And like I said, there is basically a cheat sheet where you can look at other types of media. And as soon as you add a feedback loop, the creator gets a chance to improve. So I think that’s vital.

Sonal: Tell me more about some of your thinking behind polls. When you guys design a product, do you actually have an opinionated philosophy that, “This is how we think people are going to use it?” Or are you just giving them the bare minimum and then unlocking your community to, kind of, let loose? A simplified way of asking that is also, is it a Steve Jobs point of view, or a Bezos point of view?

Gustav: That’s a great question and a great way to put it. And it’s a tough question to answer. It’s definitely not a Steve Jobs point of view, in the sense that we know how people are going to use it. But we try to be slightly more opinionated. We don’t have the complete bottoms-up, or throw stuff at the wall. I think it’s due to our history. So, when we’ve developed products in music, it usually involved — once you came up with the idea, you had a three-year roadmap to go and license that idea from four majors. And if you licensed the wrong thing, you lost four years. So you needed to be right, and you needed to be more sure, because the cost of being wrong used to be so high for us. And I don’t know if it’s good or bad. I think if we had grown up in a world where the cost of being wrong was just the engineering time put into it or something, and you can just pull it back, maybe we would be different. But we have a pretty specific culture where we actually do plan quite a lot more. I wouldn’t say Steve Jobs, for sure. And Daniel himself actually talks all the time about distributing decisions, but it is more opinionated.

And then for polls, we’re lucky enough to have Gimlet and all these studios in-house, with lots of fantastic creators. So we get to test this internally, and we use them as an internal inspiration. And sometimes they are the product owners, because they represent the user needs.

Sonal: That’s fantastic. Connie, more thoughts on interactivity? I feel like you live in this world, and you talk so much about China apps and what’s possible when it comes to interactive audio.

Connie: So another interesting thing about creators that comes from looking at what’s working in China is not just giving them feedback on what the audience wants to hear next, or what the audience is thinking, but also separating your average listener from your super listener, the person who really wants to even pay you directly for your work. And helping you identify who your real true fans are, right? If you think about the creator economy, it’s a very clear trend that’s already been in Asia for a while now.

So, something like QQ Music, which is the main music app that people are using in China. If you have someone who is hosting a radio show or, kind of, a listen-together type of group chat, there’s the option to, basically, be part of their paid fan club. And then if you’re a part of their paid fan club, you get a different badge on your own profile, and you get access to exclusive virtual gifts that you can send that host, so everyone knows that you’re a part of that paid fan club. You can get a different announcement when you enter the room, different kinds of bonus check-in tasks. There’s a bunch of new features that get unlocked if you’re a part of this creator’s fan club. And, ultimately, what that allows the creator to do is monetize better than just a traditional advertising route. Because in addition to receiving normal virtual gifts from their listeners, from anyone who drops in and participates, you also are cultivating your small following of super fans who really, really love you.

Sonal: I love that you’re pointing that out because it’s basically making this link, that these tools and features are not just about getting more information or data but, actually, they’re paths to monetization as well, which is super interesting.

Connie: Well, it helps you create your own empire in a different way. Like one feature I love is this battle feature, where you can almost battle another radio station at the same time, and almost compare how many gifts each of you are able to aggregate in a certain period of time.

Sonal: It’s like duets with an audio challenge.

Connie: It’s really focused on how to help creators motivate their community and build that core fan base.

Gustav: So, one of the things that I think is really interesting with these things that you mentioned: they’re dependent on actually having a logged-in service, so that the creator can understand their audience. That wasn’t really possible over the previous protocols. You got download numbers, but you couldn’t really understand your audience and who was your super fan. You know, what they look like, and who they are, and where they live, and so forth. That protocol doesn’t actually support feedback to the creator; it’s a one-way broadcast protocol.

But because we’re now, sort of, full stack, we can start doing these things that have happened in other industries. And it’s the thing that happened in video, and in many of these other things. Like, you take text messaging, for example. It used to be standardized, and innovating on that text messaging protocol needed a ton of carriers to sit in different forums and agree, right? So the benefit was ubiquity and reach, but innovation was really slow. And then at some point, something like Snapchat happened, that verticalized the whole thing (and, you know, WhatsApp and so forth), and innovation just ran away. One day, you had disappearing messages, the next day you had stories, the third day you had lenses, because it didn’t really have to wait. And so, I’m really excited about that happening to audio.

Connie: Yeah. This is what we mean when we say, like, very early innings of audio.

Gustav: Exactly. But there was, like, a technical foundation that needed to exist. That does exist in China, to your point. They’re all vertical.

Sonal: Yeah. I’ve been very obsessed with, and a student of, the history of innovation. And to me, this is the classic arc when you go from a utility layer to, like, a value-add layer. And, of course, there’s a lot of debates around what platforms should and shouldn’t have control over. And that’s something that’s playing out a lot with crypto, and a lot of other discussions. That said, I think the point you’re making, Gustav, which makes it less academic and more interesting to users, is that it really comes down to: you are giving me something I can’t get right now.

Connie: Yeah. If you have one app that can give you a vertical solution — basically, give you everything you want — that app’s true understanding of you is very strong, and its ability to personalize things towards you is higher. Your ability to create a profile, that you then are proud to share with other people, or that you want to build upon — whether it’s earning different levels or different points, that also increases.

I mean, I love what Gustav is saying about how things are more vertical. There’s a lot of benefits when you take, kind of, the super-app mentality. And a super app is basically a product or a platform that focuses on all the different needs a particular customer has, versus giving a single-feature solution. Recognizing that, “Oh, this person loves listening to these kinds of music, but this person also probably loves listening to all these other things. So why don’t we offer this all-in-one package? We now better understand that listener, and we can solve more of their problems.”

Gustav: So, we were actually quite inspired by the super apps of China when we thought about podcasting. The obvious solution, if you’re going to build a podcasting app and you come from a pure design angle, is to build a standalone app. But the trade-off, then, is distribution. And so, we looked at it more from a super-app point of view. And we realized that what users actually wanted was all of their audio, you know, music and talk and so forth, mixed, which they used to have on radio. And we had a zero-user base in podcasting, so we’d be starting from scratch. We had hundreds of millions of music users, and that’s an advantage in itself. But more importantly, we understood these users. They were logged in, and so we could just augment their moments. And one of the interesting things we found was that it turns out that your music listening is actually very predictive of your podcast listening.

Connie: You can probably guess a person’s age range from their music listening alone, right?

Gustav: Yes, you can. For sure.

Sonal: So, you’re saying people’s music listening predicted their podcast taste?

Gustav: Yeah. When you want to cold start a podcast listener, it turns out that your music listening is actually a really good signal for that — for which podcast you recommend.

Sonal: That is incredible to me. I just think people’s music listening is so much more visceral and less intellectual — that I’m just so shocked by that fact.

Gustav: I would not say it was obvious to me either, but it’s, like, a very clear result. It also supports the idea of the audience — that you should think of them as one person, right? And try to serve them in the different needs they have.

Connie: Yes, think of the customer as one person.

Challenges of integrating media

Sonal: Right. What you’re basically both really saying is, when you think of the super-app mindset, it’s a cohesive identity of a user’s needs. And, in fact, if I were to visualize it, I think of that classic Da Vinci Renaissance man [Vitruvian Man], where you have, like, this person at the center, and then you have multiple spokes of interests, kind of, radiating around them. And then you think of each of these moments in their day. It could be time, it could be interests, it could be need. It could be whatever job to be done, to use the Clayton Christensen framework that you’ve referenced a few times, Gustav. But what you’re both also essentially saying is that a super app, once you have one, is built-in distribution. And so you’d be silly not to use that base and do the cold start.

Gustav: Yeah. It’s much easier to say, “Let’s put a competing team over there and let evolution take care of [it].” They build their own app and they compete. But it’s at the cost of the user to do it that way. And so the first thing we did was, we figured out that instead of having the apps be as different as possible, you actually wanted to have them be the same thing. And you can say that radio has always done this. People have been mixing these mediums, so it didn’t seem that far fetched. But it wasn’t clear. And if you optimize for ease of implementation, you have small things such as — just the fact that the UI has to change from skipping a whole song, when you’re listening to music — to, all of a sudden, skipping 15 seconds back and forth, and scrubbing within a podcast. That’s a big challenge to solve dynamically in the same UI. It would have been much easier to just maximize the two different hypotheses.

Sonal: Yeah. So, basically, what I’m hearing is, even something as seemingly mundane to the user as the ability to scrub forward 15, 10 seconds — which I do all the time in my podcasts. If you’re in music, you can just skip an entire song forward. And even that kind of trade-off is, like, actually really complex when you’re doing it in the same UI. That’s super fascinating.

Gustav: Exactly. So the UI has to be much more dynamic.

Connie: I mean, even how you show a track versus an album cover, right? Or a podcast episode versus the podcast cover — like, it’s a very different thing. It’s not easy to pull off. And it gets harder and harder the bigger the company is, because it requires real changes that are top-down, that have to come from leadership. It’s a change in your org structure, it’s a change in your release cycle. It’s a massive change, and it’s very hard to pull off.

Gustav: It was painful. We needed to “force.” It’s not like people didn’t want to do it, but you needed to get people to work with each other instead of putting [it on] a different team. And it certainly needed global prioritization, from Daniel down. And we have this system to prioritize things globally, called <inaudible> in Spotify, which was very helpful to get these things through the company. And I don’t think we could have gotten this through the company if we hadn’t had that global prioritization tool. It’s very hard to do. But this is the benefit of software, right? And this is one of the benefits of being full stack. We can actually try to solve these problems, and actually improve the consumer experience.

Redefining audio as media types merge

Sonal: So, let me ask you guys a quick question — especially you, given Spotify worked within the existing UI to blend from music to podcasting. Where do you stand on the definition of podcast, music, audio? I always talk about how audio is a huge category. Like, I honestly think trying to homogenize audio is like trying to homogenize text. It’s like — a word is the same thing as a book, is the same thing as an article, as a blog post, as a tweet. That’s ridiculous. However, Connie, you made the argument in our podcast about podcasting, with Nick Quah — how podcasting and music— and I agreed with you, as well, then — that there’s a big difference between the spoken word and the sung word. And so I’d love to hear your guys’ thoughts on, where are we today?

Connie: Radio is the integration of both talk and music. They live very symbiotically together. And if you look at most podcasts, they have a music introduction already. There are sound effects in a bunch of them too. So this combination, or this belief that normal talking can be improved with music, or music can be improved with talking breaks, has been here forever.

Sonal: But even then, where does, and doesn’t the blending of music and podcasting actually work, and where does it fall apart?

Gustav: Right. So we had this intuition that people wanted their music and their podcasts in the same app. And that certainly turned out to work. But there was a category where they’re actually related. It is the same session, right? So this is the thing that we just released. So now we are going to let creators do this new type of session, where they can mix talk with licensed music in a seamless session.

So, you see these two user needs. If you take the Clayton Christensen approach, you see podcasters really wanting to use and talk about music, but they can’t — because the creators do not get paid for some burnt-in song in a podcast. And then you see the music creators that would like to talk about the music. So you have both of these sides at the same time. And it’s been really hard to solve it, especially if they were two different apps. But now it feels very natural that you should be able to have this new type of show.

So you’ve seen us play around with things like Daily Drive, for example, for a long time, where we mix talk and music. And we’ve seen a lot of success. People love hearing their news, and then their new music in the same session. Especially when they’re driving — trying to switch to the music session and hear the new releases as well. So what we’re thinking now is, we want to enable anyone to do that.

And on the consumer side, it is neither a podcast nor a playlist. It’s just, <Yep.> the best of podcast and the best of playlisting. But it is neither, because podcasting has the problem that you actually aren’t allowed to feature music in it — and playlisting has the problem that you actually can’t comment between the tracks. So we created this new format where you can do some talk, then you can add a Spotify track in there — then you can do some more talking. And so the user can then listen to the talk part as if it was a podcast. They can listen to the track, they can skip the track — but they can also save the track if they like it. One of the things that radio has missed. So it’s a new format. But, hopefully, it’s not new in the bad sense, that you have to learn anything new — it should be just like listening. Because, then it works the way you, kind of, always wanted it to work.

Sonal: What would you call this new format? I think very broadly of, again — I mentioned how audio is as heterogeneous as text, so it’s ridiculous to use one word for everything. But it is a new kind of audio experience. It’s not a podcast, it’s not music, or a song.

Connie: I think of this as going back to radio. For me, this is the new radio station.

Sonal: Yeah.

Connie: This is the new way you can listen together.

Gustav: In a sense, a very obvious innovation — but also an innovation that requires tons and tons of licensing work over many years, and a big investment in podcasting and creator tools and so forth.

Connie: I’m smiling because it’s going to open the door for a whole batch of brand-new creators. People who don’t want to host a podcast and talk the whole way through, but now can use music as their passion — as their content — as the thing they’re, kind of, anchoring their talk around. And then this also brings about curation, social discovery. I mean, I can even think of several a16z colleagues, myself, that I think would be really good creators on this new platform.

Gustav: That’s what I’m hoping for. I’m hoping for you, Connie.

Sonal: I think she means Anish, because Anish is a side deejay.

Connie: No, my stuff will all be probably Chinese music.

Gustav: We want that too.

Connie: Yeah. But the point is, it really opens the door to new batches of creators. And it brings in social discovery, and it brings in the idea of curation. It’s back to, kind of, the Spotify playlist, but with more color, right? And with more storytelling.

Sonal: Augmenting, I might even argue.

Connie: And the interaction that you can have with the listener, right? In Asia, you can have people order different songs and pay to try and see what’s already on the playlist, and change that playlist — even in real time. So the kind of interaction you can build on top of this is also exciting.

Gustav: And you spoke about augmenting there, and I think that’s a great point. So we spoke about TikTok, and I mentioned this pattern of taking, sort of, commodity licensed music and letting your users make it unique. So one way to think about this is, it’s a similar pattern. We’ve had tremendous success by letting our users work with the music catalog and playlist it. You know, they create billions and billions of playlists that have helped them, and have helped other users. But it has also helped all our algorithms to learn, right? So you can think of this as a similar pattern, where you take the commodity catalog, but you let any creator, through Anchor, work with it and make it more unique and uniquify it, right?

Sonal: I love it, uniquify again. Well, the other interesting point is when Eugene and I talked about TikTok on this podcast, he did bring up that one of the big unlocks, as minor as it might seem for the remix culture as well — was the ability to quickly license, combined with the creator tools, combined with the distribution — so that you do, then, get this “creativity network effects” flywheel. Which, sort of, then reinforces.

Connie: Yeah. It’s a big way that people are interacting with music on the QQMusic app. When you tap into radio stations or listen together, you see all these different hosts, and you can listen to them live. When you’re listening together with other people, you can choose different topics or categories — like friendship, music, emotions, talk shows. And the interactions that you already see happening on these radio stations are “listen together” — there’s a chat that’s usually going on while people are listening to music. There are different leaderboards for these different creators. You can have different tasks that the creator asks you to do. You can order songs, you can see what’s next on the playlist. You can gift the creator, and thank them for curating this kind of music. And you can even subscribe to their fan club, right? Like, if they always have great music choices, you can make sure that you’re always able to know when they release something new, or when they go on. So it does unlock a brand new batch of creators that today don’t live on YouTube. Today they’re not podcasters. But they have a lot of things to say, and they love music. So a lot more people will be able to participate — be creators themselves, build a following, and eventually monetize.

Gustav: I agree. The increased participation of new types of creators is really interesting, because there are all of these creators who clearly want to talk about music, and there are all of these artists who, you know — they’ve always wanted to be on radio. <Mmhmm.> Like, they want to be featured by someone, but business models [are] often a problem. No one has been able to solve that, [so] both parties actually get paid for that. We solved what I think is a harder part, actually — of licensing all the music in the world and paying royalties to all organizations. We’ve already solved that, so it feels like a very natural product for us to play with.

Connie: Yeah. When I was growing up, I used to listen to radio shows. You know, I used to listen to Delilah, and she would have stories in between. And then she would have audience people call in. And then she’d have nice, soft music to go with that story.

Gustav: Exactly.

Connie: And it was fantastic.

Gustav: And then you probably recorded the tracks, right? Because you really wanted the music?

Connie: And that’s how I discovered music too, right? And that’s how she could also resurface music from the past, rather than having us listen to only stuff that was released in the last 18 months. Let’s resurface some of these oldies, and this is potentially a great way to do that.

Sonal: What’s really fascinating to me about this is, it’s almost like a vector to social. Because there’s nothing more inherently social than music listening, and music sharing. As you’re noting of playlists, music curating — and to your earlier points about it — unlocking creators. One of my favorite podcasts, actually, is “Song Exploder” by Hrishikesh Hirway. And I actually think I heard about this podcast from Eugene, like a year ago. And now it’s going to be a Netflix show. And, you know, he really deconstructs these songs on air. But imagine all the people — like all the kids, all the adults, who just lie around listening to music, talking music with their friends, bonding over music. So, to me, what’s really fascinating here is — there is a social vector, both socially and para-socially with acquaintances and strangers, when you think about them connecting with fellow fans of those playlists and other people. So I think there’s actually a really interesting vector to all that too.

Gustav: Yeah.

Sonal: Because TikTok is not a social network, but this theoretically could be.

Gustav: So this is an interesting point. We think about Spotify more like YouTube and TikTok, than Facebook and Twitter. It’s actually not about following your friends, but I think you’re right. I think there are so many creators out there who would love to tell a story about a specific piece of music, right? Their own story, some story, or something. And we’ll see how it gets used. I’m hoping, obviously, that many artists would like to tell their story of their own album that they released, for example.

Sonal: Yeah.

Connie: Yeah. Amazing.

Gustav: There are many different things that could happen.

Connie: Even in that great example where the artist is telling the story, that artist doesn’t have to sign up and say, “Okay, I’m going to start a brand new podcast.” That is such a big responsibility and commitment to take on.

Gustav: Exactly.

Connie: And now you, kind of, have these, kind of…

Sonal: A Trojan horse is starting a podcast <inaudible> basically.

Connie: This really lowers the bar of commitment for creating a show. And you can try it with no real consequence, and get that distribution, too.

Recommendation algorithms

Sonal: Okay. So, now let’s, then, talk about — how do you solve — this is, like, the big elephant in the room — and, potentially, the big exciting thing in the room — recommendation and discovery. How do you, then, think about that side of this? Both in the context of Spotify shows, and also beyond. We opened this conversation about what has and hasn’t changed. This has been a broken problem “in podcasting.” It might not be as broken in music. We’ve talked about TikTok, we’ve talked about the parallels and differences between video. Let’s bring it all back together around this theme and topic of recommendation and discovery.

Connie: For music, there is a commitment of more than two or three seconds to figure out if you like a song, right? So, the bar for who you trust as your source for who is giving you that recommendation is higher. And so you either have to have a system that builds trust, showing that their algorithm has given you enough hits. Like, TikTok can’t be wrong five times in a row. Stakes are really high. So you either have an algorithm that is so good that it knows enough about you already — that the majority of the time, when they give you something, you like it. Or you have a creator that also has that same kind of hit rate. That you realize, “Hey, most of the stuff that that person likes, I also like.” And that is also a great way to, kind of, get that discovery element. It’s all about giving the user this end trust — that they’re willing to test your recommendation because, say, 80%, 90% of the time, you’re going to be right.

Gustav: So I think you’re completely right. That was a success with user playlists. There are literally many billions of different curations of the Spotify catalog, so you literally have something for everyone. And either they find that playlist, or you can use machine learning to learn from that to be able to serve users. Then you have the UI elements themselves. And I think that’s different between music and podcasts. Music is easier, in a sense, because it is three-minute items and you can skip through. And what we see in music is that it’s, like — the investment of how much time do you spend, versus finding one jam. So it is actually okay if even most of the songs, theoretically, are not that good. If they’re easy to skip through, and, like, the seventh song is, like, your dream song. Because that can make your entire week, or maybe month, right?

So I try to think about it — I think Chris Dixon said this, “a fault tolerant UI.” If your machine learning is perfect, you only need to show one item. If your machine learning is 1 out of 10, you probably need to show 10 items, because then there’s always one jam on the screen. You have to adapt your user interface to your, kind of, level of recommendation. And so, these playlist formats — we try to think of it as, kind of, a GTD — get things done. Can you quickly go through and like, “Yeah, that was perfect, save that to my library.” It’s like a productivity flow in the discovery moment, which is very different from the consumption moment, when you may be on a speaker. And then it’s not okay that you have three bad songs in a row, but it’s okay if the fourth one is good. Does that make sense?
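The “fault tolerant UI” arithmetic above can be made concrete with a minimal sketch. It assumes every recommended item is independently a hit with the same probability, which real recommendations will not satisfy exactly; the function names and the 90% target are illustrative assumptions, not something described in the conversation.

```python
import math


def items_for_expected_hit(hit_rate: float) -> int:
    """Smallest number of items whose expected number of hits is at least one."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in (0, 1]")
    return math.ceil(1.0 / hit_rate)


def items_for_confidence(hit_rate: float, target: float) -> int:
    """Smallest n with 1 - (1 - hit_rate)**n >= target, assuming independent items."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if hit_rate == 1.0:
        return 1  # a perfect recommender needs a single slot
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - hit_rate))


# The rule of thumb from the conversation: a 1-in-10 recommender gets 10 slots.
slots_rule_of_thumb = items_for_expected_hit(0.1)

# A stricter (assumed) requirement: at least one hit on screen 90% of the time.
slots_for_90_percent = items_for_confidence(0.1, 0.9)
```

Under the independence assumption, ten slots at a 1-in-10 hit rate give one expected hit but only roughly a 65% chance that at least one hit is actually on screen, so a stricter target needs noticeably more slots.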

Sonal: That goes back to modes, actually. Thinking about the mode the user is in.

Connie: Yeah. I also think, if there are good mechanisms in there for the creators to have potential financial payoff from participating, the creators are actually going to be incented to have discovery. That incentive is actually built in. Because you cannot have thousands of concurrent Spotify shows all showcasing the same music. No one is going to want to listen to that. And so, all these creators are naturally going to be incented to showcase you something brand new, because what they’re really being valued for is their ability to curate, and then match that with the storytelling.

Let me give you a concrete example. When I go to the gym and someone is trying to do a workout, and they’re talking through, and they have music sliced in between. Or just think about a yoga class — they want that variety of music. They don’t want you to be listening to the same thing, time and time again. And now even that gym workout, that yoga class — could exist as a Spotify show, where they’re making you do pushups and counting down, and then there’s music right there in the background. You have to really think [about] what this can unlock.

Gustav: I’m definitely hoping for that yoga and pushup workout to happen. You have to make it happen. <laughter>

Sonal: Okay, Connie. So either you make a yoga show, or you do, like, a Chinese song playlist.

Connie: No. But the point is like, there’s so much context that can now be wrapped around recommendations. Like, even the time of day — what are the right kinds of shows that work for the morning, what are the right kinds of shows that you want to wind down to. Those creators will have the incentive to naturally pick what they think makes sense for you.

Gustav: Exactly. So I think there are two things that are really interesting here. So one is, when we think about machine learning overall, and recommendations from a product point of view — and this is completely borrowed from Andrew Ng, by the way, so it’s nothing that we came up with. The framing that we try to use is — if you think about what algorithms do really well, they tend to scale really well. They tend to be able to personalize, at an okay level, to hundreds of millions of people. Humans don’t do that really well. Humans are incredibly smart and creative, though, but they don’t scale so well. So one way to think about this, that I think Andrew Ng coined — was to let the editor, for example, or the creator, if we’re talking about a Spotify show, but an editorial playlist — this algotorial principle that we use.

Sonal: Algorithm plus editorial.

Gustav: Exactly, algorithm plus editorial, that we call algotorial. You literally think of the editor as the product owner. This is the product person that has the idea and the hypothesis. And they come up with what the job to be done is, or what the hypothesis is, or what the use case is. So, for example, you take something like songs to sing in the car. No machine came up with that idea. It was a human who sat and said, like, “I think there’s a user need here. People want to scream their lungs out when they’re driving to work.” So how do you teach a machine this? The algorithm doesn’t understand what “songs to sing in the car” means. Is that, like, a bit of ’80s music, is it a bit of movie music? But for a human, it’s super clear — like, this is a song to sing in the car, this is not. So what the editor does is, they literally create, like, a playlist of a few thousand tracks, and then the algorithm can understand it. And they can personalize it to 300 million people and scale it, right? So the job of a product owner is to create this data example, this data wireframe — I think, is very useful. That loop has been very useful for us.
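The algotorial loop described above, where a human defines the idea by example and an algorithm personalizes it, can be sketched in a few lines. This is a toy illustration under stated assumptions and not Spotify's system: the `Track` type, the two-number embeddings, and the dot-product scoring are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Track:
    title: str
    features: Tuple[float, ...]  # hypothetical taste embedding, for illustration only


def personalize(editor_pool: Sequence[Track],
                user_taste: Sequence[float],
                k: int) -> List[Track]:
    """Rank the editor's hand-picked pool for one listener and keep the top k."""
    def affinity(track: Track) -> float:
        # Dot product between the track embedding and the listener's taste vector.
        return sum(f * t for f, t in zip(track.features, user_taste))

    return sorted(editor_pool, key=affinity, reverse=True)[:k]


# The editor expresses the idea ("songs to sing in the car") as a pool of examples.
sing_in_the_car = [
    Track("power ballad", (0.9, 0.1)),
    Track("movie theme", (0.2, 0.8)),
    Track("80s anthem", (0.7, 0.3)),
]

# Two listeners get different cuts of the same human-curated pool.
ballad_fan = personalize(sing_in_the_car, user_taste=(1.0, 0.0), k=2)
soundtrack_fan = personalize(sing_in_the_car, user_taste=(0.0, 1.0), k=2)
```

The design point is the division of labor: the algorithm never decides what belongs in the pool, it only reorders and trims what the editor chose.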

Sonal: So, basically, bundling the best of human creativity with the best of algorithmic scaling, in order to deliver on the personalization and recommendations to a mass of users.

Gustav: Exactly. Humans have to come up with the ideas. They have to show the ML system what that idea actually looks like for the ML system to understand it. Because the ML systems are great at scaling, but not great at coming up with new ideas.

Sonal: Can you give me a little bit more color on some of the challenges here? I’d love to hear about how you have to think about solving them — what’s hard about algotorial. But then more specifically, about how you had to negotiate that, when you transitioned from music to podcasting — and then now in blending the two. I want to hear a little bit more color about it, basically.

Gustav: So, in music, we have, really, two sources — traditionally — of recommendation information. One big source is the playlists, the other is editors. But then we have the third way, obviously, which is the engagement from the users — listens, and skips, and so forth. Those are the signals in music. But music is different because the items are three minutes long — like we spoke about, it’s more like TikTok. Then you go to podcasts, and it’s like maybe one and a half hours, and then you get one skip. <laughter> It doesn’t fit at all with, like, “Let’s just, you know, feed the machine,” right? It’s very low signal. So we had to think about it completely differently. But, not only is it much further between the skips — we don’t have anything equivalent to a billion playlists. So we had to go back and start working with “old tech,” like knowledge graphs.

You have other advantages in podcasts, which is — there’s actually information in the audio. You have other signals. You have show notes, and you have the transcripts on the shows. So we started working with those technologies instead to get some understanding. So, actually, these two stacks are quite different. We certainly could leverage a lot of learnings, but they’re not the same thing because they’re such different objects.

Connie: Especially because podcasts are usually multiple people on a podcast. There’s oftentimes a host and a guest.

Sonal: You actually don’t know who people are following sometimes, who they care for.

Connie: You don’t know. If there’s, like, a Joe Rogan talking to Elon Musk, you don’t know if it’s because I like Elon Musk or if I like Joe Rogan. That’s quite different than music, where there’s a bunch of artists — any song they put out, I’m going to like, I’ll take a listen to.

Sonal: It’s like a cult of personality show, because you’re following the host, in that case. In this case, you’re following the artist. But one thing that I think is really interesting when talking about the knowledge graph is the mood graph. I always talk about it — I coined the phrase when I assigned an op-ed on it a number of years ago at WIRED. Because I actually think we’re missing a huge opportunity in optimizing things. Frankly, my playlists are all organized by mood and emotion, they’re not organized by any other criteria.

Gustav: That’s a great point. And in music, that is one of our biggest vectors. Like, one of the biggest sections of editorial playlists are the mood playlists. You’re completely right.

Sonal: Ooh, that’s great. It’s interesting you bring up a knowledge graph, Gustav. Because it’s tough to know — is it a book author? They’re just listening to every single podcast they’re on? Is it a content thing? It’s so complex and multi-dimensional.

Gustav: Exactly. And the answer, as far as we can see is — it’s all of the above. There’s “personality cult,” there is you following a certain guest around all the podcasts that they visit. There’s interest: I’m just into computing — I don’t care who is talking, right? So you really need this knowledge graph with all of those dimensions, and then you need to be able to let the user, kind of, traverse along these different dimensions. And then you can lead them to some discovery. You remember this debate around music — everyone had a music friend that influenced them. And for a while, early Spotify, we invested heavily in social to try to replicate that. But it turned out that most of your friends on Facebook — they don’t inspire you so much musically. If you average them, it’s just the U.S. Billboard [chart]. So we take the same approach in podcasting.

I mean, we have a core belief that if Spotify can make you discover something that you wouldn’t otherwise have discovered, it will be more important in your life. So we really try to make sure that we measure and understand how many discoveries we generate for you.
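The idea of a knowledge graph that a listener can traverse along different dimensions (host, guest, topic) can be illustrated with a small sketch. Everything here is a hypothetical stand-in: the `Episode` fields, the placeholder names, and the simple shared-value lookup are assumptions for illustration, not a description of how any real podcast graph is built.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set

DIMENSIONS = ("host", "guests", "topics")


@dataclass(frozen=True)
class Episode:
    title: str
    host: str
    guests: FrozenSet[str]
    topics: FrozenSet[str]


def _values(episode: Episode, dimension: str) -> Set[str]:
    """Return the set of node values an episode has along one dimension."""
    value = getattr(episode, dimension)
    return {value} if isinstance(value, str) else set(value)


def related(catalog: Sequence[Episode], seed: Episode, dimension: str) -> List[Episode]:
    """Episodes that share at least one value with the seed along the chosen dimension."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"dimension must be one of {DIMENSIONS}")
    seed_values = _values(seed, dimension)
    return [ep for ep in catalog
            if ep != seed and _values(ep, dimension) & seed_values]


# Placeholder catalog; names are invented for the example.
catalog = [
    Episode("ep1", "Host A", frozenset({"Guest X"}), frozenset({"computing"})),
    Episode("ep2", "Host A", frozenset({"Guest Y"}), frozenset({"music"})),
    Episode("ep3", "Host B", frozenset({"Guest X"}), frozenset({"music"})),
    Episode("ep4", "Host C", frozenset({"Guest Z"}), frozenset({"computing"})),
]

same_host = related(catalog, catalog[0], "host")      # follow the host
same_guest = related(catalog, catalog[0], "guests")   # follow the guest around
same_topic = related(catalog, catalog[0], "topics")   # follow the interest
```

The same seed episode leads to three different neighbors depending on which dimension the listener actually cares about, which is the ambiguity the speakers describe.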

Sonal: It’s almost like a new metric of return on discovery. Instead of return on investment or return on energy, if I think about every app, what is my return on discovery — or ROD — on that particular platform?

Gustav: I’ll borrow that from you. But another difference from these things is that we are, revenue-wise, mostly a subscription service. So in machine learning, in the practical world, there’s been a lot of deep learning and so forth. But in the academic world, for a long time, there’s been a lot of focus and discovery and exciting results around reinforcement learning. But, you know, AlphaGo and all these things.

Sonal: Yeah. We’ve actually talked about it on this podcast quite a bit, too.

Gustav: And not to go through it. But the main idea is just — you look for some long-term reward and you backpropagate it through time — instead of looking at, what is the most likely next click? And so, I think, if you have a service that is free only, and, you know, you have an average engagement — same every day — it’s going to be really hard to, like, backpropagate signal. It’s going to be noisy. But if you have an event four months down the line — that is, you know, I went from just consuming ads to paying $120 per year, you have this massive amount of, sort of, gradient you can backpropagate through time.

Sonal: Oh, I love this.

Gustav: And the thing that is different from, for example, YouTube or TikTok is — every month, all the paying users, hundreds of millions of them, they go and they evaluate. It’s like, “Should I still pay?” And they vote with their wallet regardless of how much they actually consumed. So we have a different signal that is not just engagement, and consumption, and attention. We can see — do you keep paying? And, obviously, as you know, it’s not really possible to do the real reinforcement learning. You basically need a perfect simulator of the world. But you can approximate it quite well. And so that’s something that is happening in the rest of the industry as well, slowly. You need enough signal for that to really be valuable. So that’s something I’m excited about in the recommendation space.
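The long-term reward idea described above can be sketched with the standard discounted-return calculation from reinforcement learning. This is a textbook illustration under stated assumptions, not Spotify's method: the function name, the three-step timeline, and the discount factor of 0.5 are hypothetical, and only the $120 figure comes from the conversation.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed back to front."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Assumed toy timeline: two steps with no reward, then a conversion worth 120.
timeline = [0.0, 0.0, 120.0]
credit = discounted_returns(timeline, gamma=0.5)
```

A single large, late event assigns non-zero credit to every earlier step, whereas a flat stream of small, similar rewards gives the earlier steps little to distinguish them.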

Sonal: What you’re basically saying — I talk about this quite often on the podcast, about how subscription models change so much. But what you’re saying, which is so fascinating to me, is that it’s also a way to get much better signal into your system.

Gustav: Right.

Sonal: You’re also basically saying — you’re essentially weighting higher people with more skin in the game, which is exactly how you want to design something.

Gustav: Exactly. Everyone has saves and likes. But you can think of, like, paying $10 as a super big like every month.

Sonal: Yes, exactly. You’re weighting it higher. And you have that data because people are logged in and they’re streaming. One of my favorite books is James Carse’s “Finite and Infinite Games.” And he just died, actually.

Gustav: Yes.

Sonal: Rest in peace, James Carse. But the idea — what you’re saying is, you’re playing a repeated game with your users. Which then gives them an even better game board to play on, versus a transactional game only.

Gustav: That’s exactly it. Which is a big problem that is important to solve, I think. You can try to understand what the user actually values long-term versus just in the moment.

Connie: Yeah, subscription fees are a fantastic business model. But also, I can see how that would allow new revenue streams for these creators. And I’m not just talking about the people who create the music, but I’m talking also about the people who are going to create and deliver a brand new experience that lives on top of the music. If those people can find some kind of financial payoff in participating, that’s a brand new revenue stream. And then think about the possibilities — the kind of interaction you have with that listener at that moment — is another area you can charge for.

Sonal: I also love that. While we’ve talked so much about putting the power back for creators, it really does actually most empower the listener. Just one quick question, Gustav. How do you think about the tension between data, and all the data you’re getting, and all the signals, and where it goes too far? Like, is there a risk that sometimes, listening to your users, you’re missing out on what they don’t tell you? And how do you think about that as a head of R&D at a company where you’re not just abstract R&D — you’re actually building product?

Gustav: Yeah. I think that’s a fantastic question and really hard to answer. It is an age-old problem. I think one way to think about it is to simplify it a little bit. Algorithms — they, kind of, look in the rear-view mirror and draw a straight line into the future. And so, that’s great for a while. But product development — usually, good product development is based on some sort of ideally contrarian hypothesis. And your machine learning is not going to come up with a contrarian hypothesis, right? So you need some mechanism for that to happen. And so, we try to think of this in different ways. I mentioned algotorial, where the editor actually has the ability to say, like, “No, I believe in something different.” So we try to build in this mechanism where humans can go in and, you know — they have the steering wheel, they can take a left turn or something, and then the algorithms follow.

And, you know, there are incentives to not do it. It is always going to be safer to keep going straight for a while longer. Why take risks? All of these things, right? But back to playing infinite games. If you play the game, you know, many times — think about it as game theory — now you have to end up in a place where the optimal thing is to try new things every now and then, to try to cover as much space as possible. And as I said, we have a culture of being quite specific in the hypotheses we have. And we try to think about it, as do many companies, as, sort of, a portfolio. I want to have some things that are quite contrarian, and [have] a pretty high chance of failing. Whereas, I want a bunch of things that are obvious. But that balance — I mean, no one has the perfect solution, but everyone at some scale has to start thinking about it.

And so we found a few mechanisms that were useful for product development. One was to take the concept of simple prioritization, and the Kanban board, all the way to the C-suite. You know, everyone thinks they’re good at prioritizing, but they’re not. And I bet that in most companies, the C-suite is the worst at prioritizing. They actually want to do everything. And so, we have something like five to seven things that the company needs to do. And Daniel owns that. But the one rule is — two things cannot have the same priority.

Sonal: It reminds me of the Steve Jobs bio anecdote, where at one of their off-sites, they put up a whole list of things, and he literally crossed everything else off the list and they only did the first four. What you’re describing, though, is not just siphoning off what to do versus not to do, but how to order the priorities, from the top, so that the managers don’t have this friction and they don’t waste effort in terms of building things.

Gustav: And that’s the trick.

Sonal: Yes, I agree. And the other thing that I think is fascinating about that, is that when you say that Daniel, kind of, owns that too — when you are disrupting yourself, so to speak — like, when you went from music to podcasting. Putting that higher up on the bets board in his office is, like, “Hey, no complaints, guys. This is it.”

Gustav: So that’s exactly what happened. Podcasts was the number one company bet for two years, and everyone in the company knew it. And so, what happens if you don’t have that? You push that decision to managers and you create conflict in your world. The truth is, Daniel can’t have any idea, in a company of thousands of people, what is going to clash with what resources.

Sonal: Of course.

Gustav: The only thing he can do is, like, when you clash, this is the priority.

Sonal: I love that as a management thing.

Gustav: It’s so simple. Everyone thinks it’s so complicated. It’s actually very simple. It’s — the discussion is hard. Actually, prioritizing is very hard.
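
[Editor's sketch: The one rule described above, that no two company bets can share a priority, amounts to a strict ordering that settles any resource clash by rank. The Python below is a minimal illustration under stated assumptions, not Spotify's actual tooling. The names Bet, validate_bets, and resolve_clash are hypothetical, and every bet other than podcasts is a placeholder.]

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Bet:
    """A company-level bet (hypothetical structure, for illustration only)."""
    name: str
    priority: int  # 1 is the highest priority


def validate_bets(bets: list[Bet]) -> list[Bet]:
    """Return the bets sorted by priority, rejecting any shared priority."""
    seen: dict[int, str] = {}
    for bet in bets:
        if bet.priority in seen:
            raise ValueError(
                f"{bet.name!r} and {seen[bet.priority]!r} share priority {bet.priority}"
            )
        seen[bet.priority] = bet.name
    return sorted(bets, key=lambda b: b.priority)


def resolve_clash(bets: list[Bet], first: str, second: str) -> str:
    """When two bets compete for the same resources, the higher-ranked one wins."""
    rank = {bet.name: bet.priority for bet in validate_bets(bets)}
    return first if rank[first] < rank[second] else second


# Placeholder board: only "podcasts" as the number one bet comes from the conversation.
board = [Bet("podcasts", 1), Bet("placeholder-b", 2), Bet("placeholder-c", 3)]
winner = resolve_clash(board, "placeholder-c", "podcasts")
```

[Because the ordering is strict, resolve_clash never has to break a tie, so teams need no further escalation once the board is set.]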

The future of audio

Sonal: Okay. So we started with talking about where podcasting has been. We’ve gone through what’s shifted — the parallels and differences between video and music. We’ve talked about the trend of interactivity, and augmenting audio in different ways. We’ve talked about recommendations and hearing like an algorithm, even, and an editor. What do you guys think is, sort of, the future of a lot of these? Like, where do you think the future is, kind of, going?

Gustav: My guess is that if we use the cheat sheet of other media, I think audio is going to increase on the creator side just like the other mediums. I think it’s going to increase in numbers of creators.

Sonal: The market for audio is bigger than I think people realize. Or, as Connie said earlier too, we’re still in the very early innings. So my obsession is this two-word phrase that I use all the time — of world-building. And to me, one of the missed opportunities in audio for a long time — and, you know, Gustav, you painted this range from gaming models, all the way to music models, to different things. I actually think we’re starting to increasingly see more game-like behavior in audio. And I’m so excited for that kind of world-building.

But it’s a very different kind of world-building, because audio has an immersiveness that’s very different than the visual-based world-building of other worlds. And so I’m super excited for what we can do. I mean, I already think about our expanding podcast network as a form of world-building. And when you’ve mentioned Spotify shows, that, to me, is another form of world-building, because you’re essentially bridging different worlds and creating new experiences. And so, to me, that’s actually the thing that I’m most excited about.

Gustav: So I think that’s a great way to think about it. And you think of the music world, the podcast world — and now you can think of this new world where you can mix them, and then you can have other worlds. The thing that I think is going to happen is, you look at something like audio — and it’s so easy to create, it’s even easier to create than video. So, as we both make it even easier and lower the friction for everyone, we let creators make more money and we add these new formats. What I’m hoping is that that market is going to grow as well, just like we’ve seen the market for creators growing in other media.

Connie: I think audio will be further optimized in the sense that you can almost peel apart the different nuggets of a podcast, right? You can take certain segments now. You can take a commentary around it now. And you’re going to be able to do new things when you break apart a song, when you break apart a podcast, and you can see what that will unlock. TikTok is breaking apart a song, kind of, getting to a specific 5- or 10-second slice of it, right? A snippet. And then, this idea of now taking something that used to be, you know, one piece of content and chunking it down to different things — now [that] gives you new building blocks to build new kinds of shows, new kinds of interactions. Which means things will get much more participatory. More people can become creators. More people can probably become listeners. More listeners will find each other, listeners will become stronger fans of their creators. So I think there’s a very hopeful, very optimistic future, where now technology actually can help everyone win.

Sonal: That’s fantastic. I love that. Gustav, Connie — thank you so much, you guys. Thank you for joining the “a16z Podcast.”

Gustav: This was super fun.

Sonal: Super fun.

Connie: Thank you.

Sonal: I wish we could all talk for hours. Take care, everyone. Bye.

Connie: I should put in a plug for my Spotify show. <laughter>

Sonal: The China Song Show, Connie?

Gustav: It’s going to be huge.

Sonal: Bye, guys. Have a really good day or evening for you. Take care, everybody.

Gustav: Bye. You, too.

Sonal: Thank you.

Contributors

Connie Chan is a General Partner at Andreessen Horowitz where she focuses on investing in consumer technology.
Gustav Söderström
Sonal Chokshi is Editor in Chief of Crypto at Andreessen Horowitz.

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.

More About This Podcast

The a16z Podcast discusses the most important ideas within technology with the people building it. Each episode aims to put listeners ahead of the curve, covering topics like AI, energy, genomics, space, and more.
