a16z Podcast

Text to Video: The Next Leap in AI Generation

Anjney Midha, Andreas Blattmann, and Robin Rombach

Posted December 20, 2023

General Partner Anjney Midha explores the cutting-edge world of text-to-video AI with AI researchers Andreas Blattmann and Robin Rombach.

Released in November, Stable Video Diffusion is their latest open-source generative video model, overcoming challenges in size and dynamic representation.

In this episode, Robin and Andreas discuss why translating text to video is so complex, the key role of datasets, current applications, and the future of video editing.

Show Notes

00:00 – Text to Video: The Next Leap in AI Generation

02:41 – The Stable Diffusion backstory

04:25 – Diffusion vs autoregressive models

06:09 – The benefits of single-step sampling

09:15 – Why generative video?

11:19 – Understanding physics through AI video

12:20 – The challenge of creating generative video

15:36 – Dataset selection and training

17:50 – Structural consistency and 3D objects

19:50 – Incorporating LoRAs

21:24 – How should creators think about these tools?

23:46 – Open challenges in video generation

25:42 – Infrastructure challenges and future research


Find Robin on Twitter: https://twitter.com/robrombach

Find Andreas on Twitter: https://twitter.com/andi_blatt

Find Anjney on Twitter: https://twitter.com/anjneymidha