In this episode of the AI + a16z podcast, a16z General Partner Martin Casado sits down with Sujay Jayakar, cofounder and Chief Scientist at Convex, to talk about his team’s latest work benchmarking AI agents on full-stack coding tasks. From designing Fullstack-Bench to the quirks of agent behavior, the two dig into what’s actually hard about autonomous software development, and why robust evals (and guardrails like type safety) matter more than ever. They also get tactical: Which models perform best for real-world app building? How should developers think about trajectory management and variance across runs? And what changes when you treat your toolchain like part of the prompt? Whether you’re a hobbyist developer or building the next generation of AI-powered devtools, Sujay’s systems-level insights are not to be missed.