Research in artificial intelligence is increasing at an exponential rate. It’s difficult for AI experts to keep up with everything new being published, and even harder for beginners to know where to start.
So, in this post, we’re sharing a curated list of resources we’ve relied on to get smarter about modern AI. We call it the “AI Canon” because these papers, blog posts, courses, and guides have had an outsized impact on the field over the past several years.
We start with a gentle introduction to transformer and latent diffusion models, which are fueling the current AI wave. Next, we go deep on technical learning resources; practical guides to building with large language models (LLMs); and analysis of the AI market. Finally, we include a reference list of landmark research results, starting with “Attention is All You Need”—the 2017 paper by Google that introduced the world to transformer models and ushered in the age of generative AI.
A gentle introduction…
These articles require no specialized background and can help you get up to speed quickly on the most important parts of the modern AI wave.
- Software 2.0: Andrej Karpathy was one of the first to clearly explain (in 2017!) why the new AI wave really matters. His argument is that AI is a new and powerful way to program computers. As LLMs have improved rapidly, this thesis has proven prescient, and it gives a good mental model for how the AI market may progress.
- State of GPT: Also from Karpathy, this is a very approachable explanation of how ChatGPT / GPT models in general work, how to use them, and what directions R&D may take.
- What is ChatGPT doing … and why does it work?: Computer scientist and entrepreneur Stephen Wolfram gives a long but highly readable explanation, from first principles, of how modern AI models work. He follows the timeline from early neural nets to today’s LLMs and ChatGPT.
- Transformers, explained: This post by Dale Markowitz is a shorter, more direct answer to the question “what is an LLM, and how does it work?” This is a great way to ease into the topic and develop intuition for the technology. It was written about GPT-3 but still applies to newer models.
- How Stable Diffusion works: This is the computer vision analogue to the last post. Chris McCormick gives a layperson’s explanation of how Stable Diffusion works and develops intuition around text-to-image models generally. For an even gentler introduction, check out this comic from r/StableDiffusion.
Foundational learning: neural networks, backpropagation, and embeddings
These resources provide a base understanding of fundamental ideas in machine learning and AI, from the basics of deep learning to university-level courses from AI experts.
Explainers
Courses
- Stanford CS229: Introduction to Machine Learning with Andrew Ng, covering the fundamentals of machine learning.
- Stanford CS224N: NLP with Deep Learning with Chris Manning, covering NLP basics through the first generation of LLMs.
Tech deep dive: understanding transformers and large models
There are countless resources—some better than others—attempting to explain how LLMs work. Here are some of our favorites, targeting a wide range of readers/viewers.
Explainers
Courses
- Stanford CS25: Transformers United, an online seminar on Transformers.
- Stanford CS324: Large Language Models with Percy Liang, Tatsu Hashimoto, and Chris Re, covering a wide range of technical and non-technical aspects of LLMs.
Reference and commentary
- Predictive learning, NIPS 2016: In this early talk, Yann LeCun makes a strong case for unsupervised learning as a critical element of AI model architectures at scale. Skip to 19:20 for the famous cake analogy, which is still one of the best mental models for modern AI.
- AI for full-self driving at Tesla: Another classic Karpathy talk, this time covering the Tesla data collection engine. Starting at 8:35 is one of the great all-time AI rants, explaining why long-tailed problems (in this case stop sign detection) are so hard.
- The scaling hypothesis: One of the most surprising aspects of LLMs is that scaling—adding more data and compute—just keeps increasing accuracy. GPT-3 was the first model to demonstrate this clearly, and Gwern’s post does a great job explaining the intuition behind it.
- Chinchilla’s wild implications: Nominally an explainer of the important Chinchilla paper (see below), this post gets to the heart of the big question in LLM scaling: are we running out of data? This builds on the post above and gives a refreshed view on scaling laws.
- A survey of large language models: Comprehensive breakdown of current LLMs, including development timeline, size, training strategies, training data, hardware, and more.
- Sparks of artificial general intelligence: Early experiments with GPT-4: Early analysis from Microsoft Research on the capabilities of GPT-4, currently the most advanced LLM, relative to human intelligence.
- The AI revolution: How Auto-GPT unleashes a new era of automation and creativity: An introduction to Auto-GPT and AI agents in general. This technology is very early but important to understand: it uses internet access and self-generated sub-tasks to solve specific, complex problems or reach goals.
- The Waluigi Effect: Nominally an explanation of the “Waluigi effect” (i.e., why “alter egos” emerge in LLM behavior), but interesting mostly for its deep dive on the theory of LLM prompting.
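To make the scaling discussion above a little more concrete, here is a minimal back-of-the-envelope sketch. It relies on two commonly cited approximations that are assumptions here rather than results quoted from the posts: training compute is roughly 6 FLOPs per parameter per token, and a compute-optimal model is trained on roughly 20 tokens per parameter. The function name and constants are illustrative only.

```python
import math

# Assumption: widely used rule of thumb, not an exact result from the paper.
TOKENS_PER_PARAM = 20.0
# Assumption: training cost is approximated as C = 6 * N * D FLOPs.
FLOPS_PER_PARAM_TOKEN = 6.0


def compute_optimal(compute_flops):
    """Split a training compute budget into (parameters, tokens).

    With C = 6 * N * D and D = 20 * N, solving for N gives
    N = sqrt(C / 120).
    """
    params = math.sqrt(
        compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM)
    )
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


# A budget equal to 6 * 70e9 * 1.4e12 FLOPs (a 70B-parameter model
# trained on 1.4T tokens) under the same approximation.
budget = 6.0 * 70e9 * 1.4e12
params, tokens = compute_optimal(budget)
print(f"params ~ {params:.3g}, tokens ~ {tokens:.3g}")
```

Under these assumptions, doubling the parameter count also doubles the number of training tokens needed, which is why the supply of data becomes the pressing question.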
Practical guides to building with LLMs
A new application stack is emerging with LLMs at the core. While there isn’t a lot of formal education available on this topic yet, we pulled out some of the most useful resources we’ve found.
Reference
- Build a GitHub support bot with GPT3, LangChain, and Python: One of the earliest public explanations of the modern LLM app stack. Some of the advice in here is dated, but in many ways it kicked off widespread adoption of, and experimentation with, new AI apps.
- Building LLM applications for production: Chip Huyen discusses many of the key challenges in building LLM apps, how to address them, and what types of use cases make the most sense.
- Prompt Engineering Guide: For anyone writing LLM prompts—including app devs—this is the most comprehensive guide, with specific examples for a handful of popular models. For a lighter, more conversational treatment, try Brex’s prompt engineering guide.
- Prompt injection: What’s the worst that can happen? Prompt injection is a potentially serious security vulnerability lurking for LLM apps, with no perfect solution yet. Simon Willison gives the definitive description of the problem in this post. Nearly everything Simon writes on AI is outstanding.
- OpenAI cookbook: For developers, this is the definitive collection of guides and code examples for working with the OpenAI API. It’s updated continually with new code examples.
- Pinecone learning center: Many LLM apps are based around a vector search paradigm. Pinecone’s learning center—despite being branded vendor content—offers some of the most useful instruction on how to build in this pattern.
- LangChain docs: As the default orchestration layer for LLM apps, LangChain connects to just about all other pieces of the stack. So their docs are a real reference for the full stack and how the pieces fit together.
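For readers new to the vector search pattern mentioned in the list above, here is a toy sketch of the core idea: turn text into vectors, then return the stored items closest to the query. It does not use Pinecone's or LangChain's APIs. The `embed` function is a hypothetical stand-in (simple word counts) for a real embedding model, and the documents are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Made-up documents for illustration.
docs = [
    "reset your password from the account settings page",
    "invoices are emailed on the first day of each month",
    "contact support to close your account",
]
top = search("how do I reset my password", docs, k=1)
print(top)
```

In a real app, the retrieved text would typically be placed into the LLM prompt as context, and a vector database would replace the linear scan used here.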
Courses
- LLM Bootcamp: A practical course for building LLM-based applications with Charles Frye, Sergey Karayev, and Josh Tobin.
- Hugging Face Transformers: Guide to using open-source LLMs in the Hugging Face transformers library.
LLM benchmarks
- Chatbot Arena: An Elo-style ranking system of popular LLMs, led by a team at UC Berkeley. Users can also participate by comparing models head to head.
- Open LLM Leaderboard: A ranking by Hugging Face, comparing open source LLMs across a collection of standard benchmarks and tasks.
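As a rough illustration of what "Elo-style" means, here is a minimal sketch of the textbook Elo update applied to a single head-to-head comparison. The starting rating of 1000 and the K-factor of 32 are assumptions chosen for the example, and the leaderboard's actual computation may differ.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=32.0):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A wins, 0.0 if A loses, and 0.5 for a tie.
    k (an assumed value here) controls how fast ratings move.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level; a user prefers model A's answer.
model_a, model_b = update(1000.0, 1000.0, score_a=1.0)
print(model_a, model_b)  # 1016.0 984.0
```

An upset (a low-rated model beating a high-rated one) moves both ratings more than an expected result does, which is how rankings converge as users keep voting.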
Market analysis
We’ve all marveled at what generative AI can produce, but there are still a lot of questions about what it all means. Which products and companies will survive and thrive? What happens to artists? How should companies use it? How will it affect jobs and society at large? Here are some attempts at answering these questions.
a16z thinking
- Who owns the generative AI platform?: Our flagship assessment of where value is accruing, and might accrue, at the infrastructure, model, and application layers of generative AI.
- Navigating the high cost of AI compute: A detailed breakdown of why generative AI models require so many computing resources, and how to think about acquiring those resources (i.e., the right GPUs in the right quantity, at the right cost) in a high-demand market.
- Art isn’t dead, it’s just machine-generated: A look at how AI models were able to reshape creative fields—often assumed to be the last holdout against automation—much faster than fields such as software development.
- The generative AI revolution in games: An in-depth analysis from our Games team of how the ability to easily create highly detailed graphics will change how game designers, studios, and the entire market function. This follow-up piece from our Games team looks specifically at the advent of AI-generated content vis à vis user-generated content.
- For B2B generative AI apps, is less more?: A prediction for how LLMs will evolve in the world of B2B enterprise applications, centered around the idea that summarizing information will ultimately be more valuable than producing text.
- Financial services will embrace generative AI faster than you think: An argument that the financial services industry is poised to use generative AI for personalized consumer experiences, cost-efficient operations, better compliance, improved risk management, and dynamic forecasting and reporting.
- Generative AI: The next consumer platform: A look at opportunities for generative AI to impact the consumer market across a range of sectors from therapy to ecommerce.
- To make a real difference in health care, AI will need to learn like we do: AI is poised to irrevocably change how we look to prevent and treat illness. However, to truly transform everything from drug discovery to care delivery, we should invest in creating an ecosystem of “specialist” AIs that learn like our best physicians and drug developers do today.
- The new industrial revolution: Bio x AI: The next industrial revolution in human history will be biology powered by artificial intelligence.
Other perspectives
- On the opportunities and risks of foundation models: Stanford overview paper on Foundation Models. Long and opinionated, but this shaped the term.
- State of AI Report: An annual roundup of everything going on in AI, including technology breakthroughs, industry development, politics/regulation, economic implications, safety, and predictions for the future.
- GPTs are GPTs: An early look at the labor market impact potential of large language models: This paper from researchers at OpenAI, OpenResearch, and the University of Pennsylvania predicts that “around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted.”
- Deep medicine: How artificial intelligence can make healthcare human again: Dr. Eric Topol reveals how artificial intelligence has the potential to free physicians from the time-consuming tasks that interfere with human connection, restoring the doctor-patient relationship. (a16z podcast)
Landmark research results
Most of the amazing AI products we see today are the result of no-less-amazing research, carried out by experts inside large companies and leading universities. Lately, we’ve also seen impressive work from individuals and the open source community taking popular projects into new directions, for example by creating automated agents or porting models onto smaller hardware footprints.
Here’s a collection of many of these papers and projects, for folks who really want to dive deep into generative AI. (For research papers and projects, we’ve also included links to the accompanying blog posts or websites, where available, which tend to explain things at a higher level. And we’ve included original publication years so you can track foundational research over time.)
Large language models
New models
- Attention is all you need (2017): The original transformer work and research paper from Google Brain that started it all. (blog post)
- BERT: pre-training of deep bidirectional transformers for language understanding (2018): One of the first publicly available LLMs, with many variants still in use today. (blog post)
- Improving language understanding by generative pre-training (2018): The first paper from OpenAI covering the GPT architecture, which has become the dominant development path in LLMs. (blog post)
- Language models are few-shot learners (2020): The OpenAI paper that describes GPT-3 and the decoder-only architecture of modern LLMs.
- Training language models to follow instructions with human feedback (2022): OpenAI’s paper explaining InstructGPT, which uses human feedback during training so that models better follow the instructions in prompts. This was one of the key unlocks that made LLMs accessible to consumers (e.g., via ChatGPT). (blog post)
- LaMDA: language models for dialog applications (2022): A model from Google specifically designed for free-flowing dialog between a human and chatbot across a wide variety of topics. (blog post)
- PaLM: Scaling language modeling with pathways (2022): PaLM, from Google, utilized a new system for training LLMs across thousands of chips and demonstrated larger-than-expected improvements for certain tasks as model size scaled up. (blog post). See also the PaLM-2 technical report.
- OPT: Open Pre-trained Transformer language models (2022): OPT is one of the top performing fully open source LLMs. This 175-billion-parameter model was trained on publicly available datasets, and its release comes with code. (blog post)
- Training compute-optimal large language models (2022): The Chinchilla paper. It makes the case that most models are data limited, not compute limited, and changed the consensus on LLM scaling. (blog post)
- GPT-4 technical report (2023): The latest and greatest paper from OpenAI, known mostly for how little it reveals! (blog post). The GPT-4 system card sheds some light on how OpenAI treats hallucinations, privacy, security, and other issues.
- LLaMA: Open and efficient foundation language models (2023): The model from Meta that (almost) started an open-source LLM revolution. Competitive with many of the best closed-source models but only opened up to researchers on a restricted license. (blog post)
- Alpaca: A strong, replicable instruction-following model (2023): Out of Stanford, this model demonstrates the power of instruction tuning, especially in smaller open-source models, compared to pure scale.
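For a concrete picture of the operation at the heart of the transformer papers listed above, here is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d)) V, in plain Python. It is a simplification made for illustration: a single head with no learned projection matrices, and a `causal` flag that approximates decoder-only behavior by letting each position attend only to itself and earlier positions. The function and variable names are our own.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # With causal masking, position i only sees positions 0..i.
        limit = i + 1 if causal else len(keys)
        scores = [
            sum(qj * kj for qj, kj in zip(q, k)) / math.sqrt(d)
            for k in keys[:limit]
        ]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Tiny made-up example: two positions, two dimensions.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v, causal=True))
```

With `causal=True`, the first position can only attend to itself, so its output is exactly the first value vector. Real implementations add learned projections, multiple heads, and batched matrix operations on accelerators.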
Model improvements (e.g. fine-tuning, retrieval, attention)
Image generation models
Agents
Other data modalities
Code generation
Video generation
Human biology and medical data
Audio generation
Multi-dimensional image generation
Special thanks to Jack Soslow, Jay Rughani, Marco Mascorro, Martin Casado, Rajko Radovanovic, and Vijay Pande for their contributions to this piece, and to the entire a16z team for an always informative discussion about the latest in AI. And thanks to Sonal Chokshi and the crypto team for building a long series of canons at the firm.