General

Digital Biology

Daphne Koller and Vijay Pande
Posted September 25, 2023

This conversation is part of our AI Revolution series, which features some of the most impactful builders in the field of AI discussing and debating where we are, where we’re going, and the big open questions in AI. Find more content from our AI Revolution series on www.a16z.com/AIRevolution.

Daphne Koller is the founder and CEO of insitro, a company using AI and machine learning to engineer drug discovery. In this conversation with a16z’s Vijay Pande, she discusses how an LLM for cells could revolutionize drug discovery and how to bridge the technological and cultural divide between building with atoms and building with bits.

  • [00:32] Why life sciences?
  • [03:23] AI in the life sciences
  • [07:02] LLM for cells
  • [11:37] Engineering disease and drug discovery
  • [13:33] Bits vs. atoms
  • [17:36] The opportunity ahead

Why life sciences?

Vijay: Daphne is the OG’s OG in AI. She was a pioneer at Stanford in several areas of AI, especially probabilistic graphical models (PGMs). She left Stanford to cofound Coursera with Andrew Ng and is now the founder and CEO of Insitro, a tech bio company using AI to develop drugs. Daphne, given all the things you could be doing, why life sciences?

Daphne: I think there are 3 parts to the question: why life sciences, why now, and why me? I’m going to answer all 3. Life sciences is one of the really hard and really important problems; there are very few things as challenging and exciting as intervening in human health in a safe and effective way. It’s something that absolutely needs to be done if we are going to use AI for good, which is one of the things I really strive to do.

The second part of the question is why now? What brought me back to this field in 2016, post-Coursera, was the realization that we can now, for the first time, measure biology at scale, both at the cellular level (sometimes at the subcellular level) and at the organism level, through new ways of quantifying human biology. That gives us the ability to deploy machine learning in ways that are truly meaningful, because the data sets are finally large enough for really interesting machine learning methods.

As for why me: I am a big believer in leverage, meaning places where you can have a disproportionately large impact. Because I spent a large part of my Stanford career working in 2 spaces simultaneously (core machine learning, and machine learning in service of biomedical data), I have the ability to bridge the chasm between these 2 very disparate disciplines.

When I was leaving Coursera in 2016, I saw that even then, when machine learning was tiny compared to where it is today, it was changing the world but wasn’t having much of an impact in the life sciences. I believe one of the main reasons is that very few people actually speak the language of both disciplines and are able to bring them together. I felt I could have an impact with AI in many areas, but here I could have a disproportionate impact.

AI in the life sciences

Vijay: You spoke about the why now. What’s your take on AI for life sciences specifically? What’s the “why now” there? What can we do now that we couldn’t do even 5 years ago?

Daphne: I think it comes back to this ability to collect, or even more than collect, to generate data at scale. One of the truly unique things that we have at Insitro is a data factory. We have put together tools, developed by others, for taking cells from you, or me, or anyone in this audience and reprogramming them into induced pluripotent stem cells, which can then be turned into a Daphne neuron in a dish or a Daphne hepatocyte. Those cells are going to be different from the Vijay neuron and the Vijay hepatocyte because we have different genetics. That’s going to manifest in how these cells look and behave, and in what we measure from them.

We can engineer those cells to introduce a disease-causing mutation and ask, “What does that disease-causing mutation do to a Daphne neuron versus a Vijay neuron? What does this mutation do versus that mutation?” We’re able to generate data on spec. That is a truly unique capability and, frankly, a rare one, even in other areas where AI is being deployed. In many cases you don’t get to make your own data, but here we do. That creates really important discovery opportunities for life sciences, and also really cool and interesting machine learning problems. You can start doing active learning or experimental design, choosing which experiment to run next based on where the model is least certain, and it’s a really exciting technical discipline at this point.
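To make “active learning or experimental design” concrete, here is a minimal sketch of query-by-committee, one common active-learning strategy; nothing here is claimed to be Insitro’s method. It assumes a hypothetical one-dimensional experimental knob (for example, a dose) and a hypothetical `run_assay` function standing in for a wet-lab measurement. A committee of k-nearest-neighbor models, each trained on a bootstrap resample of the data measured so far, predicts every untested candidate, and the next experiment is the one where the committee disagrees most.

```python
import math
import random
from statistics import fmean, pvariance


def knn_predict(train, x, k=3):
    """Predict an outcome at x by averaging the k measured points nearest to it."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return fmean(y for _, y in nearest)


def committee_disagreement(train, x, rng, n_models=10):
    """Query-by-committee score: variance of k-NN predictions across bootstrap resamples."""
    predictions = []
    for _ in range(n_models):
        bootstrap = [rng.choice(train) for _ in train]
        predictions.append(knn_predict(bootstrap, x))
    return pvariance(predictions)


def active_learning_loop(candidates, run_assay, seed_points, budget, rng):
    """Measure the seeds, then repeatedly run the experiment the committee is least sure about."""
    measured = {x: run_assay(x) for x in seed_points}
    for _ in range(budget):
        untested = [x for x in candidates if x not in measured]
        if not untested:
            break
        train = list(measured.items())
        next_x = max(untested, key=lambda x: committee_disagreement(train, x, rng))
        measured[next_x] = run_assay(next_x)  # the "data generation on spec" step
    return measured


# Hypothetical stand-in for a wet-lab readout; a real assay would be noisy and expensive.
def toy_assay(dose):
    return math.sin(dose)


candidates = [i * 0.25 for i in range(25)]  # doses 0.0, 0.25, ..., 6.0
measured = active_learning_loop(
    candidates, toy_assay, seed_points=[0.0, 6.0], budget=5, rng=random.Random(42)
)
print(sorted(measured))
```

Swapping the k-NN committee for a Gaussian process or an ensemble of neural networks would change only how disagreement is computed; the loop of “pick the most informative experiment, run it, retrain” stays the same.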

Vijay: Could you dive a little deeper and give an example? Your paper on the POSH approach came out as a preprint. Could you double-click on that? Tell people what you did there, and especially why AI in life science is a big deal. What could you hope to get?

Daphne: First, let me tell you a little bit about the platform, which is called POSH, or Pooled Optical Screening in Human cells. You take a bunch of cells, which can be cancer cells or whatever, and you put them with a pool of CRISPR guides that edit them. Each cell gets one guide, so now you have a pool of cells, each carrying a different genetic perturbation, all sitting together. You can measure them with a microscope as they move around and do their thing. Then you fix them and sequence the barcode that came with each guide. So now you can say, “This cell that got this guide behaved this way, and this other cell behaved that way.”
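The key step described here, linking each imaged cell to the guide it received by sequencing that guide’s barcode, is essentially a join between two tables. Below is a minimal sketch under stated assumptions: the guide library, barcode strings, cell IDs, and the single scalar phenotype are all hypothetical, and a real pipeline would use many image features and handle ambiguous barcode calls more carefully. Effects are reported relative to non-targeting control guides, a common convention in CRISPR screens.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical guide library: barcode sequence -> gene targeted by that guide.
GUIDE_LIBRARY = {"ACGT": "GENE_A", "TTGC": "GENE_B", "GGAA": "NON_TARGETING"}


def per_gene_effects(barcode_calls, phenotypes, library, control="NON_TARGETING"):
    """Join cells to guides via barcodes, average a phenotype per gene, subtract the control mean.

    barcode_calls: cell_id -> barcode read for that cell (None if the call failed)
    phenotypes:    cell_id -> scalar image feature measured for that cell
    Raises statistics.StatisticsError if no control cells were recovered.
    """
    values_by_gene = defaultdict(list)
    for cell_id, barcode in barcode_calls.items():
        gene = library.get(barcode)
        if gene is None or cell_id not in phenotypes:
            continue  # failed or unknown barcode, or cell missing from the image data
        values_by_gene[gene].append(phenotypes[cell_id])
    control_mean = fmean(values_by_gene[control])
    return {
        gene: fmean(values) - control_mean
        for gene, values in values_by_gene.items()
        if gene != control
    }


barcode_calls = {1: "ACGT", 2: "ACGT", 3: "TTGC", 4: "GGAA", 5: None, 6: "CCCC"}
phenotypes = {1: 10.0, 2: 12.0, 3: 20.0, 4: 15.0, 5: 99.0, 6: 50.0}
effects = per_gene_effects(barcode_calls, phenotypes, GUIDE_LIBRARY)
print(effects)  # {'GENE_A': -4.0, 'GENE_B': 5.0}
```

Cells 5 and 6 are dropped because their barcodes could not be matched to the library, which is why, as Daphne notes next, barcode calling itself has to be reliable.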

One of the really challenging things about cells is that, because they’re alive, if you put different cells in different wells, each well has a slightly different environment, and you get subtle differences that are really hard to reconcile. When the cells are all in a pool, you eliminate those artifacts, and all of a sudden you can run a genome-wide CRISPR screen: the roughly 20,000 genes in the genome each perturbed, one per cell, in the same cellular background and the same dish, and you’re measuring that genome-wide screen in 10 or 12 plates in 2 weeks.
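Why pooling removes well-to-well artifacts can be seen in a small simulation. This is a toy model under explicit assumptions, not data: each cell’s readout is its perturbation’s true effect, plus an offset shared by every cell in the same well (the “slightly different environment”), plus cell-level noise. The offsets and effect sizes are made up for illustration.

```python
import random
from statistics import fmean


def simulate_cells(true_effect, well_offset, n_cells, rng, noise_sd=0.2):
    """Hypothetical per-cell readout: gene effect + shared well environment + cell-level noise."""
    return [true_effect + well_offset + rng.gauss(0.0, noise_sd) for _ in range(n_cells)]


rng = random.Random(7)
TRUE_A, TRUE_B = 1.0, 0.0  # assumed true phenotypic effects of two perturbations

# Arrayed: each perturbation sits in its own well, with its own (unknown) environment offset.
arrayed_diff = fmean(simulate_cells(TRUE_A, 0.8, 1000, rng)) - fmean(
    simulate_cells(TRUE_B, -0.3, 1000, rng)
)

# Pooled: both perturbations share one well, so the environment term is identical and cancels.
pooled_diff = fmean(simulate_cells(TRUE_A, 0.5, 1000, rng)) - fmean(
    simulate_cells(TRUE_B, 0.5, 1000, rng)
)

print(round(arrayed_diff, 2), round(pooled_diff, 2))
```

In the arrayed case the estimated difference is off by the gap between the two wells’ offsets (0.8 - (-0.3) = 1.1 here), and with one well per perturbation there is no way to separate that offset from the biology. In the pool, both perturbations see the same offset, so it subtracts out and the estimate lands near the true difference of 1.0.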

Imagine doing that, then rinse and repeat: a genome-wide screen on this genetic background, or in that cell type. You can really start to decipher the genotype-phenotype connection, the ways in which individual genetics affects cellular phenotypes, which we then translate into the clinical impact we believe those differences will have. That is the beginning of an understanding of what we want to modify to achieve meaningful therapeutic interventions. This is a truly engineered approach to discovery.

Developing an LLM for cells

Vijay: The biology part is really critical because that’s how you get the data, and we all know how important data is. One of the things I found intriguing is the creation of a latent space for human biology, and especially being able to tell the difference between disease and non-disease, or even between different disease phenotypes. How does that come about, and how is AI driving it?

Daphne: I’m going to go back a step, because you said one of the things we need to do is get the data. I should have mentioned that it’s impossible to run this instrument without AI built into it: without AI, you can’t even segment the cells or call the barcodes. The whole thing is an AI-enabled architecture.

Every part of our technology stack is intrinsically AI-enabled. To your point, Vijay, now you have a whole bunch of cellular images; what do you do with them? The first thing we did was build this latent space. We built a language model for biology, but at first I had to explain what that meant, because no one knew what I was talking about. Now I just say, “It’s just like GPT, but for cells.”

We have the language of cells: what cells look like, or their transcriptional (gene expression) profiles. You measure hundreds of millions of cells in different states. Then, because you have this latent space, you need only a much more limited amount of data, just as with large language models for natural language, to start asking, “How does a disease-causing gene move you from one place to another? How does a treatment, hopefully, move you from the disease state back to the healthy state?” That’s super powerful. And it’s the gift that keeps on giving.
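The question “how does a treatment move you from the disease state back to the healthy state?” has a simple geometric reading once cells live in a latent space. The sketch below is one illustrative way to score it, not Insitro’s method: it assumes per-cell embeddings already produced by some pretrained model (the vectors here are hypothetical 2-D toys), defines a disease axis from the healthy centroid to the disease centroid, and measures how far the treated centroid moves back along that axis.

```python
from statistics import fmean


def centroid(embeddings):
    """Mean embedding, computed dimension by dimension."""
    return [fmean(dim) for dim in zip(*embeddings)]


def subtract(u, v):
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rescue_fraction(healthy, disease, treated):
    """Fraction of the healthy-to-disease shift that treatment undoes, along the disease axis.

    1.0 means back to the healthy centroid along that axis, 0.0 means no change,
    and a negative value means the treatment pushed cells further toward disease.
    """
    h, d, t = centroid(healthy), centroid(disease), centroid(treated)
    disease_axis = subtract(d, h)
    axis_length_sq = dot(disease_axis, disease_axis)
    if axis_length_sq == 0:
        raise ValueError("healthy and disease centroids coincide; no disease axis to measure")
    return dot(subtract(d, t), disease_axis) / axis_length_sq


# Hypothetical 2-D embeddings; real ones would have hundreds of dimensions.
healthy = [[0.0, 0.0], [0.0, 2.0]]
disease = [[4.0, 0.0], [4.0, 2.0]]
treated = [[1.0, 1.0], [3.0, 1.0]]
print(rescue_fraction(healthy, disease, treated))  # 0.5
```

The point of the pretrained latent space is that only a handful of labeled samples per condition are needed to compute these centroids. Projecting onto the axis ignores movement in other directions, so a treatment that lands off-axis can score well while still looking unlike healthy cells; also checking the distance to the healthy centroid is a cheap safeguard.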

Like other language models, it keeps getting better the more data you feed it. Over time, you build a deeper and deeper competitive moat, because understanding the core foundations of biology helps you better understand disease and health. This is not just for cellular data. The other source of data that we use is clinical data.

We do the same thing with histopathology. There’s so much more in a histopathology slide than a pathologist typically looks at. In MRI data, your radiologist doesn’t see more than a small percentage of what’s there in your images. And it’s not just imaging; there are other modalities with an equal amount of information left on the table. Over time, we’re learning the languages of different biological modalities and the ability to translate between them.

Engineering disease and drug discovery

Vijay: This concept of a foundation model for biology is particularly exciting because 10 years ago, you could have predictive ML, but you needed maybe 100 measured activities to train it. The problem is that if you already have 100 examples of a drug that works, you don’t need to design a drug. The few-shot and zero-shot approaches that come from a foundation model are night and day. How far does this go? The big problem in biology is that biology is hard.

Daphne: Biology is really hard. Sometimes I ask myself, “Why am I doing this?” I could go write an app for, like, a chat agent company.

Vijay: It would be a lot easier. So, why are you doing it? What is the big win? Where does this go by the end of the decade? What could you hope to do that we couldn’t do before?

Daphne: We want to do it differently and come up with a very systematic recipe to go from a decision like “I want to work on ALS or fatty liver disease,” through a sequence of steps, to something that results in a meaningful intervention in the right patient population.

The hope is by the end of this decade, we will have built this process, run through it a number of times, and delivered some medicines to patients in our first tranche of indications. Then we will have learned enough from that so we can now say, “Here’s how we’re going to do it here, and here, and here.”

It’s not only machine learning that moves forward over time; it’s also the biological tools that we rely on. It used to be that there wasn’t any CRISPR, just siRNA. Actually, there wasn’t even that. Then came CRISPR, then CRISPR base editing, and now prime editing, which can search and replace targeted stretches of DNA. The tools that we’re building on keep getting better over time, which unlocks more and more diseases that we can tackle in a meaningful way.

Vijay: Let’s step back for a second, because it may not be clear to everyone why biology is so hard. One of the biggest reasons is that we can do tons of experiments on mice; it’s a great time to be a rich mouse. You could be cured of any disease. All these diseases can be cured in mice, but it’s obviously unethical to experiment on people the same way. That’s one of the big reasons why trials fail: you spend all this money getting to a clinical trial, you spend hundreds of millions of dollars more on the trial itself, and it turns out mice are different from people, so it fails. How can AI help with that?

Daphne: First of all, this notion that we can cure lots of mice is something that really drove our discovery strategy at Insitro: all of our work is done in human or human-derived systems, which incorporate at least some subset of human cells working together.

That’s one piece, and the nice thing about it is that it allows you to intervene in those systems and ask the “what if” questions: counterfactuals like, “What if I had this person’s biology, but in a world where this gene was inactive instead of active, or the other way around?” That’s great, but obviously you want to cure people, not cells or even organoids, so the other source of data we bring in is data from people, from clinical records.

Without machine learning, without AI, the space would be so complex and so high-dimensional that you couldn’t even make sense of it, let alone build a bridge between those 2 worlds.

Bridging the divide between bits and atoms

Vijay: That makes sense. Let’s change gears a bit and talk a bit about company building. One of the interesting things that you’ve done is you’ve brought together people who are biology experts with people who are ML/AI experts. How do you build that culture? What does that look like, especially since they’re from fairly different parts of the universe?

Daphne: First of all, it may not be obvious to everybody, but the company name, Insitro, is actually a blend of “in silico” and “in vitro”: “in silico” meaning in the computer and “in vitro” meaning in the lab. Bringing those 2 strands together is so deeply woven into who we are that it’s even in our logo.

Building that is really hard. If you take your average machine learning scientist and your average life scientist, even if they’re very well-intentioned, and put them in a room together, they might as well be speaking Thai and Swahili to each other. The languages are different, and the ways they think are totally different. So how do you create a shared language and a shared vision?

There are a few tricks or approaches that we use. First, we hire some number of people who are in the middle and can act as translators for both sides and bring them together; unfortunately, you can never get enough of them. The other really important part is that you create a culture of people who are genuinely interested in engaging with the other side, and you hire very rigorously for that culture.

We have a list of company values. The final value, which is one that I hold particularly dear—it’s last, not because it’s least important, but because they’re ordered from what we do to how we do it—is that we engage with each other openly, constructively, and with respect. Each of those words matters. Engage means that we’re not siloed. All of our work is done in cross-functional project teams. “Openly” means an openness to asking really naive questions when you don’t understand and to accepting really naive suggestions from somebody else because sometimes, the best ideas come from an orthogonal mindset.

Vijay: Especially as AI gets into areas that are not just the world of bits, but in the world of atoms: any advice for how to bridge those gaps?

Daphne: Have an appreciation for the complexity of atoms. Especially when your atoms are part of living systems, they behave in unexpected, unpredictable, idiosyncratic ways that sometimes cause a lot of pain. When you apply machine learning to biological experiments, one of the strongest signals is who the technician was who actually did the experiment. Each technician behaves a little bit differently, pipettes a little bit differently, treats the cells a little bit differently. It’s amazing how hard it is to clean that up, which is one of the reasons why we spend so much of our time building robots: they do the same thing over and over again.
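The technician effect described here is a classic batch effect. Robots attack it at the source; a common analysis-side mitigation is to center each measurement on its batch mean. The sketch below assumes a hypothetical design in which every technician ran both a control and a treated sample, and that assumption matters: if technician and condition were confounded (one person ran all the treated samples), centering would erase the real biology along with the artifact.

```python
from collections import defaultdict
from statistics import fmean


def center_by_batch(records):
    """Subtract each batch's mean from that batch's values.

    records: list of (batch_id, condition, value) tuples, e.g. batch_id = technician.
    Returns a new list in the same order with centered values.
    """
    values_by_batch = defaultdict(list)
    for batch_id, _, value in records:
        values_by_batch[batch_id].append(value)
    batch_means = {batch_id: fmean(values) for batch_id, values in values_by_batch.items()}
    return [
        (batch_id, condition, value - batch_means[batch_id])
        for batch_id, condition, value in records
    ]


# Hypothetical readouts: tech_2's samples all read about 10 units higher than tech_1's.
records = [
    ("tech_1", "control", 10.0),
    ("tech_1", "treated", 12.0),
    ("tech_2", "control", 20.0),
    ("tech_2", "treated", 22.0),
]
for row in center_by_batch(records):
    print(row)
```

After centering, the 10-unit technician offset is gone, while the treated-minus-control difference of 2.0 survives within each technician’s batch.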

I think it means having a lot of respect for atoms, but also appreciating that the next frontier of AI’s impact is when AI starts to touch the physical world. We’ve all seen just how much harder that is. We’ve all seen how hard it is to build a self-driving car compared to building a chatbot: we’ve made so much progress on chatbots, and self-driving cars are still blocking fire trucks in San Francisco. So appreciate that complexity, but also appreciate the magnitude of the impact if you can actually nail it.

The opportunity ahead

Vijay: You’re talking about life sciences in terms of healthcare and drug design, but there’s a lot more to biology than just drugs. Where do you think this confluence between AI and life sciences goes from here?

Daphne: I actually think there is an incredible opportunity at this intersection between the 2 fields. Think back on the history of science. At certain times, a particular scientific discipline has made incredible amounts of progress in a relatively short amount of time because something clicked: we started to see the world in a different way, or a tool became available that wasn’t available before.

If you think back to the late 1800s, that discipline was chemistry, when we suddenly realized we couldn’t really turn lead into gold. There was this thing called the periodic table, and there were electrons. It really shifted chemistry. Then, in the early 1900s, the discipline was obviously physics. The connections between energy and matter and between space and time completely shifted our understanding of the universe.

In the 1950s, that discipline was computing. We got machines that performed calculations that, up until that point, only a human had been able to perform. Then in the 1990s, there was an interesting bifurcation. On one side was data science, which ultimately drew on computers but also had elements of neuroscience, optimization, and statistics. That ultimately gave us modern-day machine learning and AI.

On the other side was quantitative biology, when for the first time we started to measure biology on a scale beyond tracking 3 genes across an experiment that took 5 years. That brought the first microarray data and the first human genome.

The time we’re living in now is when those last 2 disciplines are going to merge. They’re giving us an era of what I think of as digital biology: the ability to measure biology at unprecedented fidelity and scale; to interpret the unbelievable masses of data, across different biological scales and different systems, using the tools of machine learning and data science; and then to bring that back to engineer biology using tools like CRISPR and genome editing, so we can make biology do things that it would otherwise not do.

Vijay: Like what?

Daphne: There are applications in human health and in agriculture. I don’t think we need to tell anybody anymore about the impact of global warming and climate change on our world, although there are still some people who might need to hear it, or about the fact that we need crops that are much more resistant to drought and severe weather.

Vijay: And to feed 10 billion people.

Daphne: To feed 10 billion people. There are opportunities in the environment, maybe doing better carbon sequestration using plants or algae. There are biomaterials, and so on. There are so many opportunities at this intersection, and I would encourage any of you in this audience who are looking for something truly aspirational and exciting to do to look here. This convergence is a moment in time for us to make a really big difference in the world we live in, using tools that exist today that did not exist even 5 years ago.

Vijay: I think that’s the opportunity at hand. We’ll wrap up there. Let’s thank Daphne one more time.


Views expressed in “posts” (including podcasts, videos, and social media) are those of the individual a16z personnel quoted therein and are not the views of a16z Capital Management, L.L.C. (“a16z”) or its respective affiliates. a16z Capital Management is an investment adviser registered with the Securities and Exchange Commission. Registration as an investment adviser does not imply any special skill or training. The posts are not directed to any investors or potential investors, and do not constitute an offer to sell — or a solicitation of an offer to buy — any securities, and may not be used or relied upon in evaluating the merits of any investment.

The contents in here — and available on any associated distribution platforms and any public a16z online social media accounts, platforms, and sites (collectively, “content distribution outlets”) — should not be construed as or relied upon in any manner as investment, legal, tax, or other advice. You should consult your own advisers as to legal, business, tax, and other related matters concerning any investment. Any projections, estimates, forecasts, targets, prospects and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Any charts provided here or on a16z content distribution outlets are for informational purposes only, and should not be relied upon when making any investment decision. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, posts may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein. All content speaks only as of the date indicated.

Under no circumstances should any posts or other information provided on this website — or on associated content distribution outlets — be construed as an offer soliciting the purchase or sale of any security or interest in any pooled investment vehicle sponsored, discussed, or mentioned by a16z personnel. Nor should it be construed as an offer to provide investment advisory services; an offer to invest in an a16z-managed pooled investment vehicle will be made separately and only by means of the confidential offering documents of the specific pooled investment vehicles — which should be read in their entirety, and only to those who, among other requirements, meet certain qualifications under federal securities laws. Such investors, defined as accredited investors and qualified purchasers, are generally deemed capable of evaluating the merits and risks of prospective investments and financial matters.

There can be no assurances that a16z’s investment objectives will be achieved or investment strategies will be successful. Any investment in a vehicle managed by a16z involves a high degree of risk including the risk that the entire amount invested is lost. Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by a16z is available here: https://a16z.com/investments/. Past results of a16z’s investments, pooled investment vehicles, or investment strategies are not necessarily indicative of future results. Excluded from this list are investments (and certain publicly traded cryptocurrencies/ digital assets) for which the issuer has not provided permission for a16z to disclose publicly. As for its investments in any cryptocurrency or token project, a16z is acting in its own financial interest, not necessarily in the interests of other token holders. a16z has no special role in any of these projects or power over their management. a16z does not undertake to continue to have any involvement in these projects other than as an investor and token holder, and other token holders should not expect that it will or rely on it to have any particular involvement.

With respect to funds managed by a16z that are registered in Japan, a16z will provide to any member of the Japanese public a copy of such documents as are required to be made publicly available pursuant to Article 63 of the Financial Instruments and Exchange Act of Japan. Please contact compliance@a16z.com to request such documents.

For other site terms of use, please go here. Additional important information about a16z, including our Form ADV Part 2A Brochure, is available at the SEC’s website: http://www.adviserinfo.sec.gov.