Posted February 11, 2021

“Premature optimization is the root of all evil.” 

– Donald Knuth, “Structured Programming with go to Statements” (1974)

Imagine a 3D surface with peaks and valleys of complex shapes. Then expand that surface into many more than three dimensions, to account for the dozens of required properties of a successful medicine — target affinity, biological function, safety, manufacturability, and more. This is what drug discovery is like. If each point on this vast surface is a drug, how do you find the best one? How do you avoid prematurely optimizing along just a single parameter, and missing the most useful candidates entirely?

Now bring this framework to biologic drugs—one of the fastest-growing and most impactful categories of new therapeutic products. They include widely used monoclonal antibodies, fusion proteins, growth factors, hormones, and more. Biologics are, in principle, among our most programmable medicines: unlike small molecule drugs, every protein therapeutic has an underlying genetic sequence. But biologic drugs also pose unique challenges (e.g., stability, immunogenicity), adding many more dimensions to manage. So which (of many) protein sequences will solve the therapeutic problem at hand? How do you optimize enough, but not too much, and not too early?

The traditional approach to solving this problem relies on experimental screens with large sequence libraries. These have long been the workhorses of antibody discovery, with growing library sizes and incrementally improving in vitro display technologies. But even a very large library (e.g., 10^13 sequences) is only enough to cover every possible variant across roughly 10 amino acid positions (20^10 ≈ 10^13 sequences). The relevant search space for most protein therapeutics spans a much longer stretch of sequence (e.g., 50-250 amino acids), which would require libraries many orders of magnitude larger. As a result, experimental screens sample only a tiny fraction of all possible sequences.
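To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the standard 20-letter amino acid alphabet and treats every position as freely variable, which is a simplification. The function names are illustrative only and do not come from any BigHat tooling.

```python
import math

AMINO_ACIDS = 20  # assumption: standard amino acid alphabet, every position free to vary


def log10_sequence_space(length: int) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(AMINO_ACIDS)


def equivalent_positions(library_size: float) -> float:
    """Return how many fully randomized positions a library of this size could cover."""
    return math.log10(library_size) / math.log10(AMINO_ACIDS)


def shortfall_orders_of_magnitude(length: int, library_size: float) -> float:
    """Return how many orders of magnitude the sequence space exceeds the library."""
    return log10_sequence_space(length) - math.log10(library_size)


if __name__ == "__main__":
    library = 1e13
    print(f"A library of {library:.0e} covers about "
          f"{equivalent_positions(library):.1f} fully randomized positions")
    for length in (50, 250):
        gap = shortfall_orders_of_magnitude(length, library)
        print(f"{length} positions: space exceeds the library by ~{gap:.0f} orders of magnitude")
```

Working in log space keeps the numbers manageable: the sequence space for 250 positions is far too large to be meaningfully compared as a raw count, but the gap in orders of magnitude is easy to read.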

This is where machine learning can be so powerful. BigHat Biosciences is building a combined experimental and computational platform for the future of biologic drug design, moving from a screening-driven mindset towards iterative engineering. Machine learning models can integrate data from many experiments over time, and optimize over the multi-parameter search space of protein sequences far more completely than any individual screen can. The effects of infusing engineering into every step of the process are exciting. With models guiding each round, BigHat can design smarter, more informative experiments that tell us where we need to go, instead of asking us to explore the space blindly. The team is also building an automated, high-throughput wet lab, leveraging tools from the world of synthetic biology. And everything in this lab, from common tasks such as cloning to quality control, is optimized with computational insights.

I’ve known Mark DePristo, co-founder and CEO of BigHat, for over 10 years. We first met at the Broad Institute, where Mark was leading one of the largest and most impactful professional bioinformatics software teams in all of academia. He subsequently ran Google Brain’s Genomics team. BigHat’s co-founder Peyton Greenside is an equally prolific computational biologist, who combines an infectious intellectual curiosity with the pragmatism needed to grow a company.

The BigHat team represents the next wave of biotech entrepreneurs who are building at the intersection of engineering and biology, ushering in a new era of productivity in biologic drug development, with tremendous potential for patient impact. We are honored to lead their Series A financing, and I’m thrilled to be joining the BigHat board of directors.