Peter Donnelly is a professor of statistical science at the University of Oxford and the CEO of Genomics PLC, which uses large-scale human genetic data to inform healthcare and drug discovery. In this interview he, Lauren Richardson, and a16z general partner Vineeta Agarwala discuss the concept of polygenic risk scores and how they let us quantify how much your personal genetic configuration, as a whole, affects your risk for disease.
They explore how our deepening understanding of the genetic components of common diseases, such as diabetes, heart disease, and cancer, is helping empower a shift from ‘sick care’ to risk management and preventative medicine. But first, the conversation begins with an explanation of how geneticists’ thinking about the basis of disease has been shifted by technological advances that make genome sequencing cheaper and faster.
Note: This interview was originally published as an episode of Bio Eats World. The transcript has been lightly edited for clarity.
PETER DONNELLY: So, we’ve known for a long time that genetics is involved in susceptibility to human diseases. And there’s a spectrum of diseases as far as that’s concerned. At one end of the spectrum, there are some diseases where genetics is all the story. If you inherit in some cases one or two mutated or incorrect versions of a gene, you definitely get sick. Cystic fibrosis is an example, Huntington’s disease is an example. They’re usually serious diseases, but individually they’re very rare.
Then at the other end of the spectrum are all of the common chronic diseases, all of the common cancers where there are other risk factors often to do with our lifestyles or our environment. For those diseases, over about the last 20 years, there’s been an explosion in our knowledge and our understanding of how genetics plays into that. We’ve learned that many, many parts of our genome play into the risk for those diseases but in small ways. It isn’t that there’s a single gene for heart disease or a single gene for diabetes. Instead, thousands, or tens of thousands, or maybe even hundreds of thousands of positions in our genome affect risk for a particular disease.
VINEETA AGARWALA: So, one visual that I like using when explaining this concept is a video game character marching through the genome, and he has a pillowcase with him. And every time he hits a variant site where you could have, let’s say an A or a T, sometimes the A will be a risk factor. And so he picks up a stone and puts it in his pillowcase and then keeps marching along the whole genome. Every time you encounter a risk site, you pick up one [stone] and you march through the whole genome this way.
At the end, each patient’s pillowcase, if you will, is full of a different burden of genetic risk stones. But if you look across the population, we all share many of the same stones in our bag of risk. And so we have a lot of shared overlap in our risk factors and pathways by which we get to a certain disease outcome like diabetes, but we all have different amounts and different mixes and blends of that risk profile.
PETER: It’s a really helpful visual, I think, Vineeta. The two things I’d add to it are that we’re probably thinking of grains of sand or pebbles because there’re so many of them.
VINEETA: Yes. Because there’s millions.
PETER: And the other thing to say is that sometimes the little video game character would get to a variant that actually reduced risk, so they’d be taking out a pebble. So, some of us will have more burden for that specific disease because we’ve got rather more of these variants that have a tiny increase in effect than the ones that decrease, and some of us will have a low burden for that disease because we’ve got rather more of the ones that decrease risk, so [we get to] take the pebbles out of the sack.
VINEETA: But essentially for those of us who do end up with diabetes, actually the majority of our risk factors are likely to be found in other people’s sacks. And that’s why if Peter and I both have diabetes, we’re still actually likely to respond to the same drugs, which from a clinical perspective is great.
LAUREN RICHARDSON: I see. So, it’s the commonality of these variants and the shared way in which we accumulate them that leads to being able to treat a disease through a particular route and have that be efficacious.
PETER: It’s a really good point. All of the variants that contribute to risks, they’re all individually common. You know, we talk about an A or a T at some position in the genome, and maybe 30% [of people] have the A and 70% have the T. So, they’re both widespread in the population. It’s not like the other end of the spectrum where there are very rare variants that have a huge impact on an individual. These are common variants which have a tiny impact.
VINEETA: I think it is worth noting that this whole hypothesis that it is actually common variants explaining the bulk of common disease sounds obvious now but was fairly controversial even a few decades ago. People were convinced that it was only rare disease mutations that could produce big enough effects to cause disease. The thought was that mutations that cause disease should be selected against evolutionarily and they should therefore be rare in the population. So, when we look at a disease like diabetes or heart disease, you know, gosh, we better be able to find a large collection of rare mutations that individually produce disease in different people.
In hindsight, it’s really hard to explain such a common disease with a very large collection of rare mutations. Instead, we’ve now learned through a number of large statistical studies that so many sites of the genome are involved, but it was a really meaningful step forward that was based on data.
PETER: Yeah. People have been trying to identify the genetic variants which are associated with risk for common diseases for 15 or 20 years. And [we haven’t found many], apart from certain genes involved in immune function or maybe a handful of examples of individual genetic variants that affected risk to common diseases.
Then we were able to do studies of a different type and at a different scale, so-called genome-wide association studies. [These are where] you look at a large number of people who have the disease and a large number of people who don’t, and you measure them at, say, half a million positions in their genome. You just look for places amongst those half-million which differ in frequency between the sick people and the healthy people because if one of those variants is more likely to make someone sick–to develop, say, heart disease in our example–then it’ll be more common amongst people with heart disease.
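The comparison described here, looking for positions where a variant's frequency differs between people with and without the disease, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the position names and counts are made up, the function names are hypothetical, and a plain Pearson chi-square statistic on a 2x2 table of allele counts stands in for the more careful statistical models that real genome-wide association studies use.

```python
def allele_frequency(allele_count: int, n_people: int) -> float:
    """Frequency of an allele among the 2 * n_people chromosomes in a group."""
    return allele_count / (2 * n_people)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def scan_positions(counts, n_cases, n_controls):
    """Rank positions by how strongly allele frequency differs between groups.

    counts maps a position name to a pair:
    (copies of the allele among cases, copies of the allele among controls).
    """
    results = []
    for pos, (case_count, control_count) in counts.items():
        case_other = 2 * n_cases - case_count
        control_other = 2 * n_controls - control_count
        stat = chi_square_2x2(case_count, case_other, control_count, control_other)
        results.append(
            (
                pos,
                allele_frequency(case_count, n_cases),
                allele_frequency(control_count, n_controls),
                stat,
            )
        )
    # Largest statistic first: the biggest case/control frequency differences.
    return sorted(results, key=lambda r: r[3], reverse=True)


# Made-up allele counts for three hypothetical positions.
toy_counts = {
    "pos_1": (700, 600),  # more common among cases
    "pos_2": (505, 500),  # essentially no difference
    "pos_3": (300, 400),  # less common among cases
}
ranked = scan_positions(toy_counts, n_cases=1000, n_controls=1000)
for pos, freq_cases, freq_controls, stat in ranked:
    print(pos, round(freq_cases, 3), round(freq_controls, 3), round(stat, 2))
```

A real study measures hundreds of thousands of positions, so it also has to set a very strict threshold before calling any single difference real; the sketch leaves that out.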
So we were able to do those at scale for the first time about 15 years ago. It was an extraordinary time in the field because, having tried for many years and gotten almost nowhere, suddenly we found about 15 or 20 variants for 7 common diseases. One looks back somewhat wistfully at those days because there are now almost 100,000 variants in different places in our genome which individually are associated with risk. But in the early days, having gotten nowhere, suddenly we were able to find these, and as Vineeta was saying, they are common variants.
VINEETA: Now we know from those studies the size of the pebble at each position because those studies tell us what the effect of having a particular mutation is on risk for a certain disease. So, now, we’ve annotated these hundreds of thousands of sites by exactly the weight of the stone that you would pick up if you were doing the genome march that we talked about.
LAUREN: Yeah. It’s just a different way of thinking about variants as opposed to being like, “This is the one variant that’s causing disease.” It’s adding to our cumulative knowledge of how, you know, a large number of variants all contribute to our disease risk.
LAUREN: Now that we’ve talked about how complicated and difficult to unravel human genetics is, let’s talk about what we have for genetic tests in the clinic and what they’re used for today.
PETER: So, genetic testing today is all about the end of the spectrum where there are genetic variants which have very big effects. There are cases where we know that there are particular genes or particular variants that have a big impact on an individual’s risk. These are either cases where someone’s already really ill and we’re trying to work out what’s causing it, or cases where a family history of disease suggests that there might be one of these mutations which has a big effect in a family, so we try and check the gene in a particular individual to get a bit better sense of their risk.
An example would be testing the genes BRCA1 or BRCA2, where a particular type of mutation has a big impact on a woman’s risk of breast cancer. The risk for an average woman is a bit over 10% lifetime, but with a BRCA mutation, it might be 50%, 60%, or even 80% lifetime risk. So, there are a number of examples where we know there are genes where mutations have a big impact, and most current genetic testing is around those.
VINEETA: I would say that’s borne out in even the clinical training path for clinical genetics today. Certainly in the U.S., training in clinical genetics is essentially a tour through very rare disease clinics where it’s young kids with developmental disorders, it’s patients with inherited rare cardiomyopathies.
That’s the focus because that’s where we are comfortable using genetic testing in clinical practice today. It’s not in common disease even though we now understand that maybe upwards of half of the risk for many common diseases is actually inherited, but it doesn’t fit that original mold of one variant, one disease.
LAUREN: Right. So, what we’re using genetic testing for today is looking for rare mutations that cause a big effect, but this is not the majority of how genetics influences our risk of disease. Instead, it is this amalgamation of common variants that all add up to affect our disease risk. So, how are we now thinking about wrangling and resolving this complexity of human genetics into actionable information for patients and doctors?
PETER: We’ve known for a long time that genetics is part of the risk of it. We’ve now got a way of quantifying it, and we can measure the impact. So in the analogy of someone walking along with a sack picking up pebbles or pieces of sand, we can tell where the positions that matter are, how big the pebble is that you are adding or taking out of the sack, and that they’ll end up with a sack of a particular weight. We call the resulting number the polygenic risk score. It’s just effectively adding up the effects of these million or so positions in the genome.
VINEETA: A weighted sum.
PETER: Yep. For an individual, for a specific disease, you can get the polygenic risk score. If we did that for a lot of individuals in a population, we get a range of scores. Some of the individuals would be at the high end because they just happen to get more of the variants that increase their risk, and some would be at the low end.
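The weighted sum described above can be written out as a short Python sketch. Everything concrete in it is an assumption made for illustration: the variant names, the weights, and the tiny population are invented, and the function names are hypothetical rather than taken from any real scoring tool. Negative weights play the role of the variants that take a pebble out of the sack.

```python
def polygenic_risk_score(genotype, weights):
    """Weighted sum over variants.

    genotype maps a variant name to the number of copies of the scored
    allele a person carries (0, 1, or 2); weights maps the same names to
    a per-copy effect. Variants missing from genotype count as 0 copies.
    """
    return sum(weights[v] * genotype.get(v, 0) for v in weights)


def percentile_rank(score, population_scores):
    """Percentage of the population with a score at or below this one."""
    at_or_below = sum(1 for s in population_scores if s <= score)
    return 100.0 * at_or_below / len(population_scores)


# Invented weights: positive raises risk, negative lowers it.
weights = {"variant_a": 0.25, "variant_b": -0.5, "variant_c": 0.125}

# One hypothetical person: two copies of a, one of b, none of c.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
score = polygenic_risk_score(person, weights)

# An invented population of scores to place that person on the range.
population_scores = [-1.0, -0.5, 0.0, 0.5, 1.0]
rank = percentile_rank(score, population_scores)
print(score, rank)
```

In practice the sum runs over something like a million positions and the population is a large cohort, but the arithmetic is the same: multiply, add up, and then see where the total falls on the range of scores.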
It’s only in the last couple of years that we’ve had the ability to detect enough of those variants to know which they are and look at the impact of these scores in large cohorts and say, “Okay. Suppose I know where you are on that range of scores, how much difference does it make for that disease?” It turns out it makes a lot of difference. If you take breast cancer, for example, putting aside the BRCA genes where mutations have a big impact on a woman. If you calculate this polygenic risk score, a woman in the middle of the distribution would have a lifetime risk of breast cancer of maybe 10% or 11%.
The women in the top few percent for this score, the ones who have got rather more of the unhelpful variants for breast cancer, their lifetime risk is more like 35%, so they’re at about threefold risk compared to the average. In the other direction, the women in the bottom few percent have a lifetime risk which is about 3%. So there are quite big differences across different women.
You can look at it a different way and look at it in terms of age. A woman in the top few percent has the same risk of breast cancer in her early 40s as a typical woman does in her early to mid-50s. In the UK, we offer screening to all women for breast cancer by mammograms at age 50.
Now, in a sensible version of the world, if we knew about these scores, we’d be doing something very different.
We’d be picking the women who are at [the 50-year-old] level of risk when they’re 40 and screening them earlier and probably more often. For the women who are at lower risk, we’d probably screen a bit less often, and we’d start the screening later. We can do that for many, many diseases now. We have the opportunity to try and use this on people who are currently healthy to understand their risk going forward for the next 10 or 15 or 20 years.
We can work out how to react to that, such as changing the way we screen, which would be natural for some of the cancers, to give us a chance to catch disease much earlier when the outcomes are better. Or we can take targeted therapeutic interventions, statins for heart disease, for example, and target those at the right individuals more effectively than we can do it currently. Or, individuals themselves understand their risk and work out, with the help of their doctors, what kinds of changes they should be making in lifestyle or diet to reduce their risk for the one or two diseases that they happen to be at increased risk of.
VINEETA: Peter mentioned the idea of scores. Just stepping away from genetics for a second, clinicians love scores. Every doctor has several apps on their phone built explicitly to calculate scores to separate patients into different groups. We just aren’t used to using genetic data in those scores.
So, Peter mentioned cardiology. Every primary care doctor and cardiologist is familiar with the Framingham risk score. This is a score that’s based on clinical risk factors and predicts a patient’s 10-year risk of developing coronary artery disease. The score is based on age, gender, cholesterol levels, blood pressure, smoking history. Based on that, it produces a rough risk of how likely the patient is to develop coronary artery disease. And based on certain admittedly arbitrary cutoffs, we decide whether or not to give patients statins and aspirin to modulate that risk of coronary artery disease, right?
So there’s an example of an imperfect score that we already use widely in practice to stratify patients based on their risk of developing a certain disease in the future. So, we now apply this score but it doesn’t include all the genetic information that Peter is talking about using in a polygenic risk score. It’s just now that we’ve had enough data across the whole genome to be able to bring genetics into the picture, but we love scores. We love calculating them, and we like telling patients about their scores, and we like making decisions based on scores.
PETER: That’s exactly right, I think. In clinical medicine, we already stratify patients by risk using different tools, but we’ve had no good way of adding the genetic component into the scores we already use. And now we have that way.
LAUREN: Do we know enough about the variants? Have we characterized all the variants, or is this something that’s gonna continue to evolve as we keep sequencing more genomes?
PETER: I’m a statistician by training, and you’ll never get me to say that more data will make us worse off. So we’ll keep improving as we get more and more data. But there are some diseases now where we already know enough to be able to make meaningful predictions of risk. Within our company, Genomics PLC, for example, we have polygenic risk scores for more than 45 diseases where we think you can make a meaningful contribution to understanding risk. In many cases, genetics will be one part of risk. Vineeta talked about coronary disease where we already combine blood pressure and BMI and age and sex and smoking history and so on. So we would naturally want to combine them.
But one thing that genetics has is that because these risk scores effectively don’t change during our life, we can use them in younger individuals. So, in heart disease, blood pressure, BMI, cholesterol levels, and so on, we sort of wait until the body’s starting to be creaking and showing signs of problems, and then we say, “Oh. Crikey, we better do something here. The statins and aspirin and so on.” This gives us the opportunity to do that much earlier in life to be able to know who’s gonna be at risk in 20 or even 30 years’ time and think about getting those interventions in even earlier than we would otherwise. So, it has big advantages in that way.
LAUREN: So, you would know what your risk is when you are young. Whereas the other risk factors, such as your cholesterol levels, have to wait until you basically are already manifesting the disease. You’re reacting to a condition that has already onset as opposed to providing you with a mechanism for preventative care.
PETER: Exactly.
VINEETA: I think it’s a really interesting time for this to be coming online just as consumer-facing digital health products are also taking off, right? Imagine that you knew at age 18 that your risk for coronary artery disease was in the top 5 percentile of the population. Maybe regardless of the absence of smoking and lots of other decisions that you were making, that’s just the risk burden that you were dealt. How motivating could that be to encourage a healthy lifestyle, healthy diet, and potentially pharmacologic interventions, as Peter mentioned? We finally are also building, in parallel to the genetics story, a whole world of digital health, apps, engagement platforms and incentive structures for patients to participate in that care and to learn about their risk and actually try to move the needle on it.
PETER: I think one very interesting perspective is if you look at it from the point of view of the individual. So, we have these scores, or, as you like to think of it, your bag full of pebbles, and you have a different bag for different diseases. If you take any one of those, the chance of a given individual being in the top few percent [of risk] is low. It is literally a few percent, but across 40 diseases or 50 diseases, and that’s where we are at now, you’ll be in the high-risk category for a few of them.
So, if you think about it from the point of view of the individual or their healthcare providers, it gives you a way of knowing what are the two or three things where the constellation of millions of genetic variants they happen to have inherited puts them at substantially increased risk. That will be different for different people, and we have the chance of finding out what it is.
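The arithmetic behind being at high risk for a few diseases out of many can be sketched as follows. Two assumptions here are not from the conversation: "the top few percent" is taken to mean 3%, and a person's scores for different diseases are treated as independent of each other, which real scores only approximate because some diseases share genetic risk.

```python
def expected_high_risk_count(n_diseases: int, top_fraction: float) -> float:
    """Average number of diseases for which one person lands in the top group."""
    return n_diseases * top_fraction


def prob_at_least_one(n_diseases: int, top_fraction: float) -> float:
    """Chance of being in the top group for at least one disease.

    Assumes scores for different diseases are independent (a simplification).
    """
    return 1.0 - (1.0 - top_fraction) ** n_diseases


# Assumed values: 50 diseases, "top few percent" read as 3%.
n_diseases = 50
top_fraction = 0.03

expected = expected_high_risk_count(n_diseases, top_fraction)
p_any = prob_at_least_one(n_diseases, top_fraction)
print(round(expected, 2), round(p_any, 3))
```

Under those assumptions the chance of being in the top group for any single disease stays small, while the chance of being in it for at least one of the 50 is well over half.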
LAUREN: That’s a huge shift.
PETER: It’s a massive shift. There’s a common joke that although we refer to what we currently do as healthcare in many parts of the world, in both the UK and the U.S., it’s really sick care. We wait till people get sick, and then we try and sort out the problem.
These approaches allow us a whole new paradigm, which has been called genomic prevention. I think it’ll be a paradigm shift in the way we do population health management because it allows us to do this prediction much better, and then it’s all about stratifying. It’s about stratifying the right sorts of treatments, interventions, or screening to the people who are at particular risk for those conditions.
VINEETA: It’s also a paradigm shift in how we have historically thought about genetic testing because historically we thought about testing for each disease, and it was difficult to sort of compute. Well, what are you going to do? Are you going to get a separate test for diabetes? Are you gonna run a special test for a set of variants associated with just coronary artery disease? And then will the number of variants associated with each of those diseases keep mounting? And so the paradigm shift there is that it’s one test that gives you access to knowledge about your genetic risk for maybe hundreds of different diseases.
LAUREN: That’s so powerful. It seems like a real shift in how we think about preventative care, how we think about delivering care, how we think about maintaining health as opposed to responding to disease. So what do we still need to do to improve polygenic risk scores?
VINEETA: I think diversity in genetic data is a place where the field still needs to make progress. A lot of the best and largest genome-wide association studies done to date were conducted in Caucasian populations, and as a result, we don’t know for sure whether those polygenic risk scores derived from those studies and those populations will be the best possible scores for patients of other ethnicities and other genetic backgrounds. We know there’s a lot of shared genetic risk across populations, and so we do expect there to be information that can be ported over between populations.
But that’s something that I’m personally looking forward to. And I’m now aware of a number of large biobanks being built in India, in Africa, in Japan, and more and more genetic data coming online so that these polygenic risk scores can be useful in a diverse set of people.
PETER: That’s a really good point. In all cases, polygenic risk scores tend to be helpful in different groups, but they’re usually most predictive in the group from which the original studies were derived, and that’s mainly people of European ancestry. One key and important thing is to substantially expand the diversity of the ancestry groups in which we do genetic studies.
I think another key part of it is to make our methods cleverer, and we’ve had some success in that. So one of the things I’m proudest of in our company, to go back to the breast cancer example, is that our breast cancer polygenic risk score is substantially more powerful than the best published breast cancer score. But using the methods we developed, our breast cancer score for women of East Asian origin is more powerful than the published score for European women. So that’s a good example where we’ve been able to use data and clever methods to improve performance in ancestries, but it’s a challenge for the field.
LAUREN: Peter, you started this research creating these polygenic risk scores in your academic lab, but now you’ve created this company, Genomics PLC. I’m curious about how you decided to make that transition from academia into industry.
PETER: I was lucky enough in my academic role to be right at the center of many of the major discoveries of the last 10 or 15 years in our understanding of genetic variation and its role in diseases. But about 5 or 6 years ago, I realized that although there had been extraordinary progress in our scientific knowledge of learning about the variants, it had had very little impact in healthcare. Almost every paper I wrote, every talk I gave, every grant application, I would say at the beginning and the end, this stuff is really interesting and it’ll have impacts on patients. But while we talked about that, it just wasn’t happening.
It became clearer and clearer to me that that’s because there were lots of challenges to go between the science and having something that actually makes a difference in healthcare. Having realized that, along with a number of my colleagues, we founded Genomics PLC to do two things. One of them was to keep doing science absolutely at the world-leading cutting edge. And the other one was to tackle the very hard, in some ways even harder challenges of going from the science into healthcare.
So we took a bunch of the best and brightest minds in our field, formed them, and organized ourselves as a startup. And then we’ve continued to do outstanding science but to develop the products and to work out how to get the products into healthcare systems. Think about all of the pieces like health economics and supply chains and clinical operations and, you know, the software part of the product exactly to make that difference.
LAUREN: Why can’t you do that in academia? What is it about the system that exists in academia that prevents you from making that transition into application?
PETER: I think there are a couple of factors. One of them is the incentive system. I think both the things that people find rewarding individually and the things the system rewards are publications or doing pieces of science. So it’s natural to do that and then stop and move on to the next piece of science rather than working out how to get it into practice.
The other one, I think, is about skill sets. The challenges of actually producing software of a standard that’ll work in a product environment, of doing the health economics, of having the business development in place to get it into a healthcare system. That’s well beyond the skill set of most academics. It needs different sorts of skills to solve those problems and to get that piece to work.
LAUREN: I think some people think about becoming a professor as being the final step on the journey. But you could even think of becoming a professor as just the next step: you make a discovery, and then you decide which finding you really wanna take and turn into a real-world product. That’s a different way to think about the academic path.
LAUREN: Speaking of your company, I’m curious, how do you turn something like a polygenic risk score into a product?
PETER: Good question. As you might imagine, it’s one we think about quite a lot. Our company was founded on the premise that large amounts of data linking genetic variation to outcomes in people coupled with clever algorithms would be a really powerful tool. So that’s the kind of core resource we have.
We exploit that in two different directions. One of them is in drug discovery to try and learn about the biology to find better drug targets. And the other piece is around risk prediction, polygenic risk scores. We’re at the stage of now doing implementation studies. We’re running a pilot with the National Health Service in cardiovascular disease in exactly the use case we were discussing earlier to add genetics into the current risk prediction tools to see how that works. In the U.S., we’re already working with a number of healthcare systems. We’re working with the Taiwan Precision Medicine Initiative.
I think the thing beyond that is to do one or two large-scale population health initiatives with a vision in 5 or 10 years to this kind of approach being much more routine. Not just in the UK and the U.S., but broadly in systems where healthcare’s available because the genetic part of it is quite cheap per individual, compared to other healthcare costs per individual. And it gives you predictions across many diseases. So we’ve got a major focus on going from the science into those products.
LAUREN: So right now you’re working out the implementation of polygenic risk scores into clinical systems. But you also mentioned that you can use genomics to inform drug discovery.
PETER: That’s a parallel piece. It goes back to the fact that each of these variants, which has a small effect on risk for an individual, is potentially telling us a story about a key part of biology for that disease. So, statins target a gene called HMG-CoA reductase. There are genetic variants which dial that gene up or down a little bit. So if you have the variant that dials it down a little bit, it’s like you’ve been on a weak statin all your life. It’s possible to look and see that those individuals, in fact, get less heart disease. So if we can look in our data and we can find a genetic variant whose effect is broadly the same as the drug that you’re thinking of, then we can look at people who have that genetic variant and see what the impacts are.
VINEETA: Those targets that Peter described where we have genetic evidence to show that naturally occurring variants in the population modify risk of disease at that target are now being described as genetically de-risked human disease targets. There are actually some really nice papers that have gone back and looked at drugs that have gone into clinical trials over the last several years and looked at the probability of success of those drugs, stratified by whether the target was a genetically de-risked target or a target nominated in another way.
We now have evidence that the drugs against those genetically de-risked targets are simply more likely to succeed in a clinical trial.
PETER: People have called it nature’s clinical trial, and it’s a helpful way of thinking about it.
LAUREN: So, essentially, you’re using your vast genomics resource to help you identify good targets for future drugs.
LAUREN: Let’s wrap up the conversation now with a high-level takeaway on the evolving role of genetic testing in the clinic.
PETER: Healthcare systems are under increasing strain and pressure because of spiraling costs. One reason for that is that we tend to do healthcare later on in disease. And one way of solving that problem is to get much better at preventing disease altogether or at intervening early. Genomic prevention allows us to identify individuals who are at higher risk of disease, intervene early, screen more effectively. It’s much better for patients because they’ll have better outcomes. It’s much better for healthcare systems because it substantially reduces costs in the long term.
VINEETA: I’d say this shift that’s happening is from genetics being rarely useful, rarely invoked, rarely ordered, to being at the cusp of an era where we’re finally ready to use that information in the same way that we have been using all kinds of other imprecise but useful information in clinical medicine. So I’m really excited to see the work that Peter and other teams are doing across the world in bringing genomics mainstream.
Peter Donnelly is a Professor of Statistical Science at the University of Oxford and a Fellow of St. Anne’s College.
Vineeta Agarwala is a general partner on the Bio + Health team at Andreessen Horowitz, focused on biotech, digital health, and life sciences tools/diagnostics.
Lauren Richardson is the Founder and Principal of LR Bio LLC.