Disease is Polygenic

Giordano Bottà

CEO & Co-Founder at Allelica

In the mid-19th century, a Moravian monk named Gregor Mendel passed his time by running experiments cross-breeding different types of pea plant.

Amongst his botanical collection he had pea plants with a variety of physical characteristics. He had short and tall pea plants, pea plants with different coloured flowers, and plants with different types of leaf and pea shape.

Mendel was interested in how these traits were inherited over time and the three laws he discovered paved the way for the modern science of genetics.

Mendelian Genetics and Human Disease

To understand why Mendel is important for human disease genetics, we need to understand a bit about human genetics. Humans have two copies of the same set of roughly 20,000 genes. (The exact number is still a matter of fierce debate.)

Perhaps counterintuitively, genes account for just 2% of the human genome with the remaining 98% (so-called non-coding DNA) involved in regulating the function of genes, for example by turning them on or off, or of no known function.

Genes are short sequences of DNA found on larger stretches of DNA called chromosomes, of which we have 23 pairs. We inherit one chromosome of each pair from each of our parents, meaning that we have two copies of each gene, one from each parent.

The genetic code within genes, and elsewhere in the remaining non-coding genome, will vary amongst people due to changes in the code, which are often just called variants.

Often these variants don’t change anything about how a gene works or how it’s regulated, but occasionally they do. Over time, different variants become established within populations, which in turn can lead to differences in how individuals look and behave.

Mendel’s laws provide a genetic framework for single-gene traits and diseases

Genetic variation is the basis of evolution. But because Mendel was able to see the physical representation of genetic variation in his pea plants, it also allowed him to understand genetic inheritance for the very first time.

His first law of dominance says that when an individual inherits two different forms of the same gene, one variant is often dominant over the other. For example, when Mendel crossed plants with round peas with plants with wrinkled peas, he found that the round pea shape was dominant over the wrinkled. Only plants that inherited two versions of the wrinkled type would have wrinkled peas. In other words, the wrinkled pea type is recessive. In human disease genetics, variants that cause monogenic diseases are often recessive, whilst the ‘normal’ variant is dominant.

His second law concerns how variants segregate over time. Remember that we have two copies of each of our chromosomes. When we have children we only pass on one of each of the copies, with our partner passing on just one of their two copies. Mendel’s second law states that the two versions of a given gene will always be split apart and won’t both get inherited together. So if a disease is monogenic and recessive, an affected individual must have inherited the disease variant of the gene from both parents.
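The first two laws can be illustrated with a small calculation. The sketch below is only an illustration, not part of Mendel’s own work: it assumes a single gene with a dominant ‘normal’ variant written as "A" and a recessive disease variant written as "a" (both labels are made up for the example), and enumerates what two unaffected carriers can pass on.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return the probability of each offspring genotype for one gene.

    Each parent is a two-letter string such as "Aa". By the law of
    segregation, a parent passes on exactly one of its two copies,
    each with probability 1/2.
    """
    probabilities = {}
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" are counted as the same genotype.
        genotype = "".join(sorted(allele1 + allele2))
        probabilities[genotype] = probabilities.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probabilities


# Two unaffected carriers of a recessive disease variant "a" (hypothetical labels).
cross = offspring_genotypes("Aa", "Aa")

# Only children with two copies of the recessive variant are affected.
affected = cross.get("aa", Fraction(0))

print(cross)
print("P(affected child) =", affected)
```

Under these assumptions, each child of two carriers has a one in four chance of inheriting the disease variant from both parents, a one in two chance of being a carrier, and a one in four chance of inheriting no disease variant at all.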

Mendel’s third law says that genes underlying different traits will be independently inherited. In his experiments, the gene for round peas didn’t always get inherited with the gene for tall plants, so it’s not the case that all tall plants have round peas.
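Independent inheritance can be checked in the same way. This is a minimal sketch that assumes two genes, one for height ("T" tall, dominant, and "t" short) and one for pea shape ("R" round, dominant, and "r" wrinkled). The letters are illustrative labels, and both parents are assumed to carry one copy of each variant.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """List the equally likely gametes for a two-gene genotype like "TtRr".

    Independent assortment means any height allele can be paired with
    any shape allele.
    """
    height, shape = genotype[:2], genotype[2:]
    return [h + s for h, s in product(height, shape)]


def phenotype(height_alleles, shape_alleles):
    """A single dominant allele is enough to show the dominant trait."""
    return (
        "tall" if "T" in height_alleles else "short",
        "round" if "R" in shape_alleles else "wrinkled",
    )


counts = {}
for g1, g2 in product(gametes("TtRr"), gametes("TtRr")):
    key = phenotype(g1[0] + g2[0], g1[1] + g2[1])
    counts[key] = counts.get(key, 0) + 1

total = sum(counts.values())
ratios = {key: Fraction(value, total) for key, value in counts.items()}

for key, ratio in sorted(ratios.items()):
    print(key, ratio)
```

Because the two genes are inherited independently in this sketch, the chance of a tall plant with round peas is simply the chance of being tall multiplied by the chance of having round peas.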

Whilst these laws form the bedrock of modern genetics, they can be violated and when they are, we can learn something new about the inheritance of different traits and diseases.

Mendelian inheritance is the exception not the rule in disease genetics

So why did I just describe the esoteric experiments of a 19th-century Moravian monk? Well, it turns out that Mendel’s laws have been incredibly useful for our understanding of the influence of our genes on disease.

For example, by studying families of people with cystic fibrosis, researchers could see that the inheritance of cystic fibrosis followed Mendel’s laws and so was likely due to variation in a single gene. Later, painstaking genetic work identified that mutations in a single gene called CFTR cause the disease.

Another example is familial hypercholesterolemia, a monogenic condition that affects just under 1 in every 1000 Americans. It causes individuals to have abnormally high levels of LDL cholesterol. Without treatment, people with the condition are at least three times as likely to suffer from coronary artery disease (CAD) from a young age and are at an increased risk of heart attack. People are often diagnosed with familial hypercholesterolemia after failing to lower their LDL cholesterol through lifestyle changes, after which their risk can largely be mitigated with cholesterol-lowering drugs. Now a simple genetic test can reduce the time taken to identify this disease, so that the necessary treatment can be prescribed sooner.

Mendelian inheritance can’t explain the inheritance of most disease, however. It is the exception rather than the rule, so most disease risk is more complex than the case of cystic fibrosis would suggest. There is a growing understanding that, in addition to a large environmental component, most diseases and many human traits are polygenic: they are controlled by variation across the genome and not just by variation in a single gene.

The message from this new conceptual framework is clear: there are cases where testing for monogenic diseases can identify those people who are likely to get a disease, but for the vast majority of common diseases new methods are required that can account for the polygenicity of disease.

The role of Polygenic Risk Scores

Understanding an individual’s genetic predisposition to disease is important in healthcare. As we saw above with the example of familial hypercholesterolemia, it can help to identify those at high risk of CAD so that targeted interventions can happen. However, because most disease is polygenic, a simple single-gene test followed by treatment is rarely an option, so we need to think about different ways of testing for genetic risk.

The concept of a Polygenic Risk Score (PRS, also known as a Genome-wide Polygenic Score, GPS) is straightforward, but its application is far from trivial.

First, you work out the effect genetic variants have on a disease. A method called a Genome-Wide Association Study (GWAS) can provide this information. This study design compares people with and without a trait or disease and identifies the systematic differences in the DNA of the two groups, along with the effect that each difference has on the trait or disease. Ever larger GWASs, involving huge, well-curated datasets that have only recently reached sufficient size (hundreds of thousands of individuals), are teasing out these effect sizes and helping to identify the effect of variants on a whole range of traits and diseases.
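To make the idea of an effect size concrete, here is a deliberately simplified sketch for a single variant. It assumes we have counted how often a risk allele appears in people with the disease (cases) and without it (controls); the counts and the function name `allele_odds_ratio` are made up for illustration. Real GWAS analyses typically use regression models that adjust for factors such as age, sex, and ancestry, which this sketch leaves out.

```python
import math


def allele_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    return (case_risk / case_other) / (control_risk / control_other)


# Hypothetical allele counts for one variant (invented for this example).
odds_ratio = allele_odds_ratio(
    case_risk=1200, case_other=2800,
    control_risk=1000, control_other=3000,
)

# The log odds ratio is a common way to express a per-allele effect size.
effect_size = math.log(odds_ratio)

print(f"odds ratio = {odds_ratio:.3f}")
print(f"log odds ratio = {effect_size:.3f}")
```

An odds ratio above 1 means the allele is more common in cases than in controls. A GWAS repeats this kind of comparison for millions of variants across the genome.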

You then develop and run an algorithm that refines and adds up all the effects based on an individual’s DNA to output a score. A significant amount of computational power is required to perform the numerical analyses necessary to assess polygenic risk across the whole genome. Novel methodologies and algorithms are also needed.
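The "adding up" step can be sketched in a few lines. This is a minimal illustration rather than a description of any production algorithm: the variant names, weights, and the function `polygenic_risk_score` are hypothetical, and the refinement of the weights before they are summed is not shown.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts.

    dosages: number of copies (0, 1 or 2) of each scored allele carried
             by one individual, keyed by variant name.
    weights: effect size of each allele, keyed by variant name.
    Variants missing from `dosages` are treated as 0 copies, which is a
    simplifying assumption of this sketch.
    """
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


# Hypothetical effect sizes (for example, log odds ratios from a GWAS).
weights = {"variant_1": 0.25, "variant_2": -0.10, "variant_3": 0.05}

# Hypothetical individual: allele counts at the same three variants.
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}

score = polygenic_risk_score(person, weights)
print(f"PRS = {score:.2f}")
```

A real score sums over thousands or millions of variants rather than three, which is where the computational cost comes from. The raw number only becomes meaningful when compared with the scores of other people in a reference population.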

A final requirement is large datasets on which to validate the new methods. The UK Biobank, about which we wrote last week, is one example of such a dataset, but there are also aggregated datasets of individuals collected to examine particular diseases, which have contributed effect sizes for many thousands of variants across the genome.

Assessing the clinical utility of PRS

A couple of years ago a seminal paper from Amit Khera and colleagues at the Massachusetts General Hospital in Boston presented a framework for computing PRS from millions of variants for five different diseases.

Using the latest information on effect sizes for CAD, different methods for computing PRS, and the UK Biobank to validate their results, they showed that stratification of individuals based on PRS has clinical utility.

For example, they showed that individuals in the top 8% of the PRS risk distribution in the UK Biobank were at three times the risk of having CAD. Importantly, this information is actionable; without knowledge from their DNA, these individuals at the top of the PRS risk distribution will only be considered at risk if they have worryingly high traditional risk factors, for example high LDL cholesterol.
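Stratifying by position in the score distribution is simple to express in code. The sketch below uses simulated scores drawn from a normal distribution, not UK Biobank data, and the helper `top_fraction_threshold` is a hypothetical name; it only shows how a "top 8%" group might be picked out once everyone has a score.

```python
import random


def top_fraction_threshold(scores, fraction):
    """Return the lowest score that still falls in the top `fraction` of scores."""
    ranked = sorted(scores, reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return ranked[n_top - 1]


# Simulated population of scores (an assumption for illustration only).
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(10_000)]

threshold = top_fraction_threshold(scores, 0.08)
high_risk = [s for s in scores if s >= threshold]

print(f"threshold score = {threshold:.2f}")
print(f"individuals flagged = {len(high_risk)}")
```

In practice the cut-off would be chosen by looking at how disease rates change across the distribution in a validation dataset, rather than fixed in advance.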

The strength of the CAD PRS is that it identifies people who would otherwise be invisible to these traditional risk models. So the CAD PRS can identify people who need to monitor their blood lipid levels more often, as marginally higher cholesterol in these individuals turns out to be much more dangerous than it is in individuals with normal genetic risk of CAD.

PRS identifies more of the people at risk of cardiovascular disease

Importantly, these individuals were at the same risk level as carriers of the monogenic familial hypercholesterolemia mutation mentioned above, but represented a much larger proportion of the population.

The paper examines four other diseases and provides compelling evidence that similarly strong predictive power of risk can be achieved with PRS.

At Allelica, we have updated and refined these models (for more information read our white paper) and have expanded our disease set to include an additional two important polygenic diseases: breast and prostate cancer.

We believe that these scores will be routinely used in clinical practice for reclassifying people between risk categories and identifying those individuals at high genetic risk of disease so that interventions, treatments, and monitoring can be focused on these groups.

New data; new challenges

There are important challenges to using PRS more broadly. The most pressing of these is to understand how portable PRS are between populations with different ancestry.

Geographical ancestry causes subtle but meaningful differences in the frequencies of variants in different populations, which affects how PRS should be computed. Both the genetic variants underpinning disease risk and our ability to access them will differ across populations.

Encouragingly, several initiatives are underway to increase the diversity of data available for researchers to work on.