The UK Biobank: unleashing the power of the genome

Giordano Bottà

CEO & Co-Founder at Allelica

A central goal of preventative medicine is to identify the people most at risk of disease early enough for something to be done. Identifying those at higher risk of cancer, for example, means that screening programmes can be targeted at this group, so that the disease is caught early, while the cancer is still small and treatment is relatively cheap. As we learn more about the drivers of many diseases, from breast cancer to cardiovascular disease, understanding the combination of factors that contributes to disease will provide information that helps people reduce their risk through lifestyle change and therapeutics.

Nature versus nurture

Our predisposition to disease is the result of a complex mixture of our biology and our environment. Many decades of scientific research have uncovered the key role of lifestyle and diet in the aetiology of many diseases. We all know that to keep fit and healthy, we should exercise daily, eat well, drink in moderation and not smoke. But we also know that other innate factors are at play when it comes to our risk of disease.

The role of DNA

Whilst our environment has a clear role in the onset of many diseases, our genetics also plays a part. Genetics is the study of inheritance, and many of the first genetic studies in humans traced the occurrence of diseases through family pedigrees to demonstrate that some diseases run in families. When we observe more disease in a family than would normally be expected, this is a clue that DNA might be in some way responsible for risk. A well-known example is breast cancer. If you are a woman with close female relatives who developed breast cancer at a young age, you are known to be at higher risk of getting the disease yourself, and you have the option of testing your DNA for known genetic variants that predispose you to cancer. This is why family history of disease is often one of the first pieces of information a doctor asks for when making a diagnosis.

Big genomic data

Analysing pedigrees to show that DNA has a role in a disease is only the first step towards understanding which genes are involved. Before cheaper sequencing technologies arrived in the first decade of this century, painstaking research was required to work out which specific genes were involved in a disease. That changed as genomic data became cheaper to generate at scale. Over the last ten years, datasets have grown larger and each individual's genome has been measured in greater depth. These data have led to more and more genetic variants being associated with different diseases, which in turn has given us a more detailed understanding of how variation at many different parts of the genome can affect disease predisposition. This can help identify potential therapeutic targets, and it also allows us to predict an individual's genetic predisposition to different diseases.
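To make the idea of predicting predisposition from many variants a little more concrete: a polygenic risk score (PRS) is commonly computed as a weighted sum over many variants, multiplying the number of copies of the risk allele a person carries (0, 1 or 2) by that variant's estimated effect size from an association study. The minimal sketch below assumes the effect sizes (for example log odds ratios) are already available; the function name, the variant IDs `var1` to `var3` and their weights are hypothetical illustrative values, not real association results or any particular tool's interface.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (hypothetical helper for illustration).

    dosages: variant ID -> copies of the risk allele carried (0, 1 or 2;
             imputed genotypes may give fractional values).
    weights: variant ID -> effect size per risk allele (e.g. a log odds ratio).
    Variants absent from the individual's genotype data are simply skipped.
    """
    score = 0.0
    for variant, beta in weights.items():
        if variant in dosages:
            score += beta * dosages[variant]
    return score


# Hypothetical effect sizes for three illustrative variants (not real data).
weights = {"var1": 0.12, "var2": -0.05, "var3": 0.30}
person = {"var1": 2, "var2": 1, "var3": 0}

print(round(polygenic_risk_score(person, weights), 2))  # 0.19
```

Skipping missing variants keeps the sketch short; real pipelines typically handle missing genotypes more carefully (for example by imputing an expected dosage), and production scores can include far more variants than this toy example.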

The UK Biobank: a model for 21st-century science

The many advances provided by large-scale genomic analyses have confirmed that disease risk has both genetic and environmental components, and that the picture is complicated. So although there is undoubtedly a genetic component to an individual's disease risk, sophisticated statistical methods are needed to separate the signal from the noise. Partly this is due to the scale of the datasets, which can involve upwards of 10 million data points for each of tens of thousands of individuals. But it is also due to the complicated nature of disease risk, which results from an interplay between the environment and many genetic variants. And whilst we are getting ever better at measuring genetic variation, measuring the environment is still incredibly difficult.

It was against this backdrop that the UK Biobank was developed. The UK Biobank is a unique and innovative resource. It is a prospective cohort study, which means that it follows a group of people over time as different diseases begin to affect them. The UK Biobank cohort contains 500,000 people who were aged between 40 and 69 when they were recruited between 2006 and 2010. At the start of the study it was unknown which diseases would affect which individuals, but because the cohort is so large, many of the most common diseases are likely to affect a sizeable proportion of it.

The major innovation of this cohort is that a wide variety of measurements were taken at the beginning of the study and continue to be taken now. These include family history of disease, early life experiences, current and former lifestyle choices and cognitive function, collected from participants through detailed questionnaires. A wide variety of objective physical measurements were also taken, such as hand grip strength, height, weight and biochemical measurements. Importantly, genome-wide genotype data have also been collected. Ongoing linkage to electronic health records, death records and hospital inpatient data means that all of these measurements can be connected to the onset of different diseases. This makes the resource the first of its kind: a large, deeply measured collection of data on individuals that any bona fide researcher or commercial organisation can interrogate. This latter point is worth stressing, as the data were always intended to be made available to both academics and private companies.

Our use of the UK Biobank data resource

At Allelica we have been using the UK Biobank data to train our Polygenic Risk Score (PRS) algorithms and to test their ability to identify the people most at risk of coronary artery disease, breast cancer and prostate cancer. Our newly developed PRSs showed the highest predictive power to date, demonstrating clinical utility and cost-effectiveness. We have also developed Allelica, the first Software as a Service (SaaS) platform that allows clinical genetics laboratories and researchers to perform PRS analysis without writing a single line of code. We'll write more about our methods over the coming weeks, but those who are interested can read our white paper.
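Testing how well a risk score identifies the people most at risk is typically done on individuals held out from training: one asks how well the score separates people who went on to develop the disease from those who did not, and which people fall in the top tail of the score distribution, where targeted screening might be offered. The sketch below uses two common choices assumed here for illustration, the area under the ROC curve (AUC, computed via its rank interpretation) and a top-percentile cut-off; it is not a description of Allelica's own evaluation, and the function names, scores and outcomes are made up.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], has_disease: Sequence[bool]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen case
    scores higher than a randomly chosen control (ties count as half)."""
    cases = [s for s, d in zip(scores, has_disease) if d]
    controls = [s for s, d in zip(scores, has_disease) if not d]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of individuals in the top `fraction` of scores (at least one),
    e.g. hypothetical candidates for targeted screening."""
    n = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Made-up scores for eight held-out individuals and whether they later developed disease.
scores = [0.9, 0.1, 0.4, 0.8, 0.3, 0.2, 0.7, 0.5]
has_disease = [True, False, True, False, False, False, True, False]

print(auc(scores, has_disease))    # 0.8
print(top_fraction(scores, 0.25))  # [0, 3]
```

Here the top 25% of scores contains one future case (index 0) and one person who stayed disease-free (index 3), which illustrates why a risk score is used to prioritise screening rather than to diagnose. The pairwise AUC loop is quadratic in cohort size; for cohorts the size of the UK Biobank, a rank-based or library implementation would be the practical choice.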