When the human genome was sequenced almost 20 years ago, many researchers were confident they’d be able to quickly home in on the genes responsible for complex diseases such as diabetes or schizophrenia. But they stalled fast, stymied in part by their ignorance of the system of switches that govern where and how genes are expressed in the body. Such gene regulation is what makes a heart cell distinct from a brain cell, for example, and distinguishes tumors from healthy tissue. Now, a massive, decadelong effort has begun to fill in the picture by linking the activity levels of the 20,000 protein-coding human genes, as shown by levels of their RNA, to variations in millions of stretches of regulatory DNA.
By looking at up to 54 kinds of tissue in hundreds of recently deceased people, the $150 million Genotype-Tissue Expression (GTEx) project set out to create “one-stop shopping for the genetics of gene regulation,” says GTEx team member Emmanouil Dermitzakis, a geneticist at the University of Geneva. In a slew of papers in Science, Science Advances, Cell, and other journals this week, GTEx researchers roll out the final big analyses of these free, downloadable data, as well as tools for further exploiting them.
“This resource is invaluable” for anyone interested in particular diseases, or studying tissues or cell types, says Jan Korbel, a human geneticist at European Molecular Biology Laboratory (EMBL), Heidelberg. “It’s a public treasure trove,” says Jun Li, a geneticist at the University of Michigan, Ann Arbor.
But the complex main analysis drives home just how convoluted the interconnections between genes and their regulatory DNA can be. The papers “are written in bureaucratese,” and the announced results are hard to decipher, says Dan Graur, an evolutionary biologist at the University of Houston and a well-known critic of big science. And like other critics, he notes that the project, with 85% white donors, sorely lacks diversity and thus will miss genetic variation in other groups.
GTEx can’t yet pin down sequences responsible for illnesses such as heart disease and kidney failure, or trace how the layers of gene regulation work together. “We shouldn’t pack up our bags and say gene expression is solved,” says genomicist Ewan Birney, deputy director general of EMBL, who led another big genomics project called ENCODE.
After GTEx was launched in 2010, families of more than 900 deceased subjects who had already pledged their organs or tissues for transplants agreed researchers could also take samples of their loved ones’ healthy tissues, such as brain, muscle, fat, pancreas, and heart. Having multiple tissues from the same subject gave researchers confidence that variation in gene expression between, say, muscle and pancreas was real and meaningful. “For the first time, we have this homogeneous set so we could get at biological differences between tissues,” says GTEx member Barbara Stranger, a geneticist at Northwestern University.
Researchers described each sample, then imaged and froze all the tissues for future analysis. They deciphered genomes and quantified RNA to measure gene activity. In addition to comparing tissues within one person, they could also compare the same tissue in different individuals. They were able to link variations in DNA to gene expression levels using statistical analyses to find correlated patterns of change. The heart of the GTEx database is a compilation of the complex relationships between stretches of regulatory DNA called expression quantitative trait loci, or eQTLs, and the genes they regulate.
A pilot phase, completed in 2015, examined nine tissues in depth and demonstrated that samples from corpses were reasonable stand-ins for living tissue, says GTEx co-leader Tuuli Lappalainen, a human geneticist at the New York Genome Center. Now, after analyzing almost 20,000 samples, GTEx “has reached a size where we can gain much clearer, crisper insights,” says co-leader Kristin Ardlie, a human geneticist at the Broad Institute. She and her colleagues found that almost every human gene is regulated by at least one eQTL, many of which target multiple genes and presumably affect multiple traits.
Stranger uncovered another key result: Almost every tissue, including skin and heart, showed differences in gene expression between males and females. “The vast majority of biology is shared by males and females,” Stranger says, but the expression differences may help explain why men and women have different disease patterns or reactions to drugs. “I consider that a major finding,” Korbel says.
Likewise, Broad co-leader François Aguet and colleagues confirmed certain eQTLs extend their reach to distant genes, even those on other chromosomes. GTEx documented 143 such “trans” elements, some of which affect multiple genes across the genome.
Kelly Frazer at the University of California, San Diego, is already using the data to help make sense of so-called genome-wide association studies (GWAS), which pose major mysteries. In a GWAS, massive consortia look at the genomes of thousands of patients with a particular disease or trait and note hundreds of subtle genetic changes, often outside of genes themselves. But researchers often have no clue which of these many suspects triggers the disease or shapes the trait.
For example, GWAS had identified more than 500 genetic variations that appeared to affect heart rhythm and electrical conductance. Frazer wanted to know how a heart-specific transcription factor called NKX2-5 influenced those traits. Her team had identified thousands of DNA variations that might affect NKX2-5’s activity and so perhaps shift heart rhythm.
Paola Benaglio in Frazer’s lab analyzed and compared those DNA variations, GWAS data, and GTEx data to identify which DNA variations actually regulate NKX2-5 activity. She first narrowed the candidate eQTLs to 55, then to nine. Finally, using GWAS data on heart rhythms and other tools, she zeroed in on a single variable base on chromosome 1. Next, she blocked that DNA base using the genome editor CRISPR and confirmed that it alters NKX2-5 binding, as Benaglio, Frazer, and their colleagues reported last year in Nature Genetics.
“I’m sure there are hundreds of people like me” who appreciate the database, Frazer says. The statistics back her up. Each month, 16,000 people visit the GTEx portal, and others examine the data on other sites. In 2018, 900 papers cited it. Birney understands the enthusiasm, but cautions that spurious correlations between eQTLs and genes can arise. Homing in on a disease-causing variant via GTEx “is not a slam dunk.”
Graur, for his part, remains skeptical that gene activity in corpses adequately reflects what’s going on in the living, despite the team’s data on the preservation of gene expression. “It’s like studying the mating behavior of roadkill,” he says.
As the project winds down, the U.S. National Institutes of Health is planning a developmental GTEx that will enroll people under age 20 to create an atlas of gene expression from birth to adulthood. In such follow-up efforts, a more diverse set of tissue donors “would be very valuable,” Korbel says. GTEx initially shot for that goal but faltered because tissue and organ donors are disproportionately white. Researchers need to “communicate more effectively,” says Laura Siminoff, a social scientist at Temple University who was funded early on to look at GTEx ethics. “Otherwise we will be doing this science for white people.”
The results so far cannot tell the full story of how the genome gives rise to a human being’s myriad tissues and diseases. Still, Birney predicts, “GTEx will get used and reused again and again, and there will be some uses I cannot predict.”