Jan 5, 2015

Finding Needles in the Haystack of Genomic Data

By Jovana Drinjakovic

The causes of many genetic diseases remain stubbornly hidden despite our ability to read every letter of a patient’s DNA. Genome-wide association studies (GWAS) are commonly used to identify regions of the genome that contain a gene important for a given disease. The trouble is that these regions often contain multiple genes, and it is hard to pinpoint the actual culprit.

Now Professor Fritz Roth and colleagues have developed two powerful computational methods to help reveal disease-causing genes within these suspicious regions. Roth, a scientist at the Donnelly Centre at the University of Toronto and Mount Sinai Hospital’s Lunenfeld-Tanenbaum Research Institute, led an international collaboration with researchers from the University of California and Harvard Medical School.  To help identify disease genes from a “lineup” of many suspects, the team developed two new types of “detective work”.

One approach, published in Nature Methods, uses a new “guilt-by-association” technique to find groups of candidate disease gene “suspects” that show evidence of working together.

The second, a “guilt-by-profiling” strategy published in Genome Biology, pinpoints the most suspicious candidate genes by comparing each gene’s profile of characteristics with the profile of known disease genes.

In the Nature Methods study, Roth and Dr. Murat Tasan, also of the University of Toronto, exploited the fact that, for complex diseases, there are often many suspicious regions around the genome.  They describe a strategy to find “gangs” of genes (one gang member per region) that are mutually connected to one another.  They then score individual candidate gene suspects based on whether the gang would be less “tightly knit” without that gene.
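The “gang” idea can be illustrated with a toy sketch. This is not the published algorithm, only a minimal illustration under stated assumptions: the region names, gene names, and network links below are invented, the search simply tries every one-gene-per-region combination, and “tightly knit” is assumed to mean the fraction of gene pairs that are linked.

```python
from itertools import combinations, product


def density(genes, edges):
    """Fraction of gene pairs in `genes` that are linked in the network."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if frozenset((a, b)) in edges)
    return linked / len(pairs)


def best_gang(regions, edges):
    """Pick one gene per region so that the chosen set is as densely linked as possible.

    Brute force over all combinations: fine for a toy example, not for real data.
    """
    return max(product(*regions.values()), key=lambda gang: density(gang, edges))


def gene_scores(gang, edges):
    """Score each gene by how much the gang's density drops when that gene is removed."""
    full = density(gang, edges)
    return {g: full - density([x for x in gang if x != g], edges) for g in gang}


# Hypothetical data: four suspicious regions, two candidate genes in each.
regions = {
    "region1": ["A1", "A2"],
    "region2": ["B1", "B2"],
    "region3": ["C1", "C2"],
    "region4": ["D1", "D2"],
}
# Hypothetical functional links between genes (undirected).
edges = {
    frozenset(pair)
    for pair in [("A1", "B1"), ("A1", "C1"), ("B1", "C1"), ("A1", "D1"), ("A2", "B2")]
}

gang = best_gang(regions, edges)
scores = gene_scores(gang, edges)
for gene, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(gene, round(score, 3))
```

In this toy network the best-connected gang is A1, B1, C1 and D1. Gene A1 scores highest because the gang is noticeably looser without it, while D1 scores below zero because the remaining three genes are fully linked to one another once it is removed.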

Using data from almost 100 genome-wide association studies, covering 10 cancer types, Roth’s team gathered enough statistical power to prove that their method works.

“The best validation of the algorithm is to test how well we can recover the known ‘smoking gun’ cancer genes. And we show that for 21 out of 34 suspicious regions the bona fide cancer gene was our top candidate. For 21 of the 34 ‘lineups’ the guilty party was our top suspect,” says Roth, a Canada Excellence Research Chair and a Senior Fellow of the Canadian Institute for Advanced Research.

This “guilt-by-association” strategy regularly outperformed the standard alternative method in identifying cancer genes.  Better still, it could identify new genes and gene pathways involved not only in cancer but in other diseases as well.

For example, the top-ranked lung cancer genes included genes involved in the behavioral response to nicotine, whereas the top-scoring genes for cognitive function were involved in learning and memory.

“This shows we’re not just picking some random group of genes that happen to share a function - the function they are sharing makes sense,” Roth says.

In the second study, Roth and Dr. Rahul Deo from the University of California describe a “guilt-by-profiling” algorithm that predicts disease gene culprits based on how similar they are to already-known disease genes. In this study, published in Genome Biology, they successfully applied this strategy to uncover new genes involved in heart disease.

“You’ve got a collection of criminals (genes) and you look at the profile of each criminal and you try to develop a stereotypical profile of a criminal,” says Roth.

To obtain a profile on each gene, Roth’s team took into account the huge amount of systematically collected experimental data that was available on human genes.
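The profiling idea can be sketched in a few lines. This is a simplified illustration, not the team’s software: it assumes each gene is described by a short list of yes/no features, builds the “stereotypical profile” as the average of the known disease genes, and ranks candidates by cosine similarity to that average. The gene names and feature values are invented.

```python
from math import sqrt


def centroid(profiles):
    """Average the feature vectors of the known disease genes."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical yes/no features for each gene, for example:
# [feature 1, feature 2, feature 3, feature 4]
known_disease_genes = {
    "K1": [1, 1, 0, 0],
    "K2": [1, 0, 1, 0],
    "K3": [1, 1, 0, 0],
}
candidates = {
    "X1": [1, 1, 0, 0],
    "X2": [0, 0, 0, 1],
    "X3": [1, 0, 0, 1],
}

# The "stereotypical profile" of a known disease gene.
stereotype = centroid(list(known_disease_genes.values()))

# Rank candidates from most to least similar to the stereotype.
ranking = sorted(
    candidates,
    key=lambda gene: cosine(candidates[gene], stereotype),
    reverse=True,
)
print(ranking)
```

Here X1 ranks first because its features match those most common among the known genes, X3 shares only one of them, and X2 shares none.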

Roth’s team first showed that the software works by successfully identifying a whole host of known heart disease genes, like those involved in cholesterol levels and blood pressure.

They then ran the software to find the genes that are linked to having an enlarged left ventricle, which can cause heart failure. The software retrieved several new genes as the top candidates. As virtually nothing was known about these genes, Roth’s team decided to experimentally test the software’s findings.

They turned to zebrafish larvae – an excellent model for humans and other vertebrates, in which gene function can be blocked with ease and the outcome seen within a couple of days.

Zebrafish larvae are transparent, so scientists can see in real time what happens when gene activity is disrupted. Roth’s collaborators, Drs. Gabe Musso and Calum MacRae at Harvard Medical School and Brigham and Women’s Hospital, first blocked the function of each candidate gene in the larvae and then looked at their heart function and blood flow. They found that three of the predicted genes were indeed essential for a healthy heart.

Next Deo, who is a cardiologist, looked to see whether he could detect mutations in those genes in his patients with heart disease. After gathering the patients’ DNA sequence data, Deo discovered that one of his patients, whose left ventricle was bigger than in healthy people, had a mutation in one of the newly discovered genes, called FLNC.

This finding cemented the power of Roth’s computational approach – the software successfully predicted a previously unknown gene as the culprit at the root of a genetic heart condition. It identified FLNC as essential for heart formation and as a promising new drug target for future therapies.

The power of both the “guilt-by-association” and the “guilt-by-profiling” approaches will only grow as more patient DNA and other experimental data become available. In his future work Roth will combine these two strategies to develop a new unified way of finding more disease-causing genes.