Using data mining to understand cell function and disease

June 05, 2015 − by Suzanne Elvidge − in Big data, Big data in research, Data mining

Knowing how genes work together in different cells and tissues would be a step forward in the study of health and disease. According to recent research from universities, foundations and medical schools across the US, published in Nature Genetics, big data and data mining could be bringing us closer to that understanding.

Understanding how genetics underpins and controls the structure and function of individual cells and tissues will help researchers develop new and better diagnostics and therapeutics, and drive the growth of personalised medicine. In this study, the team collated and integrated data from about 38,000 studies, extracted from about 14,000 publications. The studies included data from people with a range of diseases. The researchers used a data-driven Bayesian methodology to build genome-wide functional interaction networks for 144 human tissues and cell types (including kidney, liver and brain) by mapping the functional interconnections between genes, and then combined this tissue-specific information with disease-based genome-wide association studies (GWAS). This allowed them to identify statistical associations between genes and diseases that would otherwise be undetectable, including functional gene disruptions in hypertension, diabetes and obesity.
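To give a flavour of what "Bayesian integration" of many datasets means, here is a minimal sketch of naive Bayes evidence combination for a single gene pair. It assumes each dataset contributes independent evidence, and that the likelihoods of that evidence (given that the pair does or does not functionally interact) have already been estimated, for example from a set of known interactions. The function name and the numbers are hypothetical; the study's actual tissue-specific model is more elaborate than this.

```python
from math import prod


def posterior_interaction(prior, likelihoods):
    """Combine evidence for one gene pair with naive Bayes (illustrative sketch).

    prior: assumed prior probability that the two genes functionally interact.
    likelihoods: one (p_if_interacting, p_if_not) tuple per dataset, giving the
        probability of the evidence that dataset shows for this pair.
    Datasets are assumed to be conditionally independent given the true class.
    """
    # Joint probability of all the observed evidence under each hypothesis.
    p_yes = prior * prod(p for p, _ in likelihoods)
    p_no = (1 - prior) * prod(q for _, q in likelihoods)
    # Bayes' rule: posterior probability that the pair interacts.
    return p_yes / (p_yes + p_no)


if __name__ == "__main__":
    # Three hypothetical datasets: two mildly supportive, one uninformative.
    evidence = [(0.6, 0.2), (0.5, 0.25), (0.3, 0.3)]
    print(round(posterior_interaction(0.1, evidence), 3))  # 0.4
```

A dataset whose evidence is equally likely under both hypotheses (such as the third one above) leaves the posterior unchanged, which is one reason this style of integration copes reasonably well with large, noisy and heterogeneous collections. Repeating the calculation for every gene pair yields the edge weights of a functional network.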

Leslie Greengard, director of the Simons Center for Data Analysis, one of the institutions involved, says: “Olga [Troyanskaya] and her collaborators have demonstrated that extraordinary results can be achieved by merging deep biological insight with state-of-the-art computational methods, and applying them to large-scale, noisy and heterogeneous datasets.”

This technique, dubbed NetWAS (network-guided association study), identifies disease-gene associations more accurately than GWAS alone, and avoids the bias towards better-studied genes and pathways. NetWAS is built on GIANT (Genome-scale Integrated Analysis of gene Networks in Tissues), a webserver hosted by the Troyanskaya Laboratory at Princeton University. GIANT allows users to explore the networks, compare how genetic circuits vary across tissues, and analyse data from genetic studies to find candidate disease genes.
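The article doesn't describe how NetWAS works internally, but the general idea of network-guided association can be illustrated with a simple "guilt by association" score: a gene ranks higher if its strongest network connections lead to genes that already look significant in a GWAS. The sketch below is a simplified stand-in, not the published NetWAS algorithm; the function name, the 0.01 p-value cut-off and the example network are all hypothetical.

```python
def network_rescore(network, gwas_pvalues, threshold=0.01):
    """Re-rank genes by how strongly they connect to GWAS hits (illustrative sketch).

    network: dict mapping gene -> {neighbour gene: edge weight}, assumed symmetric
        and without self-edges.
    gwas_pvalues: dict mapping gene -> GWAS p-value.
    threshold: assumed nominal p-value cut-off for labelling 'seed' genes.
    Returns the genes in the network sorted by score, highest first.
    """
    seeds = {gene for gene, p in gwas_pvalues.items() if p < threshold}
    scores = {}
    for gene, neighbours in network.items():
        total = sum(neighbours.values())
        # Weighted fraction of this gene's connections that lead to seed genes.
        seed_weight = sum(w for n, w in neighbours.items() if n in seeds)
        scores[gene] = seed_weight / total if total > 0 else 0.0
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores, key=lambda g: (-scores[g], g))


if __name__ == "__main__":
    network = {
        "A": {"B": 0.9, "C": 0.8, "D": 0.1},
        "B": {"A": 0.9, "C": 0.7},
        "C": {"A": 0.8, "B": 0.7, "D": 0.2},
        "D": {"A": 0.1, "C": 0.2, "E": 0.9},
        "E": {"D": 0.9},
    }
    pvalues = {"A": 0.2, "B": 0.004, "C": 0.008, "D": 0.03, "E": 0.5}
    print(network_rescore(network, pvalues))  # ['A', 'B', 'C', 'D', 'E']
```

In this toy example gene A is not significant on its own, but it ranks first because it is tightly connected to both GWAS hits. This is the kind of association that would otherwise stay hidden, and in a tissue-specific setting the same scoring would be repeated with the network for each relevant tissue.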

“A key challenge in human biology is that genetic circuits in human tissues and cell types are very difficult to study experimentally,” says Troyanskaya. “For example, the podocyte cells in the kidneys that perform the kidney’s filtering function cannot be isolated for study in the lab, nor can the function of genes be identified by genome-scale experiments. Yet we need to understand how proteins interact in these cells if we want to understand and treat chronic kidney disease. Our approach mined these big data collections to build a map of how genetic circuits function in the podocyte cells, and in many other disease-relevant tissues and cell types.”

As Troyanskaya goes on to explain, these findings could help in drug development by identifying causal or target genes, and by anticipating unexpected drug interactions and disruptions: “Biomedical researchers can use these networks and the pathways that they uncover to understand drug action and side effects in the context of specific disease-relevant tissues, and to repurpose drugs. These networks can also be useful for understanding how various therapies work and to help with developing new therapies.”

These results show how computer science and statistical methods can be combined to aggregate and analyse large and diverse genomic ‘big data’ collections. While GenoKey has not been involved in this research, its combinatorial data mining platform technology is capable of analysing and mining complex and multifaceted data sets, including GWAS data.
