Gephyrin is a multitasking protein: it regulates receptors in the brain, it has been linked with epilepsy, Alzheimer’s disease, schizophrenia and other neurological diseases, and it enables the body to synthesize an essential trace nutrient. Using big data, computer scientists at the School of Engineering & Applied Science at Washington University in St. Louis have been able to find out more about this hardworking protein, including its role in human history. The finding could also explain more about rapid evolutionary events and historic human migrations.
DNA can evolve to create two divergent forms, described as yin-yang haplotypes (sequences of markers). The Washington University in St. Louis researchers found a major yin-yang haplotype pair that included the gephyrin gene on human chromosome 14, according to data published in Nature Communications. This region evolved rapidly after its split, and the yin and yang forms are still seen in different populations of people around the world today.
The researchers analysed data from 3,438 people. The information came from the International HapMap Project (a partnership of scientists and funding agencies from Canada, China, Japan, Nigeria, the UK and the USA to create a resource that will help researchers find genes linked with disease and treatment) and the 1000 Genomes Project (a resource on human genetic variation).
The team used a technique called BlocBuster, which calculates the correlation between every pair of single nucleotide polymorphisms (SNPs) and builds a network from those correlations, revealing clusters of correlated markers.
“With an efficient algorithm and an adequate number of processors and time, we can look at every pair of SNPs, build these networks and observe clusters of interconnected SNPs,” says Sharlee Climer.
“The BlocBuster approach is a paradigm shift from the conventional methods for genome-wide association studies, or popularly known as GWAS, where one or a few markers were examined at a time,” adds Weixiong Zhang. “It is truly a data mining technique for big data like those from HapMap and 1000 Genomes projects.”
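The pairwise-network idea described above can be illustrated with a small sketch. This is not the BlocBuster implementation: the article does not say which correlation measure or clustering rule BlocBuster uses, so plain Pearson correlation, a fixed threshold of 0.8 and connected components are stand-in assumptions here, and the function names and genotype data are hypothetical.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def correlation_clusters(genotypes, threshold=0.8):
    """Group SNPs into clusters of strongly correlated markers.

    genotypes maps a SNP name to a list of allele counts (0, 1 or 2),
    one entry per individual. The threshold is an assumed value.
    """
    names = list(genotypes)
    neighbours = {name: set() for name in names}

    # Examine every pair of SNPs and link those whose correlation is strong
    # (the absolute value is used, so strong negative correlation also counts).
    for a, b in combinations(names, 2):
        if abs(pearson(genotypes[a], genotypes[b])) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Clusters are the connected components of the resulting network.
    seen, clusters = set(), []
    for name in names:
        if name in seen:
            continue
        stack, component = [name], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical allele counts for five SNPs in six individuals.
example_genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [0, 1, 2, 0, 2, 2],
    "snp4": [2, 0, 1, 1, 0, 2],
    "snp5": [0, 2, 1, 1, 2, 0],
}
example_clusters = correlation_clusters(example_genotypes)
```

With this toy data, snp1, snp2 and snp3 fall into one cluster and snp4 and snp5 into another. Because every pair is examined, the work grows with the square of the number of SNPs, which is why Climer mentions needing an efficient algorithm and enough processors.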
The analysis showed that up to 80% of the haplotypes could be categorised as yin or yang, and this split could be traced back to the ancestral haplotype, that of the most recent common human ancestor.
“We observed that the ancestral haplotype split into two distinct haplotypes and subsequently underwent rapid evolution, as each haplotype possesses about 140 markers that are different from the ancestral haplotype,” says Climer. “These numerous mutations should have produced a large number of intermediate haplotypes, but the intermediates have almost entirely disappeared, and the divergent yin and yang haplotypes are prevalent in populations representing every major human ancestry.”
Looking at data from the HapMap Project, the team found that people of African origin have more yang haplotypes in the gephyrin region, while people of European origin have more yin haplotypes. Those of Asian descent have nearly equal numbers of yin and yang haplotypes – 30% of people of Japanese origin carry two yin haplotypes or two yang haplotypes, and another 30% have both a yin and a yang haplotype, reflecting the roughly equal probability of inheriting either one.
The BlocBuster technique could also be used to look at combinations of networked genetic markers that are characteristic of complex traits and diseases, and shed light on the genetic roots of disease.
“Most complex diseases arise due to a group of genetic variations interacting together,” says Climer. “Different groups of people who get a disease may be affected by different groups of variations. There is not enough power to see most of these intricate associations when looking at single markers one at a time. We’re taking a combinatorial approach — looking at combinations of markers together — and we’re able to see the patterns.”
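A toy example shows why looking at single markers one at a time can miss an association that a combination reveals. The cohort below is invented purely for illustration and does not come from the study; the marker names and helper functions are hypothetical.

```python
from itertools import product

# Hypothetical cohort: (marker_a, marker_b, has_trait), all coded as 0 or 1.
# The trait appears only when exactly one of the two markers is present.
cohort = [
    (0, 0, 0), (0, 0, 0),
    (0, 1, 1), (0, 1, 1),
    (1, 0, 1), (1, 0, 1),
    (1, 1, 0), (1, 1, 0),
]


def trait_rate(rows):
    """Fraction of individuals in rows who have the trait."""
    return sum(row[2] for row in rows) / len(rows)


def single_marker_rates(rows, index):
    """Trait rate for each value of one marker, ignoring the other marker."""
    return {
        value: trait_rate([row for row in rows if row[index] == value])
        for value in (0, 1)
    }


def pair_rates(rows):
    """Trait rate for each combination of the two markers."""
    return {
        (a, b): trait_rate([row for row in rows if row[0] == a and row[1] == b])
        for a, b in product((0, 1), repeat=2)
    }


rates_a = single_marker_rates(cohort, 0)
rates_b = single_marker_rates(cohort, 1)
rates_pair = pair_rates(cohort)
```

In this constructed cohort each marker taken alone shows a trait rate of 0.5 whichever value it has, so neither looks informative, while the four combinations give rates of 0 or 1. Real data are far noisier, but the sketch shows the kind of pattern that only becomes visible when markers are considered together.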
While GenoKey has not been involved in this research, its combinatorial data mining platform technology is capable of analysing and mining complex combinations of SNPs and other multifaceted data sets.