Big data and data analytics make up a fast-growing field: data production is likely to be 44 times greater in 2020 than it was in 2009, which amounts to a 4300% increase in annual data generation. The GenoKey blog has been following the growth of big data, particularly in healthcare and the life sciences, since March 2013, and has seen some exciting uses of big data, a few of which are reviewed here.
Big data and genetics
The genome is full of genetic typos, and many of these have no impact on health or wellbeing. However, some can lead to a predisposition to disease. The EU-based consortium COGS (Collaborative Oncological Gene-environment Study) used big data analysis of genome-wide association studies (GWAS) to find more than 80 genetic ‘typos’ that can increase the risk of breast, prostate and ovarian cancer. Using data analytics to combine clinical and genetic data could also help predict the risk of breast cancer recurrence.
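A GWAS typically tests each SNP separately for association with disease by comparing allele counts in cases and controls. The sketch below shows one common per-SNP test, the allelic 2×2 chi-square test, assuming a biallelic SNP with hypothetical genotype counts. It is a generic textbook illustration, not the COGS analysis pipeline, and the function name `allelic_chi_square` is our own.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Allelic 2x2 chi-square test for one biallelic SNP (illustrative sketch).

    case_counts / control_counts: genotype counts as (AA, AB, BB).
    Returns (chi2, p_value, odds_ratio) for allele B relative to allele A.
    Assumes both alleles are observed in cases and in controls.
    """
    def allele_counts(genotypes):
        aa, ab, bb = genotypes
        # Each person carries two alleles: AA -> 2 A, AB -> 1 A + 1 B, BB -> 2 B
        return 2 * aa + ab, ab + 2 * bb

    a_case, b_case = allele_counts(case_counts)
    a_ctrl, b_ctrl = allele_counts(control_counts)
    table = [[b_case, a_case], [b_ctrl, a_ctrl]]

    total = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]

    # Pearson chi-square: sum of (observed - expected)^2 / expected over the 4 cells
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (b_case * a_ctrl) / (a_case * b_ctrl)
    return chi2, p_value, odds_ratio
```

For example, `allelic_chi_square((10, 20, 10), (15, 20, 5))` compares hypothetical cases and controls. In a real GWAS this test is repeated for hundreds of thousands of SNPs, so a stringent multiple-testing threshold is needed before any association is reported.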
GenoKey’s combinatorial data mining algorithms, based on array-based logic and powered by GPU computing, can screen vast quantities of genetic data to find significant patterns in combinations of genotypes. This technology was used to calculate all the combinations of three genotypes from 3×803 SNP genotypes (three possible genotypes at each of 803 SNPs) in 1355 healthy people and 607 patients with bipolar disorder. The researchers found statistically significant connections between clusters of SNPs and the symptoms of bipolar disorder, particularly in patients with clustered manic episodes and alcohol-related bipolar episodes.
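To give a sense of the scale involved, the sketch below enumerates three-genotype combinations by brute force and scores each one by how often it occurs in cases versus controls. It assumes a combination means one genotype at each of three distinct SNPs, with genotypes coded 0/1/2, and it scores combinations with an odds ratio plus a Haldane-Anscombe correction. That scoring is our own illustrative choice; it does not reproduce GenoKey’s array-based logic or its GPU implementation, and all function names are hypothetical.

```python
from itertools import combinations, product
from math import comb

# Genotype coding assumed here: number of minor alleles (AA=0, AB=1, BB=2)
GENOTYPES = (0, 1, 2)


def count_three_way_combinations(n_snps):
    """Number of (SNP triple, genotype triple) pairs, one genotype per SNP."""
    return comb(n_snps, 3) * len(GENOTYPES) ** 3


def has_combination(person, snps, genos):
    """True if the person carries genotype genos[k] at SNP snps[k] for all k."""
    return all(person[s] == g for s, g in zip(snps, genos))


def screen_three_way(cases, controls, min_carriers=1):
    """Exhaustively score every three-genotype combination (illustrative sketch).

    cases, controls: lists of individuals, each a tuple of 0/1/2 genotypes per SNP.
    Yields (snps, genos, case_carriers, control_carriers, odds_ratio).
    """
    n_snps = len(cases[0])
    for snps in combinations(range(n_snps), 3):
        for genos in product(GENOTYPES, repeat=3):
            a = sum(has_combination(p, snps, genos) for p in cases)     # cases with it
            c = sum(has_combination(p, snps, genos) for p in controls)  # controls with it
            if a + c < min_carriers:
                continue
            b = len(cases) - a     # cases without it
            d = len(controls) - c  # controls without it
            # Haldane-Anscombe correction: add 0.5 to every cell to avoid division by zero
            odds_ratio = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
            yield snps, genos, a, c, odds_ratio
```

Under this counting convention, `count_three_way_combinations(803)` gives 2,321,319,627 combinations, each of which has to be checked against all 1962 people. A pure-Python loop like this one is only practical for toy data, which is why massively parallel hardware such as GPUs is attractive for this kind of exhaustive screen.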
Medical records as a source of big data
Medical records contain a huge amount of data, but these data are not always easy to access: records may be on paper or in electronic form, and can cover everything from prescribing information to pre- and post-treatment genome sequences, or even the outputs from EEGs.
Researchers have mined electronic medical records to gather information on adverse events, study child health and outcomes, improve patient-centred care, change healthcare management and reduce readmissions. Less formal sources of health information also have potential: big data from social media could be used to predict outbreaks of the winter vomiting virus or suicide risk in the military.
Paper records include a lot of useful information, but they are difficult to mine. One team of researchers painstakingly digitised printed records to highlight the efficacy of vaccines against infectious diseases.
GenoKey’s technology can create a knowledge model from the analysis of huge, complex datasets, giving a complete summary of all the known interdependencies and positive and negative correlations within the data. This means it could be used to create useful diagnostic and research tools from the large pools of patient data held in medical records. The company is mining electronic medical records for case-control data, with the aim of linking blood test results to diagnoses.
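As a much simpler stand-in for such a knowledge model, the sketch below summarises pairwise positive and negative correlations between binary variables in patient records using the phi coefficient. The variable names (`high_glucose`, `diabetes`, `normal_glucose`) and the 0.3 threshold are hypothetical, and pairwise correlation captures far less than the full set of interdependencies described above; it is not GenoKey’s method.

```python
from itertools import combinations
from math import sqrt


def phi_coefficient(x, y):
    """Phi (Pearson) correlation between two equal-length 0/1 vectors."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A variable that never varies has no defined correlation; report 0.0
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def correlation_summary(records, threshold=0.3):
    """Split variable pairs into positive and negative correlations.

    records: list of dicts mapping variable name -> 0/1 (same keys in every record).
    Returns (positive, negative), each a list of (var_a, var_b, phi).
    """
    variables = sorted(records[0])
    positive, negative = [], []
    for a, b in combinations(variables, 2):
        phi = phi_coefficient([r[a] for r in records], [r[b] for r in records])
        if phi >= threshold:
            positive.append((a, b, phi))
        elif phi <= -threshold:
            negative.append((a, b, phi))
    return positive, negative
```

In a case-control setting, a blood test flag that correlates positively with a diagnosis would show up in the `positive` list, and one that rules a diagnosis out would appear in `negative`.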
Data mining in drug discovery and development
Big data has an exciting and vital role to play in drug discovery and development. The National Cancer Institute (NCI) is creating a database of cancer-specific genetic coding variants that contains six billion data points linking drugs with changes in the genome. This could help researchers find new uses for existing drugs or drug combinations, or screen for new therapeutics.
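One way to picture how drug-variant links support drug repurposing is an inverted index from genomic variants to linked drugs, queried with a patient’s variants. The sketch below uses placeholder identifiers such as `drug_A` and `GENE1:v1`; it does not reflect the NCI database schema or its contents, and the ranking rule (number of matched variants) is an illustrative assumption.

```python
from collections import defaultdict


def build_index(links):
    """Build a variant -> set of drugs index from (drug, variant) pairs."""
    index = defaultdict(set)
    for drug, variant in links:
        index[variant].add(drug)
    return index


def rank_candidate_drugs(index, patient_variants):
    """Rank drugs by how many of the patient's variants they are linked to."""
    scores = defaultdict(int)
    for variant in patient_variants:
        # .get avoids inserting empty entries for variants with no known links
        for drug in index.get(variant, ()):
            scores[drug] += 1
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With links `[("drug_A", "GENE1:v1"), ("drug_A", "GENE2:v2"), ("drug_B", "GENE2:v2")]`, a patient carrying `GENE1:v1` and `GENE2:v2` would rank `drug_A` above `drug_B`. At the scale of billions of data points, the same idea would live in a database rather than an in-memory dictionary.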
During 2013, big data has also been used to validate drug safety in peripheral artery disease, reduce drug side effects, analyse mouse movements to support drug development for psychiatric disease, and learn more about drug resistance in tuberculosis.
We hope you enjoyed our Christmas wishes, exploring seasonal uses of big data. For more general reviews of big data in 2013, have a look at the reports in Datanami, Dataversity and ComputerWorld. And watch out for our next post on big data in 2014.