Data mining creates new cancer classification system

August 27, 2014 − by Suzanne Elvidge − in Big data, Big data in research, Data analytics, Data mining, Drug development, Healthcare big data analysis

Mining data from more than 3,500 tissue samples has yielded a way to classify cancer into 11 subtypes and has identified characteristics shared by tumours that arise in different tissues. These findings could help doctors predict patients’ outcomes and choose treatments, and could guide researchers in developing new therapeutics and stratifying patients for clinical trials.

Cancer is a complex disease. While cancers are initially classified by the tissue in which they originate, sequencing the genomes of different cancer samples has shown that cancers arising from the same tissue-of-origin include distinct subtypes. A team of researchers from Europe and North America has created a new classification system based on molecular subtypes. The research involved analysing molecular data from 3,527 specimens from patients with 12 different cancer types, the most comprehensive and diverse collection of tumours ever analysed by systematic genomic methods. The team analysed each tumour type using five genome-wide platforms and one proteomic platform. The work was part of the Pan-Cancer Initiative of the Cancer Genome Atlas (TCGA).

The research team first analysed the data from each platform separately and then combined them in an integrated cross-platform analysis. The analyses from the six platforms and the integrated analysis all showed that the tumours could be divided into 11 major subtypes. This agreement gave the researchers confidence in the subtypes they identified, and also suggested that different kinds of data can be used to classify a tumour.
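The article does not say how the per-platform results were combined, so the following is only a minimal sketch of one generic way to integrate separate clusterings: count how often each pair of samples lands in the same cluster across platforms, then group samples that co-cluster on most platforms. The function names, the 0.5 threshold, and the toy labels are all hypothetical and are not taken from the study.

```python
from itertools import combinations


def co_association(labels_by_platform):
    """Fraction of platforms that place each pair of samples in the same cluster."""
    platforms = list(labels_by_platform.values())
    n_samples = len(platforms[0])
    agreement = {}
    for i, j in combinations(range(n_samples), 2):
        same = sum(1 for labels in platforms if labels[i] == labels[j])
        agreement[(i, j)] = same / len(platforms)
    return agreement


def integrated_clusters(labels_by_platform, threshold=0.5):
    """Group samples linked by pairs that co-cluster on more than `threshold` of platforms."""
    n_samples = len(next(iter(labels_by_platform.values())))
    parent = list(range(n_samples))  # union-find forest, one node per sample

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), fraction in co_association(labels_by_platform).items():
        if fraction > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for sample in range(n_samples):
        groups.setdefault(find(sample), []).append(sample)
    return sorted(groups.values())


# Made-up cluster labels for six samples on three hypothetical platforms.
toy_labels = {
    "expression": [0, 0, 0, 1, 1, 1],
    "mutation": [0, 0, 1, 1, 1, 1],
    "methylation": ["a", "a", "a", "b", "b", "b"],
}
print(integrated_clusters(toy_labels))
```

In this toy case sample 2 is placed differently by the mutation platform, but two of the three platforms still group it with samples 0 and 1, so the integrated grouping keeps it there. Real data would need a more careful consensus method and a principled choice of the number of clusters.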

“We can now say what the telltale signatures of the subtypes are, so you can classify a patient’s tumour just based on the gene expression data, or just based on mutation data, if that’s what you have,” said Josh Stuart, the study’s senior author. “Having a molecular map like this could help get a patient into the right clinical trial.”
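As a rough illustration of classifying a tumour from a single data type, the sketch below uses a nearest-centroid rule: each subtype's signature is the average profile of its known samples, and a new tumour is assigned to the subtype whose signature is closest. This is a stand-in technique chosen for simplicity, not the method used in the study, and the subtype names and expression values are invented.

```python
import math


def subtype_signatures(profiles, subtypes):
    """Average the profiles of each subtype to get one signature (centroid) per subtype."""
    sums, counts = {}, {}
    for profile, subtype in zip(profiles, subtypes):
        total = sums.setdefault(subtype, [0.0] * len(profile))
        for k, value in enumerate(profile):
            total[k] += value
        counts[subtype] = counts.get(subtype, 0) + 1
    return {s: [v / counts[s] for v in total] for s, total in sums.items()}


def classify(profile, signatures):
    """Return the subtype whose signature is nearest (Euclidean distance) to the profile."""
    return min(signatures, key=lambda s: math.dist(profile, signatures[s]))


# Invented expression values for three hypothetical genes in four labelled tumours.
training_profiles = [
    [1.0, 0.0, 0.2],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 1.0],
    [0.0, 1.0, 0.8],
]
training_subtypes = ["subtype_A", "subtype_A", "subtype_B", "subtype_B"]

signatures = subtype_signatures(training_profiles, training_subtypes)
print(classify([0.9, 0.2, 0.1], signatures))
```

The same two functions would work unchanged on mutation-derived features, which is the point Stuart makes: once the signatures are known, whichever data type is available can be used.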

Five of the subtypes were nearly identical to their tissue-of-origin counterparts. However, the analysis also split some tissue-of-origin classifications into different molecular subtypes, and combined certain cancers into common subtypes. For example, bladder cancer split into seven different clusters, with most samples falling into one of three subtypes. One subtype contained bladder cancers only, but some bladder cancers clustered with lung adenocarcinomas, and others with a subtype called ‘squamous-like’ that includes some lung cancers, some head-and-neck cancers, and some bladder cancers. TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes typified this squamous-like cluster.

“If you look at survival rates, the bladder cancers that clustered with other tumour types had a worse prognosis. So this is not just an academic exercise,” Stuart said.

Different subtypes of breast cancer are already recognised, but the findings added more detail. For example, the results reinforced the idea that ‘basal-like’ breast cancers are a unique tumour type, clearly different from luminal breast cancers. The findings, which were published in Cell, resulted in one in ten cancers being reclassified.

“It’s only ten percent that were classified differently, but it matters a lot if you’re one of those patients,” said senior author Josh Stuart, a professor of biomolecular engineering at UC Santa Cruz.

Further studies are planned; the next major Pan-Cancer analysis will include 21 tumour types.
