One of the challenges with big data in the biological and medical sciences is dealing with inconsistencies in how data are collected and reported. A literature-based database developed at Carnegie Mellon University, described as a ‘Wikipedia’ for neurones, could help researchers to understand neuronal diversity, one of the biggest big-data problems in health and disease. The research was published in the Journal of Physiology.
There are around 300 different types of neurones, each with different physical and functional properties, and decades’ worth of data about these billions of neurones is spread across tens of thousands of papers in the scientific literature. However, the way these biophysical data are collected and organised makes it difficult to use them to characterise the features and roles of each neurone type. The Carnegie Mellon team’s NeuroElectro database has been populated with data from more than 10,000 published papers containing physiological data that describe how neurones responded to various inputs. The researchers used text-mining algorithms to find the portions of each paper that identify the type of neurone studied, and then to isolate the electrophysiological data on that type’s properties. The algorithms can also retrieve information about how each experiment was carried out, and correct the data for differences that might be caused by the way the experiment was set up.
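The article does not describe how NeuroElectro’s text-mining algorithms actually work. As a rough illustration only, the Python sketch below shows the general idea of pulling a neurone type and electrophysiological values out of a passage of text with regular expressions. The vocabulary (`NEURON_TYPES`), the patterns in `PROPERTY_PATTERNS` and the `extract` function are all hypothetical; a real pipeline would need far larger vocabularies and would also have to cope with tables, unit conversions and ambiguous mentions.

```python
import re

# Hypothetical vocabulary of neurone-type names to look for (illustration only).
NEURON_TYPES = ["CA1 pyramidal cell", "fast-spiking basket cell", "mitral cell"]

# Hypothetical patterns: a property name, some non-numeric text, then a value and unit.
PROPERTY_PATTERNS = {
    "resting membrane potential": re.compile(
        r"resting membrane potential[^0-9\-−]*([-−]?\d+(?:\.\d+)?)\s*mV", re.IGNORECASE
    ),
    "input resistance": re.compile(
        r"input resistance[^0-9]*(\d+(?:\.\d+)?)\s*M(?:Ω|Ohm)", re.IGNORECASE
    ),
}


def extract(text):
    """Return (neurone type or None, {property: value}) found in one passage."""
    lowered = text.lower()
    # Take the first known neurone type mentioned anywhere in the passage.
    neuron = next((n for n in NEURON_TYPES if n.lower() in lowered), None)
    values = {}
    for prop, pattern in PROPERTY_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Papers sometimes use the Unicode minus sign; convert it before parsing.
            values[prop] = float(match.group(1).replace("−", "-"))
    return neuron, values


if __name__ == "__main__":
    print(extract("Mitral cells had a resting membrane potential of -58.2 mV "
                  "and an input resistance of 120 MΩ."))
```

For the example passage, `extract` returns `("mitral cell", {"resting membrane potential": -58.2, "input resistance": 120.0})`. Keyword-and-pattern matching like this is easy to follow but brittle, which is one reason literature-mining projects typically add manual curation on top.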
“If we want to think about building a brain or re-engineering the brain, we need to know what parts we’re working with,” said Nathan Urban, director of Carnegie Mellon’s BrainHub neuroscience initiative. “We know a lot about neurones in some areas of the brain, but very little about neurones in others. To accelerate our understanding of neurones and their functions, we need to be able to easily determine whether what we already know about some neurones can be applied to others we know less about.”
To demonstrate potential uses for the database, the team compared the electrophysiological data from more than 30 neurone types. They found that the largest sources of variation came from experimental conditions such as electrode type, recording temperature or animal age. Correcting for these conditions accounted for a substantial part of the biophysical variability seen within neurone types, suggesting that electrophysiological data are more reproducible across labs than previously thought.
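One simple way to picture this kind of correction, assuming a single experimental condition with a linear effect, is to estimate how a property changes with that condition within each neurone type and then shift every measurement to a common reference value. The sketch below does this for recording temperature. The function names `within_type_slope` and `adjust` and the action-potential half-width values are invented for illustration and are not taken from the study, whose statistical models may well differ.

```python
from collections import defaultdict
from statistics import mean, pstdev


def within_type_slope(records):
    """Least-squares slope of value on covariate, pooled within neurone types.

    Each type's own mean is subtracted first (a fixed-effects regression), so
    differences between types are not mistaken for a covariate effect.
    records: list of (neurone_type, covariate, value) tuples; assumes the
    covariate varies within at least one type.
    """
    by_type = defaultdict(list)
    for ntype, x, y in records:
        by_type[ntype].append((x, y))
    sxy = sxx = 0.0
    for points in by_type.values():
        mx = mean(x for x, _ in points)
        my = mean(y for _, y in points)
        for x, y in points:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx


def adjust(records, reference):
    """Shift each value to what the linear model predicts at the reference covariate."""
    slope = within_type_slope(records)
    return [(ntype, y - slope * (x - reference)) for ntype, x, y in records]


# Invented data: (neurone type, recording temperature in °C, AP half-width in ms).
EXAMPLE = [
    ("basket", 22, 0.60), ("basket", 28, 0.45), ("basket", 34, 0.30),
    ("pyramidal", 22, 1.40), ("pyramidal", 30, 1.20), ("pyramidal", 34, 1.10),
]

if __name__ == "__main__":
    print("slope (ms per °C):", within_type_slope(EXAMPLE))
    for ntype in ("basket", "pyramidal"):
        raw = [y for t, _, y in EXAMPLE if t == ntype]
        fixed = [y for t, y in adjust(EXAMPLE, reference=34) if t == ntype]
        print(ntype, "spread before:", pstdev(raw), "after:", pstdev(fixed))
```

With these invented numbers the estimated slope is -0.025 ms per °C, and after adjustment to 34 °C the spread within each type falls to essentially zero. Real recordings would, of course, keep genuine biological variability after such a correction, and several conditions (electrode type, animal age) would need to be modelled together.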
The researchers were able to divide the neurone types in the brain into six to nine super-classes. These include intuitive clusters, such as fast-spiking basket cells, as well as previously unrecognised ones, including a novel class of cortical and olfactory bulb interneurones that show persistent activity at theta-band frequencies.
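The article does not say which clustering method was used to find these super-classes. As an illustration only, the sketch below groups neurone types with average-linkage hierarchical clustering, a common choice for this sort of problem, applied to z-scored property profiles. The cell-type profiles (action-potential half-width, input resistance, maximum firing rate) are invented numbers, and `average_linkage` and `zscore_columns` are hypothetical helpers, not part of NeuroElectro.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def zscore_columns(profiles):
    """Scale each property to mean 0, SD 1 across neurone types."""
    names = list(profiles)
    columns = list(zip(*(profiles[n] for n in names)))
    scaled = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        scaled.append([(v - mu) / sd if sd else 0.0 for v in col])
    return {name: tuple(col[i] for col in scaled) for i, name in enumerate(names)}


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose members
    have the smallest mean pairwise Euclidean distance."""
    points = zscore_columns(profiles)
    clusters = [[name] for name in points]

    def linkage(a, b):
        return mean(dist(points[x], points[y]) for x in a for y in b)

    while len(clusters) > n_clusters:
        # combinations yields i < j, so popping j leaves index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)
        clusters[i].extend(merged)
    return clusters


# Invented profiles: (AP half-width ms, input resistance MΩ, max firing rate Hz).
PROFILES = {
    "basket": (0.30, 100, 200),
    "chandelier": (0.35, 110, 180),
    "CA1 pyramidal": (1.10, 120, 30),
    "L5 pyramidal": (1.20, 90, 25),
}

if __name__ == "__main__":
    print(average_linkage(PROFILES, n_clusters=2))
```

With these invented profiles, asking for two clusters separates the fast-spiking basket and chandelier cells from the two pyramidal cell types. Z-scoring each property first stops properties measured on large numeric scales, such as firing rate, from dominating the distances.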
NeuroElectro, a publicly available database and website for organising information on the cellular neurophysiology of different neurone types, was created by Urban’s team at Carnegie Mellon University. It is developed and maintained by Shreejoy Tripathy at the University of British Columbia (previously in Urban’s lab) and Richard Gerkin at Arizona State University. This centralised resource, covering around 100 different neurone types, will help researchers to collect and compare data on neuronal function. Although the team has validated much of the data, site users can also flag data for further evaluation and contribute new data.