Big data got too big.
A research team that includes Cornell statisticians has developed a data representation method, inspired by quantum mechanics, that handles large data sets more efficiently than traditional approaches by simplifying them and filtering out noise.
This method could spur innovation in data-rich but statistically intimidating fields, like health care and epigenetics, where traditional data methods have thus far proved insufficient.
"Physicists and allied scientists have developed quantum mechanics-based tools that offer concise mathematical representations of complex data," said Martin Wells, the Charles A. Alexander Professor of Statistical Sciences in the Cornell Ann S. Bowers College of Computing and Information Science and the ILR School. He is a co-author of "Robust Estimation of the Intrinsic Dimension of Data Sets with Quantum Cognition Machine Learning," which was published in Scientific Reports on February 26. "We're borrowing and using their mathematical structure from quantum mechanics to understand the structure of data."
Before spinning data into innovation or medical breakthroughs, data scientists must first get a sense of the data's complexity. To do this, scholars, particularly those working in areas like network analysis and health sciences, have traditionally turned to a technique called intrinsic dimension estimation, which captures the gist of a massive data set without analyzing every detail. The problem, researchers said, is that intrinsic dimension estimation can be thrown off by noise and complexity, and real-world data is often both.
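To make the idea concrete: a data set's intrinsic dimension is the number of underlying degrees of freedom, which can be far smaller than the number of measured features. The sketch below is a minimal, conventional illustration (a PCA-based variance-threshold estimator, not the quantum cognition method described in the paper) showing how a simple estimator recovers the true dimension on clean data and, as the researchers note, gets inflated by noise. All names and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_intrinsic_dim(X, threshold=0.95):
    """Toy intrinsic-dimension estimate: the number of principal
    components needed to explain `threshold` of the total variance."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data give component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(explained), threshold) + 1)

# 2,000 points lying on a 2-D plane embedded in a 10-D ambient space.
latent = rng.normal(size=(2000, 2))
embedding = rng.normal(size=(2, 10))
X = latent @ embedding

clean_dim = pca_intrinsic_dim(X)        # clean data: close to the true value, 2
X_noisy = X + 0.5 * rng.normal(size=X.shape)
noisy_dim = pca_intrinsic_dim(X_noisy)  # ambient noise inflates the estimate
print(clean_dim, noisy_dim)
```

Even this simple experiment reproduces the failure mode the article describes: a modest amount of noise spread across all ten coordinates pushes the estimate well above the true dimension of two.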
"When you use these intrinsic dimension estimation techniques, they very often get the wrong answer by quite a big margin, and they disagree with each other," said Luca Candelori, lead author and director of research at Qognitive, an artificial intelligence startup. "It's very hard to apply them on real data sets and to get an actual estimate."
Read the full story on the Cornell Bowers website.