Caltech scientists have developed a method driven by machine learning that allows them to accurately measure the mass of individual particles and molecules using complex nanoscale devices. The new technique opens the possibility of using a variety of devices for the measurement of mass and, therefore, the identification of proteins, and could pave the way to determining the sequence of the complete proteome, the collection of all the proteins in an organism.
Proteins are the engines of living systems. Which proteins are made, where, and in what amounts can provide important information about the health of systems, clues as to what happens in the case of disease, and potential approaches to fighting disease. But scientists do not yet have a way of characterizing entire proteomes.
"We're now talking about mass spectrometry at the single molecule level; the ability to look at entire proteins in real time without chopping them up," says Michael Roukes , the Frank J. Roshek Professor of Physics, Applied Physics, and Bioengineering and an author of a paper in the journal Nature Communications that describes the new technique. "If we have a single-molecule technique that has high enough throughput so we can measure millions of proteins within a reasonable time, then we can actually understand the complete proteome of organisms, including humans."
Mass spectrometry is a common analytical tool scientists use to accomplish all sorts of molecular sleuthing. Start with a mysterious sample, ionize it (i.e., give it a charge by removing one or more electrons), and send it speeding along a specified path. Then use a magnetic or electric field to give the ions a shove from the side and see how far they move. The lighter and more positively charged the ions, the more they will get deflected; this provides a way to measure the mass and charge of each of the various ions present. With that information, researchers can try to solve for the sample's chemical makeup.
Mass spectrometry is used for many purposes, including the analysis of trace elements in forensics, detection of disease biomarkers, and analysis of pesticide residues. But the initial ionization step is not ideal for all samples, especially biological samples that can be altered by the process.
Things get more complicated when samples become minuscule, for example, when scientists want to determine the mass of an individual protein. Over the past two decades, with the development of sophisticated nanoscale devices called nanoelectromechanical systems (NEMS), it has become possible to perform a type of mass spectrometry that does not require a sample to first be ionized. This has led to routine measurements of the masses of small molecules in real time. With this approach, scientists do not have to make best guesses when interpreting which chemical species are most likely to be found in a sample. But the method has ruled out certain complex NEMS devices from being used for mass spectrometry.
NEMS mass spectrometry is typically accomplished with a silicon device that you can think of as a tiny beam tethered on either end. When the beam is struck, it resonates like a guitar string and moves up and down with certain mode shapes occurring at different frequencies.
If a sample is placed on such a beam, the individual frequencies of the beam's vibrational modes will change. "From these frequency changes, you can infer the mass of the sample," says John Sader, a Caltech research professor of aerospace and applied physics and lead author of the new paper. "But to do that, you need to know the shape of each mode. That's at the core of all these measurements currently: you need to know how these devices vibrate."
With the newest NEMS devices, it is not always possible to determine a precise mode shape. That is because, at the nanoscale, there are device-to-device variations or imperfections that can slightly change the mode shapes. And the advanced NEMS devices that researchers have developed to study the fundamental physics of the quantum realm have extremely complicated three-dimensional modes whose frequencies are very close to each other. "You can't just simply calculate the mode shapes and their frequencies using theory and assume these hold during a measurement," Sader says.
A further complication is that the precise location at which a sample is dropped within a device affects the frequency measurements of the beam. Thinking again of that simple beam device, if the sample is placed close to one of the tethered ends, the frequency will not change as much as if it were placed near the center, for example, where the vibrational amplitude is likely to be greater. But with devices roughly a single micron by a single micron in size, it is not possible to visualize the exact placement of a sample.
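The position dependence described above can be illustrated with a short Python sketch. It is not the researchers' model: it assumes an idealized string-like mode shape in place of a real beam, uses the standard first-order result that a small point mass shifts a mode's frequency in proportion to the squared mode amplitude at the landing site, and the function name and numerical values are hypothetical.

```python
import math

def fractional_shift(added_mass, device_mass, position, mode=1):
    """Fractional frequency shift df/f of one mode caused by a small point mass.

    Assumptions (illustrative only):
      - position runs from 0 to 1 along the beam;
      - the mode shape is string-like, phi(x) = sqrt(2) * sin(mode * pi * x),
        normalized so that the integral of phi^2 over the beam equals 1;
      - first-order perturbation theory: df/f = -(dm / (2 * M)) * phi(x)^2.
    """
    phi = math.sqrt(2.0) * math.sin(mode * math.pi * position)
    return -0.5 * (added_mass / device_mass) * phi ** 2

# The same hypothetical particle (one millionth of the device mass),
# landing near a tethered end and then at the center.
near_end = fractional_shift(1e-6, 1.0, position=0.05)
center = fractional_shift(1e-6, 1.0, position=0.5)

print(f"near end: {near_end:.3e}")
print(f"center:   {center:.3e}")
```

Under these assumptions, the shift near the tethered end is far smaller in magnitude than the shift at the center, which is why a single frequency reading cannot separate a particle's mass from its landing position.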
Fingerprints Indicate Location and Lead to Mass
Sader, Roukes, and their colleagues have developed a new technique they call "fingerprint nanoelectromechanical mass spectrometry," which bypasses these problems.
Following this method, the researchers randomly place a single particle on the NEMS device under ultrahigh vacuum and at ultralow temperature. In real time, they measure how the frequencies of several device modes change with that placement. This allows them to construct a high-dimensional vector representing those changes in frequency, with one vector dimension for each mode. By doing this repeatedly for particles placed in a variety of random locations, they build a library of vectors for the device that is used to train the machine-learning software.
It turns out that each vector is something of a fingerprint. It has an identifying shape, or direction, that changes uniquely depending on where a particle lands.
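One way a fingerprint library like this could be built and queried is sketched below in Python. Several things here are assumptions rather than details from the paper: the string-like mode shapes standing in for a real device, the calibration particle of known mass used to fill the library, the plain nearest-direction (cosine similarity) lookup used in place of the team's trained machine-learning software, and all function names. The mass estimate uses the ratio of vector magnitudes once the best-aligned library vector is found.

```python
import math

def fingerprint(mass, position, device_mass=1.0, n_modes=3):
    """Vector of fractional frequency shifts, one entry per mode.

    Assumption: string-like mode shapes stand in for the real device,
    giving df/f = -(mass / device_mass) * sin(n * pi * position)^2 for mode n.
    """
    return [-(mass / device_mass) * math.sin(n * math.pi * position) ** 2
            for n in range(1, n_modes + 1)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means parallel)."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def estimate_mass(unknown, library, calibration_mass):
    """Find the most parallel library vector, then scale by the magnitude ratio."""
    best = max(library, key=lambda ref: cosine(unknown, ref))
    return calibration_mass * norm(unknown) / norm(best)

# Hypothetical library: a calibration particle of known mass at many positions.
# A regular grid is used here for reproducibility; in the experiment the
# landing positions are random, and only the measured vectors are stored.
CAL_MASS = 1e-6
library = [fingerprint(CAL_MASS, i / 200) for i in range(4, 101)]

# A particle of "unknown" mass lands at a position that is not in the library.
unknown = fingerprint(3.7e-6, 0.3137)
estimate = estimate_mass(unknown, library, CAL_MASS)
print(f"estimated mass: {estimate:.3e}")
```

The lookup never uses the landing position itself: only the direction of the measured vector selects the library entry, and only its length sets the mass. In this toy model the accuracy is limited by how densely the library samples the device.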
"If I take a particle with an unknown mass and place it anywhere on the NEMS device-I don't know where it has landed; in fact, I don't really care-and measure the frequencies of the vibrational modes, it will give me a vector that points in a specific direction," Sader explains. "If I then compare it to all the vectors in the database and find the one which is most parallel to it, that comparison will give me the unknown particle mass. It's simply the magnitude ratio of the two vectors."
Roukes and Sader say that this fingerprint technique can work with any device. The Caltech team theoretically analyzed phononic crystal NEMS devices developed in the lab of their colleague, Stanford physicist Amir Safavi-Naeini, for this study. These advanced NEMS devices effectively trap vibrations so that at certain frequencies they continue to "ring" for a long while, giving researchers plenty of time to gather quality measurements. The fingerprint method enables mass spectrometry measurements with these state-of-the-art devices. In preparation, the team used alternate devices to benchmark their fingerprint method. This included measuring the mass of individual particles of GroEL, a molecular chaperone protein that helps with proper protein folding in the cell.
Roukes notes that for large protein complexes and membrane proteins such as GroEL, standard methods of mass spectrometry are problematic for several reasons. First, those methods provide the total mass and charge, and those measurements do not uniquely identify a single species. For such large complexes, there would be many possible candidates. "You need to disambiguate that in some way," Roukes says. "The preeminent method of disambiguation at this point is taking the puzzle and chopping it up into fragments that are between 3 and 20 amino acids long." Then, he says, you would use pattern recognition to identify the mother molecule from all the daughter fragments. "But you no longer have a unique identifier of what the configuration or conformation of the original thing was because you destroyed it in the process of chopping it up."
The new fingerprint technique, Roukes notes, "is heading toward an alternative called native single-molecule mass spectrometry, where you look at large proteins and protein complexes, one-by-one, in their native form without chopping them up."
The paper, "Data-driven fingerprint nanoelectromechanical mass spectrometry," appears in the October 22 issue of the journal Nature Communications. Additional authors of the paper are Alfredo Gomez, a graduate student at Carnegie Mellon University who was a Schmidt Academy for Software Engineering Scholar at Caltech during this project; Adam P. Neumann (PhD '20), a former graduate student in Roukes's lab at Caltech; and Alex Nunn, a graduate student in applied physics at Caltech who completed the work as a junior computational scientist at the Institute. The work is supported by the Wellcome Leap Foundation through its Delta Tissue program.