Lee Miller vividly recalls the day in 2021 when he met a woman who had lost the function of her vocal cords. In hoarse, whispering tones she explained how her voice had been instrumental to her vocation. Losing it, she said, undercut her life's purpose. Her words were faint, but the lesson was powerful.
"Our voice is so important to our sense of identity and empowerment," said Miller, a professor of neurobiology, physiology and behavior in the University of California, Davis College of Biological Sciences, a professor of otolaryngology and head and neck surgery at the UC Davis School of Medicine and technical director of the Center for Mind and Brain.
Now, Miller is working to restore original voices to those who have lost them - based partly on adapting technology for interpreting gestures and controlling robotic limbs.
Every year, nearly 1 million people worldwide are diagnosed with head and neck cancer. Many of them lose their ability to speak intelligibly due to surgical removal of - or radiation damage to - the larynx, mouth and tongue. These people can learn to speak again using devices that emit artificial sounds, which they can shape into words. But their new voices are often weak, mechanical or distressingly unfamiliar.
Miller and his collaborators are developing a system that could one day restore a person's unique, original voice.
Decoding the mind
Miller is working on a project to record electromyographic (EMG) signals from a person's skin - generated by the muscle contractions behind movements such as reaching out or clenching a fist - and decode them into digital instructions that can control a robotic arm. Doing this might one day allow astronauts to repair equipment outside a space station without undertaking a potentially risky spacewalk.
Miller has worked with the company Meta to use EMG signals to recognize and interpret a person's gestures so they can interact with computers using natural body language - rather than a mouse and keyboard.
The difficulty is that EMG signals often vary from person to person, depending on age, skin characteristics, body weight and other factors. These biological signals also produce mountains of data every second - which computers must process quickly.
"We have only a limited amount of time," said Miller, "perhaps only 50 milliseconds, before the computer causes a delay, which would make real-time interpretations impossible."
Miller and graduate student Harsha Gowda (in the Electrical and Computer Engineering Graduate Group) solved this by using only tiny bits of the incoming signals and ignoring everything else. Rather than tracing the chaotic ups and downs of each EMG electrode on a person's arm, Gowda employed a strategy that simply measures signal relationships among pairs of electrodes.
These simplified signal representations turn out to be "very well-behaved," and don't vary from one person to another, Miller said. "So now we have a gesture decoder that works for everybody."
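The article does not specify which relationship the decoder measures, so the sketch below, in Python, is only illustrative: it assumes the pairwise measure is a plain correlation computed over short windows, and the pairwise_features name, electrode count and sampling rate are hypothetical stand-ins rather than details from the project.

```python
import numpy as np

def pairwise_features(window: np.ndarray) -> np.ndarray:
    """Reduce one short window of multichannel EMG to pairwise
    electrode relationships.

    window: array of shape (n_electrodes, n_samples), e.g. a
    50-millisecond slice of the incoming signal.

    Returns the upper triangle of the channel correlation matrix:
    one value per electrode pair. (Correlation is an illustrative
    choice; the team's actual pairwise measure is not described.)
    """
    corr = np.corrcoef(window)                  # (n_electrodes, n_electrodes)
    i, j = np.triu_indices(corr.shape[0], k=1)  # each unordered pair once
    return corr[i, j]

# Hypothetical setup: 16 electrodes sampled at 2 kHz, one 50 ms window.
rng = np.random.default_rng(0)
emg_window = rng.standard_normal((16, 100))
features = pairwise_features(emg_window)
print(features.shape)  # (120,) - 16*15/2 pairs, far fewer numbers than the raw samples
```

Collapsing each window to a small, fixed set of pairwise values is what would make the 50-millisecond processing budget plausible: the decoder sees a compact summary instead of every raw sample.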
Restoring voice
Miller became interested in applying these lessons to speech during a visit in 2021 with Peter Belafsky, professor of otolaryngology at the School of Medicine and director of the UC Davis Center for Voice and Swallowing. It was at Belafsky's clinic that he met the woman whose voice had been part of her vocation, along with others who had lost their voices. Hearing their stories "was profoundly motivating," said Miller.
Miller embarked on the Silent Speech project in 2022, collaborating with Belafsky, Sergey Stavisky, assistant professor of neurological surgery, and David Brandman, a professor of neurosurgery at the School of Medicine.
Miller and Gowda began the project by working with healthy volunteers, using EMG electrodes to record the activity of their mouth and face muscles during speech. They then used the simplified EMG signals, together with the simultaneously recorded speech, to train a computer to match different EMG patterns with different speech sounds for each person. The result is computer-generated speech tailored to the unique tones of the person's voice.
"We don't need that much data to clone the person's voice," said Miller. In his experience, it requires only about five minutes of speech combined with that person's EMG signals.
With the support of a STAIR (Science Translation and Innovative Research) grant from the UC Davis Office of Research, Miller and colleagues are now trying to use this system to restore the voices of people who no longer have functional larynxes. Because those individuals can no longer record a natural voice, Miller and his colleagues stitch together meaningful samples from other sources, such as family videos. One patient had kept an audio diary in the weeks before his larynx was surgically removed, preserving a record of his voice.
"It was a very personal choice that he made, preserving a memento of his voice that he knew he was about to lose," said Miller. "It was very special that he shared those recordings with us." They turned out to be a perfect trove of raw material for digitally recreating his voice.
Miller's team is now pairing those recordings with EMG and video of the man's face, captured as he spoke the same words silently, without his larynx. An accompanying audio clip demonstrates restored "silent speech" from an individual who no longer has a voice due to laryngectomy.
Miller envisions that this system might one day run on a smartphone. The person would move their mouth to speak silently into their phone as though doing a video call. The phone would simultaneously record EMG signals and video of their face - combining these with a sample of the person's voice to create natural-sounding speech.
Engineering this system so that it works outside of the laboratory for a wide array of people could take several years, said Miller. Even so, "Ultimately, we want this to work easily for anybody."
Seed funding for this project was generously provided by the Mike and Renee Child Family Fund for the Center for Mind and Brain in the fall of 2022. Subsequent funding came through the Center for Information Technology Research in the Interest of Society (CITRIS) and the Banatao Institute, and through a close collaboration with the technology consultancy Accenture, led by Adolfo Ramirez-Aristizabal of Accenture Labs San Francisco.