Restoring voices — and identity — with neuroengineering

UC Davis researchers decode facial and muscle signals to restore authentic speech

(SACRAMENTO)

Lee Miller vividly recalls the day in 2021 when he met a patient who had lost the function of her vocal cords. In hoarse, whispering tones she explained how her voice had been instrumental to her vocation. Losing it, she said, undercut her life’s purpose. He had to listen carefully to hear her faint words, but the lesson “was powerful.”

“Our voice is so important to our sense of identity and empowerment,” said Miller. He is a professor of neurobiology, physiology, and behavior at UC Davis; a professor of otolaryngology and head and neck surgery at the UC Davis School of Medicine; and technical director at the Center for Mind and Brain.

Now, Miller is working to restore original voices to those who have lost them — based partly on adapting technology for interpreting gestures and controlling robotic limbs. 

Every year, nearly one million people worldwide are diagnosed with head and neck cancer. Many of them lose their ability to speak intelligibly due to surgical removal of — or radiation damage to — the larynx, mouth, and tongue. These individuals can learn to speak again using devices that emit artificial sounds, which they can shape into words. But their new voices are often weak, mechanical, or distressingly unfamiliar.

Miller and his collaborators are developing a system that could one day restore a person’s unique, original voice. 


Decoding the mind

Miller has spent 25 years studying how biological signals, such as the sound of a voice, travel from person to person and from one brain area to another.

Simply perceiving another person’s speech in a crowded room is surprisingly challenging.

“The brain has to focus on the needle in the haystack,” explained Miller. “It must screen out a mountain of irrelevant signals, such as background noise, music, echoes, and other people’s voices. If engineers can learn to isolate that tiny relevant signal from all the noise, as the brain does, they could accomplish some amazing things.”

Miller is working on a project to record electromyographic (EMG) signals from a person’s skin, generated by the muscle contractions of movements like reaching out or clenching a fist, and decode them into digital instructions for controlling a robotic arm. This might one day allow astronauts to repair equipment outside a space station without undertaking a potentially risky spacewalk.

Lee Miller demonstrates how EMG electrodes are placed on patients.

Miller has worked with the company Meta to use EMG signals to recognize and interpret a person’s gestures so they can interact with computers using natural body language — rather than a mouse and keyboard.

The difficulty is that EMG signals vary from person to person, depending on age, skin characteristics, body weight, and other factors. These biological signals also produce mountains of data every second, which computers must process quickly.

“We have only a limited amount of time, perhaps only 50 milliseconds, before the computer causes a delay, which would make real-time interpretations impossible,” Miller explained.

Miller and his Ph.D. student, Harsha Gowda (in the Electrical and Computer Engineering Graduate Group), solved this by using only tiny bits of the incoming signals while ignoring everything else. Rather than tracing the chaotic ups and downs of each EMG electrode on a person’s arm, Gowda employed a strategy that simply measures signal relationships between various pairs of electrodes.

“These simplified signal representations turn out to be very well-behaved,” said Miller. “You can actually see the different gestures by just glancing at the data.” 

And unlike the noisy signals from individual electrodes, they don’t vary from one person to another. 

“So now we have a gesture decoder that works for everybody, regardless of differing body types, skin types and ages,” he added.
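
To make the idea concrete, here is a minimal sketch in Python of what such pairwise features might look like. The choice of correlation as the relationship measure, the 16-electrode array, and the 50-millisecond window are illustrative assumptions, not details of the team’s actual system.

    import numpy as np

    def pairwise_features(emg_window):
        """Summarize one multichannel EMG window by its electrode-pair
        relationships rather than each channel's raw, noisy waveform.

        emg_window: array of shape (n_channels, n_samples)
        Returns one value per unique electrode pair.
        """
        # Correlation across channels captures how each electrode pair
        # co-varies within the window.
        corr = np.corrcoef(emg_window)
        # Keep only the upper triangle: one entry per unique pair.
        rows, cols = np.triu_indices(corr.shape[0], k=1)
        return corr[rows, cols]

    # Illustrative example: 16 electrodes, 50 samples (~50 ms at 1 kHz).
    rng = np.random.default_rng(0)
    window = rng.standard_normal((16, 50))
    features = pairwise_features(window)
    print(features.shape)  # (120,) -- 16 * 15 / 2 electrode pairs

A compact summary like this is cheap to compute, which matters given the roughly 50-millisecond budget Miller describes for real-time interpretation.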

Restoring voice

Miller became interested in applying these lessons to speech during a 2021 visit with Peter Belafsky, a professor of otolaryngology at UC Davis Health and director of the UC Davis Center for Voice and Swallowing. It was at Belafsky’s clinic that he met the woman whose voice had been instrumental to her vocation, along with others who had lost their voices.

“Hearing their stories was profoundly motivating,” said Miller.

Miller embarked on the Silent Speech project in 2022, collaborating with Belafsky, Gowda, Sergey Stavisky, an assistant professor of neurological surgery, and David Brandman, a professor of neurosurgery at UC Davis Health.

The team began the project by working with healthy volunteers, using EMG electrodes to record the movements of their mouth and face muscles during speech. Then, they used the simplified EMG signals with simultaneously recorded speech to train a computer to match different EMG patterns with different speech sounds for each person. The result is tailored, computer-generated speech that is created using the unique tones of the person’s voice. 

“We don’t need that much data to clone the person’s voice,” said Miller. In his experience, it requires only about five minutes of speech combined with that person’s EMG signals. 
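
The article does not specify the model the team trains. As a rough illustration, the mapping can be framed as a regression from per-frame EMG features to acoustic features of the simultaneously recorded speech; the sketch below uses closed-form ridge regression and made-up dimensions purely as a stand-in.

    import numpy as np

    # Hypothetical paired training data for one speaker:
    #   X: EMG features per frame, e.g., the 120 electrode-pair values above
    #   Y: acoustic targets per frame, e.g., 80 mel-spectrogram bins
    # About five minutes of speech at 100 frames/s is ~30,000 paired frames.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((30_000, 120))
    Y = rng.standard_normal((30_000, 80))

    # Closed-form ridge regression: W maps EMG features to acoustics.
    lam = 1e-2
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

    # At run time, features from silent speech predict acoustic frames,
    # which a vocoder conditioned on the person's voice would turn into audio.
    predicted_mel = X[:10] @ W

In practice a neural network would likely replace the linear map, but the shape of the problem is the same: a few minutes of paired EMG and audio are enough to fit a personalized mapping.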

With the support of a UC Davis STAIR (Science Translation and Innovative Research) grant, the group of researchers is now trying to use this system to restore the voices of people who no longer have functional larynxes. For these individuals, it is no longer possible to record natural voices, so Miller and the team are trying to stitch together meaningful samples from other sources like family videos.

Carlos Perez wearing electrodes used to record EMG signals during speech. 

Carlos Perez, a UC Davis Health patient, kept an audio diary in the weeks before his larynx was surgically removed, preserving a recording of his voice.

“It was a very personal choice that Carlos made, preserving a memento of his voice that he knew he was about to lose,” said Miller. “It was very special that he shared those recordings with us.” 

The audio diary turned out to be a perfect trove of raw material for digitally recreating his voice. 

The team is now pairing those recordings with EMG signals and video of Perez’s face, which they recorded as he spoke the same words silently, without his larynx.

Miller envisions that this system might one day run on a smartphone. The person would move their mouth to speak silently into their phone as though doing a video call. The phone would simultaneously record EMG signals and video of their face — combining these with a sample of the person’s voice to create natural-sounding speech.
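
As a sketch of how that loop might fit together, the toy pipeline below streams EMG windows and video frames through a personalized model. Every name, shape, and operation here is a hypothetical placeholder, not the team’s actual software.

    import numpy as np

    def extract_features(emg_window, video_frame):
        # Stand-in for real EMG pairwise features plus facial landmarks.
        return np.concatenate([emg_window.mean(axis=1), video_frame.ravel()])

    class VoiceModel:
        """Placeholder for a model personalized with a sample of the
        user's voice, as in the few-minute cloning step above."""
        def __init__(self, n_features, n_mels=80, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.standard_normal((n_features, n_mels)) * 0.01

        def predict(self, features):
            return features @ self.W   # features -> one acoustic frame

        def vocode(self, mel):
            return np.tanh(mel)        # stand-in for a real vocoder

    rng = np.random.default_rng(2)
    model = VoiceModel(n_features=16 + 16)  # 16 EMG channels + 4x4 video
    for _ in range(3):  # three ~50 ms steps of the streaming loop
        emg_window = rng.standard_normal((16, 50))
        video_frame = rng.standard_normal((4, 4))
        audio = model.vocode(model.predict(
            extract_features(emg_window, video_frame)))
        # Each iteration must finish within the ~50 ms budget for the
        # output to feel like live speech.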

“Engineering this system so that it works outside of the laboratory for a wide array of people could take several years,” said Miller. “Ultimately, we want this to work easily for anybody.”
