The voice-changer system will produce computer-generated speech within milliseconds and let users control traits such as age, gender, and dialect.
Researchers are developing a new system that will allow people to speak anonymously in real time through computer-generated voices to help protect privacy and avoid censorship or retaliation. The technology is intended to help people such as intelligence officers carrying out sensitive missions, crime witnesses concerned about being identified by perpetrators, and whistleblowers who fear retaliation.
The three-year project, led by Honeywell and including collaborators from the University of Rochester, Texas A&M, and the University of Texas at Dallas, is funded by the Intelligence Advanced Research Projects Activity (IARPA) and is part of the Anonymous Real-Time Speech (ARTS) program.
The voice-changer project has three main objectives. First, the system will transform what a user says into a digital voice within a few milliseconds, so that it can be used in real-time conversations. Second, the team aims to let users specify what they call static traits, giving them control over the digital voice's age, gender, and dialect. Lastly, they want to neutralize what they call dynamic traits, such as emotions or health status, that could reveal the identity of the user.
"In the end, a 30-year-old woman from Texas will be able to instantaneously transform her voice to be output by the virtual speaker to sound like a 50-year-old man with a British accent, for example, without producing artifacts that can be traced back to the identity of the user," says Zhiyao Duan, an associate professor of electrical and computer engineering and Rochester's lead on the project. "And in addition to the latency requirements, we'll also be working to ensure the intelligibility and naturalness of the computer-generated voice."
Duan says that while the roles on the project are fluid, his team at Rochester will initially focus on generating the virtual speakers and controls for the static traits, building on their experience in speaker modeling, disentangled speech representations, and voice synthesis. The team will first develop the technology to work in English. If successful, they plan to expand it to other languages such as Spanish, Mandarin, and Korean.
The team hopes these open-source voice-changer tools will have benefits far beyond the initially intended use cases. Still, the researchers recognize that people may have concerns about such powerful software.
"I think it's natural for people to wonder what will happen if these tools get in the hands of bad actors," says Duan. "It's important to note that my lab and others around the globe are also working to develop deepfake detection tools so that people can discern whether something is said by an actual person or generated through algorithms. Those tools will be equally important to have."