With AI, the voice has acquired a new significance. Behind the words lies data that can be used both to diagnose a health problem and to steal someone's identity.
Speaking to machines is no longer the stuff of science fiction. Alexa (Amazon) has been present in our homes for over a decade, and an increasing number of users now favor voice interactions with chatbots, whether they are dictating a message or asking for directions. This shift is not only technical (although AI systems are becoming ever more powerful) but also societal, reflecting a change in how humans engage with machines. Behind the words, however, lies data.
Unlike a password, a voice cannot easily be changed. It is shaped by physiological, linguistic and personal characteristics. This "voiceprint" can identify an individual and reveal sensitive information such as origin or gender. Voice is, therefore, an especially rich form of biometric data.
"When a user interacts with a voice-based system, they not only convey content but also implicit information: emotions, physical traits or behavioral patterns," explains Andrea Cavallaro, professor and head of the Multimedia and Intelligent Sensing Laboratory at EPFL. The voice indeed contains numerous, sometimes subtle, features like rhythm, accent, tone, speed, intonation, volume or vocabulary that can all reveal something about its owner.
Cavallaro's research shows that such information can be exploited by analytical systems, raising significant privacy concerns. Far from being a simple communication channel, voice constitutes a dataset in its own right.
A resource for healthcare
The potential uses of voice data are numerous, particularly in healthcare. The same characteristics that make a voice identifiable also make it highly informative. Subtle variations in speech may reveal neurological disorders, respiratory diseases or emotional states. This is the premise behind Virtuosis AI, a start-up led by EPFL alumna Lara Gervaise, which explores the use of voice as a diagnostic tool.
Voice analysis could offer a non-invasive approach to medical monitoring. However, this promise also entails greater responsibility, as health data remains among the most sensitive categories of personal information.
Legal challenges
In another context, actors and dubbing professionals have taken legal action against companies accused of using their voices to train AI models without consent. The argument is straightforward: a voice is part of a person's identity and is therefore protected under personality or image rights.
At the same time, voice cloning tools are now widely accessible, sometimes even free of charge. It is no longer only the voices of professional actors that can be replicated, but potentially anyone's.
"You can imagine the scenarios: spam phone calls, deceiving relatives, or fabricating audio evidence. The voice has long been perceived as a personal signature. With AI, it becomes a vector for identity theft at scale," warns Cavallaro.
Protecting privacy from the start
How, then, can voice data be protected? One promising avenue is voice anonymization. Cavallaro's work explores ways of transforming speech to preserve intelligibility while masking the speaker's identity or gender. The approach involves generating "ambiguous" voices, reducing the ability of systems to detect sensitive attributes.
The challenge lies in balancing utility and privacy. Excessive transformation degrades the quality of the signal, while insufficient modification leaves personal information exposed. Cavallaro's research shows that a compromise is achievable.
"We are seeing a broader shift towards 'privacy by design,' where data protection is embedded from the outset in system development," says Cavallaro.
As the voice becomes a dominant interface, it invites us to rethink the relationship between technology, identity and privacy. Speaking may feel ephemeral: words seem to vanish as soon as they are uttered. Yet with AI, they are captured, analyzed and potentially stored.
On the consumer side, widespread usage is now well established. As early as 2025, Forbes reported that around 60% of smartphone users regularly used a voice assistant, highlighting a clear increase over recent years. Globally, the number of voice assistants is estimated at 8.4 billion, more than the world's population. This is explained by the multiple devices used within a single household, including smartphones, televisions and cars.
This rapid adoption is driven not only by technological progress but also by behavioral factors. Advances in natural language processing and generative AI have enabled smoother, conversational, hands-free interactions. The voice is no longer just about issuing commands: it represents a new form of interaction that is reshaping how we access and process information, services and artificial intelligence itself.