Anti-Detection · December 11, 2025

Speech Synthesis Fingerprinting: How Voice Lists Identify Your OS

The speechSynthesis API exposes installed text-to-speech voices, creating an OS-specific fingerprint. Learn how voice enumeration works.

Introduction

The Web Speech API's speechSynthesis.getVoices() method returns a list of text-to-speech voices available in the browser. Since each operating system ships with a different set of voices, this list creates a strong platform fingerprint. The voice list, voice URIs, language support, and the distinction between local and network voices all contribute to identification.

How Voice Fingerprinting Works

const voices = speechSynthesis.getVoices();
voices.forEach(voice => {
  console.log({
    name: voice.name,           // "Microsoft David - English (United States)"
    lang: voice.lang,           // "en-US"
    localService: voice.localService, // true (local) or false (network)
    voiceURI: voice.voiceURI,   // URI identifying the voice and its synthesis service
  });
});
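
To illustrate how such a list could become a single identifier, here is a minimal sketch that canonicalizes the fields shown above and hashes them. The helper name `voiceFingerprint` and the choice of an FNV-1a style hash over UTF-16 code units are assumptions made for this example; real fingerprinting scripts may combine or hash the data differently.

```javascript
// Hypothetical helper: reduce a voice list to a short, stable fingerprint string.
function voiceFingerprint(voices) {
  // Keep only the fields discussed above, and sort so enumeration order does not matter.
  const rows = voices
    .map(v => [v.name, v.lang, v.localService ? "local" : "network", v.voiceURI].join("|"))
    .sort();
  const canonical = rows.join("\n");

  // FNV-1a style 32-bit hash; chosen for brevity, not cryptographic strength.
  let hash = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    hash ^= canonical.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Browser usage, guarded so the snippet also loads outside a browser.
if (typeof speechSynthesis !== "undefined") {
  console.log(voiceFingerprint(speechSynthesis.getVoices()));
}
```

Because the rows are sorted before hashing, two browsers that expose the same voices in a different order produce the same value.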

Platform-Specific Voices

Each OS has distinctive voices:

Windows:

Microsoft David, Microsoft Zira, Microsoft Mark
Voice names start with "Microsoft"

macOS:

Alex, Samantha, Victoria, Daniel
High-quality voices: Ava (Premium), Tom (Premium)
Voice names are personal names without a company prefix

Linux:

espeak voices (if installed): "English (Great Britain)", "English (America)"
Lower voice quality than Windows and macOS; without a speech engine installed, the list may be empty

Android:

Google voices: "Google US English", "Google UK English"
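
The naming patterns above can be turned into a rough classifier. The following is only a sketch: the function name `guessPlatformFromVoices` is hypothetical, the rules use nothing beyond the example names listed in this section, and a real detector would need a much larger name database. Desktop Chrome can also list "Google" voices, which is why that check comes last and requires every voice to match.

```javascript
// Hypothetical heuristic built only from the example names in this article.
function guessPlatformFromVoices(voices) {
  const names = voices.map(v => v.name);
  if (names.length === 0) return "unknown";

  // Windows: voice names start with "Microsoft".
  if (names.some(n => n.startsWith("Microsoft "))) return "Windows";

  // macOS: bare personal names such as Alex or Samantha.
  const macNames = new Set(["Alex", "Samantha", "Victoria", "Daniel"]);
  if (names.some(n => macNames.has(n))) return "macOS";

  // Linux with espeak: names like "English (Great Britain)".
  if (names.some(n => /^English \(.+\)$/.test(n))) return "Linux";

  // Android: only Google-branded voices are present.
  if (names.every(n => n.startsWith("Google "))) return "Android";

  return "unknown";
}
```

A caller would pass `speechSynthesis.getVoices()` and compare the result with the platform the browser claims elsewhere.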

Voice Count

The number of available voices varies significantly:

Platform	Typical Voice Count
macOS	60-80+
Windows 10/11	20-40
Linux (with espeak)	100+ (many low quality)
Android	5-15
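
As a sketch of how the table above might be used, the snippet below encodes the ranges and checks a reported count against them. The object and function names are hypothetical, the bounds are copied from the table rather than measured, and a trailing "+" is treated as having no upper limit.

```javascript
// Ranges taken from the table above; "+" entries have no upper bound.
const TYPICAL_VOICE_COUNTS = {
  macOS: { min: 60, max: Infinity },
  Windows: { min: 20, max: 40 },
  Linux: { min: 100, max: Infinity },
  Android: { min: 5, max: 15 },
};

// Returns true or false for known platforms, and null when the platform is not in the table.
function voiceCountLooksTypical(platform, count) {
  const range = TYPICAL_VOICE_COUNTS[platform];
  if (!range) return null;
  return count >= range.min && count <= range.max;
}
```

A count outside the range is a hint rather than proof, since users can install or remove voices.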

The voiceschanged Event

Voices may not be immediately available. Browsers load them asynchronously:

// Voices may be empty initially
let voices = speechSynthesis.getVoices();

// Listen for the voiceschanged event
speechSynthesis.onvoiceschanged = () => {
  voices = speechSynthesis.getVoices();
  // Now the full list is available
};

The timing of this event, and whether it fires at all, are themselves fingerprinting signals. Some browsers fire it immediately, others after a delay, and some do not fire it if voices are already loaded.
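
Since the event may never fire, a script that waits for voices usually needs a fallback. One possible shape, as a sketch: the function name `loadVoices`, the 2000 ms default timeout, and the returned `source` and `elapsedMs` fields are all assumptions for illustration. The `synth` parameter is expected to behave like `speechSynthesis`, which makes the function easy to exercise with a stand-in object.

```javascript
// Hypothetical wrapper: resolve with the voice list and how it was obtained.
function loadVoices(synth, timeoutMs = 2000) {
  const start = Date.now();
  return new Promise(resolve => {
    // Case 1: voices are already loaded, so no event is needed.
    const initial = synth.getVoices();
    if (initial.length > 0) {
      resolve({ voices: initial, source: "immediate", elapsedMs: 0 });
      return;
    }

    // Case 2: the browser fires voiceschanged once the list is ready.
    const onChange = () => {
      clearTimeout(timer);
      synth.removeEventListener("voiceschanged", onChange);
      resolve({ voices: synth.getVoices(), source: "voiceschanged", elapsedMs: Date.now() - start });
    };

    // Case 3: the event never fires; give up after timeoutMs.
    const timer = setTimeout(() => {
      synth.removeEventListener("voiceschanged", onChange);
      resolve({ voices: synth.getVoices(), source: "timeout", elapsedMs: Date.now() - start });
    }, timeoutMs);

    synth.addEventListener("voiceschanged", onChange);
  });
}
```

In a page this would be called as `loadVoices(speechSynthesis).then(result => console.log(result.source, result.elapsedMs))`. The `source` and `elapsedMs` values are the kind of timing information the paragraph above describes.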

Network vs Local Voices

The localService property distinguishes between:

Local voices: Installed on the device, work offline
Network voices: Require internet, often higher quality

The presence and number of network voices vary by browser and platform, adding another fingerprinting dimension.
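
A short sketch of how the local and network split might be summarized for comparison across browsers. The function name `summarizeServices` and the shape of the returned object are assumptions for this example.

```javascript
// Hypothetical helper: count local and network voices and list the network ones.
function summarizeServices(voices) {
  const local = voices.filter(v => v.localService);
  const network = voices.filter(v => !v.localService);
  return {
    localCount: local.length,
    networkCount: network.length,
    // Sorted so the summary does not depend on enumeration order.
    networkNames: network.map(v => v.name).sort(),
  };
}
```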

How BotCloud Manages Voice Identity

BotCloud profiles control the speech synthesis voice list:

Voice names and URIs match the claimed operating system
Voice count is realistic for the platform
Local vs network voice classification is correct
The voiceschanged event timing matches normal browser behavior

A Windows profile returns Microsoft voices and a macOS profile returns Apple voices, regardless of the server's actual operating system.

Best Practices

Ensure the voice list matches the claimed OS - Microsoft voices on a macOS profile are an obvious tell
Consider voice count - An empty voice list or an impossibly large list is suspicious
Test voiceschanged event timing - Unusual timing patterns can be detected
Verify consistency with navigator.platform - The voice list should align with other platform signals
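
The first and last checks can be automated in a test harness. The sketch below compares a `navigator.platform` value with the voice list and returns a list of problems. The function name `findVoiceMismatches` and the rules are assumptions for illustration, and the code is not BotCloud's implementation. Only local voices are inspected, on the assumption that network voices may be supplied by the browser rather than the operating system.

```javascript
// Hypothetical consistency check between navigator.platform and the voice list.
function findVoiceMismatches(platform, voices) {
  const issues = [];
  // Assumption: network voices can come from the browser, so only local ones are OS evidence.
  const localNames = voices.filter(v => v.localService).map(v => v.name);
  const hasMicrosoft = localNames.some(n => n.startsWith("Microsoft"));
  const isWindows = platform.startsWith("Win"); // e.g. "Win32"

  if (voices.length === 0) {
    issues.push("empty voice list");
  }
  if (hasMicrosoft && !isWindows) {
    issues.push("local Microsoft voices on a non-Windows platform");
  }
  if (isWindows && localNames.length > 0 && !hasMicrosoft) {
    issues.push("Windows platform without local Microsoft voices");
  }
  return issues;
}

// Browser usage, guarded so the snippet also loads outside a browser.
if (typeof navigator !== "undefined" && typeof speechSynthesis !== "undefined") {
  console.log(findVoiceMismatches(navigator.platform, speechSynthesis.getVoices()));
}
```

An empty result means none of these particular rules fired; it does not mean the profile is undetectable.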

#speech-synthesis #tts #fingerprinting #privacy