Anti-Detection

Speech Synthesis Fingerprinting: How Voice Lists Identify Your OS

The speechSynthesis API exposes installed text-to-speech voices, creating an OS-specific fingerprint. Learn how voice enumeration works.

Introduction

The Web Speech API's speechSynthesis.getVoices() method returns a list of text-to-speech voices available in the browser. Since each operating system ships with a different set of voices, this list creates a strong platform fingerprint. The voice list, voice URIs, language support, and the distinction between local and network voices all contribute to identification.

How Voice Fingerprinting Works

const voices = speechSynthesis.getVoices();
voices.forEach(voice => {
  console.log({
    name: voice.name,           // "Microsoft David - English (United States)"
    lang: voice.lang,           // "en-US"
    localService: voice.localService, // true (local) or false (network)
    voiceURI: voice.voiceURI,   // Unique identifier for the voice
  });
});
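
The enumeration above can be condensed into a single comparable value. The following is a minimal sketch, not a reference implementation: the function name voiceFingerprint and the choice of a 32-bit FNV-1a-style hash are assumptions made for illustration, and real fingerprinting scripts may combine the fields differently.

```javascript
// Sketch: reduce a voice list to a short, comparable hash.
// `voiceFingerprint` and the hash choice are illustrative assumptions.
function voiceFingerprint(voices) {
  // Keep only the fields discussed above, in a fixed order.
  const rows = voices.map(v =>
    [v.name, v.lang, v.localService ? 'local' : 'network', v.voiceURI].join('|')
  );
  // Sort so the result does not depend on enumeration order.
  rows.sort();
  const canonical = rows.join('\n');

  // 32-bit FNV-1a-style hash over UTF-16 code units (not cryptographic).
  let hash = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    hash ^= canonical.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return {
    count: voices.length,
    hash: (hash >>> 0).toString(16).padStart(8, '0'),
  };
}
```

In a browser this would be called as voiceFingerprint(speechSynthesis.getVoices()). Sorting discards the enumeration order, which keeps the hash stable but also drops a signal that some scripts might keep.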

Platform-Specific Voices

Each OS has distinctive voices:

Windows:

  • Microsoft David, Microsoft Zira, Microsoft Mark
  • Voice names start with "Microsoft"

macOS:

  • Alex, Samantha, Victoria, Daniel
  • High-quality voices: Ava (Premium), Tom (Premium)
  • Voice names are personal names without company prefix

Linux:

  • espeak voices (if installed): "English (Great Britain)", "English (America)"
  • Few or no high-quality voices compared to Windows and macOS; the list can be empty when no speech engine is installed

Android:

  • Google voices: "Google US English", "Google UK English"
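
The naming patterns listed above are enough for a rough platform guess. The sketch below is illustrative only: guessPlatformFromVoices and the APPLE_NAMES set are hypothetical names, the lists simply mirror the examples in this article and are far from exhaustive, and mapping every "Google" voice to Android follows the list above rather than any guarantee.

```javascript
// Sketch: guess a platform family from voice names.
// Name lists mirror the examples in this article and are not exhaustive.
const APPLE_NAMES = new Set(['Alex', 'Samantha', 'Victoria', 'Daniel', 'Ava', 'Tom']);

function guessPlatformFromVoices(voices) {
  const scores = { windows: 0, macos: 0, linux: 0, android: 0 };
  for (const { name } of voices) {
    // Strip a trailing qualifier such as " (Premium)" before matching.
    const base = name.replace(/\s*\(.*\)$/, '');
    if (name.startsWith('Microsoft ')) scores.windows++;
    else if (name.startsWith('Google ')) scores.android++;
    else if (APPLE_NAMES.has(base)) scores.macos++;
    else if (/^English \(.+\)$/.test(name)) scores.linux++; // espeak-style name
  }
  // Pick the family with the most matching names.
  const [guess, top] = Object.entries(scores).sort((a, b) => b[1] - a[1])[0];
  return { guess: top > 0 ? guess : 'unknown', scores };
}
```

A detector would treat the result as one signal among several and compare it with the platform the browser claims elsewhere.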

Voice Count

The number of available voices varies significantly:

Platform               Typical Voice Count
macOS                  60-80+
Windows 10/11          20-40
Linux (with espeak)    100+ (many low quality)
Android                5-15

The voiceschanged Event

Voices may not be immediately available. Browsers load them asynchronously:

// getVoices() may return an empty array on the first call
let voices = speechSynthesis.getVoices();

// Refresh the list whenever the browser reports a change
speechSynthesis.onvoiceschanged = () => {
  voices = speechSynthesis.getVoices();
  // The list is now populated (the event can fire again later)
};

The timing of this event, and whether it fires at all, are themselves fingerprinting signals. Some browsers fire it immediately, others after a delay, and some do not fire it if voices are already loaded.
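
Because the event may fire late or not at all, a script usually needs a fallback. The following sketch wraps the three cases (voices ready synchronously, voices delivered by the event, no event before a timeout) in one promise and records how long it took. The name loadVoices, the timeoutMs default, and the result shape are assumptions for illustration, and the code assumes the synthesis object supports addEventListener as the specification describes.

```javascript
// Sketch: resolve with the voice list and how it became available.
// `loadVoices`, `timeoutMs`, and the result shape are illustrative assumptions.
function loadVoices(synth = globalThis.speechSynthesis, timeoutMs = 2000) {
  return new Promise(resolve => {
    const start = Date.now();
    const initial = synth.getVoices();
    if (initial.length > 0) {
      // Voices were ready synchronously; no event was needed.
      resolve({ voices: initial, source: 'sync', elapsedMs: 0 });
      return;
    }
    const finish = source => {
      clearTimeout(timer);
      synth.removeEventListener('voiceschanged', onChange);
      resolve({ voices: synth.getVoices(), source, elapsedMs: Date.now() - start });
    };
    const onChange = () => finish('event');
    // Fall back after timeoutMs in case the event never fires.
    const timer = setTimeout(() => finish('timeout'), timeoutMs);
    synth.addEventListener('voiceschanged', onChange);
  });
}
```

The source and elapsedMs fields are exactly the kind of timing information a detector could compare against the behavior expected from the claimed browser.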

Network vs Local Voices

The localService property distinguishes between:

  • Local voices: Installed on the device, work offline
  • Network voices: Require internet, often higher quality

The presence and number of network voices vary by browser and platform, adding another fingerprinting dimension.
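
Splitting a list along this property takes only a few lines. This is a small sketch; partitionVoices is a hypothetical helper name.

```javascript
// Sketch: split voice names by the localService flag.
// `partitionVoices` is a hypothetical helper name.
function partitionVoices(voices) {
  const result = { local: [], network: [] };
  for (const v of voices) {
    // localService is true for on-device voices, false for remote ones.
    (v.localService ? result.local : result.network).push(v.name);
  }
  return result;
}
```

The two array lengths give the local and network counts that can then be compared across browsers and platforms.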

How BotCloud Manages Voice Identity

BotCloud profiles control the speech synthesis voice list:

  • Voice names and URIs match the claimed operating system
  • Voice count is realistic for the platform
  • Local vs network voice classification is correct
  • The voiceschanged event timing matches normal browser behavior

A Windows profile returns Microsoft voices, a macOS profile returns Apple voices, regardless of the server's actual operating system.

Best Practices

  1. Ensure voice list matches the claimed OS - Microsoft voices on a macOS profile are an obvious tell
  2. Consider voice count - An empty voice list or an impossibly large list is suspicious
  3. Test voiceschanged event timing - Unusual timing patterns can be detected
  4. Verify consistency with navigator.platform - The voice list should align with other platform signals
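
Several of these checks can be automated when validating a profile. The sketch below is an assumption-laden example, not BotCloud's implementation: the PROFILES table, its field names, and checkVoiceConsistency are hypothetical, the count ranges are taken from the typical values in this article, and only Windows and macOS are covered.

```javascript
// Sketch: flag mismatches between a claimed OS and the voice list.
// PROFILES and its fields are hypothetical; ranges come from this article.
const PROFILES = {
  windows: { min: 20, max: 40, platformPrefix: 'Win', mustHave: 'Microsoft ', mustNotHave: null },
  macos: { min: 60, max: Infinity, platformPrefix: 'Mac', mustHave: null, mustNotHave: 'Microsoft ' },
};

function checkVoiceConsistency(claimedOs, platform, voices) {
  const profile = PROFILES[claimedOs];
  if (!profile) return [`no profile defined for ${claimedOs}`];
  const issues = [];
  const names = voices.map(v => v.name);

  // Practice 2: an empty or implausibly sized list is suspicious.
  if (names.length === 0) {
    issues.push('voice list is empty');
  } else if (names.length < profile.min || names.length > profile.max) {
    issues.push(`voice count ${names.length} is outside the typical range`);
  }
  // Practice 1: voice names should fit the claimed OS.
  if (profile.mustHave && !names.some(n => n.startsWith(profile.mustHave))) {
    issues.push(`no voice name starts with "${profile.mustHave.trim()}"`);
  }
  if (profile.mustNotHave && names.some(n => n.startsWith(profile.mustNotHave))) {
    issues.push(`found a "${profile.mustNotHave.trim()}" voice on a ${claimedOs} profile`);
  }
  // Practice 4: navigator.platform should agree with the claim.
  if (!platform.startsWith(profile.platformPrefix)) {
    issues.push(`navigator.platform "${platform}" does not match ${claimedOs}`);
  }
  return issues;
}
```

In a browser the call would look like checkVoiceConsistency('windows', navigator.platform, speechSynthesis.getVoices()). An empty result means none of these particular checks failed, not that the profile is undetectable.
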