Speaker Diarization

Automatically identify and distinguish between different speakers in your audio recordings.

Overview

Speaker diarization answers the question "who spoke when?" by segmenting audio based on speaker identity. This is essential for meetings, interviews, podcasts, and any multi-speaker content.

Supported Models

Model          Diarization Support    Max Speakers
lucid-mono     Yes                    Unlimited
lucid-multi    Yes                    Unlimited
lucid-agent    Yes                    Unlimited
lucid-lite     Yes                    Unlimited
cipher-fast    No                     -
cipher-max     No                     -
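
A request builder can guard against asking for diarization on a model that lacks it. The following is a minimal sketch, assuming the table reads as "lucid-* models support diarization, cipher-* models do not". The names DIARIZATION_MODELS and build_form_data are hypothetical helpers, not part of the API.

```python
# Assumption: support flags transcribed from the Supported Models table;
# the cipher-* models are treated as not supporting diarization.
DIARIZATION_MODELS = {"lucid-mono", "lucid-multi", "lucid-agent", "lucid-lite"}


def build_form_data(model: str, diarize: bool = True) -> dict:
    """Build the form fields for a transcription request (hypothetical helper)."""
    data = {"model": model}
    if diarize:
        if model not in DIARIZATION_MODELS:
            raise ValueError(f"{model} does not support diarization")
        # Form fields are sent as strings, matching the request example.
        data["diarize"] = "true"
    return data
```

The returned dictionary can be passed as the data argument of requests.post. Failing early on the client avoids uploading a large audio file only to get a result without speaker labels.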

Enable Diarization

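import requests

# Upload the audio file with diarization enabled.
# The context manager closes the file once the request completes.
with open("meeting.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/lucid/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "lucid-mono",
            "diarize": "true",  # form fields are sent as strings
        },
    )

response.raise_for_status()  # surface HTTP errors before parsing the body
result = response.json()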

Response Format

When diarization is enabled, the response includes speaker information:

{
    "success": true,
    "text": "Hello everyone, welcome to the meeting. Thanks for having me.",
    "utterances": [
        {
            "speaker": 0,
            "text": "Hello everyone, welcome to the meeting.",
            "start": 0.0,
            "end": 2.5,
            "confidence": 0.95
        },
        {
            "speaker": 1,
            "text": "Thanks for having me.",
            "start": 3.0,
            "end": 4.2,
            "confidence": 0.92
        }
    ],
    "speakers": {
        "count": 2
    }
}
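
For downstream code it can help to convert the raw dictionaries into typed records. This is a sketch that assumes every utterance carries the five fields shown in the sample response. The Utterance class and parse_utterances function are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One speaker turn, mirroring the fields in the sample response."""
    speaker: int
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    confidence: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def parse_utterances(result: dict) -> list[Utterance]:
    """Convert the 'utterances' list of a response into Utterance records."""
    return [
        Utterance(
            speaker=int(u["speaker"]),
            text=u["text"],
            start=float(u["start"]),
            end=float(u["end"]),
            confidence=float(u["confidence"]),
        )
        for u in result.get("utterances", [])
    ]
```

A response without an utterances key yields an empty list, so the same code path works when diarization was not requested.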

Processing the Results

# Print transcript with speaker labels
result = response.json()

for utterance in result.get("utterances", []):
    speaker = utterance["speaker"]
    text = utterance["text"]
    start = utterance["start"]

    print(f"[Speaker {speaker}] ({start:.1f}s): {text}")
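
The start and end timestamps also make it straightforward to compute how long each speaker talked. The sketch below assumes the utterance fields shown in the response format. speaking_time is a hypothetical helper name.

```python
from collections import defaultdict


def speaking_time(utterances: list) -> dict:
    """Sum the duration of each speaker's utterances, in seconds."""
    totals = defaultdict(float)
    for utterance in utterances:
        totals[utterance["speaker"]] += utterance["end"] - utterance["start"]
    return dict(totals)


# Example with the utterances from the sample response.
example_utterances = [
    {"speaker": 0, "text": "Hello everyone, welcome to the meeting.",
     "start": 0.0, "end": 2.5},
    {"speaker": 1, "text": "Thanks for having me.", "start": 3.0, "end": 4.2},
]
for speaker, seconds in speaking_time(example_utterances).items():
    print(f"Speaker {speaker}: {seconds:.1f}s")
```

With a real response, pass result.get("utterances", []) instead of the example list.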

Best Practices

  • Audio Quality: Clear audio with minimal background noise produces better speaker separation.
  • Speaker Distance: Keep speakers at similar distances from the microphone.
  • Overlapping Speech: Minimize simultaneous talking for cleaner diarization.
  • Consistent Voices: Diarization works best when speakers have distinct voice characteristics.

💡 Note: Speaker labels (0, 1, 2, ...) are assigned in order of first appearance in the audio, not by any predefined identity.
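
Because the labels are plain integers with no attached identity, mapping them to names is left to the caller. A minimal sketch, assuming you already know who spoke first, second, and so on. The names mapping and label_speakers are hypothetical.

```python
def label_speakers(utterances: list, names: dict) -> list:
    """Return transcript lines with numeric labels replaced by names.

    Labels missing from `names` fall back to 'Speaker N'.
    """
    lines = []
    for utterance in utterances:
        label = utterance["speaker"]
        name = names.get(label, f"Speaker {label}")
        lines.append(f"{name}: {utterance['text']}")
    return lines


# Assumption: the caller knows that label 0 is the host.
example = [
    {"speaker": 0, "text": "Hello everyone, welcome to the meeting."},
    {"speaker": 1, "text": "Thanks for having me."},
]
for line in label_speakers(example, {0: "Host"}):
    print(line)
```

The same mapping cannot be reused across recordings, since a given person may receive a different label in each file depending on who speaks first.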