Speaker Diarization
Automatically identify and distinguish between speakers in your audio recordings.
Overview
Speaker diarization answers the question "who spoke when?" by segmenting audio based on speaker identity. This is essential for meetings, interviews, podcasts, and any multi-speaker content.
Supported Models
| Model | Diarization Support | Max Speakers |
|---|---|---|
| lucid-mono | ✅ | Unlimited |
| lucid-multi | ✅ | Unlimited |
| lucid-agent | ✅ | Unlimited |
| lucid-lite | ✅ | Unlimited |
| cipher-fast | ❌ | - |
| cipher-max | ❌ | - |
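If you choose models at runtime, it can help to fail fast before uploading audio. A minimal sketch based on the table above; the `DIARIZATION_MODELS` constant and helper are illustrative, not part of the API:

```python
# Models that support diarization, per the table above (illustrative constant).
DIARIZATION_MODELS = {"lucid-mono", "lucid-multi", "lucid-agent", "lucid-lite"}

def check_diarization_support(model: str) -> None:
    """Raise before uploading audio if the model cannot diarize."""
    if model not in DIARIZATION_MODELS:
        raise ValueError(f"Model '{model}' does not support diarization")
```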
Enable Diarization
```python
import requests

# Upload an audio file with diarization enabled.
with open("meeting.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/lucid/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "lucid-mono",
            "diarize": "true",
        },
    )

result = response.json()
```
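The snippet above parses the body directly. In practice you may want to confirm the request succeeded first; a minimal sketch (the error message format is an assumption, not documented behavior):

```python
# Fail loudly on non-2xx responses before reading the body.
if not response.ok:
    raise RuntimeError(
        f"Transcription failed ({response.status_code}): {response.text}"
    )
```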
Response Format
When diarization is enabled, the response includes speaker information:
```json
{
  "success": true,
  "text": "Hello everyone, welcome to the meeting. Thanks for having me.",
  "utterances": [
    {
      "speaker": 0,
      "text": "Hello everyone, welcome to the meeting.",
      "start": 0.0,
      "end": 2.5,
      "confidence": 0.95
    },
    {
      "speaker": 1,
      "text": "Thanks for having me.",
      "start": 3.0,
      "end": 4.2,
      "confidence": 0.92
    }
  ],
  "speakers": {
    "count": 2
  }
}
```
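If you prefer typed client code, the utterance objects can be modeled with `TypedDict`. This sketch simply mirrors the fields shown above; the type name is illustrative:

```python
from typing import TypedDict

class Utterance(TypedDict):
    speaker: int       # zero-based label, assigned by order of first appearance
    text: str          # transcript for this segment
    start: float       # segment start time in seconds
    end: float         # segment end time in seconds
    confidence: float  # transcription confidence, 0.0 to 1.0
```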
Processing the Results
```python
# Print transcript with speaker labels
result = response.json()

for utterance in result.get('utterances', []):
    speaker = utterance['speaker']
    text = utterance['text']
    start = utterance['start']
    print(f"[Speaker {speaker}] ({start:.1f}s): {text}")
```
Best Practices
- Audio Quality: Clear audio with minimal background noise produces better speaker separation
- Speaker Distance: Ensure speakers are at similar distances from the microphone
- Overlapping Speech: Minimize simultaneous talking for cleaner diarization
- Consistent Voices: Works best when speakers have distinct voice characteristics
💡 Note: Speaker labels (0, 1, 2...) are assigned based on order of first appearance in the audio, not by any pre-defined identity.
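Because labels only reflect first-appearance order, mapping them to real identities is left to your application. A minimal sketch, assuming you already know the order in which participants first spoke (the `names` mapping is your own data, not returned by the API):

```python
# Application-level mapping from first-appearance labels to known participants.
names = {0: "Alice", 1: "Bob"}

for utterance in result.get('utterances', []):
    label = utterance['speaker']
    speaker_name = names.get(label, f"Speaker {label}")
    print(f"[{speaker_name}] {utterance['text']}")
```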