Feature Overview
Explore the powerful features available in our Speech-to-Text API for batch transcription.
Speaker Diarization
Automatically identify and distinguish between different speakers in your audio. Perfect for meetings, interviews, and multi-speaker content.
- Automatic speaker detection and labeling (Speaker 0, Speaker 1, etc.)
- Word-level speaker attribution
- Works with any number of speakers
- Available on Lucid models: lucid-mono, lucid-multi, lucid-agent
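import requests

# Transcribe a meeting recording with speaker diarization enabled
with open("meeting.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/lucid/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "lucid-mono",
            "diarize": "true",
        },
    )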
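Word-level speaker labels are often easier to read once they are merged into speaker turns. The sketch below shows one way to do that. It assumes a hypothetical response shape, a list of word objects with `word` and `speaker` fields; the real field names are not documented on this page, so inspect an actual response before relying on them.

```python
from itertools import groupby

def group_speaker_turns(words):
    """Merge consecutive words from the same speaker into turns.

    Assumes each item looks like {"word": "hello", "speaker": 0};
    these field names are hypothetical, not confirmed by the API docs.
    """
    turns = []
    for speaker, run in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w["word"] for w in run)
        turns.append({"speaker": speaker, "text": text})
    return turns

# Illustrative data only, not real API output
sample_words = [
    {"word": "Good", "speaker": 0},
    {"word": "morning", "speaker": 0},
    {"word": "Hi", "speaker": 1},
    {"word": "there", "speaker": 1},
    {"word": "Shall", "speaker": 0},
    {"word": "we", "speaker": 0},
    {"word": "start", "speaker": 0},
]

for turn in group_speaker_turns(sample_words):
    print(f"Speaker {turn['speaker']}: {turn['text']}")
```

Because `groupby` only merges adjacent items, a speaker who talks, is interrupted, and talks again produces two separate turns, which preserves the order of the conversation.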
Word-Level Timestamps
Get precise timing for every word in the transcript. Essential for video subtitles, audio editing, and accessibility.
- Start and end time for each word
- Millisecond precision
- Available in verbose_json response format
- Use timestamp_granularities=word for word-level detail
import requests

# Request word-level timestamps in the verbose_json response
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/cipher/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "cipher-max",
            "response_format": "verbose_json",
            "timestamp_granularities": "word",
        },
    )
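Word timings make it possible to build captions with custom chunking rather than taking the API's own subtitle output as-is. The following is a minimal sketch, assuming each word arrives as an object with `word`, `start`, and `end` fields and that times are in seconds. Those field names and units are assumptions, not confirmed by this page.

```python
def format_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (the SubRip timestamp layout)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def words_to_cues(words, max_words=7):
    """Chunk timed words into caption cues of at most max_words words.

    Assumes each item looks like {"word": "hi", "start": 0.0, "end": 0.4}
    with times in seconds; these field names are hypothetical.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        cues.append({
            "start": format_timestamp(chunk[0]["start"]),
            "end": format_timestamp(chunk[-1]["end"]),
            "text": " ".join(w["word"] for w in chunk),
        })
    return cues

# Illustrative data only, not real API output
sample_words = [
    {"word": "Hello", "start": 0.0, "end": 0.42},
    {"word": "world", "start": 0.5, "end": 0.9},
    {"word": "again", "start": 1.0, "end": 1.25},
]

for cue in words_to_cues(sample_words, max_words=2):
    print(f"{cue['start']} --> {cue['end']}  {cue['text']}")
```

Each cue takes its start from the first word in the chunk and its end from the last, so gaps between words inside a chunk are covered by the cue.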
Context Prompts
Guide the transcription with domain-specific vocabulary. The model will prioritize recognizing these terms.
💡 Tip: Use context prompts for industry jargon, product names, or technical terms that might be unfamiliar to the model.
- Medical: "Patient diagnosis, MRI, CT scan, hypertension, cardiologist"
- Technical: "API, SDK, microservices, Kubernetes, containerization"
- Legal: "plaintiff, defendant, deposition, habeas corpus"
import requests

# Pass domain vocabulary through the prompt field
with open("medical_recording.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/cipher/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "cipher-max",
            "prompt": "Medical terms: hypertension, myocardial infarction, echocardiogram",
        },
    )
Keywords Boost
Improve recognition accuracy for specific words without full context prompts.
- Simply pass comma-separated keywords
- Perfect for proper nouns and brand names
- Available on Lucid models
import requests

# Boost recognition of specific keywords
with open("interview.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/lucid/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "lucid-mono",
            "keywords": "YourVoic,API,transcription",
        },
    )
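When keywords come from a list in your application, it helps to normalize them before joining them into the comma-separated string. The helper below is a hypothetical convenience function, not part of the API. It trims whitespace, drops blanks and case-insensitive duplicates, and rejects terms containing a comma, since a comma would split a term in two. Whether the service itself treats keywords case-insensitively is an assumption.

```python
def build_keywords(terms):
    """Join terms into one comma-separated string, skipping blanks and duplicates."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        # Skip empty entries and repeats that differ only by case
        if not term or term.lower() in seen:
            continue
        if "," in term:
            raise ValueError(f"keyword contains a comma: {term!r}")
        seen.add(term.lower())
        cleaned.append(term)
    return ",".join(cleaned)

print(build_keywords(["YourVoic", " API ", "transcription", "api", ""]))
# prints: YourVoic,API,transcription
```

The returned string can be passed directly as the keywords value in the request data.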
Smart Formatting
Automatic formatting of numbers, dates, times, currency, and more.
- Numbers: "one hundred twenty three" → "123"
- Dates: "January first twenty twenty four" → "January 1, 2024"
- Currency: "fifty dollars" → "$50"
- Times: "three thirty PM" → "3:30 PM"
Multiple Output Formats
Choose the output format that best fits your needs:
| Format | Description | Use Case |
|---|---|---|
| json | Simple JSON with text | Basic transcription |
| text | Plain text output | Simple text extraction |
| verbose_json | Full details with timestamps | Detailed analysis |
| srt | SubRip subtitle format | Video subtitles |
| vtt | WebVTT format | Web video captions |
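When saving transcripts to disk, the file extension should match the requested format. The sketch below maps each format value from the table to a conventional extension. The mapping and the `output_path` helper are illustrative assumptions, not something the API provides.

```python
from pathlib import Path

# Assumed mapping from response_format value to file extension
EXTENSIONS = {
    "json": ".json",
    "verbose_json": ".json",
    "text": ".txt",
    "srt": ".srt",
    "vtt": ".vtt",
}

def output_path(audio_path, response_format):
    """Return the path for saving a transcript next to its audio file."""
    try:
        suffix = EXTENSIONS[response_format]
    except KeyError:
        raise ValueError(f"unsupported response_format: {response_format!r}") from None
    return Path(audio_path).with_suffix(suffix)

print(output_path("recordings/meeting.mp3", "srt"))
```

Note that json and verbose_json share the .json extension here, so saving both for the same audio file would need distinct file names.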