Feature Overview

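This page describes the features available in our Speech-to-Text API for batch transcription: speaker diarization, word-level timestamps, context prompts, keyword boosting, smart formatting, and multiple output formats.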

Speaker Diarization

Automatically identify and distinguish between different speakers in your audio. Perfect for meetings, interviews, and multi-speaker content.

Automatic speaker detection and labeling (Speaker 0, Speaker 1, etc.)
Word-level speaker attribution
Works with any number of speakers
Available on Lucid models: lucid-mono, lucid-multi, lucid-agent

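import requests

# Upload the recording with diarization enabled; the with block closes the file.
with open("meeting.mp3", "rb") as audio:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/lucid/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio},
        data={
            "model": "lucid-mono",
            "diarize": "true",
        },
    )
response.raise_for_status()  # raise an exception on HTTP error responses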
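Word-level speaker attribution is usually easier to read once consecutive words from the same speaker are merged into turns. The following is a minimal sketch of that grouping step. The response schema is not shown on this page, so the `"word"` and `"speaker"` keys and the sample data are assumptions for illustration; adjust them to match the actual response.

```python
from itertools import groupby


def speaker_turns(words):
    """Collapse consecutive words from the same speaker into one turn.

    `words` is assumed to be a list of dicts with the hypothetical keys
    "word" (str) and "speaker" (int). These key names are not confirmed
    by the API documentation.
    """
    turns = []
    # groupby only merges adjacent items, so a speaker who talks again
    # later starts a new turn instead of being merged with the earlier one.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w["word"] for w in group)
        turns.append((f"Speaker {speaker}", text))
    return turns


# Hand-written sample data, not real API output.
diarized_words = [
    {"word": "Hello", "speaker": 0},
    {"word": "everyone", "speaker": 0},
    {"word": "Hi", "speaker": 1},
    {"word": "Thanks", "speaker": 0},
]

for label, text in speaker_turns(diarized_words):
    print(f"{label}: {text}")
```

The labels follow the "Speaker 0, Speaker 1" convention listed above.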

Word-Level Timestamps

Get precise timing for every word in the transcript. Essential for video subtitles, audio editing, and accessibility.

Start and end time for each word
Millisecond precision
Available in verbose_json response format
Use timestamp_granularities=word for word-level detail

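import requests

# Request the verbose JSON format with word-level timestamps.
with open("audio.mp3", "rb") as audio:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/cipher/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio},
        data={
            "model": "cipher-max",
            "response_format": "verbose_json",
            "timestamp_granularities": "word",
        },
    )
response.raise_for_status()  # raise an exception on HTTP error responses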
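To illustrate the subtitle use case, here is a rough sketch that turns word timings into SubRip (SRT) cues on the client side. It assumes each word is a dict with hypothetical `"word"`, `"start"`, and `"end"` keys, with times in seconds; the real `verbose_json` schema may differ. The `max_words` chunking rule is also just an illustrative choice.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Build SRT text from word timings, max_words words per cue.

    The "word", "start", and "end" keys are assumed, not documented.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = srt_timestamp(chunk[0]["start"])  # first word opens the cue
        end = srt_timestamp(chunk[-1]["end"])     # last word closes it
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Hand-written sample data, not real API output.
timed_words = [
    {"word": "Welcome", "start": 0.0, "end": 0.42},
    {"word": "to", "start": 0.42, "end": 0.55},
    {"word": "the", "start": 0.55, "end": 0.68},
    {"word": "show", "start": 0.68, "end": 1.2},
]

print(words_to_srt(timed_words, max_words=2))
```

If subtitles are all you need, requesting the `srt` or `vtt` response format directly is likely simpler; a client-side conversion like this is mainly useful when you want custom cue lengths.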

Context Prompts

Guide the transcription with domain-specific vocabulary. The model will prioritize recognizing these terms.

💡 Tip: Use context prompts for industry jargon, product names, or technical terms that might be unfamiliar to the model.

Medical: "Patient diagnosis, MRI, CT scan, hypertension, cardiologist"
Technical: "API, SDK, microservices, Kubernetes, containerization"
Legal: "plaintiff, defendant, deposition, habeas corpus"

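import requests

# Pass domain vocabulary in the prompt so the model favors these terms.
with open("medical_recording.mp3", "rb") as audio:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/cipher/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio},
        data={
            "model": "cipher-max",
            "prompt": "Medical terms: hypertension, myocardial infarction, echocardiogram",
        },
    )
response.raise_for_status()  # raise an exception on HTTP error responses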
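When the vocabulary comes from a glossary or database rather than a hand-written string, a small helper can assemble the prompt. This is only a sketch: `build_prompt` is a hypothetical helper, and the "label: term, term" layout simply mirrors the example prompt above rather than any documented requirement.

```python
def build_prompt(label, terms):
    """Return "label: term1, term2, ..." with blanks and duplicates removed.

    Duplicates are detected case-insensitively; the first spelling wins.
    """
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        key = term.lower()
        if term and key not in seen:
            seen.add(key)
            cleaned.append(term)
    return f"{label}: {', '.join(cleaned)}"


prompt = build_prompt(
    "Medical terms",
    ["hypertension", " MRI ", "mri", "", "echocardiogram"],
)
print(prompt)
```

The resulting string can be passed as the `prompt` field of the request.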

Keywords Boost

Improve recognition accuracy for specific words without writing a full context prompt.

Pass the words as a comma-separated string in the keywords parameter
Perfect for proper nouns and brand names
Available on Lucid models

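import requests

# Boost recognition of specific words with a comma-separated keywords string.
with open("interview.mp3", "rb") as audio:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/lucid/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio},
        data={
            "model": "lucid-mono",
            "keywords": "YourVoic,API,transcription",
        },
    )
response.raise_for_status()  # raise an exception on HTTP error responses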
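Because the keywords travel as a single comma-separated string, a keyword that itself contains a comma would be split into two. The sketch below builds the string from a list and rejects such entries. `join_keywords` is a hypothetical helper, and rejecting commas is an assumption; this page does not say whether the API supports escaping.

```python
def join_keywords(keywords):
    """Join keywords into the comma-separated form used by the request.

    Strips whitespace, drops blanks and exact duplicates, and raises
    ValueError for entries containing a comma (assumed to be unsupported).
    """
    cleaned = []
    for kw in keywords:
        kw = kw.strip()
        if "," in kw:
            raise ValueError(f"keyword must not contain a comma: {kw!r}")
        if kw and kw not in cleaned:
            cleaned.append(kw)
    return ",".join(cleaned)


keyword_string = join_keywords(["YourVoic", " API", "transcription", "API"])
print(keyword_string)
```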

Smart Formatting

Spoken numbers, dates, times, currency, and similar values are automatically converted to their written form.

Numbers: "one hundred twenty three" → "123"
Dates: "January first twenty twenty four" → "January 1, 2024"
Currency: "fifty dollars" → "$50"
Times: "three thirty PM" → "3:30 PM"

Multiple Output Formats

Choose the output format that best fits your needs:

Format	Description	Use Case
`json`	Simple JSON with text	Basic transcription
`text`	Plain text output	Simple text extraction
`verbose_json`	Full details with timestamps	Detailed analysis
`srt`	SubRip subtitle format	Video subtitles
`vtt`	WebVTT format	Web video captions
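Client code generally has to treat the response body differently depending on the requested format. The sketch below assumes that `json` and `verbose_json` return a JSON body while `text`, `srt`, and `vtt` return plain text; that split is inferred from the table, not confirmed by it. `decode_body` and the `"text"` key in the sample are hypothetical.

```python
import json

# Assumed grouping of the formats listed in the table above.
JSON_FORMATS = {"json", "verbose_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}


def decode_body(response_format, body):
    """Parse a response body according to the requested format.

    JSON formats are decoded into Python objects; text-based formats
    are returned unchanged. Unknown formats raise ValueError.
    """
    if response_format in JSON_FORMATS:
        return json.loads(body)
    if response_format in TEXT_FORMATS:
        return body
    raise ValueError(f"unsupported response_format: {response_format!r}")


# Hand-written sample bodies, not real API output.
parsed = decode_body("json", '{"text": "hello world"}')
captions = decode_body("vtt", "WEBVTT\n\n00:00.000 --> 00:01.000\nhello world\n")
print(parsed["text"])
print(captions.splitlines()[0])
```

With `requests`, the `body` argument would typically be `response.text`.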
