Feature Overview

This page describes the features available in our Speech-to-Text API for batch transcription.

Speaker Diarization

Automatically identify and distinguish between different speakers in your audio. Perfect for meetings, interviews, and multi-speaker content.

  • Automatic speaker detection and labeling (Speaker 0, Speaker 1, etc.)
  • Word-level speaker attribution
  • Works with any number of speakers
  • Available on Lucid models: lucid-mono, lucid-multi, lucid-agent

import requests

# Enable diarization on a Lucid model; the file is closed once the upload finishes.
with open("meeting.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/lucid/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "lucid-mono",
            "diarize": "true",
        },
    )
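
Word-level speaker attribution is usually easier to read once consecutive words from the same speaker are merged into turns. The sketch below shows one way to do that. The response layout is not documented on this page, so the `word` and `speaker` field names and the sample data are assumptions for illustration only.

```python
from itertools import groupby


def speaker_turns(words):
    """Merge consecutive words from the same speaker into (speaker, text) turns."""
    turns = []
    # groupby only groups adjacent items, which is what a turn is.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w["word"] for w in group)
        turns.append((speaker, text))
    return turns


# Hypothetical word list; the field names in the real response may differ.
sample_words = [
    {"word": "Hello", "speaker": 0},
    {"word": "everyone", "speaker": 0},
    {"word": "Hi", "speaker": 1},
    {"word": "Thanks", "speaker": 0},
]

for speaker, text in speaker_turns(sample_words):
    print(f"Speaker {speaker}: {text}")
```

A speaker who talks, is interrupted, and talks again produces two separate turns, which keeps the order of the conversation intact.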

Word-Level Timestamps

Get precise timing for every word in the transcript. Essential for video subtitles, audio editing, and accessibility.

  • Start and end time for each word
  • Millisecond precision
  • Available in verbose_json response format
  • Use timestamp_granularities=word for word-level detail

import requests

# Ask for verbose_json so that word-level timestamps are included in the response.
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/cipher/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "cipher-max",
            "response_format": "verbose_json",
            "timestamp_granularities": "word",
        },
    )
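
As an illustration of the subtitle use case, the sketch below turns a list of timed words into SRT cues. It assumes each word is a dict with `word`, `start`, and `end` keys, with times in seconds. That layout is an assumption rather than the documented response schema, and the `max_words` grouping rule is only a simple stand-in for real subtitle segmentation.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group words into cues of at most max_words and render them as SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = srt_timestamp(chunk[0]["start"])  # cue starts with its first word
        end = srt_timestamp(chunk[-1]["end"])     # and ends with its last word
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical timed words; the field names in the real response may differ.
sample_words = [
    {"word": "Hello", "start": 0.0, "end": 0.42},
    {"word": "world", "start": 0.5, "end": 0.9},
]
print(words_to_srt(sample_words))
```

If subtitles are all you need, requesting the srt response format directly is likely simpler. A conversion like this is mainly useful when you want control over cue length.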

Context Prompts

Guide the transcription with domain-specific vocabulary. The model will prioritize recognizing these terms.

💡 Tip: Use context prompts for industry jargon, product names, or technical terms that might be unfamiliar to the model.
  • Medical: "Patient diagnosis, MRI, CT scan, hypertension, cardiologist"
  • Technical: "API, SDK, microservices, Kubernetes, containerization"
  • Legal: "plaintiff, defendant, deposition, habeas corpus"

import requests

# Pass domain vocabulary in the prompt field to guide recognition.
with open("medical_recording.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/cipher/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "cipher-max",
            "prompt": "Medical terms: hypertension, myocardial infarction, echocardiogram",
        },
    )

Keywords Boost

Improve recognition accuracy for specific words without full context prompts.

  • Pass keywords as a single comma-separated string in the keywords field
  • Perfect for proper nouns and brand names
  • Available on Lucid models

import requests

# Boost specific terms by passing them as a comma-separated keywords string.
with open("interview.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/lucid/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "lucid-mono",
            "keywords": "YourVoic,API,transcription",
        },
    )
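
When keywords come from a list, such as a product catalog, it helps to clean them before joining them into the comma-separated string. The helper below is a minimal sketch. `keywords_param` is a hypothetical name, and both the case-insensitive de-duplication and the rejection of embedded commas are assumptions, not documented API behavior.

```python
def keywords_param(terms):
    """Join terms into a comma-separated string, dropping blanks and duplicates."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        if "," in term:
            # A comma inside a term would be read as a separator.
            raise ValueError(f"keyword must not contain a comma: {term!r}")
        key = term.casefold()  # compare case-insensitively, keep first spelling
        if term and key not in seen:
            seen.add(key)
            cleaned.append(term)
    return ",".join(cleaned)


print(keywords_param(["YourVoic", " API ", "api", "", "transcription"]))
```

The returned string can be passed as the value of the keywords field in the request data.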

Smart Formatting

Numbers, dates, times, currency, and similar values are formatted automatically in the transcript.

  • Numbers: "one hundred twenty three" → "123"
  • Dates: "January first twenty twenty four" → "January 1, 2024"
  • Currency: "fifty dollars" → "$50"
  • Times: "three thirty PM" → "3:30 PM"

Multiple Output Formats

Choose the output format that best fits your needs:

Format        Description                   Use Case
json          Simple JSON with text         Basic transcription
text          Plain text output             Simple text extraction
verbose_json  Full details with timestamps  Detailed analysis
srt           SubRip subtitle format        Video subtitles
vtt           WebVTT format                 Web video captions
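
The format is selected with the response_format field, as in the word-level timestamp request earlier on this page. The sketch below checks the value against the formats in the table before a request is sent and picks a matching file extension for saving the result. The helper names and the extension mapping are illustrative assumptions, not part of the API.

```python
from pathlib import Path

# Formats listed in the table above, mapped to a conventional file extension.
FORMAT_EXTENSIONS = {
    "json": ".json",
    "text": ".txt",
    "verbose_json": ".json",
    "srt": ".srt",
    "vtt": ".vtt",
}


def transcription_form(model, response_format):
    """Build the form fields for a request, rejecting unknown formats early."""
    if response_format not in FORMAT_EXTENSIONS:
        allowed = ", ".join(sorted(FORMAT_EXTENSIONS))
        raise ValueError(f"unknown response_format {response_format!r}; expected one of: {allowed}")
    return {"model": model, "response_format": response_format}


def output_path(audio_path, response_format):
    """Derive an output file name from the audio file name and the chosen format."""
    return Path(audio_path).with_suffix(FORMAT_EXTENSIONS[response_format])


print(transcription_form("cipher-max", "srt"))
print(output_path("meeting.mp3", "srt"))
```

The dict returned by `transcription_form` can be passed as the `data` argument of `requests.post`, in the same way as the request examples on this page.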