Getting Started with the Fastest STT API
Convert audio to text with enterprise-grade accuracy using our Speech-to-Text API. Features include speaker diarization, real-time streaming, and support for Python and JavaScript.
90+ Languages
Detect the language automatically or specify the source language.
Batch & Real-time
Cipher models for batch processing, Lucid models for live streaming (Lucid models also accept batch requests).
High Accuracy
Enterprise-grade accuracy with word-level timestamps.
Speaker Diarization
Identify and label different speakers automatically.
Keywords Boost
Improve recognition of domain-specific terms.
Smart Formatting
Auto-format dates, numbers, and currency.
Choose Your Use Case
Pre-Recorded Audio
Upload audio files for batch transcription. Perfect for podcasts, meetings, interviews, and media content.
Real-time Streaming
Live transcription via WebSocket. Ideal for voice assistants, live captions, and call centers.
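Streaming clients usually send audio in small frames rather than as one file. This page does not give the streaming endpoint or message format, so the sketch below covers only the client-side step of splitting raw audio into fixed-duration chunks before they are sent over a WebSocket. The function name `pcm_chunks`, the 16 kHz 16-bit mono PCM format, and the 100 ms frame length are all assumptions chosen for illustration.

```python
from typing import Iterator


def pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16_000,   # assumed; use the rate your audio was recorded at
    bytes_per_sample: int = 2,   # assumed 16-bit mono PCM
    chunk_ms: int = 100,         # assumed frame length in milliseconds
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of raw PCM audio."""
    chunk_bytes = sample_rate * bytes_per_sample * chunk_ms // 1000
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(pcm), chunk_bytes):
        # The final slice may be shorter than chunk_bytes.
        yield pcm[start:start + chunk_bytes]


# One second of silence at 16 kHz, 16-bit mono splits into ten 3200-byte chunks.
one_second = bytes(16_000 * 2)
chunks = list(pcm_chunks(one_second))
print(len(chunks), len(chunks[0]))  # 10 3200
```

Each yielded chunk would then be written to the open WebSocket connection in order. Smaller frames tend to reduce latency at the cost of more messages.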
Quick Example
Get started with a simple transcription request:
import requests

url = "https://yourvoic.com/api/v1/stt/transcribe"
headers = {"X-API-Key": "your_api_key"}

# Upload the audio file as multipart form data and select the model.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        url,
        headers=headers,
        files={"file": f},
        data={"model": "cipher-fast"},
    )

response.raise_for_status()  # fail early on HTTP errors
result = response.json()
print(result["text"])
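For repeated use, the same request can be wrapped in a small helper. This is a minimal sketch, not an official client: the function name `transcribe`, the `post` parameter, and the 60-second timeout are assumptions made for illustration. Only the endpoint, the `X-API-Key` header, the `file` and `model` fields, and the `text` response key come from the example above.

```python
from typing import Any, BinaryIO, Callable, Optional

STT_URL = "https://yourvoic.com/api/v1/stt/transcribe"  # endpoint from the Quick Example


def transcribe(
    audio: BinaryIO,
    api_key: str,
    model: str = "cipher-fast",
    post: Optional[Callable[..., Any]] = None,
) -> str:
    """Send one audio file for batch transcription and return the transcript text.

    `post` is a hypothetical injection point (defaults to requests.post) so the
    helper can be exercised without network access.
    """
    if post is None:
        import requests  # imported lazily; only needed for real HTTP calls
        post = requests.post

    response = post(
        STT_URL,
        headers={"X-API-Key": api_key},
        files={"file": audio},
        data={"model": model},
        timeout=60,  # assumed value; tune for long recordings
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()["text"]
```

Usage mirrors the original request: open the file in binary mode and pass the handle in, for example `transcribe(f, "your_api_key", model="cipher-max")`.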
Available Models
| Model | Type | Best For | Credits/sec |
|---|---|---|---|
| cipher-fast | Batch | Quick turnaround, general content | 2 |
| cipher-max | Batch | Complex audio, accents, technical content | 2 |
| lucid-mono | Both | Single-language content | 3 |
| lucid-multi | Both | Mixed-language content | 3 |
| lucid-agent | Both | Conversational, voice agents | 3 |
| lucid-lite | Both | High volume, cost-effective | 3 |
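The Credits/sec column can be turned into a rough cost estimate. The sketch below assumes usage scales linearly with audio duration; actual billing rules such as rounding or minimum charges are not stated here and may differ. The helper name `estimate_credits` is hypothetical.

```python
# Credits per second of audio, copied from the Available Models table.
CREDITS_PER_SECOND = {
    "cipher-fast": 2,
    "cipher-max": 2,
    "lucid-mono": 3,
    "lucid-multi": 3,
    "lucid-agent": 3,
    "lucid-lite": 3,
}


def estimate_credits(model: str, duration_seconds: float) -> float:
    """Estimate credits for one transcription.

    Assumes cost is credits/sec multiplied by audio length in seconds.
    """
    if model not in CREDITS_PER_SECOND:
        raise ValueError(f"Unknown model: {model!r}")
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return CREDITS_PER_SECOND[model] * duration_seconds


# A 10-minute recording (600 seconds) on a Cipher and a Lucid model:
print(estimate_credits("cipher-fast", 600))  # 1200
print(estimate_credits("lucid-mono", 600))   # 1800
```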