Response Formats
This page describes the response formats returned by the Speech-to-Text API.
Standard JSON Response
Default response format for all batch transcription requests:
{
  "success": true,
  "text": "Hello, this is a transcription test.",
  "duration": 5.2,
  "language": "en",
  "model": "cipher-fast",
  "credits_used": 16
}
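As a minimal sketch of consuming this response in Python, the snippet below decodes the body into a typed object. The names Transcription and parse_standard_response are hypothetical helpers, not part of the API, and treating duration as seconds is an assumption based on the sample values.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcription:
    """Fields taken from the standard JSON response example."""
    text: str
    duration: float  # audio length; the unit (seconds) is an assumption
    language: str
    model: str
    credits_used: int


def parse_standard_response(body: str) -> Transcription:
    """Decode a standard JSON response body into a Transcription."""
    data = json.loads(body)
    return Transcription(
        text=data["text"],
        duration=data["duration"],
        language=data["language"],
        model=data["model"],
        credits_used=data["credits_used"],
    )


# Sample body copied from the documented example.
sample_body = (
    '{"success": true, "text": "Hello, this is a transcription test.", '
    '"duration": 5.2, "language": "en", "model": "cipher-fast", "credits_used": 16}'
)
transcription = parse_standard_response(sample_body)
print(transcription.text, transcription.credits_used)
```

A dataclass keeps field access explicit, so a missing key fails at parse time with a KeyError rather than later in application code.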
Verbose JSON Response
Extended response with word-level and segment-level timestamps and confidence scores (set response_format=verbose_json):
{
  "success": true,
  "text": "Hello world",
  "duration": 1.5,
  "language": "en",
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98},
    {"word": "world", "start": 0.6, "end": 1.0, "confidence": 0.97}
  ],
  "segments": [
    {
      "text": "Hello world",
      "start": 0.0,
      "end": 1.0,
      "confidence": 0.97
    }
  ]
}
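One possible use of the word-level data is to flag words for manual review. The sketch below assumes confidence is a score between 0 and 1 where higher means more certain; low_confidence_words and the threshold value are illustrative choices, not API features.

```python
def low_confidence_words(response: dict, threshold: float = 0.9) -> list:
    """Return (word, start, end) for each word scored below the threshold."""
    return [
        (w["word"], w["start"], w["end"])
        for w in response.get("words", [])
        if w["confidence"] < threshold
    ]


# Word list copied from the verbose JSON example.
verbose_response = {
    "success": True,
    "text": "Hello world",
    "words": [
        {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98},
        {"word": "world", "start": 0.6, "end": 1.0, "confidence": 0.97},
    ],
}
flagged = low_confidence_words(verbose_response, threshold=0.975)
print(flagged)
```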
Diarization Response
Response when speaker diarization is enabled:
{
  "success": true,
  "text": "Hello everyone. Thanks for having me.",
  "duration": 4.5,
  "utterances": [
    {
      "speaker": 0,
      "text": "Hello everyone.",
      "start": 0.0,
      "end": 1.5,
      "confidence": 0.95
    },
    {
      "speaker": 1,
      "text": "Thanks for having me.",
      "start": 2.0,
      "end": 3.8,
      "confidence": 0.93
    }
  ],
  "speakers": {
    "count": 2
  }
}
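The utterances array can be turned into a speaker-labelled transcript or per-speaker talk time. This is a rough sketch: format_dialogue and speaker_talk_time are hypothetical helper names, and the "Speaker N" label format is an arbitrary choice.

```python
from collections import defaultdict


def format_dialogue(response: dict) -> str:
    """Render one line per utterance, prefixed with its speaker index."""
    return "\n".join(
        f"Speaker {u['speaker']}: {u['text']}" for u in response["utterances"]
    )


def speaker_talk_time(response: dict) -> dict:
    """Sum (end - start) over the utterances of each speaker."""
    totals = defaultdict(float)
    for u in response["utterances"]:
        totals[u["speaker"]] += u["end"] - u["start"]
    return dict(totals)


# Utterances copied from the diarization example.
diarized = {
    "utterances": [
        {"speaker": 0, "text": "Hello everyone.", "start": 0.0, "end": 1.5},
        {"speaker": 1, "text": "Thanks for having me.", "start": 2.0, "end": 3.8},
    ],
    "speakers": {"count": 2},
}
dialogue = format_dialogue(diarized)
talk_time = speaker_talk_time(diarized)
print(dialogue)
```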
SRT Format
SubRip subtitle format for video captioning:
1
00:00:00,000 --> 00:00:02,500
Hello, welcome to our video.

2
00:00:03,000 --> 00:00:05,500
Today we'll be discussing the API.
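If you already hold timed segments (for example, the segments array of a verbose JSON response) and want SRT without a second request, the conversion can be sketched as follows. srt_timestamp and to_srt are hypothetical helpers, and the code assumes start and end are expressed in seconds.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list) -> str:
    """Build numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text']}")
    return "\n\n".join(cues) + "\n"


# Timings and text mirror the SRT example.
segments = [
    {"text": "Hello, welcome to our video.", "start": 0.0, "end": 2.5},
    {"text": "Today we'll be discussing the API.", "start": 3.0, "end": 5.5},
]
srt_text = to_srt(segments)
print(srt_text)
```

Working in integer milliseconds avoids rounding drift that can appear when hours, minutes, and seconds are derived separately from a float.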
WebVTT Format
Web Video Text Tracks format for HTML5 video:
WEBVTT

00:00:00.000 --> 00:00:02.500
Hello, welcome to our video.

00:00:03.000 --> 00:00:05.500
Today we'll be discussing the API.
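The two subtitle formats differ mainly in the WEBVTT header and the millisecond separator (a period instead of a comma). As an illustration, an SRT string can be converted locally with the hypothetical helper srt_to_vtt below. It keeps the SRT cue numbers, which WebVTT accepts as optional cue identifiers.

```python
import re

# Matches an SRT timestamp such as 00:00:02,500.
_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT, rewriting only the timing lines."""
    lines = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Swap the comma for a period in both timestamps.
            line = _SRT_TIMESTAMP.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


srt_sample = "1\n00:00:00,000 --> 00:00:02,500\nHello, welcome to our video.\n"
vtt_text = srt_to_vtt(srt_sample)
print(vtt_text)
```

Restricting the substitution to lines that contain the arrow leaves commas in the caption text untouched.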
Real-time Streaming Messages
Transcript Message
{
  "type": "transcript",
  "text": "Hello, how are you?",
  "is_final": true,
  "confidence": 0.95,
  "start": 0.0,
  "end": 1.5,
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.3},
    {"word": "how", "start": 0.4, "end": 0.5},
    {"word": "are", "start": 0.6, "end": 0.7},
    {"word": "you", "start": 0.8, "end": 1.0}
  ]
}
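A client usually needs to assemble these messages into a running transcript. The sketch below assumes that a message with is_final set to false is an interim result that a later message supersedes; the source only shows a final message, so verify that behavior before relying on it. TranscriptAccumulator is a hypothetical class.

```python
import json


class TranscriptAccumulator:
    """Collects final transcript text and tracks the latest interim text."""

    def __init__(self) -> None:
        self.final_parts = []
        self.interim = ""

    def handle(self, raw: str) -> None:
        """Process one raw JSON message; non-transcript messages are ignored."""
        msg = json.loads(raw)
        if msg.get("type") != "transcript":
            return
        if msg.get("is_final"):
            self.final_parts.append(msg["text"])
            self.interim = ""  # assumption: a final result replaces the interim
        else:
            self.interim = msg["text"]

    @property
    def text(self) -> str:
        """Final text so far, followed by any pending interim text."""
        parts = self.final_parts + ([self.interim] if self.interim else [])
        return " ".join(parts)


acc = TranscriptAccumulator()
acc.handle('{"type": "transcript", "text": "Hello how", "is_final": false}')
interim_view = acc.text
acc.handle('{"type": "transcript", "text": "Hello, how are you?", "is_final": true}')
final_view = acc.text
print(final_view)
```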
Speech Events
// Speech started
{
  "type": "speech_started",
  "timestamp": 1234567890.123
}
// Speech ended
{
  "type": "speech_ended",
  "timestamp": 1234567891.456
}
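Pairing the two events gives the length of each detected speech span. This sketch assumes timestamp is expressed in seconds, which the sample values suggest but the source does not state; speech_durations is a hypothetical helper.

```python
def speech_durations(events: list) -> list:
    """Return the duration of each speech_started / speech_ended pair."""
    durations = []
    started_at = None
    for event in events:
        if event["type"] == "speech_started":
            started_at = event["timestamp"]
        elif event["type"] == "speech_ended" and started_at is not None:
            durations.append(event["timestamp"] - started_at)
            started_at = None  # wait for the next speech_started
    return durations


# Events copied from the examples.
events = [
    {"type": "speech_started", "timestamp": 1234567890.123},
    {"type": "speech_ended", "timestamp": 1234567891.456},
]
durations = speech_durations(events)
print(durations)
```

A speech_ended event with no preceding speech_started is skipped rather than treated as an error, which keeps the helper tolerant of a stream joined mid-utterance.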
Session Status
{
  "type": "session_status",
  "duration": 45.5,
  "credits_used": 137,
  "credits_remaining": 9863
}
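Because every streaming message carries a type field, a client can route messages with a small dispatcher. The following is one possible shape, not a prescribed client design; MessageRouter and its methods are hypothetical names.

```python
import json
from typing import Callable, Dict


class MessageRouter:
    """Routes decoded streaming messages to handlers keyed by "type"."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], None]] = {}

    def on(self, message_type: str, handler: Callable[[dict], None]) -> None:
        """Register the handler for one message type."""
        self._handlers[message_type] = handler

    def dispatch(self, raw: str) -> bool:
        """Decode a message and call its handler; False if none is registered."""
        msg = json.loads(raw)
        handler = self._handlers.get(msg.get("type"))
        if handler is None:
            return False
        handler(msg)
        return True


remaining = []
router = MessageRouter()
router.on("session_status", lambda m: remaining.append(m["credits_remaining"]))

handled = router.dispatch(
    '{"type": "session_status", "duration": 45.5, '
    '"credits_used": 137, "credits_remaining": 9863}'
)
unhandled = router.dispatch('{"type": "speech_started", "timestamp": 1234567890.123}')
print(handled, unhandled, remaining)
```

Returning False for unregistered types lets the caller log or ignore message types it does not yet handle.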
Error Response
{
  "success": false,
  "error": {
    "code": "INVALID_FILE_FORMAT",
    "message": "Unsupported audio format. Please use MP3, WAV, M4A, or WebM."
  }
}
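Client code can convert this payload into an exception so that failures are not mistaken for transcriptions. This is a minimal sketch; TranscriptionError and raise_for_error are hypothetical names, and the "UNKNOWN" fallback code is an assumption for payloads that lack an error object.

```python
class TranscriptionError(Exception):
    """Raised when a response reports success = false."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_error(data: dict) -> dict:
    """Return the payload unchanged, or raise if it reports a failure."""
    if data.get("success") is False:
        error = data.get("error", {})
        raise TranscriptionError(
            error.get("code", "UNKNOWN"),  # fallback value is an assumption
            error.get("message", ""),
        )
    return data


# Payload copied from the error example.
error_payload = {
    "success": False,
    "error": {
        "code": "INVALID_FILE_FORMAT",
        "message": "Unsupported audio format. Please use MP3, WAV, M4A, or WebM.",
    },
}
try:
    raise_for_error(error_payload)
    caught = None
except TranscriptionError as exc:
    caught = exc
    print(exc.code)
```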