Word Timestamps
Get precise timing information for every word in your transcription.
Overview
Word-level timestamps give the start and end time of each word in the transcript. They are useful for:
- Video subtitle generation
- Audio/video editing alignment
- Karaoke-style highlighting
- Accessibility applications
- Search and navigation within audio
Enable Word Timestamps
Use the verbose_json response format with timestamp_granularities=word:
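import requests

# Open the file in a context manager so the handle is closed after the upload
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://yourvoic.com/api/v1/stt/cipher/transcribe",
        headers={"X-API-Key": "your_api_key"},
        files={"file": audio_file},
        data={
            "model": "cipher-max",
            "response_format": "verbose_json",
            "timestamp_granularities": "word",
        },
    )
response.raise_for_status()  # surface HTTP errors before parsing the body
result = response.json()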
Response Format
{
  "success": true,
  "text": "Hello world, this is a test.",
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.45},
    {"word": "world,", "start": 0.52, "end": 0.89},
    {"word": "this", "start": 1.02, "end": 1.18},
    {"word": "is", "start": 1.22, "end": 1.35},
    {"word": "a", "start": 1.38, "end": 1.42},
    {"word": "test.", "start": 1.45, "end": 1.82}
  ],
  "duration": 1.82,
  "language": "en"
}
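For karaoke-style highlighting or seeking, a player usually needs to know which word is being spoken at a given playback position. The sketch below is one way to do that lookup with a binary search over the `words` array. It assumes that `start` and `end` are in seconds and that the array is sorted by `start`, as in the sample response; `word_at` is a hypothetical helper name, not part of the API.

```python
from bisect import bisect_right

def word_at(words, t):
    """Return the word dict spoken at time t (seconds), or None during a gap."""
    # Assumes `words` is sorted by start time, as in the sample response.
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and t <= words[i]["end"]:
        return words[i]
    return None

# Sample data copied from the response above
words = [
    {"word": "Hello", "start": 0.0, "end": 0.45},
    {"word": "world,", "start": 0.52, "end": 0.89},
    {"word": "this", "start": 1.02, "end": 1.18},
]

print(word_at(words, 0.6))   # falls inside "world,"
print(word_at(words, 0.48))  # falls in the gap between the first two words
```

For long recordings, building the `starts` list once and reusing it avoids repeating that work on every lookup.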
Timestamp Granularities
| Value | Description | Use Case |
|---|---|---|
| word | Individual word timing | Subtitles, karaoke |
| segment | Sentence/phrase timing | Paragraph-level alignment |
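To illustrate how the two granularities relate, the sketch below derives sentence-level timing on the client from word-level timestamps by splitting on sentence-ending punctuation. This is only an approximation under stated assumptions: it is not how the API computes `segment` output, the shape of the `segment` response is not shown on this page, and `words_to_sentences` is a hypothetical helper.

```python
def words_to_sentences(words, terminators=(".", "?", "!")):
    """Group word dicts into sentence spans with start, end, and text."""
    sentences, current = [], []

    def flush():
        # Close the span: first word's start, last word's end
        sentences.append({
            "start": current[0]["start"],
            "end": current[-1]["end"],
            "text": " ".join(w["word"] for w in current),
        })

    for w in words:
        current.append(w)
        if w["word"].endswith(terminators):
            flush()
            current = []
    if current:  # trailing words with no closing punctuation
        flush()
    return sentences

# Illustrative input in the same shape as the `words` array
sample = [
    {"word": "Hello", "start": 0.0, "end": 0.45},
    {"word": "world.", "start": 0.52, "end": 0.89},
    {"word": "This", "start": 1.02, "end": 1.18},
    {"word": "works", "start": 1.22, "end": 1.82},
]
for sentence in words_to_sentences(sample):
    print(sentence)
```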
Generate SRT Subtitles
def create_srt(words, max_chars=42, max_duration=4.0):
    """Group word timestamps into subtitle cues (dicts with start, end, text)."""
    subtitles = []
    current_line = []
    current_chars = 0
    line_start = None
    for word in words:
        if line_start is None:
            line_start = word['start']
        # Line length if this word were added (plus one joining space)
        new_chars = current_chars + len(word['word']) + (1 if current_line else 0)
        too_long = new_chars > max_chars
        too_slow = word['end'] - line_start > max_duration
        # Only break when there is something to save, so no empty cue is emitted
        if current_line and (too_long or too_slow):
            # Save current line and start a new one with this word
            subtitles.append({
                'start': line_start,
                'end': current_line[-1]['end'],
                'text': ' '.join(w['word'] for w in current_line)
            })
            current_line = [word]
            current_chars = len(word['word'])
            line_start = word['start']
        else:
            current_line.append(word)
            current_chars = new_chars
    # Don't forget the last line
    if current_line:
        subtitles.append({
            'start': line_start,
            'end': current_line[-1]['end'],
            'text': ' '.join(w['word'] for w in current_line)
        })
    return subtitles

# Use it
result = response.json()
subtitles = create_srt(result['words'])
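The cue list is not yet SRT text: each SRT entry needs a running index and timestamps in `HH:MM:SS,mmm` form. A minimal sketch of that last step follows, assuming cue times are in seconds; `format_srt_time` and `render_srt` are hypothetical helper names introduced here for illustration.

```python
def format_srt_time(seconds):
    """Convert seconds (float) to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def render_srt(cues):
    """Render cue dicts (start, end, text) as SRT text."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(cue['start'])} --> {format_srt_time(cue['end'])}\n"
            f"{cue['text']}\n"
        )
    # A blank line separates consecutive cues
    return "\n".join(blocks)

# Cues in the shape returned by the grouping function above
cues = [
    {"start": 0.0, "end": 0.89, "text": "Hello world,"},
    {"start": 1.02, "end": 1.82, "text": "this is a test."},
]
print(render_srt(cues))
```

The returned string can be written to a `.srt` file with UTF-8 encoding.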
Direct Subtitle Formats
For convenience, you can request subtitles directly:
# Get SRT format directly
curl -X POST "https://yourvoic.com/api/v1/stt/cipher/transcribe" \
  -H "X-API-Key: your_api_key" \
  -F "file=@video.mp4" \
  -F "model=cipher-max" \
  -F "response_format=srt"

# Get WebVTT format
curl -X POST "https://yourvoic.com/api/v1/stt/cipher/transcribe" \
  -H "X-API-Key: your_api_key" \
  -F "file=@video.mp4" \
  -F "model=cipher-max" \
  -F "response_format=vtt"