Supported Audio Formats
File formats and encoding requirements for audio transcription.
Supported File Formats
| Format | Extension | MIME Type | Notes |
|---|---|---|---|
| MP3 | .mp3 | audio/mpeg | Most common, widely supported |
| WAV | .wav | audio/wav | Uncompressed, highest quality |
| MP4/M4A | .mp4, .m4a | audio/mp4 | AAC audio container |
| WebM | .webm | audio/webm | Web-native format |
| OGG | .ogg | audio/ogg | Vorbis codec |
| FLAC | .flac | audio/flac | Lossless compression |
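
As a rough illustration of how a client might apply the table above before uploading, the sketch below maps a file name to its listed MIME type. The function name `mime_for_file` is hypothetical and not part of any API; it only encodes the rows of the table.

```shell
# Hypothetical helper: map a file name to the MIME type listed in the table.
# Prints the MIME type and returns 0, or returns 1 for an unsupported file.
mime_for_file() {
  # Reject names that have no extension at all.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Lowercase the extension so "TALK.MP3" and "talk.mp3" are treated alike.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3)     echo "audio/mpeg" ;;
    wav)     echo "audio/wav" ;;
    mp4|m4a) echo "audio/mp4" ;;
    webm)    echo "audio/webm" ;;
    ogg)     echo "audio/ogg" ;;
    flac)    echo "audio/flac" ;;
    *)       return 1 ;;
  esac
}
```

The check looks only at the extension, so it cannot detect a file whose contents do not match its name.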
File Size Limits
| Limit | Value |
|---|---|
| Maximum file size | 25 MB |
| Maximum duration | ~2 hours (depends on encoding; at higher bitrates the 25 MB size limit is reached sooner) |
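
To see how the size limit and the encoding interact, the following sketch checks a file against the limit and estimates how much audio fits at a given constant bitrate. It assumes that "25 MB" means 25 × 1024 × 1024 bytes and ignores container overhead; the function names are illustrative only.

```shell
# Assumption: "25 MB" means 25 * 1024 * 1024 bytes. The service may count
# 25,000,000 bytes instead, so treat the results as approximate.
MAX_BYTES=$((25 * 1024 * 1024))

# Return 0 if the file fits within the size limit, 1 otherwise.
within_size_limit() {
  # The arithmetic expansion strips any padding that wc adds on some systems.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -le "$MAX_BYTES" ]
}

# Print the longest duration (whole seconds) that fits in the limit at a
# constant bitrate given in kbps, ignoring container overhead.
max_seconds_at_kbps() {
  echo $(( MAX_BYTES * 8 / ($1 * 1000) ))
}
```

Under these assumptions, `max_seconds_at_kbps 128` gives 1638 seconds (roughly 27 minutes), and 16 kHz mono 16-bit WAV, which is 256 kbps, fits in under 14 minutes. Durations near 2 hours therefore need bitrates of around 28 kbps.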
Recommended Settings
For best transcription quality:
| Property | Recommended | Minimum |
|---|---|---|
| Sample Rate | 16000 Hz or higher | 8000 Hz |
| Channels | Mono (1) | Mono or stereo (stereo is converted to mono) |
| Bit Depth | 16-bit | 16-bit |
| Bitrate (MP3) | 128 kbps or higher | 64 kbps |
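
One way to check a file against these settings is to read its sample rate with `ffprobe`, which ships with FFmpeg, and compare it with the table. This is a minimal sketch: `probe_sample_rate` and `rate_status` are hypothetical helper names, and the labels they print are made up for illustration.

```shell
# Print the sample rate (Hz) of the first audio stream. Requires ffprobe.
probe_sample_rate() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=sample_rate \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical helper: classify a sample rate (Hz) against the table above.
rate_status() {
  if [ "$1" -ge 16000 ]; then
    echo "recommended"
  elif [ "$1" -ge 8000 ]; then
    echo "acceptable"
  else
    echo "too-low"
  fi
}
```

A typical call would be `rate_status "$(probe_sample_rate input.wav)"`.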
Real-time Streaming Format
For WebSocket streaming, audio must meet the following requirements:
| Property | Requirement |
|---|---|
| Encoding | Linear16 (PCM), FLAC, or Opus |
| Sample Rate | 16000 Hz recommended |
| Channels | 1 (mono) |
| Bit Depth | 16-bit |
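
When sending Linear16 over a WebSocket, it helps to know how many bytes correspond to a given slice of audio. The sketch below computes that from the properties in the table. The chunk durations are illustrative assumptions, since no chunk size is specified here, and `pcm_chunk_bytes` is a hypothetical helper name.

```shell
# Print the size in bytes of one uncompressed PCM chunk.
# Args: sample rate (Hz), channels, bit depth, chunk duration (ms).
pcm_chunk_bytes() {
  echo $(( $1 * $2 * ($3 / 8) * $4 / 1000 ))
}

# Assumption: Linear16 here means signed 16-bit little-endian PCM.
# A headerless stream in that layout can be produced with FFmpeg:
#   ffmpeg -i input.mp4 -vn -f s16le -c:a pcm_s16le -ar 16000 -ac 1 output.raw
```

For example, `pcm_chunk_bytes 16000 1 16 100` gives 3200 bytes for 100 ms of 16 kHz mono audio, which works out to 32000 bytes per second.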
Audio Quality Tips
- Clear audio: Minimize background noise for better accuracy
- Consistent volume: Avoid very quiet or clipped audio
- Mono preferred: Stereo files are converted to mono internally
- No music: Background music can interfere with speech recognition
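
The volume tip can be checked roughly with FFmpeg's `volumedetect` filter, which reports mean and peak levels on stderr. The sketch below is one possible approach, not a required step; `volume_report` and `extract_max_volume` are hypothetical helper names.

```shell
# Run FFmpeg's volumedetect filter on a file. The statistics go to stderr,
# so stderr is redirected to stdout for further processing.
volume_report() {
  ffmpeg -nostdin -hide_banner -i "$1" -vn -af volumedetect -f null - 2>&1
}

# Read a volumedetect report on stdin and print the max_volume value in dB.
extract_max_volume() {
  sed -n 's/.*max_volume: \(-\{0,1\}[0-9.]*\) dB.*/\1/p'
}
```

A call such as `volume_report input.wav | extract_max_volume` prints the peak level. A value at or very near 0.0 dB may indicate clipping, while a strongly negative value suggests a quiet recording; where to draw those thresholds is a judgment call.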
Converting Audio
Use FFmpeg to convert audio to the recommended format:
# Convert to the recommended format: 16 kHz, mono, 16-bit PCM WAV
ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
# Convert to 16 kHz mono MP3 at 128 kbps
ffmpeg -i input.mp4 -ar 16000 -ac 1 -b:a 128k output.mp3
# Extract the audio track from a video (-vn drops the video stream)
ffmpeg -i video.mp4 -vn -ar 16000 -ac 1 audio.wav
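
For more than a handful of files, the same conversion can be wrapped in a loop. The following is a minimal sketch that assumes the inputs are `.mp4` files in a single directory; `wav_name` and `convert_dir` are hypothetical helper names.

```shell
# Print the output path for a source file: same directory, .wav extension.
wav_name() {
  dir=$(dirname "$1")
  base=$(basename "$1")
  printf '%s/%s.wav\n' "$dir" "${base%.*}"
}

# Convert every .mp4 file in a directory to 16 kHz mono 16-bit PCM WAV.
convert_dir() {
  for src in "$1"/*.mp4; do
    # Skip the unexpanded pattern when the directory has no .mp4 files.
    [ -e "$src" ] || continue
    # -nostdin keeps ffmpeg from reading the terminal inside the loop.
    ffmpeg -nostdin -i "$src" -vn -ar 16000 -ac 1 -c:a pcm_s16le \
      "$(wav_name "$src")"
  done
}
```

Running `convert_dir recordings` would write `recordings/<name>.wav` next to each source file. FFmpeg asks before overwriting by default, so with `-nostdin` an existing output file causes that conversion to be skipped unless `-y` is added.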