Supported Audio Formats

File formats and encoding requirements for audio transcription.

Supported File Formats

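| Format | Extension | MIME Type | Notes |
|---|---|---|---|
| MP3 | .mp3 | audio/mpeg | Most common, widely supported |
| WAV | .wav | audio/wav | Typically uncompressed PCM; lossless but large |
| MP4/M4A | .mp4, .m4a | audio/mp4 | Container, typically holding AAC audio |
| WebM | .webm | audio/webm | Web-native format |
| OGG | .ogg | audio/ogg | Vorbis codec |
| FLAC | .flac | audio/flac | Lossless compression |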

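When uploading, the MIME type usually has to match the file. The sketch below maps an extension to the type listed in the table and fails for anything unlisted. `mime_type_for` is a hypothetical helper, not part of any official SDK or CLI.

```bash
#!/usr/bin/env bash
# mime_type_for FILE -> prints the MIME type from the supported-formats table
# for FILE's extension; returns 1 (printing nothing) for unlisted extensions.
# Hypothetical helper for illustration only.
mime_type_for() {
  local ext
  # Take the text after the last dot and lowercase it portably (works on bash 3.2).
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    mp3)     echo "audio/mpeg" ;;
    wav)     echo "audio/wav" ;;
    mp4|m4a) echo "audio/mp4" ;;
    webm)    echo "audio/webm" ;;
    ogg)     echo "audio/ogg" ;;
    flac)    echo "audio/flac" ;;
    *)       return 1 ;;
  esac
}
```

The result can be supplied as a multipart part's content type, for example with curl's form syntax `-F "file=@talk.m4a;type=audio/mp4"`; the field name `file` there is an assumption, not a documented parameter.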
File Size Limits

| Limit | Value |
|---|---|
| Maximum file size | 25 MB |
| Maximum duration | ~2 hours (depends on encoding; reachable only at low bitrates) |

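Because the hard limit is on size, the playable duration is roughly size × 8 ÷ bitrate. The sketch below does that arithmetic and checks a file before upload. It assumes "25 MB" means 25,000,000 bytes; if the service actually uses 25 MiB (26,214,400 bytes), adjust `LIMIT_BYTES`. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# ASSUMPTION: 25 MB = 25,000,000 bytes. Change if the service counts MiB.
LIMIT_BYTES=25000000

# max_duration_seconds KBPS -> longest audio (whole seconds) that fits under
# LIMIT_BYTES at a constant bitrate of KBPS kilobits per second.
max_duration_seconds() {
  local kbps=$1
  echo $(( LIMIT_BYTES * 8 / (kbps * 1000) ))
}

# within_size_limit FILE -> exit status 0 if FILE is at most LIMIT_BYTES,
# 1 if larger, 2 if the file cannot be read.
within_size_limit() {
  local size
  size=$(wc -c < "$1") || return 2   # wc -c is portable across GNU and BSD
  (( size <= LIMIT_BYTES ))
}
```

Under this assumption, 128 kbps MP3 fits about 1562 s (~26 min) and 16 kHz 16-bit mono WAV (256 kbps) about 781 s (~13 min); getting close to two hours requires a bitrate near 28 kbps.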
Recommended Settings

For best transcription quality:

| Property | Recommended | Minimum |
|---|---|---|
| Sample Rate | 16000 Hz or higher | 8000 Hz |
| Channels | Mono (1) | Mono or Stereo |
| Bit Depth | 16-bit | 16-bit |
| Bitrate (MP3) | 128 kbps or higher | 64 kbps |
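To compare a file against this table, the sketch below classifies a sample rate and channel count using only the thresholds listed above. The helper names and output labels are hypothetical.

```bash
#!/usr/bin/env bash
# classify_rate RATE_HZ -> recommended | minimum-ok | below-minimum
# Thresholds from the table: 16000 Hz recommended, 8000 Hz minimum.
classify_rate() {
  local rate=$1
  if (( rate >= 16000 )); then
    echo "recommended"
  elif (( rate >= 8000 )); then
    echo "minimum-ok"
  else
    echo "below-minimum"
  fi
}

# classify_channels N -> recommended (mono) | minimum-ok (stereo) | unlisted
# The table only mentions mono and stereo, so other counts are "unlisted".
classify_channels() {
  case $1 in
    1) echo "recommended" ;;
    2) echo "minimum-ok" ;;
    *) echo "unlisted" ;;
  esac
}
```

The input values can be read with `ffprobe -v error -select_streams a:0 -show_entries stream=sample_rate,channels -of default=noprint_wrappers=1 FILE`, which prints `sample_rate=...` and `channels=...` lines.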

Real-time Streaming Format

For WebSocket streaming, audio must meet the following requirements:

| Property | Requirement |
|---|---|
| Encoding | Linear16 (PCM), FLAC, or Opus |
| Sample Rate | 16000 Hz recommended |
| Channels | 1 (mono) |
| Bit Depth | 16-bit |
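For Linear16, each second of 16 kHz mono 16-bit audio is 16000 × 2 = 32,000 bytes, which makes chunk sizes easy to compute. The sketch below assumes the stream expects headerless little-endian PCM and that audio is sent in fixed-duration chunks; the 100 ms chunk length is an illustrative choice, not a documented value, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# to_linear16 IN OUT -> write headerless signed 16-bit little-endian PCM,
# 16 kHz, mono (requires ffmpeg on PATH). ASSUMPTION: the stream wants raw
# samples with no WAV header.
to_linear16() {
  ffmpeg -nostdin -y -i "$1" -vn -ar 16000 -ac 1 -f s16le -c:a pcm_s16le "$2"
}

# chunk_bytes RATE_HZ CHUNK_MS -> bytes per chunk for 16-bit mono audio
# (samples per chunk x 2 bytes per sample x 1 channel).
chunk_bytes() {
  echo $(( $1 * $2 / 1000 * 2 ))
}
```

For example, `to_linear16 input.mp4 audio.raw` followed by `split -b "$(chunk_bytes 16000 100)" audio.raw chunk_` produces 3200-byte pieces, each holding 100 ms of audio.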

Audio Quality Tips

  • Clear audio: Minimize background noise for better accuracy
  • Consistent volume: Avoid very quiet or clipped audio
  • Mono preferred: Stereo files are converted to mono internally
  • No background music: Music behind the speaker can interfere with speech recognition

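The volume tip can be checked before upload with FFmpeg's `volumedetect` filter, which reports mean and peak levels in dB. In the sketch below, the thresholds (mean below -40 dB counts as "too quiet", peak at or above -0.1 dB as "possibly clipped") are illustrative rules of thumb, not limits published for this service, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# volume_stats FILE -> prints "MEAN_DB MAX_DB" (requires ffmpeg on PATH).
# volumedetect writes its report to stderr, so it is merged into the pipe.
volume_stats() {
  ffmpeg -nostdin -hide_banner -i "$1" -af volumedetect -f null - 2>&1 |
    sed -n -e 's/.*mean_volume: \([-0-9.]*\) dB.*/\1/p' \
           -e 's/.*max_volume: \([-0-9.]*\) dB.*/\1/p' | tr '\n' ' '
}

# volume_verdict MEAN_DB MAX_DB -> ok | too-quiet | possibly-clipped
# ASSUMPTION: thresholds are illustrative; awk handles the decimal comparison.
volume_verdict() {
  awk -v mean="$1" -v peak="$2" 'BEGIN {
    if (peak + 0 >= -0.1)     print "possibly-clipped"
    else if (mean + 0 < -40)  print "too-quiet"
    else                      print "ok"
  }'
}
```

Usage: `volume_verdict $(volume_stats talk.wav)`; the unquoted substitution splits the two numbers into separate arguments.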
Converting Audio

Use FFmpeg to convert audio to the recommended settings (16 kHz sample rate, mono):

# Convert to optimal format for transcription
ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav

# Convert to MP3
ffmpeg -i input.mp4 -ar 16000 -ac 1 -b:a 128k output.mp3

# Extract audio from video
ffmpeg -i video.mp4 -vn -ar 16000 -ac 1 audio.wav
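To apply the first conversion to many files, a loop like the one below reuses the same flags. It is a sketch: the directory passed in (for example `recordings`) is a placeholder, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# wav_name PATH -> same path with its extension replaced by .wav
wav_name() {
  echo "${1%.*}.wav"
}

# convert_dir DIR -> convert every DIR/*.mp4 to 16 kHz mono 16-bit WAV
convert_dir() {
  local f
  for f in "$1"/*.mp4; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    # -nostdin keeps ffmpeg from consuming stdin; -n refuses to overwrite
    # an existing output file instead of prompting.
    ffmpeg -nostdin -n -i "$f" -vn -ar 16000 -ac 1 -c:a pcm_s16le "$(wav_name "$f")"
  done
}
```

Usage: `convert_dir recordings`. Using `-n` rather than `-y` makes reruns skip files that were already converted.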