🔌 STT Integrations
Integrate YourVoic Speech-to-Text into your applications, automation workflows, voice AI platforms, and telephony systems. Our API supports both batch transcription and real-time streaming.
Integration Overview #
YourVoic STT provides multiple integration options to fit your needs:
- REST API: Upload audio files for batch transcription with full feature support.
- WebSocket: Real-time streaming transcription for live audio applications.
- Webhooks: Receive notifications when transcriptions complete.
- Telephony: Integrate with Asterisk, FreeSWITCH, and SIP systems.
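The REST, WebSocket, and telephony options all have full examples below; webhooks are the exception, so here is a minimal receiver sketch. The endpoint path and the payload fields (id, status, text) are assumptions for illustration; check your dashboard's webhook settings for the actual schema.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/yourvoic/webhook", methods=["POST"])
def stt_webhook():
    # Hypothetical payload shape; confirm field names against the actual webhook docs
    event = request.get_json(force=True)
    if event.get("status") == "completed":
        print(f"Job {event.get('id')}: {event.get('text', '')[:80]}")
    return jsonify({"received": True}), 200

if __name__ == "__main__":
    app.run(port=5000)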
🤖 Automation Platforms #
Integrate YourVoic STT into your automation workflows using popular no-code/low-code platforms.
n8n Workflow Automation
Use the HTTP Request node in n8n to transcribe audio files or process voice recordings automatically.
{
  "url": "https://yourvoic.com/api/v1/stt/transcribe",
  "method": "POST",
  "authentication": "predefinedCredentialType",
  "nodeCredentialType": "httpHeaderAuth",
  "sendHeaders": true,
  "headerParameters": {
    "parameters": [
      { "name": "X-API-Key", "value": "={{ $credentials.apiKey }}" }
    ]
  },
  "sendBody": true,
  "contentType": "multipart-form-data",
  "bodyParameters": {
    "parameters": [
      { "name": "file", "value": "={{ $binary.data }}" },
      { "name": "model", "value": "cipher-fast" },
      { "name": "response_format", "value": "json" }
    ]
  }
}
n8n Workflow: Voicemail to Slack
- Trigger: Webhook receives audio file from phone system
- HTTP Request: Send to YourVoic STT API
- Set Node: Extract transcription text
- Slack Node: Post transcription to channel
- Optional: Store in Google Sheets or Notion
Zapier Automation
Create Zaps to automatically transcribe audio files when they're uploaded or received.
- Choose your trigger (e.g., "New File in Google Drive")
- Add "Webhooks by Zapier" as an action
- Select "Custom Request"
- Method: POST
- URL: https://yourvoic.com/api/v1/stt/transcribe
- Add header: X-API-Key: your-api-key
- Upload the audio file as form-data
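Alternatively, if your Zap passes a file URL rather than raw binary data, a "Code by Zapier" Python step can fetch the file and call the API directly. A sketch, assuming input_data["file_url"] carries the file's download URL (the field name is whatever you map in the Zap editor):

import requests

# input_data is provided by Zapier; "file_url" is a field mapped in the Zap editor
audio = requests.get(input_data["file_url"], timeout=60)

response = requests.post(
    "https://yourvoic.com/api/v1/stt/transcribe",
    headers={"X-API-Key": "your-api-key"},
    files={"file": ("audio.mp3", audio.content, "audio/mpeg")},
    data={"model": "cipher-fast", "response_format": "json"},
    timeout=120,
)
response.raise_for_status()

# Zapier exposes this dict to later steps
output = {"text": response.json().get("text", "")}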
Make.com (Integromat)
Use the HTTP module to transcribe audio files in your Make.com scenarios.
{
  "url": "https://yourvoic.com/api/v1/stt/transcribe",
  "method": "POST",
  "headers": [
    { "name": "X-API-Key", "value": "your-api-key" }
  ],
  "bodyType": "multipart/form-data",
  "fields": [
    { "key": "file", "type": "file", "value": "{{audio_file}}" },
    { "key": "model", "type": "text", "value": "lucid-mono" },
    { "key": "diarize", "type": "text", "value": "true" },
    { "key": "punctuate", "type": "text", "value": "true" }
  ]
}
🎙️ Voice AI Platforms #
Build real-time voice AI applications with streaming STT capabilities.
Pipecat Voice Agent Framework
Integrate YourVoic as a Speech-to-Text service in your Pipecat voice agent pipeline.
import asyncio
import websockets
import json
from pipecat.services.stt import STTService

class YourVoicSTT(STTService):
    """YourVoic Real-time STT for Pipecat"""

    def __init__(self, api_key: str, model: str = "lucid-mono"):
        self.api_key = api_key
        self.model = model
        self.ws_url = f"wss://yourvoic.com:8443/api/v1/stt/realtime/stream?model={model}"
        self.ws = None

    async def connect(self):
        """Establish the WebSocket connection"""
        headers = {"X-API-Key": self.api_key}
        self.ws = await websockets.connect(self.ws_url, extra_headers=headers)

    async def transcribe_stream(self, audio_chunk: bytes) -> str:
        """Send audio and receive a transcription"""
        if self.ws:
            await self.ws.send(audio_chunk)
            response = await self.ws.recv()
            data = json.loads(response)
            if data.get("type") == "transcript":
                return data.get("text", "")
        return ""

    async def close(self):
        """Close the WebSocket connection"""
        if self.ws:
            await self.ws.close()

# Usage in a Pipecat pipeline
async def main():
    stt = YourVoicSTT(api_key="your-key", model="lucid-mono")
    await stt.connect()
    # Process audio frames from the microphone
    # (microphone_stream is a placeholder for your audio source)
    async for audio_frame in microphone_stream():
        text = await stt.transcribe_stream(audio_frame)
        if text:
            print(f"Transcribed: {text}")
    await stt.close()

asyncio.run(main())
LiveKit Real-time Communication
Use YourVoic STT with LiveKit for real-time transcription in video/audio calls.
import httpx
from livekit.agents import stt

class YourVoicSTTPlugin(stt.STT):
    """YourVoic STT Plugin for LiveKit Agents"""

    def __init__(self, api_key: str, model: str = "lucid-agent"):
        self.api_key = api_key
        self.model = model

    async def recognize(self, audio_buffer: stt.AudioBuffer) -> stt.SpeechEvent:
        """Transcribe a buffered audio segment"""
        # Convert the buffer to WAV format before uploading
        audio_data = audio_buffer.to_wav()
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "https://yourvoic.com/api/v1/stt/transcribe",
                headers={"X-API-Key": self.api_key},
                files={"file": ("audio.wav", audio_data, "audio/wav")},
                data={
                    "model": self.model,
                    "punctuate": "true",
                    "smart_format": "true",
                },
            )
        result = response.json()
        return stt.SpeechEvent(
            text=result.get("text", ""),
            is_final=True,
            confidence=result.get("confidence", 0.95),
        )

# Usage
stt_plugin = YourVoicSTTPlugin(api_key="your-key")
Vocode Voice AI
Integrate YourVoic STT as a transcriber in Vocode voice applications.
import json
import websockets
from vocode.streaming.transcriber import BaseTranscriber
from vocode.streaming.models.transcriber import TranscriberConfig

class YourVoicTranscriberConfig(TranscriberConfig):
    api_key: str
    model: str = "lucid-mono"
    language: str = "en"

class YourVoicTranscriber(BaseTranscriber):
    """YourVoic STT Transcriber for Vocode"""

    def __init__(self, config: YourVoicTranscriberConfig):
        super().__init__(config)
        self.config = config
        self.ws = None

    async def create_websocket(self):
        url = "wss://yourvoic.com:8443/api/v1/stt/realtime/stream"
        url += f"?model={self.config.model}&language={self.config.language}"
        headers = {"X-API-Key": self.config.api_key}
        self.ws = await websockets.connect(url, extra_headers=headers)

    async def send_audio(self, audio: bytes):
        if self.ws:
            await self.ws.send(audio)

    async def receive_transcription(self):
        if self.ws:
            response = await self.ws.recv()
            data = json.loads(response)
            return data.get("text", "")
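A usage sketch follows; depending on your Vocode version, the base TranscriberConfig may require additional audio settings (sampling rate, encoding, chunk size), so treat the constructor arguments as illustrative:

import asyncio

async def demo(pcm_chunk: bytes):
    config = YourVoicTranscriberConfig(api_key="your-key", model="lucid-mono", language="en")
    transcriber = YourVoicTranscriber(config)
    await transcriber.create_websocket()
    await transcriber.send_audio(pcm_chunk)
    print(await transcriber.receive_transcription())

asyncio.run(demo(b"\x00\x00" * 1600))  # 100 ms of 16 kHz silence as a smoke test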
📞 Telephony Integration #
Integrate YourVoic STT into telephony systems for call transcription, IVR, and voice analytics.
Asterisk PBX Integration
Transcribe calls in real-time or process call recordings with YourVoic STT. Use AGI scripts for batch processing or ARI for real-time streaming.
Method 1: AGI Script for Call Recording Transcription
#!/usr/bin/env python3
"""YourVoic STT AGI Script for Asterisk Call Transcription"""
import sys
import os
import requests

API_KEY = "your-api-key"
API_URL = "https://yourvoic.com/api/v1/stt/transcribe"

def agi_command(cmd):
    """Send an AGI command and read the response line"""
    sys.stdout.write(f"{cmd}\n")
    sys.stdout.flush()
    return sys.stdin.readline().strip()

def transcribe_recording(audio_path, model="cipher-fast"):
    """Transcribe an audio file"""
    with open(audio_path, 'rb') as f:
        response = requests.post(
            API_URL,
            headers={"X-API-Key": API_KEY},
            files={"file": f},
            data={
                "model": model,
                "punctuate": "true",
                "diarize": "true",  # Identify caller vs agent
            },
        )
    if response.status_code == 200:
        return response.json()
    return None

def main():
    # Read the AGI environment (terminated by a blank line)
    env = {}
    while True:
        line = sys.stdin.readline().strip()
        if not line:
            break
        if ':' in line:
            key, value = line.split(':', 1)
            env[key.strip()] = value.strip()

    # Get the recording path from the first AGI argument
    recording_path = env.get('agi_arg_1', '')
    if recording_path and os.path.exists(recording_path):
        result = transcribe_recording(recording_path)
        if result:
            # Set channel variables with the transcription
            text = result.get('text', '').replace('"', '\\"')
            agi_command(f'SET VARIABLE TRANSCRIPTION "{text[:1000]}"')
            agi_command(f'SET VARIABLE STT_DURATION "{result.get("duration", 0)}"')
            agi_command(f'SET VARIABLE STT_LANGUAGE "{result.get("language", "unknown")}"')
            agi_command('VERBOSE "Transcription completed" 1')
        else:
            agi_command('VERBOSE "Transcription failed" 1')
    else:
        agi_command('VERBOSE "Recording file not found" 1')

if __name__ == "__main__":
    main()
Dialplan: Post-Call Transcription
[call-recording]
; Record and transcribe calls
exten => _X.,1,Answer()
same => n,Set(RECORDING=/var/spool/asterisk/recordings/${UNIQUEID}.wav)
same => n,MixMonitor(${RECORDING},b)
same => n,Dial(SIP/${EXTEN},60)
same => n,StopMixMonitor()
same => n,AGI(yourvoic_stt.py,${RECORDING})
same => n,NoOp(Transcription: ${TRANSCRIPTION})
same => n,Hangup()
[ivr-voice-input]
; Capture voice input and transcribe
exten => s,1,Answer()
same => n,Playback(please-say-your-name)
same => n,Set(VOICE_FILE=/tmp/voice_${UNIQUEID}.wav)
same => n,Record(${VOICE_FILE},3,30,q)
same => n,AGI(yourvoic_stt.py,${VOICE_FILE})
same => n,NoOp(User said: ${TRANSCRIPTION})
same => n,GotoIf($["${TRANSCRIPTION}" != ""]?process:retry)
same => n(process),AGI(process_name.py,${TRANSCRIPTION})
same => n,Hangup()
same => n(retry),Playback(sorry-try-again)
same => n,Goto(s,1)
Method 2: ARI Real-time Transcription
import ari
import asyncio
import json
import websockets

YOURVOIC_API_KEY = "your-api-key"

# ARI connection
ari_client = ari.connect('http://localhost:8088', 'asterisk', 'asterisk')

class RealtimeTranscriber:
    def __init__(self, channel_id):
        self.channel_id = channel_id
        self.ws = None
        self.transcript = []

    async def connect(self):
        url = "wss://yourvoic.com:8443/api/v1/stt/realtime/stream"
        url += "?model=lucid-mono&language=en"
        headers = {"X-API-Key": YOURVOIC_API_KEY}
        self.ws = await websockets.connect(url, extra_headers=headers)

    async def send_audio(self, audio_data):
        if self.ws:
            await self.ws.send(audio_data)

    async def receive_loop(self):
        while self.ws:
            try:
                response = await self.ws.recv()
                data = json.loads(response)
                if data.get("type") == "transcript" and data.get("is_final"):
                    self.transcript.append(data.get("text", ""))
                    print(f"[{self.channel_id}] {data.get('text')}")
            except websockets.exceptions.ConnectionClosed:
                break

    def get_full_transcript(self):
        return " ".join(self.transcript)

# Handle external media from Asterisk
def handle_external_media(channel, ev):
    """Process audio from an Asterisk ExternalMedia channel"""
    transcriber = RealtimeTranscriber(channel.id)
    asyncio.create_task(transcriber.connect())
    # Audio streaming logic here... (the RTP-to-WebSocket bridge in the
    # SIP/VoIP section below shows one way to implement this step)
FreeSWITCH Call Transcription
Integrate YourVoic STT with FreeSWITCH for call transcription and voice analytics.
-- YourVoic STT for FreeSWITCH
local api_key = "your-api-key"
local api_url = "https://yourvoic.com/api/v1/stt/transcribe"

function transcribe_recording(session, recording_path)
    -- Use curl to send the recording to the YourVoic API
    local cmd = string.format(
        'curl -s -X POST "%s" ' ..
        '-H "X-API-Key: %s" ' ..
        '-F "file=@%s" ' ..
        '-F "model=cipher-fast" ' ..
        '-F "punctuate=true"',
        api_url, api_key, recording_path
    )
    local handle = io.popen(cmd)
    local response = handle:read("*a")
    handle:close()

    -- Parse the JSON response
    local cjson = require("cjson")
    local result = cjson.decode(response)
    if result and result.text then
        -- Set channel variables
        session:setVariable("transcription", result.text)
        session:setVariable("stt_language", result.language or "unknown")
        return result.text
    end
    return nil
end
-- Usage in dialplan (illustrative): save this script as yourvoic_stt.lua and
-- invoke it after recording, e.g. <action application="lua" data="yourvoic_stt.lua"/>
SIP/VoIP General Integration
For custom SIP/VoIP applications, capture RTP audio streams and send to YourVoic STT.
import asyncio
import json
import socket
import websockets

class RTPtoSTTBridge:
    """Bridge an RTP audio stream to the YourVoic STT WebSocket"""

    def __init__(self, api_key: str, rtp_port: int = 10000):
        self.api_key = api_key
        self.rtp_port = rtp_port
        self.ws = None
        self.running = False

    async def connect_stt(self):
        """Connect to the YourVoic STT WebSocket"""
        url = "wss://yourvoic.com:8443/api/v1/stt/realtime/stream"
        url += "?model=lucid-mono&sample_rate=8000&encoding=mulaw"
        headers = {"X-API-Key": self.api_key}
        self.ws = await websockets.connect(url, extra_headers=headers)

    def extract_rtp_payload(self, packet: bytes) -> bytes:
        """Extract the audio payload from an RTP packet"""
        # Skip the fixed RTP header (12 bytes; ignores CSRC lists and extensions)
        return packet[12:]

    async def rtp_receiver(self):
        """Receive RTP packets and forward the audio to STT"""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(('0.0.0.0', self.rtp_port))
        sock.setblocking(False)
        loop = asyncio.get_event_loop()
        while self.running:
            try:
                # loop.sock_recvfrom requires Python 3.11+
                data, addr = await loop.sock_recvfrom(sock, 4096)
                audio = self.extract_rtp_payload(data)
                if self.ws and audio:
                    await self.ws.send(audio)
            except OSError:
                await asyncio.sleep(0.001)

    async def receive_transcripts(self):
        """Receive transcriptions from the WebSocket"""
        while self.running and self.ws:
            try:
                response = await self.ws.recv()
                data = json.loads(response)
                if data.get("type") == "transcript":
                    is_final = data.get("is_final", False)
                    text = data.get("text", "")
                    print(f"{'[FINAL]' if is_final else '[...]'} {text}")
            except websockets.exceptions.ConnectionClosed:
                break

    async def run(self):
        """Start the bridge"""
        await self.connect_stt()
        self.running = True
        await asyncio.gather(
            self.rtp_receiver(),
            self.receive_transcripts(),
        )

# Usage
bridge = RTPtoSTTBridge(api_key="your-key", rtp_port=10000)
asyncio.run(bridge.run())
📦 SDKs & Libraries #
Use our official SDKs or community libraries for easier integration.
Python SDK
pip install yourvoic
from yourvoic import STT
stt = STT(api_key="your-key")
result = stt.transcribe("audio.mp3")
print(result.text)
Node.js SDK
npm install @yourvoic/sdk
import { STT } from '@yourvoic/sdk';
const stt = new STT('your-key');
const result = await stt.transcribe('audio.mp3');
console.log(result.text);
cURL Examples
Batch Transcription
curl -X POST "https://yourvoic.com/api/v1/stt/transcribe" \
-H "X-API-Key: your-api-key" \
-F "file=@audio.mp3" \
-F "model=cipher-fast" \
-F "response_format=json"
With Speaker Diarization
curl -X POST "https://yourvoic.com/api/v1/stt/transcribe" \
-H "X-API-Key: your-api-key" \
-F "file=@meeting.mp3" \
-F "model=lucid-mono" \
-F "diarize=true" \
-F "punctuate=true" \
-F "smart_format=true"
⚡ Real-time WebSocket #
Connect directly to our WebSocket endpoint for live transcription.
WebSocket Connection
| Setting | Value |
|---|---|
| Endpoint | wss://yourvoic.com:8443/api/v1/stt/realtime/stream |
| Authentication | Header: X-API-Key: your-api-key |
| Audio Format | PCM 16-bit, 16kHz, mono, little-endian |
| Models | lucid-mono, lucid-multi, lucid-agent, lucid-lite |
Query Parameters
| Parameter | Default | Description |
|---|---|---|
| model | lucid-mono | STT model to use |
| language | en | Language code |
| sample_rate | 16000 | Audio sample rate in Hz |
| encoding | linear16 | Audio encoding |
| punctuate | true | Add punctuation |
| interim_results | true | Send partial results |
const API_KEY = 'your-api-key';
const WS_URL = 'wss://yourvoic.com:8443/api/v1/stt/realtime/stream';

async function startRealtimeTranscription() {
  // Get microphone access
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const source = audioContext.createMediaStreamSource(stream);
  // ScriptProcessorNode is deprecated but still widely supported;
  // AudioWorklet is the modern alternative
  const processor = audioContext.createScriptProcessor(4096, 1, 1);

  // Connect to the WebSocket (browsers cannot set custom headers,
  // so the API key is sent as the first message instead)
  const ws = new WebSocket(`${WS_URL}?model=lucid-mono&language=en`);

  ws.onopen = () => {
    console.log('Connected to YourVoic STT');
    // Send the API key as the first message
    ws.send(JSON.stringify({ type: 'auth', api_key: API_KEY }));
  };

  ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === 'transcript') {
      console.log(data.is_final ? '[FINAL]' : '[...]', data.text);
    }
  };

  // Convert float samples to 16-bit PCM and stream them over the socket
  processor.onaudioprocess = (e) => {
    const inputData = e.inputBuffer.getChannelData(0);
    const pcm16 = new Int16Array(inputData.length);
    for (let i = 0; i < inputData.length; i++) {
      pcm16[i] = Math.max(-32768, Math.min(32767, inputData[i] * 32768));
    }
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(pcm16.buffer);
    }
  };

  source.connect(processor);
  processor.connect(audioContext.destination);
}

startRealtimeTranscription();
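For server-side use, where custom headers are available, here is a minimal Python client sketch that streams a raw PCM file (16-bit, 16 kHz, mono, matching the table above) in roughly 100 ms chunks. The chunk size and pacing are assumptions for illustration, not documented requirements:

import asyncio
import json
import websockets

API_KEY = "your-api-key"
URL = "wss://yourvoic.com:8443/api/v1/stt/realtime/stream?model=lucid-mono&language=en"

async def stream_file(path: str):
    headers = {"X-API-Key": API_KEY}
    async with websockets.connect(URL, extra_headers=headers) as ws:
        async def sender():
            with open(path, "rb") as f:
                # 3200 bytes = 100 ms of 16-bit, 16 kHz, mono PCM
                while chunk := f.read(3200):
                    await ws.send(chunk)
                    await asyncio.sleep(0.1)  # pace roughly like real time

        async def receiver():
            # Ends when the server closes the connection
            async for message in ws:
                data = json.loads(message)
                if data.get("type") == "transcript":
                    tag = "[FINAL]" if data.get("is_final") else "[...]"
                    print(tag, data.get("text", ""))

        await asyncio.gather(sender(), receiver())

asyncio.run(stream_file("audio.raw"))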
🔧 Troubleshooting #
Authentication Errors
- 401 Unauthorized: Check that your API key is correct and that your account has STT credits
- 403 Forbidden: Verify your plan includes STT access
Audio Format Issues
- No transcription: Ensure the audio is not silent or corrupted
- Poor accuracy: Check that the sample rate matches what the API expects
- WebSocket disconnects: Verify the audio format is PCM 16-bit, 16kHz, mono (see the conversion sketch below)
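A quick way to normalize arbitrary audio to the expected streaming format, sketched with pydub (which requires ffmpeg on the system path):

from pydub import AudioSegment

# Convert any input file to 16-bit, 16 kHz, mono raw PCM for the WebSocket
audio = (
    AudioSegment.from_file("input.m4a")
    .set_frame_rate(16000)
    .set_channels(1)
    .set_sample_width(2)  # 2 bytes = 16-bit samples
)
audio.export("audio.raw", format="raw")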
Rate Limits
- 429 Too Many Requests: Implement exponential backoff (see the sketch after this list)
- Concurrent connections: Check your plan's WebSocket limits
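A minimal retry helper with exponential backoff for 429 responses; the retry count and delays are illustrative defaults, not documented limits:

import time
import requests

def transcribe_with_backoff(path: str, api_key: str, max_retries: int = 5):
    """POST a file for transcription, backing off on 429 responses."""
    delay = 1.0
    for attempt in range(max_retries):
        with open(path, "rb") as f:
            response = requests.post(
                "https://yourvoic.com/api/v1/stt/transcribe",
                headers={"X-API-Key": api_key},
                files={"file": f},
                data={"model": "cipher-fast"},
            )
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor Retry-After if the server sends it, otherwise double the delay
        wait = float(response.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError("Rate limited after retries")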