🔌 STT Integrations

Integrate YourVoic Speech-to-Text into your applications, automation workflows, voice AI platforms, and telephony systems. Our API supports both batch transcription and real-time streaming.

Integration Overview #

YourVoic STT provides multiple integration options to fit your needs:

📡

REST API

Upload audio files for batch transcription with full feature support.

⚡

WebSocket

Real-time streaming transcription for live audio applications.

🔔

Webhooks

Receive notifications when transcriptions complete (a minimal receiver sketch follows this overview).

📞

Telephony

Integrate with Asterisk, FreeSWITCH, and SIP systems.
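
As a quick illustration of the webhook flow, here is a minimal Flask receiver. This is a sketch: the endpoint path is arbitrary, and the payload fields (text, status) are assumptions, so check your dashboard for the exact webhook schema.

Python - Webhook Receiver (sketch)
from flask import Flask, request

app = Flask(__name__)

@app.route("/stt-webhook", methods=["POST"])
def stt_webhook():
    # Field names below are illustrative, not a documented schema
    payload = request.get_json(force=True)
    if payload.get("status") == "completed":
        print("Transcription:", payload.get("text", ""))
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)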

🤖 Automation Platforms #

Integrate YourVoic STT into your automation workflows using popular no-code/low-code platforms.

n8n Workflow Automation

Use the HTTP Request node in n8n to transcribe audio files or process voice recordings automatically.

💡 Use Case: Automatically transcribe uploaded audio files, meeting recordings, or voicemails and send the text to Slack, email, or a database.

n8n HTTP Request Node - Batch Transcription
{
  "url": "https://yourvoic.com/api/v1/stt/transcribe",
  "method": "POST",
  "authentication": "predefinedCredentialType",
  "nodeCredentialType": "httpHeaderAuth",
  "sendHeaders": true,
  "headerParameters": {
    "parameters": [
      { "name": "X-API-Key", "value": "={{ $credentials.apiKey }}" }
    ]
  },
  "sendBody": true,
  "contentType": "multipart-form-data",
  "bodyParameters": {
    "parameters": [
      { "name": "file", "value": "={{ $binary.data }}" },
      { "name": "model", "value": "cipher-fast" },
      { "name": "response_format", "value": "json" }
    ]
  }
}

n8n Workflow: Voicemail to Slack

  1. Trigger: Webhook receives audio file from phone system
  2. HTTP Request: Send to YourVoic STT API
  3. Set Node: Extract transcription text (see the example mapping below)
  4. Slack Node: Post transcription to channel
  5. Optional: Store in Google Sheets or Notion
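
For step 3, a minimal Set node mapping might look like the following (shown for the classic Set node schema; the transcription field name is our choice, and $json.text assumes the JSON response shape used elsewhere on this page):

n8n Set Node - Extract Transcription (sketch)
{
  "values": {
    "string": [
      { "name": "transcription", "value": "={{ $json.text }}" }
    ]
  }
}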

Zapier Automation

Create Zaps to automatically transcribe audio files when they're uploaded or received.

  1. Choose your trigger (e.g., "New File in Google Drive")
  2. Add "Webhooks by Zapier" as an action
  3. Select "Custom Request"
  4. Method: POST
  5. URL: https://yourvoic.com/api/v1/stt/transcribe
  6. Add header: X-API-Key: your-api-key
  7. Upload the audio file as form-data

Make.com (Integromat)

Use the HTTP module to transcribe audio files in your Make.com scenarios.

Make.com HTTP Module Configuration
{
  "url": "https://yourvoic.com/api/v1/stt/transcribe",
  "method": "POST",
  "headers": [
    { "name": "X-API-Key", "value": "your-api-key" }
  ],
  "bodyType": "multipart/form-data",
  "fields": [
    { "key": "file", "type": "file", "value": "{{audio_file}}" },
    { "key": "model", "type": "text", "value": "lucid-mono" },
    { "key": "diarize", "type": "text", "value": "true" },
    { "key": "punctuate", "type": "text", "value": "true" }
  ]
}

🎙️ Voice AI Platforms #

Build real-time voice AI applications with streaming STT capabilities.

Pipecat Voice Agent Framework

Integrate YourVoic as a Speech-to-Text service in your Pipecat voice agent pipeline.

Python - Pipecat STT Service
import asyncio
import websockets
import json
from pipecat.services.stt import STTService


class YourVoicSTT(STTService):
    """YourVoic Real-time STT for Pipecat"""
    
    def __init__(self, api_key: str, model: str = "lucid-mono"):
        self.api_key = api_key
        self.model = model
        self.ws_url = f"wss://yourvoic.com:8443/api/v1/stt/realtime/stream?model={model}"
        self.ws = None
    
    async def connect(self):
        """Establish WebSocket connection"""
        headers = {"X-API-Key": self.api_key}
        self.ws = await websockets.connect(self.ws_url, extra_headers=headers)
    
    async def transcribe_stream(self, audio_chunk: bytes) -> str:
        """Send audio and receive transcription"""
        if self.ws:
            await self.ws.send(audio_chunk)
            response = await self.ws.recv()
            data = json.loads(response)
            
            if data.get("type") == "transcript":
                return data.get("text", "")
        return ""
    
    async def close(self):
        """Close WebSocket connection"""
        if self.ws:
            await self.ws.close()

# Usage in Pipecat pipeline
async def main():
    stt = YourVoicSTT(api_key="your-key", model="lucid-mono")
    await stt.connect()
    
    # Process audio frames from the microphone; microphone_stream() is a
    # placeholder (one possible implementation is sketched below)
    async for audio_frame in microphone_stream():
        text = await stt.transcribe_stream(audio_frame)
        if text:
            print(f"Transcribed: {text}")
    
    await stt.close()

asyncio.run(main())
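
The microphone_stream() helper above is a placeholder. One implementation sketch uses the sounddevice library (an assumption; any source of 16kHz, 16-bit mono PCM chunks will work):

Python - microphone_stream() Sketch
import asyncio
import sounddevice as sd

async def microphone_stream(sample_rate: int = 16000, block_ms: int = 100):
    """Async generator yielding raw PCM16 chunks from the default microphone."""
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def callback(indata, frames, time_info, status):
        # Runs on PortAudio's thread; hand the bytes over to the asyncio loop
        loop.call_soon_threadsafe(queue.put_nowait, bytes(indata))

    blocksize = int(sample_rate * block_ms / 1000)
    with sd.RawInputStream(samplerate=sample_rate, blocksize=blocksize,
                           channels=1, dtype="int16", callback=callback):
        while True:
            yield await queue.get()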

LiveKit Real-time Communication

Use YourVoic STT with LiveKit for real-time transcription in video/audio calls.

Python - LiveKit Agent with STT
import httpx
from livekit.agents import stt

class YourVoicSTTPlugin(stt.STT):
    """YourVoic STT Plugin for LiveKit Agents"""
    
    def __init__(self, api_key: str, model: str = "lucid-agent"):
        self.api_key = api_key
        self.model = model
    
    async def recognize(self, audio_buffer: stt.AudioBuffer) -> stt.SpeechEvent:
        """Transcribe audio buffer"""
        # Convert to WAV format
        audio_data = audio_buffer.to_wav()
        
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "https://yourvoic.com/api/v1/stt/transcribe",
                headers={"X-API-Key": self.api_key},
                files={"file": ("audio.wav", audio_data, "audio/wav")},
                data={
                    "model": self.model,
                    "punctuate": "true",
                    "smart_format": "true"
                }
            )
            
            result = response.json()
            return stt.SpeechEvent(
                text=result.get("text", ""),
                is_final=True,
                confidence=result.get("confidence", 0.95)
            )

# Usage
stt_plugin = YourVoicSTTPlugin(api_key="your-key")

Vocode Voice AI

Integrate YourVoic STT as a transcriber in Vocode voice applications.

Python - Vocode Transcriber
from vocode.streaming.transcriber import BaseTranscriber
from vocode.streaming.models.transcriber import TranscriberConfig
import websockets
import json

class YourVoicTranscriberConfig(TranscriberConfig):
    api_key: str
    model: str = "lucid-mono"
    language: str = "en"

class YourVoicTranscriber(BaseTranscriber):
    """YourVoic STT Transcriber for Vocode"""
    
    def __init__(self, config: YourVoicTranscriberConfig):
        super().__init__(config)
        self.config = config
        self.ws = None
    
    async def create_websocket(self):
        url = f"wss://yourvoic.com:8443/api/v1/stt/realtime/stream"
        url += f"?model={self.config.model}&language={self.config.language}"
        
        headers = {"X-API-Key": self.config.api_key}
        self.ws = await websockets.connect(url, extra_headers=headers)
    
    async def send_audio(self, audio: bytes):
        if self.ws:
            await self.ws.send(audio)
    
    async def receive_transcription(self):
        if self.ws:
            response = await self.ws.recv()
            data = json.loads(response)
            return data.get("text", "")
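
A minimal driving loop for this transcriber (a sketch; depending on your Vocode version, TranscriberConfig may require extra fields such as sampling rate and audio encoding):

Python - Vocode Transcriber Usage (sketch)
import asyncio

async def run_transcriber(audio_chunks):
    # audio_chunks: any async iterable of raw PCM frames (assumption)
    config = YourVoicTranscriberConfig(api_key="your-key")
    transcriber = YourVoicTranscriber(config)
    await transcriber.create_websocket()
    async for chunk in audio_chunks:
        await transcriber.send_audio(chunk)
        text = await transcriber.receive_transcription()
        if text:
            print(text)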

📞 Telephony Integration #

Integrate YourVoic STT into telephony systems for call transcription, IVR, and voice analytics.

Asterisk PBX Integration

Transcribe calls in real-time or process call recordings with YourVoic STT. Use AGI scripts for batch processing or ARI for real-time streaming.

Method 1: AGI Script for Call Recording Transcription

Python AGI - yourvoic_stt.py
#!/usr/bin/env python3
"""YourVoic STT AGI Script for Asterisk Call Transcription"""
import sys
import requests
import os

API_KEY = "your-api-key"
API_URL = "https://yourvoic.com/api/v1/stt/transcribe"

def agi_command(cmd):
    """Send AGI command and get response"""
    sys.stdout.write(f"{cmd}\n")
    sys.stdout.flush()
    return sys.stdin.readline().strip()

def transcribe_recording(audio_path, model="cipher-fast"):
    """Transcribe an audio file"""
    with open(audio_path, 'rb') as f:
        response = requests.post(
            API_URL,
            headers={"X-API-Key": API_KEY},
            files={"file": f},
            data={
                "model": model,
                "punctuate": "true",
                "diarize": "true"  # Identify caller vs agent
            }
        )
    
    if response.status_code == 200:
        return response.json()
    return None

def main():
    # Read AGI environment
    env = {}
    while True:
        line = sys.stdin.readline().strip()
        if not line:
            break
        if ':' in line:
            key, value = line.split(':', 1)
            env[key.strip()] = value.strip()
    
    # Get recording path from AGI argument
    recording_path = env.get('agi_arg_1', '')
    
    if recording_path and os.path.exists(recording_path):
        result = transcribe_recording(recording_path)
        
        if result:
            # Set channel variables with transcription
            text = result.get('text', '').replace('"', '\\"')
            agi_command(f'SET VARIABLE TRANSCRIPTION "{text[:1000]}"')
            agi_command(f'SET VARIABLE STT_DURATION "{result.get("duration", 0)}"')
            agi_command(f'SET VARIABLE STT_LANGUAGE "{result.get("language", "unknown")}"')
            agi_command('VERBOSE "Transcription completed" 1')
        else:
            agi_command('VERBOSE "Transcription failed" 1')
    else:
        agi_command('VERBOSE "Recording file not found" 1')

if __name__ == "__main__":
    main()

Dialplan: Post-Call Transcription

extensions.conf
[call-recording]
; Record and transcribe calls
exten => _X.,1,Answer()
same => n,Set(RECORDING=/var/spool/asterisk/recordings/${UNIQUEID}.wav)
same => n,MixMonitor(${RECORDING},b)
same => n,Dial(SIP/${EXTEN},60)
same => n,StopMixMonitor()
same => n,AGI(yourvoic_stt.py,${RECORDING})
same => n,NoOp(Transcription: ${TRANSCRIPTION})
same => n,Hangup()

[ivr-voice-input]
; Capture voice input and transcribe
exten => s,1,Answer()
same => n,Playback(please-say-your-name)
same => n,Set(VOICE_FILE=/tmp/voice_${UNIQUEID}.wav)
same => n,Record(${VOICE_FILE},3,30,q)
same => n,AGI(yourvoic_stt.py,${VOICE_FILE})
same => n,NoOp(User said: ${TRANSCRIPTION})
same => n,GotoIf($["${TRANSCRIPTION}" != ""]?process:retry)
same => n(process),AGI(process_name.py,${TRANSCRIPTION})
same => n,Hangup()
same => n(retry),Playback(sorry-try-again)
same => n,Goto(s,1)

Method 2: ARI Real-time Transcription

Python - ARI Real-time STT
import ari
import asyncio
import websockets
import json

# ARI connection
ari_client = ari.connect('http://localhost:8088', 'asterisk', 'asterisk')

YOURVOIC_API_KEY = "your-api-key"

class RealtimeTranscriber:
    def __init__(self, channel_id):
        self.channel_id = channel_id
        self.ws = None
        self.transcript = []
    
    async def connect(self):
        url = "wss://yourvoic.com:8443/api/v1/stt/realtime/stream"
        url += "?model=lucid-mono&language=en"
        headers = {"X-API-Key": YOURVOIC_API_KEY}
        self.ws = await websockets.connect(url, extra_headers=headers)
    
    async def send_audio(self, audio_data):
        if self.ws:
            await self.ws.send(audio_data)
    
    async def receive_loop(self):
        while self.ws:
            try:
                response = await self.ws.recv()
                data = json.loads(response)
                if data.get("type") == "transcript" and data.get("is_final"):
                    self.transcript.append(data.get("text", ""))
                    print(f"[{self.channel_id}] {data.get('text')}")
            except websockets.ConnectionClosed:
                break
    
    def get_full_transcript(self):
        return " ".join(self.transcript)

# Handle external media from Asterisk
def handle_external_media(channel, ev):
    """Process audio from Asterisk ExternalMedia"""
    # Note: asyncio.create_task() needs a running event loop;
    # call this handler from within your asyncio-driven ARI application.
    transcriber = RealtimeTranscriber(channel.id)
    asyncio.create_task(transcriber.connect())
    # Audio streaming logic here...

💡 Audio Format: For telephony, use an 8kHz sample rate (G.711) for standard calls or 16kHz for HD Voice. YourVoic STT handles format conversion automatically.
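
For example, opening a telephony-grade stream for a standard 8kHz mu-law call might look like this (same endpoint and query parameters as the RTP bridge example below):

Python - Telephony Stream Connection (sketch)
import websockets

async def connect_telephony_stream(api_key: str):
    # 8kHz mu-law (G.711) audio from a standard phone call
    url = ("wss://yourvoic.com:8443/api/v1/stt/realtime/stream"
           "?model=lucid-mono&sample_rate=8000&encoding=mulaw")
    return await websockets.connect(url, extra_headers={"X-API-Key": api_key})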

FreeSWITCH Call Transcription

Integrate YourVoic STT with FreeSWITCH for call transcription and voice analytics.

Lua Script - yourvoic_stt.lua
-- YourVoic STT for FreeSWITCH
local api_key = "your-api-key"
local api_url = "https://yourvoic.com/api/v1/stt/transcribe"

function transcribe_recording(session, recording_path)
    -- Use curl to send recording to YourVoic API
    local cmd = string.format(
        'curl -s -X POST "%s" ' ..
        '-H "X-API-Key: %s" ' ..
        '-F "file=@%s" ' ..
        '-F "model=cipher-fast" ' ..
        '-F "punctuate=true"',
        api_url, api_key, recording_path
    )
    
    local handle = io.popen(cmd)
    local response = handle:read("*a")
    handle:close()
    
    -- Parse JSON response
    local cjson = require("cjson")
    local result = cjson.decode(response)
    
    if result and result.text then
        -- Set channel variable
        session:setVariable("transcription", result.text)
        session:setVariable("stt_language", result.language or "unknown")
        return result.text
    end
    
    return nil
end

-- Entry point: FreeSWITCH provides the `session` object and script args in `argv`
local recording_path = argv[1] or session:getVariable("recording_path")
if recording_path then
    transcribe_recording(session, recording_path)
end

-- Usage in dialplan (XML), e.g.:
-- <action application="lua" data="yourvoic_stt.lua /tmp/recording.wav"/>

SIP/VoIP General Integration

For custom SIP/VoIP applications, capture RTP audio streams and send to YourVoic STT.

Python - RTP to STT Bridge
import asyncio
import websockets
import json
import socket

class RTPtoSTTBridge:
    """Bridge RTP audio stream to YourVoic STT WebSocket"""
    
    def __init__(self, api_key: str, rtp_port: int = 10000):
        self.api_key = api_key
        self.rtp_port = rtp_port
        self.ws = None
        self.running = False
    
    async def connect_stt(self):
        """Connect to YourVoic STT WebSocket"""
        url = "wss://yourvoic.com:8443/api/v1/stt/realtime/stream"
        url += "?model=lucid-mono&sample_rate=8000&encoding=mulaw"
        headers = {"X-API-Key": self.api_key}
        self.ws = await websockets.connect(url, extra_headers=headers)
    
    def extract_rtp_payload(self, packet: bytes) -> bytes:
        """Extract audio payload from an RTP packet (RFC 3550)"""
        if len(packet) < 12:
            return b""
        # Low nibble of the first byte is the CSRC count; the header
        # is 12 fixed bytes plus 4 bytes per CSRC entry
        csrc_count = packet[0] & 0x0F
        return packet[12 + 4 * csrc_count:]
    
    async def rtp_receiver(self):
        """Receive RTP packets and forward to STT"""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(('0.0.0.0', self.rtp_port))
        sock.setblocking(False)
        
        while self.running:
            try:
                # sock_recvfrom requires Python 3.11+
                data, addr = await asyncio.get_running_loop().sock_recvfrom(sock, 4096)
                audio = self.extract_rtp_payload(data)
                
                if self.ws and audio:
                    await self.ws.send(audio)
            except OSError:
                await asyncio.sleep(0.001)
    
    async def receive_transcripts(self):
        """Receive transcriptions from WebSocket"""
        while self.running and self.ws:
            try:
                response = await self.ws.recv()
                data = json.loads(response)
                
                if data.get("type") == "transcript":
                    is_final = data.get("is_final", False)
                    text = data.get("text", "")
                    print(f"{'[FINAL]' if is_final else '[...]'} {text}")
            except websockets.ConnectionClosed:
                break
    
    async def run(self):
        """Start the bridge"""
        await self.connect_stt()
        self.running = True
        
        await asyncio.gather(
            self.rtp_receiver(),
            self.receive_transcripts()
        )

# Usage
bridge = RTPtoSTTBridge(api_key="your-key", rtp_port=10000)
asyncio.run(bridge.run())

📦 SDKs & Libraries #

Use our official SDKs or community libraries for easier integration.

🐍

Python SDK

pip install yourvoic

from yourvoic import STT

stt = STT(api_key="your-key")
result = stt.transcribe("audio.mp3")
print(result.text)

🟨

Node.js SDK

npm install @yourvoic/sdk

import { STT } from '@yourvoic/sdk';

const stt = new STT('your-key');
const result = await stt.transcribe('audio.mp3');
console.log(result.text);

cURL Examples

Batch Transcription

cURL - Transcribe Audio File
curl -X POST "https://yourvoic.com/api/v1/stt/transcribe" \
  -H "X-API-Key: your-api-key" \
  -F "file=@audio.mp3" \
  -F "model=cipher-fast" \
  -F "response_format=json"

With Speaker Diarization

cURL - Diarization
curl -X POST "https://yourvoic.com/api/v1/stt/transcribe" \
  -H "X-API-Key: your-api-key" \
  -F "file=@meeting.mp3" \
  -F "model=lucid-mono" \
  -F "diarize=true" \
  -F "punctuate=true" \
  -F "smart_format=true"

⚡ Real-time WebSocket #

Connect directly to our WebSocket endpoint for live transcription.

WebSocket Connection

Endpoint:        wss://yourvoic.com:8443/api/v1/stt/realtime/stream
Authentication:  Header X-API-Key: your-api-key
Audio Format:    PCM 16-bit, 16kHz, mono, little-endian
Models:          lucid-mono, lucid-multi, lucid-agent, lucid-lite

Query Parameters

Parameter        Default      Description
model            lucid-mono   STT model to use
language         en           Language code
sample_rate      16000        Audio sample rate (Hz)
encoding         linear16     Audio encoding
punctuate        true         Add punctuation
interim_results  true         Send partial (interim) results

JavaScript - Browser WebSocket
const API_KEY = 'your-api-key';
const WS_URL = 'wss://yourvoic.com:8443/api/v1/stt/realtime/stream';

async function startRealtimeTranscription() {
    // Get microphone access
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const audioContext = new AudioContext({ sampleRate: 16000 });
    const source = audioContext.createMediaStreamSource(stream);
    const processor = audioContext.createScriptProcessor(4096, 1, 1);
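    // NOTE: createScriptProcessor is deprecated in favor of AudioWorklet,
    // but it remains widely supported and keeps this example short.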
    
    // Connect to WebSocket
    const ws = new WebSocket(`${WS_URL}?model=lucid-mono&language=en`);
    
    ws.onopen = () => {
        console.log('Connected to YourVoic STT');
        // Send API key as first message
        ws.send(JSON.stringify({ type: 'auth', api_key: API_KEY }));
    };
    
    ws.onmessage = (event) => {
        const data = JSON.parse(event.data);
        if (data.type === 'transcript') {
            console.log(data.is_final ? '[FINAL]' : '[...]', data.text);
        }
    };
    
    // Process audio and send to WebSocket
    processor.onaudioprocess = (e) => {
        const inputData = e.inputBuffer.getChannelData(0);
        const pcm16 = new Int16Array(inputData.length);
        
        for (let i = 0; i < inputData.length; i++) {
            pcm16[i] = Math.max(-32768, Math.min(32767, inputData[i] * 32768));
        }
        
        if (ws.readyState === WebSocket.OPEN) {
            ws.send(pcm16.buffer);
        }
    };
    
    source.connect(processor);
    processor.connect(audioContext.destination);
}

startRealtimeTranscription();

🔧 Troubleshooting #

⚠️ Common Issues

Authentication Errors

  • 401 Unauthorized: Check that your API key is correct and that your account has STT credits
  • 403 Forbidden: Verify your plan includes STT access

Audio Format Issues

  • No transcription: Ensure audio is not silent or corrupted
  • Poor accuracy: Check sample rate matches API expectations
  • WebSocket disconnect: Verify audio format is PCM 16-bit, 16kHz mono

Rate Limits

  • 429 Too Many Requests: Implement exponential backoff (a sketch follows below)
  • Concurrent connections: Check your plan's WebSocket limits
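
A minimal backoff sketch for batch requests (the Retry-After header is an assumption; adjust to whatever your 429 responses actually include):

Python - Exponential Backoff (sketch)
import random
import time
import requests

API_URL = "https://yourvoic.com/api/v1/stt/transcribe"

def transcribe_with_backoff(path: str, api_key: str, max_retries: int = 5):
    """Retry a batch transcription on 429 with exponential backoff and jitter."""
    for attempt in range(max_retries):
        with open(path, "rb") as f:
            response = requests.post(
                API_URL,
                headers={"X-API-Key": api_key},
                files={"file": f},
                data={"model": "cipher-fast"},
            )
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor Retry-After if present (assumption); otherwise back off exponentially
        delay = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay + random.uniform(0, 0.5))
    raise RuntimeError("Still rate-limited after retries")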