Real-time speech-to-text over WebSocket. Streams audio to the server and receives transcribed utterances as they are processed, with optional speaker diarization, emotion detection, accent detection, and PII/PHI tagging.
Endpoint
```
wss://modulate-developer-apis.com/api/velma-2-stt-streaming
```
Authentication
Pass your API key as a query parameter when opening the connection.
```
wss://modulate-developer-apis.com/api/velma-2-stt-streaming?api_key=YOUR_API_KEY
```
Unlike the batch endpoints, the streaming API does not use an X-API-Key header. The key must be in the query string at connection time.
See Authentication and rate limits for how to obtain and manage API keys.
Supported audio formats
AAC, AIFF, FLAC, MP3, MP4, MOV, OGG, Opus, WAV, WebM
Opus is recommended for optimal quality and bandwidth efficiency.
Query parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | string | — | Required. Your API key |
| speaker_diarization | boolean | true | Identify and label distinct speakers |
| emotion_signal | boolean | false | Detect emotional tone per utterance |
| accent_signal | boolean | false | Detect speaker accent per utterance |
| pii_phi_tagging | boolean | false | Wrap PII/PHI in tags within utterance text |
For a full explanation of what each feature does and when to enable it, see STT enrichment features.
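As a sketch, a connection URL with these parameters can be assembled from the table above. The `build_stream_url` helper below is illustrative only, not part of any SDK; the endpoint and parameter names come from this page.

```python
from urllib.parse import urlencode

def build_stream_url(api_key, **features):
    """Build the streaming URL; boolean feature flags become lowercase strings."""
    base = "wss://modulate-developer-apis.com/api/velma-2-stt-streaming"
    params = {"api_key": api_key}
    params.update({k: str(v).lower() for k, v in features.items()})
    return f"{base}?{urlencode(params)}"

url = build_stream_url("YOUR_API_KEY", speaker_diarization=True, emotion_signal=False)
print(url)
```

Using `urlencode` avoids manual string concatenation and keeps the key properly escaped if it ever contains reserved characters.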
Connection flow
- Connect to the WebSocket endpoint with api_key and any optional feature parameters.
- Stream raw audio as binary WebSocket frames. Frames can be any size.
- Receive utterance JSON messages as speech is transcribed.
- Send an empty text frame ("") to signal end of audio.
- Receive a done message containing total audio duration.
- The connection closes automatically.
Server messages
utterance
Sent each time a speech segment is transcribed.
```json
{
  "type": "utterance",
  "utterance": {
    "utterance_uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "text": "Hello, how are you today?",
    "start_ms": 0,
    "duration_ms": 2500,
    "speaker": 1,
    "language": "en",
    "emotion": "Neutral",
    "accent": "American"
  }
}
```
Utterance fields
| Field | Type | Description |
|---|---|---|
| utterance_uuid | string (UUID) | Unique identifier for this utterance |
| text | string | Transcribed text |
| start_ms | integer | Start time in milliseconds from the beginning of the stream |
| duration_ms | integer | Duration of the utterance in milliseconds |
| speaker | integer | Speaker number, 1-indexed |
| language | string | Detected language code (e.g. "en", "fr") |
| emotion | string \| null | Detected emotion. null when emotion_signal is disabled |
| accent | string \| null | Detected accent. null when accent_signal is disabled |
For all valid emotion and accent values, see STT enrichment features.
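As a small sketch of consuming these fields, the snippet below parses the sample utterance message shown above and derives the utterance's end time (start_ms + duration_ms); the formatting of the printed line is our own choice, not anything the API prescribes.

```python
import json

# Sample payload mirroring the utterance message documented above.
message = json.loads("""
{
  "type": "utterance",
  "utterance": {
    "utterance_uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "text": "Hello, how are you today?",
    "start_ms": 0,
    "duration_ms": 2500,
    "speaker": 1,
    "language": "en",
    "emotion": "Neutral",
    "accent": "American"
  }
}
""")

u = message["utterance"]
end_ms = u["start_ms"] + u["duration_ms"]  # end of utterance, relative to stream start
line = f"[{u['start_ms']}-{end_ms}ms] Speaker {u['speaker']}: {u['text']}"
print(line)
```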
done
Sent after all audio has been processed, in response to the end-of-stream signal.
```json
{
  "type": "done",
  "duration_ms": 45000
}
```
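Since monthly limits are measured in audio hours (see Rate limits below), a client may want to convert the reported duration_ms into hours for its own usage tracking; a minimal sketch:

```python
# done message as documented above
done = {"type": "done", "duration_ms": 45000}

seconds = done["duration_ms"] / 1000
hours = seconds / 3600  # monthly usage limits are expressed in audio hours
print(f"{seconds:.0f}s of audio ({hours:.4f} audio hours)")
```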
error
Sent if transcription fails. The connection closes after this message.
```json
{
  "type": "error",
  "error": "Internal server error"
}
```
WebSocket close codes
| Code | Meaning |
|---|---|
| 4001 | Invalid API key |
| 4003 | Model access not enabled for your organization |
| 4029 | Rate limit exceeded (monthly usage or concurrent connections) |
Rate limits
- Concurrent connection limits apply per organization.
- Monthly usage limits (in audio hours) apply per organization.
- Connections that exceed limits are rejected during the WebSocket handshake with close code 4029.
See Authentication and rate limits for retry guidance.
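One reasonable retry strategy, sketched under assumptions: retry only on 4029 (rate limiting is transient), never on 4001 or 4003 (those will not succeed on retry), with exponential backoff plus jitter. `connect_once` here is a hypothetical coroutine that runs one streaming session and returns the WebSocket close code; wire it to whichever client library you use.

```python
import asyncio
import random

RETRYABLE_CLOSE_CODES = {4029}  # rate limited; 4001/4003 will not succeed on retry

async def stream_with_retry(connect_once, max_attempts=5):
    """Run streaming sessions until one ends with a non-retryable close code."""
    for attempt in range(max_attempts):
        code = await connect_once()
        if code not in RETRYABLE_CLOSE_CODES:
            return code
        # Exponential backoff capped at 30s, with up to 1s of jitter.
        delay = min(2 ** attempt, 30) + random.random()
        await asyncio.sleep(delay)
    return code
```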
Examples
Python (aiohttp)
```python
import asyncio
import json

import aiohttp

API_KEY = "YOUR_API_KEY"
AUDIO_FILE = "recording.opus"
CHUNK_SIZE = 8192


async def transcribe_streaming():
    url = (
        "wss://modulate-developer-apis.com/api/velma-2-stt-streaming"
        f"?api_key={API_KEY}"
        "&speaker_diarization=true"
        "&emotion_signal=true"
        "&accent_signal=true"
    )
    utterances = []
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(url) as ws:

            async def send_audio():
                with open(AUDIO_FILE, "rb") as f:
                    while chunk := f.read(CHUNK_SIZE):
                        await ws.send_bytes(chunk)
                        # Pace the upload so audio arrives at a steady rate
                        await asyncio.sleep(CHUNK_SIZE / 4000)
                # Empty text frame signals end of audio
                await ws.send_str("")

            send_task = asyncio.create_task(send_audio())
            try:
                async for msg in ws:
                    if msg.type == aiohttp.WSMsgType.TEXT:
                        data = json.loads(msg.data)
                        if data["type"] == "utterance":
                            u = data["utterance"]
                            utterances.append(u)
                            print(f"[Speaker {u['speaker']}] {u['text']}")
                        elif data["type"] == "done":
                            print(f"Done. Duration: {data['duration_ms']}ms")
                            break
                        elif data["type"] == "error":
                            print(f"Error: {data['error']}")
                            break
                    elif msg.type in (
                        aiohttp.WSMsgType.ERROR,
                        aiohttp.WSMsgType.CLOSE,
                        aiohttp.WSMsgType.CLOSED,
                    ):
                        break
            finally:
                if not send_task.done():
                    send_task.cancel()

    full_text = " ".join(u["text"] for u in utterances)
    print(f"\nFull transcript:\n{full_text}")


asyncio.run(transcribe_streaming())
```
JavaScript (Node.js)
```javascript
const WebSocket = require("ws");
const fs = require("fs");

const API_KEY = "YOUR_API_KEY";
const AUDIO_FILE = "recording.opus";
const CHUNK_SIZE = 8192;

const url = new URL("wss://modulate-developer-apis.com/api/velma-2-stt-streaming");
url.searchParams.set("api_key", API_KEY);
url.searchParams.set("speaker_diarization", "true");
url.searchParams.set("emotion_signal", "true");

const ws = new WebSocket(url.toString());
const utterances = [];

ws.on("open", () => {
  // Stream the file as binary frames, then send an empty text frame
  // to signal end of audio.
  const stream = fs.createReadStream(AUDIO_FILE, { highWaterMark: CHUNK_SIZE });
  stream.on("data", (chunk) => ws.send(chunk));
  stream.on("end", () => ws.send(""));
});

ws.on("message", (data) => {
  const msg = JSON.parse(data.toString());
  if (msg.type === "utterance") {
    utterances.push(msg.utterance);
    console.log(`[Speaker ${msg.utterance.speaker}] ${msg.utterance.text}`);
  } else if (msg.type === "done") {
    console.log(`Done. Duration: ${msg.duration_ms}ms`);
    console.log("Transcript:", utterances.map((u) => u.text).join(" "));
    ws.close();
  } else if (msg.type === "error") {
    console.error("Error:", msg.error);
    ws.close();
  }
});

ws.on("error", (err) => console.error("WebSocket error:", err.message));
```
WebSocket APIs cannot be tested with cURL. For command-line testing, use websocat.
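For a quick connectivity check from the command line, something like the following should open the connection and stream a local file as binary frames (assumes websocat is installed and API_KEY is set in your environment). Note that websocat will not send the empty text frame that ends the stream, so treat this as a smoke test rather than a full transcription run.

```shell
websocat --binary \
  "wss://modulate-developer-apis.com/api/velma-2-stt-streaming?api_key=$API_KEY" \
  < recording.opus
```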