Real-time PII/PHI redaction over WebSocket. Streams audio to the server and receives, per utterance, a redacted transcript and a redacted MP3 clip with the PII/PHI ranges silenced.
Endpoint
wss://platform.modulate.ai/api/velma-2-pii-phi-redaction-streaming
Authentication
Pass your API key as a query parameter when opening the connection.
wss://platform.modulate.ai/api/velma-2-pii-phi-redaction-streaming?api_key=YOUR_API_KEY
Unlike the batch endpoints, the streaming API does not use an X-API-Key header. The key must be in the query string at connection time.
See Authentication and rate limits for how to obtain and manage API keys.
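As a minimal sketch of the rule above, the connection URL can be assembled with the standard library so that the key and any other parameters are percent-encoded. The helper name `build_stream_url` is hypothetical, and sending booleans as lowercase `true`/`false` is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "wss://platform.modulate.ai/api/velma-2-pii-phi-redaction-streaming"


def build_stream_url(api_key: str, **params) -> str:
    """Return the WebSocket URL with api_key and optional parameters in the query string."""
    query = {"api_key": api_key}
    for name, value in params.items():
        if value is None:
            continue  # omitted parameters fall back to the server defaults
        # Assumption: booleans are sent as lowercase "true"/"false".
        query[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{BASE_URL}?{urlencode(query)}"


print(build_stream_url("YOUR_API_KEY", speaker_diarization=False))
```

Because the key travels in the query string, avoid writing the full URL to logs.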
Self-describing formats (auto-detected from file headers — no extra parameters needed):
AAC, AIFF, FLAC, MP3, OGG, WAV, WebM
OGG / Opus: OGG is a container that may carry Opus-encoded audio. Pass audio_format=ogg, not audio_format=opus.
Raw / headerless formats (require audio_format, sample_rate, and num_channels):
s8, s16le, s16be, s24le, s24be, s32le, s32be, u8, u16le, u16be, u24le, u24be, u32le, u32be, f32le, f32be, f64le, f64be, mulaw, alaw
Valid sample rates: 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000
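The raw-format requirements can be checked on the client before connecting. The sketch below is illustrative only: the helper name `raw_audio_params` is hypothetical, the format and sample-rate sets are copied from the lists above, and the 1 to 8 channel range comes from the query parameter reference. The server remains the authority on what it accepts.

```python
RAW_FORMATS = {
    "s8", "s16le", "s16be", "s24le", "s24be", "s32le", "s32be",
    "u8", "u16le", "u16be", "u24le", "u24be", "u32le", "u32be",
    "f32le", "f32be", "f64le", "f64be", "mulaw", "alaw",
}
VALID_SAMPLE_RATES = {8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000}


def raw_audio_params(audio_format: str, sample_rate: int, num_channels: int) -> dict:
    """Validate raw-format settings locally and return them as query parameters."""
    if audio_format not in RAW_FORMATS:
        raise ValueError(f"not a raw format: {audio_format!r}")
    if sample_rate not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if not 1 <= num_channels <= 8:
        raise ValueError(f"num_channels must be between 1 and 8, got {num_channels}")
    return {
        "audio_format": audio_format,
        "sample_rate": str(sample_rate),
        "num_channels": str(num_channels),
    }


print(raw_audio_params("s16le", 16000, 1))
```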
Query parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | string | — | Required. Your API key |
| speaker_diarization | boolean | true | Identify and label distinct speakers |
| audio_format | string | (auto-detect) | Audio encoding format. Omit for self-describing formats; required for raw formats |
| sample_rate | integer | — | Sample rate in Hz. Required for raw formats only |
| num_channels | integer | — | Number of channels (1–8). Required for raw formats only |
| start_redaction_padding_ms | integer | 100 | Extra silence (ms) prepended before each redacted audio range |
| end_redaction_padding_ms | integer | 0 | Extra silence (ms) appended after each redacted audio range |
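To make the two padding parameters concrete, the following sketch computes the window that would be silenced for a single detected range. It is an illustration under stated assumptions, not the server's implementation: the function name `padded_silence_range` is hypothetical, and it assumes that padding widens the range outward and that the window cannot start before 0 ms.

```python
def padded_silence_range(start_ms: int, end_ms: int,
                         start_padding_ms: int = 100,
                         end_padding_ms: int = 0) -> tuple:
    """Return the (start, end) window silenced for one detected PII/PHI range."""
    # Assumption: padding extends the range outward, clamped at the stream start.
    return max(0, start_ms - start_padding_ms), end_ms + end_padding_ms


# A name spoken from 1200 ms to 1800 ms, with the default padding values.
print(padded_silence_range(1200, 1800))
# The same range with 150 ms of trailing padding.
print(padded_silence_range(1200, 1800, end_padding_ms=150))
```

Larger padding values trade a little extra silenced speech for a lower chance that the edges of a sensitive word remain audible.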
Connection flow
- Connect to the WebSocket endpoint with api_key and any optional parameters.
- Stream audio data as binary WebSocket frames. Frames can be any size.
- Receive frame pairs per utterance: a JSON text frame, optionally followed by a binary MP3 frame.
- Send an empty text frame ("") to signal end of audio.
- Receive a done JSON frame, optionally followed by a final binary MP3 frame for any trailing audio.
- The connection closes automatically.
Server messages
The server sends frame pairs: a JSON text frame describing the event, optionally followed by a binary MP3 frame. The redacted_audio field in the JSON (trailing_redacted_audio in the done message) tells you whether a binary frame follows.
utterance (JSON + optional binary MP3)
Sent when a speech segment has been transcribed and redacted.
JSON frame:
{
"type": "utterance",
"utterance": {
"utterance_uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"text": "Hello, my name is <pii:name></pii:name>.",
"start_ms": 0,
"duration_ms": 3000,
"speaker": 1,
"language": "en"
},
"redacted_audio": {
"start_ms": 0,
"duration_ms": 3000
}
}
When redacted_audio is not null, a binary MP3 frame follows immediately. It covers the window from the last emitted audio point to the end of this utterance, with PII/PHI ranges silenced.
When redacted_audio is null, no binary frame follows. This occurs for out-of-order utterances whose audio window was already emitted in a previous clip — the redacted text is still delivered.
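The pairing rule can be sketched without a network connection. The generator below takes frames in arrival order (text frames as `str`, binary frames as `bytes`) and attaches a clip to a message only when the JSON announces one. The name `pair_frames` is hypothetical, and the sample frames are made up for illustration.

```python
import json


def pair_frames(frames):
    """Yield (message, clip) tuples; clip is None when no binary frame is announced."""
    frames = iter(frames)
    for frame in frames:
        if not isinstance(frame, str):
            raise ValueError("binary frame arrived without a preceding JSON frame")
        message = json.loads(frame)
        # utterance messages use redacted_audio; done uses trailing_redacted_audio.
        info = message.get("redacted_audio") or message.get("trailing_redacted_audio")
        clip = None
        if info is not None:
            clip = next(frames, None)  # the MP3 frame follows immediately
            if not isinstance(clip, bytes):
                raise ValueError("expected a binary MP3 frame after the JSON frame")
        yield message, clip


received = [
    '{"type": "utterance", "utterance": {"text": "Hi"},'
    ' "redacted_audio": {"start_ms": 0, "duration_ms": 3000}}',
    b"<mp3 bytes>",
    '{"type": "utterance", "utterance": {"text": "Late"}, "redacted_audio": null}',
]
for message, clip in pair_frames(received):
    print(message["utterance"]["text"], "with clip" if clip else "without clip")
```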
Utterance fields
| Field | Type | Description |
|---|---|---|
| utterance_uuid | string (UUID) | Unique identifier for this utterance |
| text | string | Redacted text. Each detected PII/PHI span is replaced with an empty marker tag (e.g. <pii:name></pii:name>, <pii:ssn></pii:ssn>, <phi></phi>), with the surrounding text preserved |
| start_ms | integer | Start time in milliseconds from the beginning of the stream |
| duration_ms | integer | Duration of the utterance in milliseconds |
| speaker | integer | Speaker number, 1-indexed. Consistent within a connection |
| language | string | Detected language code (e.g. "en", "fr") |
Redacted audio info fields
| Field | Type | Description |
|---|---|---|
| start_ms | integer | Start time of the MP3 clip in milliseconds from the beginning of the stream |
| duration_ms | integer | Duration of the MP3 clip in milliseconds |
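One way to work with these fields is to load each utterance frame into small typed records. This is a sketch rather than an official client: the class names `Utterance` and `ClipInfo` and the function `parse_utterance_frame` are hypothetical, and the parser reads only the fields documented above so that any additional fields would be ignored.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class Utterance:
    utterance_uuid: str
    text: str
    start_ms: int
    duration_ms: int
    speaker: int
    language: str

    @property
    def end_ms(self) -> int:
        """End time in milliseconds from the beginning of the stream."""
        return self.start_ms + self.duration_ms


@dataclass
class ClipInfo:
    start_ms: int
    duration_ms: int


def _pick(cls, obj: dict):
    # Copy only the documented fields; a missing field raises KeyError.
    return cls(**{f.name: obj[f.name] for f in fields(cls)})


def parse_utterance_frame(raw: str) -> Tuple[Utterance, Optional[ClipInfo]]:
    """Parse an utterance JSON frame into (Utterance, ClipInfo or None)."""
    data = json.loads(raw)
    if data.get("type") != "utterance":
        raise ValueError(f"not an utterance frame: {data.get('type')!r}")
    info = data.get("redacted_audio")
    return _pick(Utterance, data["utterance"]), (_pick(ClipInfo, info) if info else None)


frame = json.dumps({
    "type": "utterance",
    "utterance": {
        "utterance_uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "text": "Hello, my name is <pii:name></pii:name>.",
        "start_ms": 0, "duration_ms": 3000, "speaker": 1, "language": "en",
    },
    "redacted_audio": {"start_ms": 0, "duration_ms": 3000},
})
utterance, clip = parse_utterance_frame(frame)
print(utterance.speaker, utterance.end_ms, clip)
```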
done (JSON + optional binary MP3)
Sent after all audio has been processed, in response to the end-of-stream signal.
{
"type": "done",
"duration_ms": 45000,
"trailing_redacted_audio": {
"start_ms": 43000,
"duration_ms": 2000
}
}
When trailing_redacted_audio is not null, a binary MP3 frame follows containing any remaining audio after the last utterance, with any applicable PII/PHI ranges silenced.
When trailing_redacted_audio is null, no binary frame follows.
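If you reassemble the full recording from the clips, it can help to confirm that the announced windows cover the whole stream. The check below is a sketch that assumes the clips arrive in time order and together span 0 to the done message's duration_ms. The text above suggests this but does not guarantee it, and the name `find_gaps` is hypothetical.

```python
def find_gaps(clip_infos, total_duration_ms):
    """Return (start_ms, end_ms) ranges not covered by the announced clips."""
    gaps = []
    cursor = 0  # end of the audio covered so far
    for info in clip_infos:
        if info["start_ms"] > cursor:
            gaps.append((cursor, info["start_ms"]))
        cursor = max(cursor, info["start_ms"] + info["duration_ms"])
    if cursor < total_duration_ms:
        gaps.append((cursor, total_duration_ms))
    return gaps


# redacted_audio objects from utterance frames, then trailing_redacted_audio.
clips = [
    {"start_ms": 0, "duration_ms": 3000},
    {"start_ms": 3000, "duration_ms": 40000},
    {"start_ms": 43000, "duration_ms": 2000},
]
print(find_gaps(clips, total_duration_ms=45000))
```

An empty list means the clips account for the full duration reported by the done message.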
error
Sent if redaction fails during processing. The connection closes after this message. No binary frame follows.
{
"type": "error",
"error": "Internal server error"
}
WebSocket close codes
| Code | Meaning |
|---|---|
| 1000 | Normal closure after a successful done message |
| 1003 | Invalid connection parameters (unsupported audio_format, sample_rate, or num_channels; raw format missing sample_rate/num_channels) |
| 4003 | Request could not be validated, or is not permitted (auth failure, missing model access) |
| 4029 | Insufficient credits, or concurrent-connection limit exceeded |
An error JSON message is sent before the connection closes (except on 1000).
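A client can translate the close code into a readable message when the connection ends. The lookup below is a small sketch: the descriptions are taken from the table above, and the names `CLOSE_CODES`, `describe_close`, and `closed_cleanly` are hypothetical. With aiohttp, the code is available as `ws.close_code` once the connection has closed.

```python
CLOSE_CODES = {
    1000: "Normal closure after a successful done message",
    1003: "Invalid connection parameters",
    4003: "Request could not be validated, or is not permitted",
    4029: "Insufficient credits, or concurrent-connection limit exceeded",
}


def describe_close(code) -> str:
    """Return a readable description for a WebSocket close code."""
    return CLOSE_CODES.get(code, f"Unrecognized close code: {code}")


def closed_cleanly(code) -> bool:
    """True only for the normal closure that follows a done message."""
    return code == 1000


for code in (1000, 4029, 1006):
    print(code, closed_cleanly(code), describe_close(code))
```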
Rate limits
- Concurrent connection limits apply per organization.
- Monthly usage limits (in audio hours) apply per organization.
- Connections that exceed limits are rejected during the WebSocket handshake with close code 4029.
See Authentication and rate limits for retry guidance.
Each detected PII/PHI span is replaced with an empty marker tag in the transcript text: <phi></phi> for health information and <pii:CATEGORY></pii:CATEGORY> for personal information, where CATEGORY identifies the detected entity type. The surrounding text is preserved. For more detail, see the PII/PHI Redaction (Batch) API reference.
Currently, all entity types the model can detect are redacted. Per-entity configurability is planned for a future release.
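The marker tags are regular enough to be located with a regular expression, for example to count redactions or to swap in a visible placeholder. The sketch below assumes that CATEGORY never contains angle brackets. The names `MARKER`, `find_markers`, and `mask` are hypothetical.

```python
import re

# Matches <phi></phi> and <pii:CATEGORY></pii:CATEGORY>.
# Assumption: CATEGORY contains no "<" or ">" characters.
MARKER = re.compile(r"<(phi|pii:[^<>]+)></\1>")


def find_markers(text: str) -> list:
    """Return the marker kinds found in a redacted transcript, in order."""
    return [match.group(1) for match in MARKER.finditer(text)]


def mask(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every empty marker tag with a visible placeholder."""
    return MARKER.sub(placeholder, text)


sample = "Hello, my name is <pii:name></pii:name>."
print(find_markers(sample))
print(mask(sample))
```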
Examples
Python (aiohttp)
import asyncio
import json

import aiohttp

API_KEY = "YOUR_API_KEY"
AUDIO_FILE = "recording.ogg"
CHUNK_SIZE = 8192


async def redact_streaming():
    url = (
        "wss://platform.modulate.ai/api/velma-2-pii-phi-redaction-streaming"
        f"?api_key={API_KEY}"
        "&speaker_diarization=true"
        "&start_redaction_padding_ms=100"
        "&end_redaction_padding_ms=0"
    )
    utterances = []
    audio_clips = []
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(url) as ws:

            async def send_audio():
                with open(AUDIO_FILE, "rb") as f:
                    while chunk := f.read(CHUNK_SIZE):
                        await ws.send_bytes(chunk)
                        # Throttle sends so the file is not uploaded all at once.
                        await asyncio.sleep(CHUNK_SIZE / 4000)
                # An empty text frame signals end of audio.
                await ws.send_str("")

            # Send audio in the background while receiving results.
            send_task = asyncio.create_task(send_audio())
            try:
                done = False
                async for msg in ws:
                    if msg.type == aiohttp.WSMsgType.TEXT:
                        data = json.loads(msg.data)
                        if data["type"] == "utterance":
                            u = data["utterance"]
                            utterances.append(u)
                            print(f"[Speaker {u['speaker']}] ({u['language']}) {u['start_ms']}ms: {u['text']}")
                        elif data["type"] == "done":
                            print(f"\nDone. Duration: {data['duration_ms']}ms")
                            done = True
                            # Keep reading only if a trailing MP3 frame is announced.
                            if not data.get("trailing_redacted_audio"):
                                break
                        elif data["type"] == "error":
                            print(f"Error: {data['error']}")
                            break
                    elif msg.type == aiohttp.WSMsgType.BINARY:
                        audio_clips.append(msg.data)
                        if done:
                            break
                    elif msg.type in (
                        aiohttp.WSMsgType.ERROR,
                        aiohttp.WSMsgType.CLOSE,
                        aiohttp.WSMsgType.CLOSED,
                    ):
                        break
            finally:
                if not send_task.done():
                    send_task.cancel()
    # Write the redacted clips, in arrival order, to a single MP3 file.
    if audio_clips:
        with open("redacted.mp3", "wb") as f:
            for clip in audio_clips:
                f.write(clip)


asyncio.run(redact_streaming())
JavaScript (Node.js)
const WebSocket = require("ws");
const fs = require("fs");

const API_KEY = "YOUR_API_KEY";
const AUDIO_FILE = "recording.ogg";
const CHUNK_SIZE = 8192;

const url = new URL(
  "wss://platform.modulate.ai/api/velma-2-pii-phi-redaction-streaming"
);
url.searchParams.set("api_key", API_KEY);
url.searchParams.set("speaker_diarization", "true");
url.searchParams.set("start_redaction_padding_ms", "100");
url.searchParams.set("end_redaction_padding_ms", "0");

const ws = new WebSocket(url.toString());
const utterances = [];
const audioClips = [];
let isDone = false;

ws.on("open", () => {
  const stream = fs.createReadStream(AUDIO_FILE, { highWaterMark: CHUNK_SIZE });
  stream.on("data", (chunk) => ws.send(chunk));
  // An empty text frame signals end of audio.
  stream.on("end", () => ws.send(""));
});

ws.on("message", (data, isBinary) => {
  if (isBinary) {
    audioClips.push(data);
    // A binary frame received after done is the trailing clip.
    if (isDone) finalize();
    return;
  }
  const msg = JSON.parse(data.toString());
  if (msg.type === "utterance") {
    utterances.push(msg.utterance);
    console.log(
      `[Speaker ${msg.utterance.speaker}] (${msg.utterance.language}) ` +
        `${msg.utterance.start_ms}ms: ${msg.utterance.text}`
    );
  } else if (msg.type === "done") {
    console.log(`\nDone. Duration: ${msg.duration_ms}ms`);
    isDone = true;
    // Finish now unless a trailing MP3 frame is announced.
    if (!msg.trailing_redacted_audio) finalize();
  } else if (msg.type === "error") {
    console.error("Error:", msg.error);
    ws.close();
  }
});

// Write the redacted clips, in arrival order, to a single MP3 file.
function finalize() {
  if (audioClips.length > 0) {
    const combined = Buffer.concat(audioClips);
    fs.writeFileSync("redacted.mp3", combined);
  }
  ws.close();
}

ws.on("error", (err) => console.error("WebSocket error:", err.message));
WebSocket APIs cannot be tested with cURL. For command-line testing, use websocat.