Real-time PII/PHI redaction over WebSocket. Streams audio to the server and receives, per utterance, a redacted transcript and a redacted MP3 clip with the PII/PHI ranges silenced.

Documentation Index
Fetch the complete documentation index at: https://docs.modulate.ai/llms.txt
Use this file to discover all available pages before exploring further.
Endpoint
Authentication
Pass your API key as a query parameter when opening the connection.

Supported audio formats
Self-describing formats (auto-detected from file headers — no extra parameters needed): AAC, AIFF, FLAC, MP3, OGG, WAV, WebM

OGG / Opus: OGG is a container that may carry Opus-encoded audio. Pass audio_format=ogg, not audio_format=opus.

Raw formats (require audio_format, sample_rate, and num_channels):
s8, s16le, s16be, s24le, s24be, s32le, s32be, u8, u16le, u16be, u24le, u24be, u32le, u32be, f32le, f32be, f64le, f64be, mulaw, alaw
Valid sample rates: 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000
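For raw formats, the data rate follows directly from the three required parameters, which is useful when sizing buffers or WebSocket frames. A minimal sketch (an illustrative helper, not part of the API):

```python
# Bytes per sample for each raw format the endpoint accepts.
BYTES_PER_SAMPLE = {
    "s8": 1, "u8": 1, "mulaw": 1, "alaw": 1,
    "s16le": 2, "s16be": 2, "u16le": 2, "u16be": 2,
    "s24le": 3, "s24be": 3, "u24le": 3, "u24be": 3,
    "s32le": 4, "s32be": 4, "u32le": 4, "u32be": 4,
    "f32le": 4, "f32be": 4, "f64le": 8, "f64be": 8,
}

def bytes_per_second(audio_format: str, sample_rate: int, num_channels: int) -> int:
    """Raw-PCM data rate: bytes per sample x sample rate x channel count."""
    return BYTES_PER_SAMPLE[audio_format] * sample_rate * num_channels
```

For example, s16le at 16000 Hz mono streams at 32000 bytes per second.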
Query parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | string | — | Required. Your API key |
| speaker_diarization | boolean | true | Identify and label distinct speakers |
| audio_format | string | (auto-detect) | Audio encoding format. Omit for self-describing formats; required for raw formats |
| sample_rate | integer | — | Sample rate in Hz. Required for raw formats only |
| num_channels | integer | — | Number of channels (1–8). Required for raw formats only |
| start_redaction_padding_ms | integer | 100 | Extra silence (ms) prepended before each redacted audio range |
| end_redaction_padding_ms | integer | 0 | Extra silence (ms) appended after each redacted audio range |
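Since all parameters travel in the query string, assembling the connection URL is a one-liner. A sketch, assuming a placeholder endpoint URL (substitute the real WebSocket endpoint from the Endpoint section):

```python
from urllib.parse import urlencode

# Hypothetical base URL for illustration only.
BASE_URL = "wss://api.example.com/v1/redact-stream"

def build_connect_url(api_key: str, **params) -> str:
    """Assemble the WebSocket URL with api_key plus any optional parameters."""
    query = {"api_key": api_key, **params}
    return f"{BASE_URL}?{urlencode(query)}"
```

For a raw-format stream you would pass the three required parameters, e.g. build_connect_url("KEY", audio_format="s16le", sample_rate=16000, num_channels=1).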
Connection flow
- Connect to the WebSocket endpoint with api_key and any optional parameters.
- Stream audio data as binary WebSocket frames. Frames can be any size.
- Receive frame pairs per utterance: a JSON text frame, optionally followed by a binary MP3 frame.
- Send an empty text frame ("") to signal end of audio.
- Receive a done JSON frame, optionally followed by a final binary MP3 frame for any trailing audio.
- The connection closes automatically.
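The receive side of this flow is a small state machine: each JSON frame announces whether a binary MP3 frame follows it. A minimal sketch, assuming frames arrive in order as ("text", str) or ("binary", bytes) tuples (the tuple shape is an illustration, not the wire format):

```python
import json

def pair_frames(frames):
    """Group an ordered stream of WebSocket frames into (message, mp3_or_None)
    pairs. A non-null redacted_audio (or trailing_redacted_audio, on the done
    message) means the next frame is the binary MP3 clip."""
    frames = iter(frames)
    for kind, payload in frames:
        if kind != "text":
            raise ValueError("expected a JSON text frame")
        msg = json.loads(payload)
        audio_info = msg.get("redacted_audio") or msg.get("trailing_redacted_audio")
        clip = None
        if audio_info is not None:
            kind2, clip = next(frames)
            if kind2 != "binary":
                raise ValueError("expected a binary MP3 frame")
        yield msg, clip
```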
Server messages
The server sends frame pairs: a JSON text frame describing the utterance, optionally followed by a binary MP3 frame. The redacted_audio field in the JSON tells you whether a binary frame follows.
utterance (JSON + optional binary MP3)
Sent when a speech segment has been transcribed and redacted.
JSON frame: an object carrying the utterance fields listed below, plus a redacted_audio object (or null).

When redacted_audio is not null, a binary MP3 frame follows immediately. It covers the window from the last emitted audio point to the end of this utterance, with PII/PHI ranges silenced.
When redacted_audio is null, no binary frame follows. This occurs for out-of-order utterances whose audio window was already emitted in a previous clip — the redacted text is still delivered.
Utterance fields
| Field | Type | Description |
|---|---|---|
| utterance_uuid | string (UUID) | Unique identifier for this utterance |
| text | string | Redacted text — each detected PII/PHI span replaced with an entity-type tag (e.g. [FIRSTNAME], [SSN], [PHI]) |
| start_ms | integer | Start time in milliseconds from the beginning of the stream |
| duration_ms | integer | Duration of the utterance in milliseconds |
| speaker | integer | Speaker number, 1-indexed. Consistent within a connection |
| language | string | Detected language code (e.g. "en", "fr") |
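The fields above map naturally onto a typed structure. A parsing sketch, assuming the JSON keys match the table (the dataclass names are illustrative):

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class RedactedAudioInfo:
    start_ms: int
    duration_ms: int

@dataclass
class Utterance:
    utterance_uuid: str
    text: str
    start_ms: int
    duration_ms: int
    speaker: int
    language: str
    redacted_audio: Optional[RedactedAudioInfo]

def parse_utterance(raw: str) -> Utterance:
    """Parse an utterance JSON frame; redacted_audio may be null."""
    d = json.loads(raw)
    audio = d.get("redacted_audio")
    return Utterance(
        utterance_uuid=d["utterance_uuid"],
        text=d["text"],
        start_ms=d["start_ms"],
        duration_ms=d["duration_ms"],
        speaker=d["speaker"],
        language=d["language"],
        redacted_audio=RedactedAudioInfo(**audio) if audio else None,
    )
```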
Redacted audio info fields
| Field | Type | Description |
|---|---|---|
| start_ms | integer | Start time of the MP3 clip in milliseconds from the beginning of the stream |
| duration_ms | integer | Duration of the MP3 clip in milliseconds |
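Because each clip covers the window from the last emitted audio point, successive clips should tile the stream without gaps or overlap (this contiguity is an inference from the description above, not a stated guarantee). A sanity-check sketch:

```python
def clips_are_contiguous(clips):
    """Check that each clip (a dict with start_ms and duration_ms) starts
    exactly where the previous one ended."""
    expected_start = clips[0]["start_ms"] if clips else 0
    for clip in clips:
        if clip["start_ms"] != expected_start:
            return False
        expected_start = clip["start_ms"] + clip["duration_ms"]
    return True
```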
done (JSON + optional binary MP3)
Sent after all audio has been processed, in response to the end-of-stream signal.
When trailing_redacted_audio is not null, a binary MP3 frame follows containing any remaining audio after the last utterance, with any applicable PII/PHI ranges silenced.
When trailing_redacted_audio is null, no binary frame follows.
error
Sent if redaction fails during processing. The connection closes after this message. No binary frame follows.
WebSocket close codes
| Code | Meaning |
|---|---|
| 4001 | Invalid API key |
| 4003 | Model access not enabled for your organization |
| 4029 | Rate limit exceeded — monthly usage or concurrent connections |
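Client code can branch on these codes when the connection closes. A sketch (the retry advice is an illustrative policy, not prescribed by the API; of the three, only 4029 may resolve on its own when usage resets or a connection frees up):

```python
# Map each documented close code to a reason and whether retrying may help.
CLOSE_CODES = {
    4001: ("invalid_api_key", False),
    4003: ("model_access_not_enabled", False),
    4029: ("rate_limited", True),
}

def classify_close(code: int):
    """Return (reason, retryable) for a close code; unknown codes are final."""
    return CLOSE_CODES.get(code, ("unknown", False))
```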
Rate limits
- Concurrent connection limits apply per organization.
- Monthly usage limits (in audio hours) apply per organization.
- Connections that exceed limits are rejected during the WebSocket handshake with close code 4029.
Redaction tags
Each detected PII/PHI span is replaced with an entity-type tag in the transcript text. For the full list of tags and entity types, see the Velma-2 — PII/PHI Redaction (Batch) API reference. Currently, all entity types the model can detect are redacted. Per-entity configurability is planned for a future release.

Examples
- Python (aiohttp)
- JavaScript (Node.js)
WebSocket APIs cannot be tested with cURL. For command-line testing, use websocat.

Related
- Which API should I use? — PII/PHI redaction vs PII/PHI tagging, batch vs streaming
- STT enrichment features — PII/PHI tagging option in the STT transcription APIs
- Authentication and rate limits