

Real-time PII/PHI redaction over WebSocket. The client streams audio to the server and receives, per utterance, a redacted transcript and a redacted MP3 clip with the PII/PHI ranges silenced.

Endpoint

wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming

Authentication

Pass your API key as a query parameter when opening the connection.
wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming?api_key=YOUR_API_KEY
Unlike the batch endpoints, the streaming API does not use an X-API-Key header. The key must be in the query string at connection time.
See Authentication and rate limits for how to obtain and manage API keys.
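A minimal connection sketch with aiohttp (the client library used in the example below); API_KEY is a placeholder:

import aiohttp

API_KEY = "YOUR_API_KEY"
URL = (
    "wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming"
    f"?api_key={API_KEY}"
)

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # The key travels in the query string, not in an X-API-Key header.
        async with session.ws_connect(URL) as ws:
            ...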

Supported audio formats

Self-describing formats (auto-detected from file headers — no extra parameters needed): AAC, AIFF, FLAC, MP3, OGG, WAV, WebM
OGG / Opus: OGG is a container that may carry Opus-encoded audio. Pass audio_format=ogg, not audio_format=opus.
Raw / headerless formats (require audio_format, sample_rate, and num_channels): s8, s16le, s16be, s24le, s24be, s32le, s32be, u8, u16le, u16be, u24le, u24be, u32le, u32be, f32le, f32be, f64le, f64be, mulaw, alaw
Valid sample rates: 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000
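For example (parameter values are illustrative):

Self-describing OGG/Opus file, format auto-detected:
wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming?api_key=YOUR_API_KEY

Raw 16-bit little-endian PCM, 16 kHz, mono:
wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming?api_key=YOUR_API_KEY&audio_format=s16le&sample_rate=16000&num_channels=1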

Query parameters

Parameter | Type | Default | Description
api_key | string | - | Required. Your API key
speaker_diarization | boolean | true | Identify and label distinct speakers
audio_format | string | (auto-detect) | Audio encoding format. Omit for self-describing formats; required for raw formats
sample_rate | integer | - | Sample rate in Hz. Required for raw formats only
num_channels | integer | - | Number of channels (1–8). Required for raw formats only
start_redaction_padding_ms | integer | 100 | Extra silence (ms) prepended before each redacted audio range
end_redaction_padding_ms | integer | 0 | Extra silence (ms) appended after each redacted audio range
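A sketch of building the full connection URL from a parameter dict with urllib.parse.urlencode (values shown are illustrative):

from urllib.parse import urlencode

params = {
    "api_key": "YOUR_API_KEY",
    "speaker_diarization": "true",
    "start_redaction_padding_ms": 100,
    "end_redaction_padding_ms": 0,
    # For raw formats, also include audio_format, sample_rate, and num_channels.
}
url = (
    "wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming?"
    + urlencode(params)
)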

Connection flow

  1. Connect to the WebSocket endpoint with api_key and any optional parameters.
  2. Stream audio data as binary WebSocket frames. Frames can be any size.
  3. Receive frame pairs per utterance: a JSON text frame, optionally followed by a binary MP3 frame.
  4. Send an empty text frame ("") to signal end of audio.
  5. Receive a done JSON frame, optionally followed by a final binary MP3 frame for any trailing audio.
  6. The connection closes automatically.
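Condensed into a single coroutine, the flow looks like the sketch below (for simplicity it sends all audio before reading responses; the program under Examples sends and receives concurrently):

import json
import aiohttp

async def stream_once(url: str, path: str) -> None:
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(url) as ws:                  # 1. connect with api_key in the URL
            with open(path, "rb") as f:
                while chunk := f.read(8192):
                    await ws.send_bytes(chunk)                     # 2. binary audio frames, any size
            await ws.send_str("")                                  # 4. empty text frame = end of audio
            async for msg in ws:                                   # 3 & 5. JSON frames, some followed by MP3 frames
                if msg.type == aiohttp.WSMsgType.TEXT:
                    print(json.loads(msg.data)["type"])            # "utterance", "done", or "error"
                elif msg.type == aiohttp.WSMsgType.BINARY:
                    print(f"redacted MP3 clip: {len(msg.data)} bytes")
        # 6. the server closes the connection after the final frame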

Server messages

The server sends frame pairs: a JSON text frame indicating the utterance, optionally followed by a binary MP3 frame. The redacted_audio field in the JSON tells you whether a binary frame follows.

utterance (JSON + optional binary MP3)

Sent when a speech segment has been transcribed and redacted. JSON frame:
{
  "type": "utterance",
  "utterance": {
    "utterance_uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "text": "Hello, my name is [FIRSTNAME] [LASTNAME].",
    "start_ms": 0,
    "duration_ms": 3000,
    "speaker": 1,
    "language": "en"
  },
  "redacted_audio": {
    "start_ms": 0,
    "duration_ms": 3000
  }
}
When redacted_audio is not null, a binary MP3 frame follows immediately. It covers the window from the last emitted audio point to the end of this utterance, with PII/PHI ranges silenced. When redacted_audio is null, no binary frame follows. This occurs for out-of-order utterances whose audio window was already emitted in a previous clip — the redacted text is still delivered.
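One way to consume this pairing with aiohttp is to check redacted_audio on each utterance frame and, only when it is non-null, read the next frame as that utterance's MP3 clip (a sketch; done/error frames are ignored here, and ws is an open ClientWebSocketResponse):

import json
import aiohttp

async def utterances_with_clips(ws: aiohttp.ClientWebSocketResponse):
    async for msg in ws:
        if msg.type != aiohttp.WSMsgType.TEXT:
            continue
        frame = json.loads(msg.data)
        if frame["type"] != "utterance":
            continue
        clip = None
        if frame["redacted_audio"] is not None:
            # The binary MP3 frame for this utterance follows immediately.
            nxt = await ws.receive()
            if nxt.type == aiohttp.WSMsgType.BINARY:
                clip = nxt.data
        yield frame["utterance"], clip

# Usage: async for utterance, clip in utterances_with_clips(ws): ...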

Utterance fields

Field | Type | Description
utterance_uuid | string (UUID) | Unique identifier for this utterance
text | string | Redacted text — each detected PII/PHI span replaced with an entity-type tag (e.g. [FIRSTNAME], [SSN], [PHI])
start_ms | integer | Start time in milliseconds from the beginning of the stream
duration_ms | integer | Duration of the utterance in milliseconds
speaker | integer | Speaker number, 1-indexed. Consistent within a connection
language | string | Detected language code (e.g. "en", "fr")
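For reference, the utterance payload maps onto a small dataclass (a sketch, not part of any SDK):

from dataclasses import dataclass

@dataclass
class Utterance:
    utterance_uuid: str
    text: str          # redacted transcript with entity-type tags
    start_ms: int
    duration_ms: int
    speaker: int       # 1-indexed, consistent within a connection
    language: str      # e.g. "en", "fr"

# Utterance(**frame["utterance"]) parses the payload of an utterance message.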

Redacted audio info fields

Field | Type | Description
start_ms | integer | Start time of the MP3 clip in milliseconds from the beginning of the stream
duration_ms | integer | Duration of the MP3 clip in milliseconds

done (JSON + optional binary MP3)

Sent after all audio has been processed, in response to the end-of-stream signal.
{
  "type": "done",
  "duration_ms": 45000,
  "trailing_redacted_audio": {
    "start_ms": 43000,
    "duration_ms": 2000
  }
}
When trailing_redacted_audio is not null, a binary MP3 frame follows containing any remaining audio after the last utterance, with any applicable PII/PHI ranges silenced. When trailing_redacted_audio is null, no binary frame follows.
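A sketch of draining the stream after the end-of-audio signal: collect any remaining clips, then read one extra binary frame only when trailing_redacted_audio is non-null (utterance text and error handling omitted; ws is an open aiohttp ClientWebSocketResponse):

import json
import aiohttp

async def drain(ws: aiohttp.ClientWebSocketResponse, clips: list) -> int | None:
    await ws.send_str("")                              # signal end of audio
    while True:
        msg = await ws.receive()
        if msg.type == aiohttp.WSMsgType.BINARY:
            clips.append(msg.data)                     # clip paired with an earlier utterance
        elif msg.type == aiohttp.WSMsgType.TEXT:
            frame = json.loads(msg.data)
            if frame["type"] == "done":
                if frame["trailing_redacted_audio"] is not None:
                    tail = await ws.receive()          # final MP3 clip after the last utterance
                    if tail.type == aiohttp.WSMsgType.BINARY:
                        clips.append(tail.data)
                return frame["duration_ms"]
        else:
            return None                                # connection closed or errored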

error

Sent if redaction fails during processing. The connection closes after this message. No binary frame follows.
{
  "type": "error",
  "error": "Internal server error"
}

WebSocket close codes

Code | Meaning
4001 | Invalid API key
4003 | Model access not enabled for your organization
4029 | Rate limit exceeded — monthly usage or concurrent connections
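With aiohttp, the close code is exposed on the socket after the receive loop ends, so these cases can be told apart programmatically (a sketch):

CLOSE_REASONS = {
    4001: "Invalid API key",
    4003: "Model access not enabled for your organization",
    4029: "Rate limit exceeded",
}

# After the receive loop over `ws` exits:
if ws.close_code in CLOSE_REASONS:
    raise RuntimeError(f"{CLOSE_REASONS[ws.close_code]} (close code {ws.close_code})")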

Rate limits

  • Concurrent connection limits apply per organization.
  • Monthly usage limits (in audio hours) apply per organization.
  • Connections that exceed limits are rejected during the WebSocket handshake with close code 4029.
See Authentication and rate limits for retry guidance.
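One possible retry policy for close code 4029 is exponential backoff around the whole connect-and-stream attempt (a sketch; run_stream is a placeholder for your own coroutine that returns the observed close code):

import asyncio

async def with_backoff(run_stream, attempts: int = 5):
    delay = 1.0
    for _ in range(attempts):
        close_code = await run_stream()
        if close_code != 4029:              # only retry when rate limited
            return close_code
        await asyncio.sleep(delay)
        delay = min(delay * 2, 60.0)        # cap the backoff
    raise RuntimeError("Still rate limited after retries")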

Redaction tags

Each detected PII/PHI span is replaced with an entity-type tag in the transcript text. For the full list of tags and entity types, see the Velma-2 — PII/PHI Redaction (Batch) API reference. Currently, all entity types the model can detect are redacted. Per-entity configurability is planned for a future release.
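Because the tags are plain bracketed tokens in the transcript, they are easy to pull out for auditing. A sketch, assuming tags consist of upper-case letters as in the examples above (see the batch reference for the authoritative list):

import re
from collections import Counter

def count_entities(redacted_text: str) -> Counter:
    # Matches tags such as [FIRSTNAME], [SSN], [PHI].
    return Counter(re.findall(r"\[([A-Z]+)\]", redacted_text))

print(count_entities("Hello, my name is [FIRSTNAME] [LASTNAME]."))
# Counter({'FIRSTNAME': 1, 'LASTNAME': 1})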

Examples
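Python (asyncio + aiohttp): stream a local audio file, print each redacted utterance as it arrives, and write the concatenated redacted clips to redacted.mp3.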

import asyncio
import json
import aiohttp

API_KEY = "YOUR_API_KEY"
AUDIO_FILE = "recording.ogg"
CHUNK_SIZE = 8192

async def redact_streaming():
    url = (
        "wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming"
        f"?api_key={API_KEY}"
        "&speaker_diarization=true"
        "&start_redaction_padding_ms=100"
        "&end_redaction_padding_ms=0"
    )

    utterances = []
    audio_clips = []

    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(url) as ws:

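            # Stream the file in fixed-size binary frames, throttled to roughly
            # 4000 bytes/s, then send an empty text frame to signal end of audio.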
            async def send_audio():
                with open(AUDIO_FILE, "rb") as f:
                    while chunk := f.read(CHUNK_SIZE):
                        await ws.send_bytes(chunk)
                        await asyncio.sleep(CHUNK_SIZE / 4000)
                await ws.send_str("")

            send_task = asyncio.create_task(send_audio())

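            # Read JSON frames (utterance / done / error) and the binary MP3
            # clips that follow them until the stream is fully drained.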
            try:
                done = False
                async for msg in ws:
                    if msg.type == aiohttp.WSMsgType.TEXT:
                        data = json.loads(msg.data)
                        if data["type"] == "utterance":
                            u = data["utterance"]
                            utterances.append(u)
                            print(f"[Speaker {u['speaker']}] ({u['language']}) {u['start_ms']}ms: {u['text']}")
                        elif data["type"] == "done":
                            print(f"\nDone. Duration: {data['duration_ms']}ms")
                            done = True
                            if not data.get("trailing_redacted_audio"):
                                break
                        elif data["type"] == "error":
                            print(f"Error: {data['error']}")
                            break
                    elif msg.type == aiohttp.WSMsgType.BINARY:
                        audio_clips.append(msg.data)
                        if done:
                            break
                    elif msg.type in (
                        aiohttp.WSMsgType.ERROR,
                        aiohttp.WSMsgType.CLOSE,
                        aiohttp.WSMsgType.CLOSED,
                    ):
                        break
            finally:
                if not send_task.done():
                    send_task.cancel()

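    # Concatenate the MP3 clips in arrival order into one redacted file.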
    if audio_clips:
        with open("redacted.mp3", "wb") as f:
            for clip in audio_clips:
                f.write(clip)

asyncio.run(redact_streaming())
WebSocket APIs cannot be tested with cURL. For command-line testing, use websocat.