PII/PHI Redaction Streaming

Real-time PII/PHI redaction over WebSocket. Stream audio to the server and receive, for each utterance, a redacted transcript and a redacted MP3 clip in which the PII/PHI ranges are silenced.

Endpoint

wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming

Authentication

Pass your API key as a query parameter when opening the connection.

wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming?api_key=YOUR_API_KEY

Unlike the batch endpoints, the streaming API does not use an X-API-Key header. The key must be in the query string at connection time.

See Authentication and rate limits for how to obtain and manage API keys.

Supported audio formats

Self-describing formats (auto-detected from file headers — no extra parameters needed): AAC, AIFF, FLAC, MP3, OGG, WAV, WebM

OGG / Opus: OGG is a container that may carry Opus-encoded audio. Pass audio_format=ogg, not audio_format=opus.

Raw / headerless formats (require audio_format, sample_rate, and num_channels): s8, s16le, s16be, s24le, s24be, s32le, s32be, u8, u16le, u16be, u24le, u24be, u32le, u32be, f32le, f32be, f64le, f64be, mulaw, alaw

Valid sample rates: 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000
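
The raw-format requirement can be checked on the client before opening a connection. The following is a minimal sketch, assuming the format names and sample rates listed here are exhaustive. `validate_audio_params` is a hypothetical helper, not part of any SDK, and the 1 to 8 channel range is the one documented for num_channels.

```python
RAW_FORMATS = {
    "s8", "s16le", "s16be", "s24le", "s24be", "s32le", "s32be",
    "u8", "u16le", "u16be", "u24le", "u24be", "u32le", "u32be",
    "f32le", "f32be", "f64le", "f64be", "mulaw", "alaw",
}
VALID_SAMPLE_RATES = {8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000}


def validate_audio_params(audio_format=None, sample_rate=None, num_channels=None):
    """Return a list of problems; an empty list means the combination looks valid."""
    problems = []
    if audio_format == "opus":
        # Opus audio travels inside an OGG container, so the format name is "ogg".
        problems.append("use audio_format=ogg for Opus audio in an OGG container")
    if audio_format in RAW_FORMATS:
        # Raw formats carry no header, so both values must be supplied explicitly.
        if sample_rate not in VALID_SAMPLE_RATES:
            problems.append(f"sample_rate must be one of {sorted(VALID_SAMPLE_RATES)}")
        if num_channels is None or not 1 <= num_channels <= 8:
            problems.append("num_channels must be between 1 and 8")
    return problems
```

Calling `validate_audio_params("s16le", 16000, 1)` returns an empty list, while `validate_audio_params("s16le")` reports both missing values. Self-describing formats pass through unchecked because the server detects them from the file header.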

Query parameters

Parameter	Type	Default	Description
`api_key`	string	—	Required. Your API key
`speaker_diarization`	boolean	`true`	Identify and label distinct speakers
`audio_format`	string	(auto-detect)	Audio encoding format. Omit for self-describing formats; required for raw formats
`sample_rate`	integer	—	Sample rate in Hz. Required for raw formats only
`num_channels`	integer	—	Number of channels (1–8). Required for raw formats only
`start_redaction_padding_ms`	integer	`100`	Extra silence (ms) prepended before each redacted audio range
`end_redaction_padding_ms`	integer	`0`	Extra silence (ms) appended after each redacted audio range
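
Building the query string with a URL encoder avoids broken URLs when a key or value contains reserved characters. A minimal sketch using only the standard library; `build_stream_url` is a hypothetical helper name, and sending booleans as lowercase `true`/`false` is an assumption based on the examples on this page.

```python
from urllib.parse import urlencode

BASE_URL = "wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming"


def build_stream_url(api_key, **params):
    """Build the connection URL; params set to None are omitted so server defaults apply."""
    query = {"api_key": api_key}
    for name, value in params.items():
        if value is None:
            continue
        # Assumption: booleans are sent as lowercase "true"/"false".
        query[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{BASE_URL}?{urlencode(query)}"


# Example: raw 16 kHz mono PCM with extra padding after each redacted range.
url = build_stream_url(
    "YOUR_API_KEY",
    audio_format="s16le",
    sample_rate=16000,
    num_channels=1,
    end_redaction_padding_ms=50,
)
```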

Connection flow

Connect to the WebSocket endpoint with api_key and any optional parameters.
Stream audio data as binary WebSocket frames. Frames can be any size.
Receive frame pairs per utterance: a JSON text frame, optionally followed by a binary MP3 frame.
Send an empty text frame ("") to signal end of audio.
Receive a done JSON frame, optionally followed by a final binary MP3 frame for any trailing audio.
The connection closes automatically.

Server messages

The server sends frame pairs: a JSON text frame describing the utterance, optionally followed by a binary MP3 frame. The redacted_audio field in the JSON frame indicates whether a binary frame follows.

`utterance` (JSON + optional binary MP3)

Sent when a speech segment has been transcribed and redacted. JSON frame:

{
  "type": "utterance",
  "utterance": {
    "utterance_uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "text": "Hello, my name is [FIRSTNAME] [LASTNAME].",
    "start_ms": 0,
    "duration_ms": 3000,
    "speaker": 1,
    "language": "en"
  },
  "redacted_audio": {
    "start_ms": 0,
    "duration_ms": 3000
  }
}

When redacted_audio is not null, a binary MP3 frame follows immediately. It covers the window from the last emitted audio point to the end of this utterance, with PII/PHI ranges silenced. When redacted_audio is null, no binary frame follows. This occurs for out-of-order utterances whose audio window was already emitted in a previous clip — the redacted text is still delivered.

Utterance fields

Field	Type	Description
`utterance_uuid`	string (UUID)	Unique identifier for this utterance
`text`	string	Redacted text — each detected PII/PHI span replaced with an entity-type tag (e.g. `[FIRSTNAME]`, `[SSN]`, `[PHI]`)
`start_ms`	integer	Start time in milliseconds from the beginning of the stream
`duration_ms`	integer	Duration of the utterance in milliseconds
`speaker`	integer	Speaker number, 1-indexed. Consistent within a connection
`language`	string	Detected language code (e.g. `"en"`, `"fr"`)

Redacted audio info fields

Field	Type	Description
`start_ms`	integer	Start time of the MP3 clip in milliseconds from the beginning of the stream
`duration_ms`	integer	Duration of the MP3 clip in milliseconds
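
Because each clip is described as covering the window from the last emitted audio point onward, consecutive clips should tile the stream without gaps. The sketch below checks that reading against the received `redacted_audio` objects. It assumes the first clip starts at 0 ms and that boundaries align exactly to the millisecond, neither of which is stated explicitly here; `find_clip_gaps` is a hypothetical helper.

```python
def find_clip_gaps(clip_infos):
    """Return (expected_start_ms, actual_start_ms) for every clip that does not
    begin where the previous clip ended."""
    gaps = []
    expected_start = 0  # assumption: the first clip starts at the beginning of the stream
    for info in clip_infos:
        if info["start_ms"] != expected_start:
            gaps.append((expected_start, info["start_ms"]))
        # The next clip is expected to begin where this one ends.
        expected_start = info["start_ms"] + info["duration_ms"]
    return gaps
```

An empty result means the clips can be concatenated in arrival order to reconstruct the redacted stream so far.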

`done` (JSON + optional binary MP3)

Sent after all audio has been processed, in response to the end-of-stream signal.

{
  "type": "done",
  "duration_ms": 45000,
  "trailing_redacted_audio": {
    "start_ms": 43000,
    "duration_ms": 2000
  }
}

When trailing_redacted_audio is not null, a binary MP3 frame follows containing any remaining audio after the last utterance, with any applicable PII/PHI ranges silenced. When trailing_redacted_audio is null, no binary frame follows.
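
The pairing rule for both message types can be sketched as a small function that groups received frames. This is an illustration only: `pair_frames` is a hypothetical name, and frames are modeled as `str` (text) or `bytes` (binary) rather than as objects from a particular WebSocket library.

```python
import json


def pair_frames(frames):
    """Group received frames (str for text, bytes for binary) into
    (message, mp3_bytes_or_None) tuples."""
    pairs = []
    pending = None  # JSON message still waiting for its binary frame
    for frame in frames:
        if isinstance(frame, bytes):
            if pending is None:
                raise ValueError("binary frame without a preceding JSON frame")
            pairs.append((pending, frame))
            pending = None
            continue
        if pending is not None:
            raise ValueError("expected a binary frame after the previous JSON frame")
        message = json.loads(frame)
        # The field that announces a binary frame depends on the message type.
        audio_key = {
            "utterance": "redacted_audio",
            "done": "trailing_redacted_audio",
        }.get(message["type"])
        if audio_key and message.get(audio_key) is not None:
            pending = message
        else:
            pairs.append((message, None))
    if pending is not None:
        raise ValueError("stream ended before the announced binary frame arrived")
    return pairs
```

Keeping the "is a binary frame expected" decision in one place means `error` messages and null audio fields fall through the same path, each paired with `None`.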

`error`

Sent if redaction fails during processing. The connection closes after this message. No binary frame follows.

{
  "type": "error",
  "error": "Internal server error"
}

WebSocket close codes

Code	Meaning
`4001`	Invalid API key
`4003`	Model access not enabled for your organization
`4029`	Rate limit exceeded — monthly usage or concurrent connections
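
A client can translate these codes into actionable messages. A minimal sketch; `describe_close_code` is a hypothetical helper, and treating only 4029 as retryable is an assumption rather than documented guidance. With aiohttp, the code is typically available as `ws.close_code` once the connection has closed.

```python
CLOSE_CODE_MEANINGS = {
    4001: "Invalid API key",
    4003: "Model access not enabled for your organization",
    4029: "Rate limit exceeded (monthly usage or concurrent connections)",
}


def describe_close_code(code):
    """Return (meaning, retryable) for a WebSocket close code."""
    meaning = CLOSE_CODE_MEANINGS.get(code, f"Unrecognized close code {code}")
    # Assumption: a rate-limit rejection may succeed later; key and access
    # problems need a configuration change first.
    return meaning, code == 4029
```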

Rate limits

Concurrent connection limits apply per organization.
Monthly usage limits (in audio hours) apply per organization.
Connections that exceed limits are rejected during the WebSocket handshake with close code 4029.

See Authentication and rate limits for retry guidance.

Redaction tags

Each detected PII/PHI span is replaced with an entity-type tag in the transcript text. For the full list of tags and entity types, see the Velma-2 — PII/PHI Redaction (Batch) API reference. Currently, all entity types the model can detect are redacted. Per-entity configurability is planned for a future release.
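
To see which entity types were redacted in a transcript, the tags can be pulled out of the text with a regular expression. The pattern below is an assumption inferred from the sample tags on this page (`[FIRSTNAME]`, `[SSN]`, `[PHI]`): uppercase names in square brackets. Check it against the full tag list in the batch reference before relying on it.

```python
import re
from collections import Counter

# Assumption: tags are uppercase names (letters, digits, underscores) in square brackets.
TAG_PATTERN = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def count_redaction_tags(text):
    """Count how often each entity-type tag appears in a redacted transcript."""
    return Counter(TAG_PATTERN.findall(text))


counts = count_redaction_tags("Hello, my name is [FIRSTNAME] [LASTNAME]. Call [FIRSTNAME].")
```

Here `counts["FIRSTNAME"]` is 2 and `counts["LASTNAME"]` is 1. Lowercase bracketed text such as `[inaudible]` is ignored by this pattern.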

Examples

Two complete clients follow: Python (aiohttp) first, then JavaScript (Node.js).

import asyncio
import json
import aiohttp

API_KEY = "YOUR_API_KEY"
AUDIO_FILE = "recording.ogg"
CHUNK_SIZE = 8192

async def redact_streaming():
    url = (
        "wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming"
        f"?api_key={API_KEY}"
        "&speaker_diarization=true"
        "&start_redaction_padding_ms=100"
        "&end_redaction_padding_ms=0"
    )

    utterances = []
    audio_clips = []

    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(url) as ws:

            async def send_audio():
                with open(AUDIO_FILE, "rb") as f:
                    while chunk := f.read(CHUNK_SIZE):
                        await ws.send_bytes(chunk)
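                        # Throttle sends to approximate a live stream
                        # (this pacing assumes about 4000 bytes of audio per second).
                        await asyncio.sleep(CHUNK_SIZE / 4000)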
                await ws.send_str("")

            send_task = asyncio.create_task(send_audio())

            try:
                done = False
                async for msg in ws:
                    if msg.type == aiohttp.WSMsgType.TEXT:
                        data = json.loads(msg.data)
                        if data["type"] == "utterance":
                            u = data["utterance"]
                            utterances.append(u)
                            print(f"[Speaker {u['speaker']}] ({u['language']}) {u['start_ms']}ms: {u['text']}")
                        elif data["type"] == "done":
                            print(f"\nDone. Duration: {data['duration_ms']}ms")
                            done = True
                            if not data.get("trailing_redacted_audio"):
                                break
                        elif data["type"] == "error":
                            print(f"Error: {data['error']}")
                            break
                    elif msg.type == aiohttp.WSMsgType.BINARY:
                        audio_clips.append(msg.data)
                        if done:
                            break
                    elif msg.type in (
                        aiohttp.WSMsgType.ERROR,
                        aiohttp.WSMsgType.CLOSE,
                        aiohttp.WSMsgType.CLOSED,
                    ):
                        break
            finally:
                if not send_task.done():
                    send_task.cancel()

    if audio_clips:
        with open("redacted.mp3", "wb") as f:
            for clip in audio_clips:
                f.write(clip)

asyncio.run(redact_streaming())

JavaScript (Node.js)

const WebSocket = require("ws");
const fs = require("fs");

const API_KEY = "YOUR_API_KEY";
const AUDIO_FILE = "recording.ogg";
const CHUNK_SIZE = 8192;

const url = new URL(
  "wss://modulate-developer-apis.com/api/velma-2-pii-phi-redaction-streaming"
);
url.searchParams.set("api_key", API_KEY);
url.searchParams.set("speaker_diarization", "true");
url.searchParams.set("start_redaction_padding_ms", "100");
url.searchParams.set("end_redaction_padding_ms", "0");

const ws = new WebSocket(url.toString());
const utterances = [];
const audioClips = [];
let isDone = false;

ws.on("open", () => {
  const stream = fs.createReadStream(AUDIO_FILE, { highWaterMark: CHUNK_SIZE });
  stream.on("data", (chunk) => ws.send(chunk));
  stream.on("end", () => ws.send(""));
});

ws.on("message", (data, isBinary) => {
  if (isBinary) {
    audioClips.push(data);
    if (isDone) finalize();
    return;
  }

  const msg = JSON.parse(data.toString());
  if (msg.type === "utterance") {
    utterances.push(msg.utterance);
    console.log(
      `[Speaker ${msg.utterance.speaker}] (${msg.utterance.language}) ` +
      `${msg.utterance.start_ms}ms: ${msg.utterance.text}`
    );
  } else if (msg.type === "done") {
    console.log(`\nDone. Duration: ${msg.duration_ms}ms`);
    isDone = true;
    if (!msg.trailing_redacted_audio) finalize();
  } else if (msg.type === "error") {
    console.error("Error:", msg.error);
    ws.close();
  }
});

function finalize() {
  if (audioClips.length > 0) {
    const combined = Buffer.concat(audioClips);
    fs.writeFileSync("redacted.mp3", combined);
  }
  ws.close();
}

ws.on("error", (err) => console.error("WebSocket error:", err.message));

cURL is not well suited to testing WebSocket APIs. For command-line testing, use a WebSocket client such as websocat.

Related

Which API should I use? (PII/PHI redaction vs PII/PHI tagging, batch vs streaming)
STT enrichment features (PII/PHI tagging option in the STT transcription APIs)
Authentication and rate limits
