Real-time synthetic voice detection over WebSocket. Streams audio to the server and receives per-frame verdicts (synthetic, non-synthetic, or no-content) with confidence scores as analysis windows complete.
For a conceptual explanation of how detection works — including windowing, silence trimming, and the no-content verdict — see How synthetic voice detection works.
Endpoint
wss://modulate-developer-apis.com/api/velma-2-synthetic-voice-detection-streaming
Authentication
Pass your API key as a query parameter when opening the connection.
wss://modulate-developer-apis.com/api/velma-2-synthetic-voice-detection-streaming?api_key=YOUR_API_KEY&audio_format=s16le&sample_rate=16000&num_channels=1
Unlike the batch endpoint, the streaming API does not use an X-API-Key header. The key must be in the query string at connection time.
Query parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| api_key | string | Yes | Your API key |
| audio_format | string | Yes | Audio encoding format — see Audio formats and preprocessing |
| sample_rate | integer | Conditional | Required for raw (headerless) formats. One of: 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000 |
| num_channels | integer | Conditional | Required for raw formats. 1–8 |
For supported format values and format selection guidance, see Audio formats and preprocessing.
Connection flow
- Connect with api_key, audio_format, and (for raw formats) sample_rate and num_channels.
- Stream audio as binary WebSocket frames. Frames can be any size.
- Receive frame JSON messages as analysis windows complete.
- Send an empty text frame ("") to signal end of audio.
- Receive a done message with total duration and frame count.
- The connection closes automatically.
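The flow above can be sketched as a minimal client, assuming the third-party `websockets` package; the 8192-byte chunk size, the placeholder API key in the URL, and the `handle_message` helper are all illustrative choices, not requirements of the API:

```python
import asyncio
import json

WS_URL = (
    "wss://modulate-developer-apis.com/api/velma-2-synthetic-voice-detection-streaming"
    "?api_key=YOUR_API_KEY&audio_format=s16le&sample_rate=16000&num_channels=1"
)

def handle_message(raw: str) -> bool:
    """Dispatch one server JSON message; return True once the done message arrives."""
    msg = json.loads(raw)
    if msg["type"] == "frame":
        f = msg["frame"]
        print(f'{f["start_time_ms"]}-{f["end_time_ms"]} ms: {f["verdict"]} ({f["confidence"]:.2f})')
        return False
    if msg["type"] == "done":
        print(f'done: {msg["duration_ms"]} ms, {msg["frame_count"]} frames')
        return True
    raise RuntimeError(msg.get("error", "unknown server error"))

async def stream_file(path: str) -> None:
    import websockets  # third-party: pip install websockets
    async with websockets.connect(WS_URL) as ws:
        with open(path, "rb") as audio:
            while chunk := audio.read(8192):  # frames can be any size
                await ws.send(chunk)          # bytes are sent as a binary frame
        await ws.send("")                     # empty text frame signals end of audio
        while not handle_message(await ws.recv()):
            pass

# asyncio.run(stream_file("speech_16k_mono.raw"))
```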
Server messages
Frame result
Sent each time an analysis window is complete.
```json
{
  "type": "frame",
  "frame": {
    "start_time_ms": 0,
    "end_time_ms": 4000,
    "verdict": "synthetic",
    "confidence": 0.9732
  }
}
```
| Field | Type | Description |
| --- | --- | --- |
| start_time_ms | integer | Frame start time in the audio stream (ms) |
| end_time_ms | integer | Frame end time in the audio stream (ms) |
| verdict | string | "synthetic", "non-synthetic", or "no-content" |
| confidence | float | Confidence in the verdict, 0.0–1.0 |
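As an illustration only, one way to fold per-frame verdicts into a single clip-level call; the aggregation rule and the 0.5 threshold are assumptions for the sketch, not behavior defined by the API:

```python
def aggregate(frames: list[dict], threshold: float = 0.5) -> str:
    """Return 'no-content' if no frame carried speech, 'synthetic' if any
    voiced frame is confidently synthetic, else 'non-synthetic'."""
    voiced = [f for f in frames if f["verdict"] != "no-content"]
    if not voiced:
        return "no-content"
    if any(f["verdict"] == "synthetic" and f["confidence"] >= threshold
           for f in voiced):
        return "synthetic"
    return "non-synthetic"
```

A stricter policy (majority vote, or averaging confidence over voiced frames) may suit your application better; the point is that no-content frames should be excluded before deciding.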
Done
```json
{
  "type": "done",
  "duration_ms": 12500,
  "frame_count": 10
}
```
| Field | Type | Description |
| --- | --- | --- |
| duration_ms | integer | Total duration of the streamed audio in milliseconds |
| frame_count | integer | Total number of frames analyzed |
Error
```json
{
  "type": "error",
  "error": "Invalid audio_format='mp4'. Valid values: ['aac', 'aiff', ...]"
}
```
WebSocket close codes
| Code | Meaning |
| --- | --- |
| 1000 | Normal closure after a successful done message |
| 1003 | Invalid query parameters (bad format, sample rate, or channels) |
| 4002 | Audio data does not match the declared format |
| 4003 | Authentication failed or usage denied |
| 1011 | Server error during streaming |
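A sketch of a retry policy keyed on these close codes; the codes come from the table above, but which ones to treat as retryable is an assumption of this example, not guidance from the API:

```python
RETRYABLE = {1011}           # transient server error: retry with backoff
FATAL = {1003, 4002, 4003}   # fix parameters, audio, or credentials first

def should_retry(close_code: int) -> bool:
    """True only for close codes that a backoff-and-retry loop should handle."""
    return close_code in RETRYABLE
```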
Rate limits
- Concurrent connection limits apply per organization.
- Monthly usage limits (in audio hours) apply per organization.
- Connections that exceed limits are rejected during the WebSocket handshake with close code 4003.