
Real-time synthetic voice detection over WebSocket. Streams audio to the server and receives per-frame verdicts (synthetic, non-synthetic, or no-content) with confidence scores as analysis windows complete. For a conceptual explanation of how detection works — including windowing, silence trimming, and the no-content verdict — see How synthetic voice detection works.

Endpoint

```
wss://modulate-developer-apis.com/api/velma-2-synthetic-voice-detection-streaming
```

Authentication

Pass your API key as a query parameter when opening the connection.
```
wss://modulate-developer-apis.com/api/velma-2-synthetic-voice-detection-streaming?api_key=YOUR_API_KEY&audio_format=s16le&sample_rate=16000&num_channels=1
```
Unlike the batch endpoint, the streaming API does not use an X-API-Key header. The key must be in the query string at connection time.

Query parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| api_key | string | Yes | Your API key |
| audio_format | string | Yes | Audio encoding format; see Audio formats and preprocessing |
| sample_rate | integer | Conditional | Required for raw (headerless) formats. One of: 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000 |
| num_channels | integer | Conditional | Required for raw formats. 1–8 |
For supported format values and format selection guidance, see Audio formats and preprocessing.
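As a sketch, these parameters can be assembled into the connection URL with Python's standard library (the build_stream_url helper name is illustrative, not part of an official SDK):

```python
from urllib.parse import urlencode

BASE_URL = "wss://modulate-developer-apis.com/api/velma-2-synthetic-voice-detection-streaming"

def build_stream_url(api_key, audio_format, sample_rate=None, num_channels=None):
    """Build the WebSocket URL with the required query parameters.

    sample_rate and num_channels are only required for raw (headerless)
    formats, so they are appended only when provided.
    """
    params = {"api_key": api_key, "audio_format": audio_format}
    if sample_rate is not None:
        params["sample_rate"] = sample_rate
    if num_channels is not None:
        params["num_channels"] = num_channels
    return f"{BASE_URL}?{urlencode(params)}"
```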

Connection flow

  1. Connect with api_key, audio_format, and (for raw formats) sample_rate and num_channels.
  2. Stream audio as binary WebSocket frames. Frames can be any size.
  3. Receive frame JSON messages as analysis windows complete.
  4. Send an empty text frame ("") to signal end of audio.
  5. Receive a done message with total duration and frame count.
  6. The connection closes automatically.
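The six steps above can be sketched with the third-party websockets package (pip install websockets). This is a minimal client for short clips; for long streams you would read results concurrently with sending rather than only after the end-of-audio marker:

```python
import asyncio
import json

def parse_message(raw: str) -> dict:
    """Decode one server text frame; raise on an `error` message."""
    msg = json.loads(raw)
    if msg.get("type") == "error":
        raise RuntimeError(msg["error"])
    return msg

async def stream_file(url: str, path: str, chunk_size: int = 32_768) -> dict:
    import websockets  # third-party dependency, imported lazily

    async with websockets.connect(url) as ws:       # step 1: connect
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                await ws.send(chunk)                # step 2: binary audio frames
        await ws.send("")                           # step 4: end-of-audio marker
        async for raw in ws:                        # steps 3 and 5: results
            msg = parse_message(raw)
            if msg["type"] == "frame":
                print(msg["frame"])
            elif msg["type"] == "done":
                return msg                          # step 6: server then closes

# Usage (needs a valid key and a raw PCM file):
# asyncio.run(stream_file("wss://...?api_key=...", "audio.s16le"))
```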

Server messages

Frame result

Sent each time an analysis window is complete.
```json
{
  "type": "frame",
  "frame": {
    "start_time_ms": 0,
    "end_time_ms": 4000,
    "verdict": "synthetic",
    "confidence": 0.9732
  }
}
```
| Field | Type | Description |
| --- | --- | --- |
| start_time_ms | integer | Frame start time in the audio stream (ms) |
| end_time_ms | integer | Frame end time in the audio stream (ms) |
| verdict | string | "synthetic", "non-synthetic", or "no-content" |
| confidence | float | Confidence in the verdict, 0.0–1.0 |
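The API reports verdicts per analysis window only. If you need a single clip-level verdict, one possible convention (ours, not defined by the API) is to weight each frame's verdict by its duration and ignore no-content frames:

```python
def summarize(frames):
    """Roll per-frame verdicts up to one clip-level verdict.

    Duration-weighted majority over frames with speech content; this
    aggregation is an illustrative convention, not part of the API.
    """
    totals = {}
    for f in frames:
        if f["verdict"] == "no-content":
            continue  # silence / no speech: skip for the clip verdict
        dur = f["end_time_ms"] - f["start_time_ms"]
        totals[f["verdict"]] = totals.get(f["verdict"], 0) + dur
    if not totals:
        return "no-content"
    return max(totals, key=totals.get)
```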

Done

```json
{
  "type": "done",
  "duration_ms": 12500,
  "frame_count": 10
}
```
| Field | Type | Description |
| --- | --- | --- |
| duration_ms | integer | Total duration of the streamed audio in milliseconds |
| frame_count | integer | Total number of frames analyzed |

Error

```json
{
  "type": "error",
  "error": "Invalid audio_format='mp4'. Valid values: ['aac', 'aiff', ...]"
}
```

WebSocket close codes

| Code | Meaning |
| --- | --- |
| 1000 | Normal closure after a successful done message |
| 1003 | Invalid query parameters (bad format, sample rate, or channels) |
| 4002 | Audio data does not match the declared format |
| 4003 | Authentication failed or usage denied |
| 1011 | Server error during streaming |
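A client can use these codes to decide whether reconnecting makes sense. The mapping below is a suggested retry policy, not behavior mandated by the API:

```python
# Close codes from the table above; the retry policy itself is our suggestion.
RETRYABLE = {1011}          # transient server error: safe to retry
FATAL = {1003, 4002, 4003}  # client/config problem: fix before reconnecting

def should_retry(close_code: int) -> bool:
    if close_code == 1000:
        return False        # normal closure after `done`, nothing to retry
    if close_code in FATAL:
        return False        # bad params, mismatched audio, or auth/usage denial
    return close_code in RETRYABLE
```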

Rate limits

  • Concurrent connection limits apply per organization.
  • Monthly usage limits (in audio hours) apply per organization.
  • Connections that exceed limits are rejected during the WebSocket handshake with close code 4003.