Real-time AI music detection over WebSocket. The client streams audio and receives per-window vocal AI verdicts as they become available, followed by a final clip-level summary - including instrumental AI detection - on completion.

Endpoint

wss://platform.modulate.ai/api/velma-2-ai-music-detection-streaming

Authentication

Pass your API key as a query parameter on the connection URL:
wss://.../velma-2-ai-music-detection-streaming?api_key=YOUR_API_KEY&audio_format=mp3

Connection parameters

| Parameter | Required | Description |
| --- | --- | --- |
| api_key | Yes | Your API key |
| audio_format | Yes | Audio format - see supported formats below |
| sample_rate | Raw PCM only | Sample rate in Hz |
| num_channels | Raw PCM only | Number of channels (1-8) |
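
As a minimal sketch of how the parameters above combine into a connection URL, the helper below appends them as a query string. The function name `build_url` and the constant `ENDPOINT` are illustrative only and not part of the API; the endpoint value is copied from the Endpoint section.

```python
from urllib.parse import urlencode

# Endpoint copied from the Endpoint section of this page.
ENDPOINT = "wss://platform.modulate.ai/api/velma-2-ai-music-detection-streaming"


def build_url(api_key, audio_format, sample_rate=None, num_channels=None):
    """Return the connection URL with the query parameters appended.

    sample_rate and num_channels are only added when given, since they
    apply to raw PCM formats only.
    """
    params = {"api_key": api_key, "audio_format": audio_format}
    if sample_rate is not None:
        params["sample_rate"] = sample_rate
    if num_channels is not None:
        params["num_channels"] = num_channels
    # urlencode percent-escapes values, so unusual characters in a key are safe.
    return f"{ENDPOINT}?{urlencode(params)}"
```

For example, `build_url("YOUR_API_KEY", "s16le", sample_rate=16000, num_channels=1)` yields a URL carrying all four parameters, while `build_url("YOUR_API_KEY", "mp3")` carries only the two required ones.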

Supported audio formats

  • Container formats - sample_rate and num_channels must not be specified (the headers already carry this metadata): wav, mp3, ogg, flac, webm, aac, aiff
  • Raw PCM formats - sample_rate and num_channels are required: s8, s16le, s16be, s24le, s24be, s32le, s32be, u8, u16le, u16be, u24le, u24be, u32le, u32be, f32le, f32be, f64le, f64be, mulaw, alaw
  • Valid sample rates (Hz): 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000
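
The format rules above can be checked on the client before connecting, which avoids a round trip that would end in a 1003 close. The sketch below is one way to do that; `check_params` is a hypothetical helper, and it assumes the list of valid sample rates applies to raw PCM streams (the only case where sample_rate is sent).

```python
CONTAINER_FORMATS = {"wav", "mp3", "ogg", "flac", "webm", "aac", "aiff"}
RAW_PCM_FORMATS = {
    "s8", "s16le", "s16be", "s24le", "s24be", "s32le", "s32be",
    "u8", "u16le", "u16be", "u24le", "u24be", "u32le", "u32be",
    "f32le", "f32be", "f64le", "f64be", "mulaw", "alaw",
}
VALID_SAMPLE_RATES = {8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000}


def check_params(audio_format, sample_rate=None, num_channels=None):
    """Return a list of problems; an empty list means the combination looks valid."""
    problems = []
    if audio_format in CONTAINER_FORMATS:
        # Container headers already carry this metadata.
        if sample_rate is not None or num_channels is not None:
            problems.append("sample_rate/num_channels must not be set for container formats")
    elif audio_format in RAW_PCM_FORMATS:
        if sample_rate not in VALID_SAMPLE_RATES:
            problems.append(f"invalid or missing sample_rate: {sample_rate!r}")
        if num_channels is None or not 1 <= num_channels <= 8:
            problems.append(f"invalid or missing num_channels: {num_channels!r}")
    else:
        problems.append(f"unknown audio_format: {audio_format!r}")
    return problems
```

A caller would typically refuse to connect when the returned list is non-empty and surface the messages to the user.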

Protocol

Client -> server

| Message | Description |
| --- | --- |
| Binary frame | Chunk of audio bytes in the declared format (any size) |
| Empty text frame "" | Signals end of stream |
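
To illustrate the client-side framing, the generator below splits a byte buffer into binary frames and finishes with the empty text frame. It is a sketch only: `iter_frames` and the 4096-byte default are arbitrary choices, since the protocol accepts chunks of any size.

```python
def iter_frames(audio: bytes, chunk_size: int = 4096):
    """Yield the frames a client would send for one stream.

    bytes objects stand for binary frames; the final empty str stands for
    the empty text frame that signals end of stream.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]
    yield ""  # end-of-stream marker
```

Most WebSocket libraries choose the frame type from the Python type passed to their send call (bytes for binary, str for text), so these values can usually be passed through unchanged.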

Server -> client

| Message | Description |
| --- | --- |
| {"type": "window", "window": {...}} | Per-window result - emitted for each completed 4-second window, in order |
| {"type": "done", ...} | Stream complete - clip-level verdict plus instrumental AI detection |
| {"type": "error", "error": "..."} | An error occurred - connection will close |

Vocal AI detection runs on each 4-second window as audio arrives and is reported in window messages. Instrumental AI detection runs on the accumulated audio at end-of-stream, so instrumental_ai_percentage and instrumental_ai_confidence appear only in the final done message.
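
Putting both directions together, a client might look roughly like the sketch below. It assumes the third-party `websockets` package (imported lazily so the pure helper works without it) and a `url` that already carries the query parameters; `classify_message` and `stream_file` are illustrative names, not part of the API. Sending runs in a background task so that window messages can be read while audio is still being uploaded.

```python
import asyncio
import json


def classify_message(raw):
    """Decode one server text frame into a (type, payload) pair."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "window":
        return kind, message["window"]
    if kind == "error":
        return kind, message["error"]
    # "done" (and anything unrecognised) is returned whole, because the
    # message table does not show how the done fields are nested.
    return kind, message


async def stream_file(url, path, chunk_size=4096):
    """Stream a file and return (windows, done_message)."""
    import websockets  # third-party dependency, assumed to be installed

    windows, done = [], None
    async with websockets.connect(url) as ws:

        async def send_audio():
            with open(path, "rb") as f:
                while chunk := f.read(chunk_size):
                    await ws.send(chunk)  # bytes are sent as a binary frame
            await ws.send("")  # empty str is sent as a text frame: end of stream

        sender = asyncio.create_task(send_audio())
        try:
            async for raw in ws:
                kind, payload = classify_message(raw)
                if kind == "window":
                    windows.append(payload)
                elif kind == "done":
                    done = payload
                    break
                elif kind == "error":
                    raise RuntimeError(f"server error: {payload}")
        finally:
            sender.cancel()
            await asyncio.gather(sender, return_exceptions=True)
    return windows, done
```

A call such as `asyncio.run(stream_file(url, "clip.mp3"))` would then return the ordered window results and the final message; the file name here is a placeholder.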

Window object

| Field | Type | Description |
| --- | --- | --- |
| start_time_ms | integer | Window start time in milliseconds |
| end_time_ms | integer | Window end time in milliseconds |
| vocal_percentage | float | Percentage of the window containing vocal content (0-100) |
| vocal_ai_percentage | float | 100 if the window is classified as AI-generated vocals, 0 otherwise (always 0 without sufficient vocal content) |
| vocal_ai_confidence | float | Confidence the window contains AI-generated vocals (0-1); 0 without sufficient vocal content |
| instrumental_percentage | float | Percentage of the window containing instrumental music content (0-100) |
| silence_percentage | float | Percentage of the window containing neither vocal nor instrumental content (0-100) |
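
Because vocal_ai_percentage is either 0 or 100 per window, picking out the flagged windows is a simple filter. The sketch below shows one way to do it; `ai_vocal_spans` is a hypothetical helper and the sample values in the usage note are made up for illustration.

```python
def ai_vocal_spans(windows):
    """Return (start_ms, end_ms, confidence) for windows flagged as AI vocals."""
    return [
        (w["start_time_ms"], w["end_time_ms"], w["vocal_ai_confidence"])
        for w in windows
        # Per the field table the value is 0 or 100, so any positive value means "flagged".
        if w["vocal_ai_percentage"] > 0
    ]
```

Given two windows covering 0-4000 ms and 4000-8000 ms where only the second has vocal_ai_percentage set to 100, the function returns a single tuple for the second window.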

Done object

| Field | Type | Description |
| --- | --- | --- |
| duration_ms | integer | Total duration of the streamed audio in milliseconds |
| window_count | integer | Total number of windows analysed during the session |
| primary_verdict | string | Clip-level classification: "ai-vocal-music", "ai-instrumental", or "not-ai-music" |
| vocal_percentage | float | Average percentage of audio with vocal content, across all windows (0-100) |
| vocal_ai_percentage | float | Percentage of clip duration classified as AI-generated vocals (0-100) |
| vocal_ai_confidence | float | Average confidence that vocal windows contain AI-generated vocals (0-1) |
| instrumental_percentage | float | Average percentage of audio with instrumental content, across all windows (0-100) |
| instrumental_ai_percentage | float | AI detection score for the clip’s instrumental content (0-100) |
| instrumental_ai_confidence | float | Confidence in the instrumental AI assessment for the full clip (0-1) |
| silence_percentage | float | Average percentage of audio with neither vocal nor instrumental content (0-100) |
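
A done object can be condensed into a one-line summary along the following lines. This is a sketch under assumptions: `summarize` is a hypothetical helper, it expects a dict holding the fields listed above, and the human-readable labels are an interpretation of the verdict strings rather than official wording.

```python
# Labels are illustrative paraphrases of the documented verdict strings.
VERDICT_LABELS = {
    "ai-vocal-music": "AI-generated vocals detected",
    "ai-instrumental": "AI-generated instrumental detected",
    "not-ai-music": "no AI-generated music detected",
}


def summarize(done):
    """Return a short human-readable summary of a done object."""
    label = VERDICT_LABELS.get(done["primary_verdict"], "unrecognised verdict")
    seconds = done["duration_ms"] / 1000  # milliseconds to seconds
    return f"{label} ({seconds:.1f} s, {done['window_count']} windows)"
```

Falling back to "unrecognised verdict" keeps the helper from failing if a verdict string outside the three documented values ever appears.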

WebSocket close codes

| Code | Meaning |
| --- | --- |
| 1000 | Normal closure after the done message, or after the connection completes |
| 1003 | Invalid or missing query parameters (unknown format, bad sample rate, missing audio_format) |
| 4002 | Audio could not be decoded, or does not match the declared raw audio_format |
| 4003 | The request could not be validated, or the request is not permitted |
| 4029 | The request could not be completed due to insufficient credits |

An error message is sent before the connection closes for the 1003, 4002, 4003, and 4029 cases.
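
For logging or error handling, the close codes above can be kept in a small lookup table. The sketch below is illustrative: `describe_close` is a hypothetical helper, and the short descriptions paraphrase the table rather than quote server output.

```python
CLOSE_CODES = {
    1000: "normal closure",
    1003: "invalid or missing query parameters",
    4002: "audio could not be decoded or does not match the declared format",
    4003: "request could not be validated or is not permitted",
    4029: "insufficient credits",
}

# Codes for which an error message is sent before the close.
CODES_WITH_ERROR_MESSAGE = {1003, 4002, 4003, 4029}


def describe_close(code):
    """Return (meaning, error_message_expected) for a WebSocket close code."""
    meaning = CLOSE_CODES.get(code, "undocumented close code")
    return meaning, code in CODES_WITH_ERROR_MESSAGE
```

The second element of the result tells a client whether it should have seen an error message before the close, which can help when diagnosing dropped connections.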

Rate limits

  • Concurrent connection limits apply per organization
  • Monthly usage limits (in audio hours) apply per organization