Real-time AI music detection over WebSocket. The client streams audio and receives per-window vocal AI verdicts as they become available. On completion, the server sends a final clip-level summary that includes instrumental AI detection.
Endpoint
wss://platform.modulate.ai/api/velma-2-ai-music-detection-streaming
Authentication
Pass your API key as a query parameter on the connection URL:
wss://.../velma-2-ai-music-detection-streaming?api_key=YOUR_API_KEY&audio_format=mp3
Connection parameters
| Parameter | Required | Description |
|---|---|---|
| api_key | Yes | Your API key |
| audio_format | Yes | Audio format - see supported formats below |
| sample_rate | Raw PCM only | Sample rate in Hz |
| num_channels | Raw PCM only | Number of channels (1-8) |
Container formats - sample_rate and num_channels must not be specified (the headers already carry this metadata):
wav, mp3, ogg, flac, webm, aac, aiff
Raw PCM formats - sample_rate and num_channels are required:
s8, s16le, s16be, s24le, s24be, s32le, s32be, u8, u16le, u16be, u24le, u24be, u32le, u32be, f32le, f32be, f64le, f64be, mulaw, alaw
Valid sample rates: 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000
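The parameter rules above can be checked client-side before opening the connection. The following is a minimal sketch, not an official client: the function name `build_stream_url` and the constant names are illustrative, and it assumes the listed sample rates apply to raw PCM formats only.

```python
from urllib.parse import urlencode

ENDPOINT = "wss://platform.modulate.ai/api/velma-2-ai-music-detection-streaming"

CONTAINER_FORMATS = {"wav", "mp3", "ogg", "flac", "webm", "aac", "aiff"}
RAW_PCM_FORMATS = {
    "s8", "s16le", "s16be", "s24le", "s24be", "s32le", "s32be",
    "u8", "u16le", "u16be", "u24le", "u24be", "u32le", "u32be",
    "f32le", "f32be", "f64le", "f64be", "mulaw", "alaw",
}
VALID_SAMPLE_RATES = {8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000}


def build_stream_url(api_key, audio_format, sample_rate=None, num_channels=None):
    """Build the connection URL (hypothetical helper), enforcing the rules above."""
    params = {"api_key": api_key, "audio_format": audio_format}
    if audio_format in CONTAINER_FORMATS:
        # Container headers already carry this metadata, so it must be omitted.
        if sample_rate is not None or num_channels is not None:
            raise ValueError("sample_rate/num_channels must not be set for container formats")
    elif audio_format in RAW_PCM_FORMATS:
        # Raw PCM has no header, so both values are required.
        if sample_rate not in VALID_SAMPLE_RATES:
            raise ValueError(f"unsupported sample_rate: {sample_rate}")
        if num_channels is None or not 1 <= num_channels <= 8:
            raise ValueError(f"num_channels must be 1-8, got {num_channels}")
        params["sample_rate"] = sample_rate
        params["num_channels"] = num_channels
    else:
        raise ValueError(f"unknown audio_format: {audio_format}")
    return f"{ENDPOINT}?{urlencode(params)}"
```

For example, `build_stream_url("YOUR_API_KEY", "s16le", sample_rate=16000, num_channels=1)` yields a URL carrying all four parameters, while passing `sample_rate` together with `mp3` raises `ValueError` locally instead of relying on the server to reject the connection.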
Protocol
Client -> server
| Message | Description |
|---|---|
| Binary frame | Chunk of audio bytes in the declared format (any size) |
| Empty text frame "" | Signals end of stream |
Server -> client
| Message | Description |
|---|---|
| {"type": "window", "window": {...}} | Per-window result - emitted for each completed 4-second window, in order |
| {"type": "done", ...} | Stream complete - clip-level verdict plus instrumental AI detection |
| {"type": "error", "error": "..."} | An error occurred - connection will close |
Vocal AI detection runs on each 4-second window as audio arrives and is reported in window messages. Instrumental AI detection runs on the accumulated audio at end-of-stream, so instrumental_ai_percentage and instrumental_ai_confidence appear only in the final done message.
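The message flow described above can be sketched as a single coroutine that sends audio while reading results. This is an illustrative sketch rather than an official client: `stream_audio` is a hypothetical name, and `ws` is assumed to be any connection object that offers `await ws.send(...)` and `async for message in ws`, as the third-party `websockets` package does. It also assumes the fields of the done message sit at the top level of the JSON object.

```python
import asyncio
import json


async def stream_audio(ws, chunks):
    """Send audio chunks and collect results; returns (windows, done).

    `ws` is an assumed connection object with `send` and async iteration.
    `chunks` is an iterable of bytes in the declared audio format.
    """
    async def send_all():
        for chunk in chunks:
            await ws.send(chunk)  # bytes are sent as a binary frame
        await ws.send("")         # empty text frame signals end of stream

    # Send in the background so window results can be read as they arrive.
    sender = asyncio.create_task(send_all())
    windows, done = [], None
    try:
        async for raw in ws:
            msg = json.loads(raw)
            if msg["type"] == "window":
                windows.append(msg["window"])
            elif msg["type"] == "done":
                done = msg
                break
            elif msg["type"] == "error":
                raise RuntimeError(msg["error"])
    finally:
        # Stop sending if the stream ended early, then reap the task.
        sender.cancel()
        await asyncio.gather(sender, return_exceptions=True)
    return windows, done
```

With the `websockets` package, a call might look like `async with websockets.connect(url) as ws: windows, done = await stream_audio(ws, chunks)`. Sending and receiving run concurrently because window messages arrive while audio is still being uploaded; a strictly sequential send-then-read loop would delay all window results until the upload finishes.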
Window object
| Field | Type | Description |
|---|---|---|
| start_time_ms | integer | Window start time in milliseconds |
| end_time_ms | integer | Window end time in milliseconds |
| vocal_percentage | float | Percentage of the window containing vocal content (0-100) |
| vocal_ai_percentage | float | 100 if the window is classified as AI-generated vocals, 0 otherwise (always 0 without sufficient vocal content) |
| vocal_ai_confidence | float | Confidence the window contains AI-generated vocals (0-1); 0 without sufficient vocal content |
| instrumental_percentage | float | Percentage of the window containing instrumental music content (0-100) |
| silence_percentage | float | Percentage of the window containing neither vocal nor instrumental content (0-100) |
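One way to work with window objects is to parse them into a typed record and merge consecutive AI-flagged windows into time spans. The sketch below is illustrative only: the `Window` class, the `is_ai_vocal` property, and `flagged_spans` are hypothetical names, and it treats any `vocal_ai_percentage` above 0 as a positive classification.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Window:
    start_time_ms: int
    end_time_ms: int
    vocal_percentage: float
    vocal_ai_percentage: float
    vocal_ai_confidence: float
    instrumental_percentage: float
    silence_percentage: float

    @classmethod
    def from_dict(cls, data):
        # Pick only the documented fields; unknown keys are ignored.
        return cls(**{f.name: data[f.name] for f in fields(cls)})

    @property
    def is_ai_vocal(self):
        # Documented values are 100 (AI vocals) or 0 (otherwise).
        return self.vocal_ai_percentage > 0


def flagged_spans(windows):
    """Merge back-to-back AI-vocal windows into (start_ms, end_ms) spans."""
    spans = []
    for w in windows:
        if not w.is_ai_vocal:
            continue
        if spans and spans[-1][1] == w.start_time_ms:
            # Extend the previous span when this window starts where it ended.
            spans[-1] = (spans[-1][0], w.end_time_ms)
        else:
            spans.append((w.start_time_ms, w.end_time_ms))
    return spans
```

Because windows are emitted in order, `flagged_spans` can be called on the list collected so far at any point during the stream.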
Done object
| Field | Type | Description |
|---|---|---|
| duration_ms | integer | Total duration of the streamed audio in milliseconds |
| window_count | integer | Total number of windows analysed during the session |
| primary_verdict | string | Clip-level classification: "ai-vocal-music", "ai-instrumental", or "not-ai-music" |
| vocal_percentage | float | Average percentage of audio with vocal content, across all windows (0-100) |
| vocal_ai_percentage | float | Percentage of clip duration classified as AI-generated vocals (0-100) |
| vocal_ai_confidence | float | Average confidence that vocal windows contain AI-generated vocals (0-1) |
| instrumental_percentage | float | Average percentage of audio with instrumental content, across all windows (0-100) |
| instrumental_ai_percentage | float | AI detection score for the clip’s instrumental content (0-100) |
| instrumental_ai_confidence | float | Confidence in the instrumental AI assessment for the full clip (0-1) |
| silence_percentage | float | Average percentage of audio with neither vocal nor instrumental content (0-100) |
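A done message can be reduced to a one-line summary by branching on `primary_verdict`. The sketch below is a hypothetical helper (`summarize_done` is not part of the API), and pairing each verdict with the vocal or instrumental fields is an assumption about how a client might present the result.

```python
def summarize_done(done):
    """Format a done message (a dict with the fields above) as one line."""
    verdict = done["primary_verdict"]
    seconds = done["duration_ms"] / 1000
    if verdict == "ai-vocal-music":
        detail = (
            f"AI vocals in {done['vocal_ai_percentage']:.0f}% of the clip "
            f"(confidence {done['vocal_ai_confidence']:.2f})"
        )
    elif verdict == "ai-instrumental":
        detail = (
            f"instrumental AI score {done['instrumental_ai_percentage']:.0f} "
            f"(confidence {done['instrumental_ai_confidence']:.2f})"
        )
    elif verdict == "not-ai-music":
        detail = "no AI-generated music detected"
    else:
        # Fail loudly on a verdict this sketch does not know about.
        raise ValueError(f"unexpected primary_verdict: {verdict}")
    return f"{seconds:.1f}s, {done['window_count']} windows: {detail}"
```

Raising on an unrecognised verdict, rather than falling through silently, makes it obvious when the set of verdict strings changes.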
WebSocket close codes
| Code | Meaning |
|---|---|
| 1000 | Normal closure after the done message, or after the connection completes |
| 1003 | Invalid or missing query parameters (unknown format, bad sample rate, missing audio_format) |
| 4002 | Audio could not be decoded, or does not match the declared raw audio_format |
| 4003 | The request could not be validated, or the request is not permitted |
| 4029 | The request could not be completed due to insufficient credits |
An error message is sent before the connection closes for the 1003, 4002, 4003, and 4029 cases.
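A client can map the close code to an exception once the connection ends. This is a minimal sketch: `StreamClosedError`, `CLOSE_MEANINGS`, and `check_close` are hypothetical names, and how the close code is obtained depends on the WebSocket library in use.

```python
CLOSE_MEANINGS = {
    1003: "invalid or missing query parameters",
    4002: "audio could not be decoded or does not match the declared format",
    4003: "request could not be validated or is not permitted",
    4029: "insufficient credits",
}


class StreamClosedError(Exception):
    """Raised when the connection closes with a non-normal code."""

    def __init__(self, code, reason):
        super().__init__(f"closed with code {code}: {reason}")
        self.code = code
        self.reason = reason


def check_close(code, server_error=None):
    """Return on normal closure (1000); raise StreamClosedError otherwise.

    `server_error` is the text of the error message received before the
    close, if any; it takes precedence over the generic meaning.
    """
    if code == 1000:
        return
    reason = server_error or CLOSE_MEANINGS.get(code, "undocumented close code")
    raise StreamClosedError(code, reason)
```

Preferring the server's own error text over the table entry keeps the more specific diagnosis when one is available.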
Rate limits
- Concurrent connection limits apply per organization
- Monthly usage limits (in audio hours) apply per organization