The music detection streaming API classifies audio as music, speech, or neither, returning frame-level probabilities progressively as audio arrives over WebSocket.

Connect and stream

1. Authenticate

Pass your API key as the api_key query parameter on the connection URL. Unlike the batch endpoints, the streaming endpoint does not use an X-API-Key header.
wss://modulate-developer-apis.com/api/velma-2-music-detection-streaming?api_key=YOUR_API_KEY&audio_format=s16le&sample_rate=16000&num_channels=1
See Authentication and rate limits for how to obtain a key.
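As a minimal sketch, the connection URL can be assembled with Python's standard library so that the key and parameters are percent-encoded. The helper name `build_stream_url` is hypothetical and not part of the API; the parameter names are taken from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "wss://modulate-developer-apis.com/api/velma-2-music-detection-streaming"


def build_stream_url(api_key: str, audio_format: str, **pcm_params: int) -> str:
    """Return the WebSocket URL with the key and audio settings as query parameters.

    Hypothetical helper for illustration; extra keyword arguments such as
    sample_rate and num_channels are appended in the order given.
    """
    params = {"api_key": api_key, "audio_format": audio_format, **pcm_params}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_stream_url("YOUR_API_KEY", "s16le", sample_rate=16000, num_channels=1)
print(url)
```

Using `urlencode` rather than string concatenation keeps the URL valid if a key ever contains characters such as `&` or spaces.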
2. Choose your audio format

Set audio_format to match what you’re sending.
Container formats (no sample_rate or num_channels needed): wav, mp3, ogg, flac, webm, aac, aiff
Raw PCM formats (sample_rate and num_channels required): s16le, s16be, s32le, s32be, f32le, f32be, and others; see the API reference for the full list.
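The rule above can be checked on the client before connecting. The following is a sketch only: `audio_query_params` is a hypothetical helper, and it assumes that any format not in the container list is raw PCM, which may not hold for every format in the API reference.

```python
from typing import Dict, Optional, Union

# Container formats named above; these carry their own sample rate and channel count.
CONTAINER_FORMATS = frozenset({"wav", "mp3", "ogg", "flac", "webm", "aac", "aiff"})


def audio_query_params(
    audio_format: str,
    sample_rate: Optional[int] = None,
    num_channels: Optional[int] = None,
) -> Dict[str, Union[str, int]]:
    """Return the audio-related query parameters for a given format.

    Assumption: every non-container format is raw PCM and therefore needs
    both sample_rate and num_channels.
    """
    if audio_format in CONTAINER_FORMATS:
        return {"audio_format": audio_format}
    if sample_rate is None or num_channels is None:
        raise ValueError(
            f"raw PCM format {audio_format!r} requires sample_rate and num_channels"
        )
    return {
        "audio_format": audio_format,
        "sample_rate": sample_rate,
        "num_channels": num_channels,
    }


print(audio_query_params("mp3"))
print(audio_query_params("s16le", sample_rate=16000, num_channels=1))
```

Failing early with a `ValueError` is a design choice for this sketch; how the server itself responds to missing parameters is not described here.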
3. Stream audio and read frames

Send audio as binary WebSocket frames in any chunk size. The server emits a frame message after each 192ms of audio processed:
{
  "type": "frame",
  "frame": {
    "start_time_ms": 0,
    "end_time_ms": 192,
    "music_prob": 0.0213,
    "speech_prob": 0.9888
  }
}
Music and speech probabilities are independent — both can be high simultaneously (e.g. music with vocals).
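Because the two probabilities are independent, a client has to decide for itself how to turn them into labels. Here is one possible sketch; the 0.5 cutoff and the function name `describe_frame` are assumptions for illustration, not values or names defined by the API.

```python
import json

THRESHOLD = 0.5  # assumed cutoff; the API does not prescribe one


def describe_frame(raw: str, threshold: float = THRESHOLD) -> str:
    """Label a single frame message using independent music and speech cutoffs."""
    frame = json.loads(raw)["frame"]
    has_music = frame["music_prob"] >= threshold
    has_speech = frame["speech_prob"] >= threshold
    if has_music and has_speech:
        return "music and speech"
    if has_music:
        return "music"
    if has_speech:
        return "speech"
    return "neither"


example = (
    '{"type": "frame", "frame": {"start_time_ms": 0, "end_time_ms": 192, '
    '"music_prob": 0.0213, "speech_prob": 0.9888}}'
)
print(describe_frame(example))  # speech
```

Testing each probability separately, rather than picking the larger of the two, keeps the "music with vocals" case visible.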
4. Signal end of stream

Send an empty text frame ("") when you’re done sending audio. The server will flush any remaining audio and respond with a done message:
{
  "type": "done",
  "duration_ms": 15360,
  "frame_count": 80,
  "music_pct": 3.2,
  "speech_pct": 94.1,
  "primary_label": "speech"
}
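The done message can be read into a small typed structure. This is a sketch: the `StreamSummary` class and `parse_done` function are hypothetical names, and the field list is taken only from the example above. In that example, 80 frames of 192 ms account for the full 15360 ms; whether this equality holds when the final flushed frame is shorter is not stated here, so the check below is illustrative only.

```python
import json
from dataclasses import dataclass

FRAME_MS = 192  # frame length stated in the previous step


@dataclass
class StreamSummary:
    duration_ms: int
    frame_count: int
    music_pct: float
    speech_pct: float
    primary_label: str


def parse_done(raw: str) -> StreamSummary:
    """Parse a done message into a StreamSummary, rejecting other message types."""
    msg = json.loads(raw)
    if msg.get("type") != "done":
        raise ValueError(f"expected a done message, got {msg.get('type')!r}")
    return StreamSummary(
        duration_ms=msg["duration_ms"],
        frame_count=msg["frame_count"],
        music_pct=msg["music_pct"],
        speech_pct=msg["speech_pct"],
        primary_label=msg["primary_label"],
    )


example = (
    '{"type": "done", "duration_ms": 15360, "frame_count": 80, '
    '"music_pct": 3.2, "speech_pct": 94.1, "primary_label": "speech"}'
)
summary = parse_done(example)
print(summary.primary_label)
print(summary.frame_count * FRAME_MS == summary.duration_ms)
```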

Code examples

import asyncio
import json

import websockets

WS_URL = "wss://modulate-developer-apis.com/api/velma-2-music-detection-streaming"
API_KEY = "YOUR_API_KEY"

async def stream_audio(file_path: str) -> None:
    url = (
        f"{WS_URL}?api_key={API_KEY}"
        f"&audio_format=s16le&sample_rate=16000&num_channels=1"
    )
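    async with websockets.connect(url) as ws:
        # Send the raw PCM file as binary frames.
        with open(file_path, "rb") as f:
            while chunk := f.read(16000):  # 0.5s of s16le/16kHz mono
                await ws.send(chunk)

        # An empty text frame signals end of stream.
        await ws.send("")

        # Read messages until the server sends "done".
        async for message in ws: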
            msg = json.loads(message)
            if msg["type"] == "frame":
                frame = msg["frame"]
                print(
                    f"{frame['start_time_ms']}ms – {frame['end_time_ms']}ms  "
                    f"music={frame['music_prob']:.4f}  speech={frame['speech_prob']:.4f}"
                )
            elif msg["type"] == "done":
                print(f"\nDone — {msg['duration_ms']}ms, {msg['frame_count']} frames")
                print(f"Primary label: {msg['primary_label']}")
                break
            elif msg["type"] == "error":
                raise RuntimeError(f"Server error: {msg['error']}")

asyncio.run(stream_audio("/path/to/audio.raw"))
WebSocket APIs cannot be tested with cURL. For command-line testing, use websocat.
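The `f.read(16000)` call in the Python example reads half a second of audio because s16le at 16 kHz mono is 32000 bytes per second. The sketch below generalizes that arithmetic to other raw PCM settings. The server accepts any chunk size, so this is a convenience only; `chunk_bytes` and the `BYTES_PER_SAMPLE` table are hypothetical names, and the table covers only the raw formats listed earlier.

```python
# Sample width in bytes for the raw PCM formats named in this guide.
BYTES_PER_SAMPLE = {
    "s16le": 2, "s16be": 2,
    "s32le": 4, "s32be": 4,
    "f32le": 4, "f32be": 4,
}


def chunk_bytes(audio_format: str, sample_rate: int, num_channels: int, duration_ms: int) -> int:
    """Return the number of bytes covering duration_ms of raw PCM audio."""
    bytes_per_second = sample_rate * num_channels * BYTES_PER_SAMPLE[audio_format]
    return bytes_per_second * duration_ms // 1000


print(chunk_bytes("s16le", 16000, 1, 500))  # 16000, the chunk size used in the example
print(chunk_bytes("s16le", 16000, 1, 192))  # 6144, one 192 ms frame
```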

Next steps