Music & Speech Detection quickstart

The Music & Speech Detection streaming API classifies audio as music, speech, or neither, returning frame-level probabilities progressively as audio arrives over WebSocket.

Connect and stream

Authenticate

Pass your API key as a query parameter on the connection URL. Unlike the batch endpoints, there is no X-API-Key header.

wss://platform.modulate.ai/api/velma-2-music-detection-streaming?api_key=YOUR_API_KEY&audio_format=s16le&sample_rate=16000&num_channels=1

See Authentication and rate limits for how to obtain a key.
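
The connection URL above can also be assembled in code rather than by hand. The following is a minimal sketch using only the Python standard library. `build_stream_url` is a hypothetical helper name, not part of any SDK; the query parameter names are the ones shown in the URL above.

```python
from urllib.parse import urlencode

WS_URL = "wss://platform.modulate.ai/api/velma-2-music-detection-streaming"


def build_stream_url(api_key, audio_format, sample_rate=None, num_channels=None):
    """Return the WebSocket URL with the API key and audio parameters as a query string."""
    params = {"api_key": api_key, "audio_format": audio_format}
    # sample_rate and num_channels are only added when the caller supplies them.
    if sample_rate is not None:
        params["sample_rate"] = sample_rate
    if num_channels is not None:
        params["num_channels"] = num_channels
    # urlencode percent-escapes characters that are unsafe in a query string.
    return f"{WS_URL}?{urlencode(params)}"


print(build_stream_url("YOUR_API_KEY", "s16le", sample_rate=16000, num_channels=1))
```

Encoding the query string this way avoids a malformed URL if a key ever contains characters such as `&` or `=`.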

Choose your audio format

Set audio_format to match what you’re sending.

Container formats (no sample_rate or num_channels needed): wav, mp3, ogg, flac, webm, aac, aiff

Raw PCM formats (sample_rate and num_channels required): s16le, s16be, s32le, s32be, f32le, f32be, and others; see the API reference for the full list.
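
The format rules above can be checked on the client before connecting. The sketch below is illustrative only: `check_format` and `chunk_bytes` are hypothetical helpers, the format sets contain only the names listed in this quickstart (the API reference lists more), and the bytes-per-sample values are inferred from the format names (s16 is 16-bit, s32 and f32 are 32-bit).

```python
# Formats named in this quickstart; the API reference lists more raw PCM formats.
CONTAINER_FORMATS = {"wav", "mp3", "ogg", "flac", "webm", "aac", "aiff"}
# Bytes per sample, inferred from the format name (an assumption, not an API field).
RAW_PCM_SAMPLE_BYTES = {
    "s16le": 2, "s16be": 2,
    "s32le": 4, "s32be": 4,
    "f32le": 4, "f32be": 4,
}


def check_format(audio_format, sample_rate=None, num_channels=None):
    """Raise ValueError if the parameters do not fit the rules described above."""
    if audio_format in CONTAINER_FORMATS:
        return  # container formats need no extra parameters
    if audio_format in RAW_PCM_SAMPLE_BYTES:
        if sample_rate is None or num_channels is None:
            raise ValueError(f"{audio_format} requires sample_rate and num_channels")
        return
    raise ValueError(f"{audio_format!r} is not one of the formats listed in this quickstart")


def chunk_bytes(audio_format, sample_rate, num_channels, chunk_ms):
    """Size in bytes of chunk_ms milliseconds of raw PCM audio."""
    bytes_per_sample_frame = RAW_PCM_SAMPLE_BYTES[audio_format] * num_channels
    # Count whole sample frames only, so a chunk never ends in the middle of a sample.
    return (sample_rate * chunk_ms // 1000) * bytes_per_sample_frame


check_format("mp3")
check_format("s16le", sample_rate=16000, num_channels=1)
print(chunk_bytes("s16le", 16000, 1, 500))  # 16000
```

For s16le at 16 kHz mono, half a second of audio is 16000 bytes.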

Stream audio and read frames

Send audio as binary WebSocket frames in any chunk size. The server emits one frame message for each 192 ms of audio it processes:

{
  "type": "frame",
  "frame": {
    "start_time_ms": 0,
    "end_time_ms": 192,
    "music_prob": 0.0213,
    "speech_prob": 0.9888
  }
}

Music and speech probabilities are independent — both can be high simultaneously (e.g. music with vocals).
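
Because the two probabilities are independent, a client would typically test each one separately rather than pick the larger. A minimal sketch, assuming a threshold of 0.5 (an arbitrary value chosen for illustration, not one the API defines); `Frame`, `parse_frame`, and `labels` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class Frame:
    start_time_ms: int
    end_time_ms: int
    music_prob: float
    speech_prob: float


def parse_frame(message):
    """Return a Frame for a "frame" message, or None for any other message type."""
    msg = json.loads(message)
    if msg.get("type") != "frame":
        return None
    f = msg["frame"]
    # Pick fields explicitly so unknown extra fields do not break parsing.
    return Frame(f["start_time_ms"], f["end_time_ms"], f["music_prob"], f["speech_prob"])


def labels(frame, threshold=0.5):
    """Labels whose probability meets the threshold; each label is tested on its own."""
    found = []
    if frame.music_prob >= threshold:
        found.append("music")
    if frame.speech_prob >= threshold:
        found.append("speech")
    return found


sample = (
    '{"type": "frame", "frame": {"start_time_ms": 0, "end_time_ms": 192, '
    '"music_prob": 0.0213, "speech_prob": 0.9888}}'
)
print(labels(parse_frame(sample)))  # ['speech']
```

A frame of music with vocals could return both labels, and a silent frame could return none.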

Signal end of stream

Send an empty text frame ("") when you’re done sending audio. The server will flush any remaining audio and respond with a done message:

{
  "type": "done",
  "duration_ms": 15360,
  "frame_count": 80,
  "music_pct": 3.2,
  "speech_pct": 94.1,
  "primary_label": "speech"
}
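
The sample done message can be cross-checked against the 192 ms frame length: 15360 ms of audio divides into exactly 80 frames. The sketch below does that arithmetic; `full_frame_count` is a hypothetical helper, and how the server counts a trailing partial frame is not stated here, so treat the comparison as a sanity check rather than a guarantee.

```python
import json

FRAME_MS = 192  # frame length stated in this quickstart


def full_frame_count(duration_ms, frame_ms=FRAME_MS):
    """Number of complete frames that fit in duration_ms."""
    return duration_ms // frame_ms


done = json.loads("""
{
  "type": "done",
  "duration_ms": 15360,
  "frame_count": 80,
  "music_pct": 3.2,
  "speech_pct": 94.1,
  "primary_label": "speech"
}
""")

print(full_frame_count(done["duration_ms"]) == done["frame_count"])  # True
print(f"{done['primary_label']}: music {done['music_pct']}%, speech {done['speech_pct']}%")
```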

Code examples

Three variants follow, in this order: Python with raw PCM, Python with a container format, and JavaScript (Node.js).

Python, raw PCM

import asyncio
import websockets
import json

WS_URL = "wss://platform.modulate.ai/api/velma-2-music-detection-streaming"
API_KEY = "YOUR_API_KEY"

async def stream_audio(file_path: str) -> None:
    url = (
        f"{WS_URL}?api_key={API_KEY}"
        f"&audio_format=s16le&sample_rate=16000&num_channels=1"
    )
    async with websockets.connect(url) as ws:
        with open(file_path, "rb") as f:
            while chunk := f.read(16000):  # 0.5s of s16le/16kHz mono
                await ws.send(chunk)

        # An empty text frame tells the server that no more audio is coming.
        await ws.send("")

        async for message in ws:
            msg = json.loads(message)
            if msg["type"] == "frame":
                frame = msg["frame"]
                print(
                    f"{frame['start_time_ms']}ms – {frame['end_time_ms']}ms  "
                    f"music={frame['music_prob']:.4f}  speech={frame['speech_prob']:.4f}"
                )
            elif msg["type"] == "done":
                print(f"\nDone — {msg['duration_ms']}ms, {msg['frame_count']} frames")
                print(f"Primary label: {msg['primary_label']}")
                break
            elif msg["type"] == "error":
                raise RuntimeError(f"Server error: {msg['error']}")

asyncio.run(stream_audio("/path/to/audio.raw"))

Python, container format

import asyncio
import websockets
import json

WS_URL = "wss://platform.modulate.ai/api/velma-2-music-detection-streaming"
API_KEY = "YOUR_API_KEY"

async def stream_audio_file(file_path: str, audio_format: str) -> None:
    url = f"{WS_URL}?api_key={API_KEY}&audio_format={audio_format}"
    async with websockets.connect(url) as ws:
        with open(file_path, "rb") as f:
            while chunk := f.read(65536):
                await ws.send(chunk)
        # An empty text frame tells the server that no more audio is coming.
        await ws.send("")

        async for message in ws:
            msg = json.loads(message)
            if msg["type"] == "frame":
                frame = msg["frame"]
                print(
                    f"{frame['start_time_ms']}ms – {frame['end_time_ms']}ms  "
                    f"music={frame['music_prob']:.4f}  speech={frame['speech_prob']:.4f}"
                )
            elif msg["type"] == "done":
                print(f"\nDone — {msg['duration_ms']}ms, {msg['frame_count']} frames")
                print(f"Primary label: {msg['primary_label']}")
                break
            elif msg["type"] == "error":
                raise RuntimeError(f"Server error: {msg['error']}")

asyncio.run(stream_audio_file("/path/to/audio.mp3", "mp3"))
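
Both Python examples send the whole file before reading any replies. Since the server emits frames as audio arrives, sending and receiving can also run at the same time, which suits live sources. The following is a sketch under stated assumptions: `ws` is a connection object like the one returned by `websockets.connect` (an async `send` method plus `async for` iteration), and `send_audio`, `read_results`, and `stream_concurrently` are hypothetical helper names.

```python
import asyncio
import json


async def send_audio(ws, chunks):
    """Send each binary chunk, then the empty text frame that ends the stream."""
    for chunk in chunks:
        await ws.send(chunk)
    await ws.send("")


async def read_results(ws, on_frame):
    """Call on_frame for every frame message and return the final done message."""
    async for message in ws:
        msg = json.loads(message)
        if msg["type"] == "frame":
            on_frame(msg["frame"])
        elif msg["type"] == "done":
            return msg
        elif msg["type"] == "error":
            raise RuntimeError(f"Server error: {msg['error']}")
    raise RuntimeError("Connection closed before a done message arrived")


async def stream_concurrently(ws, chunks, on_frame):
    """Run the sender and the reader at the same time on one connection."""
    _, done = await asyncio.gather(send_audio(ws, chunks), read_results(ws, on_frame))
    return done
```

Inside `async with websockets.connect(url) as ws:`, a call such as `done = await stream_concurrently(ws, chunks, print)` would print each frame as it arrives, where `chunks` is any iterable of `bytes` objects.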

JavaScript (Node.js)

import { WebSocket } from "ws";
import { createReadStream } from "fs";

const WS_URL = "wss://platform.modulate.ai/api/velma-2-music-detection-streaming";
const API_KEY = "YOUR_API_KEY";

async function streamAudio(filePath, audioFormat) {
  const url = `${WS_URL}?api_key=${API_KEY}&audio_format=${audioFormat}`;
  const ws = new WebSocket(url);

  await new Promise((resolve, reject) => {
    ws.on("open", () => {
      const stream = createReadStream(filePath, { highWaterMark: 65536 });
      stream.on("data", (chunk) => ws.send(chunk));
      stream.on("end", () => ws.send(""));
      stream.on("error", reject);
    });

    ws.on("message", (data) => {
      // ws delivers messages as Buffers; decode to a string before parsing.
      const msg = JSON.parse(data.toString());
      if (msg.type === "frame") {
        const { start_time_ms, end_time_ms, music_prob, speech_prob } = msg.frame;
        console.log(
          `${start_time_ms}ms – ${end_time_ms}ms  ` +
          `music=${music_prob.toFixed(4)}  speech=${speech_prob.toFixed(4)}`
        );
      } else if (msg.type === "done") {
        console.log(`\nDone — ${msg.duration_ms}ms, ${msg.frame_count} frames`);
        console.log(`Primary label: ${msg.primary_label}`);
        ws.close();
        resolve();
      } else if (msg.type === "error") {
        reject(new Error(`Server error: ${msg.error}`));
      }
    });

    ws.on("error", reject);
  });
}

await streamAudio("/path/to/audio.mp3", "mp3");

WebSocket APIs cannot be tested with cURL. For command-line testing, use websocat.

Next steps

Music & Speech Detection streaming reference — full protocol docs, all close codes, and parameter reference
Music & Speech Detection batch — classify a complete file with a single HTTP POST
Audio formats — supported formats across all APIs
