Velma

Velma analyzes audio signals alongside the words to surface behaviors and risks in voice conversations, such as fraud, customer churn, and compliance violations. Choose from 150+ pre-built behaviors, available as ready-made presets from a catalog, or define your own in plain language. Configuration is either the built-in `default` or a JSON `BatchConfig`.

| | Batch | Streaming |
| --- | --- | --- |
| Use case | Analyze a complete recording | Analyze a live conversation in real time |
| Protocol | HTTP POST (multipart upload) | WebSocket |
| Configuration | `config` form field: `default` or a JSON `BatchConfig` | First text frame: `default` or a JSON `BatchConfig` |
| Output | Full `BatchResponse` after processing | Discrete events emitted as results are produced |
| Max file size | 100 MB | Not applicable (streaming) |
| Transcription + diarization | ✓ | ✓ |
| Conversation-type & participant-role inference | ✓ | ✓ |
| Behavior detection (with presets) | ✓ | ✓ |
| Topics, topic sentiment, summary | ✓ | ✓ |

For a side-by-side comparison with the other Modulate capabilities, see "Which API should I use?".

Configuration

Both endpoints take the same configuration: either the literal string `default` to use the built-in configuration, or a JSON `BatchConfig` describing the conversation types, participant roles, behaviors, STT options, and which aggregate outputs (topics, sentiments, summary) to produce. The full `BatchConfig` schema is rendered on the Batch reference. Behaviors can be specified inline or referenced from a catalog of presets using the `preset:<identifier>` syntax. To see the available presets, use List behavior presets.
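The two accepted configuration values can be sketched in Python as follows. This is a minimal illustration, not the real schema: the `behaviors` field name and the `example-preset` identifier are hypothetical placeholders, and the helper names `preset_ref` and `encode_config` are invented for this example. Consult the Batch reference for the actual `BatchConfig` fields.

```python
import json

# Literal string that selects the built-in configuration.
DEFAULT_CONFIG = "default"


def preset_ref(identifier: str) -> str:
    """Return a behavior reference in the preset:<identifier> form."""
    return f"preset:{identifier}"


def encode_config(config) -> str:
    """Serialize a configuration value for sending.

    The literal string "default" is passed through unchanged; anything
    else is assumed to be a BatchConfig-shaped dict and is JSON-encoded.
    """
    if config == DEFAULT_CONFIG:
        return DEFAULT_CONFIG
    return json.dumps(config)


# Hypothetical BatchConfig: the field name and preset identifier are
# placeholders for illustration, not taken from the real schema.
custom_config = {
    "behaviors": [preset_ref("example-preset")],
}

print(encode_config(DEFAULT_CONFIG))
print(encode_config(custom_config))
```

The resulting string is what would go into the `config` form field for Batch, or into the first text frame for Streaming.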

Authentication

Batch uses the `X-API-Key` header. Streaming uses an `api_key` query parameter at connection time. See Authentication and rate limits.
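The difference between the two authentication styles can be sketched with the Python standard library alone. The endpoint URLs below are hypothetical placeholders (the real ones are on the reference pages), and the helper names `batch_headers` and `streaming_url` are invented for this example. Only the `X-API-Key` header name and the `api_key` query parameter come from the description above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical endpoints; substitute the real URLs from the API reference.
BATCH_URL = "https://api.example.com/velma/batch"
STREAMING_URL = "wss://api.example.com/velma/stream"


def batch_headers(api_key: str) -> dict:
    """Batch: the key travels in the X-API-Key request header."""
    return {"X-API-Key": api_key}


def streaming_url(base_url: str, api_key: str) -> str:
    """Streaming: the key travels as an api_key query parameter."""
    parts = urlsplit(base_url)
    query = urlencode({"api_key": api_key})
    if parts.query:
        # Preserve any query parameters already present on the URL.
        query = f"{parts.query}&{query}"
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


print(batch_headers("YOUR_API_KEY"))
print(streaming_url(STREAMING_URL, "YOUR_API_KEY"))
```

The headers dict would be passed to whichever HTTP client performs the multipart POST, and the URL to whichever WebSocket client opens the connection. Because the streaming key is part of the URL, it is worth keeping such URLs out of logs.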
