Velma-2 is Modulate’s Conversation Understanding model. It analyzes both the words spoken and the audio signals around them (tone, emotion, intent, and dozens of other acoustic cues) to capture the true intent and meaning behind a conversation. Use Velma to surface risks or specific “behaviors” in voice conversations, such as fraud, customer churn, and compliance violations. It ships with 150+ pre-configured behaviors covering common conversation topics and business use cases, and you can define your own by describing what to detect in a natural-language prompt.

Every response also includes a conversation summary, emotion and accent detection, speaker identification, conversation topics, synthetic voice detection, sentiment, a diarized transcript, and more. The power of Velma is that you describe what you want to detect, and Velma uses audio signals to detect it.

Build with Velma

These are just examples — if you can describe it, Velma can detect it.

Fraud Prevention

Detect account impersonation, vishing, and social engineering in real time. Monitor agent SOP compliance on the same call.

Trust and Safety

Deploy Velma’s full safety catalog against live audio. Flag harmful content the moment it happens, not after the damage is done.

Full Call Analysis

The whole conversation analyzed. Summary, sentiment, speakers, behaviors, and risks in one response.

Batch or streaming?

| | Velma-2 Batch | Velma-2 Streaming |
|---|---|---|
| Use case | Analyze a complete recording | Analyze a live or in-progress conversation |
| Protocol | HTTP POST | WebSocket |
| Response | Single JSON response | Stream of typed events |
| Best for | Post-call QA, compliance review, offline processing | Live monitoring, real-time alerting, in-call coaching |

What Velma produces

Behavior detection

Per-behavior verdicts with confidence scores and the specific clips that triggered each detection.

Conversation type

Classification of the conversation against the types you define — customer support, sales, interview, and more.

Participant roles

Per-speaker role assignments drawn from the roles you configure, resolved in real time as speakers are identified.

Topics

An aggregated list of the subjects discussed across the full conversation.

Topic sentiment

Per-speaker sentiment scores for each extracted topic, ranging from −1 (negative) to +1 (positive).

Summary

A free-form narrative summary of the conversation generated at end of stream.
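
As a rough illustration of how the outputs above might be consumed, the sketch below filters behavior verdicts by confidence and finds the most negative topic sentiment score. The response layout is an assumption made for this example: the key names `behaviors`, `detected`, `confidence`, `topic_sentiment`, `topic`, `speaker`, and `score` are hypothetical and are not taken from Velma's actual response schema, and the sample values are invented. Only the −1 to +1 sentiment range comes from the description above.

```python
# Hypothetical response fragment. Every key name here is an assumption
# for illustration, not Velma's documented schema.
sample_response = {
    "behaviors": [
        {"name": "account_impersonation", "detected": True, "confidence": 0.93},
        {"name": "customer_churn", "detected": True, "confidence": 0.41},
        {"name": "compliance_violation", "detected": False, "confidence": 0.88},
    ],
    "topic_sentiment": [
        {"topic": "billing", "speaker": "speaker_1", "score": -0.7},
        {"topic": "billing", "speaker": "speaker_2", "score": 0.1},
        {"topic": "onboarding", "speaker": "speaker_1", "score": 0.6},
    ],
}


def flagged_behaviors(response, min_confidence=0.8):
    """Return names of behaviors detected at or above min_confidence."""
    return [
        b["name"]
        for b in response.get("behaviors", [])
        if b.get("detected") and b.get("confidence", 0.0) >= min_confidence
    ]


def most_negative_topic(response):
    """Return (topic, speaker, score) for the lowest sentiment score, or None."""
    scores = response.get("topic_sentiment", [])
    if not scores:
        return None
    worst = min(scores, key=lambda s: s["score"])
    return worst["topic"], worst["speaker"], worst["score"]


flagged = flagged_behaviors(sample_response)
worst_topic = most_negative_topic(sample_response)
```

The 0.8 threshold is arbitrary. In practice it would be tuned per behavior, since the cost of a missed fraud attempt differs from the cost of a false churn alert.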

How it works

Velma-2 Batch is an HTTP POST endpoint at /api/velma-2-batch. Submit a complete audio file as multipart/form-data: the upload_file field carries the audio (up to 100 MB), and the config field carries either a JSON-encoded BatchConfig or the literal string "default" to use Velma’s built-in defaults.

Velma processes the full recording and returns a single JSON response containing all clips, role assignments, behavior detections, topics, topic sentiment scores, and a summary.
curl -X POST https://modulate-developer-apis.com/api/velma-2-batch \
  -H "X-API-Key: $MODULATE_API_KEY" \
  -F "upload_file=@recording.mp3" \
  -F "config=default"
Use batch when you have a finished recording and want a complete analysis in one call — post-call QA, compliance review, or processing an uploaded file.
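
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library, not an official client. The URL, the `X-API-Key` header, and the `upload_file` and `config` field names come from the curl example. The helper names `build_batch_request` and `analyze_recording` are hypothetical, and sending the audio part as `application/octet-stream` is an assumption about what the server accepts. Because the BatchConfig fields are not described here, a custom config is passed as an opaque dictionary and simply JSON-encoded.

```python
import json
import os
import urllib.request
import uuid

BATCH_URL = "https://modulate-developer-apis.com/api/velma-2-batch"


def build_batch_request(audio_bytes, filename, api_key, config="default"):
    """Build (but do not send) a multipart/form-data POST for Velma-2 Batch."""
    # The config field is either the literal string "default" or JSON text.
    config_text = config if isinstance(config, str) else json.dumps(config)

    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = b"".join([
        b"--" + boundary.encode() + crlf,
        b'Content-Disposition: form-data; name="config"' + crlf + crlf,
        config_text.encode("utf-8") + crlf,
        b"--" + boundary.encode() + crlf,
        ('Content-Disposition: form-data; name="upload_file"; '
         f'filename="{filename}"').encode("utf-8") + crlf,
        # Assumption: a generic binary content type is accepted for audio.
        b"Content-Type: application/octet-stream" + crlf + crlf,
        audio_bytes + crlf,
        b"--" + boundary.encode() + b"--" + crlf,
    ])

    return urllib.request.Request(
        BATCH_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def analyze_recording(path, api_key, config="default"):
    """Upload one recording and return the decoded JSON response."""
    with open(path, "rb") as f:
        audio = f.read()
    request = build_batch_request(audio, os.path.basename(path), api_key, config)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `analyze_recording("recording.mp3", os.environ["MODULATE_API_KEY"])` would mirror the curl command. Splitting request construction from sending keeps the multipart encoding testable without network access.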

How Velma differs from the other endpoints

| | Velma-2 | STT | SVD |
|---|---|---|---|
| Primary output | Conversation intelligence | Transcript | Deepfake scores |
| Behavior detection | ✓ (configurable) | | |
| Conversation classification | ✓ | | |
| Topic and sentiment analysis | ✓ | | |
| Summarization | ✓ | | |
| Transcription | ✓ (via STT options) | ✓ | |
| Batch and streaming | ✓ | | |
Velma includes transcription as part of its output via STT options in your configuration. If transcription alone is what you need, the STT endpoints are the right choice.

Next steps

Capabilities

Explore all of Velma’s analysis outputs and configuration options in detail.

Behaviors

Learn how to define the signals you want Velma to detect.