What is Velma?

Velma is Modulate’s Conversation Understanding model. It analyzes both the words spoken and the audio signal (tone, emotion, intent, and dozens of other acoustic cues) to capture the true intent and meaning behind a conversation. Use Velma to surface risks or specific “behaviors” in voice conversations, such as fraud, customer churn, and compliance violations. It ships with 150+ pre-configured behaviors covering common conversation topics and business use cases, and you can define your own by describing what to detect in a natural-language prompt.

Every response also includes a conversation summary, emotion and accent detection, speaker identification, conversation topics, deepfake detection, sentiment, a diarized transcript, and more. In short, you describe what you want to detect, and Velma uses the audio signals to detect it.

Build with Velma

Start from a detection package — a ready-made configuration that groups the conversation types, roles, and behaviors for a common use case. Explore what each detects, then download and adapt it.

Fraud Detection and Prevention

Catch social engineering, account-takeover attempts, and impersonation in customer conversations.

Agentic AI Guardrails

Keep AI voice agents on-policy: catch off-script, unsafe, or out-of-scope behavior.

Trust and Safety

Flag harassment, hate, and safety violations in live social and voice-chat audio.

Customer Retention

Surface churn signals, dissatisfaction, and save opportunities on every call.

Human Agent Welfare

Protect agents by detecting abuse, distress, and burnout indicators.

Compliance and Risk Monitoring

Catch disclosure gaps, compliance breaches, and risk events across regulated calls.

Batch or streaming?

	Velma (batch)	Velma (streaming)
Use case	Analyze a complete recording	Analyze a live or in-progress conversation
Protocol	HTTP POST	WebSocket
Response	Single JSON response	Stream of typed events
Best for	Post-call QA, compliance review, offline processing	Live monitoring, real-time alerting, in-call coaching

What Velma produces

Behavior detection

Per-behavior verdicts with confidence scores and the specific clips that triggered each detection.

Conversation type

Classification of the conversation against the types you define — customer support, sales, interview, and more.

Participant roles

Per-speaker role assignments drawn from the roles you configure, resolved in real time as speakers are identified.

Topics

An aggregated list of the subjects discussed across the full conversation.

Topic sentiment

Per-speaker sentiment scores for each extracted topic, ranging from −1 (negative) to +1 (positive).

Summary

A free-form narrative summary of the conversation generated at end of stream.
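To make the −1 to +1 topic sentiment range concrete, here is a small Python sketch that turns scores into coarse labels. The nested `{speaker: {topic: score}}` shape, the speaker and topic names, and the ±0.25 neutral band are all assumptions chosen for illustration; they are not Velma's response schema or its thresholds.

```python
from typing import Dict


def label_sentiment(score: float, neutral_band: float = 0.25) -> str:
    """Map a score in [-1, +1] to a coarse label.

    The neutral band is an illustrative assumption, not a Velma setting.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment scores are expected in the range -1 to +1")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Hypothetical per-speaker, per-topic scores (not real Velma output).
scores: Dict[str, Dict[str, float]] = {
    "speaker_1": {"billing": -0.8, "onboarding": 0.1},
    "speaker_2": {"billing": 0.6},
}

labels = {
    speaker: {topic: label_sentiment(value) for topic, value in topics.items()}
    for speaker, topics in scores.items()
}
print(labels)
```

Adapt the lookup to whatever structure the actual response uses; only the score range comes from the description above.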

How it works

Batch

Velma (batch) is an HTTP POST endpoint at /api/velma-2-batch. Submit a complete audio file as multipart/form-data: the upload_file field carries the audio (up to 100 MB), and the config field carries either a JSON-encoded BatchConfig or the literal string "default" to use Velma’s built-in defaults.

Velma processes the full recording and returns a single JSON response containing all clips, role assignments, behavior detections, topics, topic sentiment scores, and a summary.

curl -X POST https://platform.modulate.ai/api/velma-2-batch \
  -H "X-API-Key: $MODULATE_API_KEY" \
  -F "upload_file=@recording.mp3" \
  -F "config=default"

Use batch when you have a finished recording and want a complete analysis in one call — post-call QA, compliance review, or processing an uploaded file.
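The two form fields of a batch request can be prepared as in the following minimal Python sketch. It only builds and validates the field values; it does not send anything. The helper name `build_batch_fields`, the reading of "100 MB" as 100 × 1024 × 1024 bytes, and the keys in the example config are assumptions for illustration, not part of the documented API.

```python
import json
from typing import Optional

# Assumption: "100 MB" is read as 100 * 1024 * 1024 bytes; the exact
# cutoff applied by the service is not stated.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def build_batch_fields(audio: bytes, config: Optional[dict] = None) -> dict:
    """Return the upload_file and config form fields for a batch request.

    config=None selects the literal string "default"; otherwise the
    dict is JSON-encoded, which is what the config field carries.
    """
    if not audio:
        raise ValueError("audio is empty")
    if len(audio) > MAX_UPLOAD_BYTES:
        raise ValueError("audio exceeds the 100 MB upload limit")
    config_value = "default" if config is None else json.dumps(config)
    return {"upload_file": audio, "config": config_value}


# Hypothetical config key, shown only to illustrate the JSON encoding.
fields = build_batch_fields(b"\x00\x01", {"behaviors": ["example_behavior"]})
print(fields["config"])
```

The resulting values can then be posted as multipart/form-data with any HTTP client, with the X-API-Key header set as in the curl command.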

Streaming

Velma (streaming) is a WebSocket endpoint. You connect, send a configuration frame, then stream audio incrementally. Velma emits structured JSON events as it processes the audio (clips, role assignments, behavior detections, topics, sentiment, and a summary) and closes with a done event when the audio ends.

The connection follows a strict sequence:

Connect

Open a WebSocket connection to wss://platform.modulate.ai/api/velma-2-streaming with your api_key as a query parameter.

Send configuration

Send a single JSON text frame containing your configuration — the conversation types, participant roles, and behaviors you want Velma to work with. This frame must arrive before any audio.

Stream audio

Send your audio as binary WebSocket frames in any chunk size. Self-describing formats (MP3, WAV, OGG, FLAC, WebM, AAC, AIFF) are auto-detected. Raw PCM formats require audio_format, sample_rate, and num_channels query parameters.

Signal end of stream

Send an empty text frame ("") to tell Velma the audio is complete.

Receive events

Velma emits typed JSON events throughout and closes the connection with a done event.
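The client side of this sequence can be sketched in Python without committing to a particular WebSocket library. The sketch below builds the connection URL and yields the frames in the required order: configuration first, then audio, then the empty text frame. The function names, the default chunk size, and the empty example config are assumptions for illustration; sending the frames and reading events are left to whichever WebSocket client you use.

```python
import json
from typing import Iterator, Optional, Tuple, Union
from urllib.parse import urlencode

STREAMING_URL = "wss://platform.modulate.ai/api/velma-2-streaming"


def build_streaming_url(api_key: str,
                        audio_format: Optional[str] = None,
                        sample_rate: Optional[int] = None,
                        num_channels: Optional[int] = None) -> str:
    """Build the connection URL with api_key as a query parameter."""
    params = {"api_key": api_key}
    pcm = {"audio_format": audio_format,
           "sample_rate": sample_rate,
           "num_channels": num_channels}
    given = {k: v for k, v in pcm.items() if v is not None}
    # Raw PCM needs all three parameters; self-describing formats need none.
    if given and len(given) != len(pcm):
        raise ValueError(
            "raw PCM requires audio_format, sample_rate, and num_channels")
    params.update(given)
    return f"{STREAMING_URL}?{urlencode(params)}"


def frame_sequence(config: dict, audio: bytes,
                   chunk_size: int = 4096
                   ) -> Iterator[Tuple[str, Union[str, bytes]]]:
    """Yield (kind, payload) pairs in the order the protocol requires."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    yield ("text", json.dumps(config))      # configuration before any audio
    for start in range(0, len(audio), chunk_size):
        yield ("binary", audio[start:start + chunk_size])  # audio chunks
    yield ("text", "")                      # empty text frame ends the stream


print(build_streaming_url("YOUR_KEY"))
kinds = [kind for kind, _ in frame_sequence({}, b"0123456789", chunk_size=4)]
print(kinds)  # ['text', 'binary', 'binary', 'binary', 'text']
```

Keeping frame ordering in a pure generator makes the sequence easy to check in isolation before wiring it to a live connection.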

How Velma differs from the other endpoints

	Velma	Transcription	Deepfake Detection
Primary output	Conversation intelligence	Transcript	Deepfake scores
Behavior detection	✓ (configurable)	—	—
Conversation classification	✓	—	—
Topic and sentiment analysis	✓	—	—
Summarization	✓	—	—
Transcription	✓ (via Transcription options)	✓	—
Batch and streaming	✓	✓	✓

Velma includes transcription as part of its output via Transcription options in your configuration. If transcription alone is what you need, the Transcription endpoints are the right choice.

Next steps

Velma Triage

Behaviors

Detection packages
