Modulate developer docs

Velma: Conversation Understanding

Detect behaviors, classify conversations, identify participant roles, and extract topics with sentiment. Velma is the flagship model for developers building complete voice intelligence solutions — available as a single API call or a real-time stream.

Individual models

Transcription

Multilingual transcription with speaker diarization, emotion, accent, and PII/PHI tagging — batch or real-time streaming.

Deepfake Detection

Per-frame deepfake detection. Classify recorded files or stream live audio for real-time anti-spoofing.

PII/PHI Redaction

Remove sensitive spans from transcripts and silence the matching audio ranges — batch and streaming.

Music & Speech Detection

Frame-level music and speech probability scoring. Classify a file or stream audio for real-time content analysis.

Language Detection

Identify the spoken language of an audio file — confidence-scored results across 100 languages in a single synchronous call.

Not sure which API fits your use case? See "Which API should I use?"
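To make the PII/PHI Redaction description above more concrete, here is a minimal sketch of what "remove sensitive spans and silence the matching audio ranges" could mean on the client side. It does not call any Modulate API. The function names (`redact_text`, `silence_ranges`), the `[REDACTED]` mask, and the span formats (character offsets for text, seconds for audio) are all assumptions made for illustration; the actual service may represent spans differently.

```python
from typing import List, Tuple


def redact_text(text: str, spans: List[Tuple[int, int]], mask: str = "[REDACTED]") -> str:
    """Replace each (start, end) character span with a mask (hypothetical format)."""
    # Apply spans right to left so earlier offsets stay valid after each replacement.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask + text[end:]
    return text


def silence_ranges(
    samples: List[int], sample_rate: int, ranges_s: List[Tuple[float, float]]
) -> List[int]:
    """Return a copy of PCM samples with each [start, end) range (in seconds) zeroed."""
    out = list(samples)
    for start_s, end_s in ranges_s:
        # Convert seconds to sample indices and clamp to the buffer bounds.
        start = max(0, round(start_s * sample_rate))
        end = min(len(out), round(end_s * sample_rate))
        for i in range(start, end):
            out[i] = 0
    return out


if __name__ == "__main__":
    print(redact_text("Call me at 555-0100 today", [(11, 19)]))
    print(silence_ranges([1] * 10, sample_rate=10, ranges_s=[(0.2, 0.5)]))
```

The point of the sketch is that transcript and audio are redacted from the same set of detected spans, so the two outputs stay consistent with each other.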

What you can build

Meeting transcription

Multilingual transcripts with speaker labels, timestamps, and optional emotion or accent signals.

Live captions

Stream audio over WebSocket and receive utterances as they’re spoken.

Anti-spoofing

Real-time deepfake detection during voice authentication flows.

Compliance archives

Shareable recordings with PII/PHI silenced from both transcript and audio.

Content moderation

Classify audio as music, speech, or neither — frame by frame, at scale.

Deepfake screening

Batch-process uploaded audio to flag AI-generated voice content.
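As an illustration of the frame-by-frame classification mentioned under Content moderation, the sketch below turns per-frame music and speech probabilities into "music", "speech", or "neither" labels and merges consecutive frames into segments. The input format (a list of `(music_probability, speech_probability)` pairs), the 0.5 threshold, the frame duration, and the function names are assumptions for this example, not the documented response format of the Music & Speech Detection API.

```python
from typing import List, Tuple


def label_frames(frames: List[Tuple[float, float]], threshold: float = 0.5) -> List[str]:
    """Label each (music_p, speech_p) frame; the threshold is an assumed default."""
    labels = []
    for music_p, speech_p in frames:
        if max(music_p, speech_p) < threshold:
            labels.append("neither")
        elif music_p >= speech_p:
            labels.append("music")
        else:
            labels.append("speech")
    return labels


def merge_segments(labels: List[str], frame_s: float) -> List[Tuple[str, float, float]]:
    """Collapse runs of identical labels into (label, start_s, end_s) segments."""
    segments: List[Tuple[str, float, float]] = []
    for i, label in enumerate(labels):
        if segments and segments[-1][0] == label:
            # Extend the current segment to the end of this frame.
            segments[-1] = (label, segments[-1][1], (i + 1) * frame_s)
        else:
            segments.append((label, i * frame_s, (i + 1) * frame_s))
    return segments


if __name__ == "__main__":
    frames = [(0.9, 0.1), (0.8, 0.3), (0.1, 0.7), (0.2, 0.1)]
    labels = label_frames(frames)
    for segment in merge_segments(labels, frame_s=0.5):
        print(segment)
```

Merging frames into segments is one way to make frame-level scores usable at scale, since a moderation rule usually acts on time ranges rather than on individual frames.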
