Velma
Velma: Conversation Understanding
Detect behaviors, classify conversations, identify participant roles, and extract topics with sentiment. Velma is the flagship model for developers building complete voice intelligence solutions — available as a single API call or a real-time stream.
Individual models
Speech to text
Multilingual transcription with speaker diarization, emotion, accent, and PII/PHI tagging — batch or real-time streaming.
Deepfake detection
Per-frame synthetic voice detection. Classify recorded files or stream live audio for real-time anti-spoofing.
PII/PHI redaction
Remove sensitive spans from transcripts and silence the matching audio ranges, in batch or real-time streaming.
Music detection
Frame-level music and speech probability scoring. Classify a file or stream audio for real-time content analysis.
Language detection
Identify the spoken language of an audio file — confidence-scored results across 100 languages in a single synchronous call.
New here?
Quick start
Make your first API call in under five minutes — no SDK required.
What you can build
Meeting transcription
Multilingual transcripts with speaker labels, timestamps, and optional emotion or accent signals.
Live captions
Stream audio over WebSocket and receive utterances as they’re spoken.
Anti-spoofing
Real-time synthetic voice detection during voice authentication flows.
Compliance archives
Shareable recordings with PII/PHI redacted from the transcript and silenced in the audio.
Content moderation
Classify audio as music, speech, or neither — frame by frame, at scale.
Deepfake screening
Batch-process uploaded audio to flag AI-generated voice content.