synthetic, non-synthetic, or no-content across time-windowed frames. Use batch for complete files; use streaming for real-time anti-spoofing checks.
Batch
Send a complete audio file and receive frame-level verdicts for the full clip.
Expected response
| Verdict | Meaning |
|---|---|
| synthetic | AI-generated voice detected |
| non-synthetic | Human voice detected |
| no-content | Silence or non-voice content |
confidence (0–1) reflects the model’s certainty for that frame’s verdict.
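As a rough illustration of how frame-level verdicts might be consumed, the sketch below summarizes a list of frames. The frame shape (`start`, `end`, `verdict`, `confidence` keys), the sample values, and the `summarize_frames` helper with its 0.8 threshold are all assumptions made for this example; check the SVD Batch reference for the actual response schema.

```python
from collections import Counter

# Hypothetical frames: the field names and values are illustrative only,
# not taken from the real SVD Batch response schema.
sample_frames = [
    {"start": 0.0, "end": 1.0, "verdict": "no-content", "confidence": 0.97},
    {"start": 1.0, "end": 2.0, "verdict": "non-synthetic", "confidence": 0.88},
    {"start": 2.0, "end": 3.0, "verdict": "synthetic", "confidence": 0.93},
    {"start": 3.0, "end": 4.0, "verdict": "synthetic", "confidence": 0.55},
]


def summarize_frames(frames, min_confidence=0.8):
    """Count verdicts and collect synthetic frames at or above min_confidence."""
    counts = Counter(frame["verdict"] for frame in frames)
    flagged = [
        frame
        for frame in frames
        if frame["verdict"] == "synthetic" and frame["confidence"] >= min_confidence
    ]
    # Ignore no-content frames when computing the share of synthetic speech.
    voiced = counts["synthetic"] + counts["non-synthetic"]
    synthetic_ratio = counts["synthetic"] / voiced if voiced else 0.0
    return {
        "counts": dict(counts),
        "flagged": flagged,
        "synthetic_ratio": synthetic_ratio,
    }


summary = summarize_frames(sample_frames)
print(summary["counts"])
print(f"synthetic share of voiced frames: {summary['synthetic_ratio']:.2f}")
```

Excluding no-content frames from the ratio keeps long stretches of silence from diluting the signal. The confidence threshold is a tuning choice, not a documented recommendation.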
Streaming (WebSocket)
Connect over WebSocket and receive frame verdicts progressively as audio arrives. This is useful for real-time anti-spoofing in voice authentication flows.
Example messages received
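One way a client might act on progressive verdicts is to keep a short rolling window and raise an alarm only when synthetic frames persist. The sketch below is a minimal version of that idea and leaves out the WebSocket connection itself. The JSON message shape (`verdict` and `confidence` keys), the `StreamingSpoofMonitor` class, and its window and threshold values are assumptions for illustration; the real message format is defined in the SVD Streaming reference.

```python
import json
from collections import deque


class StreamingSpoofMonitor:
    """Hypothetical helper: flags sustained synthetic speech over recent voiced frames."""

    def __init__(self, window=5, min_confidence=0.8, trigger=3):
        self.recent = deque(maxlen=window)  # True for each confident synthetic frame
        self.min_confidence = min_confidence
        self.trigger = trigger

    def on_message(self, raw):
        """Process one JSON message; return True while the alarm condition holds."""
        msg = json.loads(raw)
        # Silence carries no evidence either way, so it does not enter the window.
        if msg["verdict"] != "no-content":
            confident_synthetic = (
                msg["verdict"] == "synthetic"
                and msg["confidence"] >= self.min_confidence
            )
            self.recent.append(confident_synthetic)
        return sum(self.recent) >= self.trigger


# Illustrative messages only; the key names are assumed, not documented.
messages = [
    '{"verdict": "synthetic", "confidence": 0.90}',
    '{"verdict": "no-content", "confidence": 0.99}',
    '{"verdict": "synthetic", "confidence": 0.95}',
    '{"verdict": "non-synthetic", "confidence": 0.70}',
    '{"verdict": "synthetic", "confidence": 0.90}',
]

monitor = StreamingSpoofMonitor()
results = [monitor.on_message(raw) for raw in messages]
print(results)
```

In a real client, `on_message` would be called from the WebSocket receive loop. Requiring several confident synthetic frames before acting trades a little latency for fewer false alarms on single noisy frames.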
Deepfake score in STT
If you need both transcription and a synthetic voice signal, the STT Batch API supports deepfake_signal=true, which adds a per-utterance deepfake_score without a second API call. Use the dedicated SVD APIs when you need frame-level results, explicit no-content verdicts, or streaming verdicts without transcription.
API reference
- SVD Batch — full parameter and response schema
- SVD Streaming — WebSocket protocol, PCM format requirements, close codes