Modulate offers four speech-to-text endpoints. Pick the one that matches your latency, language, and feature needs.
| | Batch (multilingual) | Batch English VFast | Streaming | Streaming English |
|---|---|---|---|---|
| Use case | Transcription with rich metadata | Fast English-only transcription | Real-time transcription | Low-latency English real-time transcription |
| Protocol | HTTP POST | HTTP POST | WebSocket | WebSocket |
| Languages | Multilingual | English only | Multilingual | English only |

Speaker diarization
Emotion / accent detection
PII/PHI tagging
Partial transcripts during streaming ✓ (every ~1.5 s)
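The latency and language rows of this comparison reduce to a small decision rule. The sketch below is a hypothetical helper (`Needs` and `pick_endpoint` are illustrative names, not part of any Modulate SDK) that returns the endpoint names used above. It covers only latency and language, so confirm speaker diarization, emotion/accent, and PII/PHI support separately. Routing English batch jobs that need rich metadata to Batch (multilingual) is an assumption drawn from that endpoint's Use case entry.

```python
from dataclasses import dataclass

# Endpoint names as they appear in the comparison above.
BATCH_MULTILINGUAL = "Batch (multilingual)"
BATCH_ENGLISH_VFAST = "Batch English VFast"
STREAMING = "Streaming"
STREAMING_ENGLISH = "Streaming English"


@dataclass(frozen=True)
class Needs:
    """Hypothetical description of a workload's requirements."""
    real_time: bool               # transcripts needed while audio is still arriving?
    english_only: bool            # is all of the audio in English?
    rich_metadata: bool = False   # assumption: only matters for batch jobs


def pick_endpoint(needs: Needs) -> str:
    """Map latency and language needs to an endpoint name (illustrative only)."""
    if needs.real_time:
        # Both real-time endpoints are WebSocket-based.
        return STREAMING_ENGLISH if needs.english_only else STREAMING
    # Both batch endpoints take a single HTTP POST.
    if needs.english_only and not needs.rich_metadata:
        return BATCH_ENGLISH_VFAST
    return BATCH_MULTILINGUAL


if __name__ == "__main__":
    print(pick_endpoint(Needs(real_time=False, english_only=True)))
```

Keeping the rule in one function makes it easy to extend with feature flags once you have confirmed which endpoints support them.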
For a side-by-side comparison with Modulate's other capabilities, see Which API should I use?

Authentication

Batch endpoints use the X-API-Key header. Streaming endpoints use an api_key query parameter at connection time. See Authentication and rate limits.
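The difference between the two schemes is where the key travels: in a request header for batch calls, and in the connection URL for streaming. The sketch below uses only the Python standard library and builds, but does not send, both forms. The URLs are placeholders (`api.example.com` is not a Modulate host), and `build_batch_request` and `build_streaming_url` are hypothetical helper names; take the real endpoint paths, and any other required headers or parameters, from the API reference.

```python
import urllib.parse
import urllib.request

# Placeholder URLs: NOT Modulate's real endpoints.
BATCH_URL = "https://api.example.com/v1/transcribe"           # hypothetical
STREAMING_URL = "wss://api.example.com/v1/transcribe/stream"  # hypothetical


def build_batch_request(api_key: str, audio: bytes) -> urllib.request.Request:
    """Batch: the key goes in the X-API-Key header of an HTTP POST."""
    return urllib.request.Request(
        BATCH_URL,
        data=audio,
        headers={"X-API-Key": api_key},
        method="POST",
    )


def build_streaming_url(api_key: str) -> str:
    """Streaming: the key goes in the api_key query parameter of the WebSocket URL."""
    parts = urllib.parse.urlsplit(STREAMING_URL)
    # urlencode percent-escapes characters such as spaces or '&' in the key.
    query = urllib.parse.urlencode({"api_key": api_key})
    return urllib.parse.urlunsplit(parts._replace(query=query))


if __name__ == "__main__":
    print(build_streaming_url("YOUR_API_KEY"))
```

The batch request can be sent with `urllib.request.urlopen`; the streaming URL is passed to whichever WebSocket client you use when opening the connection. Because URLs are often written to proxy and server logs, read the key from configuration or an environment variable rather than hard-coding it.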