Audio formats and preprocessing

Reference for supported audio formats across all Velma-2 endpoints, with guidance on format selection, conversion, and the special requirements of the streaming Synthetic Voice Detection (SVD) endpoint.

Format support by endpoint

Format	STT Batch	STT English VFast	STT Streaming	SVD Batch	SVD Streaming
AAC	✓	—	✓	✓	✓ (container)
AIFF	✓	—	✓	✓	✓ (container)
FLAC	✓	—	✓	✓	✓ (container)
MOV	✓	—	—	✓	—
MP3	✓	—	✓	✓	✓ (container)
MP4	✓	—	✓	✓	—
OGG	✓	—	✓	✓	✓ (container)
Opus	✓	✓ (required)	✓	✓	✓ (container)
WAV	✓	—	✓	✓	✓ (container)
WebM	✓	—	✓	✓	✓ (container)
Raw PCM	—	—	—	—	✓ (raw)

Recommended maximum file size for all HTTP batch endpoints: 100 MB.
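A client-side size check before upload might look like the following minimal sketch. It assumes "100 MB" means 100 × 10^6 bytes, since this page does not say whether the unit is decimal or binary. The helper names are hypothetical and not part of any Velma-2 SDK. Empty payloads are also flagged, because the batch endpoints reject empty files with 400.

```python
import os

# Assumption: "100 MB" is read as 100 * 10^6 bytes; the page does not specify
# whether the limit is decimal or binary.
MAX_BATCH_BYTES = 100 * 1000 * 1000


def within_recommended_size(num_bytes: int, limit: int = MAX_BATCH_BYTES) -> bool:
    """Return True if the payload is non-empty and at or under the limit."""
    return 0 < num_bytes <= limit


def file_within_recommended_size(path: str) -> bool:
    """Check a file on disk against the recommended batch size limit."""
    return within_recommended_size(os.path.getsize(path))
```

Because the limit is described as recommended rather than enforced, a caller may prefer to log a warning instead of refusing the upload.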

Choosing a format

Opus is the recommended format for streaming use cases. It provides excellent audio quality at low bitrates, which reduces bandwidth consumption while preserving the acoustic detail the models need. STT English VFast accepts Opus only; non-Opus files are rejected with 400. For all other batch endpoints, any supported container format works, so use whatever format your audio pipeline already produces.
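One way to produce Opus input for STT English VFast is to transcode with ffmpeg. The sketch below assumes ffmpeg is installed and built with the libopus encoder. The 32k bitrate, the `.opus` extension check, and the helper names are choices made for this illustration, not requirements stated on this page.

```python
import subprocess
from pathlib import Path
from typing import List


def opus_command(src: str, dst: str, bitrate: str = "32k") -> List[str]:
    """Build an ffmpeg argument list that encodes the audio in src to Opus."""
    # Precaution of this sketch: insist on a .opus destination file.
    if Path(dst).suffix.lower() != ".opus":
        raise ValueError("destination should use the .opus extension")
    # -y overwrites dst, -vn drops any video stream, -c:a selects the encoder.
    return ["ffmpeg", "-y", "-i", src, "-vn",
            "-c:a", "libopus", "-b:a", bitrate, dst]


def convert_to_opus(src: str, dst: str, bitrate: str = "32k") -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if conversion fails."""
    subprocess.run(opus_command(src, dst, bitrate), check=True)
```

Calling `convert_to_opus("call.wav", "call.opus")` would write an Opus file next to the source, provided ffmpeg is on the PATH.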

SVD streaming: raw PCM vs container formats

The Synthetic Voice Detection streaming endpoint accepts two categories of audio format, declared via the audio_format query parameter.

Container formats

Container formats (WAV, MP3, OGG, FLAC, WebM, AAC, AIFF) carry their own metadata (sample rate, channel count, codec) within the stream itself. When using a container format, sample_rate and num_channels are not required.

wss://...?api_key=YOUR_API_KEY&audio_format=webm

Raw PCM formats

Raw formats are headerless audio samples. The server cannot infer sample rate or channel count from the data itself, so sample_rate and num_channels are required query parameters when using any raw format.

wss://...?api_key=YOUR_API_KEY&audio_format=s16le&sample_rate=16000&num_channels=1

Supported raw formats: s8, s16le, s16be, s24le, s24be, s32le, s32be, u8, u16le, u16be, u24le, u24be, u32le, u32be, f32le, f32be, f64le, f64be, mulaw, alaw
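The query-string rules above can be captured in a small URL builder. This is a sketch under stated assumptions: the base URL is elided on this page, so it is passed in by the caller, and `svd_stream_url` is a hypothetical helper name. Any `audio_format` not in the raw list is treated as a container format, and `sample_rate` and `num_channels` are then left out.

```python
from urllib.parse import urlencode

# Raw format names as listed on this page.
RAW_FORMATS = frozenset({
    "s8", "s16le", "s16be", "s24le", "s24be", "s32le", "s32be",
    "u8", "u16le", "u16be", "u24le", "u24be", "u32le", "u32be",
    "f32le", "f32be", "f64le", "f64be", "mulaw", "alaw",
})


def svd_stream_url(base_url, api_key, audio_format,
                   sample_rate=None, num_channels=None):
    """Build the connection URL, enforcing the raw-format requirements."""
    params = {"api_key": api_key, "audio_format": audio_format}
    if audio_format in RAW_FORMATS:
        # Raw audio is headerless, so both values must be declared.
        if sample_rate is None or num_channels is None:
            raise ValueError(
                "raw formats require sample_rate and num_channels")
        params["sample_rate"] = sample_rate
        params["num_channels"] = num_channels
    return f"{base_url}?{urlencode(params)}"
```

For example, `svd_stream_url(base, key, "s16le", 16000, 1)` yields the same query string as the raw PCM example above, while `svd_stream_url(base, key, "webm")` yields the container form.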

Common raw format configurations

Use case	`audio_format`	`sample_rate`	`num_channels`
Default / native app	`s16le`	`16000`	`1`
Web Audio API (`AudioWorklet`)	`f32le`	`48000`	`1`
Native stereo capture	`s16le`	`48000`	`2`
Telephony (mu-law)	`mulaw`	`8000`	`1`
Telephony (A-law)	`alaw`	`8000`	`1`
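To compare the bandwidth of these configurations, the raw byte rate follows from bytes per sample × sample rate × channels. The sketch below assumes the conventional sample widths for these format names, including packed 3-byte samples for the 24-bit formats. The 100 ms chunk duration is an illustrative choice, not a value specified on this page.

```python
# Bytes per single-channel sample for each raw format name.
# Assumption: 24-bit formats are packed into 3 bytes per sample.
BYTES_PER_SAMPLE = {
    "s8": 1, "u8": 1, "mulaw": 1, "alaw": 1,
    "s16le": 2, "s16be": 2, "u16le": 2, "u16be": 2,
    "s24le": 3, "s24be": 3, "u24le": 3, "u24be": 3,
    "s32le": 4, "s32be": 4, "u32le": 4, "u32be": 4,
    "f32le": 4, "f32be": 4,
    "f64le": 8, "f64be": 8,
}


def bytes_per_second(audio_format, sample_rate, num_channels):
    """Raw data rate of an uncompressed stream, in bytes per second."""
    return BYTES_PER_SAMPLE[audio_format] * sample_rate * num_channels


def chunk_size(audio_format, sample_rate, num_channels, chunk_ms=100):
    """Bytes in a chunk of chunk_ms milliseconds, aligned to whole frames."""
    frame_bytes = BYTES_PER_SAMPLE[audio_format] * num_channels
    frames = sample_rate * chunk_ms // 1000
    return frames * frame_bytes
```

Under these assumptions the default `s16le` configuration sends 32,000 bytes per second, while the Web Audio and native stereo rows each send 192,000 and the telephony rows send 8,000.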

The `s16le` passthrough optimization

When the input is s16le at 16 kHz mono, no format conversion is performed before analysis. This is the most efficient configuration for the SVD Streaming endpoint. All other formats are decoded and resampled to 16 kHz mono before analysis. There is no functional difference in output, but the passthrough path avoids the conversion overhead.

If you control the audio capture pipeline and are integrating with the SVD Streaming endpoint, capture in s16le at 16 kHz mono to take advantage of zero-cost passthrough.
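If a capture pipeline produces floating-point samples, they can be converted to s16le on the client before sending. The following is a minimal pure-Python sketch with hypothetical helper names. It assumes samples are normalized to [-1.0, 1.0] and already captured at 16 kHz; it does not resample.

```python
import struct


def downmix_to_mono(interleaved, num_channels):
    """Average each interleaved frame into a single mono sample."""
    last_start = len(interleaved) - num_channels + 1
    return [
        sum(interleaved[i:i + num_channels]) / num_channels
        for i in range(0, last_start, num_channels)
    ]


def float_to_s16le(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    ints = []
    for x in samples:
        x = max(-1.0, min(1.0, x))          # clip out-of-range values
        ints.append(int(round(x * 32767)))  # scale to the int16 range
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)
```

For real-time capture a vectorized library would usually be faster; the loop form is used here only to keep the sketch dependency-free.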

Supported `sample_rate` values

8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000 (Hz)

`num_channels` range

num_channels accepts values from 1 to 8. Multi-channel audio is downmixed to mono before analysis.
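Checking these two parameters on the client before connecting can avoid a round trip that ends in a rejected connection. A minimal sketch, using the values listed above and a hypothetical helper name:

```python
# Values taken from the lists on this page.
SUPPORTED_SAMPLE_RATES = frozenset(
    {8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000})
MIN_CHANNELS, MAX_CHANNELS = 1, 8


def validate_raw_params(sample_rate, num_channels):
    """Return a list of problems; an empty list means both values look valid."""
    problems = []
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        problems.append(f"unsupported sample_rate: {sample_rate}")
    if not (isinstance(num_channels, int)
            and MIN_CHANNELS <= num_channels <= MAX_CHANNELS):
        problems.append(f"num_channels out of range: {num_channels}")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every invalid parameter at once.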

Error handling for format problems

Endpoint	Status / code	Cause
STT English VFast	`400`	Non-Opus file, empty file, or decode error
STT Batch	`400`	Unsupported format or empty file
SVD Batch	`400`	Empty file or unsupported format
SVD Batch	`422`	Audio shorter than 0.5 seconds
SVD Streaming	Close code `1003`	Invalid `audio_format`, `sample_rate`, or `num_channels` query parameter
SVD Streaming	Close code `4002`	Audio data does not match the declared format

The SVD Streaming endpoint validates format parameters at connection time (close code 1003) and again when the first audio chunk arrives (close code 4002). If you declare audio_format=s16le but send WebM data, the connection is closed with 4002 after the first chunk.
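A client can translate the two format-related close codes into actionable messages. The sketch below is independent of any particular WebSocket library: it assumes the caller has already obtained the numeric close code from its own client, and the helper names are hypothetical.

```python
# Close codes and causes as documented in the table above.
CLOSE_CODE_HINTS = {
    1003: ("Invalid audio_format, sample_rate, or num_channels query "
           "parameter; fix the connection URL before reconnecting."),
    4002: ("Audio data does not match the declared format; check that "
           "the bytes sent agree with audio_format."),
}


def is_format_error(code: int) -> bool:
    """True if the close code is one of the documented format errors."""
    return code in CLOSE_CODE_HINTS


def explain_close(code: int) -> str:
    """Return a human-readable hint for a WebSocket close code."""
    return CLOSE_CODE_HINTS.get(
        code, f"Close code {code} is not a format error listed here.")
```

Since both codes indicate a mismatch between what was declared and what was sent, reconnecting with unchanged parameters is unlikely to help.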

Related

Code examples by language: includes Opus conversion patterns
Troubleshooting
