AI Music Detection Batch
Detect AI-generated music in an audio file. Returns a clip-level verdict plus a per-window breakdown of vocal and instrumental AI content.
Authorizations
API key used for authentication and usage tracking.
Body
Audio file to analyse. Must be non-empty. Supported formats:
.aac, .flac, .m4a, .mp3, .mp4, .ogg, .opus, .wav.
Maximum file size: 100 MB.
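The upload constraints above can be checked on the client before sending a request. The following is a minimal sketch, not part of the API: the function name `check_upload` is hypothetical, and it assumes "100 MB" means 100 × 10^6 bytes (the stricter reading; the service may count in MiB instead).

```python
from pathlib import Path

# Extensions listed in the Body section.
SUPPORTED_EXTENSIONS = {".aac", ".flac", ".m4a", ".mp3", ".mp4", ".ogg", ".opus", ".wav"}

# Assumption: "100 MB" is read as 100 * 10^6 bytes, the stricter interpretation.
MAX_FILE_SIZE_BYTES = 100 * 1000 * 1000


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file exceeds 100 MB")
    return problems


print(check_upload("my_audio.mp3", 1024))
print(check_upload("notes.txt", 10))
```

This only mirrors the documented limits; the server remains the authority on whether a file is accepted.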
Response
Detection completed successfully.
Name of the submitted audio file. Empty string if no filename was provided in the upload.
Example: "my_audio.mp3"
Total duration of the analysed audio in seconds.
Range: x >= 0
Example: 89.28
Clip-level classification:
ai-vocal-music - AI-generated music with a detected synthetic voice (covers AI songs and AI synthetic vocal tracks).
ai-instrumental - AI-generated instrumental music with no detectable synthetic voice.
not-ai-music - the clip does not appear to contain AI-generated music.
Allowed values: ai-vocal-music, ai-instrumental, not-ai-music
Example: "ai-vocal-music"
Clip-level percentage of the audio that contains vocal content, averaged across all windows.
Range: 0 <= x <= 100
Example: 87.5
Percentage of the clip duration classified as AI-generated vocals. Computed as (seconds of windows classified as AI vocals) / (total clip seconds) * 100. A window contributes its full duration when it is classified as AI-generated vocals; zero otherwise.
Range: 0 <= x <= 100
Example: 56.5
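The duration-weighted formula above can be worked through on a small example. This is an illustrative sketch only: the `(start_s, end_s, is_ai_vocal)` tuple shape and the function name are assumptions, not the API's actual per-window schema.

```python
def ai_vocal_percentage(windows, total_seconds):
    """Percentage of clip duration covered by windows classified as AI vocals.

    windows: list of (start_s, end_s, is_ai_vocal) tuples (hypothetical shape).
    A window contributes its full duration when is_ai_vocal is True, zero otherwise.
    """
    if total_seconds <= 0:
        return 0.0
    ai_seconds = sum(end - start for start, end, is_ai in windows if is_ai)
    return ai_seconds / total_seconds * 100


# Four 10-second windows, two of them classified as AI vocals.
windows = [
    (0.0, 10.0, True),
    (10.0, 20.0, False),
    (20.0, 30.0, True),
    (30.0, 40.0, False),
]
print(ai_vocal_percentage(windows, 40.0))  # 50.0
```

Here 20 of the 40 seconds fall in AI-vocal windows, so the result is 20 / 40 * 100 = 50.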
Average confidence that the vocal windows contain AI-generated vocals, across all windows with vocal content. Not diluted by non-vocal windows.
Range: 0 <= x <= 1
Example: 0.89
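To illustrate what "not diluted by non-vocal windows" means, the sketch below averages confidence over vocal windows only and compares it with a naive average over every window. The `(has_vocals, ai_confidence)` pair shape is hypothetical, and returning 0.0 when there are no vocal windows is an assumption; the source does not say what the API returns in that case.

```python
def mean_vocal_ai_confidence(windows):
    """Average AI-vocal confidence over windows that contain vocals.

    windows: list of (has_vocals, ai_confidence) pairs (hypothetical shape).
    Assumption: returns 0.0 when no window has vocal content.
    """
    scores = [conf for has_vocals, conf in windows if has_vocals]
    return sum(scores) / len(scores) if scores else 0.0


windows = [(True, 0.75), (False, 0.0), (True, 0.25)]

# Only the two vocal windows are averaged: (0.75 + 0.25) / 2.
print(mean_vocal_ai_confidence(windows))  # 0.5

# A naive average over all three windows would be pulled down by the non-vocal one.
print(sum(conf for _, conf in windows) / len(windows))
```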
Clip-level percentage of the audio that contains instrumental music content, averaged across all windows.
Range: 0 <= x <= 100
Example: 64.3
Percentage of non-vocal non-silent windows classified as AI-generated instrumental content (0-100 scale).
Range: 0 <= x <= 100
Example: 10.5
Maximum confidence that a non-vocal window contains AI-generated instrumental content. Zero if no such window was found.
Range: 0 <= x <= 1
Example: 0.95
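The two instrumental fields described above use different aggregations: a share of windows by count, and a maximum. The sketch below shows one plausible reading. The dictionary keys `kind`, `is_ai`, and `confidence` and the `kind` labels are hypothetical, and treating "no such window" as "no non-vocal window" is an assumption.

```python
def ai_instrumental_stats(windows):
    """Return (percentage, max_confidence) for AI instrumental content.

    windows: list of dicts with hypothetical keys:
      kind       - "vocal", "instrumental", or "silent"
      is_ai      - True if the window was classified as AI-generated
      confidence - AI-instrumental confidence in [0, 1]
    """
    non_vocal = [w for w in windows if w["kind"] != "vocal"]
    non_vocal_non_silent = [w for w in non_vocal if w["kind"] != "silent"]

    # Share of non-vocal, non-silent windows classified as AI, on a 0-100 scale.
    ai_count = sum(1 for w in non_vocal_non_silent if w["is_ai"])
    pct = ai_count / len(non_vocal_non_silent) * 100 if non_vocal_non_silent else 0.0

    # Maximum confidence over non-vocal windows; zero when there are none.
    max_conf = max((w["confidence"] for w in non_vocal), default=0.0)
    return pct, max_conf


windows = [
    {"kind": "vocal", "is_ai": True, "confidence": 0.9},
    {"kind": "instrumental", "is_ai": True, "confidence": 0.95},
    {"kind": "instrumental", "is_ai": False, "confidence": 0.2},
    {"kind": "instrumental", "is_ai": False, "confidence": 0.1},
    {"kind": "instrumental", "is_ai": False, "confidence": 0.05},
    {"kind": "silent", "is_ai": False, "confidence": 0.0},
]
print(ai_instrumental_stats(windows))  # (25.0, 0.95)
```

One of the four instrumental windows is classified as AI, giving 25 percent, while the maximum reports the single strongest window rather than an average.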
Clip-level percentage of the audio that contains neither vocal nor instrumental content, averaged across all windows.
Range: 0 <= x <= 100
Example: 3.51
Per-window breakdown of detection results.
End-to-end inference time in milliseconds.
Range: x >= 0
Example: 1333