Velma-2 offers two endpoints with identical analysis capabilities. Both accept the same BatchConfig schema and return the same set of outputs — the difference is protocol and response shape.
| | Batch | Streaming |
| --- | --- | --- |
| Endpoint | POST /api/velma-2-batch | wss://modulate-developer-apis.com/api/velma-2-streaming |
| Auth | X-API-Key header | api_key query parameter |
| Config | config form field (JSON string or "default") | First text frame (JSON or "default") |
| Response | Single JSON object | Stream of typed events |
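As a rough illustration of the streaming column, the sketch below builds the connection URL with the api_key query parameter and the text that would go in the first frame. The helper names streaming_url and config_frame are hypothetical, and opening the WebSocket itself is left to whichever client library you use.

```python
import json
from urllib.parse import urlencode

STREAMING_URL = "wss://modulate-developer-apis.com/api/velma-2-streaming"


def streaming_url(api_key: str) -> str:
    """Append the API key as the api_key query parameter (hypothetical helper)."""
    query = urlencode({"api_key": api_key})
    return f"{STREAMING_URL}?{query}"


def config_frame(config="default") -> str:
    """Return the text for the first frame: the literal "default" or a JSON string."""
    if isinstance(config, str):
        # Strings are passed through unchanged, e.g. "default" or pre-encoded JSON.
        return config
    return json.dumps(config)
```

The same config_frame output could also serve as the value of the batch endpoint's config form field, since both endpoints accept the same schema.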

Configuration

Both endpoints use the same BatchConfig schema. You can send the literal string "default" instead of a full config to use Velma’s built-in default behavior set without specifying anything.

Conversation types

A conversation type tells Velma what kind of interaction it is analyzing. Velma uses this to contextualize behavior detection and role assignment. You can define multiple types — Velma will infer which one best matches, or use default_conversation_type as the fallback.
{
  "conversation_types": [
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111008",
      "name": "Customer Support Call",
      "short_description": "A customer contacting a business for help with a product or service.",
      "detailed_description": "The conversation involves a customer with a service issue and a representative resolving it. Tone is professional. The representative is expected to follow company procedure."
    }
  ]
}

Participant roles

Roles describe the speakers Velma expects. Scope roles to specific conversation types via applies_to_conversation_type_uuids. If omitted, the role applies to all types.
{
  "participant_roles": [
    {
      "participant_role_uuid": "22222222-2222-4222-8222-222222222017",
      "name": "Customer",
      "short_description": "The recipient of a service or good.",
      "detailed_description": "A customer calling in with a service issue.",
      "applies_to_conversation_type_uuids": ["11111111-1111-4111-8111-111111111008"]
    }
  ]
}
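Because roles point at conversation types by UUID, a typo in applies_to_conversation_type_uuids is easy to miss. The following is a minimal client-side check, assuming conversation_types and participant_roles sit side by side in one config object; the function name is hypothetical, and whether Velma itself rejects dangling references is not stated here.

```python
def unknown_scope_refs(config: dict) -> list:
    """Return (role name, uuid) pairs that reference an undefined conversation type."""
    defined = {
        ct["conversation_type_uuid"]
        for ct in config.get("conversation_types", [])
    }
    problems = []
    for role in config.get("participant_roles", []):
        # A role without the field applies to all types, so there is nothing to check.
        for ref in role.get("applies_to_conversation_type_uuids", []):
            if ref not in defined:
                problems.append((role["name"], ref))
    return problems
```

An empty list means every scoped role refers to a conversation type defined in the same config.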

Behaviors

The behaviors array accepts two types of entries, full BehaviorDef objects and preset reference strings, and you can mix both in the same array.
Preset reference: a string in the form "preset:<identifier>". Velma expands it into the full behavior definition before processing. Use GET /api/velma-2-batch/list-presets or GET /api/velma-2-streaming/list-presets to discover available identifiers.
Full BehaviorDef: supply all four required fields yourself. A full definition takes precedence over any preset entry with the same UUID.
{
  "behaviors": [
    "preset:harassment",
    "preset:service-churn",
    {
      "behavior_uuid": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
      "name": "Agent Script Deviation",
      "short_description": "Agent departs from the required call opening script.",
      "detailed_description": "..."
    }
  ]
}
See Behaviors for the full guide.
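A mixed behaviors array can be sanity-checked before it is sent. The sketch below is one possible client-side check, assuming the four required BehaviorDef fields are the ones shown in the example above; check_behaviors is a hypothetical helper, and it does not verify that a preset identifier actually exists.

```python
# Assumed from the example above; consult the Behaviors guide for the authoritative list.
REQUIRED_FIELDS = ("behavior_uuid", "name", "short_description", "detailed_description")


def check_behaviors(entries: list) -> list:
    """Validate a mix of "preset:<identifier>" strings and BehaviorDef dicts."""
    for entry in entries:
        if isinstance(entry, str):
            # Preset references need the prefix and a non-empty identifier.
            if not entry.startswith("preset:") or entry == "preset:":
                raise ValueError(f"not a preset reference: {entry!r}")
        elif isinstance(entry, dict):
            missing = [field for field in REQUIRED_FIELDS if field not in entry]
            if missing:
                raise ValueError(f"BehaviorDef is missing fields: {missing}")
        else:
            raise TypeError(f"unsupported behaviors entry: {entry!r}")
    return entries
```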

STT options

Control what transcription data appears in clip outputs:
| Option | Type | Default | What it adds |
| --- | --- | --- | --- |
| speaker_diarization | boolean | true | Per-speaker clip attribution |
| emotion_signal | boolean | false | Per-clip emotion label |
| accent_signal | boolean | false | Per-clip accent label |
| deepfake_signal | boolean | false | Per-clip deepfake_score (0–1) |
| pii_phi_tagging | boolean | false | Sensitive spans wrapped in entity tags |
| language | string | auto | Force a specific language code |

Aggregate outputs

| Option | Type | Default |
| --- | --- | --- |
| produce_topics | boolean | true |
| produce_topic_sentiments | boolean | true |
| produce_summary | boolean | true |
Set any of these to false to suppress the corresponding output.

Batch endpoint

POST /api/velma-2-batch: submit a complete audio file and receive a single JSON response.
Request (multipart/form-data):
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| upload_file | binary | Yes | Audio file. Max 100 MB. Supported formats: AAC, AIFF, FLAC, MP3, MP4, MOV, OGG, Opus, WAV, WebM. |
| config | string | No | JSON-encoded BatchConfig, or the literal string "default". Defaults to "default" if omitted. |
curl -X POST https://modulate-developer-apis.com/api/velma-2-batch \
  -H "X-API-Key: $MODULATE_API_KEY" \
  -F "upload_file=@recording.mp3" \
  -F 'config={"behaviors":["preset:harassment","preset:service-churn"]}'
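For callers who prefer Python to curl, the same request might look like the sketch below, which uses only the standard library. The function names are hypothetical, and labelling the file part as application/octet-stream is an assumption; the service may expect a format-specific content type.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://modulate-developer-apis.com/api/velma-2-batch"


def encode_multipart(config, filename: str, audio: bytes):
    """Build a multipart/form-data body with the config and upload_file fields."""
    boundary = uuid.uuid4().hex
    config_text = config if isinstance(config, str) else json.dumps(config)
    config_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="config"\r\n\r\n'
        f"{config_text}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="upload_file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed content type
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = config_part + file_header + audio + closing
    return body, f"multipart/form-data; boundary={boundary}"


def submit_batch(path: str, api_key: str, config="default") -> dict:
    """POST one audio file and return the parsed JSON response."""
    with open(path, "rb") as handle:
        audio = handle.read()
    body, content_type = encode_multipart(config, os.path.basename(path), audio)
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": content_type},
    )
    # urlopen raises urllib.error.HTTPError for non-2xx statuses.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

A call such as submit_batch("recording.mp3", os.environ["MODULATE_API_KEY"], {"behaviors": ["preset:harassment", "preset:service-churn"]}) would mirror the curl example.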
Response (BatchResponse):
| Field | Type | Description |
| --- | --- | --- |
| duration_ms | integer | Total audio duration in milliseconds |
| clips | array | Transcribed segments (see Clip) |
| conversation_type_pick | object or null | Inferred conversation type |
| participant_role_picks | array | Per-speaker role assignments |
| behaviors | array | Per-behavior detection results (see BehaviorDetection) |
| topics | array | Extracted topic strings |
| topic_sentiments | array | Per-speaker sentiment per topic |
| summary | string or null | Narrative summary |
Error responses:
| Status | Meaning |
| --- | --- |
| 400 | Unsupported file format, empty file, or malformed request |
| 403 | Request not permitted |
| 422 | Invalid config value: malformed JSON or unknown preset identifier |
| 429 | Insufficient credits |
| 500 | Internal server error |
| 502 | Request could not be validated or completed; retry |
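Only the 502 row above is marked as retryable, so a client might wrap its request in a small retry loop. This is a sketch under that reading: send is a hypothetical callable returning a (status, body) pair, and the attempt count and exponential delays are illustrative choices, not documented limits.

```python
import time


def call_with_retry(send, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call send() again only when it reports HTTP 502, up to `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        # Every other status is returned to the caller unchanged.
        if status != 502 or attempt == attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ... between tries
```

Statuses such as 400, 422, and 429 are returned immediately, since repeating the same request would not change the outcome.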

Streaming events

Velma emits JSON events throughout a streaming session. Every event has a type field.

clip

A transcribed segment of speech. Emitted in near real time.
{
  "type": "clip",
  "clip": {
    "clip_uuid": "a1b2c3d4-...",
    "text": "I'd like to cancel my subscription.",
    "start_ms": 4800,
    "duration_ms": 2100,
    "speaker_label": "Speaker_1",
    "language": "en",
    "emotion": null,
    "accent": null,
    "deepfake_score": null
  }
}
emotion, accent, and deepfake_score are non-null only when their corresponding STT options are enabled.

conversation_type

Velma’s pick for the conversation type, emitted once enough context is available.
{
  "type": "conversation_type",
  "pick": {
    "conversation_type_uuid": "11111111-1111-4111-8111-111111111008",
    "name": "Customer Support Call",
    "confidence": 0.94,
    "selection_source": "inferred",
    "detail": "...",
    "reasoning": "..."
  }
}
selection_source is one of inferred, auto_selected_single_option, or default.

participant_role

A per-speaker role assignment. One event per speaker label.
{
  "type": "participant_role",
  "pick": {
    "speaker_label": "Speaker_1",
    "participant_role_uuid": "22222222-2222-4222-8222-222222222017",
    "name": "Customer",
    "confidence": 0.88,
    "selection_source": "inferred",
    "detail": "...",
    "reasoning": "..."
  }
}

behavior_detection

A per-behavior verdict. Emitted for each behavior once Velma has enough audio to decide.
{
  "type": "behavior_detection",
  "detection": {
    "behavior_uuid": "33333333-3333-4333-8333-033333333006",
    "behavior_name": "Service Churn",
    "speaker_label": "Speaker_1",
    "detected": true,
    "confidence": 0.91,
    "evidence_clip_uuids": ["a1b2c3d4-...", "e5f6a7b8-..."],
    "definitive_clip_uuid": "a1b2c3d4-...",
    "reasoning": "Speaker explicitly stated they want to cancel their subscription.",
    "skipped": false,
    "skip_reason": null,
    "error_reason": null
  }
}
skipped: true means Velma did not attempt detection — check skip_reason. error_reason is non-null if detection failed.
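The three flags can be folded into a single status when triaging results. The helper below is a hypothetical sketch; checking skipped before error_reason is an assumed ordering, since the relationship between the two is not spelled out above.

```python
def detection_status(detection: dict) -> str:
    """Summarize a detection payload as skipped, error, detected, or not_detected."""
    if detection.get("skipped"):
        return "skipped"  # see skip_reason for why no attempt was made
    if detection.get("error_reason") is not None:
        return "error"
    return "detected" if detection.get("detected") else "not_detected"
```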

topics

Aggregated list of subjects discussed. Emitted at end of stream.
{
  "type": "topics",
  "topics": ["subscription cancellation", "billing dispute", "refund policy"]
}

topic_sentiment

Per-speaker sentiment for each topic. One event per speaker per topic.
{
  "type": "topic_sentiment",
  "topic_sentiment": {
    "topic": "subscription cancellation",
    "speaker_label": "Speaker_1",
    "sentiment_score": -0.72,
    "sentiment_label": "negative"
  }
}
sentiment_score ranges from −1 (strongly negative) to +1 (strongly positive).

summary

A free-form narrative summary. Emitted at end of stream.
{
  "type": "summary",
  "text": "The customer called to cancel their subscription due to billing concerns. The representative offered a partial credit, which the customer declined. The call ended without resolution."
}

done

Signals streaming is complete. Always the final event.
{
  "type": "done",
  "duration_ms": 183400
}

error

Emitted if a processing error occurs. The connection closes after this event.
{
  "type": "error",
  "error": "Invalid input audio"
}
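Since both endpoints return the same outputs, a streaming client may want to fold the events above into one object shaped like the batch response. The sketch below does that for a sequence of text frames; collect_events is a hypothetical helper, the mapping of events onto BatchResponse field names is an assumption, and unrecognized event types are ignored.

```python
import json


def collect_events(frames) -> dict:
    """Accumulate streaming event frames (JSON strings) into one dict."""
    result = {
        "duration_ms": None, "clips": [], "conversation_type_pick": None,
        "participant_role_picks": [], "behaviors": [], "topics": [],
        "topic_sentiments": [], "summary": None, "error": None,
    }
    for frame in frames:
        event = json.loads(frame)
        kind = event["type"]
        if kind == "clip":
            result["clips"].append(event["clip"])
        elif kind == "conversation_type":
            result["conversation_type_pick"] = event["pick"]
        elif kind == "participant_role":
            result["participant_role_picks"].append(event["pick"])
        elif kind == "behavior_detection":
            result["behaviors"].append(event["detection"])
        elif kind == "topics":
            result["topics"] = event["topics"]
        elif kind == "topic_sentiment":
            result["topic_sentiments"].append(event["topic_sentiment"])
        elif kind == "summary":
            result["summary"] = event["text"]
        elif kind == "done":
            result["duration_ms"] = event["duration_ms"]
            break  # done is always the final event
        elif kind == "error":
            result["error"] = event["error"]
            break  # the connection closes after an error event
    return result
```

The frames argument can be any iterable of strings, for example the messages yielded by a WebSocket client.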

WebSocket close codes

| Code | Meaning |
| --- | --- |
| 1000 | Normal closure after the done event |
| 1003 | Protocol error: invalid config JSON, audio sent before config, unsupported audio format or sample rate |
| 4003 | Request could not be validated, or not permitted |
| 4029 | Insufficient credits |
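A client can translate these codes into a next step when the socket closes. The mapping below is only a sketch: the meanings are taken from the table above, while the action labels and the function name are hypothetical.

```python
# Close code -> (hypothetical action label, meaning as documented above)
CLOSE_CODES = {
    1000: ("ok", "Normal closure after the done event"),
    1003: ("fix_request", "Protocol error"),
    4003: ("check_access", "Request could not be validated, or not permitted"),
    4029: ("add_credits", "Insufficient credits"),
}


def close_action(code: int) -> str:
    """Return an action label for a close code, or "unknown" for undocumented codes."""
    action, _meaning = CLOSE_CODES.get(code, ("unknown", ""))
    return action
```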