Detect harmful content (harassment, hate speech, and identity-, welfare-, and child-safety violations) in live social audio and voice chat, in real time. Live social audio is one venue; the same detections extend across community moderation, user-generated content, livestreams, and any space where people interact at scale. Use this package as a starting point: add or swap in any preset from the catalog, or define your own behaviors to fit your use case.

This package includes: 11 conversation types · 5 participant roles · 33 behaviors.

Use this package

The config below is ready to use as-is: download or copy it and pass it as the config payload in a Velma Triage request. Its behaviors are preset references. The API expands each preset:<identifier> into its full definition at request time, so you don’t need the criteria inline to run the package.

Download: trust-and-safety.json
{
  "conversation_types": [
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111004",
      "name": "General Media Narration",
      "short_description": "Any media style content with a single speaker that talks exclusively in the third person",
      "detailed_description": "media"
    },
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111005",
      "name": "Multiple Speakers Livestreamed Media",
      "short_description": "Any improvised media content that features several speakers and explicitly exists for entertainment or social purposes",
      "detailed_description": "mediasocial"
    },
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111006",
      "name": "Media Interview or Talk Show",
      "short_description": "Conversations in media formatted as one on one interviews, host with one or more guests, or podcasts formatted as question and answer shows.",
      "detailed_description": "mediasocial"
    },
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111009",
      "name": "General Media Dialogue",
      "short_description": "Any conversation with multiple speaking participants that's scripted featuring other things like music or sound effects",
      "detailed_description": "mediasocial"
    },
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111010",
      "name": "Music",
      "short_description": "Media content primarily featuring music",
      "detailed_description": "mediasocial"
    },
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111011",
      "name": "Audiobook",
      "short_description": "A book narrated into audio",
      "detailed_description": "mediasocial"
    },
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111012",
      "name": "Social Media Content",
      "short_description": "Any content with a narrator that's formatted in a way that's targeted for social media.",
      "detailed_description": "mediasocial"
    },
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111013",
      "name": "Single Speaker Livestreamed Media",
      "short_description": "Any media content that features only one speaker that's improvised and explicitly exists for entertainment or social purposes",
      "detailed_description": "mediasocial"
    },
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111014",
      "name": "Online Game Chat",
      "short_description": "Any social chat where the participants are brought together to play a game",
      "detailed_description": "gamingsocial"
    },
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111015",
      "name": "Online Chat Room",
      "short_description": "Any conversation with several participants who appear to be strangers, in a large group, or have the ability to talk over each other.",
      "detailed_description": "casualsocial"
    },
    {
      "conversation_type_uuid": "11111111-1111-4111-8111-111111111016",
      "name": "Social Phone or Video Call",
      "short_description": "Any call with two to five participants that's explicitly for the purpose of socializing",
      "detailed_description": "casualsocial"
    }
  ],
  "participant_roles": [
    {
      "participant_role_uuid": "22222222-2222-4222-8222-222222222003",
      "name": "Social Participant",
      "short_description": "Participant in social conversation",
      "detailed_description": ""
    },
    {
      "participant_role_uuid": "22222222-2222-4222-8222-222222222007",
      "name": "Media Participant",
      "short_description": "A speaker in any media",
      "detailed_description": ""
    },
    {
      "participant_role_uuid": "22222222-2222-4222-8222-222222222008",
      "name": "Narrator",
      "short_description": "A media participant who speaks in the third person exclusively",
      "detailed_description": ""
    },
    {
      "participant_role_uuid": "22222222-2222-4222-8222-222222222009",
      "name": "Social Participant",
      "short_description": "A participant in a social conversation",
      "detailed_description": ""
    },
    {
      "participant_role_uuid": "22222222-2222-4222-8222-222222222016",
      "name": "Interviewer",
      "short_description": "A person giving an interview",
      "detailed_description": ""
    }
  ],
  "behaviors": [
    "preset:future_planning",
    "preset:social_etiquette",
    "preset:sexually_graphic_material",
    "preset:storytelling",
    "preset:social_boundary_setting",
    "preset:material_potentially_unsuitable_for_children",
    "preset:violent_graphic_material",
    "preset:social_connection_building",
    "preset:personal_vulnerability",
    "preset:encouragement",
    "preset:teaching_mentorship",
    "preset:narration",
    "preset:monologuing",
    "preset:poetry",
    "preset:rapport_building",
    "preset:inclusive_practices",
    "preset:unclear_speech",
    "preset:unaddressed_question",
    "preset:hateful_or_violent_ideology_propagation",
    "preset:child_safety_violation",
    "preset:sexual_harassment",
    "preset:harassment",
    "preset:suicidal_and_self_injurious_ideation",
    "preset:hate",
    "preset:self_harm_and_self_injury_glorification",
    "preset:misogyny",
    "preset:racism",
    "preset:homophobia",
    "preset:transphobia",
    "preset:sizeism",
    "preset:xenophobia",
    "preset:ableism",
    "preset:social_inclusion"
  ]
}
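
Before sending the config, it can help to sanity-check a local copy, especially after adding or swapping presets. The following is a minimal Python sketch, not part of the Velma API: the function names `load_config` and `check_config` are hypothetical, and the rule that a bare string in `behaviors` must start with `preset:` is an assumption drawn from the config above (non-string entries are assumed to be inline behavior definitions and are left alone).

```python
import json
from collections import Counter

PRESET_PREFIX = "preset:"
SECTIONS = ("conversation_types", "participant_roles", "behaviors")


def load_config(path: str) -> dict:
    """Read a package config such as trust-and-safety.json from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_config(config: dict) -> list:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []

    # Each top-level section should be present and be a list.
    sections = {}
    for key in SECTIONS:
        value = config.get(key)
        if isinstance(value, list):
            sections[key] = value
        else:
            warnings.append(f"missing or non-list section: {key}")
            sections[key] = []

    # Assumption: a string behavior is meant to be a preset reference,
    # so a string without the prefix is probably a typo.
    for behavior in sections["behaviors"]:
        if isinstance(behavior, str) and not behavior.startswith(PRESET_PREFIX):
            warnings.append(f"behavior string lacks the preset: prefix: {behavior!r}")

    # Names are what a reader sees in results, so flag repeats.
    for key in ("conversation_types", "participant_roles"):
        names = Counter(
            entry.get("name") for entry in sections[key] if isinstance(entry, dict)
        )
        for name, count in names.items():
            if count > 1:
                warnings.append(f"{key}: name {name!r} appears {count} times")

    return warnings
```

Calling `check_config(load_config("trust-and-safety.json"))` on the config as shown above should flag one item: the role name Social Participant is listed under two different UUIDs. Whether that duplication matters to the API is not stated here, so treat the warning as a prompt to review rather than an error.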

Expand the full criteria

To produce a self-contained config with every behavior’s full criteria inlined — for review, customization, or pinning a snapshot — fetch the live preset catalog and merge it into the downloaded config. The catalog is the source of truth for detection criteria.
curl -s https://modulate-developer-apis.com/api/velma-2-batch/list-presets \
  -H "X-API-Key: $MODULATE_API_KEY" \
| jq --slurpfile cfg trust-and-safety.json '
    [ $cfg[0].behaviors[] | ltrimstr("preset:") ] as $ids
    | { conversation_types: $cfg[0].conversation_types,
        participant_roles:  $cfg[0].participant_roles,
        behaviors: [ .presets[] | select(.identifier as $i | $ids | index($i)) ] }
  ' > trust-and-safety.full.json
trust-and-safety.full.json keeps the same conversation_types and participant_roles and replaces each preset reference with its full behavior definition — drop it into the config payload exactly like the preset version.
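
For readers who prefer to do the merge in code, the same step can be sketched in Python. This is an illustrative equivalent of the jq filter, not an official client: `expand_presets` is a hypothetical name, and the catalog shape (a top-level `presets` list whose entries carry an `identifier` field) is assumed from the filter above. Unlike the jq version, this sketch keeps the order of the config's `behaviors` array and reports any reference that has no match in the catalog instead of dropping it silently.

```python
PRESET_PREFIX = "preset:"


def expand_presets(config: dict, catalog: dict) -> tuple:
    """Inline catalog definitions for every preset reference in config.

    Returns (full_config, missing), where missing lists the identifiers
    that were referenced but not found in the catalog.
    """
    # Index the catalog once so each lookup is a dictionary access.
    by_identifier = {preset["identifier"]: preset for preset in catalog.get("presets", [])}

    behaviors = []
    missing = []
    for reference in config["behaviors"]:
        if isinstance(reference, str) and reference.startswith(PRESET_PREFIX):
            identifier = reference[len(PRESET_PREFIX):]
            if identifier in by_identifier:
                behaviors.append(by_identifier[identifier])
            else:
                missing.append(identifier)
        else:
            # Assumed to be an inline definition already; pass it through.
            behaviors.append(reference)

    full_config = {
        "conversation_types": config["conversation_types"],
        "participant_roles": config["participant_roles"],
        "behaviors": behaviors,
    }
    return full_config, missing
```

A non-empty `missing` list is worth checking before pinning a snapshot, since it means the expanded file covers fewer behaviors than the preset version it was built from.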

Conversation types

The interaction contexts this package expects to see.
| Name | What it is |
| --- | --- |
| General Media Narration | Any media style content with a single speaker that talks exclusively in the third person |
| Multiple Speakers Livestreamed Media | Any improvised media content that features several speakers and explicitly exists for entertainment or social purposes |
| Media Interview or Talk Show | Conversations in media formatted as one on one interviews, host with one or more guests, or podcasts formatted as question and answer shows. |
| General Media Dialogue | Any conversation with multiple speaking participants that’s scripted featuring other things like music or sound effects |
| Music | Media content primarily featuring music |
| Audiobook | A book narrated into audio |
| Social Media Content | Any content with a narrator that’s formatted in a way that’s targeted for social media. |
| Single Speaker Livestreamed Media | Any media content that features only one speaker that’s improvised and explicitly exists for entertainment or social purposes |
| Online Game Chat | Any social chat where the participants are brought together to play a game |
| Online Chat Room | Any conversation with several participants who appear to be strangers, in a large group, or have the ability to talk over each other. |
| Social Phone or Video Call | Any call with two to five participants that’s explicitly for the purpose of socializing |

Participant roles

The speaker roles the package distinguishes.
| Name | What it is |
| --- | --- |
| Social Participant | Participant in social conversation |
| Media Participant | A speaker in any media |
| Narrator | A media participant who speaks in the third person exclusively |
| Social Participant | A participant in a social conversation |
| Interviewer | A person giving an interview |

Behaviors

The 33 signals this package detects. Each maps to a reusable preset:<identifier> you can drop into the behaviors array of any BatchConfig — the config above already references them.
Full detection criteria are not duplicated here. The live preset catalog is the source of truth — retrieve the exact criteria for any behavior by name from the list-presets endpoint.
| Behavior | What it detects | Preset |
| --- | --- | --- |
| Future Planning | Discussion of goals or intended future actions. We detect this using forward-looking temporal language paired with planning cadence, collaborative tone, anticipatory prosody, and measured pacing. | preset:future_planning |
| Social Etiquette | Observance of politeness and manners. We detect this through courteous tone, appropriate pacing, respectful address forms, and smooth turn transitions. | preset:social_etiquette |
| Sexually Graphic Material | Explicit descriptions of sexual activity or content. We detect this using suggestive prosody, discomfort or arousal markers, and contextual acoustic cues that go beyond neutral or educational discussion. | preset:sexually_graphic_material |
| Storytelling | Third-person recounting of events in narrative form. We detect this using narrative arc pacing, character-based vocal modulation, temporal structuring, and expressive prosody. | preset:storytelling |
| Social Boundary Setting | Establishing limits for appropriate interaction. We detect this using firm but calm tone, slowed pacing, clear prosodic boundaries, and reduced emotional escalation. | preset:social_boundary_setting |
| Material Potentially Unsuitable for Children | Use of age-inappropriate language or themes. We detect this using profanity intensity, emotional arousal, laughter timing, vocal emphasis on taboo terms, and contextual cues to recognize innuendo. | preset:material_potentially_unsuitable_for_children |
| Violent Graphic Material | Graphic descriptions or depictions of physical violence. We detect this through vivid descriptive cadence, stress patterns, breath control changes, and emotional intensity that accompany graphic recounting beyond neutral narration. | preset:violent_graphic_material |
| Social Connection Building | Signals of interpersonal bonding and relational warmth. We detect this through mutual laughter, mirroring of speech rhythms, relaxed pacing, warm vocal timbre, and decreasing formality over time. | preset:social_connection_building |
| Personal Vulnerability | Expressions of inner feelings or personal struggles. We detect this using softened volume, hesitations, longer pauses, emotional tremor, and shifts toward introspective tone and slower speech rate. | preset:personal_vulnerability |
| Encouragement | Supportive reinforcement of another’s actions or ideas. We detect this through positive prosody, upward intonation, affirming rhythm, increased energy, and emotional warmth in delivery. | preset:encouragement |
| Teaching/Mentorship | Instructional guidance aimed at skill or knowledge transfer. We detect this using structured pacing, explanatory intonation, deliberate pauses, corrective tone shifts, and reduced emotional volatility. | preset:teaching_mentorship |
| Narration | Third-person descriptive speech detached from present interaction. We detect this through consistent third-person framing, steady pacing, neutral affect, and minimal turn-taking responsiveness. | preset:narration |
| Monologuing | Extended uninterrupted expressive speech by one speaker. We detect this using long speaking turns, theatrical intonation, emotional variability, minimal pauses for response, and self-directed delivery. | preset:monologuing |
| Poetry | Speech employing poetic structure or stylistic devices. We detect this through rhythmic meter, deliberate pauses, rhyme or alliteration cues, melodic intonation, and performative cadence. | preset:poetry |
| Rapport Building | Positive alignment forming a professional relationship. We detect this through reciprocal tone matching, affirming backchannels, relaxed pacing, and increasing conversational ease. | preset:rapport_building |
| Inclusive Practices | Respectful language promoting inclusion and equity. We detect this through careful word choice reinforced by respectful tone, measured pacing, non-dismissive intonation, and calm emotional delivery. | preset:inclusive_practices |
| Unclear Speech | Speech difficult to interpret or understand. We detect this using slurred articulation, inconsistent pacing, counterparty confusion, overlapping speech, and frequent self-corrections. | preset:unclear_speech |
| Unaddressed Question | Failure to adequately respond to a posed question. We detect this through avoidance pauses, topic-shifting intonation, increased filler usage, and prosodic signals of deflection. | preset:unaddressed_question |
| Hateful or Violent Ideology Propagation | Promotion of hate-based or violent belief systems. We detect this through ideological slogans, charged emotional delivery, escalating intensity, and dehumanizing tonal patterns. | preset:hateful_or_violent_ideology_propagation |
| Child Safety Violation | Sexual exploitation or endangerment involving minors. We detect this using covert language cues, abnormal hesitation, grooming-style warmth, secrecy-driven pacing, and contextual red flags. | preset:child_safety_violation |
| Sexual Harassment | Unwanted sexualized speech or advances. We detect this through suggestive intonation, boundary-testing pauses, inappropriate familiarity, and discomfort responses from others. | preset:sexual_harassment |
| Harassment | Persistent unwanted targeted behavior. We detect this through repeated hostile tone, fixation on a target, escalating intensity, and lack of de-escalation cues. | preset:harassment |
| Suicidal and Self Injurious Ideation | Signals of thoughts about self-harm or suicide. We detect this through flattened affect, slowed speech, long silences, emotional heaviness, and indirect despair cues. | preset:suicidal_and_self_injurious_ideation |
| Hate | Identity-based hateful or discriminatory speech. We detect this using demeaning tone, dehumanizing language delivery, emotional hostility, and ideological reinforcement patterns. | preset:hate |
| Self-Harm and Self-Injury Glorification | Portrayal of self-harm as positive or necessary. We detect this through minimizing tone, abnormal calmness, valorizing prosody, and repeated normalization cues. | preset:self_harm_and_self_injury_glorification |
| Misogyny | Evidence of cultural attitudes that propagate systems of marginalizing people based on their gender identity or gender presentation | preset:misogyny |
| Racism | Evidence of cultural attitudes that propagate systems of marginalizing people based on their racial or national identity if that identity isn’t white | preset:racism |
| Homophobia | Evidence of cultural attitudes that propagate systems of marginalizing people based on their sexuality | preset:homophobia |
| Transphobia | Evidence of cultural attitudes that propagate systems of marginalizing people based on their trans identity | preset:transphobia |
| Sizeism | Marginalization or stigmatization based on body size, shape, or weight-related conditions. We detect sizeism using ridicule or disgust conveyed through vocal affect, laughter timing, exaggerated emphasis on physical descriptors, shaming prosody, sarcastic cadence, and emotional distancing signals that indicate judgment even when explicit insults are absent. | preset:sizeism |
| Xenophobia | Marginalization or hostility toward people based on nationality, culture, or religion. We detect xenophobia using hostile or exclusionary tone, accent-mimicry or accent-mocking delivery, emotionally charged pacing, sharp prosodic emphasis around group references, us-versus-them framing expressed through intonation, and background conversational cues that signal fear or threat amplification. | preset:xenophobia |
| Ableism | Cultural marginalization of people based on ability, disability, or neurodivergence. We detect ableism using dismissive or mocking tone, exaggerated vocal imitation, sarcasm markers, dehumanizing prosody, emotional contempt, emphasis patterns around ability-related references, and interaction dynamics that signal minimization or invalidation beyond the literal words spoken. | preset:ableism |
| Social Inclusion | Active efforts to include diverse participants. We detect this using affirming tone, inclusive address patterns, balanced turn-taking, and warm emotional delivery. | preset:social_inclusion |
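
Since the full criteria live in the catalog rather than on this page, a small lookup helper can make them easier to inspect. The sketch below is a hedged illustration: the URL and the `X-API-Key` header are taken from the curl command earlier on this page, while the function names, the use of a plain GET request, and the response shape (a `presets` list with an `identifier` on each entry) are assumptions.

```python
import json
import urllib.request
from typing import Optional

LIST_PRESETS_URL = "https://modulate-developer-apis.com/api/velma-2-batch/list-presets"


def fetch_catalog(api_key: str, url: str = LIST_PRESETS_URL) -> dict:
    """Download the live preset catalog (assumes a GET request returning JSON)."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def find_preset(catalog: dict, reference: str) -> Optional[dict]:
    """Look up one preset by identifier, with or without the preset: prefix."""
    identifier = reference
    if identifier.startswith("preset:"):
        identifier = identifier[len("preset:"):]
    for preset in catalog.get("presets", []):
        if preset.get("identifier") == identifier:
            return preset
    return None  # not in the catalog
```

For example, `find_preset(fetch_catalog(api_key), "preset:harassment")` would return the catalog entry for the Harassment row above, or `None` if the catalog no longer lists it.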