Schemas
Last updated May 26, 2026
AMDParams
| Field | Type | Required | Description |
|---|---|---|---|
| initial_silence_timeout | integer | optional | Maximum silence (ms) before declaring no_speech. Default: 2500. |
| greeting_duration | integer | optional | Speech duration threshold (ms) above which the answerer is classified as a machine. Default: 1500. |
| after_greeting_silence | integer | optional | Silence duration (ms) after the initial speech required to declare human. Default: 800. |
| total_analysis_time | integer | optional | Maximum analysis window (ms). Default: 5000. |
| minimum_word_length | integer | optional | Minimum speech burst duration (ms) to count as a word. Default: 100. |
| beep_timeout | integer | optional | Maximum time (ms) to wait for the voicemail beep after machine detection. 0 or omitted = disabled. Default: 0. |
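As a rough illustration of how these fields combine with their defaults, the sketch below builds an AMDParams object that contains only the overridden fields and shows the values expected to be in effect once the documented defaults fill the rest. The helper names `build_amd_params` and `effective_amd_params` are hypothetical, and the non-negative integer check is an assumption rather than a documented server rule.

```python
# Defaults copied from the AMDParams table (all values in milliseconds).
AMD_DEFAULTS = {
    "initial_silence_timeout": 2500,
    "greeting_duration": 1500,
    "after_greeting_silence": 800,
    "total_analysis_time": 5000,
    "minimum_word_length": 100,
    "beep_timeout": 0,  # 0 or omitted = beep detection disabled
}


def build_amd_params(**overrides):
    """Return an AMDParams dict holding only the fields the caller overrides.

    Hypothetical helper: rejecting negative values is an assumption,
    not a documented validation rule.
    """
    unknown = set(overrides) - set(AMD_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown AMDParams fields: {sorted(unknown)}")
    for name, value in overrides.items():
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer (ms)")
    return dict(overrides)


def effective_amd_params(params):
    """Merge the documented defaults under the caller's overrides."""
    return {**AMD_DEFAULTS, **params}


# An empty object still enables AMD with defaults (see CreateLegRequest.amd).
amd = build_amd_params(beep_timeout=3000)
print(effective_amd_params(amd))
```

Sending only the overridden fields keeps the request small and lets the server's defaults apply to everything else.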
AddLegRequest
| Field | Type | Required | Description |
|---|---|---|---|
| leg_id | string | required | ID of the leg to add |
| mute | boolean | optional | If set, apply this mute state to the leg atomically before it joins the mixer (no race where un-muted audio enters the mix). Omit to leave current state untouched (useful when moving between rooms). |
| deaf | boolean | optional | If set, apply this deaf state to the leg atomically before it joins the mixer. Omit to leave current state untouched. |
| accept_dtmf | boolean | optional | If set, control whether this leg receives DTMF digits broadcast from other legs in the same room. Omit to leave current state untouched (default for new legs is true). |
| role | string | optional | If set, apply this routing role to the leg atomically before it joins the mixer. The room's routing matrix (see PUT /v1/rooms/{id}/routing) decides which other legs this leg hears and is heard by based on roles. Pass "" to clear the role (full mesh). Omit to leave the current role untouched. |
AgentMessageRequest
| Field | Type | Required | Description |
|---|---|---|---|
| message | string | required | Context or instruction to inject into the running agent session |
AnswerLegRequest
| Field | Type | Required | Description |
|---|---|---|---|
| speech_detection | boolean | optional | If true, emit speaking.started and speaking.stopped events for this leg. If false, suppress them. Omit to use the server default (SPEECH_DETECTION_ENABLED env var, default false). |
| codec | enum | optional | Explicit codec for the answer SDP. Must appear in the remote offer's offered_codecs list. Omit to use the server's default preference order. Values: PCMU, PCMA, G722, opus |
BridgeView
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | required | Bridge identifier |
| room_id | string | required | The peer room joined to the room in the path |
| direction | enum | required | Audio flow relative to the room in the path. Values: bidirectional, send, receive, none |
| sample_rate | integer | required | Shared mixer sample rate in Hz (both rooms must match). |
CreateLegRequest
| Field | Type | Required | Description |
|---|---|---|---|
| type | enum | required | Leg type. Values: sip, whatsapp, websocket |
| to | string | optional | Destination. For sip legs, a SIP URI (e.g. "sip:alice@example.com"). For whatsapp legs, an E.164 phone number (with or without '+'). |
| uri | string | optional | Deprecated alias for `to` (sip legs only). Prefer `to`. |
| from | string | optional | Caller ID. Sets the user part of the SIP From header (e.g. "+15551234567", "alice"). |
| privacy | string | optional | SIP Privacy header value (e.g. "id", "none") |
| ring_timeout | integer | optional | Seconds to wait for answer; 0 = no timeout. Default: 0. |
| max_duration | integer | optional | Maximum call duration in seconds after connect. The leg is hung up automatically when the limit is reached. 0 or omitted = no limit. Default: 0. |
| codecs | array[enum] | optional | Codec preference order (sip legs only) |
| headers | object | optional | Custom headers to include in the outbound INVITE (sip/whatsapp) or the WebSocket upgrade request (websocket) |
| room_id | string | optional | Room ID to auto-add the leg to once media is ready (early_media or connected). If the room does not exist, it is automatically created. |
| auth | any | optional | Digest auth credentials. Required for whatsapp legs (Meta-issued password; username defaults to `from` with '+' stripped). Optional for sip legs (sipgo retries on 401/407 challenge). |
| webhook_url | string(uri) | optional | Route all events for this leg exclusively to this URL instead of global webhooks. |
| webhook_secret | string | optional | HMAC-SHA256 signing secret for the per-leg webhook. |
| amd | any | optional | Enable Answering Machine Detection on outbound calls. Include the object (even empty) to enable with defaults; omit to disable. |
| accept_dtmf | boolean | optional | If false, this leg will not receive DTMF digits broadcast from other legs in the same room. Default: true. |
| app_id | string | optional | Application identifier. Carried through to all events for this leg. Use to filter the WebSocket event stream by app. |
| speech_detection | boolean | optional | If true, emit speaking.started and speaking.stopped events for this leg. If false, suppress them. Omit to use the server default (SPEECH_DETECTION_ENABLED env var, default false). |
| rtt | boolean | optional | For sip legs: offer Real-Time Text (ITU-T T.140 over RTP per RFC 4103) alongside audio. For websocket legs: enable the bidirectional text-message channel. Default: false. |
| url | string(uri) | optional | WebSocket target URL (ws:// or wss://) for outbound websocket legs. Required when type=websocket. |
| sample_rate | enum | optional | PCM sample rate for websocket legs. The room's mixer automatically resamples between this and the room rate. Default: 16000. Values: 8000, 16000, 24000, 48000 |
| wire_format | enum | optional | Audio framing for websocket legs. `binary` ships raw PCM as WebSocket binary frames; `json_base64` wraps PCM as `{"type":"audio","audio":"<base64>"}` text frames (browser-friendly). Default: "binary". Values: binary, json_base64 |
| sample_format | enum | optional | On-the-wire PCM sample encoding for websocket legs. v1 only supports `s16le`. Default: "s16le". Values: s16le |
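To make the `json_base64` wire format more concrete, the following sketch wraps a chunk of `s16le` PCM into the `{"type":"audio","audio":"<base64>"}` text frame described for `wire_format`, and unwraps it again. The function names are hypothetical, and the sketch covers only the audio frame shape stated in the table; any other message types on the socket are outside its scope.

```python
import base64
import json
import struct


def encode_audio_frame(pcm: bytes) -> str:
    """Wrap raw PCM bytes as a json_base64 text frame."""
    return json.dumps(
        {"type": "audio", "audio": base64.b64encode(pcm).decode("ascii")}
    )


def decode_audio_frame(frame: str) -> bytes:
    """Extract raw PCM bytes from a json_base64 text frame."""
    message = json.loads(frame)
    if message.get("type") != "audio":
        raise ValueError("not an audio frame")
    return base64.b64decode(message["audio"])


# Five signed 16-bit little-endian samples (s16le), i.e. 10 bytes of PCM.
samples = [0, 1000, -1000, 32767, -32768]
pcm = struct.pack("<%dh" % len(samples), *samples)

frame = encode_audio_frame(pcm)
restored = decode_audio_frame(frame)
print(len(pcm), restored == pcm)
```

With `wire_format` set to `binary`, the same `pcm` bytes would be sent directly as a WebSocket binary frame without the JSON wrapper.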
CreateRoomBridgeRequest
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | optional | Custom bridge ID (auto-generated UUID if omitted) |
| room_id | string | required | The other room to join. Must use the same sample rate as the room in the path. |
| direction | enum | optional | Audio flow relative to the room in the path: bidirectional (both hear each other), send (path room → other only), receive (other → path room only), none (allocated but silent). Default: "bidirectional". Values: bidirectional, send, receive, none |
DTMFRequest
| Field | Type | Required | Description |
|---|---|---|---|
| digits | string | required | DTMF digits to send (0-9, *, #) |
DeepgramAgentRequest
| Field | Type | Required | Description |
|---|---|---|---|
| settings | object | optional | Full Deepgram agent settings object (agent.listen, agent.think, agent.speak, etc.). When omitted, defaults are used (nova-3 STT, gpt-4o-mini LLM, aura-2-asteria-en TTS). |
| greeting | string | optional | Agent greeting message |
| language | string | optional | Language code (e.g. "en", "es") |
| api_key | string | optional | API key override (falls back to DEEPGRAM_API_KEY env var) |
DeleteLegRequest
| Field | Type | Required | Description |
|---|---|---|---|
| reason | enum | optional | Disconnect reason. Only honored for unanswered SIP inbound legs (state `ringing` or `early_media`); on connected legs the body is ignored and the leg is hung up with the legacy `api_hangup` reason. The value flows through to `leg.disconnected`'s `cdr.reason` and selects the SIP final response: `busy`→486, `declined`/`rejected`→603, `unavailable`→480, `not_found`→404, `forbidden`→403, `server_error`→500. Values: busy, declined, rejected, unavailable, not_found, forbidden, server_error |
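The reason-to-response mapping above can be written down as a small lookup. This is a client-side sketch for reasoning about what to expect, not the server's implementation; the function name `expected_sip_status` is hypothetical, and returning `None` for "reason not honored" is an illustrative convention.

```python
# Mapping taken from the DeleteLegRequest.reason description.
SIP_STATUS_BY_REASON = {
    "busy": 486,
    "declined": 603,
    "rejected": 603,
    "unavailable": 480,
    "not_found": 404,
    "forbidden": 403,
    "server_error": 500,
}

# The reason is only honored while an inbound SIP leg is unanswered.
UNANSWERED_STATES = {"ringing", "early_media"}


def expected_sip_status(leg_type, leg_state, reason):
    """Return the documented SIP final response, or None if the reason is ignored."""
    if reason is None:
        return None
    if reason not in SIP_STATUS_BY_REASON:
        raise ValueError(f"unknown reason: {reason!r}")
    if leg_type != "sip_inbound" or leg_state not in UNANSWERED_STATES:
        # Per the table, the body is ignored here (legacy api_hangup reason).
        return None
    return SIP_STATUS_BY_REASON[reason]


print(expected_sip_status("sip_inbound", "ringing", "busy"))
print(expected_sip_status("sip_inbound", "connected", "busy"))
```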
EarlyMediaLegRequest
| Field | Type | Required | Description |
|---|---|---|---|
| codec | enum | optional | Explicit codec for the 183 Session Progress SDP. Must appear in the remote offer's offered_codecs list. Omit to use the server's default preference order. Values: PCMU, PCMA, G722, opus |
ElevenLabsAgentRequest
| Field | Type | Required | Description |
|---|---|---|---|
| agent_id | string | required | ElevenLabs agent ID |
| first_message | string | optional | Override the agent's first message |
| language | string | optional | Language code (e.g. "en", "es") |
| dynamic_variables | object | optional | Key-value pairs passed to the agent as dynamic variables |
| api_key | string | optional | API key override (falls back to ELEVENLABS_API_KEY env var) |
Error
| Field | Type | Required | Description |
|---|---|---|---|
| instance_id | string | optional | Instance identifier |
| error | string | required | Error message |
ICECandidateInit
| Field | Type | Required | Description |
|---|---|---|---|
| candidate | string | required | ICE candidate string |
| sdpMid | string | optional | Media stream identification tag |
| sdpMLineIndex | integer | optional | Index of the media description |
Leg
| Field | Type | Required | Description |
|---|---|---|---|
| instance_id | string | optional | Instance identifier |
| id | string | required | Unique leg identifier (UUID) |
| type | enum | required | Leg type. Values: sip_inbound, sip_outbound, webrtc, whatsapp_in, whatsapp_out, websocket_in, websocket_out |
| state | enum | required | Leg state. Values: ringing, early_media, connected, held, hung_up |
| room_id | string | optional | Room ID if the leg is in a room, empty otherwise |
| muted | boolean | required | Whether the leg is muted (cannot be heard by others) |
| deaf | boolean | required | Whether the leg is deaf (cannot hear others) |
| accept_dtmf | boolean | required | Whether the leg receives DTMF digits broadcast from other legs in the same room. Defaults to true. |
| held | boolean | required | Whether the call is on hold (SIP legs only) |
| role | string | optional | Routing role used by the room's audio routing matrix (e.g. "customer", "agent", "supervisor"). Empty string means unroled (full mesh). |
| app_id | string | optional | Application identifier for event stream filtering. |
| sip_headers | object | optional | Deprecated: X-* headers from the inbound INVITE. Only present on sip_inbound legs. Use `headers` for new code; it carries the same map plus surfaces handshake headers for websocket legs. |
| headers | object | optional | Custom protocol headers exposed by the leg's transport: X-/P- headers from a SIP INVITE, headers from the WebSocket upgrade request, or headers supplied at outbound dial time. |
PipecatAgentRequest
| Field | Type | Required | Description |
|---|---|---|---|
| websocket_url | string(uri) | required | WebSocket URL of the Pipecat bot (e.g. ws://my-bot:8765) |
PlaybackRequest
| Field | Type | Required | Description |
|---|---|---|---|
| url | string(uri) | required | URL of the audio file (mutually exclusive with tone) |
| tone | string | required | Built-in telephone tone name. Format: {country}_{type} or bare {type} (defaults to US). Types: ringback, busy, dial, congestion. Countries: us, gb, de, fr, au, jp, it, in, br, pl, ru. Examples: us_ringback, gb_busy, dial. |
| mime_type | string | required | MIME type (e.g. audio/wav). Required when using url. |
| repeat | integer | required | Number of times to repeat playback (url only). Default: 0. |
| volume | integer | required | Volume adjustment in dB (-8 to 8). Default: 0. |
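The tone naming rule (`{country}_{type}`, or a bare `{type}` that defaults to US) can be checked client-side before a request is sent. The parser below is a sketch under that reading of the table; `parse_tone` is a hypothetical helper, and treating names as lowercase and case-sensitive is an assumption.

```python
# Sets copied from the PlaybackRequest.tone description.
TONE_TYPES = {"ringback", "busy", "dial", "congestion"}
TONE_COUNTRIES = {"us", "gb", "de", "fr", "au", "jp", "it", "in", "br", "pl", "ru"}


def parse_tone(name):
    """Split a tone name into (country, type); a bare type defaults to 'us'."""
    country, sep, tone_type = name.partition("_")
    if not sep:
        # No underscore: the whole name is the tone type.
        country, tone_type = "us", name
    if country not in TONE_COUNTRIES:
        raise ValueError(f"unknown tone country: {country!r}")
    if tone_type not in TONE_TYPES:
        raise ValueError(f"unknown tone type: {tone_type!r}")
    return country, tone_type


for example in ("us_ringback", "gb_busy", "dial"):
    print(example, "->", parse_tone(example))
```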
RTTRequest
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | required | UTF-8 text to send. May be one or more characters and may include T.140 control codes (e.g. backspace U+0008, CR/LF). |
RecordingRequest
| Field | Type | Required | Description |
|---|---|---|---|
| storage | enum | required | Storage backend: "file" (default) writes to local disk; "s3" uploads to S3 after recording stops. Values: file, s3 |
| multi_channel | boolean | required | When true, record each participant to a separate mono WAV file in addition to the full mix. Only applies to room recordings. Default: false. |
| s3_bucket | string | required | S3 bucket name. Overrides S3_BUCKET env var. Required if env var is not set. |
| s3_region | string | required | AWS region. Overrides S3_REGION env var. Default: us-east-1. |
| s3_endpoint | string | required | Custom S3 endpoint (MinIO, etc.). Overrides S3_ENDPOINT env var. |
| s3_prefix | string | required | Key prefix (e.g. recordings/). Overrides S3_PREFIX env var. |
| s3_access_key | string | required | AWS access key ID. Overrides default credential chain. |
| s3_secret_key | string | required | AWS secret access key. Must be set together with s3_access_key. |
Room
| Field | Type | Required | Description |
|---|---|---|---|
| instance_id | string | optional | Instance identifier |
| id | string | required | Room identifier |
| app_id | string | optional | Application identifier for event stream filtering. |
| sample_rate | integer | required | Mixer sample rate in Hz (8000, 16000, or 48000). |
| participants | array[object] | required | Legs currently in this room |
RoomCreateRequest
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | required | Custom room ID (auto-generated UUID if omitted) |
| webhook_url | string(uri) | optional | Route all events for this room exclusively to this URL instead of global webhooks. |
| webhook_secret | string | optional | HMAC-SHA256 signing secret for the per-room webhook. |
| app_id | string | optional | Application identifier. Carried through to all events for this room. Use to filter the WebSocket event stream by app. |
| sample_rate | enum | optional | Mixer sample rate in Hz. Default: 16000. Values: 8000, 16000, 48000 |
RoomRoutingRequest
| Field | Type | Required | Description |
|---|---|---|---|
| matrix | object | required | Listener-role → list of allowed source roles. Omitted listener roles default to full mesh. Empty list = hears nothing. |
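One way to read the matrix semantics is as a per-listener allow list. The sketch below decides whether a listener role hears a source role under the two documented rules (an omitted listener role means full mesh, an empty list means the listener hears nothing). How the server treats an unroled source against a restricted listener is not stated in this section; the sketch assumes such a source is heard only if the empty role "" is listed explicitly. The function name `hears` is hypothetical.

```python
def hears(matrix, listener_role, source_role):
    """Return True if a leg with listener_role hears a leg with source_role.

    Assumption: an unroled source ("") is matched like any other role name,
    so a restricted listener hears it only if "" appears in its row.
    """
    if listener_role == "" or listener_role not in matrix:
        # Unroled listeners and roles absent from the matrix get full mesh.
        return True
    return source_role in matrix[listener_role]


# Example matrix: the customer hears only the agent, the supervisor hears
# everyone named, and a "recorder" role hears nothing (empty list).
matrix = {
    "customer": ["agent"],
    "supervisor": ["customer", "agent"],
    "recorder": [],
}

print(hears(matrix, "customer", "agent"))       # allowed by the row
print(hears(matrix, "customer", "supervisor"))  # not in the row
print(hears(matrix, "agent", "supervisor"))     # no row for agent: full mesh
```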
RoomRoutingUpdateRequest
| Field | Type | Required | Description |
|---|---|---|---|
| updates | array[object] | required | Per-listener-role row replacements applied as a single atomic update. |
RoomRoutingView
| Field | Type | Required | Description |
|---|---|---|---|
| matrix | object | required | Listener-role → list of allowed source roles. Roles absent from the matrix default to full mesh. |
RoutingRowUpdate
| Field | Type | Required | Description |
|---|---|---|---|
| listener_role | string | required | The role whose row is being replaced. |
| sources | array[string] | required | New list of allowed source roles for this listener role. Pass null to clear the row (full mesh). |
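The effect of a batch of row updates on a routing matrix can be modelled as follows. This is an illustrative client-side model, not the server's code: `apply_row_updates` is a hypothetical name, and atomicity is imitated by building a new matrix and returning it only after every row has been processed.

```python
def apply_row_updates(matrix, updates):
    """Return a new matrix with each RoutingRowUpdate applied.

    A sources value of None (JSON null) removes the row, which puts that
    listener role back on full mesh. The input matrix is not modified.
    """
    result = {role: list(sources) for role, sources in matrix.items()}
    for update in updates:
        role = update["listener_role"]
        sources = update["sources"]
        if sources is None:
            result.pop(role, None)
        else:
            result[role] = list(sources)
    return result


current = {"customer": ["agent"], "supervisor": ["customer", "agent"]}
updates = [
    {"listener_role": "customer", "sources": ["agent", "supervisor"]},
    {"listener_role": "supervisor", "sources": None},  # clear: full mesh
    {"listener_role": "recorder", "sources": []},      # hears nothing
]
print(apply_row_updates(current, updates))
```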
SIPAuth
| Field | Type | Required | Description |
|---|---|---|---|
| username | string | optional | Digest auth username. Optional for whatsapp legs (defaults to `from` with '+' stripped, per Meta's spec). |
| password | string | required | Digest auth password. |
STTRequest
| Field | Type | Required | Description |
|---|---|---|---|
| language | string | required | Language code (e.g. "en", "es") |
| partial | boolean | required | Emit partial (non-final) transcripts. Default: false. |
| provider | enum | optional | STT provider. Default: elevenlabs. Values: elevenlabs, deepgram |
| api_key | string | optional | API key override (falls back to ELEVENLABS_API_KEY or DEEPGRAM_API_KEY env var depending on provider) |
SetLegRoleRequest
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | required | New routing role for the leg. The room's routing matrix decides which other legs this leg hears and is heard by based on roles. Pass an empty string to clear the role (full mesh). |
StatusResponse
| Field | Type | Required | Description |
|---|---|---|---|
| instance_id | string | optional | Instance identifier |
| status | string | required | |
TTSRequest
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | required | Text to synthesize |
| voice | string | required | Provider-specific voice identifier. ElevenLabs: voice name or ID. AWS Polly: voice ID (e.g. Joanna, Matthew). Google Cloud: voice name, either full format (e.g. en-US-Neural2-F) or short name for Gemini models (e.g. Achernar, Kore). Deepgram: model name (e.g. aura-2-asteria-en). |
| model_id | string | required | Provider-specific model/engine. ElevenLabs: model ID. AWS Polly: engine (standard, neural, long-form, generative; default neural). Google Cloud: model name (e.g. gemini-2.5-pro-tts, chirp3-hd). |
| language | string | optional | Language code (e.g. "en-US", "pl-pl"). Required for Google Gemini TTS voices that use short names (e.g. Achernar). Auto-extracted from full voice names like en-US-Neural2-F. |
| prompt | string | optional | Style/tone instruction for promptable voice models (Google Gemini TTS only). E.g. "Read aloud in a warm, welcoming tone." |
| volume | integer | required | Volume adjustment in dB (-8 to 8). Default: 0. |
| provider | enum | optional | TTS provider. Default: elevenlabs. Values: elevenlabs, aws, google, deepgram |
| api_key | string | optional | ElevenLabs: API key override (falls back to ELEVENLABS_API_KEY env var). AWS: optional ACCESS_KEY:SECRET_KEY override (falls back to default AWS credential chain). Google Cloud: optional API key override (falls back to Application Default Credentials). Deepgram: API key override (falls back to DEEPGRAM_API_KEY env var). |
TransferRequest
| Field | Type | Required | Description |
|---|---|---|---|
| target | string | required | SIP URI to transfer the call to (e.g. "sip:bob@example.com"). |
| replaces_leg_id | string | optional | ID of an existing connected SIP leg whose dialog should be replaced (attended transfer). Omit for blind transfer. |
UpdateRoomBridgeRequest
| Field | Type | Required | Description |
|---|---|---|---|
| direction | enum | required | New audio flow relative to the room in the path. Values: bidirectional, send, receive, none |
VAPIAgentRequest
| Field | Type | Required | Description |
|---|---|---|---|
| assistant_id | string | required | VAPI assistant ID |
| first_message | string | optional | Override the agent's first message |
| variable_values | object | optional | Key-value pairs passed as VAPI variable values (assistantOverrides.variableValues) |
| api_key | string | optional | API key override (falls back to VAPI_API_KEY env var) |
VolumeRequest
| Field | Type | Required | Description |
|---|---|---|---|
| volume | integer | required | Volume adjustment (-8 to 8, ~3dB per step, 0 = unchanged) |
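For a feel of what a volume step means, the sketch below converts a step into an approximate dB change and the corresponding linear amplitude gain, using the "~3dB per step" figure from the table and the standard relation gain = 10^(dB/20). The exact gain curve applied by the server is not documented here, so the numbers are estimates; `volume_step_to_gain` is a hypothetical helper.

```python
DB_PER_STEP = 3.0  # approximate, per the VolumeRequest description


def volume_step_to_gain(step):
    """Return (approx_db, linear_amplitude_gain) for a volume step in -8..8."""
    if isinstance(step, bool) or not isinstance(step, int):
        raise ValueError("volume must be an integer")
    if not -8 <= step <= 8:
        raise ValueError("volume must be between -8 and 8")
    db = step * DB_PER_STEP
    # Standard amplitude conversion: +6 dB is roughly a doubling.
    return db, 10 ** (db / 20)


for step in (-8, -2, 0, 2, 8):
    db, gain = volume_step_to_gain(step)
    print(f"step {step:+d}: about {db:+.0f} dB, gain x{gain:.3f}")
```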
WebRTCCandidatesResult
| Field | Type | Required | Description |
|---|---|---|---|
| candidates | array[object] | required | |
| done | boolean | required | |
WebRTCOfferRequest
| Field | Type | Required | Description |
|---|---|---|---|
| sdp | string | required | SDP offer from the browser |
WebRTCOfferResult
| Field | Type | Required | Description |
|---|---|---|---|
| leg_id | string | required | |
| sdp | string | required | |
WebhookEvent
Event envelope delivered via HTTP POST to registered webhook URLs. Event-specific fields are flattened into the top-level object (there is no "data" wrapper). The request includes an X-Signature-256 header when a secret is configured.
| Field | Type | Required | Description |
|---|---|---|---|
| type | enum | required | Values: leg.ringing, leg.early_media, leg.connected, leg.disconnected, leg.joined_room, leg.left_room, leg.muted, leg.unmuted, leg.deaf, leg.undeaf, leg.hold, leg.unhold, leg.command_failed, dtmf.received, rtt.received, speaking.started, speaking.stopped, playback.started, playback.finished, playback.error, tts.started, tts.finished, tts.error, recording.started, recording.finished, recording.paused, recording.resumed, leg.transfer_initiated, leg.transfer_requested, leg.transfer_progress, leg.transfer_completed, leg.transfer_failed, room.created, room.deleted, room.bridged, room.bridge_updated, room.unbridged, room.routing_changed, leg.role_changed, stt.text, agent.connected, agent.disconnected, agent.user_transcript, agent.agent_response, amd.result, amd.beep |
| timestamp | string(date-time) | required | |
| instance_id | string | optional | Instance identifier |
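A receiver will usually want to check the X-Signature-256 header before trusting an event. This section says only that the secret is an HMAC-SHA256 signing secret, so the sketch below rests on assumptions that should be confirmed against the server: that the header carries the hex digest of the raw request body, and that it may or may not have a `sha256=` prefix. The function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_body(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed signature format)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Compare the received header against a locally computed signature."""
    received = header_value.strip()
    if received.startswith("sha256="):  # assumed optional prefix
        received = received[len("sha256="):]
    expected = sign_body(secret, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(
        expected.encode("ascii"), received.lower().encode("utf-8")
    )


# Event-specific fields sit at the top level, next to type and timestamp.
body = json.dumps(
    {"type": "leg.connected", "timestamp": "2026-05-26T12:00:00Z"}
).encode("utf-8")
header = "sha256=" + sign_body("example-secret", body)

print(verify_signature("example-secret", body, header))
print(verify_signature("example-secret", body + b" ", header))
```

Verification has to run over the exact bytes received, before any JSON parsing or re-serialization, because a re-encoded body will generally not hash to the same value.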