Schemas
Last updated April 16, 2026
AMDParams
| Field | Type | Required | Description |
|---|---|---|---|
| initial_silence_timeout | integer | optional | Maximum silence (ms) before declaring no_speech. Default: 2500. |
| greeting_duration | integer | optional | Speech duration threshold (ms) above which the answerer is classified as a machine. Default: 1500. |
| after_greeting_silence | integer | optional | Silence duration (ms) after initial speech required to declare human. Default: 800. |
| total_analysis_time | integer | optional | Maximum analysis window (ms). Default: 5000. |
| minimum_word_length | integer | optional | Minimum speech burst duration (ms) to count as a word. Default: 100. |
| beep_timeout | integer | optional | Maximum time (ms) to wait for the voicemail beep after machine detection. 0 or omitted = disabled. Default: 0. |
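
To make the interaction of these thresholds concrete, here is a minimal sketch of a decision function over three measured durations. It is an illustration built only from the field descriptions above, not the server's actual detection algorithm; the `classify` function, its arguments, and the `"undecided"` label are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AMDParams:
    # Defaults copied from the AMDParams table (all values in milliseconds).
    initial_silence_timeout: int = 2500
    greeting_duration: int = 1500
    after_greeting_silence: int = 800


def classify(initial_silence_ms: int, speech_ms: int, trailing_silence_ms: int,
             params: Optional[AMDParams] = None) -> str:
    """Toy classifier (assumption): applies the documented thresholds in order."""
    p = params or AMDParams()
    # Nobody spoke before the initial silence timeout elapsed.
    if speech_ms == 0 and initial_silence_ms >= p.initial_silence_timeout:
        return "no_speech"
    # A long uninterrupted greeting suggests a recorded message.
    if speech_ms > p.greeting_duration:
        return "machine"
    # A short greeting followed by enough silence suggests a person.
    if speech_ms > 0 and trailing_silence_ms >= p.after_greeting_silence:
        return "human"
    # Hypothetical label for "keep listening"; not part of the documented API.
    return "undecided"
```

With the defaults, a 600 ms "Hello?" followed by 900 ms of silence would be treated as human, while a 2000 ms greeting would be treated as a machine.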
AddLegRequest
| Field | Type | Required | Description |
|---|---|---|---|
| leg_id | string | required | ID of the leg to add. |
| mute | boolean | optional | If set, apply this mute state to the leg atomically before it joins the mixer, so un-muted audio never enters the mix. Omit to leave the current state untouched (useful when moving between rooms). |
| deaf | boolean | optional | If set, apply this deaf state to the leg atomically before it joins the mixer. Omit to leave the current state untouched. |
| accept_dtmf | boolean | optional | If set, control whether this leg receives DTMF digits broadcast from other legs in the same room. Omit to leave the current state untouched (default for new legs is true). |
AgentMessageRequest
| Field | Type | Required | Description |
|---|---|---|---|
| message | string | required | Context or instruction to inject into the running agent session. |
CreateLegRequest
| Field | Type | Required | Description |
|---|---|---|---|
| type | enum | required | Leg type. Values: sip. |
| uri | string | required | SIP URI to dial. |
| from | string | optional | Caller ID. Sets the user part of the SIP From header (e.g. "+15551234567", "alice"). |
| privacy | string | optional | SIP Privacy header value (e.g. "id", "none"). |
| ring_timeout | integer | optional | Seconds to wait for an answer; 0 = no timeout. Default: 0. |
| max_duration | integer | optional | Maximum call duration in seconds after connect. The call is hung up automatically when the limit is reached. 0 or omitted = no limit. Default: 0. |
| codecs | array[enum] | optional | Codec preference order. |
| headers | object | optional | Custom SIP headers to include in the outbound INVITE (e.g. X-Correlation-ID). |
| room_id | string | optional | Room ID to auto-add the leg to once media is ready (early_media or connected). If the room does not exist, it is created automatically. |
| auth | any | optional | SIP digest authentication credentials (see SIPAuth). If the remote challenges with 401/407, sipgo retries with these credentials. |
| webhook_url | string(uri) | optional | Route all events for this leg exclusively to this URL instead of global webhooks. |
| webhook_secret | string | optional | HMAC-SHA256 signing secret for the per-leg webhook. |
| amd | any | optional | Enable Answering Machine Detection on outbound calls (see AMDParams). Include the object (even empty) to enable with defaults; omit to disable. |
| accept_dtmf | boolean | optional | If false, this leg will not receive DTMF digits broadcast from other legs in the same room. Default: true. |
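
The difference between omitting `amd` and sending an empty object is easy to get wrong, so here is a minimal sketch of building a CreateLegRequest body. The helper name `build_create_leg` and its keyword arguments are hypothetical; only the JSON field names come from the table above, and no endpoint path is assumed.

```python
import json
from typing import Any, Dict, Optional


def build_create_leg(uri: str,
                     caller_id: Optional[str] = None,
                     room_id: Optional[str] = None,
                     amd: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a CreateLegRequest body, leaving out optional fields that are unset."""
    body: Dict[str, Any] = {"type": "sip", "uri": uri}
    if caller_id is not None:
        body["from"] = caller_id
    if room_id is not None:
        body["room_id"] = room_id
    # amd=None leaves the key out (AMD disabled); amd={} enables AMD with defaults.
    if amd is not None:
        body["amd"] = amd
    return body


if __name__ == "__main__":
    request = build_create_leg("sip:bob@example.com", caller_id="+15551234567",
                               amd={"greeting_duration": 1200})
    print(json.dumps(request, indent=2))
```

Testing `amd is not None` rather than truthiness matters here, because an empty dictionary is falsy in Python but is a meaningful value for this field.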
DTMFRequest
| Field | Type | Required | Description |
|---|---|---|---|
| digits | string | required | DTMF digits to send (0-9, *, #). |
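
A client can check the `digits` string before sending it. The sketch below assumes the allowed alphabet is exactly the characters listed above and that an empty string is invalid; the function name `is_valid_dtmf` is hypothetical.

```python
import re

# Characters listed in the DTMFRequest description: 0-9, * and #.
_DTMF_PATTERN = re.compile(r"[0-9*#]+")


def is_valid_dtmf(digits: str) -> bool:
    """Return True if every character is an allowed DTMF digit (assumes non-empty input is required)."""
    return _DTMF_PATTERN.fullmatch(digits) is not None
```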
DeepgramAgentRequest
| Field | Type | Required | Description |
|---|---|---|---|
| settings | object | optional | Full Deepgram agent settings object (agent.listen, agent.think, agent.speak, etc.). When omitted, defaults are used: nova-3 STT, gpt-4o-mini LLM, aura-2-asteria-en TTS. |
| greeting | string | optional | Agent greeting message. |
| language | string | optional | Language code (e.g. "en", "es"). |
| api_key | string | optional | API key override (falls back to the DEEPGRAM_API_KEY env var). |
ElevenLabsAgentRequest
| Field | Type | Required | Description |
|---|---|---|---|
| agent_id | string | required | ElevenLabs agent ID. |
| first_message | string | optional | Override the agent's first message. |
| language | string | optional | Language code (e.g. "en", "es"). |
| dynamic_variables | object | optional | Key-value pairs passed to the agent as dynamic variables. |
| api_key | string | optional | API key override (falls back to the ELEVENLABS_API_KEY env var). |
Error
| Field | Type | Required | Description |
|---|---|---|---|
| instance_id | string | optional | Instance identifier. |
| error | string | required | Error message. |
ICECandidateInit
| Field | Type | Required | Description |
|---|---|---|---|
| candidate | string | required | ICE candidate string. |
| sdpMid | string | optional | Media stream identification tag. |
| sdpMLineIndex | integer | optional | Index of the media description. |
Leg
| Field | Type | Required | Description |
|---|---|---|---|
| instance_id | string | optional | Instance identifier. |
| id | string | required | Unique leg identifier (UUID). |
| type | enum | required | Leg type. Values: sip_inbound, sip_outbound, webrtc. |
| state | enum | required | Leg state. Values: ringing, early_media, connected, held, hung_up. |
| room_id | string | optional | Room ID if the leg is in a room, empty otherwise. |
| muted | boolean | required | Whether the leg is muted (cannot be heard by others). |
| deaf | boolean | required | Whether the leg is deaf (cannot hear others). |
| accept_dtmf | boolean | required | Whether the leg receives DTMF digits broadcast from other legs in the same room. Defaults to true. |
| held | boolean | required | Whether the call is on hold (SIP legs only). |
| sip_headers | object | optional | X-* headers from the inbound INVITE. Only present on sip_inbound legs. |
PipecatAgentRequest
| Field | Type | Required | Description |
|---|---|---|---|
| websocket_url | string(uri) | required | WebSocket URL of the Pipecat bot (e.g. ws://my-bot:8765). |
PlaybackRequest
| Field | Type | Required | Description |
|---|---|---|---|
| url | string(uri) | optional | URL of the audio file. Mutually exclusive with tone. |
| tone | string | optional | Built-in telephone tone name. Mutually exclusive with url. Format: {country}_{type} or bare {type} (defaults to US). Types: ringback, busy, dial, congestion. Countries: us, gb, de, fr, au, jp, it, in, br, pl, ru. Examples: us_ringback, gb_busy, dial. |
| mime_type | string | optional | MIME type (e.g. audio/wav). Required when using url. |
| repeat | integer | optional | Number of times to repeat playback (url only). Default: 0. |
| volume | integer | optional | Volume adjustment in dB (-8 to 8). Default: 0. |
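
The `tone` naming rule ({country}_{type}, or a bare {type} that defaults to US) can be sketched as a small parser. This is a client-side illustration using only the types and countries listed above; the function name `parse_tone` is hypothetical and the server may validate differently.

```python
from typing import Tuple

# Values listed in the PlaybackRequest description.
TONE_TYPES = {"ringback", "busy", "dial", "congestion"}
COUNTRIES = {"us", "gb", "de", "fr", "au", "jp", "it", "in", "br", "pl", "ru"}


def parse_tone(name: str) -> Tuple[str, str]:
    """Split a tone name into (country, type); a bare type defaults to 'us'."""
    country, sep, kind = name.partition("_")
    if not sep:
        # No underscore: the whole name is the tone type.
        country, kind = "us", name
    if country not in COUNTRIES or kind not in TONE_TYPES:
        raise ValueError(f"unknown tone name: {name!r}")
    return country, kind
```

For example, `parse_tone("dial")` yields `("us", "dial")` and `parse_tone("gb_busy")` yields `("gb", "busy")`.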
RecordingRequest
| Field | Type | Required | Description |
|---|---|---|---|
| storage | enum | optional | Storage backend: "file" (default) writes to local disk; "s3" uploads to S3 after recording stops. Values: file, s3. |
| multi_channel | boolean | optional | When true, record each participant to a separate mono WAV file in addition to the full mix. Only applies to room recordings. Default: false. |
| s3_bucket | string | optional | S3 bucket name. Overrides the S3_BUCKET env var. Required if the env var is not set. |
| s3_region | string | optional | AWS region. Overrides the S3_REGION env var. Default: us-east-1. |
| s3_endpoint | string | optional | Custom S3 endpoint (MinIO, etc.). Overrides the S3_ENDPOINT env var. |
| s3_prefix | string | optional | Key prefix (e.g. recordings/). Overrides the S3_PREFIX env var. |
| s3_access_key | string | optional | AWS access key ID. Overrides the default credential chain. |
| s3_secret_key | string | optional | AWS secret access key. Must be set together with s3_access_key. |
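
The conditional rules in this table (bucket needed only when the env var is absent, access key and secret key supplied as a pair) can be checked before sending a request. The sketch below is a hypothetical client-side validator, `validate_recording`; it assumes the bucket rule applies only when `storage` is "s3", and the server's own validation may differ.

```python
import os
from typing import Any, List, Mapping, Optional


def validate_recording(req: Mapping[str, Any],
                       env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems found in a RecordingRequest body (empty if none)."""
    env = os.environ if env is None else env
    errors: List[str] = []
    storage = req.get("storage", "file")  # "file" is the documented default
    if storage not in ("file", "s3"):
        errors.append(f"storage must be 'file' or 's3', got {storage!r}")
    # Assumption: the bucket is only needed for S3 storage.
    if storage == "s3" and not (req.get("s3_bucket") or env.get("S3_BUCKET")):
        errors.append("s3_bucket is required when S3_BUCKET is not set")
    # The two credential fields must be provided together or not at all.
    if bool(req.get("s3_access_key")) != bool(req.get("s3_secret_key")):
        errors.append("s3_access_key and s3_secret_key must be set together")
    return errors
```

Passing `env` explicitly keeps the check testable without touching the real process environment.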
Room
| Field | Type | Required | Description |
|---|---|---|---|
| instance_id | string | optional | Instance identifier. |
| id | string | required | Room identifier. |
| participants | array[object] | required | Legs currently in this room. |
RoomCreateRequest
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | optional | Custom room ID (auto-generated UUID if omitted). |
| webhook_url | string(uri) | optional | Route all events for this room exclusively to this URL instead of global webhooks. |
| webhook_secret | string | optional | HMAC-SHA256 signing secret for the per-room webhook. |
SIPAuth
| Field | Type | Required | Description |
|---|---|---|---|
| username | string | required | SIP auth username. |
| password | string | required | SIP auth password. |
STTRequest
| Field | Type | Required | Description |
|---|---|---|---|
| language | string | required | Language code (e.g. "en", "es"). |
| partial | boolean | optional | Emit partial (non-final) transcripts. Default: false. |
| provider | enum | optional | STT provider: "elevenlabs" (default) or "deepgram". Values: elevenlabs, deepgram. |
| api_key | string | optional | API key override (falls back to the ELEVENLABS_API_KEY or DEEPGRAM_API_KEY env var, depending on provider). |
StatusResponse
| Field | Type | Required | Description |
|---|---|---|---|
| instance_id | string | optional | Instance identifier. |
| status | string | required | |
TTSRequest
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | required | Text to synthesize. |
| voice | string | required | Provider-specific voice identifier. ElevenLabs: voice name or ID. AWS Polly: voice ID (e.g. Joanna, Matthew). Google Cloud: voice name, either full format (e.g. en-US-Neural2-F) or short name for Gemini models (e.g. Achernar, Kore). Deepgram: model name (e.g. aura-2-asteria-en). |
| model_id | string | required | Provider-specific model/engine. ElevenLabs: model ID. AWS Polly: engine (standard, neural, long-form, generative; default neural). Google Cloud: model name (e.g. gemini-2.5-pro-tts, chirp3-hd). |
| language | string | optional | Language code (e.g. "en-US", "pl-pl"). Required for Google Gemini TTS voices that use short names (e.g. Achernar). Auto-extracted from full voice names like en-US-Neural2-F. |
| prompt | string | optional | Style/tone instruction for promptable voice models (Google Gemini TTS only), e.g. "Read aloud in a warm, welcoming tone." |
| volume | integer | optional | Volume adjustment in dB (-8 to 8). Default: 0. |
| provider | enum | optional | TTS provider: "elevenlabs" (default), "aws", "google", or "deepgram". Values: elevenlabs, aws, google, deepgram. |
| api_key | string | optional | ElevenLabs: API key override (falls back to the ELEVENLABS_API_KEY env var). AWS: optional ACCESS_KEY:SECRET_KEY override (falls back to the default AWS credential chain). Google Cloud: optional API key override (falls back to Application Default Credentials). Deepgram: API key override (falls back to the DEEPGRAM_API_KEY env var). |
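
The `language` rule for Google voices (explicit for short names, derived from full names) can be illustrated as follows. The regular expression is an assumption about what a "full" voice name looks like, based on the single example en-US-Neural2-F; the server's real extraction logic is not documented here, and `language_for_voice` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed shape of a full voice name: "<lang>-<REGION>-<rest>", e.g. en-US-Neural2-F.
_FULL_VOICE = re.compile(r"^([a-z]{2,3}-[A-Z]{2})-")


def language_for_voice(voice: str, language: Optional[str] = None) -> Optional[str]:
    """Return the explicit language if given, else try to derive it from the voice name."""
    if language:
        return language
    match = _FULL_VOICE.match(voice)
    # Short names such as "Achernar" carry no language, so the caller must supply one.
    return match.group(1) if match else None
```

A `None` result signals that the request needs an explicit `language` field.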
TransferRequest
| Field | Type | Required | Description |
|---|---|---|---|
| target | string | required | SIP URI to transfer the call to (e.g. "sip:bob@example.com"). |
| replaces_leg_id | string | optional | ID of an existing connected SIP leg whose dialog should be replaced (attended transfer). Omit for a blind transfer. |
VAPIAgentRequest
| Field | Type | Required | Description |
|---|---|---|---|
| assistant_id | string | required | VAPI assistant ID. |
| first_message | string | optional | Override the agent's first message. |
| variable_values | object | optional | Key-value pairs passed as VAPI variable values (assistantOverrides.variableValues). |
| api_key | string | optional | API key override (falls back to the VAPI_API_KEY env var). |
VolumeRequest
| Field | Type | Required | Description |
|---|---|---|---|
| volume | integer | required | Volume adjustment (-8 to 8, ~3 dB per step, 0 = unchanged). |
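
To give a feel for what a step means, the sketch below converts a volume step into a linear amplitude gain. It assumes exactly 3 dB per step (the table says approximately) and the standard amplitude relation gain = 10^(dB/20); how the server actually scales audio is not specified, and `step_to_gain` is a hypothetical helper.

```python
def step_to_gain(step: int) -> float:
    """Convert a volume step in [-8, 8] to a linear amplitude multiplier (assumes 3 dB per step)."""
    if not -8 <= step <= 8:
        raise ValueError("volume step must be between -8 and 8")
    decibels = 3.0 * step
    # Amplitude ratio for a given decibel change.
    return 10 ** (decibels / 20)
```

Under these assumptions, two steps up (+6 dB) roughly doubles the amplitude and two steps down roughly halves it.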
WebRTCOfferRequest
| Field | Type | Required | Description |
|---|---|---|---|
| sdp | string | required | SDP offer from the browser. |
WebhookEvent
Event envelope delivered via HTTP POST to registered webhook URLs. Event-specific fields are flattened into the top-level object (no "data" wrapper). Includes X-Signature-256 header when a secret is configured.
| Field | Type | Required | Description |
|---|---|---|---|
| type | enum | required | Values: leg.ringing, leg.early_media, leg.connected, leg.disconnected, leg.joined_room, leg.left_room, leg.muted, leg.unmuted, leg.hold, leg.unhold, dtmf.received, speaking.started, speaking.stopped, playback.started, playback.finished, playback.error, tts.started, tts.finished, tts.error, recording.started, recording.finished, recording.paused, recording.resumed, leg.transfer_initiated, leg.transfer_requested, leg.transfer_progress, leg.transfer_completed, leg.transfer_failed, room.created, room.deleted, stt.text, agent.connected, agent.disconnected, agent.user_transcript, agent.agent_response. |
| timestamp | string(date-time) | required | |
| instance_id | string | optional | Instance identifier. |
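
A receiver will usually want to check the X-Signature-256 header before trusting an event. The sketch below assumes the header carries a hex-encoded HMAC-SHA256 of the raw request body, optionally prefixed with "sha256="; the exact header encoding is not specified in this section, so confirm it against real deliveries. The function names are hypothetical.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed signature format)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify(secret: str, body: bytes, header_value: str) -> bool:
    """Compare the received signature against one computed locally, in constant time."""
    received = header_value.strip()
    # Assumption: tolerate an optional "sha256=" prefix, as some webhook senders use.
    if received.startswith("sha256="):
        received = received[len("sha256="):]
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Verification should run on the exact bytes received, before any JSON parsing or re-serialization, since even a whitespace change alters the digest.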