Free & Open Source · License

SIP and WebRTC Voice Framework.
Built-in AI support.

VoiceBlender is an open source Go service that bridges SIP and WebRTC voice calls with multi-party audio mixing, a REST API, and real-time webhooks. Plug in your own TTS, STT, and AI agent models.

go run ./cmd/voiceblender


Features

A complete toolkit for building voice applications, developed in the open.

SIP Inbound & Outbound

Originate and receive SIP calls. Supported codecs include PCMA, PCMU, and Opus.

WebRTC

Browser-based voice via SDP offer/answer with trickle ICE. Connect users directly from the browser with no plugins required.

Multi-Party Rooms

Mix multiple participants in a single room. Join via SIP, WebRTC, or WebSocket.
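Room mixing can be illustrated with a generic mix-minus sketch: each participant receives the sum of everyone else's audio, so nobody hears their own voice echoed back. This is a textbook technique, not VoiceBlender's actual mixer. It assumes 16-bit PCM frames of equal length (for example, 160 samples is 20 ms at 8 kHz) and uses simple hard clipping, where a production mixer may apply gain control or limiting instead.

```go
package main

import "fmt"

// clamp16 saturates a 32-bit sum to the signed 16-bit PCM range.
func clamp16(v int32) int16 {
	if v > 32767 {
		return 32767
	}
	if v < -32768 {
		return -32768
	}
	return int16(v)
}

// mixMinus takes one PCM frame per participant (all frames must have the
// same length) and returns, for each participant, the sum of all the
// other participants' frames.
func mixMinus(frames [][]int16) [][]int16 {
	if len(frames) == 0 {
		return nil
	}
	n := len(frames[0])
	total := make([]int32, n) // wide accumulator so the sum cannot overflow before clamping
	for _, f := range frames {
		for i := 0; i < n; i++ {
			total[i] += int32(f[i])
		}
	}
	out := make([][]int16, len(frames))
	for p, f := range frames {
		mixed := make([]int16, n)
		for i := 0; i < n; i++ {
			// Subtract the participant's own signal from the full mix.
			mixed[i] = clamp16(total[i] - int32(f[i]))
		}
		out[p] = mixed
	}
	return out
}

func main() {
	frames := [][]int16{
		{1000, -1000},
		{2000, 2000},
		{30000, 30000},
	}
	for p, mix := range mixMinus(frames) {
		fmt.Printf("participant %d hears %v\n", p, mix)
	}
	// participant 0 hears [32000 32000]
	// participant 1 hears [31000 29000]
	// participant 2 hears [3000 1000]
}
```

Computing the full sum once and subtracting each participant's own frame keeps the cost linear in the number of participants, rather than re-summing N-1 frames for every listener.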

Answering Machine Detection

Detect whether an outbound call was answered by a human or a machine. AMD results are delivered via webhooks so you can route calls or drop voicemails automatically.

TTS & STT

Built-in TTS and STT integrations with ElevenLabs, Google Cloud, AWS Polly, and Deepgram. Real-time STT delivers partial transcripts as the caller speaks.

AI Agent

Attach a conversational AI agent to any leg or room, with barge-in support. ElevenLabs, Vapi, Pipecat, and Deepgram agents are supported out of the box.

REST API & Webhooks

Full REST API for legs, rooms, playback, recording, and more. Webhooks deliver events in real time, signed with HMAC-SHA256 and retried on failure.
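A webhook receiver should check the HMAC-SHA256 signature before trusting an event. The sketch below assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body computed with a shared secret; the header that carries it, its exact encoding, and how the secret is configured are assumptions to confirm against the project documentation. The `sign` and `verifySignature` helpers are hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
// Assumption: this matches how VoiceBlender signs webhook bodies.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifySignature reports whether sigHex is a valid signature for body.
// hmac.Equal compares in constant time, so timing does not leak how many
// bytes of a forged signature were correct.
func verifySignature(secret, body []byte, sigHex string) bool {
	got, err := hex.DecodeString(sigHex)
	if err != nil {
		return false // malformed signature
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret") // hypothetical shared secret
	// Made-up payload based on the leg.ringing fields listed in the workflow.
	body := []byte(`{"event":"leg.ringing","leg_id":"leg-42","from":"sip:alice@example.com","to":"sip:support@example.com"}`)
	sig := sign(secret, body)
	fmt.Println(verifySignature(secret, body, sig))                     // true
	fmt.Println(verifySignature(secret, []byte(`{"leg_id":"x"}`), sig)) // false
}
```

Compute the MAC over the raw bytes read from the request body (for example with `io.ReadAll`) before any JSON decoding, since re-encoding can change whitespace or key order and break the comparison.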

Build With VoiceBlender

From simple IVR trees to full-scale contact centers — all driven by a REST API.

IVR Systems

Build interactive voice response menus with DTMF input, TTS prompts, and AI-powered natural language routing. Replace rigid dialplans with simple REST calls.

Contact Centers

Route inbound calls to agent queues, mix participants in rooms, record conversations, and get real-time transcription — all through the API.

Help Lines

Set up multilingual voice hotlines with AI agents that handle first-level support, escalate to humans, and log every interaction via webhooks.

Getting Started

Up and running in minutes.

1. Build & Run

go build, go run, or pull the Docker image. REST API on :8080, SIP on :5060.

2. Configure

Set environment variables for SIP, ICE servers, webhooks, and your TTS/STT/AI provider API keys.

3. Connect

Originate SIP calls, accept inbound calls via webhooks, or connect browsers over WebRTC.

Documentation

Everything you need to get started.

Quick Start

```bash
# Build and run
go build -o voiceblender ./cmd/voiceblender
./voiceblender

# Or run directly
go run ./cmd/voiceblender

# REST API on :8080, SIP on 127.0.0.1:5060
```

Typical Workflow

```text
1. Register a webhook        POST /v1/webhooks
2. Receive inbound call      --> webhook: leg.ringing {leg_id, from, to}
3. Answer                    POST /v1/legs/{id}/answer
4. Create a room             POST /v1/rooms
5. Add legs to room          POST /v1/rooms/{id}/legs
6. Attach AI agent           POST /v1/legs/{id}/agent/elevenlabs
7. Start recording           POST /v1/legs/{id}/record
8. Hang up                   DELETE /v1/legs/{id}
```

Legs API

```text
POST   /v1/legs                          # Originate outbound SIP call
GET    /v1/legs                          # List all legs
POST   /v1/legs/{id}/answer              # Answer ringing inbound leg
POST   /v1/legs/{id}/early-media         # Enable early media (183 Session Progress)
DELETE /v1/legs/{id}                     # Hang up
POST   /v1/legs/{id}/dtmf                # Send DTMF digits
POST   /v1/legs/{id}/play                # Play audio or tone
POST   /v1/legs/{id}/tts                 # Text-to-speech
POST   /v1/legs/{id}/record              # Start recording
POST   /v1/legs/{id}/stt                 # Start speech-to-text
POST   /v1/legs/{id}/agent/elevenlabs    # ElevenLabs agent
POST   /v1/legs/{id}/agent/vapi          # Vapi agent
POST   /v1/legs/{id}/agent/pipecat       # Pipecat agent
POST   /v1/legs/{id}/agent/deepgram      # Deepgram agent
POST   /v1/legs/{id}/agent/message       # Send message to agent
```

Rooms API

```text
POST   /v1/rooms                         # Create room
GET    /v1/rooms                         # List rooms
DELETE /v1/rooms/{id}                    # Delete room (hangs up all legs)
POST   /v1/rooms/{id}/legs               # Add leg to room
GET    /v1/rooms/{id}/ws                 # Join room via WebSocket
POST   /v1/rooms/{id}/play               # Play audio or tone to room
POST   /v1/rooms/{id}/tts                # TTS to room
POST   /v1/rooms/{id}/record             # Record room mix
POST   /v1/rooms/{id}/agent/elevenlabs   # ElevenLabs agent
POST   /v1/rooms/{id}/agent/vapi         # Vapi agent
POST   /v1/rooms/{id}/agent/pipecat      # Pipecat agent
POST   /v1/rooms/{id}/agent/deepgram     # Deepgram agent
POST   /v1/rooms/{id}/agent/message      # Send message to agent
```
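Legs and rooms expose the same four agent providers, so a client can pick the target with a single helper. This `agentPath` function is a hypothetical sketch that only uses the paths listed above. `agent/message` is a separate endpoint for messaging an attached agent, so it is deliberately rejected as a provider name. Provider-specific request bodies (agent IDs, prompts) are not covered.

```go
package main

import (
	"fmt"
	"net/url"
)

// agentProviders holds the provider path segments documented for both
// /v1/legs/{id}/agent/... and /v1/rooms/{id}/agent/....
var agentProviders = map[string]bool{
	"elevenlabs": true,
	"vapi":       true,
	"pipecat":    true,
	"deepgram":   true,
}

// agentPath returns the endpoint that attaches an AI agent to a leg or a room.
// scope must be "legs" or "rooms"; the ID is path-escaped.
func agentPath(scope, id, provider string) (string, error) {
	if scope != "legs" && scope != "rooms" {
		return "", fmt.Errorf("unknown scope %q", scope)
	}
	if !agentProviders[provider] {
		return "", fmt.Errorf("unsupported agent provider %q", provider)
	}
	return fmt.Sprintf("/v1/%s/%s/agent/%s", scope, url.PathEscape(id), provider), nil
}

func main() {
	for _, t := range []struct{ scope, id, provider string }{
		{"rooms", "support-1", "deepgram"},
		{"legs", "leg-42", "vapi"},
		{"rooms", "support-1", "openai"}, // not in the documented list
	} {
		if p, err := agentPath(t.scope, t.id, t.provider); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Println("POST", p)
		}
	}
}
```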

Configuration

```bash
export HTTP_ADDR=:8080              # REST API listen address
export SIP_BIND_IP=127.0.0.1        # IP for SDP/Contact/Via headers
export SIP_PORT=5060                # SIP listen port
export ICE_SERVERS=stun:stun.l.google.com:19302  # ICE (STUN/TURN) servers for WebRTC
export RECORDING_DIR=/tmp/recordings # Local directory for recordings
export LOG_LEVEL=info               # debug, info, warn, error
export WEBHOOK_URL=https://example.com/hooks
export ELEVENLABS_API_KEY=sk-...    # TTS, STT, Agent
export VAPI_API_KEY=...             # Vapi agent provider
export DEEPGRAM_API_KEY=...         # STT, TTS, Agent
export S3_BUCKET=my-recordings      # Optional S3 upload
```

WebRTC

```text
POST   /v1/webrtc/offer                 # SDP offer/answer exchange
POST   /v1/legs/{id}/ice-candidates     # Add trickle ICE candidate
GET    /v1/legs/{id}/ice-candidates     # Get gathered ICE candidates
```

Performance

Measured end-to-end with real SIP calls using the built-in benchmark suite.

Audio latency: 20 ms average, leg-to-leg, at 100 rooms
Setup throughput: 27 rooms/sec for concurrent room setup
Heap: 19 MB at 100 rooms with 200 active calls
p99 latency: 64 ms, the 99th-percentile audio latency at 100 rooms

Run it yourself

```bash
# Run the benchmark (default scales: 5, 10, 25, 50, 100 rooms)
go test -tags integration -v -timeout 300s \
  -run TestConcurrentRoomsScale ./tests/integration/

# Example output at 100 rooms:
# Phase 1 — Setup: 100 rooms in 3.7s (26.9 rooms/sec)
#   call+room setup latency: avg=570ms p50=615ms p95=728ms p99=751ms
#   Goroutines: 1914  |  Heap alloc: 19.0 MB
# Phase 2 — Sustaining 100 rooms for 3s... All 200 calls connected
# Phase 3 — Audio latency: avg=20ms p50=10ms p95=62ms p99=64ms
# Phase 4 — Teardown: 100 rooms in 5.6ms (17782 rooms/sec)
```

Contribute to VoiceBlender

VoiceBlender is built by the community. Whether you write code, report bugs, or improve docs, every contribution matters.