# ZuckZapGo Calls API: complete LLM build guide

> Paste this whole file into any LLM (Claude, GPT, Gemini, …) and ask it to build a
> WhatsApp **voice-call** integration against ZuckZapGo. It is self-contained: every
> endpoint, the exact audio wire-format, the enable sequence, full end-to-end flows, a
> drop-in browser client, a server-side (Node/Python/Go) client, the AI-voice-agent mode,
> and the common pitfalls are all here. Nothing else is required.
>
> Native two-way call audio powered by **meowcaller** (https://github.com/purpshell/meowcaller). 🙏 Thank you, Rajeh (@purpshell).

- **Format**: this document is the source of truth. Field names, status codes, and the PCM
  framing below are exact; copy them verbatim.
- **Two placeholders** you must fill in: `{BASE_URL}` (e.g. `https://your-host` or
  `http://localhost:8080`) and `{TOKEN}` (the per-instance user token).

---

## 0. The one-paragraph mental model

ZuckZapGo exposes a **native, pure-Go VoIP engine**. You drive calls with plain **REST**
(`/call/*`, JSON, token header) and you carry **audio** over a single **bidirectional
WebSocket** at `GET /call/{call_id}/stream`. That socket speaks **raw PCM**: signed 16-bit
little-endian, **16 000 Hz, mono**, in **960-sample (1920-byte, 60 ms) frames**. Inbound peer
audio arrives as binary frames; you send the microphone back as binary frames. There is
**no WebRTC, no SDP, no ICE** to deal with: the engine already terminates the WhatsApp media
relay (SRTP) for you and hands you decoded PCM. Your only job is: enable the engine,
place/answer a call, open the WebSocket, and play/capture PCM.
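The framing numbers in the paragraph above follow from simple arithmetic. Here is a minimal sketch of that arithmetic, plus a hypothetical `splitIntoFrames` helper (not part of ZuckZapGo, introduced only for illustration) that cuts an arbitrary s16le byte stream into whole 1920-byte frames and keeps the remainder for the next chunk:

```javascript
// Wire-format constants taken from the text above.
const SAMPLE_RATE = 16000;                                 // Hz
const BYTES_PER_SAMPLE = 2;                                // signed 16-bit
const FRAME_SAMPLES = 960;
const FRAME_BYTES = FRAME_SAMPLES * BYTES_PER_SAMPLE;      // 1920 bytes
const FRAME_MS = (FRAME_SAMPLES * 1000) / SAMPLE_RATE;     // 60 ms

// Hypothetical helper: split a Uint8Array (or Node Buffer) of s16le PCM
// into whole frames. `rest` holds the trailing partial frame, which the
// caller should prepend to the next chunk instead of sending it short.
function splitIntoFrames(bytes) {
  const frames = [];
  let off = 0;
  while (bytes.length - off >= FRAME_BYTES) {
    frames.push(bytes.subarray(off, off + FRAME_BYTES));
    off += FRAME_BYTES;
  }
  return { frames, rest: bytes.subarray(off) };
}
```

Holding back the remainder rather than sending it avoids the server-side zero-padding of partial frames described in §6.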
```
                 control plane (REST, token header)
your app ──────────────────────────────────────────────► ZuckZapGo ──► WhatsApp
           POST /call/dial · /call/answer · /call/hangup · …   (VoIP engine)   relay
         ◄──────────────────────────────────────────────
                 media plane (WebSocket, ?token=)
your app ◄══════ s16le 16kHz mono PCM (peer audio) ════════ ZuckZapGo ◄═ SRTP/16kHz
         ═══════ s16le 16kHz mono PCM (your mic)   ═══════► /call/{id}/stream
```

**Golden rules**

1. Audio only flows **after the call is answered** (callee picks up an outbound call, or you
   answer an inbound call). Before that the relay is silent; that is normal, not a bug.
2. For app/browser-driven two-way audio, the instance **must** be in
   `call_inbound_mode: "manual"`. Other modes make the engine answer and handle audio
   server-side.
3. The audio WebSocket and server-side **recording are mutually exclusive per call**: they
   share the call's single inbound sink. Use one or the other.

---

## 1. Authentication

| Surface | How to authenticate |
|---|---|
| REST `/call/*` (and all standard endpoints) | HTTP header `token: {TOKEN}` |
| WebSocket `/call/{call_id}/stream` | Query param `?token={TOKEN}` (browsers cannot set WS headers). The `token:` header also works for non-browser clients. |
| Admin endpoints (not needed for calls) | HTTP header `Authorization: {ADMIN_TOKEN}` |

The token identifies the **instance** (one WhatsApp number). All `/call/*` calls operate on
that instance's engine.

---

## 2. Response envelope

Every JSON response is wrapped:

```jsonc
// success
{ "code": 200, "data": { /* payload */ }, "success": true }
// error
{ "code": 409, "error": "calls engine not enabled for this instance ...", "success": false }
```

Always read the payload from `.data` and check `.success` / HTTP status. Examples below show
the `data` payload.

---

## 3. Two call APIs: use the native engine one for audio

ZuckZapGo has **two** families under `/call/*`. Do not mix them up:

| Family | Endpoints | Purpose | Audio? |
|---|---|---|---|
| **Native VoIP engine** ✅ | `/call/config`, `/call/status`, `/call/dial`, `/call/answer`, `/call/hangup`, `/call/play`, `/call/record/start`, `/call/record/stop`, `/call/{id}/stream` | Real media: dial, answer, two-way PCM audio, play files, record, AI agents | **Yes.** This is the one you want |
| Legacy signaling (Spec 004) | `/call/reject`, `/call/accept`, `/call/preaccept`, `/call/terminate`, `/call/initiate`, `/call/reject/send` | Raw WhatsApp call-signaling control only | **No.** `/call/initiate` returns **501** (needs a WebRTC stack not in this build). Ignore these for audio. |

**To make calls with audio, only use the Native VoIP engine endpoints.** Outbound =
`POST /call/dial` (never `/call/initiate`).

---

## 4. Endpoint reference (Native VoIP engine)

All paths are relative to `{BASE_URL}`. All REST calls send `token: {TOKEN}`.

### 4.1 `GET /call/config`: read engine configuration

Response `data`:

```jsonc
{
  "callsEnabled": true,
  "callInboundMode": "manual",   // manual | bot | ivr | ai | webhook | reject
  "callRecord": false,
  "callSttUrl": "",              // AI modes only
  "callLlmUrl": "",
  "callTtsUrl": "",
  "callSystemPrompt": "",
  "callGreeting": ""
}
```

> `callProviderToken` is write-only and never returned.
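Putting the envelope rule from §2 together with the config read above, a client could unwrap responses along these lines. This is a sketch under assumptions: `unwrapEnvelope` and `callApi` are hypothetical helper names, not part of ZuckZapGo, and `callApi` assumes a runtime with a global `fetch` (browsers, Node 18+).

```javascript
// Hypothetical helper: return `.data` on success, throw on any failure.
// Checks both the HTTP status and the envelope's `success` flag.
function unwrapEnvelope(httpStatus, body) {
  const ok = httpStatus >= 200 && httpStatus < 300 && !!body && body.success === true;
  if (!ok) {
    const err = new Error((body && body.error) || "HTTP " + httpStatus);
    err.code = (body && body.code) || httpStatus; // e.g. 409, 404, 502
    throw err;
  }
  return body.data;
}

// Hypothetical wrapper around a REST call: sends the `token` header and
// unwraps the envelope. `payload` is optional (omit it for GET).
async function callApi(baseUrl, token, method, path, payload) {
  const res = await fetch(baseUrl + path, {
    method,
    headers: { "Content-Type": "application/json", token },
    body: payload === undefined ? undefined : JSON.stringify(payload),
  });
  const body = await res.json().catch(() => null); // tolerate a non-JSON error body
  return unwrapEnvelope(res.status, body);
}
```

With this in place, reading the engine configuration would look like `const cfg = await callApi(BASE_URL, TOKEN, "GET", "/call/config");`, after which `cfg.callInboundMode` holds the current mode.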
### 4.2 `PUT /call/config`: update engine configuration (applies on next reconnect)

Body:

```jsonc
{
  "callsEnabled": true,
  "callInboundMode": "manual",   // empty defaults to "webhook"; invalid value → 400
  "callRecord": false,
  "callSttUrl": null,            // AI modes (ai/ivr/bot); see §9
  "callLlmUrl": null,
  "callTtsUrl": null,
  "callProviderToken": null,     // bearer token for the STT/LLM/TTS provider(s)
  "callSystemPrompt": null,
  "callGreeting": null
}
```

Response `data`: `{ "ok": true }`. **The new config takes effect on the instance's next
reconnect** (see §5).

### 4.3 `GET /call/status`: list live calls

Response `data`:

```jsonc
{
  "calls": [
    {
      "callId": "A1B2C3...",          // engine's own id; use this everywhere
      "peer": "5521999999999@s.whatsapp.net",
      "state": "active",              // idle|calling|ringing|connecting|active|ended|unknown
      "direction": "inbound",         // inbound | outbound
      "video": false,
      "recording": false,
      "startedAt": "2026-06-28T12:00:00Z"
    }
  ]
}
```

Poll this (~2.5 s) to detect **inbound ringing calls** and remote hangups; the media socket
alone does not signal a remote hangup. **Detect "call ended" by the call's _disappearance_
from the `calls[]` array, not by a `state:"ended"` value**: when a call ends the engine stops
tracking it, so it vanishes from the list (you will rarely, if ever, observe `state:"ended"`
here). Once a call you were in is gone from `calls[]`, tear your UI/socket down.

### 4.4 `POST /call/dial`: place an outbound call

Body: `{ "phone": "5521999999999", "video": false }`

- `phone` = **E.164 digits only, no `+`** (country code + number).
- `video` is **accepted but currently ignored**: outbound is voice-only in this build. Do not
  build a video-dial UI against it (it is a no-op, not an error).
- Response `data`: `{ "callId": "A1B2C3..." }`.
- **409** if the engine is not enabled (enable + reconnect first, §5).
- **502** if the peer is unreachable.
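The rules in §4.3 and §4.4 can be captured in a few pure helpers. The sketch below is illustrative only: `toDialPhone`, `dialFailureHint`, and `endedCallIds` are hypothetical names, and `toDialPhone` merely strips non-digits (it does not verify that a country code is present).

```javascript
// Hypothetical helper: reduce user input to the digits-only form
// that /call/dial expects (no "+", spaces, or punctuation).
function toDialPhone(input) {
  const digits = String(input).replace(/\D/g, "");
  if (digits.length === 0) throw new Error("phone has no digits");
  return digits;
}

// Map the documented /call/dial failure codes to an action hint.
function dialFailureHint(code) {
  if (code === 409) return "engine not enabled: enable calls, then reconnect";
  if (code === 502) return "peer unreachable";
  return "unexpected error " + code;
}

// Compare two consecutive /call/status snapshots (the `calls` arrays) and
// return the callIds that disappeared, i.e. the calls that have ended.
function endedCallIds(previousCalls, currentCalls) {
  const live = new Set(currentCalls.map((c) => c.callId));
  return previousCalls.map((c) => c.callId).filter((id) => !live.has(id));
}
```

A poller would keep the previous `calls` array, call `endedCallIds(prev, next)` on every tick, and tear down the UI and socket for each id it returns.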
After dialing, open the media WebSocket (§6) immediately; audio starts when the callee answers.

### 4.5 `POST /call/answer`: answer a ringing inbound call (manual mode)

Body: `{ "callId": "A1B2C3..." }` → `data: { "ok": true }`. **404** if the call is not found.
Then open the media WebSocket (§6).

### 4.6 `POST /call/hangup`: end a live call

Body: `{ "callId": "A1B2C3..." }` → `data: { "ok": true }`. **404** if not found. Also close the
WebSocket. Use this to **decline** an inbound call too.

### 4.7 `POST /call/play`: play an audio file/clip into the call

Body: `{ "callId": "...", "audioUrl": "https://…/clip.mp3" }` or
`{ "callId": "...", "audioBase64": "" }`.

- Format auto-detected (WAV / MP3 / Ogg-Opus) and resampled to 16 kHz mono.
- Plays to the peer. → `data: { "ok": true }`. Works alongside or instead of the mic stream.

### 4.8 `POST /call/record/start`: start server-side recording

Body: `{ "callId": "..." }` → `data: { "ok": true }`. Records the **peer's** audio to a WAV.
Mutually exclusive with the media WebSocket on the same call. **`start` succeeds regardless of
storage config**: a missing/misconfigured S3 does not fail here; it surfaces later as an empty
`mediaKey` on `stop` (see §4.9).

### 4.9 `POST /call/record/stop`: stop + upload recording

Body: `{ "callId": "..." }` → `data: { "ok": true, "mediaKey": "" }`. The WAV is uploaded to
object storage (S3) and `mediaKey` is its key. **Treat an empty `mediaKey` (`""`) as
failure**: `ok` is still `true` even when the upload had no storage configured or errored, so
check that `mediaKey` is non-empty, not just `ok`.

### 4.10 `GET /call/{call_id}/stream`: bidirectional PCM media WebSocket

The audio plane. Full protocol in §6. Auth via `?token={TOKEN}`.

---

## 5. Enabling the engine (do this once per instance)

The engine attaches when the instance connects.
To turn it on for app-driven audio:

```bash
# 1) Enable native calls in MANUAL mode (required for two-way app/browser audio)
curl -X PUT "{BASE_URL}/call/config" \
  -H "token: {TOKEN}" -H "Content-Type: application/json" \
  -d '{"callsEnabled":true,"callInboundMode":"manual","callRecord":false}'

# 2) Reconnect so the engine attaches; subscribe to "Call" so inbound offers ring.
#    (Read current events first to preserve existing subscriptions.)
curl -X POST "{BASE_URL}/session/disconnect" -H "token: {TOKEN}" -d '{}'
sleep 1
curl -X POST "{BASE_URL}/session/connect" \
  -H "token: {TOKEN}" -H "Content-Type: application/json" \
  -d '{"Subscribe":["Message","Call"],"Immediate":true}'
```

After reconnect, `GET /call/status` works and `POST /call/dial` no longer returns 409.

**Why reconnect**: `callsEnabled` / `call_inbound_mode` are read when the engine attaches at
connect time. Without a reconnect a running session keeps the old (disabled) engine.

Optional (push instead of polling): point the instance webhook at your server and subscribe to
the `Call` event. Inbound calls then emit `v1.call.*` webhook events. The inbound **ring** is
**`v1.call.offer`** (others: `v1.call.offer_notice`, `v1.call.accept`, `v1.call.preaccept`,
`v1.call.transport`, `v1.call.terminate`, `v1.call.reject`, `v1.call.relay_latency`,
`v1.call.unknown`). These come from the **standard `Call` event subscription** and are
independent of the legacy `/call/initiate` signaling endpoints (§3); you do **not** need that
legacy path to receive them. Match the webhook's call id to `/call/status`, then
`POST /call/answer`. Full payload shapes: see `docs/webhook-events.md`. If you don't want
webhooks, polling `/call/status` (§4.3) is a complete alternative.

---

## 6. The media WebSocket protocol (exact)

```
GET {BASE_URL_AS_WS}/call/{call_id}/stream?token={TOKEN}
```

- `{BASE_URL_AS_WS}` = your base URL with `http`→`ws` / `https`→`wss`.
- `{call_id}` = the `callId` returned by `/call/dial` or seen in `/call/status` (URL-encode it).
- The connection is a standard WebSocket. Use **binary** frames only; text frames are ignored.

**Audio format on the wire (both directions): raw PCM, no header, no container.**

| Property | Value |
|---|---|
| Sample format | signed 16-bit **little-endian** (`s16le` / `int16`) |
| Sample rate | **16 000 Hz** |
| Channels | **1 (mono)** |
| Frame size | **960 samples = 1920 bytes = 60 ms** |

- **Server → client** (binary): decoded **peer** audio. Each message is s16le PCM. Treat the
  payload as a stream of `int16` little-endian samples; lengths are multiples of 2 bytes. Play
  them out (use a small jitter buffer, see §7).
- **Client → server** (binary): your **microphone** audio, s16le 16 kHz mono. **Send exactly
  960-sample (1920-byte) frames.** The server splits incoming bytes into 960-sample frames and
  **zero-pads** any trailing partial frame to 960 samples (injecting a little silence into the
  call), so always frame to exactly 1920 bytes before sending to avoid that.

**PCM sample conversions** (the only math you need):

```js
// int16 little-endian -> float32 in [-1, 1]
function s16leToFloat(arrayBuffer) {
  const v = new DataView(arrayBuffer), n = (arrayBuffer.byteLength >> 1);
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) out[i] = v.getInt16(i * 2, true) / 0x8000;
  return out;
}

// float32 in [-1, 1] -> int16 little-endian
function floatToS16LE(frame) {
  const buf = new ArrayBuffer(frame.length * 2), v = new DataView(buf);
  for (let i = 0; i < frame.length; i++) {
    const s = Math.max(-1, Math.min(1, frame[i] || 0));
    v.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buf;
}
```

The socket closes when the call ends or you call `/call/hangup`. A remote hangup does **not**
always close the socket promptly, so also watch `/call/status` (poll) to tear your UI down.

---

## 7. Drop-in browser client (Web Audio / AudioWorklet)

This is the validated browser path: an `AudioContext` at 16 kHz, a **player** AudioWorklet
(jitter buffer, silence on underrun) for inbound audio, and a **recorder** AudioWorklet that
posts mic frames you batch into 1920-byte frames and send. A linear resampler is used only if
the browser refuses a 16 kHz context.

> A complete, production widget already ships in this repo at
> `/dashboard` → `static/dashboard/js/calls.js`. Read it as the reference. The essentials:

```js
const SAMPLE_RATE = 16000, FRAME_SAMPLES = 960; // 60 ms @ 16 kHz

// --- inlined AudioWorklet processors (loaded as Blob modules) ---
const PLAYER_WORKLET = `
class PcmPlayer extends AudioWorkletProcessor {
  constructor(){
    super(); this.q=[]; this.r=null; this.p=0; this.b=0; this.play=false;
    this.pre=Math.floor(sampleRate*0.18); this.max=sampleRate*1; // ~180ms prebuffer, 1s cap
    this.port.onmessage=(e)=>{
      const d=e.data;
      if(d==='flush'){ this.q=[]; this.r=null; this.p=0; this.b=0; this.play=false; return; }
      if(this.b>this.max){ this.q=[]; this.r=null; this.p=0; this.b=0; } // drop backlog past the cap
      this.q.push(d); this.b+=d.length;
    };
  }
  // next buffered sample, or 0 (silence) when the queue is empty
  pull(){
    if(!this.r||this.p>=this.r.length){ this.r=this.q.shift()||null; this.p=0; if(!this.r) return 0; }
    this.b--; return this.r[this.p++];
  }
  process(_i,o){
    const out=o[0]&&o[0][0]; if(!out) return true;
    if(!this.play){ if(this.b>=this.pre) this.play=true; else { out.fill(0); return true; } }
    // minimal drain loop: fill one render quantum; pull() yields silence on underrun
    for(let i=0;i<out.length;i++) out[i]=this.pull();
    return true;
  }
}
registerProcessor('pcm-player', PcmPlayer);`;

const RECORDER_WORKLET = `
class PcmRec extends AudioWorkletProcessor {
  constructor(){
    super(); this.muted=false;
    this.port.onmessage=(e)=>{ if(e.data&&typeof e.data.muted==='boolean') this.muted=e.data.muted; };
  }
  process(inp){
    const input=inp[0]&&inp[0][0];
    if(input&&input.length){
      const c=new Float32Array(input.length); // stays zero-filled (silence) while muted
      if(!this.muted) c.set(input);
      this.port.postMessage(c,[c.buffer]);
    }
    return true;
  }
}
registerProcessor('pcm-recorder', PcmRec);`;

const blobModuleURL = (code) =>
  URL.createObjectURL(new Blob([code], { type: "application/javascript" }));

async function startCallAudio(baseUrl, callId, token) {
  const wsUrl = baseUrl.replace(/^http/i, "ws") + "/call/" +
    encodeURIComponent(callId) + "/stream?token=" + encodeURIComponent(token);

  const ctx = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: SAMPLE_RATE });
  await ctx.resume();
  await ctx.audioWorklet.addModule(blobModuleURL(PLAYER_WORKLET));
  await ctx.audioWorklet.addModule(blobModuleURL(RECORDER_WORKLET));

  const player = new AudioWorkletNode(ctx, "pcm-player",
    { numberOfInputs: 0, numberOfOutputs: 1, outputChannelCount: [1] });
  player.connect(ctx.destination);

  const localStream = await navigator.mediaDevices.getUserMedia(
    { audio: { channelCount: 1, echoCancellation: true, noiseSuppression: true } });
  const recorder = new AudioWorkletNode(ctx, "pcm-recorder",
    { numberOfInputs: 1, numberOfOutputs: 0 });
  ctx.createMediaStreamSource(localStream).connect(recorder);

  const ws = new WebSocket(wsUrl);
  ws.binaryType = "arraybuffer";

  let acc = new Float32Array(0);
  // mic -> batch to 960-sample frames -> send s16le
  recorder.port.onmessage = (e) => {
    if (ws.readyState !== WebSocket.OPEN) return;
    const merged = new Float32Array(acc.length + e.data.length);
    merged.set(acc); merged.set(e.data, acc.length);
    let off = 0;
    while (merged.length - off >= FRAME_SAMPLES) {
      ws.send(floatToS16LE(merged.subarray(off, off + FRAME_SAMPLES)));
      off += FRAME_SAMPLES;
    }
    acc = merged.slice(off);
  };

  // peer audio -> player jitter buffer
  ws.onmessage = (ev) => {
    if (typeof ev.data === "string") return;
    const pcm = s16leToFloat(ev.data);
    player.port.postMessage(pcm, [pcm.buffer]);
  };

  return {
    setMuted: (m) => recorder.port.postMessage({ muted: m }),
    close: () => {
      try { ws.close(); } catch (e) {}
      localStream.getTracks().forEach((t) => t.stop());
      ctx.close().catch(() => {});
    },
  };
}
```

**Outbound from the browser**

```js
const r = await fetch(`${BASE_URL}/call/dial`, {
  method: "POST",
  headers: { "Content-Type": "application/json", token: TOKEN },
  body: JSON.stringify({ phone: "5521999999999" })
});
const callId = (await r.json()).data.callId;
const session = await startCallAudio(BASE_URL, callId, TOKEN);
// callee answers -> audio
// ... session.setMuted(true) to mute ...
// hang up:
await fetch(`${BASE_URL}/call/hangup`, {
  method: "POST",
  headers: { "Content-Type": "application/json", token: TOKEN },
  body: JSON.stringify({ callId })
});
session.close();
```

**Inbound in the browser**: poll `GET /call/status`; when a call appears with
`direction:"inbound"` and `state` in `ringing|connecting|calling`, show a ring UI; on accept,
`POST /call/answer {callId}` then `startCallAudio(BASE_URL, callId, TOKEN)`.

Note: device labels are blank until the page holds mic permission. Call `getUserMedia` once to
populate `enumerateDevices()` labels if you build a device picker.

---

## 8. Server-side client (Node / Python / Go): softphones, bots, AI

You don't need a browser. Any WebSocket client can bridge the PCM both ways: feed it from a
TTS engine, an AI voice agent, a SIP gateway, a file, etc. Read **1920-byte** binary frames in;
send **1920-byte** binary frames out.

**Node.js (`ws`)**

```js
import WebSocket from "ws";

const ws = new WebSocket(
  `${BASE_URL.replace(/^http/, "ws")}/call/${encodeURIComponent(callId)}/stream?token=${encodeURIComponent(TOKEN)}`
);
ws.binaryType = "nodebuffer";

ws.on("message", (buf, isBinary) => {
  if (!isBinary) return; // audio arrives as binary frames only
  // buf = s16le 16kHz mono peer audio. Copy into an Int16Array sample by sample:
  // a Buffer's byteOffset is not guaranteed to be 2-byte aligned, so a direct view can throw.
  const pcm = new Int16Array(buf.byteLength >> 1);
  for (let i = 0; i < pcm.length; i++) pcm[i] = buf.readInt16LE(i * 2);
  // -> write to speaker / STT / file ...
});

// send your audio: chunk your s16le 16kHz mono stream into 1920-byte Buffers
function sendFrame(int16Frame /* Int16Array, length 960 */) {
  // honour byteOffset/byteLength so a subarray view sends only its own 1920 bytes
  ws.send(Buffer.from(int16Frame.buffer, int16Frame.byteOffset, int16Frame.byteLength));
}
```

**Python (`websockets`)**

```python
import asyncio, websockets, numpy as np

async def run(base_ws, call_id, token):
    url = f"{base_ws}/call/{call_id}/stream?token={token}"  # base_ws uses ws:// or wss://
    async with websockets.connect(url, max_size=None) as ws:
        async for msg in ws:  # bytes = s16le 16kHz mono peer audio
            if isinstance(msg, str):
                continue  # text frames carry no audio
            pcm = np.frombuffer(msg, dtype="<i2")  # int16 little-endian samples
            # -> write to speaker / STT / file ...
```
**LLM: `POST {callLlmUrl}`** (chat turn → reply)

- Request: `Content-Type: application/json`, body:

  ```jsonc
  {
    "system": "",
    "messages": [
      { "role": "user", "content": "..." },
      { "role": "assistant", "content": "..." }
    ]
  }
  ```

  (full conversation history; roles are `user` / `assistant`).
- Response: JSON `{ "reply": "" }`.

**TTS: `POST {callTtsUrl}`** (text → speech)

- Request: `Content-Type: application/json`, `Accept: audio/wav`, body `{ "text": "" }`.
- Response: **either** a WAV file (`RIFF`/`WAVE`, any sample rate, resampled for you) **or**
  raw **s16le 16 kHz mono PCM** (anything that isn't a RIFF header is treated as raw PCM).

A non-2xx from any provider aborts that step. If you'd rather own the whole loop, use `manual`
mode + the §8 WebSocket client instead of these endpoints.

---

## 10. End-to-end recipes

**A) Outbound call with two-way audio**

1. (once) `PUT /call/config {callsEnabled:true, callInboundMode:"manual"}` → reconnect (§5).
2. `POST /call/dial {phone}` → `callId`.
3. Open `wss://…/call/{callId}/stream?token=…`; start the AudioWorklet/player+recorder.
4. Callee answers → audio both ways. `setMuted(true)` to mute.
5. `POST /call/hangup {callId}` + close the socket.

**B) Inbound call with two-way audio**

1. (once) manual mode + reconnect, subscribe `Call`.
2. Detect ring: the **`v1.call.offer`** webhook (Call subscription) **or** poll
   `GET /call/status` for `direction:"inbound"`, `state in ringing|connecting|calling`.
3. `POST /call/answer {callId}` → open the socket → audio.
4. Decline instead = `POST /call/hangup {callId}`.

**C) Play a clip / IVR-style prompt without a mic**

1. Dial or answer as above (manual mode).
2. `POST /call/play {callId, audioUrl|audioBase64}` (WAV/MP3/Ogg-Opus). No WebSocket required
   if you only need outbound audio.

**D) Record a call**

1. Ensure S3 is configured for the instance.
2. `POST /call/record/start {callId}` … `POST /call/record/stop {callId}` → `mediaKey`.
   (Do **not** also open the PCM WebSocket on the same call; they are mutually exclusive.)

**E) Server-side AI agent (ZuckZapGo-managed)**: §9 `ai` mode. No socket, no app audio code.

**F) Server-side AI agent (your stack)**: manual mode + the §8 server WebSocket client.

---

## 11. Pitfalls & troubleshooting (read this)

- **`409 calls engine not enabled`** → you enabled config but didn't reconnect. Do §5.
- **Dial works but no audio** → (a) the callee hasn't answered yet (audio starts on answer);
  (b) you're not in `manual` mode; (c) browser `AudioContext` is suspended, so call
  `ctx.resume()` after a user gesture; (d) the player worklet isn't loaded / WS isn't
  `binaryType:"arraybuffer"`.
- **Choppy / robotic audio** → you're not framing outbound to exactly **1920 bytes**, or your
  player has no jitter buffer. Batch to 960 samples; prebuffer ~180 ms.
- **Inbound rings but `/answer` 404s** → use the `callId` exactly as returned by `/call/status`
  (it is the engine's own id; do not transform case).
- **Remote hangup leaves UI stuck** → also poll `/call/status`; the socket may not close
  promptly. The ended call **disappears** from `calls[]` (you won't see `state:"ended"`).
- **Recording produced no file / empty `mediaKey`** → `/call/record/start` always returns `ok`,
  but the WAV upload needs S3 configured; a missing/failed upload yields `mediaKey:""` on
  `/call/record/stop` (with `ok:true`). Treat empty `mediaKey` as failure. Also, the call must
  not have an open PCM WebSocket (recording and `/stream` share the single inbound sink).
- **WS connects then closes immediately** → bad/missing `?token=`, or the `call_id` isn't a
  live call (check `/call/status`).
- **Server-side debugging**: set env `VOIP_LOG_LEVEL=debug` on ZuckZapGo and watch logs tagged
  `subsystem:calls`: `offer sent`, `relay silent after allocate`, `starting media`,
  `first RTP decoded from relay`, `selected audio codec`, `failed to unprotect`.
  `offer sent` then silence = callee never answered (not a bug).

---

## 12. Quick reference

```
AUTH        REST: header  token: {TOKEN}        WS: query  ?token={TOKEN}
ENVELOPE    { code, data, success }   |   errors: { code, error, success:false }

REST (Native VoIP engine)
  GET   /call/config                    -> engine config
  PUT   /call/config  {callsEnabled,callInboundMode,callRecord,call*Url,callProviderToken,callSystemPrompt,callGreeting}   (reconnect to apply)
  GET   /call/status                    -> { calls:[{callId,peer,state,direction,video,recording,startedAt}] }
  POST  /call/dial    {phone (E.164 no +), video(ignored, voice-only)}   -> {callId}   (409 if disabled, 502 if unreachable)
  POST  /call/answer  {callId}          -> {ok}   (404 if not found)
  POST  /call/hangup  {callId}          -> {ok}   (also: decline)
  POST  /call/play    {callId, audioUrl|audioBase64}   -> {ok}
  POST  /call/record/start {callId}     -> {ok}   (excl. with /stream; succeeds even w/o S3)
  POST  /call/record/stop  {callId}     -> {ok, mediaKey}   (empty mediaKey = upload failed/no S3)

MEDIA (WebSocket)
  GET /call/{call_id}/stream?token=...   bidirectional binary PCM
  format: s16le, 16000 Hz, mono          frame: 960 samples = 1920 bytes = 60 ms
  server->client: peer audio             client->server: your mic (send exact 1920-byte frames)

INBOUND     v1.call.offer webhook (Call subscription)  OR  poll /call/status
MODES       manual(app audio) | webhook | bot | ivr | ai | reject
STATES      idle | calling | ringing | connecting | active | (ended = call drops from /call/status)
DIRECTION   inbound | outbound
```

---

## 13. Copy-paste task prompt for your LLM

> You are integrating WhatsApp **voice calls** into using the ZuckZapGo
> Calls API documented above. Build voice agent>. Requirements:
> 1. Enable the engine once (`PUT /call/config` `manual` mode) and reconnect.
> 2. Outbound via `POST /call/dial`; inbound via `/call/status` polling or the `v1.call.offer`
>    webhook + `POST /call/answer`.
> 3. Carry audio over `GET /call/{call_id}/stream?token=…` as **s16le 16 kHz mono PCM**, sending
>    the mic in **exact 1920-byte (960-sample, 60 ms) frames** and playing inbound frames with a
>    ~180 ms jitter buffer.
> 4. Mute = send **zero-filled (silence)** frames, keeping the socket open (the reference does
>    this; it does not stop the send loop); hang up = `POST /call/hangup` + close the socket;
>    detect remote hangups by the call **disappearing** from `/call/status`.
> 5. Use the response envelope `{code,data,success}` and the `token` header (`?token=` for the WS).
> Produce complete, runnable code for , with reconnect/cleanup handling.

---

_Powered by meowcaller (https://github.com/purpshell/meowcaller). For the in-repo backend details and the production-validated rationale, see `docs/call-audio-implementation-guide.md`._