Voice Agent Guide

When you create a voice agent in Bizway, the platform handles the entire real-time conversation pipeline — from incoming audio to AI processing to spoken response. Here’s what happens under the hood:

1. Audio Input

  • Incoming phone calls or web-based voice sessions connect via WebSocket
  • Server-side Voice Activity Detection (VAD) identifies when the caller is speaking
  • Audio is streamed in real time to the selected AI provider
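
To make the VAD step concrete, here is a minimal sketch of energy-based voice activity detection. It is not Bizway's actual detector, which is not documented here; the 16-bit little-endian mono PCM frame format, the function names, and the threshold of 500 are all assumptions chosen for illustration.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit little-endian mono PCM.

    The frame format is an assumption for this sketch.
    """
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Treat a frame as speech when its energy reaches a fixed threshold.

    The threshold is a hypothetical value; real detectors adapt to noise.
    """
    return frame_rms(frame) >= threshold


# 160 samples is 20 ms of audio at 8 kHz, a common telephony frame size.
silence = bytes(320)
loud = struct.pack("<160h", *([4000, -4000] * 80))
print(is_speech(silence), is_speech(loud))
```

A production detector would typically smooth decisions over several frames so that short pauses do not cut the caller off mid-sentence.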

2. AI Processing

  • The AI model (Gemini, OpenAI, or Grok) processes the audio input
  • Your system prompt defines the agent’s personality, knowledge, and behavior
  • The model generates a response based on conversation context

3. Voice Output

  • The response is converted to natural speech using the selected voice
  • Audio is streamed back to the caller in real time
  • The conversation continues, with earlier turns carried forward as context

4. Tool Execution (Optional)

  • During a conversation, the agent can invoke webhook tools
  • Dynamic variables ({{caller_phone}}, {{call_sid}}, {{agent_id}}) are substituted at runtime
  • External API responses are incorporated into the conversation
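
The runtime substitution of dynamic variables can be sketched as follows. This illustrates the idea only and is not Bizway's implementation: the `render` helper, the example URL, and the sample values are hypothetical, and leaving unknown placeholders untouched is a design choice of this sketch.

```python
import re

# Matches placeholders such as {{caller_phone}}, allowing optional inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} with its value; leave unknown names as written."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        return variables.get(name, match.group(0))

    return PLACEHOLDER.sub(replace, template)


# Hypothetical per-call values; the variable names come from the list above.
call_variables = {
    "caller_phone": "+15550100",
    "call_sid": "CA123",
    "agent_id": "agent_42",
}

url = render(
    "https://example.com/lookup?phone={{caller_phone}}&call={{call_sid}}",
    call_variables,
)
print(url)
```

Leaving unrecognized placeholders in place makes a misspelled variable name visible in the outgoing request instead of silently sending an empty value.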

Feature | Gemini | OpenAI | Grok
Voice Quality | High | Premium | High
Latency | Low | Low | Low
Cost | Low | Higher | Medium
Web Search | Yes | No | Yes
Best For | General purpose | Premium quality | Real-time info
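
If you choose a provider programmatically, the comparison above can be captured as plain data. The following is a sketch only: the dictionary keys and the `providers_with_web_search` helper are hypothetical names, and the values simply mirror the table.

```python
# Values copied from the provider comparison; key names are assumptions.
PROVIDERS = {
    "Gemini": {"voice_quality": "High", "latency": "Low", "cost": "Low",
               "web_search": True, "best_for": "General purpose"},
    "OpenAI": {"voice_quality": "Premium", "latency": "Low", "cost": "Higher",
               "web_search": False, "best_for": "Premium quality"},
    "Grok": {"voice_quality": "High", "latency": "Low", "cost": "Medium",
             "web_search": True, "best_for": "Real-time info"},
}


def providers_with_web_search() -> list[str]:
    """Names of providers whose table row lists web search as available."""
    return [name for name, info in PROVIDERS.items() if info["web_search"]]


print(providers_with_web_search())
```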

The system prompt is the most important configuration. It defines your agent’s personality, knowledge base, conversation rules, and escalation behavior. Write it as if you’re briefing a human agent on their first day.

Set the maximum call duration (1–120 minutes) to prevent runaway sessions. The agent will gracefully end the call when the timeout is reached.
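
The duration limit can be sketched as a range check plus a countdown. The function names here are hypothetical and the platform enforces the limit itself; the sketch only shows how the 1–120 minute range translates into a per-call deadline.

```python
MIN_MINUTES = 1
MAX_MINUTES = 120


def validate_max_duration(minutes: int) -> int:
    """Return the value unchanged if it lies in the documented 1-120 range."""
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        raise ValueError(
            f"max duration must be {MIN_MINUTES}-{MAX_MINUTES} minutes, got {minutes}"
        )
    return minutes


def seconds_remaining(started_at: float, max_minutes: int, now: float) -> float:
    """Seconds left before the call reaches its timeout, never negative."""
    deadline = started_at + max_minutes * 60
    return max(0.0, deadline - now)


limit = validate_max_duration(30)
# A call that started at t=0 with a 30-minute limit, checked 29 minutes in.
print(seconds_remaining(0.0, limit, 29 * 60))
```

When the remaining time reaches zero, the agent would end the call as described above.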

All agents have access to a built-in hangup_call tool. When the conversation is naturally complete, the agent can end the call with a farewell message — no caller action required.

The Command Center provides real-time visibility into:

  • All active voice sessions (phone and web)
  • Session duration and status
  • Manual termination controls
  • Historical session statistics and termination patterns