This guide covers common issues you might encounter when using the Voice Agent AI SDK and how to resolve them.

WebSocket Connection Issues

Symptoms:
  • Error: connect ECONNREFUSED
  • WebSocket never emits connected event
Solutions:
  1. Verify your WebSocket server is running:
pnpm ws:server
  2. Check the WebSocket URL is correct:
// Make sure the endpoint matches your server
await agent.connect("ws://localhost:8080");
  3. Ensure no firewall is blocking the port
  4. Check server logs for startup errors
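A malformed endpoint string is an easy thing to rule out before looking at firewalls or logs. The sketch below uses the standard `URL` class to check the scheme; `checkWsUrl` is a hypothetical helper name and is not part of the SDK.

```typescript
// Hypothetical helper (not part of the SDK): sanity-check a WebSocket endpoint string.
function checkWsUrl(raw: string): { ok: boolean; reason?: string; host?: string } {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "not a parseable URL" };
  }
  // Only ws: and wss: are valid WebSocket schemes.
  if (url.protocol !== "ws:" && url.protocol !== "wss:") {
    return { ok: false, reason: `unexpected protocol ${url.protocol}` };
  }
  return { ok: true, host: url.host };
}

console.log(checkWsUrl("ws://localhost:8080"));
console.log(checkWsUrl("http://localhost:8080"));
```

A string such as `localhost:8080` (scheme omitted) is rejected as well, because the URL parser reads `localhost:` as the scheme.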
Symptoms:
  • Connection establishes but disconnected event fires immediately
  • socket.readyState shows closed state
Solutions:
  1. Check server-side error handling:
wss.on("connection", (socket) => {
  const agent = new VoiceAgent({ model, ... });
  agent.handleSocket(socket);
  
  // Listen for errors
  agent.on("error", (error) => {
    console.error("Agent error:", error);
  });
});
  2. Verify the WebSocket server accepts connections
  3. Check for authentication/authorization issues if implemented
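When a connection closes right after opening, the close code usually narrows down the cause. The sketch below maps the standard codes from RFC 6455 to short hints; `describeCloseCode` is a hypothetical helper, and the hints in parentheses are common interpretations rather than guarantees.

```typescript
// Standard WebSocket close codes (RFC 6455) with short troubleshooting hints.
const CLOSE_CODE_HINTS: Record<number, string> = {
  1000: "normal closure",
  1001: "endpoint going away (server shutdown or page navigation)",
  1002: "protocol error",
  1003: "unsupported data type",
  1006: "abnormal closure (no close frame; often a crash or network drop)",
  1008: "policy violation (commonly used for rejected auth)",
  1009: "message too big",
  1011: "internal server error",
};

// Hypothetical helper (not part of the SDK).
function describeCloseCode(code: number): string {
  if (code >= 4000 && code <= 4999) {
    return `application-defined close code ${code}`;
  }
  return CLOSE_CODE_HINTS[code] ?? `unrecognized close code ${code}`;
}

console.log(describeCloseCode(1006));
```

Assuming the server uses the `ws` package, the code is available as the first argument of the socket's `close` event, for example `socket.on("close", (code) => console.log(describeCloseCode(code)))`.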
Symptoms:
  • WebSocket connected but messages don’t trigger events
  • Silent failures
Solutions:
  1. Verify message format is valid JSON:
// Client must send properly formatted messages
socket.send(JSON.stringify({
  type: "transcript",
  text: "Hello, agent!"
}));
  2. Check for parsing errors in server logs
  3. Ensure event listeners are attached before connecting:
agent.on("text", ({ role, text }) => {
  console.log(`${role}: ${text}`);
});

await agent.connect();
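To turn silent failures into visible ones, it can help to validate a message before sending it (or on receipt in a test harness). This is a minimal sketch that covers only the `transcript` shape shown above; `parseTranscriptMessage` is a hypothetical helper, and the SDK may accept other message types that this check would reject.

```typescript
type TranscriptMessage = { type: "transcript"; text: string };
type ParseResult =
  | { ok: true; message: TranscriptMessage }
  | { ok: false; error: string };

// Hypothetical helper (not part of the SDK): validate a raw transcript message.
function parseTranscriptMessage(raw: string): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { ok: false, error: "invalid JSON" };
  }
  if (typeof value !== "object" || value === null) {
    return { ok: false, error: "message is not an object" };
  }
  const candidate = value as Record<string, unknown>;
  if (candidate.type !== "transcript") {
    return { ok: false, error: `unexpected type: ${String(candidate.type)}` };
  }
  if (typeof candidate.text !== "string" || candidate.text.trim() === "") {
    return { ok: false, error: "text must be a non-empty string" };
  }
  return { ok: true, message: { type: "transcript", text: candidate.text } };
}

console.log(parseTranscriptMessage('{"type":"transcript","text":"Hello, agent!"}'));
console.log(parseTranscriptMessage("{oops"));
```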
Symptoms:
  • Cannot send message, socket state: 0 (CONNECTING)
  • Cannot send message, socket state: 2 (CLOSING)
  • Cannot send message, socket state: 3 (CLOSED)
Solutions: The SDK handles these socket states gracefully as of v0.1.0, but if you still see the warnings:
  1. Wait for the connected event before sending:
agent.on("connected", () => {
  agent.sendText("Hello!");
});
  2. Check agent.connected before operations:
if (agent.connected) {
  await agent.sendText("Hello!");
}
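If messages can be produced before the socket is ready, one option is to hold them and drain them from the `connected` handler. The sketch below assumes only the `connected` property and `sendText` method used above; `BufferedSender` and `TextSender` are hypothetical names. It does not await `sendText`, so any rejection still needs to be handled by the caller or the `error` event.

```typescript
// Minimal shape this sketch needs; a real VoiceAgent has more members.
interface TextSender {
  connected: boolean;
  sendText(text: string): unknown;
}

// Hypothetical wrapper (not part of the SDK): hold messages until connected.
class BufferedSender {
  private pending: string[] = [];
  private target: TextSender;

  constructor(target: TextSender) {
    this.target = target;
  }

  // Send immediately when connected, otherwise keep the message for later.
  send(text: string): void {
    if (this.target.connected) {
      this.target.sendText(text);
    } else {
      this.pending.push(text);
    }
  }

  // Call from the "connected" handler; drains held messages in order.
  flush(): number {
    let sent = 0;
    while (this.pending.length > 0 && this.target.connected) {
      this.target.sendText(this.pending.shift() as string);
      sent++;
    }
    return sent;
  }
}
```

Usage would look like `const sender = new BufferedSender(agent); agent.on("connected", () => sender.flush());`, with application code calling `sender.send(...)` instead of `agent.sendText(...)`.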

Audio & Transcription Issues

Symptoms:
  • transcription_error: Whisper returned empty text
  • Warning: Transcription returned empty text
Solutions:
  1. Verify audio format is supported:
// Whisper supports: mp3, mp4, mpeg, mpga, m4a, wav, webm
const agent = new VoiceAgent({
  transcriptionModel: openai.transcription("whisper-1"),
});
  2. Check audio quality:
  • Audio should contain clear speech
  • Minimum duration ~0.5 seconds
  • Adequate volume level
  3. Verify base64 encoding is correct:
const base64Audio = Buffer.from(audioBuffer).toString("base64");
await agent.sendAudio(base64Audio);
  4. Test with a known-good audio file
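Two quick checks can catch most encoding mistakes before the audio reaches the transcription model: whether the string is strict base64 (a leftover `data:audio/...;base64,` prefix fails this), and whether the decoded bytes start with a recognizable container signature. The helpers below are hypothetical and heuristic; they inspect only the first bytes and do not prove the file is playable.

```typescript
// Hypothetical helpers (not part of the SDK).

// Buffer.from(..., "base64") silently skips invalid characters, so check strictly first.
function isStrictBase64(input: string): boolean {
  return input.length > 0 && input.length % 4 === 0 && /^[A-Za-z0-9+/]*={0,2}$/.test(input);
}

// Guess the container from well-known magic bytes; returns null when unrecognized.
function sniffAudioContainer(base64Audio: string): string | null {
  const bytes = Buffer.from(base64Audio, "base64");
  if (bytes.length < 12) return null;
  const ascii = (start: number, end: number) => bytes.toString("latin1", start, end);
  if (ascii(0, 4) === "RIFF" && ascii(8, 12) === "WAVE") return "wav";
  if (ascii(0, 3) === "ID3") return "mp3"; // ID3v2 tag
  if (bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0) return "mp3"; // MPEG frame sync (heuristic)
  if (ascii(0, 4) === "OggS") return "ogg"; // detected, but not in the list above
  if (bytes.readUInt32BE(0) === 0x1a45dfa3) return "webm"; // EBML header (WebM/Matroska)
  if (ascii(4, 8) === "ftyp") return "mp4"; // ISO base media (mp4, m4a)
  return null;
}

const wavHeader = Buffer.concat([Buffer.from("RIFF"), Buffer.alloc(4), Buffer.from("WAVEfmt ")]);
console.log(sniffAudioContainer(wavHeader.toString("base64")));
```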
Symptoms:
  • Audio input too large (X MB). Maximum allowed: Y MB
Solutions:
  1. Increase the limit if needed:
const agent = new VoiceAgent({
  maxAudioInputSize: 15 * 1024 * 1024, // 15 MB
});
  2. Or compress audio before sending:
  • Use lower bitrate encoding
  • Reduce sample rate (e.g., 16kHz for speech)
  • Use more efficient codec (e.g., opus)
  3. Split long audio into chunks if possible
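Checking the payload size on the client avoids a round trip that is going to be rejected. The sketch below computes the decoded byte length of a base64 string without decoding it. It assumes the limit applies to decoded bytes, which is not confirmed here, and both function names are hypothetical.

```typescript
// Hypothetical helpers (not part of the SDK).

// Decoded size of a base64 string: 3 bytes per 4 characters, minus padding.
function decodedByteLength(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return Math.floor((base64.length * 3) / 4) - padding;
}

// Assumption: the SDK limit is compared against decoded bytes.
function fitsLimit(base64: string, maxBytes: number): boolean {
  return decodedByteLength(base64) <= maxBytes;
}

const sample = Buffer.alloc(1000).toString("base64");
console.log(decodedByteLength(sample), fitsLimit(sample, 15 * 1024 * 1024));
```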
Symptoms:
  • Audio chunks arrive out of order
  • Gaps between chunks
  • High latency
Solutions:
  1. Adjust streaming speech configuration:
const agent = new VoiceAgent({
  streamingSpeech: {
    minChunkSize: 40,        // Smaller = faster start
    maxChunkSize: 180,       // Larger = fewer requests
    parallelGeneration: true,
    maxParallelRequests: 3,  // Increase for faster generation
  },
});
  2. Ensure client plays chunks in order:
// Track chunk order
let expectedChunkId = 0;
const chunkBuffer = new Map();

socket.on("message", (data) => {
  const msg = JSON.parse(data);
  if (msg.type === "audio_chunk") {
    chunkBuffer.set(msg.chunkId, msg.data);
    
    // Play chunks in order
    while (chunkBuffer.has(expectedChunkId)) {
      playAudio(chunkBuffer.get(expectedChunkId));
      chunkBuffer.delete(expectedChunkId);
      expectedChunkId++;
    }
  }
});
  3. Check network latency and bandwidth
Symptoms:
  • Error: Transcription model not configured
  • Audio input fails silently
Solution: Add a transcription model to your configuration:
import { openai } from "@ai-sdk/openai";

const agent = new VoiceAgent({
  model: openai("gpt-4o"),
  transcriptionModel: openai.transcription("whisper-1"),
  // ... other options
});

TTS Generation Issues

Symptoms:
  • Text responses work but no audio
  • speech_start event never fires
Solutions:
  1. Verify speech model is configured:
const agent = new VoiceAgent({
  model: openai("gpt-4o"),
  speechModel: openai.speech("tts-1"), // or "gpt-4o-mini-tts"
  voice: "alloy",
  outputFormat: "mp3",
});
  2. Check that you’re listening for the right events:
agent.on("speech_start", ({ streaming }) => {
  console.log("Speech started, streaming:", streaming);
});

agent.on("audio_chunk", ({ chunkId, data }) => {
  console.log("Received chunk", chunkId);
});
Symptoms:
  • Long delay before first audio chunk
  • Slow overall response time
Solutions:
  1. Enable parallel generation:
streamingSpeech: {
  parallelGeneration: true,
  maxParallelRequests: 3, // Generate 3 chunks at once
}
  2. Reduce chunk size for faster time-to-first-audio:
streamingSpeech: {
  minChunkSize: 30,  // Lower = faster start
}
  3. Use a faster TTS model:
speechModel: openai.speech("tts-1"), // Faster than tts-1-hd
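To see why a smaller minimum chunk size shortens time-to-first-audio, consider a toy splitter that emits a chunk at the first sentence boundary once the minimum length is reached. This is an illustration only and not the SDK's actual chunking algorithm; `chunkForSpeech` is a hypothetical name, and it assumes sizes are measured in characters.

```typescript
// Illustrative only: NOT the SDK's chunking logic. Sizes are assumed to be characters.
function chunkForSpeech(text: string, minChunkSize: number, maxChunkSize: number): string[] {
  if (maxChunkSize < 1 || minChunkSize > maxChunkSize) {
    throw new Error("expected 1 <= maxChunkSize and minChunkSize <= maxChunkSize");
  }
  // Split into sentences, keeping punctuation and trailing whitespace attached.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    current += sentence;
    // Hard-split anything that grew past the maximum.
    while (current.length > maxChunkSize) {
      chunks.push(current.slice(0, maxChunkSize));
      current = current.slice(maxChunkSize);
    }
    // Emit at a sentence boundary once the chunk is long enough.
    if (current.length >= minChunkSize) {
      chunks.push(current);
      current = "";
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

const reply = "Hi there. This is a longer second sentence that keeps going. Done.";
console.log(chunkForSpeech(reply, 5, 40)[0].length);
console.log(chunkForSpeech(reply, 30, 40)[0].length);
```

With the lower minimum, the first chunk is just the opening sentence, so speech synthesis can begin after fewer characters have been generated, at the cost of more requests overall.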
Symptoms:
  • speech_interrupted event fires without user action
  • Audio stops mid-sentence
Possible causes:
  1. Barge-in triggered by new input:
// This is expected behavior when user speaks
agent.on("speech_interrupted", ({ reason }) => {
  console.log("Interrupted:", reason); // "user_speaking"
});
  2. WebSocket disconnection:
  • Check for disconnected event
  • Implement reconnection logic
  3. Error in speech generation:
  • Listen for error event
  • Check API quota/rate limits

Memory & Performance

Symptoms:
  • Increasing memory footprint in long sessions
  • Slow response times
Solutions:
  1. Configure conversation history limits:
const agent = new VoiceAgent({
  history: {
    maxMessages: 50,          // Keep last 50 messages
    maxTotalChars: 100_000,   // Or limit by character count
  },
});
  2. Monitor history_trimmed events:
agent.on("history_trimmed", ({ removedCount, reason }) => {
  console.log(`Trimmed ${removedCount} messages: ${reason}`);
});
  3. Clear history periodically if needed:
agent.clearHistory();
  4. Destroy agent instances when done:
agent.on("disconnected", () => {
  agent.destroy();
});
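As a mental model for how the two history limits interact, the sketch below applies a message-count cap first and then drops the oldest messages until a character budget is met. It is an assumption about the general approach, not the SDK's implementation; `trimHistory` and `ChatMessage` are hypothetical names.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Illustrative only: NOT the SDK's trimming logic.
function trimHistory(
  messages: ChatMessage[],
  maxMessages: number,
  maxTotalChars: number,
): { kept: ChatMessage[]; removedCount: number } {
  // Keep at most the last `maxMessages` entries (slice(-0) would keep everything).
  let kept = maxMessages > 0 ? messages.slice(-maxMessages) : [];
  let total = kept.reduce((sum, m) => sum + m.content.length, 0);
  // Drop oldest entries until the character budget fits, always keeping the newest one.
  let start = 0;
  while (total > maxTotalChars && start < kept.length - 1) {
    total -= kept[start].content.length;
    start++;
  }
  kept = kept.slice(start);
  return { kept, removedCount: messages.length - kept.length };
}

const history = Array.from({ length: 5 }, (_, i) => ({ role: "user", content: `msg${i}` }));
console.log(trimHistory(history, 3, 100).removedCount);
```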
Symptoms:
  • CPU spikes during operation
  • Server becomes unresponsive
Solutions:
  1. Upgrade to v0.1.0 or later, where the speech queue uses promises instead of polling
  2. Limit concurrent parallel TTS requests:
streamingSpeech: {
  maxParallelRequests: 2, // Lower = less CPU
}
  3. Monitor active agent instances (one per user):
const agents = new Map();

wss.on("connection", (socket) => {
  const sessionId = generateSessionId();
  const agent = new VoiceAgent({ ... });
  agents.set(sessionId, agent);

  agent.on("disconnected", () => {
    agent.destroy();
    agents.delete(sessionId);
  });
});
Symptoms:
  • Interleaved messages
  • Duplicate responses
  • History contains unexpected messages
Solution: This was fixed in v0.1.0 with the serial input queue. Upgrade if you are still on v0.0.1:
pnpm add voice-agent-ai-sdk@latest
The queue ensures:
  • sendText() calls are processed one at a time
  • WebSocket transcript messages are serialized
  • No concurrent modifications to conversationHistory
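The general technique behind a serial queue is promise chaining: each task starts only after the previous one has settled. The sketch below shows that pattern in isolation; `SerialQueue` is a hypothetical class for illustration and is not the SDK's internal queue.

```typescript
// Illustrative promise-chaining queue; NOT the SDK's internal implementation.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  // Run `task` only after every previously enqueued task has settled.
  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the chain so one failure does not block later tasks.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

const demoQueue = new SerialQueue();
const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

demoQueue.enqueue(async () => {
  await wait(20);
  console.log("first input handled");
});
demoQueue.enqueue(async () => {
  console.log("second input handled");
});
```

Even though the second task has no delay, it runs after the first finishes, which is the property that prevents interleaved history updates.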

Error Handling Patterns

Best practices:
const agent = new VoiceAgent({ ... });

// Listen for all error types
agent.on("error", (error) => {
  console.error("Agent error:", error);
  
  // Notify user
  socket.send(JSON.stringify({
    type: "error",
    message: "Something went wrong. Please try again."
  }));
});

// Listen for warnings (non-fatal)
agent.on("warning", (message) => {
  console.warn("Agent warning:", message);
});

// Wrap async operations
try {
  await agent.sendText(userInput);
} catch (error) {
  console.error("Failed to process input:", error);
  // Handle error (retry, notify user, etc.)
}

// Clean up on disconnect
agent.on("disconnected", () => {
  console.log("Client disconnected");
  agent.destroy();
});
OpenAI API errors:
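// Small promise-based delay helper used below
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

agent.on("error", async (error) => {
  if (error.message.includes("rate limit")) {
    // Back off before retrying; a fixed 5 s wait is shown here for brevity,
    // grow the delay on repeated failures for true exponential backoff
    await sleep(5000);
    // Retry or notify user to wait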
  } else if (error.message.includes("quota")) {
    // Notify about quota exhaustion
    console.error("API quota exceeded");
  } else if (error.message.includes("timeout")) {
    // Retry with increased timeout
  }
});
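Matching on `error.message` is case-sensitive, so a message starting with "Rate limit" would slip past `includes("rate limit")`. The sketch below normalizes the message before matching; `classifyApiError` is a hypothetical helper, and the substrings are heuristics, since the exact error wording depends on the provider.

```typescript
type ApiErrorKind = "rate_limit" | "quota" | "timeout" | "unknown";

// Hypothetical helper (not part of the SDK): heuristic, case-insensitive classification.
function classifyApiError(error: unknown): ApiErrorKind {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  if (message.includes("rate limit") || message.includes("429")) return "rate_limit";
  if (message.includes("quota")) return "quota";
  if (message.includes("timeout") || message.includes("timed out")) return "timeout";
  return "unknown";
}

console.log(classifyApiError(new Error("Rate limit reached")));
```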
Network errors:
const MAX_RETRIES = 3;
let retryCount = 0;

agent.on("disconnected", async () => {
  if (retryCount < MAX_RETRIES) {
    retryCount++;
    console.log(`Reconnecting (${retryCount}/${MAX_RETRIES})...`);
    try {
      await agent.connect();
      retryCount = 0; // Reset on success
    } catch (error) {
      console.error("Reconnection failed:", error);
    }
  } else {
    console.error("Max reconnection attempts reached");
    agent.destroy();
  }
});
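The handler above reconnects immediately, which can hammer a server that is still restarting. A common refinement is to wait longer after each failed attempt. The sketch below computes a capped exponential delay with optional jitter; both function names and the default values are assumptions chosen for illustration.

```typescript
// Hypothetical helpers (not part of the SDK).

// Capped exponential delay: baseMs, 2*baseMs, 4*baseMs, ... up to maxMs.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// "Equal jitter": keep half the delay and randomize the other half,
// so many clients do not reconnect at the same instant.
function withJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.floor(delayMs / 2 + random() * (delayMs / 2));
}

for (let attempt = 0; attempt < 4; attempt++) {
  console.log(attempt, backoffDelay(attempt));
}
```

Inside the `disconnected` handler, the delay could be applied before calling `agent.connect()`, for example with `await new Promise((resolve) => setTimeout(resolve, withJitter(backoffDelay(retryCount))))`.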
Symptoms:
  • Error: VoiceAgent has been destroyed and cannot be used
Solution: Always check the destroyed state before operations:
if (!agent.destroyed) {
  await agent.sendText("Hello");
}

// Or handle the error
try {
  await agent.sendText("Hello");
} catch (error) {
  if (error.message.includes("destroyed")) {
    // Agent was destroyed, create new instance
    agent = new VoiceAgent({ ... });
  }
}

Environment & Configuration

Symptoms:
  • OPENAI_API_KEY undefined
  • Connection to wrong endpoint
Solutions:
  1. Ensure .env file exists in project root:
OPENAI_API_KEY=sk-...
VOICE_WS_ENDPOINT=ws://localhost:8080
  2. Load dotenv at the top of your entry file:
import "dotenv/config"; // Must be first import
import { VoiceAgent } from "voice-agent-ai-sdk";
  3. Verify the .env file is actually present where the app runs; if it is gitignored, a fresh clone or deployment will not include it
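Failing fast at startup makes a missing variable obvious instead of surfacing later as an authentication or connection error. The sketch below is a minimal check; `requireEnv` is a hypothetical helper, and the variable names are the ones from the `.env` example above.

```typescript
// Hypothetical helper (not part of the SDK): throw early when a variable is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with an explicit object so the snippet runs without a real .env file.
const exampleEnv = { VOICE_WS_ENDPOINT: "ws://localhost:8080" };
console.log(requireEnv("VOICE_WS_ENDPOINT", exampleEnv));
```

In an application, `requireEnv("OPENAI_API_KEY")` would be called right after the `dotenv/config` import, relying on the default `process.env` argument.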
Common issues:
  1. Missing types:
pnpm add -D @types/node @types/ws
  2. AI SDK version mismatch:
// package.json
{
  "peerDependencies": {
    "ai": "^6.0.0"
  }
}
Install matching version:
pnpm add ai@^6.0.0
  3. Module resolution:
// tsconfig.json
{
  "compilerOptions": {
    "moduleResolution": "node",
    "esModuleInterop": true
  }
}

Getting Help

If you’re still experiencing issues:
  1. Check the changelog for recent fixes and breaking changes
  2. Review example code in the repository:
    • example/demo.ts — text-only usage
    • example/ws-server.ts — WebSocket server
    • example/voice-client.html — browser client
  3. Enable debug logging to see what’s happening:
    agent.on("chunk:text_delta", ({ text }) => console.log("[LLM]", text));
    agent.on("speech_chunk_queued", ({ id, text }) => console.log("[TTS Queue]", id, text));
    agent.on("audio_chunk", ({ chunkId }) => console.log("[Audio]", chunkId));
    
  4. Report issues on GitHub with:
    • Voice Agent AI SDK version
    • Node.js version
    • Minimal reproduction code
    • Error messages and logs