Troubleshooting - Voice Agent AI SDK

This guide covers common issues you might encounter when using the Voice Agent AI SDK and how to resolve them.

WebSocket Connection Issues

Connection refused or fails to establish

Symptoms:

Error: connect ECONNREFUSED
WebSocket never emits connected event

Solutions:

Verify your WebSocket server is running:

pnpm ws:server

Check the WebSocket URL is correct:

// Make sure the endpoint matches your server
await agent.connect("ws://localhost:8080");

Ensure no firewall is blocking the port
Check server logs for startup errors

Socket closes immediately after connecting

Symptoms:

Connection establishes but disconnected event fires immediately
socket.readyState shows closed state

Solutions:

Check server-side error handling:

wss.on("connection", (socket) => {
  const agent = new VoiceAgent({ model, ... });

  // Attach the error listener before handing over the socket,
  // so errors raised during setup are not missed
  agent.on("error", (error) => {
    console.error("Agent error:", error);
  });

  agent.handleSocket(socket);
});

Verify the WebSocket server accepts connections
Check for authentication/authorization issues if implemented

Messages not being received

Symptoms:

WebSocket connected but messages don’t trigger events
Silent failures

Solutions:

Verify message format is valid JSON:

// Client must send properly formatted messages
socket.send(JSON.stringify({
  type: "transcript",
  text: "Hello, agent!"
}));

Check for parsing errors in server logs
Ensure event listeners are attached before connecting:

agent.on("text", ({ role, text }) => {
  console.log(`${role}: ${text}`);
});

await agent.connect();
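
To narrow down silent failures, it can be useful to validate a payload in isolation before it reaches the agent. The following is a minimal sketch, assuming the message shape from the client example above (type "transcript" with a text field). parseTranscriptMessage is a hypothetical helper, not an SDK function, and the SDK may accept other message types as well.

```typescript
// Assumed message shape, taken from the client example above.
interface TranscriptMessage {
  type: "transcript";
  text: string;
}

// Hypothetical helper: returns the parsed message, or null when the payload
// is not valid JSON or does not match the assumed shape.
function parseTranscriptMessage(raw: string): TranscriptMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Record<string, unknown>;
  if (candidate.type !== "transcript" || typeof candidate.text !== "string") {
    return null; // valid JSON, but not a transcript message
  }
  return { type: "transcript", text: candidate.text };
}

const message = parseTranscriptMessage('{"type":"transcript","text":"Hello, agent!"}');
console.log(message === null ? "rejected" : `accepted: ${message.text}`);
```

Logging rejected payloads on the server makes it clear whether the client or the agent is dropping the message.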

Socket state errors when sending

Symptoms:

Cannot send message, socket state: 0 (CONNECTING)
Cannot send message, socket state: 2 (CLOSING)
Cannot send message, socket state: 3 (CLOSED)

Solutions: The SDK handles these states gracefully (v0.1.0+), but if you’re still seeing warnings:

Wait for the connected event before sending:

agent.on("connected", () => {
  agent.sendText("Hello!");
});

Check agent.connected before operations:

if (agent.connected) {
  await agent.sendText("Hello!");
}
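
The numbers in the warnings above are the standard WebSocket readyState values. As a small illustration, the sketch below maps them to names and guards a send on the OPEN state. describeReadyState and canSend are hypothetical helpers for code that works with a raw socket; with the agent itself, the agent.connected check shown above serves the same purpose.

```typescript
// Standard WebSocket readyState values, indexed by their numeric code.
const READY_STATE_NAMES = ["CONNECTING", "OPEN", "CLOSING", "CLOSED"] as const;

// Hypothetical helper: human-readable name for a readyState number.
function describeReadyState(state: number): string {
  return READY_STATE_NAMES[state] ?? `UNKNOWN (${state})`;
}

// Hypothetical helper: only the OPEN state (1) allows sending.
function canSend(socket: { readyState: number }): boolean {
  return socket.readyState === 1;
}

const fakeSocket = { readyState: 0 };
if (!canSend(fakeSocket)) {
  console.warn(`Socket not ready: ${describeReadyState(fakeSocket.readyState)}`);
}
```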

Audio & Transcription Issues

Transcription returns empty text

Symptoms:

transcription_error: Whisper returned empty text
Warning: Transcription returned empty text

Solutions:

Verify audio format is supported:

// Whisper supports: mp3, mp4, mpeg, mpga, m4a, wav, webm
const agent = new VoiceAgent({
  transcriptionModel: openai.transcription("whisper-1"),
});

Check audio quality:

Audio should contain clear speech
Minimum duration ~0.5 seconds
Adequate volume level

Verify base64 encoding is correct:

const base64Audio = Buffer.from(audioBuffer).toString("base64");
await agent.sendAudio(base64Audio);

Test with a known-good audio file
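
One way to rule out an encoding problem is to check the string before it is sent. The sketch below is a hedged example: isCanonicalBase64 is a hypothetical helper, not an SDK function, and it assumes standard padded base64 as produced by Buffer's toString("base64"). A frequent mistake it catches is passing a data URL (with its "data:audio/...;base64," prefix) instead of the bare payload.

```typescript
// Hypothetical helper: true if the string is non-empty, uses the standard
// base64 alphabet, and is correctly padded.
function isCanonicalBase64(value: string): boolean {
  if (value.length === 0 || value.length % 4 !== 0) return false;
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(value)) return false;
  // Decoding and re-encoding must reproduce the input exactly.
  return Buffer.from(value, "base64").toString("base64") === value;
}

const encoded = Buffer.from("example audio bytes").toString("base64");
console.log(isCanonicalBase64(encoded));
console.log(isCanonicalBase64(`data:audio/wav;base64,${encoded}`));
```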

Audio input too large error

Symptoms:

Audio input too large (X MB). Maximum allowed: Y MB

Solutions:

Increase the limit if needed:

const agent = new VoiceAgent({
  maxAudioInputSize: 15 * 1024 * 1024, // 15 MB
});

Or compress audio before sending:

Use lower bitrate encoding
Reduce sample rate (e.g., 16kHz for speech)
Use a more efficient codec (e.g., opus)

Split long audio into chunks if possible
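
It can also help to check the payload size on the client before sending. The sketch below estimates how many bytes a padded base64 string decodes to, using only its length. It rests on an assumption that this guide does not confirm: that maxAudioInputSize is compared against the decoded byte count rather than the length of the base64 string. Both function names are hypothetical.

```typescript
// Hypothetical helper: number of bytes a padded base64 string decodes to,
// computed from its length without decoding it.
function decodedByteLength(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return (base64.length / 4) * 3 - padding;
}

// Assumption: the limit applies to the decoded audio bytes.
function fitsLimit(base64: string, maxBytes: number): boolean {
  return decodedByteLength(base64) <= maxBytes;
}

const limit = 15 * 1024 * 1024; // same value as the configuration example above
const sample = Buffer.alloc(2048).toString("base64");
console.log(fitsLimit(sample, limit));
```

Base64 text is roughly a third larger than the audio it encodes, so the string length alone overstates the audio size.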

Audio playback is choppy or delayed

Symptoms:

Audio chunks arrive out of order
Gaps between chunks
High latency

Solutions:

Adjust streaming speech configuration:

const agent = new VoiceAgent({
  streamingSpeech: {
    minChunkSize: 40,        // Smaller = faster start
    maxChunkSize: 180,       // Larger = fewer requests
    parallelGeneration: true,
    maxParallelRequests: 3,  // Increase for faster generation
  },
});

Ensure client plays chunks in order:

// Track chunk order
let expectedChunkId = 0;
const chunkBuffer = new Map();

socket.on("message", (data) => {
  const msg = JSON.parse(data);
  if (msg.type === "audio_chunk") {
    chunkBuffer.set(msg.chunkId, msg.data);
    
    // Play chunks in order
    while (chunkBuffer.has(expectedChunkId)) {
      playAudio(chunkBuffer.get(expectedChunkId));
      chunkBuffer.delete(expectedChunkId);
      expectedChunkId++;
    }
  }
});

Check network latency and bandwidth

Transcription model not configured

Symptoms:

Error: Transcription model not configured
Audio input fails silently

Solution: Add a transcription model to your configuration:

import { openai } from "@ai-sdk/openai";

const agent = new VoiceAgent({
  model: openai("gpt-4o"),
  transcriptionModel: openai.transcription("whisper-1"),
  // ... other options
});

TTS Generation Issues

No speech output generated

Symptoms:

Text responses work but no audio
speech_start event never fires

Solutions:

Verify speech model is configured:

const agent = new VoiceAgent({
  model: openai("gpt-4o"),
  speechModel: openai.speech("tts-1"), // or "gpt-4o-mini-tts"
  voice: "alloy",
  outputFormat: "mp3",
});

Check that you’re listening for the right events:

agent.on("speech_start", ({ streaming }) => {
  console.log("Speech started, streaming:", streaming);
});

agent.on("audio_chunk", ({ chunkId, data }) => {
  console.log("Received chunk", chunkId);
});

Speech generation is slow

Symptoms:

Long delay before first audio chunk
Slow overall response time

Solutions:

Enable parallel generation:

streamingSpeech: {
  parallelGeneration: true,
  maxParallelRequests: 3, // Generate 3 chunks at once
}

Reduce chunk size for faster time-to-first-audio:

streamingSpeech: {
  minChunkSize: 30,  // Lower = faster start
}

Use a faster TTS model:

speechModel: openai.speech("tts-1"), // Faster than tts-1-hd

Speech interrupted unexpectedly

Symptoms:

speech_interrupted event fires without user action
Audio stops mid-sentence

Possible causes:

Barge-in triggered by new input:

// This is expected behavior when user speaks
agent.on("speech_interrupted", ({ reason }) => {
  console.log("Interrupted:", reason); // "user_speaking"
});

WebSocket disconnection:

Check for disconnected event
Implement reconnection logic

Error in speech generation:

Listen for error event
Check API quota/rate limits

Memory & Performance

Memory usage grows over time

Symptoms:

Increasing memory footprint in long sessions
Slow response times

Solutions:

Configure conversation history limits:

const agent = new VoiceAgent({
  history: {
    maxMessages: 50,          // Keep last 50 messages
    maxTotalChars: 100_000,   // Or limit by character count
  },
});

Monitor history_trimmed events:

agent.on("history_trimmed", ({ removedCount, reason }) => {
  console.log(`Trimmed ${removedCount} messages: ${reason}`);
});

Clear history periodically if needed:

agent.clearHistory();

Destroy agent instances when done:

agent.on("disconnected", () => {
  agent.destroy();
});
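
To make the effect of the two history limits concrete, here is a minimal sketch of how limits like maxMessages and maxTotalChars could be applied to a message list. It is illustrative only: trimHistory and the Message shape are assumptions, and the SDK's actual trimming strategy may differ (for example in which messages it drops first).

```typescript
// Assumed message shape for illustration.
interface Message {
  role: string;
  text: string;
}

// Illustrative only: keeps the most recent messages, then drops the oldest
// remaining ones until the total character count is within the limit.
function trimHistory(
  messages: Message[],
  maxMessages: number,
  maxTotalChars: number,
): Message[] {
  const kept = maxMessages > 0 ? messages.slice(-maxMessages) : [];
  let totalChars = kept.reduce((sum, m) => sum + m.text.length, 0);
  while (kept.length > 0 && totalChars > maxTotalChars) {
    const removed = kept.shift();
    if (removed) totalChars -= removed.text.length;
  }
  return kept;
}

const history: Message[] = [
  { role: "user", text: "Hi" },
  { role: "assistant", text: "Hello! How can I help?" },
  { role: "user", text: "What is the weather?" },
];
console.log(trimHistory(history, 2, 100_000).length);
```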

High CPU usage

Symptoms:

CPU spikes during operation
Server becomes unresponsive

Solutions:

Upgrade to v0.1.0 or later, where the speech queue uses promises instead of polling
Limit concurrent parallel TTS requests:

streamingSpeech: {
  maxParallelRequests: 2, // Lower = less CPU
}

Monitor active agent instances (one per user):

const agents = new Map();

wss.on("connection", (socket) => {
  const sessionId = generateSessionId();
  const agent = new VoiceAgent({ ... });
  agents.set(sessionId, agent);

  agent.on("disconnected", () => {
    agent.destroy();
    agents.delete(sessionId);
  });
});

Race conditions or corrupted history

Symptoms:

Interleaved messages
Duplicate responses
History contains unexpected messages

Solution: This was fixed in v0.1.0 with the serial input queue. Upgrade if you are on v0.0.1:

pnpm add voice-agent-ai-sdk@latest

The queue ensures:

sendText() calls are processed one at a time
WebSocket transcript messages are serialized
No concurrent modifications to conversationHistory
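
This guide does not show the queue's internals. As an illustration of the general technique, the sketch below chains tasks on a single promise so that each one starts only after the previous one has settled. SerialQueue is a hypothetical class written for this example, not the SDK's implementation.

```typescript
// Illustrative only: runs async tasks one at a time, in enqueue order.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if a task rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

const queue = new SerialQueue();
const order: string[] = [];
const pause = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// The first task is slower, but it still finishes before the second starts.
void queue.enqueue(async () => {
  await pause(20);
  order.push("first");
});
void queue.enqueue(async () => {
  order.push("second");
});
```

Because every task waits for the previous one, two inputs arriving at nearly the same time cannot modify shared state concurrently.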

Error Handling Patterns

Handling errors gracefully

Best practices:

const agent = new VoiceAgent({ ... });

// Listen for all error types
agent.on("error", (error) => {
  console.error("Agent error:", error);
  
  // Notify user
  socket.send(JSON.stringify({
    type: "error",
    message: "Something went wrong. Please try again."
  }));
});

// Listen for warnings (non-fatal)
agent.on("warning", (message) => {
  console.warn("Agent warning:", message);
});

// Wrap async operations
try {
  await agent.sendText(userInput);
} catch (error) {
  console.error("Failed to process input:", error);
  // Handle error (retry, notify user, etc.)
}

// Clean up on disconnect
agent.on("disconnected", () => {
  console.log("Client disconnected");
  agent.destroy();
});

Recovering from API failures

OpenAI API errors:

// Simple promise-based delay helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

agent.on("error", async (error) => {
  if (error.message.includes("rate limit")) {
    // Wait before retrying (fixed delay here; prefer exponential backoff)
    await sleep(5000);
    // Retry or notify user to wait
  } else if (error.message.includes("quota")) {
    // Notify about quota exhaustion
    console.error("API quota exceeded");
  } else if (error.message.includes("timeout")) {
    // Retry with increased timeout
  }
});

Network errors:

const MAX_RETRIES = 3;
let retryCount = 0;

agent.on("disconnected", async () => {
  if (retryCount < MAX_RETRIES) {
    retryCount++;
    console.log(`Reconnecting (${retryCount}/${MAX_RETRIES})...`);
    try {
      await agent.connect();
      retryCount = 0; // Reset on success
    } catch (error) {
      console.error("Reconnection failed:", error);
    }
  } else {
    console.error("Max reconnection attempts reached");
    agent.destroy();
  }
});
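
Both handlers above retry without spacing the attempts out. A common refinement is exponential backoff, where the wait doubles after each failure up to a cap. The sketch below is a minimal example: backoffDelayMs and retryWithBackoff are hypothetical helpers, and the base delay, cap, and retry count are arbitrary defaults rather than SDK settings.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based),
// doubling each time and capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const delay = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Retries an async operation, waiting longer after each failure.
// Rethrows the last error once maxRetries retries have been used.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= maxRetries) throw error;
      await delay(backoffDelayMs(attempt));
    }
  }
}
```

In the reconnection handler, this could wrap the connect call, for example retryWithBackoff(() => agent.connect()).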

Preventing destroyed agent usage

Symptoms:

Error: VoiceAgent has been destroyed and cannot be used

Solution: Always check the destroyed state before operations:

if (!agent.destroyed) {
  await agent.sendText("Hello");
}

// Or handle the error
try {
  await agent.sendText("Hello");
} catch (error) {
  if (error.message.includes("destroyed")) {
    // Agent was destroyed, create new instance
    agent = new VoiceAgent({ ... });
  }
}

Environment & Configuration

Environment variables not loading

Symptoms:

OPENAI_API_KEY undefined
Connection to wrong endpoint

Solutions:

Ensure .env file exists in project root:

OPENAI_API_KEY=sk-...
VOICE_WS_ENDPOINT=ws://localhost:8080

Load dotenv at the top of your entry file:

import "dotenv/config"; // Must be first import
import { VoiceAgent } from "voice-agent-ai-sdk";

If .env is gitignored (the usual setup), make sure the file is created separately in each environment where the app runs
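
A startup check can turn an undefined variable into a clear message instead of a confusing failure later on. The sketch below is a minimal example, assuming the two variable names from the .env sample above. findMissingEnv is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper: returns the names of required variables that are
// missing or blank in the given environment object.
function findMissingEnv(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

const missing = findMissingEnv(process.env, ["OPENAI_API_KEY", "VOICE_WS_ENDPOINT"]);
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```

Running this right after the dotenv import shows immediately whether the .env file was found and loaded.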

TypeScript errors

Common issues:

Missing types:

pnpm add -D @types/node @types/ws

AI SDK version mismatch:

// package.json
{
  "peerDependencies": {
    "ai": "^6.0.0"
  }
}

Install matching version:

pnpm add ai@^6.0.0

Module resolution:

// tsconfig.json
{
  "compilerOptions": {
    "moduleResolution": "node",
    "esModuleInterop": true
  }
}

Getting Help

If you’re still experiencing issues:

Check the changelog for recent fixes and breaking changes
Review example code in the repository:
- example/demo.ts — text-only usage
- example/ws-server.ts — WebSocket server
- example/voice-client.html — browser client

Enable debug logging to see what’s happening:

agent.on("chunk:text_delta", ({ text }) => console.log("[LLM]", text));
agent.on("speech_chunk_queued", ({ id, text }) => console.log("[TTS Queue]", id, text));
agent.on("audio_chunk", ({ chunkId }) => console.log("[Audio]", chunkId));

Report issues on GitHub with:
- Voice Agent AI SDK version
- Node.js version
- Minimal reproduction code
- Error messages and logs
