Voice Pipeline
FERAL supports three voice paths optimized for different latency, cost, and quality tradeoffs. All paths share a common VoiceRouter that selects the best pipeline per session.
Voice Paths
OpenAI Realtime
The lowest-latency option. Audio streams bidirectionally over a single WebSocket, with no transcription step and no TTS step. The model hears your voice and speaks back directly.
Gemini Live
Google’s bidirectional streaming voice API. Similar architecture to OpenAI Realtime but uses Gemini models.
Whisper + Classic TTS
The fallback pipeline that works with any LLM provider. Audio is transcribed locally, sent as text to the LLM, and the response is synthesized back to speech.
Wake Word Detection
FERAL uses openwakeword for always-on, local wake word detection. No audio leaves the device until the wake word fires.
VoiceRouter
The VoiceRouter decides which pipeline handles each session based on configuration, provider availability, and client capabilities.
The fallback order is realtime → gemini_live → whisper. If the preferred provider is down, the router degrades gracefully to the next pipeline in that order.
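The selection logic can be illustrated with a minimal sketch. This is not FERAL's actual router: the function name `select_pipeline` and the two callback parameters are hypothetical, and the sketch assumes the router only degrades downward from the preferred pipeline.

```python
from typing import Callable, Sequence

# Order taken from the text: realtime -> gemini_live -> whisper.
FALLBACK_ORDER = ("realtime", "gemini_live", "whisper")


def select_pipeline(
    preferred: str,
    is_available: Callable[[str], bool],
    client_supports: Callable[[str], bool],
    order: Sequence[str] = FALLBACK_ORDER,
) -> str:
    """Return the first usable pipeline, starting at the preferred one.

    Assumption: an unknown preference starts from the top of the order,
    and the router never moves "up" past the preferred pipeline.
    """
    start = order.index(preferred) if preferred in order else 0
    for name in order[start:]:
        # A pipeline is usable only if the provider is up and the
        # client can speak its protocol.
        if is_available(name) and client_supports(name):
            return name
    raise RuntimeError("no voice pipeline is available for this session")


if __name__ == "__main__":
    # Simulate an OpenAI Realtime outage.
    provider_up = lambda name: name != "realtime"
    print(select_pipeline("realtime", provider_up, lambda name: True))
```

In this sketch, availability and client capability are injected as callables so the same function can be exercised without any network access.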
Router Configuration
Sub-200ms Latency Architecture
Achieving low latency requires minimizing hops between the user’s mic and the model’s audio output.
Realtime path (OpenAI/Gemini):
| Technique | Impact |
|---|---|
| Opus codec at 24kHz | 3× smaller frames vs raw PCM |
| Server-side VAD | No extra roundtrip for silence detection |
| Streaming playback | Speaker starts before full response arrives |
| Connection keep-alive | Eliminates WebSocket setup on subsequent turns |
| Edge routing | Provider SDKs route to nearest datacenter |
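To make the "3× smaller frames" figure concrete, the arithmetic below works through one frame. Only the 24 kHz sample rate comes from the table; the 16-bit mono PCM format and the 20 ms frame duration are assumptions chosen for illustration, so the resulting bitrates are indicative only.

```python
SAMPLE_RATE_HZ = 24_000  # from the table above
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM
CHANNELS = 1             # assumption: mono
FRAME_MS = 20            # assumption: 20 ms frames


def pcm_frame_bytes(
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    frame_ms: int = FRAME_MS,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    channels: int = CHANNELS,
) -> int:
    """Size in bytes of one uncompressed PCM frame."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * bytes_per_sample * channels


def bitrate_bps(frame_bytes: int, frame_ms: int = FRAME_MS) -> int:
    """Bitrate implied by sending one frame of this size every frame_ms."""
    return frame_bytes * 8 * 1000 // frame_ms


raw_frame = pcm_frame_bytes()      # 480 samples * 2 bytes = 960 bytes
compressed_frame = raw_frame // 3  # the 3x reduction quoted in the table

if __name__ == "__main__":
    print(raw_frame, bitrate_bps(raw_frame))                # 960 384000
    print(compressed_frame, bitrate_bps(compressed_frame))  # 320 128000
```

Under these assumptions, a 3× reduction corresponds to roughly 128 kbit/s of encoded audio against 384 kbit/s of raw PCM.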
Classic path (Whisper + TTS):
| Technique | Impact |
|---|---|
| Whisper base.en model on GPU | ~100ms transcription for a typical utterance |
| Streaming TTS | First audio chunk plays while rest generates |
| Sentence-level chunking | TTS starts per-sentence, not per-response |
| Local inference (Piper) | Eliminates TTS network roundtrip entirely |
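Sentence-level chunking from the table above can be sketched as a small generator that buffers streamed LLM text and emits each sentence as soon as it is complete, so a TTS engine could start on the first sentence while later ones are still being generated. The function name `sentences_from_stream` is hypothetical, and the punctuation-based splitting rule is a simplification rather than FERAL's actual chunker.

```python
import re
from typing import Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
# Simplification: abbreviations such as "e.g." will also trigger a split.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a text stream."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_BREAK.split(buffer)
        # Every part except the last is a finished sentence; the last
        # part may still grow when the next chunk arrives.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Hello the", "re. How are", " you? I am", " fine."]
    for sentence in sentences_from_stream(stream):
        print(sentence)  # each sentence would be handed to TTS here
```

Because the generator yields before the stream is exhausted, time to first audio depends on the length of the first sentence rather than the whole response.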
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /v1/session | WebSocket | Full duplex voice + text session |
| /api/voice/config | GET | Current voice pipeline configuration |
| /api/voice/config | PATCH | Update voice settings at runtime |
| /api/voice/wakeword/status | GET | Wake word detector status and metrics |
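As a rough illustration of the two /api/voice/config endpoints, the sketch below builds GET and PATCH requests with Python's standard library. The base URL, the JSON content type, and the `preferred` field in the example payload are all assumptions; the request and response schemas are not specified here, so check them against a running server before relying on this.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def build_config_request(
    base_url: str, patch: Optional[dict] = None
) -> urllib.request.Request:
    """Build a GET request, or a PATCH request when a patch body is given."""
    url = base_url.rstrip("/") + "/api/voice/config"
    if patch is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(patch).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        # Assumption: the server accepts a JSON body.
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    read_request = build_config_request(BASE_URL)
    # "preferred" is a hypothetical field name used only for illustration.
    update_request = build_config_request(BASE_URL, {"preferred": "whisper"})
    print(read_request.get_method(), read_request.full_url)
    print(update_request.get_method(), update_request.full_url)
```

Building the request separately from sending it keeps the example inspectable without a server; call `send(...)` on either request once the endpoint is reachable.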
