🧠 English documentation
ζ—₯本θͺž β†’

🧠Layer 2: ASR / LLM / TTS

The speech-recognition, language-model, and speech-synthesis backends β€” all swappable.

Each stage of the voice pipeline (STT β†’ LLM β†’ TTS) is swappable via environment variables. Source: INVENTORY.md Β§1.4 (the AEGIS_ settings in config.py).

STT (speech recognition) backends

πŸ§ͺ

stub (default)

AEGIS_STT_BACKEND=stub
πŸ—£οΈ

whisper

AEGIS_STT_BACKEND=whisper
🌐

viibevoice (HTTP)

AEGIS_STT_URL=...
☁️

elevenlabs

scribe_v2

LLM (language model) modes

πŸ§ͺ

stub (fixed responses, default)

AEGIS_LLM_MODE=stub
πŸ”

openai_compat (ollama / vLLM / LM Studio)

AEGIS_LLM_URL=http://localhost:11434/v1, model=gemma2:9b
πŸ€–

OpenAI Realtime (Route C)

gpt-realtime (direct audio)

TTS (speech synthesis) backends

πŸ”Š

edge_tts (default, +28% speed)

AEGIS_TTS_BACKEND=edge_tts
🎡

kokoro

AEGIS_TTS_BACKEND=kokoro
🎢

piper

AEGIS_TTS_BACKEND=piper
☁️

elevenlabs (needs VOICE_ID)

eleven_multilingual_v2
ℹ️The design intent

Because each stage can be switched via voice_backend / stt_backend / tts_backend, we avoid vendor lock-in while gradually moving toward production quality β€” a switchpoint that anticipates Phase 3’s move to local voice.