The pieces
| Component | What it does |
|---|---|
| Backend | REST + WebSocket API. Runs the flow executor, resolves providers, persists every session. The single domain surface every other component talks to. |
| Frontend | Dashboard — flow builder, monitoring, organization + provider management. Talks only to the backend. |
| Agent worker | One short-lived process per active voice session. Streams audio between the caller, the STT service, the LLM, and the TTS service. |
| STT, TTS, LLM providers | Pluggable. Cloud (Deepgram, ElevenLabs, OpenAI, …) or self-hosted (Whisper, Vosk, Piper, Kokoro, Ollama, vLLM). Selected per flow. |
| PostgreSQL + pgvector | Source of truth for flows, sessions, organizations, secrets, embeddings, transcripts. |
| Logto | OIDC identity provider for users + machine-to-machine tokens. |
| LiveKit | WebRTC + SIP transport for voice. |
Session lifecycle
From a trigger to an active conversation:

The flow executor is stateless per request: every invocation reconstructs state from PostgreSQL. A long conversation runs as many short executor calls, driven by inbound events (transcripts, tool callbacks, child-flow completions).

Provider resolution
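The resolution rule in this section (flow settings override organization settings, which override defaults, and a missing required secret is rejected before dispatch) can be sketched roughly as below. This is an illustration under stated assumptions, not the project's implementation: each layer is assumed to be a flat dictionary, and the names `resolve_provider_config`, `MissingSecretError`, and the `required` parameter are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Sequence


class MissingSecretError(Exception):
    """Hypothetical error raised before dispatch; an API layer could map it to HTTP 412."""

    status_code = 412

    def __init__(self, missing: List[str]) -> None:
        super().__init__("missing required secrets: " + ", ".join(missing))
        self.missing = missing


def resolve_provider_config(
    default: Mapping[str, Any],
    organization: Mapping[str, Any],
    flow: Mapping[str, Any],
    required: Sequence[str] = (),
) -> Dict[str, Any]:
    """Merge three layers with priority flow > organization > default."""
    resolved: Dict[str, Any] = {}
    # Apply the lowest-priority layer first so later layers overwrite it.
    for layer in (default, organization, flow):
        # Assumption: a value of None means "not set at this layer",
        # so it falls through to the lower-priority value.
        resolved.update({k: v for k, v in layer.items() if v is not None})

    # Fail early if any required secret is absent or empty.
    missing = [key for key in required if not resolved.get(key)]
    if missing:
        raise MissingSecretError(missing)
    return resolved
```

For example, calling it with `default={"stt": "whisper", "llm": "ollama"}`, `organization={"llm": "openai", "llm_api_key": "org-key"}`, `flow={"stt": "deepgram"}`, and `required=("llm_api_key",)` resolves to Deepgram for STT and OpenAI for the LLM, with the organization-level key. The provider names come from the component table; the key names are made up for the example.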
Three layers merge into a single resolved config per session. Priority: flow > organization > default. Missing required secrets fail the trigger with HTTP 412 (Precondition Failed) before the agent worker is dispatched, which prevents opaque mid-call SDK errors.

Channels
| Channel | Transport | Status |
|---|---|---|
| Phone | SIP via LiveKit SIP + telephony provider (Telnyx / Twilio) | implemented |
| | Meta Cloud API | implemented |
| Web widget | — | not implemented |
Memory
Three layers, all backed by pgvector:

- Episodic memory: long-term semantic chunks of user turns. Searchable by user / cluster.
- Persistent user facts: extracted after each turn (`role`, `type`, `value`). Newer extractions supersede older ones.
- Knowledge bases: per-org document collections, retrievable through the `rag` flow node.
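
The "newer extractions supersede older ones" rule for persistent user facts can be illustrated with a small sketch. It rests on two assumptions that the text does not confirm: that a fact is identified by its (`role`, `type`) pair, and that each extraction carries a timestamp. The `UserFact` class, its `extracted_at` field, and `current_facts` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class UserFact:
    role: str
    type: str
    value: str
    extracted_at: datetime  # assumed field, not named in the source


def current_facts(facts: Iterable[UserFact]) -> List[UserFact]:
    """Keep only the newest fact per (role, type); older ones are superseded."""
    latest: Dict[Tuple[str, str], UserFact] = {}
    for fact in facts:
        key = (fact.role, fact.type)  # assumed identity of a fact
        kept = latest.get(key)
        # On equal timestamps, the fact seen later in the input wins.
        if kept is None or fact.extracted_at >= kept.extracted_at:
            latest[key] = fact
    # Sort for a stable, readable result.
    return sorted(latest.values(), key=lambda f: (f.role, f.type))
```

In a database-backed version the same effect would more likely come from an upsert or a "latest row per key" query; the in-memory form is used here only to keep the example self-contained.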