Roadmap
What's in Coulisse today, and what's coming.
Working today
- Multi-agent routing via the `model` field.
- Agents as tools: expose one agent to another under `subagents:` with a `purpose:` description. Nested invocations are bounded by a depth cap.
- Skills: reusable instruction bundles (Claude Code / Codex style). Drop a `SKILL.md` folder under `./skills`; agents opt in by name and get one progressive-disclosure tool per skill, plus a `skill_file` reader for bundled resources.
- Per-user conversation history with isolation.
- Long-term memory with semantic recall: persistent via SQLite and backed by a real embedder (OpenAI or Voyage AI; `hash` fallback for offline dev).
- Long-term user state: opt-in `user_state: true` enables a background extractor that pulls durable facts from each exchange and deduplicates them before storing. Embedder and extraction model are auto-derived from your configured providers.
- Multi-backend support (Anthropic, OpenAI, Gemini, Cohere, DeepSeek, Groq).
- OpenAI-compatible HTTP API (`/v1/chat/completions`, `/v1/models`).
- Studio UI at `/admin/`: browse conversations, memories, and judge scores; edit agents, judges, experiments, and smoke tests live; watch the real-time task board at `/admin/live`.
- LLM-as-judge evaluation: background scoring of agent replies against YAML-defined rubrics, with per-judge sampling and per-user persistence.
- Experiments (A/B testing): wrap multiple agents under one addressable name and route traffic between them with sticky-by-user defaults. Three strategies: `split` (weighted random), `shadow` (primary serves the user, others run in the background and are scored), and `bandit` (epsilon-greedy on a single judge criterion).
- Streaming responses over SSE (`stream: true`, with `stream_options.include_usage`).
- MCP tool integration over stdio and HTTP, with per-agent filtering.
- Per-user OAuth 2.0 for MCP servers (token vault, connect-link flow, per-user session pool).
- Per-user token rate limiting (hour / day / month).
- Triggers: start agents on a schedule (`cron`), via HTTP POST (`webhook`), or on server boot (`boot`).
- Async task queue: `dispatch_task` enqueues background work; `tasks_status` inspects the queue from chat; `/admin/live` shows it in real time.
- Sidecars: long-lived helper processes (bridges, exporters) spawned and supervised by Coulisse.
- Config variables (`vars:`): named string snippets shared across agent preambles.
- JSON Schema generation (`coulisse schema`) for IDE autocompletion and live validation.
- YAML-driven config with startup validation.
- Docker image with a volume-mounted SQLite store.
- Credential-bound identity: `auth.proxy.identity: from_credential` derives the per-user identity from the authenticated principal (Basic username or OIDC `sub`) instead of trusting the request body, and rejects a mismatched `safety_identifier`. Makes adversarial multi-tenant serving safe; mutually exclusive with `default_user_id`. See User identification.
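To make the bandit strategy mentioned in the list above more concrete, here is a minimal Python sketch of epsilon-greedy selection over judge scores. It is an illustration of the general technique, not Coulisse's implementation: the class name `EpsilonGreedyRouter`, its methods, and the default `epsilon` are all hypothetical, and sticky-by-user routing is left out.

```python
import random
from collections import defaultdict


class EpsilonGreedyRouter:
    """Hypothetical sketch: explore with probability epsilon, else exploit."""

    def __init__(self, variants, epsilon=0.1, rng=None):
        if not variants:
            raise ValueError("at least one variant is required")
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.totals = defaultdict(float)  # sum of judge scores per variant
        self.counts = defaultdict(int)    # number of scored replies per variant

    def mean(self, variant):
        # Unscored variants default to 0.0 (an assumption of this sketch).
        n = self.counts[variant]
        return self.totals[variant] / n if n else 0.0

    def choose(self):
        # Explore: pick any variant uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        # Exploit: pick the best mean score so far (ties go to the first listed).
        return max(self.variants, key=self.mean)

    def record(self, variant, score):
        # Feed back one judge score for the variant that served the reply.
        self.totals[variant] += score
        self.counts[variant] += 1
```

In this sketch, a higher `epsilon` spends more traffic on exploration, while `epsilon=0` always serves the variant with the best average score on the chosen criterion.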
Planned
Durable rate-limit state
Current rate-limit counters live in memory — they reset on restart and don't span multiple instances. A durable, shared backend is planned so quotas survive reboots and horizontal scaling.
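The limitation is easy to see in a small sketch. The Python below assumes a fixed-window counter keyed by user and window; the source does not say which algorithm Coulisse uses, and the names `InMemoryTokenLimiter`, `try_consume`, and `WINDOW_SECONDS` are hypothetical. Because the counts live in a process-local dictionary, a new process starts from zero and two instances never see each other's usage.

```python
import time
from collections import defaultdict

# Window lengths in seconds; "month" is omitted here for brevity.
WINDOW_SECONDS = {"hour": 3600, "day": 86400}


class InMemoryTokenLimiter:
    """Hypothetical fixed-window token limiter held entirely in memory."""

    def __init__(self, limits, clock=time.time):
        self.limits = limits          # e.g. {"hour": 1000, "day": 5000}
        self.clock = clock            # injectable for testing
        self.used = defaultdict(int)  # (user, window, bucket) -> tokens used

    def _keys(self, user):
        now = self.clock()
        return [(user, w, int(now // WINDOW_SECONDS[w])) for w in self.limits]

    def try_consume(self, user, tokens):
        keys = self._keys(user)
        # Reject if any configured window would exceed its limit.
        for key in keys:
            if self.used[key] + tokens > self.limits[key[1]]:
                return False
        # Otherwise charge every window.
        for key in keys:
            self.used[key] += tokens
        return True
```

A durable backend would replace the `used` dictionary with shared storage so that the same counts are visible after a restart and across instances.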
Vector index for large memory stores
Recall currently does a linear cosine scan over all memories for the user. That is fine at hundreds to low thousands of memories per user, but a vector index will be needed if per-user memory counts grow into the tens of thousands.
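As a rough illustration of what a linear cosine scan involves, the Python sketch below scores every stored embedding against the query and keeps the top `k`. The function names `cosine` and `recall`, the `(text, embedding)` layout, and the default `k` are assumptions made for the example, not Coulisse's actual code.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall(query, memories, k=3):
    """Scan all (text, embedding) pairs; return top-k (score, text), best first."""
    scored = ((cosine(query, emb), text) for text, emb in memories)
    return heapq.nlargest(k, scored)
```

Every query touches every memory, so the cost grows linearly with the number of memories per user. A vector index trades some exactness or build time for sublinear lookups.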
Per-agent memory overrides
Today the `memory:` block is global. A future revision will allow per-agent scoping (different embedders or budgets per agent) for cases where one agent handles long-form research and another handles short user chat.
This list reflects what's on deck at the time of writing — check the repository for the current state.