Coulisse

One YAML file. An OpenAI-compatible server with memory, tools, and multi-backend routing.

Coulisse is a single Rust binary that reads a coulisse.yaml file and spins up an OpenAI-compatible HTTP server. You point your existing tools, SDKs, and projects at it like any other OpenAI endpoint — and everything configurable lives in that one YAML file.

Why Coulisse?

Every multi-agent project ends up re-implementing the same plumbing:

  • Per-user conversation memory
  • Routing between model providers
  • Rate limits and retries
  • Tool integration
  • Multiple agents with different system prompts

Coulisse collapses this plumbing into one configurable server. You describe the setup in YAML and drive everything from there, instead of writing fresh glue code for each prototype.
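
To make that concrete, here is a sketch of what a coulisse.yaml could look like. Every key below is hypothetical; it illustrates the shape of the idea (backends, agents, and memory in one file), not the actual schema.

  # Purely illustrative sketch: these keys are NOT Coulisse's real schema.
  server:
    listen: 127.0.0.1:8000        # where the OpenAI-compatible API is served

  backends:
    anthropic:
      api_key_env: ANTHROPIC_API_KEY
    openai:
      api_key_env: OPENAI_API_KEY

  agents:
    support-bot:                  # exposed to clients as a "model" name
      backend: anthropic
      model: claude-sonnet-4
      system_prompt: You are a concise support assistant.
      memory: true                # per-user conversation memory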

How it works

┌──────────────────┐        ┌──────────────────┐        ┌──────────────────┐
│  Your SDK / app  │───────▶│     Coulisse     │───────▶│   Anthropic      │
│  (OpenAI client) │        │                  │        │   OpenAI         │
└──────────────────┘        │   coulisse.yaml  │        │   Gemini …       │
                            │                  │        └──────────────────┘
                            │   + memory       │
                            │   + MCP tools    │        ┌──────────────────┐
                            │   + rate limits  │───────▶│   MCP servers    │
                            └──────────────────┘        └──────────────────┘
  1. Your application talks to Coulisse using any OpenAI-compatible SDK.
  2. Coulisse picks the agent you asked for (by model name), assembles the user's memory, and calls the right backend.
  3. The response flows back — and the exchange is saved to that user's memory for next time.
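
In code, step 1 looks like any other OpenAI integration. A minimal sketch using the official openai Python package, assuming Coulisse listens on localhost:8000 and exposes an agent named support-bot (both names are illustrative, not defaults):

  from openai import OpenAI

  # Point the stock OpenAI client at Coulisse instead of api.openai.com.
  # Base URL, API key handling, and agent name are all illustrative.
  client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-key")

  # /v1/models lists the agents Coulisse exposes as model names.
  for model in client.models.list():
      print(model.id)

  # /v1/chat/completions routes to the agent selected via `model`.
  reply = client.chat.completions.create(
      model="support-bot",
      messages=[{"role": "user", "content": "Where did we leave off?"}],
      user="alice",  # assumption: a stable per-user id so memory attaches to the right user
  )
  print(reply.choices[0].message.content)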

What's in the box

Feature                     Status
Multi-agent routing         ✅ Working
Per-user memory             ✅ Persistent (SQLite) with semantic recall
Real embedders              ✅ OpenAI + Voyage (hash fallback for offline dev)
Auto-extraction             ✅ Optional — pulls durable facts from each exchange
MCP tool integration        ✅ Working (stdio + HTTP)
Multi-backend support       ✅ Anthropic, OpenAI, Gemini, Cohere, Deepseek, Groq
OpenAI-compatible API       ✅ /v1/chat/completions, /v1/models
Streaming responses         ✅ Server-Sent Events
Rate limiting               ✅ Per-user token quotas (hour / day / month, in-memory)
Studio UI                   ✅ Read-only at /admin/
Workflow orchestration      ⏳ Planned
Durable rate-limit state    ⏳ Planned
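
Because streaming uses Server-Sent Events, the standard streaming mode of OpenAI SDKs works unchanged. A sketch under the same illustrative assumptions as above:

  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-key")

  # stream=True makes the server respond with Server-Sent Events,
  # which the SDK surfaces as an iterator of delta chunks.
  stream = client.chat.completions.create(
      model="support-bot",  # illustrative agent name
      messages=[{"role": "user", "content": "Summarize our last conversation."}],
      stream=True,
  )
  for chunk in stream:
      delta = chunk.choices[0].delta.content
      if delta:
          print(delta, end="", flush=True)
  print()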

Continue to Installation to get started.

Stability

Coulisse is pre-1.0. It follows Semantic Versioning, but during the 0.x phase, minor version bumps (0.1 → 0.2) may include breaking changes to the YAML schema, HTTP surface, or CLI. Patch bumps (0.1.0 → 0.1.1) will not. See the Releasing chapter and CHANGELOG.md for the version history.