Bring any provider, mix freely
Local Ollama, vLLM, OpenRouter, OpenAI, Groq — all in the same room. Credentials and endpoints stay on the participant machine; the hub never sees them.
Build apps where many models collaborate across people, machines and providers.
Gambi ships the gateway. You ship the experience.
Six recipes that fit in a few lines of TypeScript. The substrate (routing, observability, multi-source) is already there.
Same prompt, every model in the room. Spot differences in tone, accuracy, and speed at a glance.
→ pattern: Round-robin via model:"*", N requests in parallel, render side by side.
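A minimal sketch of this fan-out. It assumes a hypothetical `Ask` wrapper around whatever client you use (for example `generateText` with `gambi.any()`); `Ask`, `Column`, and `compareSideBySide` are illustrative names, not part of the Gambi SDK.

```typescript
// Hypothetical wrapper: sends one prompt to the room and resolves with the reply text.
type Ask = (prompt: string) => Promise<string>;

interface Column {
  index: number;
  text: string;
  ms: number; // wall-clock latency as seen by the caller
}

// Fire n identical requests in parallel and time each one.
async function compareSideBySide(ask: Ask, prompt: string, n: number): Promise<Column[]> {
  const calls = Array.from({ length: n }, async (_, index) => {
    const started = Date.now();
    const text = await ask(prompt);
    return { index, text, ms: Date.now() - started };
  });
  return Promise.all(calls);
}
```

Each `Column` can then be rendered next to the others; which model answered each request depends on how the hub routes, which this sketch does not inspect.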
Generate the answer once. Fan it out to other models for verdicts and scoring. The g-eval pattern, off the shelf.
→ pattern: 1 generator participant + N judge participants. Aggregate the votes.
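One way the vote aggregation could look, as a sketch. The `AskTarget` wrapper, `parseScore`, and `judgeAnswer` are hypothetical helpers, and the 1-10 scale simply mirrors the quickstart prompt; real judges may need a stricter output format.

```typescript
// Hypothetical wrapper: sends a prompt to one named target and resolves with the reply text.
type AskTarget = (target: string, prompt: string) => Promise<string>;

// Pull the first standalone integer from 1 to 10 out of a free-text reply; null if none.
function parseScore(reply: string): number | null {
  const match = reply.match(/\b(10|[1-9])\b/);
  return match ? Number(match[1]) : null;
}

interface Verdict {
  scores: (number | null)[]; // one entry per judge, null when no score was found
  mean: number | null;       // average of the parsed scores, null when none parsed
}

// Ask every judge in parallel, then average the scores that could be parsed.
async function judgeAnswer(ask: AskTarget, judges: string[], answer: string): Promise<Verdict> {
  const replies = await Promise.all(
    judges.map((judge) => ask(judge, `Score this answer 1-10: ${answer}`)),
  );
  const scores = replies.map((reply) => parseScore(reply));
  const valid = scores.filter((s): s is number => s !== null);
  const mean = valid.length > 0 ? valid.reduce((a, b) => a + b, 0) / valid.length : null;
  return { scores, mean };
}
```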
Cheap-then-strong. Small model drafts, bigger one critiques, third polishes. Quality without paying frontier prices end to end.
→ pattern: 3 chained calls — output of step N feeds prompt N+1.
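The chain can be sketched as a simple loop. `AskTarget`, `Step`, and `runChain` are hypothetical names, and the way each instruction is joined to the previous output is just one possible prompt layout.

```typescript
// Hypothetical wrapper: sends a prompt to one named target and resolves with the reply text.
type AskTarget = (target: string, prompt: string) => Promise<string>;

interface Step {
  target: string;      // which model or participant handles this step
  instruction: string; // what to do with the previous output
}

// Run the steps in order; the output of step N becomes the input of step N+1.
async function runChain(ask: AskTarget, steps: Step[], input: string): Promise<string> {
  let current = input;
  for (const step of steps) {
    current = await ask(step.target, `${step.instruction}\n\n${current}`);
  }
  return current;
}
```

With three steps (draft, critique, polish) this makes exactly three sequential calls, so total latency is the sum of the three.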
Two models argue, a third moderates. Stream every turn through SSE for a live show.
→ pattern: Loop between participant IDs with conflicting system prompts.
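A sketch of the alternating loop, leaving the moderator and streaming out for brevity. `AskWithSystem`, `Debater`, and `debate` are hypothetical names; how a system prompt is actually passed depends on the client you wrap.

```typescript
// Hypothetical wrapper: one call to a given participant with a system prompt.
type AskWithSystem = (target: string, system: string, prompt: string) => Promise<string>;

interface Debater {
  id: string;     // participant ID to route to
  system: string; // system prompt that sets this side's position
}

interface Turn {
  speaker: string;
  text: string;
}

// Alternate between the two debaters; each one replies to the other's last turn.
async function debate(
  ask: AskWithSystem,
  a: Debater,
  b: Debater,
  topic: string,
  rounds: number,
): Promise<Turn[]> {
  const turns: Turn[] = [];
  let last = topic;
  for (let i = 0; i < rounds * 2; i++) {
    const speaker = i % 2 === 0 ? a : b;
    const text = await ask(speaker.id, speaker.system, last);
    turns.push({ speaker: speaker.id, text });
    last = text;
  }
  return turns;
}
```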
A game where each character has its own brain. Different model, different system prompt, different attitude.
→ pattern: One participant per persona. Route by ID at the moment of speech.
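Routing by persona might look like the sketch below. The cast table, `Persona`, and `speakAs` are hypothetical; the participant IDs would come from your own room.

```typescript
// Hypothetical wrapper: one call to a given participant with a system prompt.
type AskWithSystem = (target: string, system: string, prompt: string) => Promise<string>;

interface Persona {
  participantId: string; // which participant plays this character
  system: string;        // the character's system prompt
}

// Look up the character at the moment of speech and route to its participant.
async function speakAs(
  ask: AskWithSystem,
  cast: Map<string, Persona>,
  character: string,
  line: string,
): Promise<string> {
  const persona = cast.get(character);
  if (!persona) throw new Error(`unknown character: ${character}`);
  return ask(persona.participantId, persona.system, line);
}
```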
Bring friends or students. Each plugs their own LLM into the room. You build the UI; the models stay theirs.
→ pattern: Multi-person registration. App reads the participant list and fans out.
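A sketch of the fan-out over a participant list. How the app obtains the list is not shown here, so the IDs are passed in as a plain array; `AskTarget`, `Reply`, and `askEveryone` are hypothetical names. Failures are caught per participant so that one laptop going offline does not sink the whole round.

```typescript
// Hypothetical wrapper: sends a prompt to one participant and resolves with the reply text.
type AskTarget = (target: string, prompt: string) => Promise<string>;

interface Reply {
  id: string;
  ok: boolean;
  text: string; // the reply on success, the error message on failure
}

// Send the same prompt to every participant in parallel, isolating failures.
async function askEveryone(ask: AskTarget, ids: string[], prompt: string): Promise<Reply[]> {
  return Promise.all(
    ids.map(async (id): Promise<Reply> => {
      try {
        return { id, ok: true, text: await ask(id, prompt) };
      } catch (err) {
        return { id, ok: false, text: String(err) };
      }
    }),
  );
}
```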
Your app talks to a single OpenAI-compatible URL. The hub routes inside the room. Every participant runs its own provider, in its own tunnel.
Routing targets: * · model:<name> · <id>
Events: llm.request · llm.complete
Provider endpoints stay on localhost. The hub never reaches in — the participant runtime opens the tunnel.
One binary on the hub machine, one SDK call from your app. No accounts, no cloud bill, no signup.
gambi hub serve
gambi room create --name "demo"
gambi participant join --room ABC123 --model llama3
import { createGambi } from "gambi-sdk";
import { generateText } from "ai";
const gambi = createGambi({
roomCode: "ABC123",
hubUrl: "http://localhost:3000",
});
// route to any participant
const { text } = await generateText({
model: gambi.any(),
prompt: "Explain TLS in one sentence.",
});
// or to a specific model
const judge = await generateText({
model: gambi.model("claude-haiku-4-5"),
prompt: `Score this answer 1-10: ${text}`,
});
Going multi-model normally means re-wiring auth, routing and observability for every new provider. Gambi already did the boring parts: local-first, no signup, no cloud bill.
Every inference emits an SSE event with TTFT, total duration, and token counts. Pipe it to a TUI, a dashboard, or your own tracer.
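To pipe these events into your own tooling you need to split the stream into frames first. The sketch below parses standard SSE text (`event:` and `data:` fields, frames separated by a blank line). The payload field names are not specified here, so `data` is left as `unknown`; `SseEvent` and `parseSse` are illustrative names.

```typescript
interface SseEvent {
  event: string; // e.g. "llm.complete"; defaults to "message" when no event field is present
  data: unknown; // parsed JSON when possible, otherwise the raw string
}

// Parse complete SSE frames from a text buffer. Comment lines (starting with ":")
// and frames without a data field are ignored.
function parseSse(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of buffer.split(/\r?\n\r?\n/)) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length === 0) continue;
    const raw = dataLines.join("\n");
    let data: unknown = raw;
    try {
      data = JSON.parse(raw);
    } catch {
      // Not JSON: keep the raw string.
    }
    events.push({ event, data });
  }
  return events;
}
```

A tracer or dashboard could filter the result on `event === "llm.complete"` and read whatever timing and token fields the payload carries.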
One endpoint speaks Responses API and Chat Completions. AI SDK, OpenAI SDK, curl, any tool with a custom base URL — they all just work.
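For tools without an SDK, a plain Chat Completions request is enough. The sketch below only builds the request; the base URL is a parameter because the hub's exact path is not given here, and appending `/chat/completions` follows the usual OpenAI-compatible convention, which is an assumption about Gambi. `ChatRequest` and `buildChatRequest` are illustrative names.

```typescript
interface ChatRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// Build a Chat Completions request for an OpenAI-compatible base URL.
// Assumption: the endpoint lives at <baseUrl>/chat/completions.
function buildChatRequest(baseUrl: string, model: string, prompt: string): ChatRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (not executed here):
//   const { url, init } = buildChatRequest(baseUrl, "*", "Explain TLS in one sentence.");
//   const response = await fetch(url, init);
```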
Run as many rooms as you want on a single hub — one per app, team, or experiment. Each room has its own participants, routing, and event stream.