Quick Start

This tutorial walks you through setting up Gambi from scratch. By the end, you’ll have a hub running with participants sharing LLMs on your network.

What You’ll Need

A machine to run the hub — any computer on the network (doesn’t need a GPU, it just routes traffic)
At least one LLM endpoint — Ollama, LM Studio, vLLM, or any OpenAI-compatible API
Node.js only if you plan to install the CLI through npm

Installation

CLI

The CLI allows you to start hubs, create rooms, and join as a participant. Pre-built binaries are available for Linux (x64/arm64), macOS (Apple Silicon and Intel), and Windows (x64).

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/arthurbm/gambi/main/scripts/install.sh | bash

The shell script auto-detects your OS and architecture, downloads the correct binary, and installs it to /usr/local/bin.

Windows (PowerShell):

irm https://raw.githubusercontent.com/arthurbm/gambi/main/scripts/install.ps1 | iex

The PowerShell script downloads the binary to %LOCALAPPDATA%\gambi\ and adds it to your user PATH.

# Via npm
npm install -g gambi

# Via bun
bun add -g gambi

The published gambi package is a wrapper that installs only the matching platform binary for your machine. npm users only need Node.js; Bun is not required at runtime.

Verify the installation:

gambi --version

Uninstallation

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/arthurbm/gambi/main/scripts/uninstall.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/arthurbm/gambi/main/scripts/uninstall.ps1 | iex

# If installed via npm
npm uninstall -g gambi

# If installed via bun
bun remove -g gambi

SDK

The SDK provides Vercel AI SDK integration for using shared LLMs in your TypeScript/JavaScript applications. It defaults to the Responses API and also supports Chat Completions.

npm install gambi-sdk
# or
bun add gambi-sdk

Basic Usage

All CLI commands support interactive mode — run without flags and you’ll be guided through each option. Flags still work for scripting.

1. Start the Hub Server

Pick a machine on the network to be the hub. It doesn’t need a GPU — the hub only routes requests between participants.

# Interactive — prompts for port, host, mDNS:
gambi hub serve

# Or with flags:
gambi hub serve --port 3000 --mdns

The --mdns flag enables auto-discovery so other machines on the network can find the hub automatically.

2. Create a Room

# Interactive — prompts for room name and password:
gambi room create

# Or with flags:
gambi room create --name "My Room"
# Output: Room created! Code: ABC123

Share this code with everyone who wants to join — via chat, projector, sticky note, whatever works.

You can also create a password-protected room:

gambi room create --name "Private Room" --password secret123

3. Join with Your LLM

Each person with an LLM endpoint joins the room:

gambi participant join \
  --room ABC123 \
  --participant-id joao-1 \
  --model llama3

When you join with flags, the endpoint defaults to http://localhost:11434 (Ollama). For other providers, set --endpoint explicitly:

# LM Studio
gambi participant join \
  --room ABC123 \
  --participant-id joao-lmstudio \
  --model mistral \
  --endpoint http://localhost:1234

# vLLM
gambi participant join \
  --room ABC123 \
  --participant-id joao-vllm \
  --model llama3 \
  --endpoint http://localhost:8000

The CLI probes your local endpoint, detects available models and protocol capabilities, registers the participant, opens a tunnel back to the hub, and keeps the session alive until interrupted.

Important implication: your provider endpoint can stay on localhost, even when the hub is running on another machine in the same trusted network. You no longer need to publish a LAN-reachable provider URL just to join a room. For the reasoning behind this, see How Tunnels Work.

The CLI also shares your machine specs (CPU, RAM, GPU) automatically. Use --no-specs if you prefer not to share them.

Once joined, your LLM is available to everyone in the room.
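If a join fails, it can help to check by hand what your provider reports before involving the hub. The sketch below is not the CLI's actual probing code. It assumes your provider exposes the conventional OpenAI-compatible GET /v1/models route (Ollama, LM Studio, and vLLM all offer one), and the helper names modelsUrl, parseModelIds, and listModels are hypothetical.

```typescript
// Hypothetical helpers for manual troubleshooting; not part of the gambi CLI or SDK.

/** Build the OpenAI-compatible model-listing URL for a provider endpoint. */
function modelsUrl(endpoint: string): string {
  // Strip trailing slashes so "http://localhost:11434/" and
  // "http://localhost:11434" produce the same URL.
  return `${endpoint.replace(/\/+$/, "")}/v1/models`;
}

/** Extract model ids from an OpenAI-style `{ data: [{ id }] }` payload. */
function parseModelIds(payload: unknown): string[] {
  const data = (payload as { data?: unknown } | null)?.data;
  if (!Array.isArray(data)) return [];
  return data
    .map((entry) => (entry as { id?: unknown } | null)?.id)
    .filter((id): id is string => typeof id === "string");
}

/** Ask a local provider which models it serves. Throws on HTTP errors. */
async function listModels(endpoint: string): Promise<string[]> {
  const response = await fetch(modelsUrl(endpoint));
  if (!response.ok) {
    throw new Error(`Provider responded with HTTP ${response.status}`);
  }
  return parseModelIds(await response.json());
}
```

Calling listModels("http://localhost:11434") against a running Ollama instance should return the model names you can pass to --model. An empty array or a thrown error suggests the endpoint or port is wrong.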

4. Use the SDK

Now anyone can use the shared LLMs from their code:

import { createGambi } from "gambi-sdk";
import { generateText } from "ai";

const gambi = createGambi({
  roomCode: "ABC123",
  hubUrl: "http://localhost:3000",
});

// Send to any available participant
const result = await generateText({
  model: gambi.any(),
  prompt: "Hello, Gambi!",
});

console.log(result.text);

You can also target specific models or participants:

// Use a specific model
const modelResult = await generateText({
  model: gambi.model("llama3"),
  prompt: "Explain quantum computing",
});

// Use a specific participant (the ID given to --participant-id when joining)
const participantResult = await generateText({
  model: gambi.participant("joao-1"),
  prompt: "Write a haiku",
});
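Participants can leave a room at any time, so a request pinned to one model or participant may fail. One way to handle that is to try a specific target first and fall back to any available participant. The helper below is a minimal sketch. firstSuccessful is a hypothetical name, not something gambi-sdk exports, and the usage comment assumes that generateText rejects when the chosen target is unavailable.

```typescript
/**
 * Run the given attempts in order and return the first result that succeeds.
 * Hypothetical helper; not part of gambi-sdk.
 */
async function firstSuccessful<T>(
  attempts: Array<() => Promise<T>>,
): Promise<T> {
  let lastError: unknown = undefined;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error; // remember the failure and try the next target
    }
  }
  const detail = lastError instanceof Error ? `: ${lastError.message}` : "";
  throw new Error(`All ${attempts.length} attempts failed${detail}`);
}

// Usage with the SDK calls shown above:
//
// const result = await firstSuccessful([
//   () => generateText({ model: gambi.model("llama3"), prompt: "Hello" }),
//   () => generateText({ model: gambi.any(), prompt: "Hello" }),
// ]);
```

Passing thunks rather than promises keeps the fallback request from being sent unless the earlier attempt has actually failed.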

To use Chat Completions instead of the default Responses API:

const gambi = createGambi({
  roomCode: "ABC123",
  hubUrl: "http://localhost:3000",
  defaultProtocol: "chatCompletions",
});

5. Use the API Directly

No SDK needed. The hub exposes an OpenAI-compatible API, so you can call it from any language or tool:

curl -X POST http://localhost:3000/rooms/ABC123/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "*",
    "input": "Hello!"
  }'

Any tool that accepts a custom OpenAI base URL works — Lovable, Cursor, Open WebUI, Python’s openai library, etc. Just point it at:

http://<hub-ip>:<port>/rooms/<ROOM_CODE>/v1
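The same request can be made from TypeScript with plain fetch. This is a minimal sketch rather than an official client. roomBaseUrl, buildRequest, and sendPrompt are hypothetical names. The /responses path and the "*" model come from the curl example above, while the /chat/completions path and its messages body follow the usual OpenAI convention and are an assumption about the hub.

```typescript
type Protocol = "responses" | "chatCompletions";

/** Build the room-scoped base URL: http://<hub-ip>:<port>/rooms/<ROOM_CODE>/v1 */
function roomBaseUrl(hubUrl: string, roomCode: string): string {
  return `${hubUrl.replace(/\/+$/, "")}/rooms/${encodeURIComponent(roomCode)}/v1`;
}

/** Build the URL and JSON body for one prompt under the chosen protocol. */
function buildRequest(
  baseUrl: string,
  protocol: Protocol,
  model: string,
  prompt: string,
): { url: string; body: Record<string, unknown> } {
  if (protocol === "responses") {
    // Same shape as the curl example: { model, input }.
    return { url: `${baseUrl}/responses`, body: { model, input: prompt } };
  }
  // Assumed: the conventional OpenAI Chat Completions path and body.
  return {
    url: `${baseUrl}/chat/completions`,
    body: { model, messages: [{ role: "user", content: prompt }] },
  };
}

/** Send a prompt to any participant ("*") and return the parsed JSON reply. */
async function sendPrompt(
  hubUrl: string,
  roomCode: string,
  prompt: string,
  protocol: Protocol = "responses",
): Promise<unknown> {
  const { url, body } = buildRequest(
    roomBaseUrl(hubUrl, roomCode),
    protocol,
    "*",
    prompt,
  );
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Hub responded with HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, sendPrompt("http://localhost:3000", "ABC123", "Hello!") issues the same request as the curl command above.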

See the API Reference for all available endpoints.

Next Steps

Learn about CLI commands
Explore SDK usage
See the full API Reference
Using cloud LLMs? See Remote Providers
Building multi-LLM experiences? See Multi-LLM Patterns