Pre-Vacation Build Plan — v3

Forge Infrastructure Sprint

Security-first on unsecured WiFi. Claude Max stays default. LiteLLM already running — just needs wiring. ~2.5 hrs total.

What We Discovered — Infrastructure Already Built

LiteLLM is already running on port 4000 with OpenRouter (DeepSeek, Qwen), Anthropic API (Claude Sonnet/Haiku), and local Ollama models. ClawdRouter at 8080 routes through it. The model fallback stack exists — it just isn't exposed in the dashboard UI yet.

Claude Code CLI respects ANTHROPIC_BASE_URL — same terminal, same tmux, same UI. Switching models = restart with different env vars. No new tab or CLI swap required. Claude Max stays as the default and is never auto-switched — fallback models are manual selection only.

Tailscale is for security, not just convenience. On unsecured WiFi, Tailscale exit node routes ALL traffic through VPS via encrypted WireGuard. Café WiFi sees only encrypted packets. Nothing leaks.

1. Tailscale — Secure Dashboard Access on Untrusted WiFi

SECURITY CRITICAL · ~20 min · Do first — everything else runs through this
Why Tailscale Is Right Here

On unsecured WiFi, even HTTPS leaks metadata — the network can see you're connecting to vercel.app, claude.asapai.net, etc. (DNS is often unencrypted, and the destination IP is visible even with TLS). Tailscale uses WireGuard: all traffic from your phone is encrypted at the packet level before it hits the WiFi network. The café sees only encrypted UDP packets going to your VPS's Tailscale IP. Nothing else.

The fix: make the VPS a Tailscale exit node. Your phone routes ALL internet traffic through the VPS via Tailscale tunnel. You then access the Vercel dashboard, claude.asapai.net, everything — as if you were on the VPS. No local network exposure.

Why You're Seeing One Page Right Now

When you access the VPS via its Tailscale IP directly, you hit whatever nginx or Tailscale serve has configured — likely a single-page app or a specific port. The full dashboard needs the exit node approach OR a proper serve config. Check current state first:

tailscale serve status

Recommended Path: Exit Node (10 minutes)

1. On the VPS — advertise it as a Tailscale exit node:

tailscale up --advertise-exit-node

This adds the VPS as an available exit node in your Tailnet. It doesn't change anything yet — your devices still use direct routes until you enable it on each device.
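One prerequisite worth noting: on Linux, Tailscale will not route exit-node traffic unless IP forwarding is enabled. The standard fix (per Tailscale's docs; harmless if already set):

```shell
# Tailscale exit nodes on Linux require IP forwarding.
# (Assumes /etc/sysctl.d exists — standard on systemd distros.)
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf
```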

2. Approve the exit node in the Tailscale admin console (one-time, takes 30 seconds):

Go to login.tailscale.com/admin/machines → find your VPS → click the menu → Edit route settings → enable Use as exit node → Save.

3. On your phone (when on unsecured WiFi): open the Tailscale app → tap the VPS in the peer list → toggle "Use as exit node".

Now ALL your phone's internet traffic routes through the VPS. Open forge-dashboard-kbce.vercel.app in browser — it loads via the VPS, not your phone's WiFi.

Turn off when on trusted WiFi/cell to avoid routing all traffic through VPS unnecessarily.

4. On your laptop (same approach): Tailscale → select the VPS as exit node. Everything is secure.

Alternative: Serve Full Dashboard Directly via Tailscale

If you want the dashboard to live only inside Tailscale (never accessible on the public internet), self-host the Next.js app on the VPS and serve it via Tailscale. More work, but the app becomes completely unreachable from the public internet.

Recommended: Exit Node

10 minutes. All traffic secured. Full dashboard works unchanged. No app changes.

Tradeoff: all traffic routes through VPS (minor latency). Turn off on trusted networks.

Alternative: Self-hosted Dashboard

Build Next.js on VPS, configure tailscale serve to proxy port 3000. Dashboard lives entirely within Tailscale network.

~1 hr more work. Dashboard URL becomes your Tailscale machine name. Requires keeping the VPS build current with Vercel.
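If you go this route, the wiring is roughly as follows (a sketch: assumes the Next.js app listens on port 3000, a hypothetical checkout path of /opt/forge/dashboard, and a tailscale CLI new enough for serve --bg):

```shell
# Build and run the dashboard on the VPS, then expose it only inside the Tailnet.
# 'tailscale serve --bg 3000' proxies https://<machine-name>.<tailnet>.ts.net
# to localhost:3000 in the background.
cd /opt/forge/dashboard        # assumed checkout path
npm run build && npm run start &   # Next.js production server on :3000
tailscale serve --bg 3000
tailscale serve status         # confirm the proxy is active
```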

Second-Order Effects of Exit Node
  • Dashboard, claude-terminal, all APIs — fully secure on any WiFi, no changes to URLs or bookmarks
  • SSH to VPS from anywhere — same Tailscale connection, no SSH key exposure on public networks
  • All future travel — one toggle on phone activates full security, no per-trip setup
  • Phone as forge terminal — ClawdTerm or web terminal in dashboard is fully usable and private
Definition of Done

VPS appears as an available exit node in the Tailscale admin console. Phone can toggle it on. With the exit node active, visiting forge-dashboard-kbce.vercel.app shows the full dashboard, and whatismyip.com shows the VPS IP (not the phone's WiFi IP), confirming all traffic routes through the VPS.

2. Model Selector — Full Stack (Max Default + API + OpenRouter + Local)

VACATION SAFETY · ~1.5 hrs
Key Design: Claude Max is Always Default — Everything Else is Manual Fallback

The model selector defaults to Sonnet 4.6 (Max). The dropdown shows all available models grouped by billing type. You manually switch when Max is unavailable. No auto-switching, no surprises. The forge-claude-model localStorage key persists your last selection — so if you deliberately switch to a fallback, it stays until you switch back.
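The persistence rule is small enough to sketch. Assumptions: the storage interface is injected for testability, and all names besides the forge-claude-model key and the sonnet default are illustrative:

```typescript
// Sketch of the selector's persistence rule. Storage is injected so the
// same logic works in the browser (localStorage) and in tests.
const MODEL_KEY = "forge-claude-model";
const DEFAULT_MODEL = "sonnet"; // Claude Max Sonnet stays the default

interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Returns the last manual selection, falling back to the Max default.
function loadModel(store: KVStore): string {
  return store.getItem(MODEL_KEY) ?? DEFAULT_MODEL;
}

// Called only on explicit user selection — never by auto-fallback logic.
function selectModel(store: KVStore, model: string): void {
  store.setItem(MODEL_KEY, model);
}
```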

Full Model List

Name in Dropdown     | Billing    | Notes
Sonnet 4.6 (DEFAULT) | Max        | Current default. Full tool use. Uses subscription.
Opus 4.6             | Max        | Heaviest Max model. Use for complex reasoning.
Haiku 4.5            | Max        | Fast + cheap. Good for simple tasks on Max.
Sonnet 4.5 (API)     | API key    | Identical to Max Sonnet. Pay-per-token. Full tool use. Best fallback.
Haiku 4.5 (API)      | API key    | ~$0.01/session. Full tool use. Use when Max is down + cost matters.
Gemini 2.0 Flash     | OpenRouter | Fast. Good for analysis + image gen. Tool use may degrade.
Gemini 2.5 Pro       | OpenRouter | Strong reasoning. Good code review. Tool use may degrade.
DeepSeek V3          | OpenRouter | Excellent coder. Very cheap. Tool use may degrade.
Qwen 2.5 Coder 32B   | OpenRouter | Strong on code. Cheap via OpenRouter.
Qwen 7B (Local)      | Free/local | Runs on VPS via Ollama. Free. No internet needed. Tool use unreliable.
Llama 3.2 3B (Local) | Free/local | PII-safe (never leaves VPS). Tiny. Fast. Basic tasks only.
Tool Use Reality: Claude API Models = Full Parity. Others = Degraded.

Sonnet/Haiku via Anthropic API key (not Max) behave identically to Max — full file editing, bash, tool use, agentic tasks. OpenRouter and local models receive translated requests via LiteLLM but may garble tool calls. Best use for non-Claude models: chat, analysis, code review. Don't expect complex multi-file edits. Gemini is the best non-Claude option for tool use.

How Launching Each Model Type Works (same tmux, different env vars)

# Claude Max (default — current behavior)
claude --model sonnet --dangerously-skip-permissions

# Anthropic API (pay-per-token, full tool use, identical behavior)
ANTHROPIC_API_KEY=<key> claude --model claude-sonnet-4-5-20250929 --dangerously-skip-permissions

# LiteLLM → OpenRouter / Local (experimental tool use)
ANTHROPIC_BASE_URL=http://127.0.0.1:4000 \
ANTHROPIC_API_KEY=sk-forge-litellm-local \
claude --model gemini-2-flash --dangerously-skip-permissions

All three run in the same tmux window. The dashboard restart endpoint just sends different env vars + model flag. State impact = same as clicking Restart today.
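A sketch of how the restart endpoint could map a selection to its launch: the env vars and flags are the ones shown above, while the function name, ModelDef shape, and billing labels are assumptions:

```typescript
// Maps a selected model to the env overrides + CLI invocation the restart
// endpoint would run in the existing tmux window. Billing types mirror
// the three launch patterns above.
type Billing = "max" | "api" | "litellm";

interface ModelDef {
  id: string;       // value sent from the dropdown
  billing: Billing;
  cliModel: string; // value passed to `claude --model`
}

function buildLaunch(m: ModelDef, apiKey: string): { env: Record<string, string>; cmd: string } {
  const env: Record<string, string> = {};
  if (m.billing === "api") {
    env.ANTHROPIC_API_KEY = apiKey; // pay-per-token, full tool use
  } else if (m.billing === "litellm") {
    env.ANTHROPIC_BASE_URL = "http://127.0.0.1:4000";
    env.ANTHROPIC_API_KEY = "sk-forge-litellm-local";
  }
  // billing === "max": no overrides — subscription default stays untouched
  return { env, cmd: `claude --model ${m.cliModel} --dangerously-skip-permissions` };
}
```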

State When Switching Models

Conversation history  | Lost (same as today's restart)
Files + memory.md     | Preserved
tmux session/window   | Same — no new tab
Ralph + active tasks  | Unaffected
Tool use (Claude API) | Full parity with Max
Tool use (OpenRouter) | May degrade

Files to Change

1. dashboard.ts — add GET /api/models endpoint returning the full model list with billing-type metadata.

2. dashboard.ts — update POST /api/claude-restart to accept all model IDs and launch with the correct env vars per billing type (Max / API key / LiteLLM).

3. web-terminal.tsx — replace the 3 hardcoded <option> tags with a dynamic fetch from /api/vps/models, rendered as <optgroup> sections by billing type. Default selection = sonnet.
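The optgroup rendering reduces to a grouping step. A sketch, assuming billing labels like max / api / openrouter / local; the helper name and payload shape are illustrative, not the real /api/models contract:

```typescript
// Groups the fetched model list into the <optgroup> sections the dropdown
// renders. Labels and ordering are illustrative, not the final copy.
interface ModelInfo {
  id: string;
  label: string;
  billing: string;
}

const GROUP_ORDER = ["max", "api", "openrouter", "local"];

function toOptgroups(models: ModelInfo[]): { billing: string; options: ModelInfo[] }[] {
  return GROUP_ORDER
    .map((billing) => ({ billing, options: models.filter((m) => m.billing === billing) }))
    .filter((g) => g.options.length > 0); // skip empty sections
}
```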

4. /opt/forge/config/litellm.yaml — add Gemini 2.0 Flash and Gemini 2.5 Pro via OpenRouter. Restart the LiteLLM service.
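The additions would look roughly like this; the model_name values match the dropdown IDs used in this plan, but the OpenRouter slugs are assumptions to verify against the current OpenRouter catalog:

```yaml
# Append to /opt/forge/config/litellm.yaml, then restart the LiteLLM service.
model_list:
  - model_name: gemini-2-flash
    litellm_params:
      model: openrouter/google/gemini-2.0-flash-001   # assumed slug
      api_key: os.environ/OPENROUTER_API_KEY
  - model_name: gemini-2.5-pro
    litellm_params:
      model: openrouter/google/gemini-2.5-pro         # assumed slug
      api_key: os.environ/OPENROUTER_API_KEY
```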

Second-Order Effects
  • Max timeout → select "Haiku 4.5 (API)" — identical tool use, ~$0.01/session, forge never fully stops
  • MasteryBook fix — same architecture: point at LiteLLM → Gemini 2.0 Flash
  • Image generation — select Gemini 2.0 Flash in terminal, request image, done
  • Ralph gets cheaper models — ClawdRouter cascade already configured, Ralph was already using LiteLLM
Definition of Done

Model selector shows grouped optgroups: Claude Max / Anthropic API / OpenRouter / Local. Default is Sonnet 4.6. Selecting "Haiku 4.5 (API)" and restarting uses Anthropic API key with full tool use. Selecting DeepSeek V3 and restarting connects via LiteLLM. Both work in the same tmux terminal window.

3. Fix MasteryBook → LiteLLM Gemini

~20 min after Item 2
Root Cause

MasteryBook points at Gemini directly (bad key or Gemini CLI not installed). Fix: redirect to LiteLLM → gemini-2-flash via OpenRouter. Same model, no new key needed (uses the OpenRouter key already in place). MasteryBook never breaks on Gemini API changes again; LiteLLM handles fallbacks.

1. Find MasteryBook + diagnose: find /opt/forge -name "*mastery*" 2>/dev/null | grep -v node_modules

2. In MasteryBook config/env, change to the LiteLLM endpoint:

OPENAI_BASE_URL=http://localhost:4000/v1
OPENAI_API_KEY=sk-forge-litellm-local
MODEL=gemini-2-flash

If MasteryBook uses a Gemini-specific SDK, replace with openai-compatible client pointed at LiteLLM.
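In that case the swap reduces to building a standard OpenAI-style chat request against LiteLLM. A sketch (buildLiteLLMRequest is a hypothetical helper; endpoint, key, and model default to the env values above):

```typescript
// Builds an OpenAI-compatible chat request targeting LiteLLM. Defaults
// mirror the env values above; no Gemini-specific SDK required.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildLiteLLMRequest(
  prompt: string,
  baseURL = "http://localhost:4000/v1",
  apiKey = "sk-forge-litellm-local",
  model = "gemini-2-flash",
) {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  return {
    url: `${baseURL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

Pass the result to fetch(r.url, r.init) and read choices[0].message.content from the JSON response, as with any OpenAI-compatible endpoint.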

Definition of Done

MasteryBook generates content without errors. Logs show requests hitting LiteLLM → OpenRouter → Gemini successfully.

4. Image Generation (Free Once Gemini Is in LiteLLM)

BONUS — ~15 min

Once gemini-2-flash is in LiteLLM (Item 2, Step 4), image generation is just a matter of selecting that model and asking for an image. Add gemini-2-flash-image pointing to openrouter/google/gemini-2.0-flash-exp:free (the experimental variant with image gen). Select it in the terminal model dropdown, ask Claude to generate an image, and the response includes a base64 PNG.

Bonus: add /image [prompt] slash command to Forge terminals that calls this model directly, saves to /tmp/forge-image.png, and optionally publishes to a NowPage.

Definition of Done

Selecting Gemini image model in terminal and typing "generate an image of X" returns a usable PNG. Works on vacation via Tailscale exit node.


Build Order + Time Estimates

Item 1: Tailscale Exit Node (20m)
Item 2: Full Model Selector (1.5h)
Item 3: MasteryBook Fix (20m)
Item 4: Image Gen (15m)

~2.5 hrs total. Items 1+2 alone = vacation-safe + secure.

Item 1 is 20 minutes and makes everything else work securely from anywhere.