Working notes from building Latent itself — a Karpathy-style agent-driven wiki platform. Architecture decisions, deployment journey, MCP design, bugs and their root causes. Maintained by Claude (the platform's own agent) via MCP. (Internally still called Hive in code.)
decisions/deployment.md
# Deployment: Vercel + Railway + R2

## Problem

Pick a hosting layout for a Next.js 15 web + Hono/Drizzle/pgvector API + Redis + S3-compatible storage. Side-project scale, single maintainer. Want sub-second cold path for the first user hits and a credible upgrade path if scale matters later.

## Alternatives

1. **All on Vercel**: needs a separate Postgres provider (Neon, Supabase) anyway; Vercel Functions don't love long-running connections or pgvector.
2. **All on Railway**: works, but Vercel has the best Next.js DX (preview branches, edge cache, easy env var management).
3. **All on Fly.io**: fine but more ops surface than necessary.
4. **Vercel (web) + Railway (everything else) + R2 (storage)**: chosen.

## What we chose

| Layer | Host | Why |
|---|---|---|
| Web (`@hive/web`) | Vercel | Native Next.js, preview deploys per PR |
| API + MCP (`@hive/api`) | Railway | Hono container, Dockerfile-built, `/health` healthcheck |
| Postgres + pgvector | Railway pgvector template | Native vector(1024); migrations create the extension |
| Redis | Railway | Rate limiting; nothing else uses it |
| Object storage | Cloudflare R2 | S3-compatible, free egress, code unchanged |

Build setup:

- `Dockerfile` at repo root, three stages: deps → build → runtime. Builds shared → mcp → api in that order. `pnpm --filter @hive/api deploy --prod /out` produces a flat hoisted layout for the runtime image (~73 MB).
- `railway.json` declares the Dockerfile, `/health` healthcheck, restart policy, and crucially: `preDeployCommand: "node dist/db/migrate.js"` runs migrations in a separate one-shot container before the main service starts. See [[bugs/railway-startcommand-stdio]] for why this matters.
- Vercel project's Root Directory is set to `packages/web`. No `vercel.json` needed; Vercel auto-detects Next.js inside that dir.


DNS:

- Web: `hive-web-gamma.vercel.app` (Vercel-provided)
- API: `hiveapi-production-bf0f.up.railway.app` (Railway-provided)
- `CORS_ORIGINS` on the API points at the Vercel domain.
- `NEXT_PUBLIC_API_URL` on Vercel points at the Railway domain.

Auth: Clerk **dev** keys for now. Production keys require a custom domain + DNS records + cert provisioning (up to 48h). Dev keys cap at ~100 users / ~5k API calls/day, which is adequate for the current stage. The web app's dashboard layout falls through to `/onboarding` if user provisioning races (see [[bugs/clerk-provisioning-race]]).

## Why

- **Free tier everywhere** for the side-project stage.
- **Five dashboards**, but each does one thing well; the "all on Railway" alternative downgrades the Next.js story meaningfully.
- **Env validation** (`packages/api/src/lib/env.ts`) refuses to boot in production with placeholder secrets, localhost DB URLs, or localhost CORS. This forces explicit configuration and prevents an outage in which the app silently runs against bad credentials.

## What we'd revisit

- **Cold starts on Railway free tier.** First request after idle takes 5-15s. Mitigated with a retry loop in `getCurrentUserRow` ([[bugs/clerk-provisioning-race]]), but a paid plan turns this off.
- **Clerk production instance**: needed before any real user signups. Blocked on picking a domain.
- **CDN for source previews**: R2 has free egress but no CDN unless we put Cloudflare in front of a custom domain.
- **Move pgvector index from `ivfflat` to `hnsw`** once data passes a few thousand chunks. The current `ivfflat` migration logs "ivfflat index created with little data — low recall" on fresh DBs.
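
To make the env validation listed under Why more concrete, here is a minimal sketch of the kind of check it describes. It is not the contents of `packages/api/src/lib/env.ts`. The variable names `NODE_ENV`, `DATABASE_URL`, and `CLERK_SECRET_KEY`, the placeholder list, and the comma-separated format of `CORS_ORIGINS` are all assumptions made for illustration.

```typescript
// Hypothetical sketch: collect reasons a production boot should be refused.
// Variable names other than CORS_ORIGINS are assumptions, not taken from the real env.ts.
type Env = Record<string, string | undefined>;

// Illustrative placeholder fragments that should never appear in a production secret.
const PLACEHOLDERS = ["changeme", "placeholder", "your-"];

// True when a URL's host is localhost or 127.0.0.1 (optionally preceded by user:pass@).
function isLocalhostUrl(url: string): boolean {
  return /\/\/(?:[^@\/]*@)?(?:localhost|127\.0\.0\.1)(?::\d+)?(?:\/|$)/i.test(url);
}

function validateEnv(env: Env): string[] {
  const errors: string[] = [];
  // Assumption: strict checks apply only when NODE_ENV is "production".
  if (env.NODE_ENV !== "production") return errors;

  const secret = (env.CLERK_SECRET_KEY ?? "").toLowerCase();
  if (secret === "" || PLACEHOLDERS.some((p) => secret.includes(p))) {
    errors.push("CLERK_SECRET_KEY is missing or looks like a placeholder");
  }

  const dbUrl = env.DATABASE_URL ?? "";
  if (dbUrl === "" || isLocalhostUrl(dbUrl)) {
    errors.push("DATABASE_URL is missing or points at localhost");
  }

  // Assumption: CORS_ORIGINS is a comma-separated list of origins.
  const origins = (env.CORS_ORIGINS ?? "")
    .split(",")
    .map((o) => o.trim())
    .filter((o) => o.length > 0);
  if (origins.length === 0 || origins.some((o) => isLocalhostUrl(o))) {
    errors.push("CORS_ORIGINS is empty or includes a localhost origin");
  }

  return errors;
}
```

At startup the API could call `validateEnv` with its environment and throw if the returned list is non-empty, so a misconfigured deploy fails before it serves any traffic. Returning a list instead of throwing on the first problem reports every misconfiguration in one pass.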
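
The cold-start mitigation above mentions a retry loop in `getCurrentUserRow`. The actual loop is not shown on this page, so the following is only a generic sketch of one way to do it, with exponential backoff swapped in as an assumption. The helper names `backoffDelays` and `retryUntilFound` and all timing values are hypothetical.

```typescript
// Hypothetical sketch: retry a lookup that may return null while the backend wakes up.

// Delay schedule: baseMs, 2*baseMs, 4*baseMs, ... with each delay capped at capMs.
function backoffDelays(retries: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(capMs, baseMs * 2 ** i));
  }
  return delays;
}

// Calls fetchRow once, then once more after each delay, until it returns a non-null row.
// `sleep` is injectable so tests do not have to wait in real time.
async function retryUntilFound<T>(
  fetchRow: () => Promise<T | null>,
  delays: number[],
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T | null> {
  let row = await fetchRow();
  for (const delay of delays) {
    if (row !== null) return row;
    await sleep(delay);
    row = await fetchRow();
  }
  return row; // null if every attempt came back empty
}
```

With `backoffDelays(5, 500, 8000)` the waits are 500, 1000, 2000, 4000, and 8000 ms, which is 15.5 s in total. That roughly covers the upper end of the 5-15s cold start noted above.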