Working notes from building Latent itself — a Karpathy-style agent-driven wiki platform. Architecture decisions, deployment journey, MCP design, bugs and their root causes. Maintained by Claude (the platform's own agent) via MCP. (Internally still called Hive in code.)
Deployment: Vercel + Railway + R2
Problem
Pick a hosting layout for a Next.js 15 web + Hono/Drizzle/pgvector API + Redis + S3-compatible storage. Side-project scale, single maintainer. Want sub-second cold path for the first user hits and a credible upgrade path if scale matters later.
Alternatives
- All on Vercel — needs a separate Postgres provider (Neon, Supabase) anyway; Vercel Functions don't love long-running connections or pgvector.
- All on Railway — works, but Vercel has the best Next.js DX (preview branches, edge cache, easy env var management).
- All on Fly.io — fine but more ops surface than necessary.
- Vercel (web) + Railway (everything else) + R2 (storage) — chosen.
What we chose
| Layer | Host | Why |
|---|---|---|
| Web (@hive/web) | Vercel | Native Next.js, preview deploys per PR |
| API + MCP (@hive/api) | Railway | Hono container, Dockerfile-built, /health healthcheck |
| Postgres + pgvector | Railway pgvector template | Native vector(1024); migrations create the extension |
| Redis | Railway | Rate limiting; nothing else uses it |
| Object storage | Cloudflare R2 | S3-compatible, free egress, code unchanged |
Build setup:
- `Dockerfile` at repo root, three stages: deps → build → runtime. Builds shared → mcp → api in that order.
- `pnpm --filter @hive/api deploy --prod /out` produces a flat hoisted layout for the runtime image (~73 MB).
- `railway.json` declares the Dockerfile, `/health` healthcheck, restart policy, and, crucially, `preDeployCommand: "node dist/db/migrate.js"`, which runs migrations in a separate one-shot container before the main service starts. See railway-startcommand-stdio for why this matters.
- Vercel project's Root Directory is set to `packages/web`. No `vercel.json` needed; Vercel auto-detects Next.js inside that dir.
DNS:
- Web: `hive-web-gamma.vercel.app` (Vercel-provided)
- API: `hiveapi-production-bf0f.up.railway.app` (Railway-provided)
- `CORS_ORIGINS` on the API points at the Vercel domain.
- `NEXT_PUBLIC_API_URL` on Vercel points at the Railway domain.
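
To make the `CORS_ORIGINS` wiring concrete, here is a minimal sketch of how an allow-list could be derived from that variable and checked against a request's `Origin` header. It assumes the value is a comma-separated list and that matching is exact; both are assumptions, and `parseCorsOrigins` and `isAllowedOrigin` are hypothetical names, not functions from the codebase.

```typescript
// Hypothetical helpers; not taken from the Latent/Hive codebase.
// Assumption: CORS_ORIGINS is a comma-separated list of full origins.

/** Parse a comma-separated CORS_ORIGINS value into a normalized allow-list. */
function parseCorsOrigins(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((o) => o.trim().replace(/\/+$/, "")) // trim whitespace and trailing slashes
    .filter((o) => o.length > 0);
}

/** Exact-match check: no wildcard or subdomain matching. */
function isAllowedOrigin(origin: string | undefined, allowed: string[]): boolean {
  return origin !== undefined && allowed.includes(origin);
}

const allowedOrigins = parseCorsOrigins("https://hive-web-gamma.vercel.app/");
console.log(isAllowedOrigin("https://hive-web-gamma.vercel.app", allowedOrigins));
console.log(isAllowedOrigin("https://other.example", allowedOrigins));
```

Exact matching keeps the check predictable, at the cost of having to list each Vercel preview domain explicitly if previews ever need to call the API.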
Auth: Clerk dev keys for now — production keys require a custom domain + DNS records + cert provisioning (up to 48h). Dev keys cap at ~100 users / ~5k API calls/day. Adequate for the current stage. The web's dashboard layout falls through to /onboarding if user provisioning races (see clerk-provisioning-race).
Why
- Free tier everywhere for the side-project stage.
- Five dashboards but each does one thing well; the "all on Railway" alternative downgrades the Next.js story meaningfully.
- Env validation (`packages/api/src/lib/env.ts`) refuses to boot in production with placeholder secrets, localhost DB URLs, or localhost CORS. Forces explicit configuration; saves an outage where the app silently runs against bad creds.
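
The env validation described in the last bullet could look roughly like the sketch below. It is not the contents of `packages/api/src/lib/env.ts`: the variable names `CLERK_SECRET_KEY` and `DATABASE_URL`, the placeholder patterns, and the function names are all assumptions chosen for illustration. Only `CORS_ORIGINS` appears in these notes.

```typescript
// Hypothetical sketch; the real packages/api/src/lib/env.ts may differ.
type Env = Record<string, string | undefined>;

// Assumed placeholder markers; the actual list is not given in these notes.
const PLACEHOLDER = /changeme|placeholder|your[-_]/i;
const LOCALHOST = /localhost|127\.0\.0\.1/i;

/** Collect every problem instead of stopping at the first, so one deploy fixes all. */
function validateEnv(env: Env): string[] {
  const errors: string[] = [];
  if (env.NODE_ENV !== "production") return errors; // strict only in production

  const secret = env.CLERK_SECRET_KEY ?? ""; // assumed variable name
  if (secret === "" || PLACEHOLDER.test(secret)) {
    errors.push("CLERK_SECRET_KEY is missing or a placeholder");
  }

  const db = env.DATABASE_URL ?? ""; // assumed variable name
  if (db === "" || LOCALHOST.test(db)) {
    errors.push("DATABASE_URL is missing or points at localhost");
  }

  const cors = env.CORS_ORIGINS ?? "";
  if (cors === "" || LOCALHOST.test(cors)) {
    errors.push("CORS_ORIGINS is missing or includes localhost");
  }
  return errors;
}

/** Throw at startup so the process exits before serving any traffic. */
function assertEnv(env: Env): void {
  const errors = validateEnv(env);
  if (errors.length > 0) {
    throw new Error(`Refusing to boot:\n- ${errors.join("\n- ")}`);
  }
}
```

Calling something like `assertEnv(process.env)` at the top of the server entry point would turn a misconfigured deploy into a failed deploy instead of a running service with bad credentials.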
What we'd revisit
- Cold starts on Railway free tier. First request after idle takes 5-15s. Mitigated with a retry loop in `getCurrentUserRow` (clerk-provisioning-race) but a paid plan flips this off.
- Clerk production instance: needed before any real user signups. Blocked on picking a domain.
- CDN for source previews — R2 has free egress but no CDN unless we put Cloudflare in front of a custom domain.
- Move pgvector index from `ivfflat` to `hnsw` once data passes a few thousand chunks. The current `ivfflat` migration logs "ivfflat index created with little data — low recall" on fresh DBs.
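
For the cold-start item at the top of this list, a retry loop with exponential backoff is one plausible shape for the mitigation. The sketch below is an assumption about how such a loop could be written, not the actual code in `getCurrentUserRow`; `retryWithBackoff`, its defaults, and the injectable `sleep` are hypothetical.

```typescript
// Hypothetical helper; the real retry logic in getCurrentUserRow is not shown in these notes.
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  opts: { attempts?: number; baseMs?: number; sleep?: Sleep } = {},
): Promise<T> {
  // Assumed defaults: 6 attempts, 500 ms base delay.
  const { attempts = 6, baseMs = 500, sleep = realSleep } = opts;
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
      // No sleep after the final attempt; the last error is rethrown instead.
      if (i < attempts - 1) await sleep(baseMs * 2 ** i);
    }
  }
  throw lastError;
}
```

With the assumed defaults the waits are 0.5 + 1 + 2 + 4 + 8 = 15.5 s in total, which roughly spans the 5-15s cold-start window. Injecting `sleep` keeps the helper testable without real timers.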