The Five-Day AI Application Engineer — A BlueAlly Field Guide

01 Orientation

Read this first. The promise is honest.

This guide takes you from a clean Mac to a deployed, database-backed, evaluated AI application. It is self-paced. It is not solitary. That difference matters more than anything in the curriculum.

Nobody becomes world-class at anything in five days, and a guide that claims otherwise lies on page one. Here is what five days actually buys: a working machine, a working practice loop, and a shipped application you built yourself. World-class comes from running that loop for the ninety days after. This guide installs the loop. The last section is the real path to the title — read it as part of the course, not as an appendix.

In plain English

The words this guide is built on

AI application engineer: Someone who builds products on top of frontier models — prompts, streaming interfaces, tools, retrieval, data, and evaluation. Not training models. Directing them.
Frontier model: A top-tier general model from a frontier lab — Anthropic's Claude, for this guide. You rent it through an API; you never run it yourself.
The loop: Explore → Plan → Code → Verify → Commit. The spine of professional work with an AI agent. Five days drill it.¹
Ship: Put a working thing on the internet at a real URL. A deployed URL is the unit of progress here — one per day.
Eval: A repeatable check of model behaviour. The AI-era equivalent of a test suite. It is how "seems better" becomes "is better."

Two tracks, one spine

Every section is for everyone unless marked. Two markers route you.

Novice path · Slower, with worked examples

Extra explanation, smaller steps, every term defined. If you have never opened a terminal, this path is written for you. Skip nothing.

Fast track · Accelerated, for experienced developers

Harder challenges and advanced tooling. If the basics bore you, jump between gates and take these instead.

And two honest exits. The Operator exit at the end of Day 3: you have shipped a real AI application and can build AI features into your work — a complete, legitimate finish for consultants, PMs, and analysts. The Builder finish at the end of Day 5: database, structured outputs, tool calling, evals, and a capstone demo. That is the engineer's track.

The rails — what keeps you finishing

Self-paced courses fail quietly. Socially-anchored ones do not. Use all four rails.

Gates.

Every day ends in a gate — a short checklist of things that must be true before you move on. If a gate item is false, do not advance. Fix it, or ask.

Your wave channel.

Post your deployed URL to your cohort's chat at every gate. Stuck more than twenty minutes? Post the error. Nobody stalls silently.

The Friday demo.

Five minutes, your capstone, screen shared to your wave. It is on the calendar. Build toward it from Day 1.

Daily rhythm.

Budget five to seven focused hours a day. Take the breaks. Intensity without recovery teaches nothing that survives the week.

The typing is no longer the bottleneck. Judgment is.

So what: you are not learning to type code faster. You are learning to direct capable agents and verify the result like a professional — part architect, part editor, part quality engineer.

02 Perspective

Three voices stand behind the method.

This guide follows three practitioners whose published work anchors everything here. Where you see a claim attributed to one of them, it is theirs. The synthesis — and any errors — are ours.

The Why

After Andrej Karpathy

Software has changed. You now program a new kind of computer in plain language — Karpathy's line is that the hottest new programming language is English.² But these systems have jagged intelligence: brilliant at hard things, careless at easy ones. So you keep the AI on a leash, slide autonomy up slowly, and verify everything. And you still learn the fundamentals by building them — what you cannot create, you do not understand.

The Method

After Andrew Ng

The best way to learn is to build something real. Small lessons, immediate hands-on practice, a project that grows every day, a habit that outlasts the week. Ng has called discouraging people from learning to code in the AI era some of the worst career advice going.³ Everyone builds here — including the people who thought they couldn't.

The Craft

After the makers of Claude Code

The people who built the tool describe their own setup as surprisingly plain.⁴ The craft is not exotic configuration. It is discipline: plan before you code, give the agent a way to verify its work, keep the context clean, teach your project file after every correction, and never merge what you can't explain. Boring, repeatable, effective.

04 Day 0 · the evening before

Day 0 · do this before Monday

Rig the machine

One pass, top to bottom, on a Mac. By the end, every tool answers when called. Environment failure is the silent killer of self-paced training — we kill it first, together, with a block that prints the truth.

Time: 60–90 min · Output: A verified toolchain · Tracks: Everyone

Open the Terminal app (press ⌘ Space, type Terminal, hit return). You type a command, press return, the machine obeys. That is the whole trick. Run each block below one at a time and read what comes back. We use Homebrew (the Mac package manager) to install developer tools with one command each.

Terminal · Homebrew, then a Node version manager

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# When it finishes, Homebrew prints two or three "Next steps" lines.
# Run them — they put brew on your PATH. Then install fnm and Node:
brew install fnm
echo 'eval "$(fnm env --use-on-cd)"' >> ~/.zshrc && source ~/.zshrc
fnm install --lts && fnm default lts-latest
node -v   # prints Node 24.x — the current LTS, your green light

Novice path: PATH is the list of folders your terminal searches when you type a command. "Command not found" right after installing a tool usually means the tool is fine and your PATH doesn't include it yet. Re-run the "Next steps" lines, then close and reopen Terminal. And never install Node directly: fnm auto-switches versions per project, so juggling client repos on different Node versions is painless later.
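To make the PATH idea concrete, here is a minimal TypeScript sketch of the lookup a shell performs. The function name findOnPath, the sample folders, and the injected exists check are illustrative assumptions; a real shell also handles permissions, caching, and built-in commands.

```typescript
// Illustrative only: how a shell resolves a command name against PATH.
// `exists` is injected so the sketch runs without touching the real disk.
function findOnPath(
  command: string,
  pathVariable: string,
  exists: (fullPath: string) => boolean,
): string | null {
  // PATH is a colon-separated list of folders, searched left to right.
  for (const folder of pathVariable.split(':')) {
    if (folder === '') continue; // this sketch simply skips empty entries
    const candidate = `${folder}/${command}`;
    if (exists(candidate)) return candidate; // first match wins
  }
  return null; // this is the "command not found" case
}

// Hypothetical machine: brew is installed, but its folder is not on PATH yet.
const installed = new Set(['/opt/homebrew/bin/brew', '/usr/bin/git']);
const isInstalled = (p: string) => installed.has(p);

const before = findOnPath('brew', '/usr/bin:/bin', isInstalled);
const after = findOnPath('brew', '/opt/homebrew/bin:/usr/bin:/bin', isInstalled);
console.log(before); // null
console.log(after);  // /opt/homebrew/bin/brew
```

The tool never moved between the two calls. Only the list of folders changed, which is exactly what Homebrew's "Next steps" lines do.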

Terminal · the rest of the kit

brew install pnpm git gh          # package manager, version control, GitHub CLI
brew install --cask visual-studio-code
# Configure git once:
git config --global user.name "Your Name"
git config --global user.email "you@blueally.com"
git config --global init.defaultBranch main
gh auth login   # GitHub.com → HTTPS → authenticate Git → log in via browser
# Install Claude Code with the official installer (npm install is deprecated):
curl -fsSL https://claude.ai/install.sh | bash
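pnpm add -g vercel   # the deploy CLI
# If pnpm reports that it cannot find a global bin directory, run "pnpm setup",
# reopen Terminal, and retry.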

We standardize on pnpm: it stores one copy of every dependency on disk and links it into projects, which means faster installs, a fraction of the disk, and stricter hygiene. When a tutorial says npm, you type pnpm.

Open VS Code once, press ⌘ Shift P for the Command Palette, and run Shell Command: Install 'code' command in PATH. Then add the working extensions and turn on format-on-save (the Editor: Format On Save setting).

Terminal · VS Code extensions + Claude Code in the IDE

code --install-extension dbaeumer.vscode-eslint        # catches errors as you type
code --install-extension esbenp.prettier-vscode       # auto-formats on save
code --install-extension bradlc.vscode-tailwindcss    # Tailwind autocomplete
code --install-extension usernamehw.errorlens         # errors inline, in your face
code --install-extension eamodio.gitlens              # who changed what, when
code --install-extension anthropic.claude-code        # Claude Code in the IDE

Now make the machine prove itself. Run this. Every tool must answer with a version or a logged-in status. The commands are chained with &&, so the run stops at the first tool that fails: fix that one, then run the whole line again until it reaches the end. Post the exact error in your wave channel if you are stuck more than twenty minutes. Do not start Day 1 with a broken rig.

Terminal · the truth printer

node -v && pnpm -v && git --version && gh auth status && claude --version && vercel --version

Last, create two free accounts you will need this week: Vercel at vercel.com (sign up with your GitHub account) and Neon at neon.tech (serverless Postgres, for Day 4).

Gate 0 — clear it before Day 1

Every box true, honestly. The gates are for you, not for us.

The truth printer shows a version for every tool, no errors
gh auth status shows me logged in to GitHub
Claude Code opens, authenticated, and answers a hello
VS Code opens from the terminal with code .
I have Vercel and Neon accounts, and I've joined my wave's channel

05 Day 1 · Monday

Ship on day one

A live URL with your name on it by lunch. Not on Day 3 — today. Nothing fights the urge to quit like something real on the internet. Afternoon: meet your tools and teach your project its first rules.

Time: 5–6 hrs · Output: A deployed Next.js site · Gate: Live URL posted

What you build today

A personal landing page, scaffolded with Next.js, edited by Claude Code, version-controlled on GitHub, and deployed to the internet by Vercel — all before lunch.

Terminal · scaffold, then run it locally

cd ~ && mkdir -p dev && cd dev
pnpm create next-app@latest hello-blueally
# Say YES to: TypeScript, ESLint, Tailwind CSS, App Router, import alias (@/*).
cd hello-blueally && pnpm dev
# Open http://localhost:3000 — that's your app, running on YOUR machine.
code .   # open the folder in VS Code; ⌃` toggles the built-in terminal

If pnpm stops you: do you see "Ignored build scripts: sharp, unrs-resolver … pnpm install has failed"? This is the single most common Day 1 wall, and it is not your fault. pnpm (v10+) refuses to run a package's native build step until you approve it. That is a safety feature, not a bug, and you can get past it in ten seconds. First, be inside the project: your prompt must end in hello-blueally %, not dev %. If it ends in dev, run cd hello-blueally first. (Running pnpm approve-builds from the wrong folder is why it says "no packages awaiting approval.") Then run pnpm approve-builds, press a to select all, Enter, then y. Re-run pnpm install and keep going. Still stuck? The "When you're stuck" section has a copy-paste fix that always works.

Novice path: What just happened? create-next-app generated a complete web application. Next.js 16 is the framework, TypeScript is JavaScript with seatbelts, Tailwind styles with utility classes, and the App Router means folders in app/ become pages. app/page.tsx is your home page.

First real work for the agent

In a second terminal tab, start the agent and let it learn the project. Then give it a task that is specific about the file, the content, and the constraint.

Terminal · claude

claude
/init   # Claude explores the repo and writes CLAUDE.md, its standing memory.
        # You'll edit this file all week. Read what it wrote.

Prompt → Claude Code

Replace the contents of app/page.tsx with a simple personal landing page:
my name, the title "AI Application Engineer in training — BlueAlly",
and a short list of what I'm building this week. Style it with Tailwind,
navy and light-blue, centered, clean. Keep it a server component —
no client-side state. Don't touch any other file.

Claude proposes changes as diffs. Read the diff before you approve. Green lines added, red removed. You don't need to understand every character today — you need the habit of looking. Approve, check the browser, and there is your page. Now put it on GitHub and ship it.

Terminal · commit, publish, deploy

git add -A && git commit -m "feat: personal landing page"
gh repo create hello-blueally --private --source=. --push
vercel        # accept the defaults; ~60s later it prints a preview URL
vercel --prod  # that URL is live, on the internet, built by you, before lunch
# Post it in your wave channel. That's Gate 1's first box.

The one idea to carry forward

Open CLAUDE.md and shape it. This file loads into every Claude session in this repo — it is project memory. Keep it short and true.

CLAUDE.md · a strong starter

# Project: hello-blueally

## Stack
- Next.js (App Router) + TypeScript strict. Tailwind for styles. pnpm ONLY.

## Commands
- dev: pnpm dev · build: pnpm build · lint: pnpm lint

## Rules
- Server components by default; add 'use client' only for state/events.
- Small, focused diffs. One concern per commit. Conventional commit messages.
- After a task, run pnpm lint and pnpm build, and show me the result.

That last line is Principle 2 in action — you just gave the agent a way to verify. All week, whenever you correct Claude, end with: "update CLAUDE.md so you don't make that mistake again." The file compounds. By Friday it codes like you.

Fast track: Adopt conventional commits now (feat:, fix:, chore:), turn on branch protection for main, then make Claude write your commits. It does it well when CLAUDE.md states the convention. Practice the rhythm with three reps: an /about page, a shared header in app/layout.tsx, and one deliberate break-and-fix. Every git push to main triggers a Vercel rebuild. That is continuous deployment, the thing teams used to spend quarters wiring up.
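If you want the convention checked rather than trusted, a small validator is enough to start. This is a sketch under assumptions: it accepts only the three prefixes named above plus an optional scope, and the names TYPES, COMMIT_PATTERN, and isConventional are made up for illustration. The full Conventional Commits specification allows more types.

```typescript
// Sketch: accept only the commit prefixes named in this guide (feat, fix, chore),
// with an optional scope such as fix(header): ...
// Extend TYPES if your team uses more of the Conventional Commits vocabulary.
const TYPES = ['feat', 'fix', 'chore'] as const;
const COMMIT_PATTERN = new RegExp(`^(${TYPES.join('|')})(\\([a-z0-9-]+\\))?: \\S.*$`);

function isConventional(message: string): boolean {
  // Only the first line (the subject) carries the prefix.
  const subject = message.split('\n')[0] ?? '';
  return COMMIT_PATTERN.test(subject);
}

console.log(isConventional('feat: personal landing page'));   // true
console.log(isConventional('fix(header): correct nav link')); // true
console.log(isConventional('updated stuff'));                 // false
```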

Gate 1 — ship confirmed

All true before Day 2.

My production URL is live and posted in my wave channel
The repo is on GitHub with at least four commits of mine
I made Claude change a file, read the diff, and approved it deliberately
CLAUDE.md exists with stack, commands, and a verify-your-work rule
I branched, merged back, and pushed — and the site rebuilt itself

Fig. 1 — The five-day arc. Each day ships a working URL that does more than the last. Two honest finish lines: Operator at Day 3, Builder at Day 5. The climb is the curriculum.

06 Day 2 · Tuesday

Learn to direct the agent

Yesterday you used the tool. Today you learn to direct it. One loop — Explore, Plan, Code, Commit — plus context hygiene, a permissions posture, and prompts that earn one-shot implementations. This is the day that separates operators from passengers.

Time: 5–6 hrs · Output: A feature built through the full loop · Gate: One plan, one course-correction, zero blind approvals

The loop: Explore → Plan → Code → Commit

This is the spine of professional agentic work, straight from the people who built the tool.¹

Explore.

Put Claude in plan mode — press Shift Tab to cycle modes. In this mode it reads, searches, and answers, but cannot change files. Ask it to read the relevant code and explain what's there. You are loading the right context on purpose.

Plan.

Ask for a detailed plan: files to touch, sequence, risks, how it will verify the result. Interrogate it. Wrong assumptions die here for free; in code they cost an afternoon. A good plan earns a one-shot implementation.

Code.

Exit plan mode and tell it to implement the plan. Watch the diffs. Approve deliberately.

Commit.

Have Claude run the checks (lint, build), show you the evidence, then commit with a clear message and push.

When an implementation goes sideways, do not argue with it line by line. Press Escape to stop, switch back to plan mode, and re-plan with what you both just learned. If the session is truly tangled, /rewind restores a checkpoint, and git switch main plus a fresh branch costs you nothing. The undo button is why you can be brave.

Context hygiene, and a permissions posture

The context window is the agent's working memory, and performance degrades as it fills.⁵ Treat it as finite. Use /clear between unrelated tasks — the kitchen-sink session is the number one self-inflicted wound. Use /compact when a long session is still on-topic but heavy. One task, one session, one branch.

Claude asks before running commands that change your system. Clicking approve forty times an hour teaches you to stop reading — and an unread approval is no approval. The professional move is an allowlist.

Inside claude

/permissions
# Allow the safe, repeated commands: pnpm lint, pnpm build, pnpm dev,
# git status, git add, git commit, git push. Leave everything else on ask.

These choices live in .claude/settings.json — commit it, so the whole team inherits the same safe defaults. And the line you do not cross: never run with --dangerously-skip-permissions on your machine. The flag's own name is the policy.

The one idea to carry forward: specificity is kindness

An agent is a literal-minded senior engineer with no memory of your intentions. Vague in, vague out.

Weak → Strong

# Weak — vague target, no constraints, no verification:
make the site look better

# Strong — scoped, constrained, verifiable:
In app/page.tsx, restyle the hero: navy (#001278) background, white heading,
light-blue (#CDE5F1) subtext, one green (#36BF78) button linking to /about.
Tailwind only, no new dependencies, keep it a server component.
Then run pnpm lint and pnpm build and show me the output.

Point at files by path. Show an example when one exists. State what must not change. Demand evidence: end with the check it must run. On hard problems, ask it to think: that word requests extended reasoning.

Today's build, through the full loop: add a Projects section with a typed data file, a /projects page rendering cards, status badges in code, and the header updated. Somewhere in the middle, reject a plan on purpose and make Claude revise it. Feel what it is to be the editor, not the typist.
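For orientation before you plan the Projects build, here is roughly what "a typed data file" and "status badges in code" can look like. Treat it as a sketch: the field names, the three statuses, and the Tailwind classes are assumptions chosen for illustration, and your plan with Claude may reasonably land somewhere else.

```typescript
// Sketch of a typed data file plus "status badges in code".
// Field names, statuses, and Tailwind classes are illustrative assumptions.
type ProjectStatus = 'on-track' | 'at-risk' | 'done';

interface Project {
  slug: string;
  name: string;
  summary: string;
  status: ProjectStatus;
}

const projects: Project[] = [
  { slug: 'landing', name: 'Landing page', summary: 'Day 1 ship', status: 'done' },
  { slug: 'chat', name: 'Streaming chat', summary: 'Day 3 build', status: 'on-track' },
];

// Record<ProjectStatus, ...> makes the compiler reject a missing status.
const BADGE: Record<ProjectStatus, { label: string; className: string }> = {
  'on-track': { label: 'On track', className: 'bg-green-100 text-green-800' },
  'at-risk': { label: 'At risk', className: 'bg-amber-100 text-amber-800' },
  done: { label: 'Done', className: 'bg-slate-100 text-slate-800' },
};

function badgeFor(project: Project): { label: string; className: string } {
  return BADGE[project.status];
}

console.log(projects.map((p) => `${p.name}: ${badgeFor(p).label}`));
```

The design choice worth noticing is the Record type: add a fourth status to ProjectStatus and the build fails until BADGE covers it, which is the kind of evidence pnpm build gives the agent for free.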

Fast track: Build a custom slash command. Create .claude/commands/ship.md with your end-of-task ritual (lint → build → conventional commit → push → report) and invoke it as /ship. Anything you do more than once a day becomes a command. Then skim hooks in the docs: they give deterministic guarantees for what CLAUDE.md can only suggest.

Gate 2 — discipline installed

All true before Day 3.

I ran a full Explore → Plan → Code → Commit loop, and edited a plan before approving it
I stopped a wrong implementation mid-flight and re-planned instead of pushing through
My permissions allowlist is in .claude/settings.json and committed — and I know the flag I never use
I used /clear between tasks and can say why context hygiene matters
CLAUDE.md grew by at least two earned rules; /projects is live in production

07 Day 3 · Wednesday

Make the app think

Today your software starts thinking. You wire a frontier model into your app with the AI SDK: a streaming chat interface with a real system prompt, proper secret hygiene, and a deploy. This is the day the title starts meaning something.

Time: 5–7 hrs · Output: A deployed, streaming AI assistant · Gate: Operator exit available tonight

The mental model first

Every AI feature you will ever build is the same sandwich: your UI → your server route → the model API → streamed back. The model never talks to the browser directly, because the API key lives only on the server. The Vercel AI SDK (the npm package ai, currently version 6) is the house-standard library for this sandwich — typed, streaming-first, provider-agnostic.⁶ No framework bloat; you will actually understand its parts.

Fig. 2 — The application you assemble. A client page, a server route that holds the secret, and the model behind it. Streaming back is mandatory UX — and you get it free from this shape.

Build it: the twelve-line server route

Install the packages, put your key in .env.local (get it from the Anthropic Console), then write the route. Stop and verify the net is under you: open .gitignore and confirm .env* is listed. A leaked key is the one beginner mistake with a real invoice attached.

Terminal

git switch -c feature/ai-chat
pnpm add ai @ai-sdk/anthropic @ai-sdk/react zod
touch .env.local   # add: ANTHROPIC_API_KEY=sk-ant-...your-key...

app/api/chat/route.ts

import { anthropic } from '@ai-sdk/anthropic';
import { streamText, convertToModelMessages, type UIMessage } from 'ai';

export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: anthropic('claude-sonnet-4-6'),
    system:
      'You are a concise, helpful assistant for BlueAlly consultants. ' +
      'Answer in short paragraphs. If you are unsure, say so plainly.',
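    // Awaited because newer SDK versions return a promise here;
    // harmless on versions that return the array directly.
    messages: await convertToModelMessages(messages),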
  });

  return result.toUIMessageStreamResponse();
}

Twelve working lines. streamText calls the model and streams tokens as they generate; convertToModelMessages turns the UI's messages into the format the model expects, and toUIMessageStreamResponse streams the reply back in the shape useChat reads. The system prompt is your application's personality and policy: product surface, not boilerplate. You will rewrite it many times; that is the job. Build the matching app/chat/page.tsx with the useChat hook from @ai-sdk/react (a client component, since it is interactive), run pnpm dev, open /chat, and watch the answer stream.

Novice path: Why a route? Files under app/api/.../route.ts run on the server only. The browser POSTs your messages to it; it holds the secret key, calls Anthropic, and streams the answer back. Server-side secrets, client-side experience.

A version-honesty note, because the stack moves: the code above is AI SDK 6, which keeps streamText and useChat and adds a unified ToolLoopAgent class and an Output.object() helper for structured generation.⁶ If types disagree on your machine, that is your first real dependency drift — ask Claude Code to reconcile against the installed version and check the AI SDK docs. Solving that is the curriculum.

Make it yours, then ship it

Three loop passes: rewrite the system prompt for an assistant you would actually use and test ten questions against it; add distinct user and assistant styling in brand colours with auto-scroll; wrap the route in a try/catch with a clean error and a length cap. Deploy: push the branch, merge to main, set the secret in production with vercel env add ANTHROPIC_API_KEY, and redeploy. Post the live chat URL.

Operator exit — a real finish line. If your role is selling, advising, or managing rather than building systems, you may stop here with honour. You have personally built and deployed a streaming AI application; you can read a diff, direct an agent, and reason about prompts, secrets, and streaming. Post your URL, take the Friday demo slot anyway, and read the Pitfalls before you go. Builders — Days 4 and 5 are where the engineering lives.

Gate 3 — the app thinks

All true before Day 4 (or before an honest Operator exit).

My streaming chat is live in production with a purpose-built system prompt
.env.local is gitignored — verified with git status, not assumed
The production key is set in Vercel env vars, not hardcoded anywhere
I can sketch the sandwich from memory: UI → route → model → stream
I tuned the system prompt at least three times against real test questions

08 Day 4 · Thursday

Data, structure, tools

A chat that forgets everything is a toy. Today you add memory with a real Postgres database, force the model to return typed structured data, and give it tools it can call. You also learn the model ladder and the boundary that keeps AI apps honest.

Time: 6–7 hrs · Output: Persistent chat + structured output + one tool · Gate: Data survives a refresh; types survive a lie

Memory: Neon and Drizzle

Neon is serverless Postgres — real Postgres that scales to zero and branches like git, which is why it is the house database. In the Neon console, create a project in the same region as your Vercel deployment, copy the connection string, and add it to .env.local as DATABASE_URL=postgres://… (and to Vercel for production). Drizzle is an ORM where the schema is TypeScript — you describe a table once and get types, queries, and migrations from it.

db/schema.ts · what a saved message is

import { pgTable, serial, text, varchar, timestamp } from 'drizzle-orm/pg-core';

export const messages = pgTable('messages', {
  id: serial('id').primaryKey(),
  conversationId: varchar('conversation_id', { length: 64 }).notNull(),
  role: varchar('role', { length: 16 }).notNull(),   // 'user' | 'assistant'
  content: text('content').notNull(),
  createdAt: timestamp('created_at').defaultNow().notNull(),
});

Terminal · the migration two-step, then browse the data

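pnpm add drizzle-orm @neondatabase/serverless && pnpm add -D drizzle-kit
# drizzle-kit reads drizzle.config.ts in the project root (dialect, schema path,
# output folder, DATABASE_URL). Create it before running the commands below.
pnpm drizzle-kit generate   # writes a .sql migration; read it, it's honest
pnpm drizzle-kit migrate    # applies it to your Neon database
pnpm drizzle-kit studio     # a visual browser for your data; keep it open today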

Now persist the conversation through the full loop. The plan to approve: in the chat route's streamText call, add an onFinish callback that inserts the user message and the assistant reply with a conversation id; add a small server function to load history; have the chat page load it on mount. Send a message, refresh the page, watch it persist in Drizzle Studio. Data surviving a refresh is the moment this became an application.
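Before you hand that plan to Claude, it helps to see its shape with the database taken out. The sketch below uses an in-memory array as a stand-in for the Neon messages table; saveTurn and loadHistory are hypothetical names, and the real versions would be Drizzle insert and select calls inside onFinish and a server function.

```typescript
// Sketch: the persistence plan with an in-memory array standing in for the
// Neon `messages` table. saveTurn and loadHistory are hypothetical names.
type Role = 'user' | 'assistant';

interface StoredMessage {
  id: number;
  conversationId: string;
  role: Role;
  content: string;
}

const table: StoredMessage[] = []; // stand-in for the database table
let nextId = 1;                    // stand-in for the serial primary key

// What an onFinish callback would do: store the user message, then the reply.
function saveTurn(conversationId: string, userText: string, assistantText: string): void {
  table.push({ id: nextId++, conversationId, role: 'user', content: userText });
  table.push({ id: nextId++, conversationId, role: 'assistant', content: assistantText });
}

// What the chat page would call on mount: one conversation, oldest first.
function loadHistory(conversationId: string): StoredMessage[] {
  return table
    .filter((m) => m.conversationId === conversationId)
    .sort((a, b) => a.id - b.id);
}

saveTurn('c1', 'What is Neon?', 'Serverless Postgres.');
saveTurn('c2', 'Hello', 'Hi there.');
saveTurn('c1', 'And Drizzle?', 'An ORM with a TypeScript schema.');

console.log(loadHistory('c1').map((m) => `${m.role}: ${m.content}`));
```

The conversation id is what keeps two chats from bleeding into each other. When you interrogate the plan, ask where that id comes from and how the page remembers it across a refresh.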

Fast track: drizzle-kit push skips migration files and syncs the schema directly. That is fine for prototyping and wrong for production (no history, no review). Default to generate/migrate. Bonus: create a Neon branch and point a preview deployment at it. Database-per-PR is a superpower clients pay for.

Structured output: the model fills your types

Chat is one mode. The workhorse mode of enterprise AI is: unstructured text in, typed object out. The SDK's generateObject plus a Zod schema (Zod describes a data shape in TypeScript and validates against it) makes the model's output conform to your shape — the pattern behind extraction, triage, scoring, and routing.

app/api/extract/route.ts

import { anthropic } from '@ai-sdk/anthropic';
import { generateObject } from 'ai';
import { z } from 'zod';

const ActionItems = z.object({
  items: z.array(z.object({
    task: z.string().describe('The action item, imperative voice'),
    owner: z.string().describe('Who is responsible; "unassigned" if unclear'),
    due: z.string().nullable().describe('Due date if stated, ISO format'),
    priority: z.enum(['high', 'medium', 'low']),
  })),
});

export async function POST(req: Request) {
  const { notes } = await req.json();
  const { object } = await generateObject({
    model: anthropic('claude-sonnet-4-6'),
    schema: ActionItems,
    prompt: 'Extract every action item from these meeting notes:\n\n' + notes,
  });
  return Response.json(object);   // typed. validated. boring. perfect.
}

Build a small /extract page today: a textarea for notes, a button, a rendered table of the returned items. Paste real notes from a real meeting and feel the consulting use cases line up.

Tool calling: the model gets hands

A tool is a typed function you offer the model; it decides when to call it, your code executes, the result flows back into the answer. This is the primitive under every agent you will build.

In the chat route — give it one tool

import { streamText, convertToModelMessages, tool, stepCountIs } from 'ai';
import { z } from 'zod';

// inside streamText({ ... })
tools: {
  getProjectStatus: tool({
    description: 'Look up the live status of a BlueAlly project by name',
    inputSchema: z.object({ projectName: z.string() }),
    execute: async ({ projectName }) => {
      // deterministic code — a DB query in real life:
      return { projectName, status: 'on track', nextMilestone: '2027-03-02' };
    },
  }),
},
stopWhen: stepCountIs(5),   // let it call tools, then answer — bounded

Ask the chat "what's the status of Project Apollo?" and watch it decide, call, and weave the result into prose. Notice what happened architecturally: the model classified and routed; your code did the work.

The boundary, stated once

The model routes. Code does the math.

Never ask a probabilistic system to do deterministic work. The model should not add invoice totals — it should decide that totals need adding and call your function that adds them. Jagged intelligence means it can ace the strategy memo and fumble the arithmetic in the same response. Route with the model. Compute with code. Nearly every production incident you will debug in AI systems traces back to someone blurring this line.
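The invoice example can be sketched in a few lines. Here the "model" contributes nothing but a decision, represented as a plain object naming a tool and its arguments, and ordinary code does the arithmetic. The ToolCall shape and the sumInvoiceCents function are illustrative assumptions, not part of any SDK.

```typescript
// Sketch of the boundary: the "model" only names a tool and its arguments;
// plain code does the arithmetic. ToolCall and sumInvoiceCents are
// illustrative assumptions, not an SDK API.
interface ToolCall {
  tool: 'sumInvoiceCents';
  args: { amountsInCents: number[] };
}

// Deterministic side: integer cents avoid floating-point drift.
function sumInvoiceCents(amountsInCents: number[]): number {
  for (const amount of amountsInCents) {
    if (!Number.isInteger(amount)) throw new Error(`not whole cents: ${amount}`);
  }
  return amountsInCents.reduce((total, amount) => total + amount, 0);
}

// Routing side: in a real app this decision arrives from the model.
function execute(call: ToolCall): number {
  switch (call.tool) {
    case 'sumInvoiceCents':
      return sumInvoiceCents(call.args.amountsInCents);
  }
}

const call: ToolCall = { tool: 'sumInvoiceCents', args: { amountsInCents: [1999, 501, 250000] } };
const totalCents = execute(call);
console.log((totalCents / 100).toFixed(2)); // 2525.00
console.log(0.1 + 0.2 === 0.3);             // false, which is why the sum stays in cents
```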

The one idea to carry forward: the model ladder

Model choice is an engineering dial — cost and latency against capability. The Claude family is a clean ladder; the names below will rotate, the ladder will not. Verify current models and pricing at the Anthropic docs before client work.⁷

Rung	Reach for it when	Typical jobs
Haiku 4.5 — fast, cheap	High volume, low ceremony, sub-second feel	Classification, routing, extraction at scale, sub-agent grunt work
Sonnet 4.6 — the daily driver	Default for product features and coding	Chat assistants, structured outputs, tool use, most of everything
Opus 4.8 — the heavy	Hardest reasoning, big refactors, plans worth one-shotting	Architecture, complex agents, plan mode on gnarly problems, final review

Above the ladder sits Claude Fable 5, Anthropic's most capable model, for the rare task that earns it.⁸ The production pattern worth naming: a smart model orchestrates while cheap models execute subtasks in parallel. You will feel this in Claude Code itself, which routes simple work to smaller models. Judgment here is margin — yours and the client's.
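One way to turn that judgment into a repeatable decision is to let eval results pick the rung. The sketch below assumes you already have a pass rate per model from your own suite; the relative costs and pass rates are made-up placeholders, not Anthropic pricing or measured results, and cheapestThatClears is a hypothetical helper.

```typescript
// Sketch: pick the cheapest rung whose eval pass rate clears the bar.
// The relative costs and pass rates are made-up placeholders, not Anthropic
// pricing or measured results.
interface Rung {
  name: string;
  relativeCost: number; // placeholder: 1 = cheapest
  evalPassRate: number; // fraction of eval cases passed, 0..1
}

function cheapestThatClears(rungs: Rung[], requiredPassRate: number): Rung | null {
  const passing = rungs.filter((r) => r.evalPassRate >= requiredPassRate);
  if (passing.length === 0) return null; // nothing clears the bar: fix the prompt or the suite
  return passing.reduce((best, r) => (r.relativeCost < best.relativeCost ? r : best));
}

const ladder: Rung[] = [
  { name: 'haiku', relativeCost: 1, evalPassRate: 0.8 },
  { name: 'sonnet', relativeCost: 3, evalPassRate: 0.95 },
  { name: 'opus', relativeCost: 5, evalPassRate: 0.97 },
];

console.log(cheapestThatClears(ladder, 0.9)?.name);  // sonnet
console.log(cheapestThatClears(ladder, 0.75)?.name); // haiku
console.log(cheapestThatClears(ladder, 0.99));       // null
```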

Gate 4 — it remembers, it types, it acts

All true before Day 5.

Messages persist in Neon — I watched them land in Drizzle Studio and survive a refresh
Migrations exist as files in /drizzle and I can explain generate vs migrate vs push
/extract returns a Zod-validated object from messy real-world text
My chat calls at least one tool, and I can say which side of the boundary each part lives on
I can argue Haiku vs Sonnet vs Opus for three different jobs, with the tradeoff named

09 Day 5 · Friday

Prove it, then demo it

What separates a demo from a product is not features — it is evaluation, security, and cost discipline. Install all three before lunch. Then build your capstone and demo it to your wave. Five minutes, screen shared, live URL.

Time: 6–7 hrs · Output: An evaluated capstone, demoed · Gate: The Friday demo

Evals: the tests of the AI era

You changed the system prompt and the app "seems better." Seems is not engineering. An eval is a repeatable check of model behaviour — the AI-era equivalent of a test suite, and on real teams it is where much of serious AI development time goes. Start embarrassingly simple: a set of inputs, a pass/fail check per input, run on every prompt change.

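evals/extract.eval.ts · run with: pnpm tsx --env-file=.env.local evals/extract.eval.ts

// One-time setup: pnpm add -D tsx. The --env-file flag (recent Node versions)
// loads ANTHROPIC_API_KEY, which tsx does not read from .env.local on its own.
import { anthropic } from '@ai-sdk/anthropic';
import { generateObject } from 'ai';
// Assumed local modules; adjust the paths to wherever you keep them.
import { ActionItems } from './schema';   // the Zod schema from /extract
import { CASES } from './cases';          // array of { notes: string; mustInclude: string[] }

// Wrapped in main() because top-level await fails when the file compiles as CommonJS.
async function main() {
  let pass = 0;
  for (const c of CASES) {
    const { object } = await generateObject({
      model: anthropic('claude-sonnet-4-6'),
      schema: ActionItems,
      prompt: 'Extract every action item:\n\n' + c.notes,
    });
    // Compare case-insensitively on both sides of the match.
    const ok = c.mustInclude.every((m) =>
      object.items.some((i) => i.task.toLowerCase().includes(m.toLowerCase())));
    console.log(ok ? 'PASS' : 'FAIL', '—', c.notes.slice(0, 40));
    if (ok) pass++;
  }
  console.log(pass + '/' + CASES.length + ' passing');
}

main().catch((err) => { console.error(err); process.exit(1); });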

Write eight to ten cases for your extractor, including two nasty ones — vague notes, no real action items, a trick. Then change the prompt and watch the score move. That is prompt engineering. Two field rules: a one-line written critique of each failure is worth more than a number; and a suite passing 100% is not a good suite — it is an easy one. Add harder cases until something fails, then earn it back.
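The nasty cases are easier to get right if the pass/fail check lives in a pure function you can test without a model call. The sketch below is one possible shape, with hypothetical names (EvalCase, scoreCase). It also adopts a convention of its own: an empty mustInclude list means "expect no action items", since an every-check over an empty list would otherwise pass for any output.

```typescript
// Sketch: the pass/fail check pulled out into a pure function so it can be
// tested without a model call. EvalCase and scoreCase are hypothetical names.
interface ActionItem { task: string }
interface EvalCase {
  notes: string;
  mustInclude: string[]; // convention in this sketch: empty means "expect no action items"
}

function scoreCase(c: EvalCase, items: ActionItem[]): boolean {
  // Nasty case: notes with no real action items should yield an empty list.
  if (c.mustInclude.length === 0) return items.length === 0;
  // Otherwise every expected phrase must appear in some task, ignoring case.
  return c.mustInclude.every((phrase) =>
    items.some((i) => i.task.toLowerCase().includes(phrase.toLowerCase())));
}

const normal: EvalCase = { notes: 'Dana to send the SOW by Friday', mustInclude: ['send the sow'] };
const trick: EvalCase = { notes: 'Great meeting, nothing decided.', mustInclude: [] };

console.log(scoreCase(normal, [{ task: 'Send the SOW to the client' }])); // true
console.log(scoreCase(normal, [{ task: 'Book a room' }]));                // false
console.log(scoreCase(trick, [{ task: 'Follow up on the meeting' }]));    // false
console.log(scoreCase(trick, []));                                        // true
```

The third line is the one that matters: an extractor that invents a task from notes with none in them now fails loudly instead of slipping through.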

Fig. 3 — The evaluation loop. A scored suite turns prompt tuning from vibes into engineering. A suite that never fails taught you nothing — add hard cases until it does, then earn the green back.

Fast track: Add an LLM-as-judge case: a second model call that grades a chat answer against a rubric ("concise? grounded? refused appropriately?"), returning pass/fail and one sentence via generateObject. Align the judge by showing it two example gradings first. Then run the suite without a person in the loop: claude -p "…" executes Claude Code headlessly, and the Claude Agent SDK (@anthropic-ai/claude-agent-sdk) lets you embed the same agent in a script or CI job.⁹ Wire it into a pre-push hook or a GitHub Action and you have CI for behaviour.

The security thirty minutes

Keys live server-side, period

Anything prefixed NEXT_PUBLIC_ ships to every browser. Your Anthropic and database keys must never carry that prefix or appear in a client component.

Prompt injection is real

Any text your app feeds a model — user input, retrieved documents, web content — may contain instructions ("ignore your rules and…"). Treat model output as untrusted: validate with Zod, never pipe it raw into queries or shell commands, and keep tools least-privilege. Your getProjectStatus tool can read one thing; it cannot delete anything. Design every tool that way.
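A sketch of what least-privilege can look like for a tool such as getProjectStatus. The signature, the projectId format, and the in-memory PROJECTS map are assumptions for illustration; the real tool would run a single read-only query instead.

```typescript
type ProjectStatus = { id: string; status: 'on track' | 'at risk' | 'blocked' };
type ToolError = { error: string };

// Hypothetical stand-in for a read-only database lookup.
const PROJECTS: ReadonlyMap<string, ProjectStatus> = new Map<string, ProjectStatus>([
  ['apollo', { id: 'apollo', status: 'on track' }],
  ['gemini', { id: 'gemini', status: 'at risk' }],
]);

// Arguments arrive from the model, so they are untrusted: check shape and format,
// then do exactly one read. There is no write or delete path to abuse.
function getProjectStatus(args: unknown): ProjectStatus | ToolError {
  if (typeof args !== 'object' || args === null) {
    return { error: 'invalid arguments' };
  }
  const id = (args as Record<string, unknown>).projectId;
  if (typeof id !== 'string' || !/^[a-z0-9-]{1,40}$/.test(id)) {
    return { error: 'invalid projectId' };
  }
  return PROJECTS.get(id) ?? { error: 'not found' };
}
```

An injected instruction can still ask this tool for anything. The most it can get back is one project's status or a short error.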

Validate at the door

Zod-parse request bodies in every route; cap input lengths; return clean errors. The model is not your input sanitizer.
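A dependency-free sketch of the checks a route might run before any model call. In the app, a Zod schema with safeParse would replace the hand-written parseChatBody. The function name, the message field, and the 4,000-character cap are all assumptions for illustration.

```typescript
type ParseResult =
  | { ok: true; message: string }
  | { ok: false; status: 400 | 413; error: string };

const MAX_MESSAGE_CHARS = 4000; // hypothetical cap; size it to your use case

// Validates an already-parsed JSON body and returns either clean data or a clean error.
function parseChatBody(body: unknown): ParseResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, status: 400, error: 'Body must be a JSON object.' };
  }
  const message = (body as Record<string, unknown>).message;
  if (typeof message !== 'string' || message.trim() === '') {
    return { ok: false, status: 400, error: 'Field "message" must be a non-empty string.' };
  }
  if (message.length > MAX_MESSAGE_CHARS) {
    return { ok: false, status: 413, error: `Message exceeds ${MAX_MESSAGE_CHARS} characters.` };
  }
  return { ok: true, message };
}
```

The route returns the status and error as-is when ok is false, and only a validated message ever reaches the prompt.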

People are the leak

The policy is one sentence: client-confidential data goes only into BlueAlly-approved accounts and workspaces — never personal ones.

Cost: tokens are the meter

You pay per token, in and out, and output tokens cost several times input.⁷ Three habits cover ninety percent of cost discipline. Pick the lowest rung on the ladder that clears your eval — an eval that lets you confidently downgrade a model pays for itself forever. Keep system prompts tight and cap maxOutputTokens on bounded jobs. Estimate before you scale: tokens per call × calls per day × rate, in a spreadsheet, before the feature ships. Back-of-envelope cost math in a scoping call is a BlueAlly differentiator.
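The spreadsheet formula, written out as a small function. The token counts, call volume, and both rates below are placeholders, not real prices; substitute the current published rates for the model you are actually using.

```typescript
type CostInputs = {
  inputTokensPerCall: number;
  outputTokensPerCall: number;
  callsPerDay: number;
  inputUsdPerMTok: number;  // placeholder rate: USD per million input tokens
  outputUsdPerMTok: number; // placeholder rate: USD per million output tokens
};

// tokens per call x calls per day x rate, computed separately for input and output.
function dailyCostUsd(c: CostInputs): number {
  const inputCost = ((c.inputTokensPerCall * c.callsPerDay) / 1e6) * c.inputUsdPerMTok;
  const outputCost = ((c.outputTokensPerCall * c.callsPerDay) / 1e6) * c.outputUsdPerMTok;
  return inputCost + outputCost;
}

const estimate = dailyCostUsd({
  inputTokensPerCall: 2000,
  outputTokensPerCall: 500,
  callsPerDay: 1000,
  inputUsdPerMTok: 3,
  outputUsdPerMTok: 15,
});
console.log(estimate.toFixed(2)); // 13.50 under these placeholder numbers
```

Under these placeholder numbers, output is a fifth of the tokens and more than half of the bill (7.50 of 13.50). That is why a cap on output length tends to be the first lever worth pulling.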

The capstone

Pick one. Each is a real BlueAlly demo archetype — build it like a client is watching, because one eventually will be.

Capstone	Archetype	Core build
Meeting-notes action engine	extract & draft	Paste notes → generateObject extraction → editable table in Neon → a "draft follow-up email" button via streamText
Document triage desk	validate & flag	Paste text → classify type + risk via generateObject → route by rules in code → flagged queue in the database
Knowledge assistant	semantic search	Seed 10–20 FAQ rows → retrieve relevant rows per question (keyword now; pgvector if you're flying) → grounded streaming answers that cite their source
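
For the knowledge assistant, the "keyword now" retrieval step could start as small as the sketch below. The FaqRow shape, the stop-word list, and the scoring rule (how many distinct question words appear in a row) are assumptions. In the app, the rows would come from the database, not an array.

```typescript
type FaqRow = { id: number; question: string; answer: string };

// Hypothetical stop-word list; extend it as your evals reveal noisy matches.
const STOP_WORDS = new Set(['the', 'a', 'an', 'is', 'to', 'of', 'do', 'i', 'how', 'what']);

// Lowercases, splits on non-alphanumerics, drops stop words, and de-duplicates.
function keywords(text: string): string[] {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return Array.from(new Set(words.filter((w) => !STOP_WORDS.has(w))));
}

// Scores each row by how many question keywords it contains and returns the top k.
// Rows with no overlap are dropped so the model is never handed irrelevant context.
function retrieve(question: string, rows: FaqRow[], k = 3): FaqRow[] {
  const terms = keywords(question);
  return rows
    .map((row) => {
      const haystack = new Set(keywords(row.question + ' ' + row.answer));
      const score = terms.filter((t) => haystack.has(t)).length;
      return { row, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.row);
}
```

The returned rows carry their id, which is what lets the streamed answer cite its source. An empty result is a signal to answer "I don't know" instead of guessing.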

Definition of done. This is the demo bar, and the bar is the point:

A deployed Vercel URL
A GitHub repo with real commit history, a CLAUDE.md that taught the agent your standards, and a README with a five-line architecture sketch
At least one structured output or tool call with the boundary drawn correctly
An eval file with eight or more cases and your current score, plus one sentence on the hardest failure
A five-minute live demo to your wave

Build it through the loop. Plan mode first: Friday afternoon is exactly when the plan saves you.

Gate 5 — the Friday demo

The last gate is public, and that is the design.

My capstone meets every line of the definition of done
My eval suite runs, and I can name the failure that taught me the most
The security thirty minutes is done: keys audited, inputs validated, tools least-privilege
I demoed live to my wave and posted the URL + repo in the channel
I have read The Next Ninety Days and put the first weekly rep on my calendar

10 When you're stuck

The errors everyone hits — and the fix for each.

A wall on Day 1 is not failure — it is the curriculum. Every error below has stopped someone before you, and every one has a fix that takes minutes. Two habits clear most of them: read the actual error text (the Error Lens extension prints it right in your editor), and paste that exact text to Claude Code — "I ran this, got this error, here it is" — it is good at fixing its own stack.

Fig. 4 — The unstuck loop. Read it, match it, fix it, re-run. The twenty-minute rule is the safety valve: stuck longer than that, you ask — with the exact command and the exact error, never just "it broke."

ERR_PNPM_IGNORED_BUILDS — "Ignored build scripts: sharp, unrs-resolver"

The Day 1 classic, and not your fault. pnpm (v10+) won't run a package's native build step until you approve it; sharp (images) and unrs-resolver (module resolution) each have one. Fix: Be inside the project (cd hello-blueally), then run pnpm approve-builds, press a to select all, Enter, then y. Always-works alternative: add "pnpm": { "onlyBuiltDependencies": ["sharp", "unrs-resolver"] } to package.json, then pnpm install.

"There are no packages awaiting approval"

You ran pnpm approve-builds from the wrong folder: your prompt ended in dev %, not hello-blueally %. Fix: Run cd hello-blueally first. A command only sees the project you are standing in; check the folder name at the end of your prompt before every command.

command not found right after installing a tool

The tool installed fine; your shell just hasn't picked it up yet (its folder isn't on your PATH this session). Fix: Quit Terminal and reopen it, or re-run the "Next steps" lines Homebrew printed. Then try the command again.

"Port 3000 is already in use" (EADDRINUSE)

A dev server is still running in another tab. Fix: Press Ctrl C in that tab to stop it, or start this one on a free port: pnpm dev --port 3001, then open localhost:3001.

401 / "could not resolve authentication" from the model

Your key isn't reaching the code. Fix: Confirm .env.local holds ANTHROPIC_API_KEY=sk-ant-… (no quotes, no trailing space), then restart pnpm dev. Env files load only when the server boots, so a key added mid-session won't apply until you restart. Never commit that file.

Day 4: drizzle-kit can't find its config or your database

drizzle-kit needs a drizzle.config.ts at the project root and a DATABASE_URL. Fix: Ask Claude: "create drizzle.config.ts pointing at db/schema.ts, dialect postgresql, url from process.env.DATABASE_URL", and put DATABASE_URL=postgres://… in .env.local.
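
For reference, the file that prompt asks for might come out roughly like the sketch below. It assumes the dotenv package is installed, and that drizzle-kit does not pick up .env.local on its own in your setup, which is why the file is loaded explicitly. The out folder name is also an assumption.

```typescript
// drizzle.config.ts (project root)
import { config } from 'dotenv';
import { defineConfig } from 'drizzle-kit';

// Load DATABASE_URL from .env.local before the config object is built.
config({ path: '.env.local' });

export default defineConfig({
  schema: './db/schema.ts', // where your table definitions live
  out: './drizzle',         // assumed folder for generated migrations
  dialect: 'postgresql',
  dbCredentials: {
    // Non-null assertion: drizzle-kit will fail loudly if the variable is missing.
    url: process.env.DATABASE_URL!,
  },
});
```

If drizzle-kit still cannot connect, check that the path in schema matches your real file and that DATABASE_URL is present in .env.local.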

Day 5: the eval can't find your API key

.env.local is a Next.js convenience; a plain pnpm tsx script doesn't load it. Fix: Run pnpm add -D tsx dotenv, put the key in a plain .env, and add import 'dotenv/config'; as the first line of the eval file.

Vercel build fails, but it works on your machine

Almost always a missing production secret: local has .env.local, production does not. Fix: Run vercel env add ANTHROPIC_API_KEY (and DATABASE_URL), then redeploy. Read the build log; it names the missing piece on the failing line.

The twenty-minute rule Tried the fix and still stuck after twenty minutes? Post the exact command and the exact error in your wave channel — not "it broke." Stalling in silence is the only real way to fail this week. Asking a sharp, specific question is a senior-engineer skill; start practicing it now.

11 Hard-won warnings

Pitfalls. Each one has a fix.

Every one of these has a body count on real teams. The fix is attached to each. Tape the ones that sting to your monitor.

The kitchen-sink session

Debugging, styling, and database work in one endless thread until the agent gets visibly dumber. Fix: One task per session. /clear is free; confusion isn't.

The blind approve

Clicking accept on diffs you didn't read. Three days later nobody knows what's in the codebase, and "nobody" includes you in the client readout. Fix: Read every diff. Too big to read? It's too big; re-plan into smaller steps.

The vibe-to-prod pipeline

Fully giving in and never reading the code is a joy for throwaway weekend projects. Client work is not a throwaway weekend project. Fix: Vibe on prototypes. Engineer (plan, verify, eval) on anything with a stakeholder.

The committed secret

An API key pushed to GitHub is compromised the moment it lands, public repo or not. Fix: .env* gitignored (verify with git status), platform env vars in prod, and rotate immediately on any slip.

The permissions bypass

--dangerously-skip-permissions because the prompts felt slow. The flag's name is the risk assessment. Fix: Allowlist your safe commands in .claude/settings.json and commit it for the team.

The bloated CLAUDE.md

A four-hundred-line memory file the agent starts ignoring wholesale, including the three rules that mattered. Fix: For each line ask: would removing this cause mistakes? No? Cut it.

Arguing instead of re-planning

Ten escalating corrections in a polluted context, each fix breaking something else. Fix: Escape, re-plan with what you learned, or /rewind. Fresh context plus a better plan wins in one shot.

The server/client confusion

useState in a server component, or 'use client' on everything until secrets leak to the browser and the bundle bloats. Fix: Server by default; 'use client' only where there's state or events; data fetching stays server-side.

The 100% eval suite

All green, every run, forever: a suite designed to flatter, not to find. Fix: Add nasty cases until something fails. The failures are the product; a suite that never failed taught you nothing.

Life on main

Every commit straight to production's branch; one bad merge and the demo site is down an hour before the demo. Fix: Branch per task, PR to merge, protect main. Day 1 habit, career-long dividend.

The silent stall

Stuck Tuesday, embarrassed Wednesday, gone Thursday. The completion killer in every self-paced program ever run. Fix: The twenty-minute rule plus the gates. Stalling silently is the only real failure this week.

12 After Friday

The next ninety days — where world-class happens.

Five days installed the loop. Ninety days of reps make it yours. This is the contract with yourself; the Friday demo was you signing it.

The weekly rep, weeks 1–12.

Ship one improvement a week through the full loop — evaled, deployed, posted. Read one diff deeply until you can explain it cold. Grow your evals: every bug a user finds becomes a case. Feed the commons — one CLAUDE.md rule, skill, or command per week into the shared repo. Two hundred people compounding each other's corrections is the asset this program was quietly building.

The milestones that mark the path.

Day 30: a second application shipped end-to-end, solo — new repo, new schema, new evals. Day 60: you've run parallel sessions on a real task, built a subagent or skill the team adopted, and reviewed someone's AI-built PR like you owned it. Day 90: you've demoed to a client — and your eval suite, not your enthusiasm, is what made the room trust the system.

Keep a standing kit. Official docs first — Claude Code and the Anthropic engineering blog, the AI SDK, Next.js, Drizzle, Neon. The stack moves; primary sources are how you move with it. Once a quarter, build a fundamental from scratch by hand — a tiny retrieval pipeline, a minimal agent loop, an eval harness with no libraries. What you cannot create, you do not understand; what you have created once, you can direct forever.

The people who are world-class a year from now kept the loop running after the week everyone else stopped.