Prompting as Specification: The Architect's Method for Reliable AI Cod

Quick Answer

Prompting in 2026 is specification writing. A working prompt names the goal in one sentence, states the hard constraints, defines the input and output interfaces, declares what done looks like, attaches a few examples, lists what you have not yet decided, and tells the model how to return its work. Anthropic, OpenAI, and Google all converged on this framing. The paste-ready eight-block template in Section 05 is the canonical starting point.

Section 01

Prompts Are Not Wishes. They Are Specifications.

If Class 01 was the paradigm and Class 02 was the environment, this class is the skill that decides whether the rest of your career as an architect is leverage or frustration. The skill is not "writing better prompts" in the 2023 sense of clever phrasing and magic words. It is specifying what software must do in language a capable but literal builder can act on. Done correctly, the same model produces production-grade work. Done poorly, the same model produces confident garbage at the same speed.

The shift is not subtle. A wish says "build me a login system." A specification says "build a single-page login form with email and password fields, validate against POST /api/auth/login, on success store the returned JWT in an HttpOnly cookie and redirect to /dashboard, on failure show the API's error message inline below the form, lock the form for 30 seconds after five consecutive failures, and ensure the password field has autocomplete='current-password' and type='password'." Both prompts are about a login system. Only one is a brief the model can build correctly without guessing.

The architect's reframe

Every decision you do not write down is a decision the AI will make. Some of those guesses will be right, by accident. The wrong ones will look right, because the AI's output is fluent. The job of an architect is to convert ambiguity into specification before the AI is asked to act, not after.

This is the framing the industry settled on. AWS, GitHub, and Anthropic now publish similar guidance. The phrase that keeps appearing is "the spec is the prompt." Tools like AWS Kiro, GitHub Spec Kit, and BMAD-METHOD are the productized version of this shift. We compare them in Section 10. The deeper point is that all of them treat the prompt the way a senior engineer treats a feature brief: a real artifact, written deliberately, reviewed before implementation.

Section 01 of 13

Section 02

Why Plain Prompts Break Past 500 Lines

A well-known pattern haunts every new vibe coder. The first project comes together miraculously fast. The second feels harder. The third gets stuck in a loop where every fix breaks something else, the AI starts contradicting itself, and the architect rewrites the same component four times. The pattern has a name now. It is the 500-line wall.

The mechanism is straightforward. A short, casual prompt loads only a small amount of structure into the model's reasoning. When the project is small, that structure is enough; the model can hold the entire system in its context and fill in the gaps consistently. As the project grows past roughly 500 lines of meaningful code, the gaps the original prompt left start to compound. Decisions made in iteration three contradict assumptions from iteration one. The AI cannot fix what it does not know is broken, and you cannot tell it what is broken because the broken thing was never specified.

The same mechanism explains why the same architect, given a thorough specification, ships ten-thousand-line projects without the wall. The specification is the thing that scales. The AI just executes against it. Spec-driven development as a methodology exists because someone at AWS, at GitHub, and at the BMAD project independently noticed this pattern, named it, and built tooling around the fix.

The wall is not a model failure

It is tempting to blame the AI when the third iteration breaks. The model did not change between iteration one and iteration three. You did. You expanded the system without expanding the specification. The AI kept filling gaps consistently with what was in front of it at the time, which by iteration three meant contradicting the choices it had made from what it was shown in iteration one. The fix is to write the spec the project deserves, not to find a smarter model.

Section 03

The Anthropic Doctrine (Seven Principles, Applied)

Anthropic publishes the most consistent prompt engineering documentation in the field, and the seven principles below are what their guidance distills to. None are novel; together they describe the discipline of prompting a capable model in 2026. Read them in order. The first three are about clarity. The next two are about steering. The last two are about feedback.

1. Be explicit, not implicit.

State the constraint, do not allude to it. "Use TypeScript strict mode with noImplicitAny enabled" beats "make sure types are good." The model does not infer your standards from your enthusiasm; it acts on what is written.

2. Add context and motivation.

"This component renders inside a strict Content Security Policy that forbids inline scripts" tells the model why a constraint exists. Models with context choose better implementations than models that have to guess why the constraint matters. This is also the principle that prevents the agent from helpfully removing a constraint it does not understand.

3. Use examples.

Two or three carefully chosen input-output examples outperform a paragraph of prose description of the same pattern. Examples anchor the model on a concrete reference; description leaves room for interpretation. Section 06 covers few-shot done right.

4. Control format.

Tell the model exactly how to return work. "Return as a single complete file, with no surrounding prose." "Return only valid JSON matching this schema." "Return a unified diff against the existing file." Format instructions belong near the end of the prompt where the model gives them more weight.

5. Give a role only when it changes behavior.

"You are a senior backend engineer" sometimes shifts vocabulary and care; "You are a helpful assistant" does nothing. Use role assignment when you can predict the specific behavioral shift it should cause. Avoid it as a generic warm-up; the model has already inferred what kind of work this is.

6. Let it think, without leaking the thinking.

For complex reasoning, give the model space to work through the problem in tags it will use internally, while specifying that only the final answer should appear in the visible output. Section 07 covers when this helps and when it just adds latency.

7. Prefer positive steering over negative.

"Use vanilla JavaScript with ES modules" outperforms "do not use jQuery." Negative instructions can paradoxically increase the salience of the forbidden pattern. When you must use a prohibition, pair it with the positive alternative.

Cross-vendor consistency

OpenAI and Google publish broadly similar guidance with different vocabulary. OpenAI emphasizes Markdown structure plus XML where useful; Google emphasizes consistent formatting and positive examples over anti-patterns; Anthropic emphasizes XML tags as first-class structure. The differences are stylistic. The seven principles above are the intersection, which means they hold no matter which model you are prompting.

Section 04

XML Tags as First-Class Prompt Structure

Claude was specifically trained to recognize XML tag structure in prompts, and Anthropic's own documentation treats it as a first-class best practice. Tags create unambiguous boundaries that the model uses to weight attention. Inside a tag, content is content. Across tags, the model knows it is moving from one logical section to another. The practical effect: fewer ambiguities for the model to guess about, which means more reliable output.

There are no required tag names. What matters is consistent, descriptive tags that clearly separate the parts of your prompt. Use tags that name the role of the content. The six tags below are the working set that covers the vast majority of real specifications. Add custom tags when your project has recurring sections (such as <style_guide> or <invariants>) that earn their own boundary.

Tag	What goes inside	Position in prompt
<context>	Background, motivation, the surrounding system this work fits into.	Near the top.
<data>	The actual content the model should operate on. Files, snippets, payloads.	Middle, before instructions.
<instructions>	The operative task. What you want the model to do.	After data, near the end.
<examples>	Input-output pairs that anchor the model's interpretation.	Before instructions, after context.
<output_format>	Exact shape of the response: file, diff, JSON schema, markdown.	End of prompt.
<constraints>	Hard rules that must hold regardless of implementation.	With or before instructions.

The smallest useful prompt uses three of these: <context>, <instructions>, and <output_format>. The largest useful prompt uses all six, plus a project-specific tag or two. Above six or seven tags, the structure starts to obscure rather than clarify. Resist the urge to invent a tag for every nuance; structure exists to serve clarity, not to look thorough.
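As a minimal sketch of the three-tag case, assuming prompts are assembled in code rather than typed by hand: the helper names `wrapInTag` and `buildTaggedPrompt` are hypothetical, and the section bodies are placeholder text.

```typescript
// Hypothetical helpers for illustration; not part of any vendor SDK.
type PromptSection = { tag: string; body: string };

function wrapInTag(section: PromptSection): string {
  // Keep tag names to simple identifiers so boundaries stay unambiguous.
  if (!/^[a-z][a-z0-9_]*$/.test(section.tag)) {
    throw new Error(`Invalid tag name: ${section.tag}`);
  }
  return `<${section.tag}>\n${section.body.trim()}\n</${section.tag}>`;
}

function buildTaggedPrompt(sections: PromptSection[]): string {
  // Skip empty sections instead of emitting empty tags.
  return sections
    .filter((s) => s.body.trim().length > 0)
    .map(wrapInTag)
    .join("\n\n");
}

// The smallest useful prompt: context, instructions, output format.
const minimalPrompt = buildTaggedPrompt([
  { tag: "context", body: "A signup form in a Next.js app." },
  { tag: "instructions", body: "Write a validateEmail function." },
  { tag: "output_format", body: "Return a single TypeScript file." },
]);
```

Rejecting odd tag names is a design choice of this sketch, not a rule from any model vendor; it only keeps the structure consistent across a project.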

Section 05

The Eight-Block Specification Template (Paste-Ready)

This is the canonical DDS specification template. Eight blocks, in this order, every one earning its place, followed by a short <iteration_rules> guard that tells the model what to do when the spec and the work disagree. Use it as the starting structure for any non-trivial prompt to a coding agent. Delete the sections you do not need; never invent extra ones to look thorough. The template uses Anthropic-style XML tags throughout because they are the most portable structure across the 2026 model field.

dds-spec-template.xml Prompt · 8-Block Spec

<context>
This work happens inside [system name]. The system [one sentence on what it
does]. The relevant slice for this task is [what the model needs to know about
the surrounding code]. Existing tech stack: [language, framework, key libraries
and versions]. The user-visible context for this change is [why we are doing
this work].
</context>

<goal>
[One sentence stating the user-visible outcome. Not the implementation, the
result. If you cannot state the goal in one sentence, your specification is
not yet ready.]
</goal>

<constraints>
- [Hard constraint 1: must be true regardless of implementation choice.]
- [Hard constraint 2: performance, security, accessibility, legal.]
- [Hard constraint 3: existing patterns the new code must conform to.]
- [Hard constraint 4: what NOT to do, only when there is a specific anti-pattern.]
</constraints>

<interfaces>
Inputs:
- [Input 1: type, shape, source, valid range.]
- [Input 2: type, shape, source, valid range.]

Outputs:
- [Output 1: type, shape, destination, error shape if applicable.]
- [Output 2: type, shape, destination, error shape if applicable.]

Side effects:
- [Database writes, file operations, network calls, etc.]
</interfaces>

<done>
The work is done when:
1. [Concrete check 1 that a human or test can verify.]
2. [Concrete check 2.]
3. [Concrete check 3.]
4. [Concrete check 4: the negative cases pass too.]
</done>

<examples>
Example 1:
  Input: [concrete value]
  Expected output: [concrete value]

Example 2 (edge case):
  Input: [concrete edge case]
  Expected output: [concrete handling]

Example 3 (failure):
  Input: [concrete invalid input]
  Expected behavior: [specific error response]
</examples>

<unknowns>
Decisions I have not made, where I want the model to ask before guessing:
- [Decision 1: what I do not know yet.]
- [Decision 2: what depends on a choice I have not committed to.]
</unknowns>

<output_format>
Return the work as [exact format: a single complete file at path X, a unified
diff against file Y, a JSON object matching this schema, etc.]. Do not include
surrounding prose, do not explain unless I asked, do not write comments that
restate what the code does.
</output_format>

<iteration_rules>
If you encounter something that contradicts these instructions, stop and ask
rather than proceeding. If a constraint cannot be satisfied, surface that as a
question, not a quiet compromise.
</iteration_rules>

Read what is in the template and read what is not. Every block names a specific class of decision the AI would otherwise have to guess. None of the blocks say "please be helpful" or "use best practices" or "make it good." Those phrases add tokens without removing ambiguity, which is the opposite of what a specification is for.
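One way to catch those phrases before a spec is sent is a small lint pass. The sketch below is an assumption about how such a check might look; `FILLER_PHRASES` holds only the three phrases quoted above, and `findFillerPhrases` is a hypothetical name.

```typescript
// Illustrative list only: the three phrases quoted above, not a full catalogue.
const FILLER_PHRASES = ["please be helpful", "use best practices", "make it good"];

type FillerHit = { line: number; phrase: string };

function findFillerPhrases(specText: string): FillerHit[] {
  const hits: FillerHit[] = [];
  specText.split("\n").forEach((text, index) => {
    const lower = text.toLowerCase();
    for (const phrase of FILLER_PHRASES) {
      if (lower.includes(phrase)) {
        hits.push({ line: index + 1, phrase }); // 1-based line numbers
      }
    }
  });
  return hits;
}

// Usage: flags line 2 of this two-line draft.
const fillerHits = findFillerPhrases("Write a validator.\nMake it good.");
```

A substring match is deliberately crude. It flags candidates for a human to rewrite into concrete checks; it does not judge the spec.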

The <unknowns> block deserves special attention. New architects feel guilty leaving decisions unmade and try to commit to every detail. That instinct produces worse specs, because some decisions genuinely depend on tradeoffs the architect has not yet seen. The <unknowns> block converts what would have been a silent guess into a surfaced question. The model will ask before deciding, which is exactly what an architect-to-engineer handoff should look like.

Template length is not the goal

A real spec written from this template runs 60 to 300 lines for a feature, and 20 to 50 lines for a bug fix. Anything longer is usually padding. If your spec exceeds 300 lines, decompose the work into smaller specs rather than writing one huge one. Multiple short specs outperform one long one, because the model holds shorter context with sharper attention.

Section 06

Few-Shot Examples Done Right

Examples are the single highest-leverage block in a specification. Two or three carefully chosen input-output pairs typically outperform a paragraph of prose description of the same pattern. The model anchors on the concrete reference and interpolates between examples in ways prose cannot achieve. The trick is that the wrong examples actively mislead, so the choice of examples is as important as the choice of constraints.

Four rules for choosing examples that earn their tokens:

Rule 1: Cover the variation that matters.

If your task has three distinct cases (happy path, edge case, failure), include one of each. If it has only one mode but ranges across input sizes, include a small and a large input. The model interpolates between what you show it. Examples that all look the same teach the model to handle only one shape of the problem.

Rule 2: Show, do not tell, the format.

If you want output in a specific format, the example output should be in that exact format. Telling the model "use markdown headers" and then showing examples with plain text headers will produce plain text headers, every time. The example is the format specification; the prose is at best a tiebreaker.

Rule 3: Two examples beat one, three beat two, four rarely beats three.

The marginal value of additional examples drops sharply after the third one for most tasks. Past four or five, examples can start adding noise and lengthening context without adding signal. If you find yourself wanting ten examples to cover all the cases, your specification probably needs to be decomposed into multiple smaller specifications instead.

Rule 4: Mediocre examples mislead.

If an example contains a subtle bug, the model treats the bug as part of the intended pattern and reproduces it. Read every example you include as if it will become the model's anchor for the entire task, because it will. An example that works correctly is worth more than three that almost work.

When you have no examples to give

Novel patterns where you genuinely cannot produce an example are the case where prose description has to do the work. In that situation, describe the pattern explicitly and ask the model to propose two example outputs before committing to the full implementation. You get to confirm or correct the interpretation cheaply, then run the real generation against the confirmed pattern. This is the architect's "show me your understanding before you build" move, and it is one of the most effective ways to use a long conversation with an agent.
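Rules 2 and 3 can be enforced mechanically if examples are stored as data and rendered into the prompt. The following is a sketch under that assumption: `FewShotExample` and `renderExamples` are hypothetical names, the two-to-five bound is this sketch's reading of Rule 3, and JSON is assumed to be the output format the model should return.

```typescript
// Hypothetical shape for stored examples; values are serialized as JSON so the
// example output is in the exact format the model is expected to return.
type FewShotExample = { label: string; input: unknown; expected: unknown };

function renderExamples(examples: FewShotExample[]): string {
  // Assumed bound, following Rule 3: fewer than two under-anchors, more than five adds noise.
  if (examples.length < 2 || examples.length > 5) {
    throw new Error("Expected between two and five examples");
  }
  const body = examples
    .map((ex) =>
      [
        `Example (${ex.label}):`,
        `  Input: ${JSON.stringify(ex.input)}`,
        `  Expected output: ${JSON.stringify(ex.expected)}`,
      ].join("\n")
    )
    .join("\n\n");
  return `<examples>\n${body}\n</examples>`;
}

// Usage: one happy path and one failure, covering the variation that matters.
const renderedExamples = renderExamples([
  { label: "happy path", input: "alice@example.com", expected: { valid: true } },
  { label: "failure", input: "alice", expected: { valid: false, reason: "Missing @ sign" } },
]);
```

Because every example passes through the same serializer, the format shown to the model cannot drift from one example to the next.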

Section 07

Chain-of-Thought in 2026: When It Helps, When It Hurts

Chain-of-thought prompting (asking the model to reason step by step before answering) was the canonical 2023 trick for improving accuracy on complex tasks. The advice has not aged uniformly. In 2026, chain-of-thought is more nuanced than the original guidance suggested, and applying it everywhere can hurt as often as it helps.

Two facts settled the question. First, reasoning-mode models like Claude with extended thinking enabled, OpenAI's o-series, and Gemini's thinking modes already reason internally before producing output. Wrapping their requests in "think step by step" produces redundant reasoning, which costs latency and tokens without improving accuracy. The Wharton June 2025 study quantified this: the improvement effect on reasoning models was limited and response time increased measurably.

Second, standard non-reasoning models still benefit from explicit chain-of-thought on genuinely complex tasks where the answer depends on multi-step inference. The benefit is real but not universal. On retrieval-style tasks, the model knows or it does not; chain-of-thought just lengthens the response.

Task type	Standard model	Reasoning-mode model
Multi-step reasoning	Use chain-of-thought.	Already happening internally. Skip.
Code generation	Sometimes helps for complex logic.	Rarely adds value over the internal thinking.
Format conversion	Usually unnecessary.	Unnecessary.
Factual lookup	Unnecessary.	Unnecessary.
Planning & decomposition	Strongly helps.	Strongly helps even on reasoning models.
Ambiguous spec interpretation	Useful to verify understanding.	Useful to verify understanding.

The architect's rule: ask the model to think out loud when you actually want to see its reasoning (so you can correct it) or when the task is complex enough that the model needs the scratchpad. Otherwise, ask for the answer. Default to brevity; add chain-of-thought as a deliberate choice, not as superstition.
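The table and the architect's rule can be encoded as a lookup, which is useful when a tool decides per request whether to append a reasoning instruction. This is a sketch: the type and function names are hypothetical, and the three-value advice scale collapses the table's wording ("rarely adds value" and "usually unnecessary" both become "skip").

```typescript
type ModelKind = "standard" | "reasoning";
type CotTaskType =
  | "multi_step_reasoning"
  | "code_generation"
  | "format_conversion"
  | "factual_lookup"
  | "planning"
  | "ambiguous_spec";
type CotAdvice = "use" | "sometimes" | "skip";

// A coarse encoding of the table above; the nuance lives in the prose.
const COT_TABLE: Record<CotTaskType, Record<ModelKind, CotAdvice>> = {
  multi_step_reasoning: { standard: "use", reasoning: "skip" },
  code_generation: { standard: "sometimes", reasoning: "skip" },
  format_conversion: { standard: "skip", reasoning: "skip" },
  factual_lookup: { standard: "skip", reasoning: "skip" },
  planning: { standard: "use", reasoning: "use" },
  ambiguous_spec: { standard: "use", reasoning: "use" },
};

function chainOfThoughtAdvice(
  task: CotTaskType,
  model: ModelKind,
  wantVisibleReasoning = false
): CotAdvice {
  // The architect's rule: if you want to read and correct the reasoning, ask for it.
  if (wantVisibleReasoning) return "use";
  return COT_TABLE[task][model];
}
```

For example, `chainOfThoughtAdvice("multi_step_reasoning", "reasoning")` returns "skip", while `chainOfThoughtAdvice("planning", "reasoning")` returns "use".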

Section 08

Long-Context Placement: The Instructions-After-Data Rule

When your prompt contains a large context block such as a long document, a codebase excerpt, or many examples, the placement of your operative instructions matters. Anthropic's long-context guidance recommends putting long documents near the top of the prompt and the query at the end, and reports that this ordering can improve response quality by up to 30% in its tests, especially with complex, multi-document inputs. The usual explanation is that the model attends most strongly to content near the boundaries of its context window, with the very end given the most weight.

For shorter prompts (a few hundred tokens), placement matters less because the entire prompt is in the high-attention zone. The rule kicks in around the point where your context block exceeds roughly 2,000 tokens. The longer the context, the more important it becomes to push the instructions to the end.

long-context-template.xml Pattern · Instructions Last

<context>
[Short context paragraph: why this work matters, where it fits.]
</context>

<data>
[The long content. A document, a codebase excerpt, a transcript, a
specification dump. Can be thousands or tens of thousands of tokens.]
</data>

<examples>
[Two or three short examples of the kind of work expected.]
</examples>

<instructions>
[The operative task. What the model should do with the data above.
Keep this section near the end because the model gives it more weight.]
</instructions>

<output_format>
[Exact return shape. Always the last block in the prompt.]
</output_format>

For tasks that involve modifying or extracting from a long document, this ordering can be the difference between a model that follows instructions and a model that produces surface-level paraphrase. It is one of the cheapest, most reliable wins available to a prompt engineer.
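The ordering rule is easy to check mechanically. The sketch below assumes a rough four-characters-per-token estimate (real tokenizers differ) and uses the roughly 2,000-token threshold mentioned above; `orderForLongContext` and `placementWarning` are hypothetical names.

```typescript
const LONG_CONTEXT_ORDER = ["context", "data", "examples", "instructions", "output_format"] as const;
type BlockTag = (typeof LONG_CONTEXT_ORDER)[number];
type PromptBlock = { tag: BlockTag; body: string };

// Rough assumption: about 4 characters per token. Real tokenizers differ.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
const LONG_DATA_THRESHOLD = 2000; // tokens, per the rule of thumb above

// Reorder blocks into the instructions-last pattern without mutating the input.
function orderForLongContext(blocks: PromptBlock[]): PromptBlock[] {
  const rank = (b: PromptBlock): number => LONG_CONTEXT_ORDER.indexOf(b.tag);
  return [...blocks].sort((a, b) => rank(a) - rank(b));
}

// Returns a warning when a long data block comes after the instructions.
function placementWarning(blocks: PromptBlock[]): string | null {
  const dataIndex = blocks.findIndex((b) => b.tag === "data");
  const instructionsIndex = blocks.findIndex((b) => b.tag === "instructions");
  if (dataIndex === -1 || instructionsIndex === -1) return null;
  const dataTokens = estimateTokens(blocks[dataIndex]?.body ?? "");
  if (dataTokens > LONG_DATA_THRESHOLD && instructionsIndex < dataIndex) {
    return `Data block is ~${dataTokens} tokens; move <instructions> after <data>.`;
  }
  return null;
}
```

Short prompts pass the check in either order, which matches the observation that placement matters less when the whole prompt is small.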

Section 09

The Three Maturity Levels of Spec-Driven Development

Spec-driven development is not one practice; it is three, arranged on a maturity curve. Knowing which level your team operates at clarifies which tools and disciplines are appropriate, and prevents the common mistake of adopting a Level 3 tool when your team is not ready to leave Level 1.

Level 1: Static spec, manual reconciliation

The spec is written once at the start of the work, in a document, an issue, or a chat thread. The model implements against it. As the code evolves through iteration, the spec falls out of date. Reconciliation between spec and code is manual and usually skipped. This is where most teams operate today, and there is nothing wrong with it for short-lived features or prototypes.

The cost shows up over time: documentation rot, surprise behavior changes, new team members who cannot understand "why was this built this way." Level 1 is a starting point, not a destination.

Level 2: Living spec, evolves with code

The spec and the code evolve together. Changes to the code are reflected back into the spec; changes to the spec drive code changes. The spec lives in the repository alongside the code, often in a structured form (such as Kiro's three documents or Spec Kit's slash-command workflow). Pull requests update both. The spec is the artifact of record.

Level 2 is where the productivity wins of spec-driven development materialize at scale. Tools like AWS Kiro and GitHub Spec Kit explicitly target this level. The cost is discipline: you must commit to keeping the spec current, and the team must enforce that in review. Most professional teams who adopt spec-driven development sit here.

Level 3: Spec as source

The spec is the canonical artifact. The code is generated from it on every change and is marked DO NOT EDIT. Bugs in the output are fixed by editing the spec, never the code. This level requires a tool that can deterministically translate spec to code (Tessl is the named example as of mid-2026, still experimental) and a team willing to give up direct code editing.

Level 3 is the future for certain domains (API contracts, schema-driven systems, declarative infrastructure). It is not the right target for most product engineering in 2026. Aim for Level 2; treat Level 3 as a direction, not a destination.

The level you actually need

Solo architect on a small project: Level 1, with a written spec saved alongside the code. Small team shipping product features: aim for Level 2 with Kiro or Spec Kit. Larger team with strict regulatory or contract requirements: Level 2 with discipline, considering Level 3 for the specific artifacts where determinism is worth the lock-in. Most teams overestimate the level they need; the discipline of Level 1 done well beats Level 2 done poorly.

Section 10

Kiro, Spec Kit, and BMAD-METHOD Compared

Three tools have emerged as the productized version of spec-driven development. They occupy different points on the integrated-to-portable spectrum and are best understood through their actual workflows, not their marketing.

AWS Kiro

Kiro is an agentic IDE, built on VS Code, that bakes a structured three-document workflow into the editor. The three documents are requirements.md (user stories plus acceptance criteria in EARS notation), design.md (architecture and sequence diagrams), and tasks.md (discrete tracked steps the agent works through). Kiro auto-routes across Claude Sonnet, Qwen, DeepSeek, GLM, and MiniMax, selecting per-task. AWS has documented internal cases of 40-hour features shipped in under 8 hours of human time using the Kiro workflow.

The strength is integration: the spec, the design, and the tasks are all visible and editable inside the IDE where the work happens. The weakness is lock-in to the Kiro environment. If your team uses other editors, the workflow does not travel.

GitHub Spec Kit

Spec Kit is a model-agnostic Python CLI invoked with the specify command. Version 0.8.7 shipped May 2026 and the project has over 93,000 stars on GitHub. It is supported by more than 30 AI coding agents including Claude Code, GitHub Copilot, Cursor, Codex CLI, Gemini CLI, opencode, Windsurf, and Qwen Code. The workflow is a sequence of slash commands: /constitution (project rules), /specify (the feature brief), /clarify (resolve unknowns), /plan (technical plan), /tasks (decomposition), /analyze (review), /implement (build), /checklist (verify).

The strength is portability and tool-agnosticism. Whatever agent you use, the workflow holds. The weakness is that you are stitching the experience together yourself; there is no single integrated environment.

BMAD-METHOD

BMAD-METHOD takes a multi-agent approach. Roles are explicit (analyst, architect, developer, QA) and run as separate agents that hand off structured artifacts. The framework is open-source and model-agnostic. It is heavier than Spec Kit and lighter than Kiro, and works well for complex projects where you want the planning stages to be deliberate and traceable.

The strength is that BMAD makes the multi-role handoff explicit; you can see exactly which agent produced which artifact and why. The weakness is operational overhead: running four roles for a small change is theater.

Tool	Shape	Best when	Avoid when
Kiro	Integrated IDE	You want everything in one window and will commit to VS Code-family editors.	Your team uses heterogeneous editors.
Spec Kit	Portable CLI	You want the same workflow across multiple AI tools, or you mix agents.	You want a single integrated UX.
BMAD-METHOD	Multi-agent framework	The project is complex enough that explicit role separation reduces ambiguity.	The change is small; the overhead is not earned.

Beyond these three, there are smaller open-source spec frameworks (Intent, OpenSpec, spec-coding-mcp) and the experimental spec-as-source tool Tessl. The DeepLearning.AI short course Spec-Driven Development with Coding Agents (Sandeep Dinesh, late 2025) is the recommended structured introduction if you want to go deeper than this class can take you.

Section 11

The Five Prompt Failure Modes

Bad prompts fail in predictable ways. Five patterns cover most of what goes wrong. Recognize them in your own writing and you will catch them before the AI does the wrong thing.

Failure 1: Vague goal dressed as detail

"Build a complete login system with all the standard features and good UX" looks specific because it is several sentences long. It defines no constraint the AI can verify. The fix is to replace abstract qualifiers (complete, standard, good) with concrete checks: which fields, which API, which states, which behaviors on error. If you cannot say what "complete" means, you do not have a goal yet.

Failure 2: Hidden constraints

You know the project uses PostgreSQL, not MongoDB. You know the styling system is Tailwind, not styled-components. You know the API uses snake_case, not camelCase. The model does not know any of this unless you wrote it down. Every constraint you keep in your head becomes a coin flip in the output. The fix is to drop every relevant constraint into the <constraints> block, even the ones that feel too obvious to mention.

Failure 3: Untestable acceptance criteria

"The page should feel fast and look good" is not done criteria; it is a vibe. The fix is to convert each vibe into a concrete check. "Page loads under 200ms on a cold render at the 75th percentile" is testable. "Visual design matches the existing /pages/about-us hero treatment" is testable. The work is done when the checks pass, not when it feels done.
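To make the first of those checks concrete, here is a sketch of how "under 200ms at the 75th percentile" might be evaluated against measured load times. It assumes the nearest-rank percentile definition and that samples are collected elsewhere; `percentile` and `meetsLoadBudget` are hypothetical names.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("No samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1] as number;
}

// The done criterion as a boolean check; the 200ms budget comes from the example above.
function meetsLoadBudget(loadTimesMs: number[], budgetMs = 200): boolean {
  return percentile(loadTimesMs, 75) < budgetMs;
}

// Usage with made-up measurements: one slow outlier does not fail a p75 budget.
const sampleLoadTimesMs = [120, 150, 180, 190, 400, 130, 160, 170];
const withinBudget = meetsLoadBudget(sampleLoadTimesMs);
```

Other percentile definitions (interpolated, for instance) give slightly different values, so a real spec should name the one it means.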

Failure 4: Missing examples for novel patterns

If the work involves a pattern the model has not seen in its training, prose alone will not align it. Either find an example to attach, or ask the model to propose an example first and confirm before generating the real work. This is also the failure mode where a single subtly-wrong example mis-anchors the entire generation; example quality matters more than example quantity.

Failure 5: Fighting the model

Long prompts full of "do not do X" instructions try to suppress every wrong pattern the model might produce. This rarely works and often backfires (negative instructions can increase the salience of the named pattern). The fix is to specify what to do, not what to avoid. "Use ES modules" beats "do not use CommonJS." Negative instructions belong only where you have observed a specific anti-pattern the model produces, and they should be paired with the positive alternative.

Section 12

The DDS Prompt-as-Spec Playbook

The full eight-block template from Section 05 is the canonical form. This section is the practical sequence: how to actually arrive at a working spec from a starting point of "I want to build something." Six moves, in order. Each takes minutes, not hours.

Move 1: Write the goal sentence first, before anything else.

One sentence. User-visible outcome. Not the implementation. If you cannot reduce the work to one sentence, you do not yet know what you are building. Spend the time here before going further. The goal sentence is also the answer to "what does done look like in one breath." Most weak specs trace back to a weak goal sentence.

Move 2: List the hard constraints by walking through your stack.

Frontend stack, backend stack, data layer, deployment, auth model, performance budget. For each, ask: is there a constraint this work must respect? Write each constraint in one bullet. Do not philosophize; declare. This pass usually surfaces three to seven real constraints.

Move 3: Define the interfaces by drawing the data flow.

What goes in (from the user, from another system, from a database). What comes out (rendered to the user, to another system, persisted). What side effects happen. The interfaces section is the contract; if you cannot complete it, the design is still ambiguous and writing more prompt will not fix the design problem.

Move 4: Write three done criteria, including one negative case.

Two positive criteria (it does X correctly, it does Y correctly) and at least one negative (it correctly refuses Z, or it handles the empty input gracefully). The negative case is the architect's discipline: it forces you to think through the unhappy paths Class 01 named, before the AI silently skips them.

Move 5: Add two or three examples covering the variation.

Per Section 06: one happy path, one edge case, optionally one failure. Read each example as if it is the only thing the model will see, because the example often outweighs the description in practice.

Move 6: Declare your unknowns, then send.

List the decisions you have not made. Do not feel obligated to decide everything; surface what is genuinely undecided. The model will ask before guessing, which is the architect-to-engineer handoff working correctly.

When the output misses, fix the spec, not the prompt

The first response from a coding agent is rarely the final code. When it misses, the temptation is to send conversational follow-ups: "actually make it do Y instead." Resist. Each follow-up dilutes the original specification and produces an artifact that no one (you, the model, a teammate) can re-read to understand the final state. The professional move is to edit the spec to reflect what you learned, then regenerate from the corrected spec. The spec is the artifact of record; the conversation is not.
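If the spec lives in the repository as structured data, the first five moves can be checked before the spec is sent or regenerated. The sketch below is one possible shape, not a standard: `SpecDraft` and `reviewSpec` are hypothetical, the one-sentence test is a crude punctuation count, and unknowns are left unchecked because an empty list can be legitimate.

```typescript
// Hypothetical structured form of a spec draft.
interface SpecDraft {
  goal: string;
  constraints: string[];
  inputs: string[];
  outputs: string[];
  done: { check: string; negative: boolean }[];
  examples: string[];
}

function reviewSpec(spec: SpecDraft): string[] {
  const problems: string[] = [];
  // Move 1: crude one-sentence test that counts sentence-ending punctuation.
  const sentenceEnds = (spec.goal.match(/[.!?](\s|$)/g) ?? []).length;
  if (spec.goal.trim() === "" || sentenceEnds > 1) {
    problems.push("goal: state the outcome in one sentence");
  }
  // Move 2: at least one declared hard constraint.
  if (spec.constraints.length === 0) problems.push("constraints: none listed");
  // Move 3: the interface contract needs both directions.
  if (spec.inputs.length === 0 || spec.outputs.length === 0) {
    problems.push("interfaces: list inputs and outputs");
  }
  // Move 4: three done criteria, at least one negative case.
  if (spec.done.length < 3) problems.push("done: fewer than three criteria");
  if (!spec.done.some((d) => d.negative)) problems.push("done: no negative case");
  // Move 5: two or three examples.
  if (spec.examples.length < 2) problems.push("examples: fewer than two");
  return problems;
}
```

An empty result does not mean the spec is good, only that none of the structural gaps named in the moves above is present.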

Section 13

Your First Real Spec (Live Exercise)

Reading about specifications does not produce the skill of writing them. This section is the exercise: take the worked example below, run it through your agent of choice, then run the original vague prompt against the same agent and read the two outputs side by side. The difference is what this entire class is about, and seeing it in your own work installs the lesson permanently.

The vague prompt (do not start here)

prompt-vague.txt Prompt · Wish

Write me a function that validates an email address. Make it good.

The same task as a specification

prompt-spec.xml Prompt · Architect Spec

<context>
This validator runs in a signup form on a Next.js app. We have measured that
overly strict validation hurts conversion more than catching every invalid
email helps. Server-side verification will catch the rest. The goal here is
to catch the obvious mistakes inline before the user submits.
</context>

<goal>
A validateEmail function that returns whether a string looks like a valid
email address for the purpose of a client-side signup form check.
</goal>

<constraints>
- TypeScript, no dependencies, no regex longer than 80 characters.
- Must accept emails with plus addressing (alice+filter@example.com).
- Must accept subdomains (alice@mail.example.com).
- Must accept new TLDs (alice@example.dev, alice@example.museum).
- Must reject obvious mistakes: missing @, missing TLD, whitespace inside the address, control characters.
- Must NOT attempt to validate that the domain actually exists or that the
  mailbox is real. That is the server's job.
</constraints>

<interfaces>
Input: a single string, may be untrimmed user input.
Output: { valid: boolean; reason?: string }.
- reason is omitted when valid is true.
- reason is a short, user-facing string when valid is false.
</interfaces>

<done>
1. All ten examples below produce the expected output.
2. The function is under 30 lines including the type signature.
3. No regex line exceeds 80 characters.
4. The reason string for invalid inputs is suitable for display in a form field.
</done>

<examples>
Valid:
  "alice@example.com"        -> { valid: true }
  "alice+filter@example.com" -> { valid: true }
  "alice@mail.example.dev"   -> { valid: true }
  "  alice@example.com  "    -> { valid: true }   (whitespace is trimmed)

Invalid:
  ""                         -> { valid: false, reason: "Email is required" }
  "alice"                    -> { valid: false, reason: "Missing @ sign" }
  "alice@"                   -> { valid: false, reason: "Missing domain" }
  "alice@example"            -> { valid: false, reason: "Missing top-level domain" }
  "alice @example.com"       -> { valid: false, reason: "Email cannot contain spaces" }
  "alice@@example.com"       -> { valid: false, reason: "Too many @ signs" }
</examples>

<unknowns>
- Whether to treat IDN (internationalized domain names) as valid. If unsure,
  ask before deciding.
</unknowns>

<output_format>
A single TypeScript code block. The function as a default export. No
surrounding prose, no explanation, no example usage.
</output_format>

Run both. Read the two outputs. The vague version typically produces a single regex of variable quality, with little edge-case handling and no error messages. The spec version should produce a complete function that handles every example correctly, surfaces a question about IDN, and matches the output format you specified. The model is the same. The architect changed.
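To read the spec output against the first done criterion rather than by eye, the ten examples can be turned into a table and run. The sketch below pairs one possible `validateEmail` with that table. It is an illustration of what a passing response might look like, not the canonical answer: the `Result` alias, the `examples` and `failures` names, and the two reason strings for an empty local part and for control characters are assumptions the spec does not define. It also uses a plain function instead of the default export the spec asks for, so the whole thing runs as one file.

```typescript
type Result = { valid: boolean; reason?: string };

// One possible implementation (assumed, not canonical).
function validateEmail(input: string): Result {
  const email = input.trim();
  if (email === "") return { valid: false, reason: "Email is required" };
  if (/\s/.test(email)) return { valid: false, reason: "Email cannot contain spaces" };
  // Reason text below is an assumption; the spec gives no example for it.
  if (/[\x00-\x1f\x7f]/.test(email)) return { valid: false, reason: "Email contains invalid characters" };
  const at = email.indexOf("@");
  if (at === -1) return { valid: false, reason: "Missing @ sign" };
  if (at !== email.lastIndexOf("@")) return { valid: false, reason: "Too many @ signs" };
  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  // Reason text below is an assumption; the spec gives no example for it.
  if (local === "") return { valid: false, reason: "Missing name before @" };
  if (domain === "") return { valid: false, reason: "Missing domain" };
  // A domain needs at least two non-empty labels separated by dots.
  if (!/^[^.]+(\.[^.]+)+$/.test(domain)) return { valid: false, reason: "Missing top-level domain" };
  return { valid: true };
}

// The ten examples from the spec, as input and expected output pairs.
const examples: Array<[string, Result]> = [
  ["alice@example.com", { valid: true }],
  ["alice+filter@example.com", { valid: true }],
  ["alice@mail.example.dev", { valid: true }],
  ["  alice@example.com  ", { valid: true }],
  ["", { valid: false, reason: "Email is required" }],
  ["alice", { valid: false, reason: "Missing @ sign" }],
  ["alice@", { valid: false, reason: "Missing domain" }],
  ["alice@example", { valid: false, reason: "Missing top-level domain" }],
  ["alice @example.com", { valid: false, reason: "Email cannot contain spaces" }],
  ["alice@@example.com", { valid: false, reason: "Too many @ signs" }],
];

// Collect every input whose actual result differs from the expected one.
const failures = examples.filter(([input, want]) => {
  const got = validateEmail(input);
  return got.valid !== want.valid || got.reason !== want.reason;
});

console.log(examples.length - failures.length + "/" + examples.length + " examples pass");
```

Swapping the agent's own function in place of this one and rerunning the table gives a concrete answer to "did it meet the spec?" for both the vague and the specified prompt.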

You now have the skill that decides whether the rest of your work as a vibe coder produces software or noise. The next class, Foundation 04, teaches the second half of the architect's discipline: reading what the AI produces, finding what is wrong, and directing the correction. Specifying and reading. Two sides of the same skill.

Section 13 of 13 · Foundation Class 03 Complete

Look What You Can Make

Specs that produced these

Every example below started as an eight-block specification. The agent did the typing. The architect did the deciding.

Live Storefront

A real e-commerce brand

Design Delight Studio: product pages, collections, policies, and analytics, all built from specs that named the goal, constraints, and done criteria for every page.

Free Academy

This entire Academy

Every page coded from a specification, then rendered with the V2 template. The specs are the source of truth; the Liquid is the build artifact.

Advanced Masterclass

Enterprise-grade training

A free 8-module AI cost-engineering masterclass with 50 paste-ready prompts, each one written in the eight-block specification format.

Robert McCullock

Architect-CEO · Design Delight Studio

Boston-based. Built a sustainable-streetwear brand and a portfolio of AI systems using the intent-based engineering method taught in this Academy. The eight-block specification template in Section 06 is the actual format used across every project in the DDS portfolio.

FAQ

Frequently Asked Questions

The questions newcomers ask most about prompting as specification. Each answer matches this page's structured data exactly, so a person reading the page and an AI engine extracting the schema receive the same canonical response.

What is prompt engineering in 2026?

Prompt engineering in 2026 is the discipline of writing the instructions an AI model needs to produce reliable output. The 2023-era version (clever phrasing, role-play tricks, magic words) is largely obsolete. The 2026 version is closer to specification writing: state the goal, the constraints, the interfaces, what done looks like, and provide examples. Anthropic, OpenAI, and Google all converged on this framing.

Why use XML tags in prompts to Claude?

Claude was specifically trained to recognize XML tag structure, and Anthropic's documentation treats it as a first-class best practice. Tags like context, instructions, examples, and output_format create unambiguous boundaries that the model uses to weight attention. There are no required tag names; what matters is consistent, descriptive tags that clearly separate sections. The practical effect is fewer ambiguities for the model to guess about, which means more reliable output.

What is spec-driven development?

Spec-driven development, or SDD, is the methodology of writing a precise specification first, then using AI to implement against that spec. The phrase "the spec is the prompt" captures it. The benefit is reproducibility: the same spec produces equivalent implementations, the spec can be reviewed by humans, and the spec becomes the contract that says whether the output is correct. Industry tools that implement it include AWS Kiro, GitHub Spec Kit, and BMAD-METHOD.

Should I use chain-of-thought prompting in 2026?

Sometimes. For genuinely complex reasoning tasks with a standard non-reasoning model, asking the model to think step by step still helps. For reasoning-mode models like Claude with extended thinking enabled or OpenAI o-series, explicit chain-of-thought prompting often adds latency without improving accuracy because the model is already reasoning internally. A June 2025 Wharton study found that chain-of-thought prompting gave only limited improvement on reasoning models while increasing response time. The architect's rule is to use chain-of-thought when you can verify it helps on your specific task, not as a default.

What is the difference between Kiro and GitHub Spec Kit?

Both are tools for spec-driven development. AWS Kiro is an agentic IDE built on VS Code with a structured three-document workflow: requirements.md with EARS-notation acceptance criteria, design.md with architecture and sequence diagrams, and tasks.md with discrete tracked steps. It uses an auto-router across Claude Sonnet, Qwen, DeepSeek, GLM, and MiniMax. GitHub Spec Kit is a model-agnostic Python CLI that ships slash commands like /constitution, /specify, /clarify, /plan, /tasks, /analyze, /implement, /checklist, and works with over 30 AI coding agents including Claude Code, Copilot, Cursor, and Gemini CLI. Kiro is the integrated environment; Spec Kit is the portable toolkit.

What is BMAD-METHOD?

BMAD-METHOD is an open-source spec-driven framework focused on multi-agent collaboration. It breaks a build into roles (analyst, architect, developer, QA) and runs them as separate agents that hand off structured artifacts. It is heavier than Spec Kit and lighter than Kiro, and works well for complex projects where you want the planning stages to be deliberate and traceable. Like Spec Kit, it is model-agnostic.

What are the three maturity levels of spec-driven development?

Level 1 is a static spec written once and reconciled manually with the code as it diverges, which is where most teams sit. Level 2 is a living spec that evolves with the code in a continuous feedback loop, which is the target of tools like Kiro and Spec Kit. Level 3 is spec-as-source, where the spec is the canonical artifact and the code is generated from it on every change, marked DO NOT EDIT. Level 3 tools like Tessl are still experimental. Most teams should aim for Level 2.

How long should a prompt actually be?

As long as it needs to be, and no longer. The eight-block specification template in this class typically runs 60 to 300 lines for a real feature. That is far longer than a 2023-era prompt and far shorter than the project's full documentation. The shape of a good prompt is dense: every line is a decision the AI no longer has to guess. Lines that say "please be helpful" or "use best practices" add nothing and should be cut.

What is the most common prompt failure mode?

Vague goals dressed as detailed prompts. A prompt that says "build a complete login system with all the standard features and good UX" looks specific but defines no constraint the AI can verify. The other four common failures are hidden constraints (rules in your head that the prompt never states), untestable acceptance criteria (vibes instead of checks), missing examples for novel patterns, and fighting the model (telling it not to do things instead of telling it what to do).

Should I tell the AI what NOT to do?

Sparingly. Anthropic, OpenAI, and Google all observe that positive instructions (do X) outperform negative instructions (do not do Y) because models default toward the patterns they were trained on, and naming the wrong pattern can paradoxically increase its salience. Use negative instructions only when there is a specific, named anti-pattern you have seen the model produce, and pair them with the positive alternative. "Do not use jQuery" is weaker than "Use vanilla JavaScript with ES modules."

Do few-shot examples still help in 2026?

Yes, and the result is consistent across vendors. Examples anchor the model on a concrete pattern in a way that prose description cannot. Two or three carefully chosen examples typically outperform a longer prose description of the same pattern. The trick is to choose examples that span the variation you care about, so the model interpolates correctly between them. Mediocre examples can actively mislead.

Where in the prompt should the instructions go?

When you are pasting in a large context block such as a long document or codebase, put the instructions after the context, not before. Multiple studies have shown improved instruction-following when the model sees the data first, then the task. For shorter prompts under a few thousand tokens, ordering matters less. The general rule is: put the long material first and the operative instructions near the end, where the model gives them more weight.

What does the DDS spec template look like?

Eight blocks, in this order: Goal (one sentence), Constraints (hard rules), Interfaces (inputs and outputs), Done (acceptance criteria), Examples (few-shot pairs), Unknowns (what you have not decided), Format (how to return work), and Iteration Rules (how to refine). The full paste-ready template is in Section 06 of this class. It is intentionally short. Every block earns its place by removing a decision the AI would otherwise have to guess.

What comes after this class?

Foundation 04: Reading and Directing AI Code. You will have a working environment from Class 02 and the ability to write a real specification from this class. Class 04 teaches the review skill that closes the loop: how to read what the AI produces, what to look for, and the discipline that separates code you ship from code that ships you.