What is the best agentic IDE in 2026?

Google Antigravity is the most capable agent-first IDE available in April 2026. It surpasses Cursor and Cline by deploying multiple autonomous agents in parallel that plan, code, test, and verify across the editor, terminal, and browser simultaneously. As of April 2026 it runs on AgentKit 2.0 with 16 specialized agents and 40+ built-in skills, and its free tier includes rate-limited Gemini 3.1 Pro. Cursor at $20/month is more polished but single-agent. Antigravity is the right choice for enterprise multi-agent workflows; Cursor is the better fit for solo editor power.

How much does Google Antigravity cost in April 2026?

Three tiers: Free with rate-limited Gemini 3.1 Pro and a daily credit allocation, Pro at $20/month bundled with Google AI Pro for higher agent-request limits, and Ultra at $249.99/month bundled with Google AI Ultra for the highest limits. The credit system introduced in March 2026 has been controversial — Ultra users have reported quota restrictions even at the top tier. For most paid users, Pro delivers the best value if you also use Gemini for non-coding tasks.

What is AGENTS.md and how is it different from GEMINI.md?

AGENTS.md is the open agentic-IDE standard for project rules — the same file works in Antigravity, Cursor, Codex, and Claude Code. Antigravity added AGENTS.md support in v1.20.5 (March 9, 2026) and now reads both AGENTS.md and the legacy GEMINI.md. Use AGENTS.md for cross-tool portability; use GEMINI.md only when you have Antigravity-specific instructions you do not want other agents to see.

What is AgentKit 2.0?

AgentKit 2.0 launched in March 2026 and significantly expanded Antigravity's autonomous capabilities. It ships 16 specialized agents covering frontend, backend, testing, debugging, SEO, database, security, and DevOps tasks, plus 40+ domain-specific skills usable through the Skills system. AgentKit 2.0 also unlocks deeper Manager-View parallelism and tighter MCP server integration.

What is Strict Mode in Antigravity and should I use it?

Strict Mode is the master security override. When enabled it forces Request Review on every terminal, artifact, and browser action, ignores your Allow List, isolates the agent to the current workspace, and denies network access to terminal commands. Use Strict Mode on any project containing secrets, production credentials, or client code. The performance cost is minor; the protection is substantial.

How do I build an AGI agent swarm in Antigravity?

Define each agent as a specialized class implementing a common Agent interface (process, capabilities, healthCheck). Compose them under a meta-Orchestrator that maintains a typed registry, classifies incoming requests against capabilities, builds a dependency graph, and dispatches in parallel where independent. Add a MetaCognitor agent that monitors per-agent quality scores, latency, and error rates — when metrics cross thresholds it triggers self-repair routines (temperature adjustment, system prompt mutation, model swap, or restart). This is the architecture used by AGI-CORE-Pro V.1.0 The Synthetic Director.
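
The registry, capability matching, and parallel dispatch described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AGI-CORE-Pro implementation: `EchoAgent`, `Step`, and the first-match `classify` rule are hypothetical, and the MetaCognitor is left out.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol


class Agent(Protocol):
    """Common interface named in the text: process, capabilities, health check."""
    name: str
    capabilities: frozenset[str]

    async def process(self, task: str, context: dict[str, str]) -> str: ...
    async def health_check(self) -> bool: ...


@dataclass
class EchoAgent:
    """Hypothetical stand-in for a real model-backed agent."""
    name: str
    capabilities: frozenset[str]

    async def process(self, task: str, context: dict[str, str]) -> str:
        await asyncio.sleep(0)  # placeholder for a model call
        return f"{self.name} handled {task!r} using {sorted(context)}"

    async def health_check(self) -> bool:
        return True


@dataclass
class Step:
    id: str
    capability: str                      # capability the step requires
    task: str
    depends_on: tuple[str, ...] = ()     # ids of steps that must finish first


@dataclass
class Orchestrator:
    registry: dict[str, Agent] = field(default_factory=dict)

    def register(self, agent: Agent) -> None:
        self.registry[agent.name] = agent

    def classify(self, capability: str) -> Agent:
        """Pick the first registered agent that advertises the capability."""
        for agent in self.registry.values():
            if capability in agent.capabilities:
                return agent
        raise LookupError(f"no agent offers {capability!r}")

    async def run(self, steps: list[Step]) -> dict[str, str]:
        """Run steps in dependency order; steps that are ready together run in parallel."""
        results: dict[str, str] = {}
        pending = {s.id: s for s in steps}
        while pending:
            ready = [s for s in pending.values()
                     if all(d in results for d in s.depends_on)]
            if not ready:
                raise ValueError("dependency cycle or missing step")
            outputs = await asyncio.gather(*(
                self.classify(s.capability).process(
                    s.task, {d: results[d] for d in s.depends_on})
                for s in ready))
            for step, output in zip(ready, outputs):
                results[step.id] = output
                del pending[step.id]
        return results
```

Each pass of the loop dispatches one "wave" of independent steps with `asyncio.gather`, so a test-writing step that depends on a backend step waits for it, while unrelated steps run side by side.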

What is ocean logic in multi-agent systems?

Ocean logic is a multi-channel asynchronous event-stream architecture where agents publish to and subscribe to typed event streams (channels) instead of calling each other directly. Each channel acts like an ocean current: events flow continuously, multiple agents can consume the same stream, and new agents can be added without changing existing producers. The pattern uses Redis Streams, NATS JetStream, or Kafka under the hood. It is what allows the DDS Sovereign Orchestrator Pro V4.0 to coordinate 15 synthetic employees without a brittle web of point-to-point calls.
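
To make the publish/subscribe idea concrete, here is a small in-memory sketch. It is an assumption-laden stand-in: a production system would sit on Redis Streams, NATS JetStream, or Kafka as the text says, and the `EventBus` class and the `orders.created` channel name are hypothetical.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    channel: str      # hypothetical channel name, e.g. "orders.created"
    payload: dict


class EventBus:
    """Fan-out bus: every subscriber of a channel gets its own queue."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, channel: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    async def publish(self, event: Event) -> int:
        """Deliver to all current subscribers; return how many received it."""
        queues = self._subscribers[event.channel]
        for queue in queues:
            await queue.put(event)
        return len(queues)


async def demo() -> list[str]:
    bus = EventBus()
    billing = bus.subscribe("orders.created")
    shipping = bus.subscribe("orders.created")   # added without touching the producer
    await bus.publish(Event("orders.created", {"order_id": 42}))
    first = await billing.get()
    second = await shipping.get()
    return [f"billing saw {first.payload['order_id']}",
            f"shipping saw {second.payload['order_id']}"]
```

Running `asyncio.run(demo())` shows both consumers receiving the same event. The producer never names its consumers, which is the decoupling the pattern is after.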

What storage layers do enterprise AGI systems need?

Five layers: a vector database for semantic memory and RAG (Chroma, Qdrant, or pgvector), a key-value store for hot state and rate limits (Redis), a relational database for canonical entities and audit logs (Postgres), a blob store for artifacts and embeddings cache (S3-compatible like MinIO or R2), and a time-series database for agent metrics and observability (InfluxDB or TimescaleDB). The DDS portfolio runs all five locally on sovereign hardware at $0/month hosting cost.

How do I make an Antigravity agent self-healing?

Implement four mechanisms: a circuit breaker that disables a failing agent for a cooldown period after consecutive failures, a retry policy with exponential backoff and jitter, a forensic audit log capturing every input/output/duration/error, and a MetaCognitor that periodically replays recent tasks to detect quality drift. When drift is detected, run repair routines: adjust temperature, swap to a stronger model, mutate the system prompt with debugging context, or restart the agent process. Log every repair to repair_audit.json for post-mortem review.
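
Three of the four mechanisms (circuit breaker, backoff with jitter, audit log) fit in a short sketch. This is one possible shape under stated assumptions, not a reference implementation: the thresholds are placeholder values, the audit log is an in-memory list rather than a file, and the MetaCognitor replay and repair routines are omitted.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CircuitBreaker:
    max_failures: int = 3            # consecutive failures before opening
    cooldown_s: float = 30.0         # how long the agent stays disabled
    clock: Callable[[], float] = time.monotonic
    _failures: int = field(default=0, init=False)
    _opened_at: Optional[float] = field(default=None, init=False)

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        if self.clock() - self._opened_at >= self.cooldown_s:
            self._opened_at, self._failures = None, 0   # cooldown over: close again
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self._failures = 0
            return
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self.clock()


def backoff_delays(retries: int, base_s: float = 0.5, cap_s: float = 8.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Exponential backoff with full jitter: each delay is in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap_s, base_s * 2 ** n)) for n in range(retries)]


def call_with_retry(task: Callable[[], str], breaker: CircuitBreaker,
                    audit: list[dict], retries: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Try once immediately, then up to `retries` more times; log every attempt."""
    for delay in [0.0, *backoff_delays(retries)]:
        if not breaker.allow():
            audit.append({"event": "circuit_open"})
            return None
        sleep(delay)
        started = time.monotonic()
        try:
            result = task()
        except Exception as exc:
            breaker.record(ok=False)
            audit.append({"ok": False, "error": repr(exc),
                          "duration_s": time.monotonic() - started})
            continue
        breaker.record(ok=True)
        audit.append({"ok": True, "output": result,
                      "duration_s": time.monotonic() - started})
        return result
    return None
```

Passing `clock` and `sleep` as parameters keeps the breaker testable without real waiting. Writing the audit entries out as JSON would be a small addition on top.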

Is Antigravity safe to use on production codebases?

Yes, with proper configuration. Default settings prioritize speed over safety. For production work, enable Strict Mode, set Terminal Execution to Off (Allow List only), disable Non-Workspace File Access, audit your Browser Domain Allowlist, and require manual approval for all MCP tool invocations. Initialize Git before every session so you can roll back any agent action. There are documented incidents of agents deleting drives under default Turbo settings — never use Turbo on production projects.

What models does Antigravity support in April 2026?

Five models: Gemini 3.1 Pro (1M token context, the primary engine), Gemini 3 Flash (fast and cheap completions), Claude Sonnet 4.6 (200K context, strong code generation), Claude Opus 4.6 (200K context, maximum reasoning), and GPT-OSS-120B (open-weight option). You can assign different models to different agents within the same Manager-View mission, which is unique to Antigravity.

What are the known limitations of Antigravity in April 2026?

Resource consumption can cause IDE slowdowns on complex multi-agent missions, the credit system has produced unexpected lockouts even for Ultra subscribers, time-to-first-token on Gemini 3.1 Pro under heavy load can exceed 30 seconds, and behavior on very large legacy codebases is inconsistent. The Windows auto-updater has had detection bugs across multiple releases. Linux sandboxing only landed in v1.21.6 on March 25, 2026.

Google Antigravity Masterclass 2026 — S-Tier Edition | Gemini 3.1 Pro, AGENTS.md, AgentKit 2.0, AGI Swarms

Module 01 · Recency Anchor

What’s New Through April 2026 — The Release-Note Briefing

Antigravity ships fast. If your knowledge of the platform is from launch (November 18, 2025), you are missing five months of consequential changes. This module is the recency anchor for everything that follows.

Major releases since launch

Version	Date	Headline Change
1.21.6	Mar 25, 2026	Linux sandboxing, condensed chat UI, AGENTS.md alongside GEMINI.md, one-click chat archival, deprecation of Manager Follow-along and Playground
1.20.6	Mar 17, 2026	Fix for customizations creation (rules and workflows could not be created)
1.20.5	Mar 9, 2026	AGENTS.md support introduced; Auto-continue setting deprecated and made default-on
AgentKit 2.0	March 2026	16 specialized agents, 40+ domain-specific skills, deeper Manager parallelism
1.18.x / 1.19.x	Feb 2026	Gemini 3.1 Pro general availability, dedicated Models settings screen with quota visibility, artifact download support, terminal integration toggle
Credit System	March 2026	Replaced quota model with explicit credits — sparked the “paperweight” backlash; Ultra users still report restrictions
Strict Mode	Q1 2026	Master security override that forces Request Review on every action and isolates workspace

The credit system reality check. Google replaced (vague) quota guarantees with a credit-based system in March 2026. Community response was sharp — posts about Antigravity being a “paperweight” spread across X. As of April 2026 the credit system remains in place with incremental adjustments. If you subscribe, go in with realistic expectations about quota limits even on Ultra at $249.99/month.

What still works the same

The two-view paradigm (Editor View + Manager View), Artifacts as the trust mechanism, Browser Sub-Agent for autonomous testing, multi-model selection within a single mission, VS Code-fork foundation, and the underlying Gemini 3 family — all unchanged in core behavior. Most of your existing knowledge transfers; the additions are what make April 2026 different.

Module 02 · Foundation

What Antigravity Actually Is — and What It Is Not

Antigravity is a standalone agentic IDE built on a heavily modified VS Code fork, announced on November 18, 2025 alongside Gemini 3. The single most important thing to internalize: this is not a coding assistant. Coding assistants help you write code faster. Antigravity replaces the act of writing with the act of directing.

The two interfaces

Editor View — Synchronous IDE

Familiar VS Code surface with three AI enhancements: tab completions (project-aware), inline commands (highlight then refactor/explain/debug), and a chat panel for architectural discussion. This is where you sit when you want to be hands-on.

Manager View — Mission Control

Asynchronous workspace where you spawn, monitor, and review multiple autonomous agents in parallel. Each agent operates in its own workspace. This is where Antigravity’s multi-agent advantage actually lives. Toggle with Cmd+E (Mac) or Ctrl+E (Windows/Linux).

The three surfaces every agent can touch

Editor

Full read/write access to your project files. Creates, modifies, deletes with full architectural awareness via the 1M token Gemini 3.1 Pro context.

Terminal

Installs dependencies, runs builds, executes tests, manages Git — gated by your Terminal Execution policy and Allow/Deny lists.

Browser Sub-Agent

Built-in Chromium browser the agent controls directly. Opens your app, clicks through journeys, screenshots every step, records sessions. Competitors have not matched this as of April 2026.

Artifacts — the trust layer

Scrolling through raw tool calls is tedious and unverifiable. Antigravity solves this by having agents generate Artifacts — task lists, implementation plans, screenshots, browser recordings, code diffs. You comment on Artifacts the way you comment on a Google Doc. The agent incorporates feedback without stopping execution.

Mental model shift. You are not the writer anymore. You are the senior engineering manager. Define what and why. Review Artifacts. Approve, redirect, or reject. The shift is bigger than the tool.

Module 03 · Setup

Installation, First Launch, and the Critical First Five Minutes

Free public preview. Personal Gmail required for free Gemini 3.1 Pro quota. Under 10 minutes to a working agent on Windows, macOS, or Linux. Linux sandboxing landed in v1.21.6 — make sure you are on that version or newer if you are on Linux.

Download

Visit antigravity.google/download. ~235 MB installer: .exe (Windows), .dmg (macOS), .deb / .AppImage (Linux). The Windows auto-updater has had detection bugs across multiple releases — bookmark the download page and check manually if you suspect you are stale.

Setup flow

Choose Fresh Start to learn agent-native patterns. Importing VS Code or Cursor settings carries over keybindings and themes but also imports old reflexes. Fresh Start for the first month.

Sign in

Personal Gmail unlocks the free Gemini 3.1 Pro tier. Workspace/Enterprise Gmail requires the Enterprise tier.

Terminal Execution Policy — most consequential setup decision

Three options. Off (Allow List only) = maximum safety, agent cannot execute anything not on your allow list. Auto (Agent Decides) = balanced; agent auto-runs safe commands and asks for risky ones. Turbo (Deny List only) = maximum speed, executes everything except items on the deny list. Never use Turbo on production projects. There is a documented community incident (896 upvotes) of a Turbo-mode agent deleting an entire drive.
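
The three policies can be read as a small decision function. The sketch below is a mental model only, not Antigravity's actual logic. It assumes the Deny List is checked first in every mode, that list entries match as command prefixes, and that a `looks_risky` flag stands in for the agent's own judgment in Auto mode. `Policy`, `Decision`, and `decide` are hypothetical names.

```python
from enum import Enum


class Policy(Enum):
    OFF = "off"      # Allow List only
    AUTO = "auto"    # agent decides
    TURBO = "turbo"  # Deny List only


class Decision(Enum):
    RUN = "run"      # execute without asking
    ASK = "ask"      # pause for human approval
    BLOCK = "block"  # refuse outright


def matches(command: str, patterns: list[str]) -> bool:
    """Prefix match on whitespace-normalized text (assumed matching rule)."""
    normalized = " ".join(command.split())
    return any(normalized.startswith(" ".join(p.split())) for p in patterns)


def decide(command: str, policy: Policy, allow: list[str], deny: list[str],
           looks_risky: bool = False) -> Decision:
    if matches(command, deny):            # assumed: deny wins in every mode
        return Decision.BLOCK
    if policy is Policy.OFF:
        return Decision.RUN if matches(command, allow) else Decision.ASK
    if policy is Policy.AUTO:
        if looks_risky and not matches(command, allow):
            return Decision.ASK
        return Decision.RUN
    return Decision.RUN                   # TURBO: anything not denied runs
```

The last line is the whole argument against Turbo on production projects: any destructive command you did not think to deny in advance runs without a prompt.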

Open your first workspace

Create a focused project folder. The agent performs best with focused context. Resist pointing it at your entire ~/code directory — pick one project.

Initialize Git before anything else

Non-negotiable. git init && git add . && git commit -m "pre-agent baseline". Git is your undo button when an agent does something unexpected.

Pro setup: Create ~/antigravity-projects/ as a parent directory. One project per subfolder. Strict Mode enabled by default for any folder containing client code or production credentials. AGENTS.md committed to every workspace before the first agent invocation.

Module 04 · Settings Deep Dive

Every Setting, Every Panel, Every Recommended Value

Open settings with Cmd+, (Mac) or Ctrl+, (Windows/Linux). Below is the complete map with my recommended values for three contexts: Solo (your own projects), Team (shared codebases), and Enterprise (client work or production credentials present).

Settings → Terminal

Setting	Solo	Team	Enterprise
Terminal Execution Policy	Auto	Auto + Allow List	Off (Allow List only)
Allow List	Common dev commands	Curated team list	Minimal explicit list
Deny List	rm -rf, sudo, curl | sh	+ git push --force, DROP	+ all network commands
Terminal Integration	On	On	On

Settings → Models

Available since v1.18.x with the dedicated Models screen. Shows quota usage. As of April 2026 you can set per-agent default models in Manager View — assign Claude Opus 4.6 to your Reviewer agent and Gemini 3.1 Pro to your Builder for cost optimization.

Settings → Browser

Setting	Solo	Team	Enterprise
Domain Allowlist	Add localhost + your domains	Allowlist only	Allowlist + manual approval per nav
Default domains	Audit and trim	Remove webhook.site	Strip everything not explicit

Allowlist file location: `~/.gemini/antigravity/browserAllowlist.txt`

Default Browser Allowlist includes webhook.site — commonly used for data exfiltration in prompt-injection attacks. Remove it on any project handling credentials. Sources: ReadySetCompute and Antigravity.codes security audits, February 2026.
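
A quick audit of the allowlist file can be scripted. The sketch below assumes the file holds one domain or URL per line with optional `#` comments. That format is a guess, so check your own file before relying on it. `RISKY`, `risky_entries`, and `audit_allowlist` are hypothetical names.

```python
from pathlib import Path

# Only webhook.site is named in the text; extend this set for your own environment.
RISKY = {"webhook.site"}


def risky_entries(lines: list[str]) -> list[str]:
    """Return entries that equal, or are subdomains of, a risky domain."""
    flagged = []
    for raw in lines:
        entry = raw.strip().lower()
        if not entry or entry.startswith("#"):
            continue
        # Reduce "https://host/path" or "*.host" to the bare host name.
        host = entry.split("://")[-1].split("/")[0].removeprefix("*.")
        if any(host == bad or host.endswith("." + bad) for bad in RISKY):
            flagged.append(entry)
    return flagged


def audit_allowlist(path: Path) -> list[str]:
    """Flag risky entries in the allowlist file; a missing file yields no findings."""
    if not path.exists():
        return []
    return risky_entries(path.read_text(encoding="utf-8").splitlines())
```

Calling `audit_allowlist(Path.home() / ".gemini" / "antigravity" / "browserAllowlist.txt")` returns the entries worth reviewing. The script only reports; removing them stays a manual step.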

Settings → Customizations (Rules & Workflows)

Click the ... menu top-right and choose Customizations. Two tabs: Rules and Workflows. Both can be Global (every workspace) or per-Workspace.

Type	Global Path	Workspace Path
Rules (legacy)	`~/.gemini/GEMINI.md`	`<workspace>/GEMINI.md`
Rules (modern)	`~/.gemini/AGENTS.md`	`<workspace>/AGENTS.md`
Workflows	`~/.gemini/antigravity/workflows/`	`<workspace>/.agent/workflows/`
Skills	`~/.gemini/antigravity/skills/`	`<workspace>/.agent/skills/`

Settings → Advanced

Non-Workspace File Access: enabled by default (the agent can read and write outside your project). Disable on any sensitive project to prevent path-traversal exfiltration of ~/.ssh or ~/.aws.
Auto-continue: deprecated as a toggle in v1.20.5 (now default-on).
Follow-along mode (Manager): deprecated in v1.21.6.
Playground (Manager): deprecated in v1.21.6.

Settings → Review Policy

Three options: Always Review (every artifact gets your approval), Agent Decides (agent flags ones it thinks need review), Skip Review (no manual gates). Solo: Agent Decides. Team or Enterprise: Always Review.

Module 05 · Security

Strict Mode, the Drive-Deletion Incident, and Hardening

Antigravity defaults are tuned for productivity, not security. The agent can execute commands, read your .env, browse the web, and modify global configs. One widely reported community incident (896 upvotes) involved a Turbo-mode agent deleting an entire drive after interpreting a vague instruction destructively. Your security posture matters.

Strict Mode — the master override

Strict Mode is a hard override toggle introduced in Q1 2026. When enabled it:

Forces Request Review on every terminal, artifact, and browser action
Ignores your Allow List entirely — every command needs human approval
Isolates the agent to the current workspace (no global file access)
Denies network access for terminal commands
Cannot be temporarily disabled mid-session — exit and reconfigure

When to enable Strict Mode: any client project, anything containing .env with real credentials, anything checked out from a private repo, anything with database connection strings or API keys. Rule of thumb: if you would not let a brand-new contractor run the script blind, enable Strict Mode.

The five settings that prevent 95% of agent damage

Setting	Default (Risky)	Hardened
Terminal Execution Policy	Auto	Off (Allow List only)
Non-Workspace File Access	Enabled	Disabled
Browser Domain Allowlist	Includes webhook.site	Stripped to required only
MCP Tool Approval	Auto-invoke	Manual approval per call
Read .gitignored Files	Enabled	Disabled

Recommended terminal Deny List

Settings → Terminal → Deny List

# Filesystem destruction
rm -rf /
rm -rf ~
rm -rf *
sudo rm

# System control
sudo shutdown
sudo reboot
sudo passwd
chmod -R 777
chown -R

# Database destruction
DROP DATABASE
DROP TABLE
TRUNCATE TABLE

# Network exfiltration risk
curl | sh
wget | sh
curl --upload-file
nc -e
ssh-keygen -f

# Git history loss
git push --force origin main
git push -f origin master
git reset --hard HEAD~

# Credential exposure
cat ~/.ssh/
cat ~/.aws/credentials
cat .env

Sandbox the agent itself

Linux sandboxing landed in v1.21.6. On macOS and Windows the agent runs with your user permissions. For maximum isolation, run Antigravity inside a Docker container, a VM, or on a dedicated user account that does not own production credentials. The 30-second cost is worth it.

The cardinal rule. Antigravity agents are powerful, fast, and not infallible. Your job as mission controller is to catch mistakes before they reach disk, before they reach Git, and before they reach production. Always review. Always test. Always commit a baseline before the agent runs, not after.

Module 06 · Models & Pricing

Model Selection Matrix and the Credit System Reality

The five available models (April 2026)

Model	Provider	Context	Best for
Gemini 3.1 Pro	Google DeepMind	1M tokens	Default for Planning Mode, complex reasoning, long-context refactors
Gemini 3 Flash	Google DeepMind	1M tokens	Tab completions, fast iteration, cost-sensitive batch tasks
Claude Sonnet 4.6	Anthropic	200K	Strong code generation, second opinion on Gemini outputs
Claude Opus 4.6	Anthropic	200K	Maximum reasoning, architectural review, security audit prompts
GPT-OSS-120B	OpenAI	128K	Open-weight option for compliance-restricted projects

Pricing tiers (as of April 13, 2026)

Tier	Price	What you get
Free	$0	Rate-limited Gemini 3.1 Pro, daily credit allocation, full Antigravity feature set
Pro	$20/mo	Bundled with Google AI Pro. Higher agent request limits per Google’s official tier table. Best value for daily users who also use Gemini for non-coding tasks.
Ultra	$249.99/mo	Bundled with Google AI Ultra. Highest limits — but Ultra users have reported quota restrictions since the March 2026 credit-system change.

The credit system, honestly. Google moved from quota guarantees to credits in March 2026. Documentation does not clearly state whether unused credits expire. Even Ultra subscribers have reported lockouts. If predictable monthly cost matters more than multi-agent capability, Cursor at $20 with transparent usage limits is the safer bet. If multi-agent parallelism matters more, Antigravity wins.

Per-agent model strategy

In Manager View you can assign different models to different agents in the same mission. The high-leverage pattern:

Builder agents: Gemini 3.1 Pro (best long-context awareness)
Reviewer agents: Claude Opus 4.6 (sharpest critique on code quality)
Test-writer agents: Claude Sonnet 4.6 (consistent test patterns)
Doc-writer agents: Gemini 3 Flash (fast and cheap for prose)
Browser-tester agents: Gemini 3.1 Pro (multimodal vision matters)

Module 07 · Project Rules

AGENTS.md, GEMINI.md, and Rules Mastery

This is the single highest-leverage file in your project. AGENTS.md tells every agent in every session how to behave — code style, architecture rules, validation requirements, communication patterns. Antigravity loads it into every prompt automatically.

AGENTS.md vs GEMINI.md

Antigravity added AGENTS.md support in v1.20.5 (March 9, 2026). It now reads both AGENTS.md and the legacy GEMINI.md. AGENTS.md is the cross-tool standard — the same file works in Cursor, Codex CLI, Claude Code, and Antigravity. Use AGENTS.md as your default. Use GEMINI.md only when you have Antigravity-specific instructions you do not want other agents to see.

S-tier AGENTS.md template (the DDS pattern)

AGENTS.md

# Project: [Name]
# Stack: [tech stack one line]
# Last reviewed: 2026-04-13

## Identity
You are working in a production codebase. Output is shipped, not prototyped.
Every change must be reviewed before merge. Bias to safety over speed.

## Code Style — Non-Negotiable
- TypeScript strict mode. No `any` without a justifying comment.
- Functional components only (no React class components).
- Named exports only. No default exports.
- `const` over `let`. Never `var`.
- Error messages are user-facing: friendly, never expose stack traces.
- File length cap: 300 lines. Split larger files into focused modules.

## Architecture
- Feature-based folders: /features/[name]/{components,hooks,utils,types,__tests__}
- All API calls go through /lib/api client. Never raw fetch in components.
- Environment variables loaded through /lib/env.ts with Zod validation.
- No business logic in components. Logic lives in hooks or services.

## Testing — Mandatory Before Marking Complete
- Every function has a test file co-located in __tests__/.
- Minimum three cases per function: happy path, edge case, error case.
- Run `npm test` and report results before claiming task done.
- Never mark a task complete with failing tests.

## Validation Workflow
1. Read existing similar code first — match patterns.
2. Make change.
3. Run linter: npm run lint.
4. Run tests: npm test.
5. If frontend, use the browser sub-agent to verify rendered output.
6. Show me the diff before committing.

## Git
- Conventional commits: feat:, fix:, docs:, refactor:, test:, chore:
- Branch naming: feature/[ticket]-[slug] or fix/[ticket]-[slug]
- Never commit to main directly. Always branch.
- Never git push --force against shared branches.

## Forbidden
- Do not install packages without confirming with me first.
- Do not modify CI/CD configs.
- Do not touch /infra/ or /.github/workflows/.
- Do not read or echo .env contents.

## Communication
- If a requirement is ambiguous, ask one targeted question instead of guessing.
- If you find a bug unrelated to the task, note it in your final summary, do not fix it.
- End every task with: one-paragraph summary, test results, list of files changed.

Why this works. AGENTS.md loads into every prompt, so it consumes context. Keep it under 200 lines. Rules that only apply sometimes belong in Skills (Module 08) which load on-demand. Rules that always apply belong here.
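
The 200-line budget is easy to enforce mechanically, for example in a pre-commit hook. A minimal sketch follows; counting only non-blank lines is a choice made here, not something the text specifies, and `rules_budget` and `check_file` are hypothetical helper names.

```python
from pathlib import Path

MAX_LINES = 200  # budget suggested in the text ("under 200 lines")


def rules_budget(text: str, max_lines: int = MAX_LINES) -> tuple[int, bool]:
    """Count non-blank lines and report whether the count is under the budget."""
    count = sum(1 for line in text.splitlines() if line.strip())
    return count, count < max_lines


def check_file(path: Path) -> str:
    """Summarize one rules file, such as AGENTS.md, as a one-line report."""
    count, ok = rules_budget(path.read_text(encoding="utf-8"))
    status = "ok" if ok else "over budget: move situational rules into Skills"
    return f"{path.name}: {count} non-blank lines ({status})"
```

Running `check_file(Path("AGENTS.md"))` from the workspace root gives a one-line report that can fail a hook when the file grows past the budget.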

Module 08 · Skills

Skills Mastery — Progressive Disclosure and the 5 Patterns

Skills are the killer feature most users underutilize. A Skill is a directory-based package containing SKILL.md (with YAML frontmatter) and optional supporting assets. Antigravity uses Progressive Disclosure: it reads only the lightweight menu of Skill descriptions on every request, and loads the full Skill into context only when your intent matches the description.

Result: an agent that knows about hundreds of specialized capabilities but pays the context cost only for the ones it actively needs. This is how AgentKit 2.0 ships 40+ skills without bloating every prompt.
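
Progressive Disclosure can be pictured as two passes: build a cheap menu from frontmatter, then load a full Skill only on a match. The sketch below illustrates that idea under loud assumptions. The keyword-overlap matcher is a crude stand-in for whatever intent matching Antigravity really performs, the tiny frontmatter parser replaces a proper YAML library, and all function names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillEntry:
    name: str
    description: str
    path: Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse `key: value` frontmatter, joining indented continuation lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    key = None
    for line in lines[1:]:
        if line.strip() == "---":            # closing delimiter
            break
        if line[:1].isspace() and key:       # continuation of the previous value
            meta[key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            meta[key] = value.strip()
    return meta


def build_menu(skills_dir: Path) -> list[SkillEntry]:
    """Keep only name and description from each SKILL.md: the lightweight menu."""
    menu = []
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_file.read_text(encoding="utf-8"))
        if "name" in meta and "description" in meta:
            menu.append(SkillEntry(meta["name"], meta["description"], skill_file))
    return menu


def _words(text: str) -> set[str]:
    return {w.strip(".,()").lower() for w in text.split()}


def select_skills(request: str, menu: list[SkillEntry],
                  min_overlap: int = 2) -> list[SkillEntry]:
    """Assumed matcher: a skill is selected when enough longer words overlap."""
    wanted = {w for w in _words(request) if len(w) > 3}
    return [s for s in menu if len(wanted & _words(s.description)) >= min_overlap]


def load_full(entry: SkillEntry) -> str:
    """Only at this point is the full Skill body read into context."""
    return entry.path.read_text(encoding="utf-8")
```

Even with this naive matcher the trade-off is visible: a description full of generic words matches almost every request, while a terse one misses relevant requests.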

Skills directory structure

Skill anatomy

# Workspace scope (project-specific)
<workspace>/.agent/skills/my-skill/
├── SKILL.md          # Required: YAML frontmatter + instructions
├── scripts/           # Optional: Python/Bash/Node executables
│   └── run.py
├── references/        # Optional: API docs, cheatsheets, schemas
│   └── api-docs.md
└── assets/            # Optional: images, templates, fixtures

# Global scope (every project on this machine)
~/.gemini/antigravity/skills/my-skill/

SKILL.md frontmatter — what triggers Progressive Disclosure

The description field is the trigger. Antigravity matches user intent against this string. Vague descriptions get loaded too often (context bloat). Narrow descriptions get missed when relevant. Aim for one sentence that names the trigger conditions explicitly.

SKILL.md frontmatter pattern

---
name: shopify-section-author
description: Author Shopify Liquid sections following the DDS Atelier 3.4.0 standards.
  Triggers on requests to create, build, or scaffold a Shopify section (.liquid file in
  /sections/), or any mention of section schema, blocks, or theme settings. Does NOT trigger
  for product templates, snippets, or page templates — those are separate skills.
version: 2.1
scope: workspace
---

# Shopify Section Authoring Skill

When the user asks for a new Shopify section, follow this exact sequence...

The 5 Skill design patterns

Basic Router

Just SKILL.md with instructions. For style guides and constraint sets. Cheapest, most common.

Reference-Heavy

SKILL.md plus references/ with API docs or schemas. Agent loads the reference only when the skill is active. Best for library-specific knowledge.

Few-Shot Calibrator

SKILL.md plus references/examples/ with 3+ gold-standard outputs and 3+ anti-examples. Forces consistent format. The DDS investor pitch and portfolio pages use this pattern.

Tool Use (Executable)

SKILL.md plus scripts/ the agent can run to validate output. Skill describes when to invoke the script and how to interpret results. Powerful — use carefully with respect to your terminal policy.

All-in-One Domain

Everything combined. The DDS seo-magnet Skill uses this pattern: SKILL.md with the full system, references for schemas and meta tags, examples of compliant pages. Use sparingly — context cost is highest.

The Awesome Skills library

Community catalog at github.com/sickn33/antigravity-awesome-skills — installable library of 1,400+ SKILL.md playbooks for Antigravity, Cursor, Codex, Claude Code, and Gemini CLI. Install with npx antigravity-awesome-skills. Audit before installing globally; treat third-party Skills the same way you treat third-party MCP servers.

Module 09 · Workflows

Workflows — Saved Prompts as Slash Commands

Workflows are user-triggered prompt templates registered as /commands. Type / in chat and Antigravity shows your registered workflows. Where Rules are system instructions and Skills are on-demand expertise, Workflows are reusable orchestrations you fire intentionally.

Workflow file format

.agent/workflows/new-feature.md

---
description: Scaffold a complete feature with branch, types, components, hook, tests, and Storybook story
---

When the user types `/new-feature <name> <description>`:

1. Verify we are on `main` and pull latest.
2. Create branch: `feature/<ticket>-<slug>` (ask for the ticket ID if it was not given; derive the slug from <name>).
3. Create directory: /features/<name>/{components,hooks,utils,types,__tests__}.
4. Generate TypeScript types from the description.
5. Create base component with typed props interface.
6. Create custom hook for business logic.
7. Generate three test cases per public function (happy/edge/error).
8. Create Storybook story with default + interactive variants.
9. Update barrel exports in /features/<name>/index.ts.
10. Run linter and tests. Report results.
11. Show diff. Wait for approval before commit.

The /startcycle pattern (Codelab-validated)

Google’s official codelab demonstrates an autonomous developer pipeline using /startcycle. The workflow chains personas defined in AGENTS.md through skills defined in .agent/skills/ — Product Manager writes spec, Engineer codes it after approval, QA tests, DevOps deploys. This is the foundation of multi-agent autonomous app generation.

.agent/workflows/startcycle.md

---
description: Start the Autonomous AI Developer Pipeline with a new idea
---

When the user types `/startcycle <idea>`, orchestrate strictly using
AGENTS.md personas and .agent/skills/ capabilities.

### Execution sequence

1. Act as Product Manager. Run `write_specs` skill with <idea>.
   Output: Technical_Specification.md. WAIT for user approval comments.

2. Once approved, act as Engineer. Read approved spec.
   Run `implement_backend` then `implement_frontend` skills.

3. Act as QA Engineer. Run `generate_tests` and `run_tests` skills.
   If failures, return to Engineer with failure context. Loop max 3x.

4. Act as DevOps Master. Install dependencies, serve the app,
   open browser sub-agent to verify the running application.

5. Compile final report: spec, files created, test results, deployment URL.
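
The control flow of this workflow (approval gate, bounded QA loop, final report) can be written down as ordinary code to check the logic. The sketch below is only a model of the sequence. In Antigravity the steps are carried out by the agent following the workflow text, and every callable here (`write_spec`, `approved`, `implement`, `run_tests`, `deploy`) is a hypothetical placeholder.

```python
from typing import Callable, Optional

MAX_FIX_ROUNDS = 3  # "Loop max 3x" from the workflow


def run_cycle(idea: str,
              write_spec: Callable[[str], str],
              approved: Callable[[str], bool],
              implement: Callable[[str, Optional[str]], str],
              run_tests: Callable[[str], Optional[str]],
              deploy: Callable[[str], str]) -> dict:
    """Spec, approval gate, build, bounded QA loop, deploy, report."""
    spec = write_spec(idea)
    if not approved(spec):
        return {"status": "awaiting_approval", "spec": spec}

    build = implement(spec, None)
    failure = run_tests(build)            # None means all tests passed
    rounds = 0
    while failure is not None and rounds < MAX_FIX_ROUNDS:
        rounds += 1
        build = implement(spec, failure)  # back to Engineer with failure context
        failure = run_tests(build)
    if failure is not None:
        return {"status": "tests_failing", "spec": spec,
                "fix_rounds": rounds, "last_failure": failure}

    return {"status": "deployed", "spec": spec, "fix_rounds": rounds,
            "url": deploy(build)}
```

The explicit bound matters: without it, a QA step that can never pass would keep an agent burning credits indefinitely.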

Module 10 · Manager View

Manager View Mastery — True Multi-Agent Parallelism

Manager View is the feature competitors have not matched. You spawn multiple autonomous agents that run independently in parallel workspaces. They generate Artifacts you review on your schedule. The killer use case: five independent bugs fixed in the time it takes to fix one.

Anatomy of the Manager interface

Workspaces sidebar: each agent in its own folder context
Conversations: separate threads per workspace, multiple per workspace
Artifacts pane: task checklists, implementation plans, screenshots, browser recordings
Changes sidebar: Git-style diff of every modification awaiting your review
Toggle to Editor: Cmd+E (Mac) or Ctrl+E (Windows/Linux) at any time

Planning Mode vs Fast Mode

Planning Mode (default and recommended): the agent produces a task checklist and implementation_plan.md before writing any code. You comment on the plan inline before execution begins. Fast Mode: skips planning, immediate code generation. Acceptable only for trivial single-file fixes.

Always use Planning Mode for tasks touching 3+ files or any task involving an architectural decision. The plan is your contract with the agent. It prevents 90% of the “what did you do that for” rework.

The parallel-bug-fix pattern

This is the workflow that justifies Antigravity’s existence in one session. Before bed, spawn five agents on five independent issues. Wake up to five completed PRs awaiting review.

Advanced · Manager View

Five-Agent Parallel Bug Sweep

In Manager View, spawn five parallel agents on these independent bugs. Each agent in its own workspace conversation. Use Planning Mode. Do not let agents touch files outside their assigned scope.

Agent 1: Auth race condition
Scope: /server/auth/refresh.ts only
Bug: refresh token validation has a race where two simultaneous refresh requests can both succeed, issuing two valid token pairs. Fix and write a test that fires 50 concurrent refresh requests and asserts only one pair succeeds.

Agent 2: WebSocket memory leak
Scope: /server/ws/handler.ts and /server/ws/registry.ts only
Bug: disconnected client objects retained in the registry. Memory grows over time. Fix and write a test that connects and disconnects 1000 clients, asserts registry size returns to 0.

Agent 3: CLS layout shift on /products
Scope: /client/pages/products/ and related styles only
Bug: Lighthouse CLS > 0.25 on /products. Image grid causes shift on load. Fix using width/height attributes or aspect-ratio CSS. Verify Lighthouse via browser sub-agent.

Agent 4: N+1 query in /api/orders
Scope: /server/api/orders/ only
Bug: each order triggers a separate query for line items. p95 response time > 800ms. Fix with an eager join or DataLoader. Add a test asserting query count < 5 for 100 orders.

Agent 5: Safari lazy-load failure
Scope: /client/components/LazyImage.tsx only
Bug: IntersectionObserver fallback path missing for older Safari. Images never load. Fix with feature detection. Test in browser sub-agent on Safari user agent string.

After all five complete, compile BUGFIX_REPORT.md with: root cause, fix summary, test added, and files touched per agent.

Module 11 · Browser Sub-Agent

Browser Sub-Agent — Autonomous Visual and Functional Testing

The Browser Sub-Agent is exclusive to Antigravity. The agent launches a built-in Chromium instance, navigates your app, clicks buttons, fills forms, captures screenshots, and records sessions. Cursor and Cline cannot do this as of April 2026.

What it can verify

Visual regression: screenshot at multiple breakpoints, compare against design
User journey integrity: click-through entire flows, assert expected outcomes
Console errors: catch JS errors during interaction
Network calls: assert API requests fire correctly with expected payloads
Lighthouse scores: run audits and report Performance, Accessibility, SEO, Best Practices
Accessibility: tab-key navigation, focus trap, ARIA attribute presence

Visual regression prompt

Intermediate · Browser

Multi-Breakpoint Visual Audit

Open http://localhost:3000 in the browser sub-agent. Capture screenshots at these viewports:
- 375px (iPhone SE)
- 414px (iPhone Pro Max)
- 768px (iPad portrait)
- 1024px (iPad landscape)
- 1440px (desktop)
- 1920px (large desktop)

For each viewport assert:
1. No horizontal scroll (document.documentElement.scrollWidth <= document.documentElement.clientWidth)
2. Primary nav reachable (visible button OR hamburger menu)
3. Hero text not truncated or clipped
4. All images have width and height attributes (no layout shift)
5. Touch targets >= 44x44px on viewports under 768px
6. Text contrast meets WCAG AA (use Lighthouse accessibility audit)

Output:
- Screenshots saved to /qa/screenshots/<viewport>.png
- Annotated report at /qa/visual-audit-report.md with pass/fail per viewport per check
- Lighthouse score table for all four categories at 1440px
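Two of the checks in this prompt are pure arithmetic once the page has been measured, so they can be expressed as a small function the agent (or you) can unit-test. This is a sketch under assumptions: `ViewportMeasurement` and `auditViewport` are hypothetical names, and the measurements are assumed to come from the browser sub-agent. The scroll check compares `scrollWidth` against `clientWidth` with `<=` because a visible vertical scrollbar makes the content area narrower than `window.innerWidth`, so a strict equality test against `innerWidth` can fail on a page with no overflow at all.

```typescript
interface TargetSize {
  width: number;
  height: number;
}

interface ViewportMeasurement {
  viewportWidth: number; // emulated viewport width in CSS pixels
  clientWidth: number;   // document.documentElement.clientWidth
  scrollWidth: number;   // document.documentElement.scrollWidth
  touchTargets: TargetSize[];
}

interface ViewportResult {
  noHorizontalScroll: boolean;
  touchTargetsOk: boolean;
}

const MOBILE_BREAKPOINT = 768; // touch-target rule applies below this width
const MIN_TOUCH_TARGET = 44;   // minimum width and height in CSS pixels

function auditViewport(m: ViewportMeasurement): ViewportResult {
  const noHorizontalScroll = m.scrollWidth <= m.clientWidth;
  const touchTargetsOk =
    m.viewportWidth >= MOBILE_BREAKPOINT ||
    m.touchTargets.every(
      (t) => t.width >= MIN_TOUCH_TARGET && t.height >= MIN_TOUCH_TARGET,
    );
  return { noHorizontalScroll, touchTargetsOk };
}
```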

End-to-end journey prompt

Advanced · Browser

Full Purchase Flow Verification

Test the complete purchase journey using the browser sub-agent:
1. Navigate to homepage. Assert hero loads, all featured products visible.
2. Click first product. Assert product page loads with image, title, price, add-to-cart button.
3. Select size variant (medium). Click Add to cart. Assert cart drawer opens with the item.
4. Click Checkout. Assert checkout page loads with shipping form.
5. Fill shipping form with: Test User, 38 Beacon St, Boston, MA 02208.
6. Click Continue to payment. Assert Stripe Elements iframe loads.
7. Use Stripe test card 4242 4242 4242 4242 with any future expiry and any CVC.
8. Click Pay. Wait up to 15 seconds for redirect.
9. Assert order confirmation page loads with order number visible.
10. Verify confirmation email request fires (check network tab).

Capture screenshot at each step. Compile journey-test-report.md with timing per step, any console errors, and pass/fail. Do not commit any test data to the database; use the staging environment URL provided in .env.

Module 12 · MCP Servers

MCP Server Integration — Power Without Compromise

MCP (Model Context Protocol) servers extend Antigravity agents with external tool access — databases, APIs, file systems, third-party services. Each server adds capability and attack surface in equal measure. Audit before installing.

Where MCP config lives

Global: ~/.gemini/antigravity/mcp_config.json. Workspace: <workspace>/.agent/mcp_config.json. A compromised workspace can write to the global file — review the global config periodically.

Recommended safe-starter MCP set

Server	Capability	Risk Level	Approval
Filesystem MCP	Scoped file read/write	Low (if scoped)	Auto OK
Git MCP	Read repo state, branch info	Low	Auto OK
Postgres MCP (read-only)	Inspect schema, run SELECT queries	Medium	Manual
Playwright MCP	Browser automation beyond sub-agent	Medium	Manual
Shopify Storefront MCP	Product/collection lookups via Storefront API	Low (read-only token)	Auto OK
Stripe MCP	Payment/refund operations	High	Always Manual
Shell-execute MCP	Arbitrary terminal commands	Critical	Strict Mode required

The MCP audit checklist (run before every install)

Read every tool the server exposes — not just its description
Check the server’s source repository — recent commits, active maintainers, GitHub stars
Verify whether the server runs locally or phones home to a remote endpoint
Set MCP Tool Approval to Manual for any server with write or network capabilities
Add the server’s required env vars to .env.example so collaborators know what’s needed
Remove unused servers — context window cost is real
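The risk-to-approval mapping in the starter table can be written down as data and checked mechanically during an audit. The sketch below is illustrative only: `McpServerEntry`, `REQUIRED_APPROVAL`, and `underProtected` are hypothetical names, the approval labels are shorthand for the table's Approval column, and none of this reflects Antigravity's actual configuration schema.

```typescript
type RiskLevel = "low" | "medium" | "high" | "critical";
type ApprovalMode = "auto" | "manual" | "always-manual" | "strict-mode";

// Minimum approval per risk level, taken from the starter table.
const REQUIRED_APPROVAL: Record<RiskLevel, ApprovalMode> = {
  low: "auto",
  medium: "manual",
  high: "always-manual",
  critical: "strict-mode",
};

// Ordered from least to most restrictive.
const STRICTNESS: ApprovalMode[] = ["auto", "manual", "always-manual", "strict-mode"];

interface McpServerEntry {
  name: string;
  risk: RiskLevel;
  configured: ApprovalMode; // what the server is currently set to
}

// Returns the names of servers whose configured approval is weaker than
// the minimum their risk level calls for.
function underProtected(servers: McpServerEntry[]): string[] {
  return servers
    .filter(
      (s) =>
        STRICTNESS.indexOf(s.configured) <
        STRICTNESS.indexOf(REQUIRED_APPROVAL[s.risk]),
    )
    .map((s) => s.name);
}
```

Running a check like this against your own inventory before each install turns the checklist into something a pre-commit hook or a review agent can enforce.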

MCP + AGI swarms. MCP servers are the bridge between your Antigravity agents and the outside world. When we get to Module 16 (AGI Swarms) and Module 18 (Storage), you will use MCP as the standard interface — Postgres MCP for your event log, Filesystem MCP for artifact storage, a custom Vector DB MCP for semantic memory. MCP is the plumbing.

Module 13 · The DDS S-Tier Methodology

How I Built the $5.85B Sovereign AGI Suite Solo

Vibe coding is not “letting AI write code.” It is a structured methodology where you architect intent, constraints, and quality while AI handles syntax. I have built 15 synthetic employees automating $11.1M+ in annual labor at $0/month hosting cost over 14 months, solo. Below is the operational playbook.

The Five Pillars

Intent Over Syntax

Describe what to build and why. Never dictate exact code. The model frequently finds better patterns than you would have specified. Your job is to verify the pattern, not to author it.

Context Is the Multiplier

Use AGENTS.md plus dedicated files: brand.md, architecture.md, security.md. The agent loads them automatically. You stop repeating yourself every session.

Plan Before Build

Always Planning Mode for non-trivial work. Comment on the plan before code is written. Catches 90% of misalignment before it becomes rework.

Iterate in Layers

First pass: structure. Second pass: logic. Third pass: UX polish. Fourth pass: tests and edge cases. Trying to perfect everything in one prompt produces nothing perfect.

Verify Everything

Read every diff. Run every test. Use the browser sub-agent. 10x speed only works with 100% verification discipline. The moment you start trusting the agent without checking, the agent starts shipping bugs you cannot debug.

Meta-prompting — the expert technique

Use one AI to write prompts for another. Describe your project to Gemini chat (free, separate context) in plain language. Ask it to write a structured technical spec. Paste the spec into Antigravity. Consistently produces better results than hand-crafted prompts because the model knows what other models need.

The DDS Sovereign AGI Suite — proof of methodology

Not theory. Production systems running today. Internal valuation totals $5.85B across the three flagships and the broader portfolio per the March/April 2026 audit reports.

FLAGSHIP V4.0

Sovereign Orchestrator Pro

Meta-orchestration layer

Top-of-stack coordinator. Routes tasks across the entire synthetic employee fleet. April 2026 audit valuation: $2.5B.

$2.5B · V4.0 · Meta-orchestration

FLAGSHIP V1.0

AGI-CORE-Pro · The Synthetic Director

Multi-platform content production

Generates platform-optimized content across 8+ channels in parallel from a single brief. Internal valuation: $1.15B.

$1.15B · V1.0 · Launched Mar 2026

FLAGSHIP V3.0

NICHE-FORGE-CORE

Agency-in-a-box ecosystem

End-to-end niche-specific content and growth ecosystem. Ecosystem valuation: $2.2B.

$2.2B · V3.0 · Ecosystem

SCALE

The Suite Total

15 synthetic employees

Combined $5.85B Sovereign AGI Suite includes Atelier OS Theme Engine v3.4.0 and the Sovereign Synthetic Empire Dashboard among the 15 production systems.

$5.85B · 15 employees · $0/mo hosting

Honest disclosure. The dollar values above are internal-audit valuations, not external market validations. They represent labor cost displaced and synthesized output value at 2026 market rates per the audit methodology documented in the DDS investor pitch ($39M–$68M range, most probable $48M–$58M). The methodology, the build velocity, and the operating cost ($0/month) are the real proof.

Module 14 · 45 Engineering Prompts

45 Production-Grade Prompts — Tested, Tagged, Paste-Ready

Every prompt below is tested in Antigravity v1.21.6 with Gemini 3.1 Pro. Tagged by skill level. Use Planning Mode for everything Intermediate or above.

Scaffolding (P1–P6)

Beginner

P1 · Three-Surface Verification

Create a minimal Express server at /verify with one route GET / returning {status:"ok",ts:Date.now()}. Initialize package.json, install express, start the server on port 3030, then use the browser sub-agent to fetch http://localhost:3030 and screenshot the JSON response. Confirm editor, terminal, and browser surfaces all worked.

Intermediate

P2 · Monorepo Full-Stack Scaffold

Plan and scaffold a TypeScript monorepo using npm workspaces. Apps: /apps/web (Next.js 15 App Router, Tailwind, Shadcn/ui), /apps/api (Express, Prisma, Postgres), /packages/types (shared Zod schemas). Root scripts: "dev" runs both apps concurrently, "lint" runs both projects, "test" runs both. Add ESLint flat config, Prettier, Husky pre-commit, and lint-staged. Generate .env.example for both apps. Include a README with one-command setup. Use Planning Mode and show me the plan before coding.

Intermediate

P3 · Next.js 15 SaaS Starter with Auth

Build a Next.js 15 SaaS starter: marketing landing, /login (email + Google OAuth via NextAuth v5), /dashboard (protected), /dashboard/settings. Use Tailwind, Shadcn/ui, Prisma, Postgres. Pages: hero, 6-feature grid, 3-tier pricing, CTA. Middleware protects /dashboard/*. Dark-mode toggle via next-themes. Loading skeletons. .env.example documented. README with one-command setup. Plan first.

Beginner

P4 · Python FastAPI Service

Build a Python FastAPI service for a tasks API. SQLAlchemy + SQLite. Models: Task(id, title, description, status, priority, created_at, updated_at). Routes: GET /tasks (?status filter), GET /tasks/{id}, POST /tasks, PATCH /tasks/{id}, DELETE /tasks/{id}. Pydantic v2 schemas. Pytest tests for all routes covering happy/edge/error. Run all tests in terminal and show output before completing.

Intermediate

P5 · Vite + React 19 + TanStack Stack

Scaffold a Vite + React 19 + TypeScript SPA using TanStack Router (file-based routing) and TanStack Query. Tailwind v4. Three routes: /, /products, /products/$id. Mock products with MSW (Mock Service Worker). Skeleton loaders. Error boundaries. Vitest setup with one component test per route. Build and verify in browser sub-agent.

Intermediate

P6 · Astro Content Site

Build an Astro 5 content site for a developer blog. Content collections for posts (Markdown frontmatter: title, date, tags, draft). Routes: /, /posts, /posts/[slug], /tags/[tag]. RSS feed at /rss.xml. Sitemap at /sitemap.xml. Reading time per post. View transitions enabled. Lighthouse-validated 95+ on Performance/Accessibility/SEO via browser sub-agent.

Frontend & UI (P7–P14)

Intermediate

P7 · Pixel-Accurate Recreation from Screenshot

Look at the attached design screenshot. Recreate it in React + TypeScript + Tailwind. Match layout to within 95% pixel accuracy. All charts use Recharts with sample data (no empty states). Responsive collapse to single column at 768px. Use the browser sub-agent to compare your output against the screenshot at 1440px and iterate until accuracy threshold is met. Report the diff in your final summary.

Advanced

P8 · Component Library with Storybook

Plan and build a React 19 + TS + Tailwind component library. Components: Button (4 variants x 3 sizes x loading), Input (4 types + error + helper), Card (3 variants + slots), Modal (centered + slide-over), Toast (4 types + auto-dismiss), Avatar (image + initials + status), Badge (4 variants + removable), Tabs (3 styles + controlled/uncontrolled), Dropdown (single + multi + searchable), DataTable (sortable + paginated + selection + empty). Each: Storybook story with all variants, Vitest test, ARIA, keyboard nav. Build one component fully before moving to next.

Intermediate

P9 · Animated Marketing Page

Build a single-page animated marketing site for a fictional AI productivity app called "ThreadLoom". Pure HTML + CSS + vanilla JS, no frameworks. Sections: Hero with typing animation, Features (3 cards with reveal-on-scroll), 3-tier Pricing, Testimonials carousel (auto + manual), FAQ accordion, Footer. IntersectionObserver for fade-ins. Smooth scroll. Sub-50KB total payload. Verify performance with browser sub-agent Lighthouse run.

Advanced

P10 · Accessible Modal System

Build a production-grade Modal component in React + TS + Tailwind. Requirements: focus trap (Tab cycles within modal), Escape closes, click outside closes (configurable), aria-modal="true", aria-labelledby pointing to title, restore focus to trigger on close, scroll-lock body without layout shift, support nested modals, animate in/out via Framer Motion. Tests: jest-axe for a11y, RTL for behavior. Document API in JSDoc.
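The focus-trap requirement reduces to wrap-around index arithmetic over the modal's focusable elements. Here is a minimal sketch of that core, assuming the caller has already collected the focusable elements in DOM order; `nextFocusIndex` is a hypothetical helper, not part of React or any library.

```typescript
// Returns the index of the element that should receive focus next.
// current: index of the focused element, or -1 if focus is outside the list.
// count:   number of focusable elements inside the modal.
// shiftKey: true for Shift+Tab (move backwards).
function nextFocusIndex(current: number, count: number, shiftKey: boolean): number {
  if (count <= 0) return -1; // nothing focusable; caller should focus the dialog itself
  if (current < 0) return shiftKey ? count - 1 : 0; // entering the trap
  const step = shiftKey ? -1 : 1;
  return (current + step + count) % count; // wrap at both ends
}
```

In a keydown handler you would call `preventDefault()` on Tab, compute the index with this function, and call `.focus()` on the matching element.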

Intermediate

P11 · Form Builder with Zod

Build a typed form builder in React + TS using react-hook-form + Zod. Schema-driven: pass a Zod schema and get a fully rendered form with field-level validation, error messages, async submit handler, dirty/touched state indicators, submit button disabled until valid. Support: text, email, password, number, select, multi-select, radio, checkbox, textarea, date, file. Demo with a 12-field user-onboarding schema. Tests cover validation paths.

Advanced

P12 · Virtualized Data Grid

Build a virtualized data grid in React + TS handling 100,000 rows smoothly. Use TanStack Virtual for row virtualization. Features: column resize, sort, multi-column filter (text, number range, date range, multi-select), row selection (single + multi + range), sticky header, sticky first column option, CSV export of visible rows. Performance budget: 60fps scroll, <100ms filter response. Verify with browser sub-agent on a 100k-row demo.
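Row virtualization works because only the rows intersecting the viewport (plus a small buffer) are rendered. The sketch below shows the window calculation under the simplifying assumption of a fixed row height; `visibleRange` is a hypothetical function for illustration and is not TanStack Virtual's API, which also handles measured and variable row sizes.

```typescript
interface VisibleRange {
  start: number; // first row index to render (inclusive)
  end: number;   // last row index to render (exclusive)
}

function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
  overscan = 5, // extra rows rendered above and below to hide scroll gaps
): VisibleRange {
  const first = Math.floor(scrollTop / rowHeight);
  // Exclusive index of the last row touched by the viewport's bottom edge.
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(totalRows, last + overscan),
  };
}
```

With 100,000 rows at 30px each and a 600px viewport, at most 30 rows exist in the DOM at a time, which is what makes a 60fps scroll budget plausible.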

Intermediate

P13 · Dark Mode Without Flash

Implement a dark/light/system theme system in Next.js 15 App Router. Requirements: zero flash of incorrect theme on first paint, persists across navigation, syncs across tabs, respects prefers-color-scheme on first visit, toggle in nav with three states. Use a server component for the inline script that runs before hydration. Tailwind dark variant. Verify no flash of incorrect theme in the browser sub-agent by reloading 5 times.

Intermediate

P14 · Multi-Step Wizard with State Machine

Build a 5-step onboarding wizard using XState v5. Steps: account then profile then preferences then integrations then review. State machine handles back/forward/skip rules, optional steps, validation gates per step. Persist progress to localStorage. Resume from last step on reload. Test all transition paths with @xstate/test.

Backend & APIs (P15–P22)

Advanced

P15 · GraphQL API with DataLoader

Build a GraphQL API in TypeScript using Apollo Server 4 + Prisma + Postgres. Schema: User, Post, Comment with relations. JWT auth (15min access + 7day refresh in httpOnly cookie). Role-based authz (ADMIN, USER). Queries with cursor pagination + relay-style edges. DataLoader to prevent N+1. Subscriptions for newComment via WebSocket. Rate limit 100 req/min/IP. Seed 5 users, 20 posts, 50 comments. Test every operation.
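The N+1 prevention in this prompt rests on one idea: collect every key requested during the same tick and resolve them with a single batched query. The sketch below is a hand-rolled illustration of that batching idea, not the `dataloader` package's API; `TinyLoader` and `BatchFn` are hypothetical names, and it omits the per-request caching a production loader would add.

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;

interface PendingLoad<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (error: unknown) => void;
}

class TinyLoader<K, V> {
  private pending: PendingLoad<K, V>[] = [];
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.pending.push({ key, resolve, reject });
      // The first load in a tick schedules one flush for the whole batch.
      if (this.pending.length === 1) {
        void Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.pending;
    this.pending = [];
    try {
      // batchFn must return values in the same order as the keys it received.
      const values = await this.batchFn(batch.map((entry) => entry.key));
      batch.forEach((entry, i) => entry.resolve(values[i] as V));
    } catch (error) {
      batch.forEach((entry) => entry.reject(error));
    }
  }
}
```

In a resolver, `author: (post) => userLoader.load(post.authorId)` then issues one `WHERE id IN (...)` query per tick rather than one query per post.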

Intermediate

P16 · Job Queue with BullMQ

Build a job processing system using BullMQ + Redis. Three queues: emails (send transactional), reports (generate PDFs), webhooks (deliver with retry). Each: typed job payloads with Zod, concurrency tuning, retry with exponential backoff (max 5), dead letter queue for terminal failures, Bull Board UI at /admin/queues protected by basic auth. Sample workers. Demo by enqueueing 50 emails and verifying processing.
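The retry policy in this prompt (exponential backoff, at most 5 attempts, then dead letter) can be reasoned about as a pure decision function before any queue is wired up. This is a sketch of a common doubling formula; `onJobFailure` and `FailureOutcome` are hypothetical names and this is not BullMQ's API, where backoff is configured through job options.

```typescript
type FailureOutcome =
  | { action: "retry"; delayMs: number }
  | { action: "dead-letter" };

// attemptsMade: how many times the job has run so far, including the
// attempt that just failed (1 for the first failure).
function onJobFailure(
  attemptsMade: number,
  maxAttempts = 5,
  baseDelayMs = 1000,
): FailureOutcome {
  if (attemptsMade >= maxAttempts) {
    return { action: "dead-letter" }; // terminal failure: park for inspection
  }
  // Delay doubles with each failed attempt: base, 2x base, 4x base, ...
  return { action: "retry", delayMs: baseDelayMs * Math.pow(2, attemptsMade - 1) };
}
```

With the defaults, a job that keeps failing waits 1s, 2s, 4s, and 8s between attempts and lands in the dead letter queue after the fifth.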

Intermediate

P17 · WebSocket Chat with Presence

Build a real-time chat: Node + Socket.IO + TS backend, React + TS frontend. Rooms (create/join/leave), username on connect, typing indicators, online presence per room, 50-message history in memory, system messages, basic emoji picker (20 emojis), responsive mobile. Open two browser tabs in the sub-agent and demonstrate live messaging between them with screenshots.

Advanced

P18 · tRPC End-to-End Type Safety

Set up a Next.js 15 App Router project with tRPC v11. Define routers for users (CRUD), posts (CRUD with author relation), and comments (nested under posts). Zod input validation. Procedure middleware for auth and rate limiting. React Query integration. Optimistic updates on mutations. End-to-end types from server to client. Demo page exercising every procedure.

Intermediate

P19 · File Upload with Resumable Chunks

Build resumable file upload using tus protocol. Backend: Node + tus-node-server. Frontend: React drag-drop with Uppy. Support files up to 5GB. Pause/resume across page reloads. Show progress per file. Server stores uploads to local disk (configurable to S3 later). Verify end-to-end by uploading a 200MB test file with pause/resume.

Intermediate

P20 · Webhook Receiver with Signature Verification

Build a webhook receiver service in TypeScript. Endpoints for Stripe, GitHub, and a generic HMAC-SHA256 verified provider. Each: signature verification middleware, idempotency by event ID stored in Redis with 24hr TTL, async processing via BullMQ, replay endpoint for failed events. Document the test webhooks for each provider in README.
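Idempotency by event ID means a delivery is processed only if its ID has not been claimed inside the TTL window. The sketch below shows that logic with an in-memory map and an injectable clock so it can be tested without waiting; `IdempotencyStore` is a hypothetical class. In the Redis-backed version the prompt describes, the same claim is typically a single `SET key value NX EX 86400` command.

```typescript
class IdempotencyStore {
  private readonly expiresAt = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns true if this call claimed the event (first delivery in the
  // window) and false if it is a duplicate that should be acknowledged
  // without reprocessing.
  claim(eventId: string): boolean {
    const current = this.now();
    const expiry = this.expiresAt.get(eventId);
    if (expiry !== undefined && expiry > current) return false;
    this.expiresAt.set(eventId, current + this.ttlMs);
    return true;
  }
}
```

Respond 2xx to duplicates as well: providers retry on non-2xx responses, so rejecting a replayed event just produces more replays.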

Advanced

P21 · Multi-Tenant DB with Row-Level Security

Implement multi-tenancy in Postgres using row-level security. Single database, shared schema, tenant_id column on every row. RLS policies enforce tenant isolation at the DB layer. Express middleware runs SET app.tenant_id per request based on the JWT claim. Prisma client extension automatically scopes queries. Test isolation by querying as Tenant A and asserting Tenant B data is invisible even with raw SQL.

Advanced

P22 · Event Sourcing Skeleton

Build an event-sourcing skeleton for an order system in TypeScript. Event store in Postgres (event_log table: aggregate_id, version, type, payload, timestamp). Aggregate: Order with events OrderPlaced, ItemAdded, ItemRemoved, OrderShipped, OrderCancelled. Apply events to rebuild current state. Snapshots every 50 events. Read model projection writes to an orders_view table on each event. Demo with 100 events and verify view consistency.
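The core of this prompt is that current state is never stored directly; it is a left fold over the event log, optionally starting from a snapshot. A minimal sketch of that fold follows. The event names come from the prompt, while the payload fields (`orderId`, `sku`, `qty`) and the `OrderState` shape are assumptions made for illustration.

```typescript
type OrderEvent =
  | { type: "OrderPlaced"; orderId: string }
  | { type: "ItemAdded"; sku: string; qty: number }
  | { type: "ItemRemoved"; sku: string }
  | { type: "OrderShipped" }
  | { type: "OrderCancelled" };

interface OrderState {
  orderId: string | null;
  status: "new" | "placed" | "shipped" | "cancelled";
  items: Record<string, number>; // sku -> quantity
  version: number;               // number of events applied so far
}

const initialState: OrderState = { orderId: null, status: "new", items: {}, version: 0 };

// Pure transition: never mutates the previous state.
function applyEvent(state: OrderState, event: OrderEvent): OrderState {
  const version = state.version + 1;
  switch (event.type) {
    case "OrderPlaced":
      return { ...state, orderId: event.orderId, status: "placed", version };
    case "ItemAdded": {
      const qty = (state.items[event.sku] ?? 0) + event.qty;
      return { ...state, items: { ...state.items, [event.sku]: qty }, version };
    }
    case "ItemRemoved": {
      const items = { ...state.items };
      delete items[event.sku];
      return { ...state, items, version };
    }
    case "OrderShipped":
      return { ...state, status: "shipped", version };
    case "OrderCancelled":
      return { ...state, status: "cancelled", version };
  }
}

// Rebuild from scratch, or from a snapshot plus the events recorded after it.
function rebuild(events: OrderEvent[], snapshot: OrderState = initialState): OrderState {
  return events.reduce(applyEvent, snapshot);
}
```

Snapshots are only an optimization: replaying everything and replaying from a snapshot must produce the same state, which makes that equality a useful consistency test for the demo.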

Testing & Quality (P23–P28)

Intermediate

P23 · Comprehensive Test Suite Generator

Analyze the existing codebase. Identify all functions, classes, API routes, and React components. For each, generate: unit tests (Vitest) covering happy/edge/error, integration tests for API routes (supertest), component tests (React Testing Library). Create test utilities: factory functions for entities, custom matchers. Configure coverage thresholds at 80%. Run full suite, generate HTML report, identify untested code paths. Fix any failing tests before declaring complete.

Advanced

P24 · Playwright E2E Suite

Create an end-to-end Playwright suite for the app in this workspace. Tests: auth (register/login/logout/protected), CRUD (create/read/update/delete with verification), validation (empty form errors, recovery), navigation (all links, breadcrumbs, back), responsive (run each at 375/768/1280), accessibility (focus, keyboard nav, ARIA), error states (simulate API failure with network interception). Page Object Model. 3 browser projects (Chromium/Firefox/WebKit). Screenshots on failure. HTML report. Run the full suite and show me the report.

Intermediate

P25 · Property-Based Testing

Add property-based tests to the codebase using fast-check. Identify pure functions and write properties (not examples) for each: idempotency, commutativity, identity, inverses where applicable. Run with 1000 iterations per property. For any failures, fast-check shrinks to a minimal counterexample — include those in a PROPERTIES.md report.

Intermediate

P26 · Mutation Testing

Run mutation testing with Stryker on this TypeScript codebase. Configure Stryker for the project. Generate baseline mutation score. Identify surviving mutants — these are tests that did not catch deliberate bugs. For the 10 most impactful surviving mutants, write additional tests that kill them. Re-run and report new mutation score.

Advanced

P27 · Visual Regression with Percy-Style Snapshots

Implement visual regression testing using Playwright's built-in screenshot comparison. For each page in the app, capture baseline screenshots at 3 viewports. Configure tolerance (maxDiffPixelRatio: 0.01). On failure, save diff images. Add a "blessed" workflow to update baselines intentionally. CI integration: fail PRs that introduce visual regressions.

Advanced

P28 · Accessibility Audit Suite

Build an accessibility audit suite using axe-core integrated into Playwright. Audit every page in the app for WCAG 2.1 AA violations. Per page: capture violations, classify by impact (critical/serious/moderate/minor), generate fix suggestions. Output AUDIT.md with a prioritized fix list. For the top 5 critical violations, implement fixes and verify.

DevOps & Deployment (P29–P32)

Intermediate

P29 · Multi-Stage Docker + Compose

Containerize the app with a multi-stage Dockerfile (deps then build then distroless runtime, ~80MB final, non-root user). docker-compose.yml: app + Postgres 16 + Redis 7 + Nginx with self-signed SSL. docker-compose.dev.yml override for hot reload + pgAdmin. .dockerignore. Health checks per service. Makefile: dev/build/up/down/logs. README documenting setup. Build and verify all services come up healthy.

Advanced

P30 · GitHub Actions CI/CD with Matrix

Build .github/workflows/ci.yml: triggers on push to main/develop and PRs to main. Jobs: lint (ESLint+Prettier), test (Vitest matrix on Node 20/22 x Linux/macOS/Windows with coverage upload), e2e (Playwright against compose stack), build (multi-stage Docker push to GHCR), deploy-staging (auto on develop), deploy-prod (manual approval on main push). PR check workflow comments coverage diff and labels by changed paths. Document required secrets in README.

Advanced

P31 · Terraform IaC for AWS

Write Terraform for a small AWS production environment: VPC with public/private subnets across 2 AZs, ECS Fargate cluster running the app from ECR, ALB with HTTPS via ACM, RDS Postgres in private subnet, ElastiCache Redis in private subnet, S3 bucket for static assets with CloudFront distribution. Modules: networking, compute, data. Remote state in S3 with DynamoDB locking. Variables for environment (dev/staging/prod). README with apply instructions. Do not apply — generate a plan only and show it to me.

Intermediate

P32 · Observability Stack — OpenTelemetry

Add OpenTelemetry instrumentation to a Node + Express service. Traces, metrics, and logs exported via OTLP. Auto-instrument express, http, postgres, redis. Custom spans for business logic functions. Configure Jaeger for traces and Prometheus for metrics in docker-compose. Generate sample load with autocannon. Show traces in Jaeger UI via browser sub-agent.

Refactoring & Performance (P33–P38)

Advanced

P33 · Legacy Codebase Modernizer (Phased)

Audit this codebase. Phase 1 — produce MODERNIZATION.md: outdated/vulnerable deps, dead code, implicit side effects, bundle size, load time. Phase 2 — prioritized migration plan ranked by risk x impact, with rollback per phase. Phase 3 — execute one phase at a time, ASKING ME between phases. Convert CommonJS to ESM where possible, add TS types incrementally (strict false to true), extract hardcoded values to config. Run tests after every phase. Show before/after metrics.

Intermediate

P34 · Bundle Size Optimization

Analyze the production bundle with rollup-plugin-visualizer or webpack-bundle-analyzer. Identify the 5 largest dependencies. For each, propose a lighter alternative or removal strategy. Implement code splitting via dynamic imports per route. Add lazy loading for below-the-fold images. Extract critical CSS. Measure before/after for JS/CSS/total payload. Lighthouse Performance score before/after via browser sub-agent.

Intermediate

P35 · Database Query Optimizer

Profile the API with slow-query logging enabled (on Postgres, log_min_duration_statement or pg_stat_statements; slow_query_log is the MySQL equivalent). Identify the top 10 slow queries by p95 duration. For each: EXPLAIN ANALYZE the query, propose an index or query rewrite, measure improvement. Document changes in DB_OPTIMIZATIONS.md with before/after timing. Add a Prisma migration for any new indexes.

Advanced

P36 · React Performance Audit

Profile the React app with React DevTools Profiler. Identify the 5 components with the highest re-render cost. For each: diagnose cause (unstable props, missing memoization, context bloat), apply fix (React.memo, useMemo, useCallback, context split, state colocation). Measure before/after with Profiler. Document in PERF.md.

Intermediate

P37 · Cache Layer Implementation

Add a Redis caching layer to the API. Cache strategy per endpoint: read-heavy endpoints get cache-aside with TTL, list endpoints cache the first page only, write endpoints invalidate related keys. Implement a typed cache wrapper. Document hit/miss metrics. Run a load test before and after with autocannon to measure throughput improvement.
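Cache-aside means the application checks the cache first, falls back to the source on a miss, and writes the result back with a TTL; writes invalidate related keys. The sketch below shows a typed wrapper with hit/miss counters, using an in-memory `Map` as a stand-in for Redis. `CacheAside` and its methods are hypothetical names, not a Redis client API.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

class CacheAside<V> {
  hits = 0;
  misses = 0;
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Read path: return the cached value if fresh, otherwise load and store it.
  async get(key: string, loader: () => Promise<V>): Promise<V> {
    const entry = this.entries.get(key);
    if (entry !== undefined && entry.expiresAt > this.now()) {
      this.hits += 1;
      return entry.value;
    }
    this.misses += 1;
    const value = await loader();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Write path: drop every key under a prefix, e.g. "orders:" after an update.
  invalidate(prefix: string): void {
    for (const key of Array.from(this.entries.keys())) {
      if (key.startsWith(prefix)) this.entries.delete(key);
    }
  }
}
```

The `hits` and `misses` counters give you the hit ratio the prompt asks you to document alongside the load-test numbers.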

Intermediate

P38 · Memory Leak Hunter

The Node service has a slow memory leak — RSS grows about 50MB/hour under steady load. Use heapsnapshot diffs (--inspect + Chrome DevTools or clinic.js heap profiler) to identify retained objects. Find the leak. Fix it. Confirm with a 1-hour autocannon load test that memory plateaus instead of growing.

AI & Agent Engineering (P39–P45)

Expert

P39 · Multi-Agent Orchestrator (CEO Pattern)

Build a multi-agent orchestration system in TypeScript. Architecture: CEO agent decomposes requests into a typed task graph and dispatches to specialists (CodeWriter, Reviewer, Tester, Documenter, SecurityAuditor). Message bus uses typed events. State machine tracks workflow (PLANNING then CODING then REVIEWING then TESTING then DEPLOYING then COMPLETE). Each agent: system prompt, process(task) method, structured audit log entry per call. CodeWriter-Reviewer loop with max 3 cycles. Demo with: "Implement a feature flag service in TS." Show the full audit trail.

Expert

P40 · RAG Knowledge Base

Build a RAG knowledge base. Backend: Python + FastAPI + ChromaDB + Gemini embeddings. Document ingestion: upload PDF/Markdown/text then chunk (recursive char splitter, 500 chars, 50 overlap) then embed then store with metadata. Retrieval: top-5 with MMR for diversity. Generation: Gemini 3.1 Pro with retrieved context in system prompt + source attribution. Frontend: React chat UI with expandable source citations. Test with 3 sample PDFs and 5 Q&A pairs. Verify source attribution accuracy.
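To make the "500 chars, 50 overlap" setting concrete, here is a sketch of the simplest possible chunker: a fixed-size sliding window. This is deliberately not the recursive character splitter the prompt names (which prefers paragraph and sentence boundaries before falling back to raw characters), and it is written in TypeScript for illustration even though the prompt's backend is Python. `chunkText` is a hypothetical function name.

```typescript
function chunkText(text: string, size = 500, overlap = 50): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("chunkText requires size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap; // each chunk starts this far after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```

The overlap exists so that a sentence straddling a chunk boundary still appears intact in at least one chunk, which matters for retrieval quality and for source attribution.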

Expert

P41 · Generator-Critic Self-Correction Loop

Build an adversarial self-correction pipeline in TypeScript. Generator produces output. Critic scores against a 5-dimension rubric (correctness, completeness, code quality, edge cases, performance; each 0-20, total 100). Orchestrator loops Generator then Critic with feedback injection until the score is at least 85 or 5 iterations have run. Log every iteration with score deltas, diffs, and feedback. Demo with task: "Write a CSV parser handling quoted commas, embedded newlines in fields, and UTF-8." Show quality improvement curve.
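The control flow of this pipeline is small enough to sketch independently of any model. Below, the generator and critic are injected as plain async functions so the loop can be tested with stubs; `refine`, `GenerateFn`, and `CritiqueFn` are hypothetical names, and real implementations would call an LLM inside those two functions.

```typescript
interface Critique {
  score: number;    // 0-100, sum of the rubric dimensions
  feedback: string; // injected into the next generation attempt
}

interface IterationLog {
  iteration: number;
  score: number;
  delta: number; // change versus the previous iteration (versus 0 on the first)
  feedback: string;
}

type GenerateFn = (task: string, feedback: string | null) => Promise<string>;
type CritiqueFn = (task: string, output: string) => Promise<Critique>;

async function refine(
  task: string,
  generate: GenerateFn,
  critique: CritiqueFn,
  targetScore = 85,
  maxIterations = 5,
): Promise<{ output: string; log: IterationLog[] }> {
  const log: IterationLog[] = [];
  let feedback: string | null = null;
  let output = "";
  let previousScore = 0;

  for (let iteration = 1; iteration <= maxIterations; iteration += 1) {
    output = await generate(task, feedback);
    const result = await critique(task, output);
    log.push({
      iteration,
      score: result.score,
      delta: result.score - previousScore,
      feedback: result.feedback,
    });
    if (result.score >= targetScore) break; // good enough: stop early
    previousScore = result.score;
    feedback = result.feedback;
  }
  return { output, log };
}
```

The `log` array is the raw material for the quality improvement curve: plot `score` against `iteration`.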

Expert

P42 · Tool-Use Agent with Function Calling

Build a tool-use agent that solves tasks by chaining tool calls. Tools: web_search, file_read, file_write, run_shell (sandboxed). Agent loop: plan then call tool then observe result then decide next action then repeat until task complete or max 10 calls. Use Gemini 3.1 Pro function calling. Implement structured logging of every tool call with arguments, result, and reasoning. Demo task: "Find the latest TypeScript version, compare it to what's in our package.json, and write an upgrade plan to UPGRADE.md."

Advanced

P43 · Prompt Eval Harness

Build a prompt evaluation harness. Define a test set of {prompt, expected_properties} pairs. Run prompts across multiple models (Gemini 3.1 Pro, Claude Sonnet 4.6, Claude Opus 4.6). For each output, run automated checks (regex, JSON schema validation, LLM-as-judge for subjective qualities). Output a comparison matrix as Markdown. Useful for choosing the right model per prompt type.

Expert

P44 · Streaming Response Pipeline

Build a streaming chat backend using Server-Sent Events. Express endpoint that streams Gemini 3.1 Pro responses token-by-token. React frontend renders tokens as they arrive with smooth typewriter animation. Handle: cancellation (user clicks stop), reconnection (network drop), error states, rate limits. Show actual token throughput in the UI.

Expert

P45 · Constitutional AI Filter

Build a content moderation pipeline that wraps any LLM call with a constitutional filter. Pre-filter: classify user input against a constitution (helpfulness, safety, privacy, accuracy) and reject violations. Post-filter: classify model output before returning. On rejection, log the violation with classification reason. Provide override mechanism for authorized users. Demo with a small constitution and 20 test inputs covering edge cases.

Module 15 · Build My Systems

15 “Build It Like Robert” Prompts — Reverse-Engineered from the Sovereign AGI Suite

These 15 prompts teach the actual architecture patterns used inside the $5.85B Sovereign AGI Suite. Each prompt builds a system you can run, while teaching a transferable pattern you will reuse across dozens of future projects.

Expert

B1 · Build a Sovereign Orchestrator (Meta-Routing Layer)

Build the meta-orchestration pattern that sits above all your other agents. TypeScript. Architecture: a top-level Orchestrator class that maintains a registry of downstream agents (each with a name, capability tags, and process method). On a new request: classify the request against capabilities, build a dependency graph of subtasks, dispatch to specialist agents in parallel where independent and serially where dependent, aggregate results, return a unified response. Include: typed events for every dispatch and completion, retry with exponential backoff per agent (max 3), circuit breaker that disables a failing agent for 5 minutes after 3 consecutive failures, structured audit log per request. Demo with 4 mock specialist agents and a sample request that touches all of them.

Pattern taught: Meta-orchestration. The Sovereign Orchestrator Pro V4.0 uses this exact pattern.
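The circuit breaker in this prompt (disable an agent for 5 minutes after 3 consecutive failures) is the piece most often implemented incorrectly, so here is a minimal sketch of just that rule. `CircuitBreaker` is a hypothetical class with an injectable clock for testing; it is an illustration of the general pattern, not code from the system described above.

```typescript
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openUntil = 0; // timestamp until which the agent is disabled
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    threshold = 3,
    cooldownMs = 5 * 60 * 1000,
    now: () => number = () => Date.now(),
  ) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // The orchestrator checks this before dispatching to the agent.
  isAvailable(): boolean {
    return this.now() >= this.openUntil;
  }

  // Any success resets the streak: only CONSECUTIVE failures trip the breaker.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.threshold) {
      this.openUntil = this.now() + this.cooldownMs;
      this.consecutiveFailures = 0; // start a fresh streak after the cooldown
    }
  }
}
```

Keep one breaker per registered agent, and count a failure only after that agent's own retries are exhausted, so transient errors do not disable it.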

Expert

B2 · Build a Synthetic Director (Multi-Channel Content Factory)

Build a content factory that takes one creative brief and produces 8 channel-optimized outputs in parallel. TypeScript + Node. Inputs: topic, audience, tone, key points. Outputs in parallel via Promise.allSettled: blog post (1500w with H2/H3), Twitter thread (7 tweets), LinkedIn post (300w), Instagram caption (with hashtag block), TikTok script (60s with timestamps), email newsletter (subject + preview + body), podcast outline (talking points + intro/outro), YouTube script (hook + chapters + outro). Each output gets a quality score (0-100) from a Critic pass. Output a directory /out/<date>-<slug>/ with one file per channel plus a manifest.json. Web UI for inputting briefs and viewing all outputs side-by-side. Demo with the brief: "Launch announcement for the DDS Sovereign AGI Suite."

Pattern taught: Parallel-channel fan-out. AGI-CORE-Pro V1.0 (The Synthetic Director) uses this.
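The reason the prompt specifies `Promise.allSettled` rather than `Promise.all` is that one failing channel should not discard the other seven outputs. A minimal sketch of the fan-out follows; `fanOut`, `ChannelWriter`, and `ChannelResult` are hypothetical names, and each writer stands in for whatever model call produces that channel's content.

```typescript
type ChannelWriter = (brief: string) => Promise<string>;

interface ChannelResult {
  channel: string;
  ok: boolean;
  content?: string; // present when ok is true
  error?: string;   // present when ok is false
}

async function fanOut(
  brief: string,
  writers: Record<string, ChannelWriter>,
): Promise<ChannelResult[]> {
  const channels = Object.keys(writers);
  // allSettled waits for every writer and never rejects, so a single
  // failure cannot throw away the outputs that succeeded.
  const settled = await Promise.allSettled(
    channels.map((channel) => writers[channel](brief)),
  );
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? { channel: channels[i], ok: true, content: result.value }
      : { channel: channels[i], ok: false, error: String(result.reason) },
  );
}
```

The returned array maps directly onto the manifest.json the prompt asks for: one entry per channel with its status, so failed channels can be retried individually.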

Expert

B3 · Build a NicheForge (Agency-in-a-Box Ecosystem)

Build an end-to-end niche-content ecosystem. TypeScript + Next.js + Postgres. Inputs: niche name, target audience, monetization model. Outputs: a deployed marketing site (10 SEO pages auto-generated), a 90-day editorial calendar, social account strategy per platform (X, Instagram, TikTok, YouTube), a lead-magnet offer, an email sequence (5 emails), and a launch playbook. Each component is a separate agent run that reads the niche context. Aggregate everything into a /niche/<name>/ directory with a master README mapping the deliverables. Demo with the niche: "Sustainable streetwear for Boston millennials."

Pattern taught: Compositional ecosystems. NICHE-FORGE-CORE V3.0 follows this shape.

Expert

B4 · Build a Self-Repairing Agent Swarm

Build a 9-agent swarm with meta-cognitive self-repair. TypeScript. Agents: Architect (decomposes), Builder (implements), Analyst (data work), Communicator (NLG), Researcher (web), Guardian (security), Optimizer (perf), MetaCognitor (monitors agent metrics), Orchestrator (CEO). Self-repair mechanism: MetaCognitor tracks per-agent quality score, latency, and error rate. When any metric crosses threshold, triggers a diagnostic that replays the last 10 tasks and compares outputs against expected. Repair options: adjust temperature, modify system prompt, switch model, restart agent. Log every repair to repair_audit.json. Demo by submitting a complex task, then artificially degrade one agent's outputs and watch self-repair activate.

Pattern taught: Meta-cognition and self-repair. This is what makes a swarm production-grade.

Advanced

B5 · Build a Counsel AI (Red Team vs Blue Team War Room)

Build an adversarial decision simulator. Python + FastAPI + React. Agents: Blue Team (builds the strongest case FOR the proposed decision), Red Team (builds the strongest case AGAINST), Judge (scores arguments on logic, evidence, feasibility, risk), Synthesizer (final recommendation merging strengths). Flow: user enters a decision then Blue generates 5 supporting arguments with evidence then Red generates 5 counter-arguments then 3 rebuttal rounds then Judge scores then Synthesizer outputs Executive Summary, Risk Matrix, Recommended Action, Contingency Plan. Output as structured JSON + formatted Markdown report with full debate transcript. Demo with: "Should DDS expand into the European market in Q3 2026?"

Pattern taught: Adversarial simulation. Stress-tests decisions before commitment.

Advanced

B6 · Build a ProductLens (AI Product Photography Pipeline)

Build an AI product-photography analysis and optimization system. Python FastAPI + Pillow/OpenCV + React frontend. Features: drag-drop image upload (multi-file), Gemini 3.1 Pro vision analysis (background, lighting, shadows, color temperature, composition), per-image quality score (sharpness, exposure, white balance, composition — each 0-100), specific improvement recommendations, optional background removal (rembg or remove.bg API), batch processor with progress for 50+ images, brand-style matching against uploaded reference images, export pipeline producing Shopify, Instagram, and Amazon variants in correct dimensions. Demo with 5 sample product shots.

Pattern taught: Multimodal vision pipeline with structured output.

Advanced

B7 · Build a Cortex-7 R&D Lab (Clean-Room Competitive Intel)

Build a competitive-intelligence pipeline using Clean Room methodology — analyze without copying. TypeScript CLI. Pipeline: TargetCollector (input competitor URLs, scrape public pages respecting robots.txt) then PatternExtractor (pricing strategy, feature positioning, UX patterns, messaging tone) then GapFinder (compare to your product, identify gaps and opportunities) then StrategyGenerator (5 strategic responses ranked by effort vs impact) then ReportCompiler (executive summary + detailed appendix as Markdown). Strategy Pattern architecture: Strategy interface with analyze method, concrete strategies for Pricing/Feature/UX/Content, Context selects strategy by analysis type. Demo with 3 fictional competitor URLs.

Pattern taught: Strategy Pattern + Clean Room methodology for ethical competitive intel.

Expert

B8 · Build an Atelier OS (FSM Publishing Pipeline)

Build an automated content publishing system using a Finite State Machine. TypeScript. States: IDLE then RESEARCHING then DRAFTING then EDITING then SEO_OPTIMIZING then FORMATTING then REVIEWING then PUBLISHING then PUBLISHED (or FAILED). Each state has a guard (must pass to enter), an action (executes on entry), and a rollback (called on failure). Agents per state: TrendScout, ContentStrategist, DraftWriter, SEOOptimizer, VisualDirector, SocialComposer, QualityGate, Publisher. Inter-agent communication via typed event bus. Structured JSON logging per state transition. CLI: run --topic "sustainable streetwear trends 2026". Output: HTML + JSON metadata + social posts package.

Pattern taught: FSM pipelines with rollback. The backbone of any production content factory.

Advanced

B9 · Build a Sovereign Synthetic Empire Dashboard

Build a unified dashboard that monitors a fleet of synthetic employees. Next.js 15 + TS + Tailwind + Recharts + tRPC. Each synthetic employee is a configurable record with name, role, status (active/idle/error), last-run timestamp, output count, value generated (USD/month), and operating cost. Dashboard pages: Fleet Overview (cards for all employees with status indicators), Performance (charts of value generated over time per employee), Cost Analysis (operating cost vs value), Audit Log (every action across all employees). Add/edit/disable employees from UI. Mock 15 employees with realistic data matching the DDS Sovereign AGI Suite shape. Verify in browser sub-agent.

Pattern taught: Operating fleets of agents requires fleet-level observability.

Intermediate

B10 · Build a Custom Antigravity Skill (Production Pattern)

Create a complete Antigravity Agent Skill at .agent/skills/shopify-section-author/. SKILL.md frontmatter description: "Author Shopify Liquid sections following DDS Atelier 3.4.0 standards. Triggers on requests to create, build, or scaffold a Shopify section (.liquid file in /sections/), or any mention of section schema, blocks, or theme settings." Body: complete instructions covering scoped CSS, IIFE JS, schema settings range validation, mobile breakpoints, accessibility, performance. Add references/liquid-cheatsheet.md, references/dds-brand-tokens.md, examples/section-hero.liquid, examples/section-faq.liquid (gold standards), and scripts/validate-schema.js (validates a section's schema block against JSON Schema). Test the skill by asking Antigravity in a fresh chat: "Build a featured-collection section."

Pattern taught: The All-in-One Skill pattern with executable validation.

Advanced

B11 · Build a Multi-Agent Code Review Bot

Build a GitHub PR review bot using GitHub Actions + a multi-agent pipeline. On PR opened: checkout, run agents in parallel — SecurityReviewer (scans for hardcoded secrets, SQL injection, XSS), StyleReviewer (against your AGENTS.md style rules), TestReviewer (verifies new code has tests, checks coverage doesn't drop), ArchitectureReviewer (checks against architecture.md constraints). Each agent posts a separate review comment categorized by impact. Aggregator posts a final summary with merge recommendation. Use Gemini 3.1 Pro for all agents. Configure max cost per PR. Demo on a real test repo.

Pattern taught: Parallel specialized review with cost ceilings.

Advanced

B12 · Build a Sovereign Local Inference Stack (Ollama)

Build a sovereign local-inference stack for privacy-sensitive tasks. Docker compose: Ollama (v0.17+) + Open WebUI + nginx reverse proxy. Ollama runs qwen2.5-coder-14b-32k or gpt-oss-20b locally depending on VRAM. Environment: OLLAMA_FLASH_ATTENTION=1, OLLAMA_CONTEXT_LENGTH=8192, OLLAMA_KEEP_ALIVE=-1, OLLAMA_KV_CACHE_TYPE=q8_0. Expose an OpenAI-compatible API at http://localhost:11434/v1. Build a TypeScript client wrapper that routes each request either to Ollama (primary) or to Gemini 3.1 Pro via Antigravity Skills (fallback) based on task sensitivity. Demo: classify 20 prompts as sensitive (stay local) vs non-sensitive (route to cloud) and verify the routing works.

Pattern taught: Sovereign inference with cloud fallback. The foundation of zero-cost AGI operation.

Expert

B13 · Build a Custom MCP Server for Your Domain

Build a custom MCP server that exposes your domain's operations to Antigravity. TypeScript + @modelcontextprotocol/sdk. Example domain: DDS Shopify operations. Tools: list_products, get_product_by_handle, update_inventory_quantity, list_collections, get_orders_since_date, run_liquid_validator. Each tool: typed input schema (Zod), typed output, input sanitization, rate limit per tool. Server runs over stdio (local) with optional HTTP mode. Installation script writes config into ~/.gemini/antigravity/mcp_config.json. README documenting every tool and its risk level. Test by adding the server to Antigravity and asking: "List all my DDS products and flag any with zero inventory."

Pattern taught: MCP as the universal extension surface for your agents.

Advanced

B14 · Build a Forensic Audit Trail System

Build a forensic audit trail that captures every agent action in your AGI fleet. TypeScript + Postgres (canonical log) + TimescaleDB hypertable (high-volume metrics). Schema: audit_events(id, timestamp, agent_id, session_id, event_type, input_hash, output_hash, duration_ms, tokens_in, tokens_out, cost_cents, error, parent_event_id). Hash inputs/outputs with SHA-256 — store full payloads in S3-compatible blob store keyed by hash for space efficiency. Query API: replay any session, find all events touching a file, cost breakdown per agent per day, identify retry storms. Demo with 10,000 simulated events and show a replay query.

Pattern taught: Production-grade observability for AI systems. Non-negotiable past 3 agents.

Expert

B15 · Build a Sovereign Deployment Pipeline (Zero-Cost Hosting)

Build a sovereign deployment pipeline that runs your entire AGI stack on local hardware at $0/month hosting. docker-compose.yml: reverse proxy (Caddy with automatic HTTPS via local CA), orchestrator app (TypeScript), Postgres + TimescaleDB, Redis Streams, Qdrant vector DB, MinIO (S3-compatible blob store), Ollama (local inference), Grafana (observability). Expose via Cloudflare Tunnel so the stack is reachable from anywhere without opening ports or paying for a cloud VM. Health checks for every service. Auto-restart policies. Daily snapshot backups to an external drive. Documentation: wake-on-LAN setup so the box only runs when needed. Demo end-to-end by deploying the stack fresh and verifying every service responds.

Pattern taught: Sovereign operation. How DDS runs 15 synthetic employees at $0/month hosting cost.

Module 16 · AGI Agent + Swarm Architecture

Building Real AGI Agents and Multi-Agent Swarms

This is the module that separates toy projects from production AGI systems. A single agent is a script with a model attached. A swarm is an operating system. Below is how the DDS Sovereign AGI Suite is actually architected — not theory, the production shape of 15 synthetic employees running right now.

What makes an AGI agent (vs a function that calls an LLM)

Any Python script can call the Gemini API. That is not an agent. A real agent has five properties — without all five, you have a prompt wrapper, not an agent:

Identity. A stable role with a system prompt, a name, a capability declaration, and persistent memory. Other agents and humans can address it by name.
Autonomy. Given a task, it plans, executes, and returns a result without step-by-step human prompting. It decides which tools to call and in what order.
Observability. Every action emits a structured log event: timestamp, input hash, output hash, duration, tokens, cost, error. You can replay any session.
Composability. It implements a common interface (process(task) → Result) so an orchestrator can dispatch to it without knowing its internals.
Self-awareness. It reports its own health (latency, error rate, quality score) to whatever is monitoring it. Module 19 covers how self-awareness enables self-healing.

The Agent interface contract

Every agent in the DDS suite implements the same TypeScript interface. This is what makes a swarm possible — the orchestrator does not need to know whether it is dispatching to a Builder, a Reviewer, or a VisualDirector. They all respond the same way.

id: stable name

capabilities: string tags

process: task → Result

healthCheck: HealthReport

systemPrompt: mutable

model: swappable

Capabilities are string tags like code.typescript, review.security, content.long-form, vision.product-photo. The orchestrator routes tasks by matching task requirements against registered capability tags — this is the contract that makes new agents plug-and-play.
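A minimal TypeScript sketch of this contract, for orientation only. The field names for Task, Result, and HealthReport follow the S1 build prompt later in this module; the status values, the EchoAgent class, and the "content.echo" tag are hypothetical and exist only to show a conforming implementation.

```typescript
// Sketch of the Agent contract. Field names follow the S1 build prompt;
// the status union and EchoAgent are assumptions made for illustration.
interface Task {
  id: string;
  type: string;
  payload: unknown;
  priority: number;
  deadline: Date;
}

interface Result {
  agentId: string;
  taskId: string;
  output: unknown;
  durationMs: number;
  tokensIn: number;
  tokensOut: number;
  error?: string;
}

interface HealthReport {
  status: "healthy" | "degraded" | "down"; // assumed value set
  latencyP95Ms: number;
  errorRatePct: number;
  qualityScore: number;
  lastErrorAt: Date | null;
}

interface Agent {
  readonly id: string;             // stable name other agents can address
  readonly capabilities: string[]; // tags such as "code.typescript"
  systemPrompt: string;            // mutable at runtime
  model: string;                   // swappable
  process(task: Task): Promise<Result>;
  healthCheck(): Promise<HealthReport>;
}

// Hypothetical agent that echoes its payload; no model call is made.
class EchoAgent implements Agent {
  readonly id = "echo-1";
  readonly capabilities = ["content.echo"];
  systemPrompt = "Repeat the payload verbatim.";
  model = "placeholder-model";

  async process(task: Task): Promise<Result> {
    const started = Date.now();
    return {
      agentId: this.id,
      taskId: task.id,
      output: task.payload,
      durationMs: Date.now() - started,
      tokensIn: 0,
      tokensOut: 0,
    };
  }

  async healthCheck(): Promise<HealthReport> {
    return {
      status: "healthy",
      latencyP95Ms: 0,
      errorRatePct: 0,
      qualityScore: 100,
      lastErrorAt: null,
    };
  }
}
```

Because the orchestrator only ever sees the Agent type, swapping EchoAgent for a real Builder or Reviewer would not change any dispatch code.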

The 7 specialist archetypes

Every swarm I have built ends up with some mix of these seven archetypes. Name them however you want internally; these are the roles.

BUILDER

Architect

Decomposes, plans, designs

Takes a request, produces a dependency graph of subtasks. Does not execute — only plans. Output: structured plan with ordered steps and blocking dependencies.

BUILDER

Builder

Implements

Receives a plan step, produces an artifact (code file, config, migration). Reads surrounding code, matches patterns, writes.

CRITIC

Reviewer

Evaluates artifacts

Scores Builder outputs against a rubric. Returns approve/reject with structured feedback. The Builder-Reviewer loop is the core quality mechanism.

CRITIC

Tester

Verifies behavior

Generates and runs tests. Reports failures back to Builder. Distinct from Reviewer — Reviewer evaluates code quality, Tester evaluates runtime behavior.

GUARD

Guardian

Security and compliance

Scans for secrets, injection vectors, license violations, forbidden patterns. Has veto power — Guardian rejection blocks the entire pipeline.

SCOUT

Researcher

External knowledge

Fetches web pages, reads docs, queries APIs. Feeds verified context into other agents. Uses browser sub-agent or web MCP.

MetaCognitor

Watches the watchers

Monitors every other agent. Tracks quality score, latency, error rate per agent. Triggers self-repair when metrics drift. Covered in Module 19.

The 4 swarm topologies

How agents connect to each other determines what the swarm can do. Most production systems mix these four patterns.

Star (CEO-and-specialists)

One Orchestrator (CEO) at center, specialists radiating out. Every request enters through the CEO; every result returns through the CEO. Simplest to reason about, easiest to debug. Bottlenecked by CEO throughput. Use when: single-request workflows, <10 specialists, strict audit requirements.

CEO: Orchestrator

Architect: plans

Builder: codes

Reviewer: evaluates

Tester: verifies

Guardian: audits

Mesh (peer-to-peer via event bus)

No CEO. Agents publish events to a shared bus and subscribe to the ones they care about. Any agent can trigger any other by emitting the right event. Scales horizontally. Harder to trace a single request end-to-end — you need correlation IDs. Use when: 10+ agents, continuous background processing, event-driven workflows. This is where Ocean Logic (Module 17) becomes essential.

Pipeline (FSM handoffs)

Agents arranged in a deterministic sequence, each transforming the artifact and passing it forward. State Machine governs transitions. The Atelier OS publishing pipeline is this pattern: TrendScout → Strategist → Writer → Editor → SEO → QualityGate → Publisher. Use when: linear workflows where each stage is a clear transformation, like content production, ETL, or build-test-deploy.

Scout: researches

Strategist: plans

Writer: drafts

Editor: revises

QualityGate: scores

Publisher: ships
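The guard, action, and rollback mechanics of a pipeline can be sketched in a few lines. This is a simplified, synchronous illustration, not the Atelier OS implementation: runPipeline and the Artifact type are hypothetical, and it assumes only the failing stage is rolled back.

```typescript
// Sketch of a linear pipeline FSM with guards and rollbacks.
// runPipeline and Artifact are hypothetical names used for illustration.
type Artifact = Record<string, unknown>;

interface Stage {
  state: string;
  guard: (artifact: Artifact) => boolean;   // must pass to enter the state
  action: (artifact: Artifact) => Artifact; // executes on entry
  rollback: (artifact: Artifact) => void;   // reverts side effects on failure
}

interface PipelineOutcome {
  finalState: "DONE" | "FAILED";
  lastSuccessfulState: string | null;
  artifact: Artifact;
  failureReason?: string;
}

function runPipeline(stages: Stage[], input: Artifact): PipelineOutcome {
  let artifact = input;
  let lastSuccessfulState: string | null = null;

  for (const stage of stages) {
    try {
      if (!stage.guard(artifact)) {
        throw new Error(`guard rejected entry to ${stage.state}`);
      }
      artifact = stage.action(artifact);
      lastSuccessfulState = stage.state;
    } catch (err) {
      // Assumption: only the failing stage is rolled back. A fuller version
      // might also compensate earlier stages in reverse order.
      stage.rollback(artifact);
      return {
        finalState: "FAILED",
        lastSuccessfulState,
        artifact,
        failureReason: err instanceof Error ? err.message : String(err),
      };
    }
  }
  return { finalState: "DONE", lastSuccessfulState, artifact };
}
```

Recording lastSuccessfulState is what would let a replay resume from the last good stage rather than from the start.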

Hierarchical (swarm of swarms)

Multiple star or pipeline swarms, coordinated by a meta-Orchestrator above them. This is how the Sovereign Orchestrator Pro V4.0 operates: it routes requests to the right sub-swarm (content swarm, code swarm, intel swarm), each of which internally uses its own topology. Use when: 15+ agents, multi-domain operations, fleet-scale deployments. The hierarchical pattern is what “Sovereign” actually means architecturally.

Sovereign: meta-CEO

Content Swarm: star topology

Code Swarm: pipeline

Intel Swarm: mesh

Ops Swarm: star

The CEO dispatch algorithm

This is the algorithm at the heart of every star-topology orchestrator I build. Architectural prose, not code — you will have Antigravity implement it:

Classify. Incoming request is classified against the capability registry to identify which specialists are needed. Done by the CEO itself via a structured classification prompt with the full capability list.
Decompose. CEO produces a typed dependency graph: nodes are subtasks, edges are blocking dependencies. Independent nodes run in parallel, dependent nodes run sequentially.
Dispatch. For each ready node (all dependencies satisfied), CEO selects an available specialist matching the required capability, forwards the subtask, and waits for completion.
Collect. As specialists complete, CEO writes results to shared state and checks for newly-ready nodes. Repeat until all nodes complete or one fails.
Aggregate. Once all nodes are done, CEO synthesizes specialist outputs into a unified response. Failed nodes trigger rollback or escalation based on policy.
Audit. Every dispatch, completion, and aggregation writes a structured event to the forensic audit log (Module 18 storage layer).
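For orientation before handing the algorithm to Antigravity, the Dispatch, Collect, Aggregate, and Audit steps can be sketched roughly as follows. This is a simplified illustration under several assumptions: Classify and Decompose are taken as already done (the plan is passed in), specialists are plain async functions keyed by capability tag, ready nodes run in waves rather than being rechecked after each individual completion, and the audit log is an in-memory array. All names here are hypothetical.

```typescript
// Sketch of dependency-graph dispatch. Plan, specialists, and audit log are
// simplified stand-ins; nothing here is the DDS production orchestrator.
interface PlanNode {
  id: string;
  capability: string;  // capability tag required for this subtask
  dependsOn: string[]; // ids of nodes that must finish first
}

// Hypothetical specialist signature: the node plus its dependencies' outputs.
type Specialist = (node: PlanNode, inputs: Record<string, string>) => Promise<string>;

interface AuditEntry {
  step: "dispatch" | "complete" | "aggregate";
  nodeId?: string;
}

async function dispatchPlan(
  plan: PlanNode[],
  specialists: Map<string, Specialist>,
  audit: AuditEntry[],
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  const done = new Set<string>();
  const pending = new Map<string, PlanNode>();
  for (const node of plan) pending.set(node.id, node);

  while (pending.size > 0) {
    // A node is ready once every dependency has completed.
    const ready = Array.from(pending.values()).filter((node) =>
      node.dependsOn.every((dep) => done.has(dep)),
    );
    if (ready.length === 0) throw new Error("plan has a cycle or a missing dependency");

    // Independent ready nodes run in parallel; the next wave waits for them.
    await Promise.all(
      ready.map(async (node) => {
        const specialist = specialists.get(node.capability);
        if (!specialist) throw new Error(`no specialist registered for ${node.capability}`);
        audit.push({ step: "dispatch", nodeId: node.id });
        const inputs: Record<string, string> = {};
        for (const dep of node.dependsOn) inputs[dep] = results[dep];
        results[node.id] = await specialist(node, inputs);
        audit.push({ step: "complete", nodeId: node.id });
      }),
    );
    for (const node of ready) {
      done.add(node.id);
      pending.delete(node.id);
    }
  }

  audit.push({ step: "aggregate" });
  return results;
}
```

A production version would add the failure policy from the Aggregate step (rollback or escalation) instead of letting the first rejected promise abort the run.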

Coordination models — how agents share state

Three models in production use. Pick one per swarm — mixing them within a single swarm leads to consistency bugs.

Model	How	Pros	Cons
Shared Memory	Redis or Postgres as central state store; agents read/write keys	Simple, fast, easy to debug	Requires careful key discipline; single point of failure
Message Passing	Event bus (Redis Streams, NATS, Kafka) — see Module 17	Scales horizontally, loosely coupled, replay-able	Harder to trace single requests; eventual consistency
Direct Call	Agent A invokes Agent B via function call or HTTP	Synchronous, type-safe	Tight coupling; blocks the caller; hard to scale

Capability registration — making agents plug-and-play

The Sovereign pattern is that every agent registers itself with the Orchestrator at startup, declaring its capabilities. The Orchestrator maintains a capability → agent-list map in memory (and a persistent copy in Postgres for recovery after restart). When a new request arrives, the Orchestrator queries the map to find eligible agents.

This is what makes adding a new agent a one-file change instead of a full re-architect. Drop a new agent class into /agents/, implement the interface, declare capabilities, restart the Orchestrator. The Sovereign fleet absorbs it.
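A rough sketch of the in-memory half of that registry, assuming agents are described by nothing more than an id and a list of capability tags. The class and method names are hypothetical, and the Postgres copy used for restart recovery is left out.

```typescript
// Sketch of a capability -> agent-list map. Persistence is omitted;
// RegisteredAgent and CapabilityRegistry are illustrative names.
interface RegisteredAgent {
  id: string;
  capabilities: string[];
}

class CapabilityRegistry {
  private byCapability = new Map<string, RegisteredAgent[]>();

  register(agent: RegisteredAgent): void {
    // Re-registering an id replaces its previous entries.
    this.deregister(agent.id);
    for (const tag of agent.capabilities) {
      const list = this.byCapability.get(tag) ?? [];
      list.push(agent);
      this.byCapability.set(tag, list);
    }
  }

  deregister(agentId: string): void {
    for (const [tag, list] of Array.from(this.byCapability.entries())) {
      const kept = list.filter((agent) => agent.id !== agentId);
      if (kept.length > 0) {
        this.byCapability.set(tag, kept);
      } else {
        this.byCapability.delete(tag);
      }
    }
  }

  // Returns a copy so callers cannot mutate the index.
  findByCapability(tag: string): RegisteredAgent[] {
    return [...(this.byCapability.get(tag) ?? [])];
  }
}
```

With this shape, the orchestrator's lookup for an incoming task is a single findByCapability call per required tag.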

✓

The discipline. Resist the temptation to have agents call each other directly. Always route through either the Orchestrator (star) or the event bus (mesh). Direct calls between specialists produce a tangled graph where adding one agent requires touching five others. Capability registration keeps the swarm composable.

Build prompts for this module

Expert · Swarm

S1 · Build the Agent Interface Contract

Create /lib/agents/base.ts defining the Agent interface contract. Fields and methods: id (string), name (string), capabilities (string array), systemPrompt (string, mutable at runtime), model (string, swappable), process(task: Task) returning Promise<Result>, healthCheck() returning Promise<HealthReport>. Types: Task (id, type, payload, priority, deadline), Result (agentId, taskId, output, durationMs, tokensIn, tokensOut, error?), HealthReport (status, latencyP95Ms, errorRatePct, qualityScore, lastErrorAt). Add a BaseAgent abstract class implementing common plumbing: structured logging per process call, timing, token accounting. Export a registerAgent(agent) helper that writes the agent to the Orchestrator's in-memory registry plus a Postgres agents table for restart recovery. Write unit tests covering: a mock agent that succeeds, one that throws, one that times out. Run tests and show results.

Pattern taught: The contract that makes every other swarm pattern possible.

Expert · Swarm

S2 · Build a Star-Topology Orchestrator

Using the Agent interface from S1, build /lib/orchestrator/star.ts implementing the CEO dispatch algorithm. Class StarOrchestrator: maintains an agent registry indexed by capability, exposes dispatch(request) which classifies the request, builds a dependency graph using the Architect agent, executes the graph (parallel where independent, serial where dependent), aggregates results, writes every step to the audit log. Include: exponential backoff retry per specialist (max 3, jitter), 60-second per-specialist timeout, circuit breaker disabling a specialist for 5 minutes after 3 consecutive failures, structured JSON logging. Demo with 5 mock specialists (Architect, Builder, Reviewer, Tester, Guardian) and a sample request that requires all five. Show the full audit trail in terminal output.

Pattern taught: Production-grade CEO dispatch with retries, timeouts, and circuit breaking.

Expert · Swarm

S3 · Build a Pipeline Swarm with FSM Guards

Build /lib/orchestrator/pipeline.ts implementing a Pipeline swarm governed by an explicit FSM. States: INTAKE, RESEARCHING, DRAFTING, EDITING, QA, PUBLISHING, DONE, FAILED. Each transition has a guard (boolean predicate that must pass to enter) and a rollback (called on failure to revert side effects). Agents registered against each state execute on entry. On failure, rollback runs and the FSM transitions to FAILED with the last successful state recorded. Implement a replay() method that re-runs the pipeline from any recorded state. Demo with a content-publishing pipeline using 6 mock agents; deliberately fail in QA and watch rollback + replay work correctly.

Pattern taught: Deterministic pipelines with first-class failure handling. The Atelier OS backbone.

Expert · Swarm

S4 · Build a Hierarchical Sovereign Orchestrator

Build a hierarchical Sovereign Orchestrator that sits above multiple sub-swarms. TypeScript. Class SovereignOrchestrator owns a map of sub-swarm name to SubSwarm instance (each SubSwarm wraps a StarOrchestrator from S2 or a PipelineSwarm from S3). On a new request: Sovereign classifies which domain the request belongs to (code, content, intel, ops), routes to that sub-swarm, receives result, stamps it with provenance metadata (which sub-swarm handled it, which agents were involved), returns to caller. Add cost accounting per sub-swarm per day. Add a GET /fleet endpoint returning real-time status of every agent across every sub-swarm. Demo with 3 sub-swarms (content, code, intel) and dispatch 5 varied requests. Verify each was routed correctly and the fleet endpoint shows accurate status.

Pattern taught: Swarm-of-swarms architecture. How a single founder operates 15+ synthetic employees.

Expert · Swarm

S5 · Build the Capability Registry with Hot-Reload

Build /lib/registry/capabilities.ts — a capability registry that supports hot-reload. Agents live in /agents/ directory as one file per agent. On startup, registry scans the directory, imports each agent, reads its declared capabilities, and builds the capability-to-agent-list index. Watch the directory for changes: on file add, auto-import and register. On file remove, deregister. On file change, unregister + re-register. Expose query methods: findByCapability(tag), findByName(id), listAll(), healthCheckAll(). Add a REPL command registry.reload() that forces a full rescan. Demo by starting with 3 agents, adding a 4th while the orchestrator is running, dispatching a request that only the 4th can handle, verifying it works without restart.

Pattern taught: Hot-reloadable agent fleets. Zero-downtime swarm evolution.

Module 17 · Ocean Logic

Ocean Logic — Multi-Channel Async Event Streams for Swarms

Once your swarm has more than five agents, direct calls and shared-memory coordination collapse under their own weight. You need a different substrate. I call the pattern I use Ocean Logic because the mental model is maritime: events flow through named currents, agents fish from whichever currents carry tasks they can handle, and the sea itself persists history so nothing is ever truly lost.

The three-zone mental model

Every ocean-logic system has three layers. Thinking about them separately prevents the classic mistake of trying to do everything in one event stream.

🌊

Surface currents — live task flow

Short-lived events representing work that needs to happen now. Task requests, completions, status updates, user interactions. High volume, low durability requirement (minutes to hours). Redis Streams is the right tool here.

🌒

Tidal flows — coordination events

Medium-lived events that coordinate across swarms. Agent lifecycle (registered, deregistered, degraded, restored), capability changes, policy updates, cost thresholds crossed. Lower volume, higher durability (days to weeks). NATS JetStream or Redis Streams with long retention.

🔬

Deep currents — forensic record

The immutable audit log. Every agent action forever. Used for replay, debugging, cost analysis, compliance. Volume: very high. Durability: permanent. Kafka with tiered storage, or Postgres + S3 archival for small-to-medium fleets.

ⓘ

Why separate zones matter. If you put audit events and task events in the same stream, consumers fight each other for throughput, replay becomes expensive, and retention policy becomes a compromise instead of a choice. Separate zones → independent scaling → independent retention.
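One way to make the zone split concrete is a small typed config. This is only a sketch: the transports and the 100,000-event cap come from this module, but the retention numbers for the surface zone, the prefix-based routing rule, and all names are assumptions.

```typescript
// Sketch of per-zone settings. The surface retention figure and the
// routing rule below are assumptions, not values from the DDS suite.
type Zone = "surface" | "tidal" | "deep";

interface ZoneConfig {
  transport: string;
  maxLength: number | null;      // null = no length cap
  retentionHours: number | null; // null = keep forever
}

const ZONES: Record<Zone, ZoneConfig> = {
  surface: { transport: "redis-streams", maxLength: 100_000, retentionHours: 6 },
  tidal: { transport: "redis-streams", maxLength: null, retentionHours: 7 * 24 },
  deep: { transport: "postgres+s3", maxLength: null, retentionHours: null },
};

// Hypothetical rule: lifecycle, policy, and cost events count as coordination
// traffic; everything else is live task flow.
const TIDAL_PREFIXES = ["agent.", "policy.", "cost."];

// Every event is also copied to the deep zone for the forensic record.
function zonesFor(eventType: string): Zone[] {
  const live: Zone = TIDAL_PREFIXES.some((prefix) => eventType.startsWith(prefix))
    ? "tidal"
    : "surface";
  return [live, "deep"];
}
```

Keeping retention and caps in one table makes each zone's policy an explicit choice rather than a side effect of whichever stream an event happened to land in.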

Event schema discipline — the non-negotiable

Every event flowing through any zone has the same envelope. Deviating from this produces a system nobody can debug.

id: UUID v7 (time-sortable, lexicographically ordered)
type: dot-delimited identifier in reverse-DNS style, such as content.draft.completed or agent.health.degraded
version: semver string so consumers can handle schema evolution
timestamp: ISO 8601 with timezone
source_agent: ID of the agent that emitted the event
correlation_id: the root request ID; lets you trace an entire mission across dozens of events
causation_id: the ID of the event that directly triggered this one; lets you reconstruct a causal tree
payload: typed, versioned, validated against a schema
cost_cents: token and compute cost of this operation, in cents (optional but strongly recommended)

Correlation and causation IDs together give you what distributed-systems people call trace context. Any event can be traced upward to its root request and downward to everything it caused.
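A minimal sketch of how the envelope and the two IDs fit together, assuming Node.js. The uuidv7 helper is a hand-rolled illustration of the UUID v7 layout (48-bit millisecond timestamp, then version and variant bits, then random bits); createEvent and linkedEvent mirror the names in the O1 build prompt, but their signatures here, the default version string, and the rule that a root event uses its own id as correlation_id are assumptions. Schema validation is omitted.

```typescript
import { randomBytes } from "node:crypto";

// Hand-rolled UUID v7 for illustration: 48-bit Unix ms timestamp, then random bits.
function uuidv7(nowMs: number = Date.now()): string {
  const bytes = randomBytes(16);
  bytes.writeUIntBE(nowMs, 0, 6);      // big-endian timestamp in bytes 0..5
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC variant
  const hex = bytes.toString("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}

interface OceanEvent<P = unknown> {
  id: string;
  type: string;
  version: string;
  timestamp: string;
  source_agent: string;
  correlation_id: string;
  causation_id?: string;
  payload: P;
  cost_cents?: number;
}

interface EventContext {
  source_agent: string;
  correlation_id?: string; // omitted for a root event
  version?: string;
}

function createEvent<P>(type: string, payload: P, ctx: EventContext): OceanEvent<P> {
  const id = uuidv7();
  return {
    id,
    type,
    version: ctx.version ?? "1.0.0", // assumed default
    timestamp: new Date().toISOString(),
    source_agent: ctx.source_agent,
    // Assumption: a root event starts its own correlation chain.
    correlation_id: ctx.correlation_id ?? id,
    payload,
  };
}

// A child event inherits the correlation chain and points at its direct cause.
function linkedEvent<P>(parent: OceanEvent, type: string, payload: P, sourceAgent: string): OceanEvent<P> {
  return {
    ...createEvent(type, payload, { source_agent: sourceAgent, correlation_id: parent.correlation_id }),
    causation_id: parent.id,
  };
}
```

Walking causation_id upward from any event reaches the root; filtering on correlation_id returns the whole mission.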

Choosing your current transport

Transport	Best for	Sovereign cost	Gotcha
Redis Streams	Surface currents. Consumer groups, acks, pending-entry tracking, trimming by length or minimum ID; dead-lettering is implemented in the worker	$0 (docker)	Single-node durability limited; pair with AOF persistence
NATS JetStream	Tidal flows. Stronger durability, clustering, subject-based routing	$0 (docker)	Less library ecosystem than Redis in TypeScript
Kafka	Deep currents at scale. Tiered storage, infinite retention	Heavy; only if you have fleet-scale volume	Operational overhead; overkill for <10 agents
Postgres LISTEN/NOTIFY	Small teams, no extra infra	$0 (already have Postgres)	Not a real queue; best for prototyping only

The DDS Sovereign AGI Suite uses Redis Streams for surface + tidal and Postgres + S3 for deep. No Kafka. Fifteen agents does not need Kafka. Most teams claiming to need Kafka do not need Kafka.

Consumer groups — the scaling primitive

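A single event stream can be consumed by multiple agents without duplicate processing through consumer groups. You declare a group name, multiple agents join the group, and the transport distributes events across members. If an agent crashes, its unacked events stay pending and are handed to another member: NATS JetStream redelivers them after the ack timeout, while in Redis Streams a surviving consumer claims them with XAUTOCLAIM or XCLAIM.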

Concrete example from the Synthetic Director: the content.brief.created stream has a consumer group called channel-workers. Eight specialist agents (blog, twitter, linkedin, instagram, tiktok, email, podcast, youtube) all join the group. Each brief event is delivered to exactly one of them based on which channel tag the event carries — but if the Twitter agent is degraded, the brief gets redelivered and another agent picks it up. No coordinator. No lock management. The ocean handles it.
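The delivery, ack, and redelivery semantics can be seen in a toy in-memory model. This is not a Redis or NATS client and does not route by channel tag; it only shows that each event goes to one member, stays pending until acked, and can be taken over when its consumer dies. All names are hypothetical.

```typescript
// Toy in-memory model of consumer-group semantics, for illustration only.
interface Delivery<E> {
  id: number;
  event: E;
}

class ToyConsumerGroup<E> {
  private nextId = 1;
  private undelivered: Delivery<E>[] = [];
  private pending = new Map<number, { consumer: string; event: E }>();

  publish(event: E): number {
    const id = this.nextId++;
    this.undelivered.push({ id, event });
    return id;
  }

  // Hands the next undelivered event to exactly one consumer and marks it pending.
  read(consumer: string): Delivery<E> | undefined {
    const next = this.undelivered.shift();
    if (next) this.pending.set(next.id, { consumer, event: next.event });
    return next;
  }

  // Acknowledged events leave the pending list for good.
  ack(id: number): void {
    this.pending.delete(id);
  }

  // Reassigns everything a dead consumer left unacked to a live one.
  claimFrom(deadConsumer: string, claimer: string): Delivery<E>[] {
    const claimed: Delivery<E>[] = [];
    this.pending.forEach((entry, id) => {
      if (entry.consumer === deadConsumer) {
        entry.consumer = claimer;
        claimed.push({ id, event: entry.event });
      }
    });
    return claimed;
  }

  pendingCount(): number {
    return this.pending.size;
  }
}
```

In a real transport the claim step is triggered by an idle timeout rather than by naming the dead consumer, but the bookkeeping is the same.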

Replay — the debugging superpower

Because the deep current retains everything, you can replay any mission. Query the forensic log for all events with a given correlation_id, re-emit them into a clone of the surface current pointing at a staging environment, watch the swarm re-execute the exact sequence. This is how you debug a bad outcome three days after it happened.

Design for replay from day one. That means event handlers must be idempotent (receiving the same event twice produces the same result), side effects must be gated behind flags that can be disabled in staging, and secrets must not live in payloads.
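A small sketch of the first two rules, under the assumption that processed event IDs are tracked in an in-memory Set (a real deployment would need a durable store) and that the only side effect is a hypothetical sendEmail callback.

```typescript
// Sketch of a replay-safe handler: deduplicates by event id and gates
// side effects behind a flag. All names here are illustrative.
interface ReplayableEvent {
  id: string;
  type: string;
}

interface HandlerDeps {
  sideEffectsEnabled: boolean;          // set to false for staging replays
  sendEmail: (eventId: string) => void; // hypothetical external side effect
}

function makeIdempotentHandler(
  deps: HandlerDeps,
  processed: Set<string> = new Set<string>(),
): (event: ReplayableEvent) => "processed" | "duplicate" {
  return (event) => {
    // Seeing the same event id twice is a no-op.
    if (processed.has(event.id)) return "duplicate";
    if (deps.sideEffectsEnabled) deps.sendEmail(event.id);
    processed.add(event.id);
    return "processed";
  };
}
```

A replay into staging would build the handler with sideEffectsEnabled set to false, so the swarm re-executes the sequence without emailing anyone.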

Backpressure and overload

When a swarm is overloaded, the default behavior of most systems is to collapse silently. Ocean Logic handles this explicitly:

Stream length caps. Surface currents have a max length (e.g., 100,000 events). Producers pushing past the cap trigger an alert and either drop low-priority events or block.
Consumer lag alerts. If a consumer group falls more than N events behind, emit an agent.lag.warn event on the tidal flow. MetaCognitor picks it up and decides whether to spin up more consumers or shed load.
Priority lanes. Critical events (agent.health.degraded, security.violation) flow on a separate high-priority stream that never fills.
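The cap, warning, and shedding rules above can be sketched as a bounded in-memory stream. The 80% warning and 95% shedding thresholds are borrowed from the O5 build prompt; the class, the outcome labels, and the choice to throw when the stream is full are assumptions. Critical events would live on a separate, uncapped stream and are not modelled here.

```typescript
// Sketch of stream length caps with low-priority shedding. Thresholds follow
// the O5 prompt (80% warn, 95% shed); everything else is illustrative.
type Priority = "low" | "normal";

interface QueuedEvent {
  id: string;
  priority: Priority;
}

type PublishOutcome = "accepted" | "accepted_with_warning" | "shed";

class BoundedStream {
  private events: QueuedEvent[] = [];
  readonly shed: QueuedEvent[] = []; // stands in for event.shed forensic records

  constructor(
    private readonly maxLength: number,
    private readonly warnPct: number = 80,
    private readonly shedPct: number = 95,
  ) {}

  get length(): number {
    return this.events.length;
  }

  publish(event: QueuedEvent): PublishOutcome {
    // Integer arithmetic avoids floating-point surprises at the thresholds.
    const aboveShedLine = this.events.length * 100 >= this.maxLength * this.shedPct;
    if (aboveShedLine && event.priority === "low") {
      this.shed.push(event);
      return "shed";
    }
    if (this.events.length >= this.maxLength) {
      throw new Error("stream full: producer must block or retry");
    }
    this.events.push(event);
    const aboveWarnLine = this.events.length * 100 >= this.maxLength * this.warnPct;
    return aboveWarnLine ? "accepted_with_warning" : "accepted";
  }
}

// Consumer lag: how far a group's last acked position trails the stream head.
function consumerLag(streamHead: number, lastAcked: number): number {
  return Math.max(0, streamHead - lastAcked);
}
```

The point of returning an explicit outcome is that overload becomes something producers and the MetaCognitor can observe, rather than a silent drop.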

✓

The operational insight. Ocean Logic is not a library you install. It is a set of conventions about what streams exist, what events mean, and how consumers behave. Once the conventions are in place, every new agent becomes a simple question: which streams does it publish to, which does it consume from? That is it. You stop thinking about point-to-point connections entirely.

Build prompts for this module

Expert · Ocean Logic

O1 · Build the Event Envelope and Validation Layer

Create /lib/events/envelope.ts defining the ocean-logic event envelope. Zod schema: id (uuid v7 string), type (string matching reverse-DNS pattern like a.b.c), version (semver string), timestamp (ISO 8601), source_agent (string), correlation_id (uuid), causation_id (uuid optional), payload (unknown validated separately), cost_cents (integer optional). Implement: createEvent(type, payload, context) returning a fully-formed event with id and timestamps generated, validateEvent(raw) returning a typed event or throwing, linkedEvent(parent, type, payload) producing a new event inheriting correlation_id and setting causation_id to parent.id. Add per-type payload schemas in /lib/events/schemas/ with a registry that validateEvent consults. Unit tests: valid events pass, invalid events throw with descriptive messages, linked events correctly inherit correlation. Run tests and show results.

Pattern taught: The envelope discipline that makes every downstream ocean pattern possible.

Expert · Ocean Logic

O2 · Build the Three-Zone Current Manager

Build /lib/ocean/currents.ts exposing a CurrentManager class with three zones: surface (Redis Streams, maxLength 100000, short retention), tidal (Redis Streams, unlimited length, 7-day retention), deep (Postgres events table with hypertable + S3 archival for events older than 30 days). Methods: publish(zone, event), subscribe(zone, streamName, groupName, handler), ack(zone, streamName, messageId), replay(correlationId, targetZone). Handle reconnect with exponential backoff. Include a docker-compose.yml bringing up Redis with AOF persistence, Postgres with TimescaleDB extension, and MinIO for S3-compatible blob storage. Demo: publish 1000 events across all three zones, subscribe with 3 mock consumers, verify each consumer group sees events exactly once, replay a specific correlation_id and show events flowing through the clone stream.

Pattern taught: The physical plumbing of Ocean Logic. Drop-in substrate for any future swarm.

Advanced · Ocean Logic

O3 · Build a Consumer Group Worker Template

Build /lib/ocean/worker.ts — a reusable worker template for agents that consume surface currents. Class StreamWorker takes: streamName, groupName, consumerName, eventType filter, handler function, concurrency (default 1). On start: creates the consumer group if missing, enters a loop reading XREADGROUP with 5-second block, spawns up to concurrency parallel handlers, acks on success, moves to dead-letter stream after 3 failed attempts with exponential backoff. Emit health events every 30 seconds (lag, throughput, error rate) to the tidal zone. Graceful shutdown: drain in-flight handlers on SIGTERM before exiting. Demo: start 4 workers in the same group, publish 500 events, verify roughly even distribution, kill one mid-run, verify its unacked events get redelivered to others.

Pattern taught: The worker pattern every specialist agent wraps itself in.

Advanced · Ocean Logic

O4 · Build a Replay-Based Debugger CLI

Build a CLI tool /tools/replay.ts that debugs swarm missions by replaying them in isolation. Commands: list --since=7d (list recent correlation_ids with status), trace <correlation_id> (print the full causal tree of events), replay <correlation_id> --target=staging (re-emit the events into a cloned surface stream pointed at a staging swarm), diff <correlation_id_a> <correlation_id_b> (show structural differences between two missions). Queries hit the deep current (Postgres + S3 archive). Add a --dry-run flag that shows what would replay without actually emitting. Ship with README.md showing a full debugging workflow: pick a bad outcome, trace causation, identify the agent that produced the regression, replay against a fixed version of that agent, verify the outcome is now correct.

Pattern taught: Replay-first debugging. You can only debug what you can reproduce.
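
The trace command comes down to turning a flat list of events into a tree. A minimal sketch, assuming each event records the id of the event that caused it. The TraceEvent shape and the parentId field are hypothetical names for illustration, not the actual Ocean Logic schema.

```typescript
// Hypothetical event shape: field names are assumptions for illustration.
interface TraceEvent {
  id: string;
  parentId: string | null; // id of the event that caused this one; null for the root
  agentId: string;
  type: string;
}

interface TraceNode {
  event: TraceEvent;
  children: TraceNode[];
}

// Build the causal forest from a flat list of events sharing one correlation_id.
function buildCausalTree(events: TraceEvent[]): TraceNode[] {
  const nodes = new Map<string, TraceNode>();
  events.forEach((e) => nodes.set(e.id, { event: e, children: [] }));

  const roots: TraceNode[] = [];
  events.forEach((e) => {
    const node = nodes.get(e.id) as TraceNode;
    const parent = e.parentId === null ? undefined : nodes.get(e.parentId);
    // Events whose parent is missing from the set are treated as roots.
    if (parent) parent.children.push(node);
    else roots.push(node);
  });
  return roots;
}

// Render the tree as indented lines, one per event, in causal order.
function renderTree(nodes: TraceNode[], depth = 0): string[] {
  const lines: string[] = [];
  nodes.forEach((n) => {
    lines.push(`${"  ".repeat(depth)}${n.event.agentId} ${n.event.type}`);
    renderTree(n.children, depth + 1).forEach((l) => lines.push(l));
  });
  return lines;
}
```

Walking the rendered lines top to bottom gives the order in which agents touched the mission, which is the order you inspect them in when hunting a regression.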

Expert · Ocean Logic

O5 · Build Backpressure and Priority Lane Infrastructure

Extend the CurrentManager from O2 with backpressure and priority lanes. Add: a separate Redis Stream per zone for priority:critical events (no length cap, dedicated consumer workers), stream length monitoring that emits stream.lag.warn tidal events when any surface stream exceeds 80% of maxLength, a shedding policy that drops events with priority=low when a stream is above 95% (log each drop as event.shed forensic record), a consumer lag monitor that emits agent.lag.warn when any consumer group falls more than 1000 events behind its stream head. Build a Grafana dashboard (provisioned via JSON) showing per-stream length, per-group lag, shed rate, priority lane throughput. Demo: generate synthetic load that saturates a stream, watch backpressure kick in, watch priority lane continue flowing, watch shed rate report in Grafana.

Pattern taught: Explicit overload handling. Systems that fail loudly are systems you can fix.
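
The shedding policy is easiest to reason about as a pure function over stream fill level and event priority. A minimal sketch using the 80% and 95% thresholds from the prompt; the type and function names are hypothetical, and the caller is assumed to emit stream.lag.warn and event.shed based on the returned decision.

```typescript
// Hypothetical names throughout; thresholds come from the O5 prompt.
type Priority = "critical" | "normal" | "low";

type Decision =
  | { action: "accept"; warn: boolean }
  | { action: "shed"; reason: string }
  | { action: "priority-lane" };

interface StreamState {
  length: number;    // current entries in the stream (for example from XLEN)
  maxLength: number; // configured cap for the surface stream
}

const WARN_RATIO = 0.8;
const SHED_RATIO = 0.95;

function decide(state: StreamState, priority: Priority): Decision {
  // Critical events bypass the capped stream entirely.
  if (priority === "critical") return { action: "priority-lane" };

  const fill = state.length / state.maxLength;

  // Above 95%: drop low-priority events. The caller logs event.shed.
  if (fill > SHED_RATIO && priority === "low") {
    return { action: "shed", reason: `stream at ${(fill * 100).toFixed(1)}% of maxLength` };
  }

  // Above 80%: still accept, but the caller should emit stream.lag.warn.
  return { action: "accept", warn: fill > WARN_RATIO };
}
```

Keeping the decision separate from the Redis calls means the policy can be unit-tested without a running stream.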

Module 18 · Sovereign Storage

The Five Storage Layers Every AGI System Needs

Nothing sinks a promising AGI system faster than a bad storage decision. The common failure mode is putting everything in Postgres because “Postgres can do it all.” Postgres is extraordinary — and yes, it can technically do all five jobs below — but using one tool for five workloads means each workload is compromised. The sovereign pattern is five purpose-chosen layers, all running locally at zero cost.

Vector DB: semantic memory
Key-Value: hot state
Relational: canonical truth
Blob: artifacts
Time-Series: observability

Layer 1 — Vector database (semantic memory)

Purpose. Storing embeddings for RAG, long-term agent memory, similarity search across code or documents. Every agent that needs to “remember” more than its context window uses this.

Sovereign pick. Qdrant (standalone, Rust, hybrid search with payload filtering) or pgvector (extension for Postgres, good if you want one engine and scale is modest). Chroma is fine for prototyping but hits limits past a few million vectors.

Schema discipline. Each vector carries a payload: source document ID, chunk index, embedding model used, content hash, created_at. Always version your embedding model in the payload — when you upgrade from text-embedding-3-small to the next model, you need to know which vectors came from which.
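
A minimal sketch of that payload as a typed record, assuming Node's built-in crypto module for the content hash. The field names and both helper functions are hypothetical; a real vector store client would take this object as the point's payload.

```typescript
import { createHash } from "node:crypto";

// Hypothetical payload shape mirroring the fields listed in the text.
interface VectorPayload {
  sourceDocumentId: string;
  chunkIndex: number;
  embeddingModel: string; // for example "text-embedding-3-small"
  contentHash: string;    // SHA-256 hex digest of the chunk text
  createdAt: string;      // ISO 8601 timestamp
}

function buildPayload(
  sourceDocumentId: string,
  chunkIndex: number,
  chunkText: string,
  embeddingModel: string,
  now: Date = new Date(),
): VectorPayload {
  return {
    sourceDocumentId,
    chunkIndex,
    embeddingModel,
    contentHash: createHash("sha256").update(chunkText, "utf8").digest("hex"),
    createdAt: now.toISOString(),
  };
}

// Vectors produced by a different model than the current one need re-embedding.
function needsReembedding(payload: VectorPayload, currentModel: string): boolean {
  return payload.embeddingModel !== currentModel;
}
```

With the model name in the payload, a model upgrade becomes a filtered scan for stale vectors instead of a full rebuild on faith.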

Layer 2 — Key-value store (hot state)

Purpose. Rate limit counters, session data, cache, short-lived coordination flags, consumer-group offsets. Anything read or written dozens of times per second that does not need to survive a power cycle.

Sovereign pick. Redis with AOF persistence enabled. The same Redis instance you already run for Ocean Logic surface currents handles this workload.

Key design. Use hierarchical keys: agent:<id>:health, ratelimit:<user>:<window>, session:<uuid>. TTL everything that is not permanent. The moment you have keys without TTLs you have a memory leak waiting to happen.
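
One way to enforce the TTL rule is to make it impossible to build a write without stating an expiry. A minimal sketch with hypothetical names (keys, Expiry, buildSet); it only builds a command description and does not talk to Redis.

```typescript
// Key builders for the hierarchical patterns named in the text.
const keys = {
  agentHealth: (agentId: string): string => `agent:${agentId}:health`,
  rateLimit: (user: string, window: string): string => `ratelimit:${user}:${window}`,
  session: (uuid: string): string => `session:${uuid}`,
};

// Callers must either give a TTL or say why the key is permanent.
type Expiry = { ttlSeconds: number } | { permanent: true; reason: string };

interface SetCommand {
  key: string;
  value: string;
  exSeconds: number | null; // null means no expiry
}

function buildSet(key: string, value: string, expiry: Expiry): SetCommand {
  if ("ttlSeconds" in expiry) {
    if (!Number.isInteger(expiry.ttlSeconds) || expiry.ttlSeconds <= 0) {
      throw new Error(`invalid TTL for ${key}: ${expiry.ttlSeconds}`);
    }
    return { key, value, exSeconds: expiry.ttlSeconds };
  }
  // Permanent keys are allowed, but only with a stated reason at the call site.
  return { key, value, exSeconds: null };
}
```

In a real client, exSeconds would map onto the EX option of the Redis SET command. The permanent branch exists so that a key without a TTL is a deliberate choice someone can grep for.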

Layer 3 — Relational database (canonical truth)

Purpose. Agent registry, capability index, user accounts, subscription state, anything you need transactional guarantees for, anything you will want to query with SQL later. This is the source of truth that all other layers derive from.

Sovereign pick. Postgres 16+ with the TimescaleDB extension (covered in Layer 5) and pgvector if you are consolidating the vector layer.

Schema discipline. UUID primary keys (not serial integers — collisions across environments are a nightmare). created_at and updated_at on every table. Soft delete via deleted_at timestamp, never hard delete anything in production.

Layer 4 — Blob store (artifacts)

Purpose. Screenshots from browser sub-agent, generated PDFs, image assets, large event payloads (stored by hash — see Module 15 prompt B14), model snapshots, full documents that are referenced by vector embeddings. Anything binary or large.

Sovereign pick. MinIO (S3-compatible, runs in a single docker container, presents the exact same API as AWS S3). When you eventually migrate to cloud, your code changes zero lines — only the endpoint URL.

Key design. Content-addressable: key = SHA-256 of content. Two agents producing identical output produce identical keys, automatic deduplication. Metadata (who, when, correlation_id) lives in the relational DB pointing at the blob key.
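
A minimal sketch of the key derivation, assuming Node's built-in crypto module. The BlobRef shape and makeRef helper are hypothetical stand-ins for the relational row that points at the blob.

```typescript
import { createHash } from "node:crypto";

// Key = SHA-256 of the content, hex-encoded.
function blobKey(content: Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical shape of the relational row that points at a blob.
interface BlobRef {
  blobKey: string;
  ownerAgentId: string;
  correlationId: string;
  createdAt: string; // ISO 8601 timestamp
}

function makeRef(
  content: Uint8Array,
  ownerAgentId: string,
  correlationId: string,
  now: Date = new Date(),
): BlobRef {
  return {
    blobKey: blobKey(content),
    ownerAgentId,
    correlationId,
    createdAt: now.toISOString(),
  };
}
```

Two agents calling makeRef on identical bytes get two metadata rows but one shared key, so the blob store holds a single physical copy.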

Layer 5 — Time-series database (observability)

Purpose. Agent health metrics (latency, error rate, quality score over time), cost tracking, system metrics from your Ocean Logic layer, request rates, token consumption. Any data where the primary query pattern is “give me values for this metric between time A and time B.”

Sovereign pick. TimescaleDB (Postgres extension, so you reuse your existing Postgres install) or InfluxDB (standalone, purpose-built). Start with TimescaleDB unless you have a specific need InfluxDB solves.

Retention discipline. TimescaleDB compression on chunks older than 7 days (often around 10x storage savings, depending on the shape of the data). Drop chunks older than 90 days unless you have a compliance requirement. Aggregate into continuous aggregates (hourly, daily) so dashboards never scan raw rows.

The when-to-use decision matrix

Data	Right layer	Wrong layer (common mistake)
Agent embeddings for recall	Vector DB	Postgres with JSONB (no similarity search)
Rate limit counters	Redis	Postgres (contention under load)
Agent registry and capabilities	Postgres	Redis (no durability guarantees)
Browser sub-agent screenshots	MinIO blob	Postgres bytea (bloats the DB)
Agent latency over time	TimescaleDB	Postgres append-table (slow aggregates)
Deep forensic audit events	Postgres + S3/MinIO archival	Kafka (overkill for <50 agents)
Session tokens	Redis with TTL	Postgres (tokens never expire unless you write cleanup jobs)
Active task queue	Redis Streams	Postgres table with SELECT FOR UPDATE (lock contention)

The sovereign compose stack

The entire five-layer stack runs on a single machine at zero monthly cost. The DDS Sovereign AGI Suite uses this exact topology:

Postgres 16 + TimescaleDB + pgvector — one container, handles Layers 3, 5, and optionally 1
Redis 7 with AOF — one container, handles Layer 2 and Ocean Logic surface/tidal
MinIO — one container, handles Layer 4, S3-compatible
Qdrant — one container (optional, only if vector workload outgrows pgvector)
Grafana — one container for observability dashboards on top of Layer 5

Five containers, one docker-compose up. Total memory footprint on a modern workstation: about 4–6 GB. Exposed to the outside world via a Cloudflare Tunnel if you need remote access; otherwise localhost-only for maximum security.

⚠

Backup is not optional. Sovereign means you own the hardware — and the responsibility. Set up automated nightly snapshots of Postgres, Redis AOF, and MinIO to an external drive. Test the restore quarterly. A sovereign system without tested backups is a time bomb.

Build prompts for this module

Expert · Storage

D1 · Provision the Five-Layer Sovereign Stack

Create an /infra/ directory with a production-ready docker-compose.yml bringing up all five storage layers. Services: postgres (a Postgres 16 image that bundles TimescaleDB and pgvector, since the stock postgres:16 image ships neither; extensions enabled via init script, data persisted via a named volume, non-default password from .env), redis (image redis:7-alpine, AOF enabled with appendfsync everysec, maxmemory policy allkeys-lru, password from .env), minio (quay.io/minio/minio, console on 9001, API on 9000, credentials from .env, single volume), qdrant (qdrant/qdrant, REST on 6333, gRPC on 6334, API key from .env), grafana (grafana/grafana-oss with provisioning volumes for dashboards and datasources, anonymous disabled). Add healthchecks to every service. Include init-postgres.sql that runs CREATE EXTENSION for timescaledb and vector (pgvector's extension name) on startup. Add a Makefile with targets: up, down, logs, backup (dump postgres + copy minio data to /backups/ with timestamp), restore (from /backups/). Add .env.example documenting every required variable. README with quickstart. Run docker compose up and verify every service reports healthy in the terminal (docker compose ps for all services, plus curl against the HTTP endpoints of MinIO, Qdrant, and Grafana).

Pattern taught: The exact stack the Sovereign AGI Suite runs on. Zero-cost, full-ownership foundation.

Expert · Storage

D2 · Build a Unified Storage Client Facade

Build /lib/storage/index.ts exporting a typed StorageClient that wraps all five layers behind one interface. Methods: vectors.upsert, vectors.search, vectors.delete (Qdrant client with payload filter support), kv.get, kv.set, kv.del, kv.incr (Redis with typed helpers), db.query (Postgres via Prisma), blob.put, blob.get, blob.stat, blob.delete (MinIO via aws-sdk v3 with S3 client), metrics.record, metrics.range (TimescaleDB via raw SQL to hypertables). Each method logs to Ocean Logic tidal zone as storage.op.completed with latency and byte counts. Add retry with exponential backoff on transient errors. Add a health() method returning status of every layer. Write integration tests (requires the compose stack from D1 to be up) covering every method. Demo by running the test suite and showing all green.

Pattern taught: One client, five layers. Agents never touch a storage driver directly.

Advanced · Storage

D3 · Build a Content-Addressable Blob Layer with Dedup

Build /lib/storage/blobs.ts on top of MinIO implementing content-addressable storage. API: put(bytes) returning {key, size, contentType, dedup} where key is sha256 of bytes and dedup is true if the same content already existed. get(key) returning bytes. stat(key) returning metadata. ref(key, ownerAgentId, correlationId) recording a reference in Postgres blob_refs table. unref(key, ownerAgentId) removing that reference and decrementing the refcount. Add a gc() command that scans for orphaned blobs (refcount zero) older than a minimum age (default 30 days, overridable) and deletes them from MinIO. Demo: store the same 1MB image three times from three different agents, verify only one physical copy exists, unref twice, verify blob still exists, unref third time, run gc with the minimum age set to zero, verify gc deletes it.

Pattern taught: Content-addressable storage with refcounting. Free deduplication for a swarm that produces lots of similar artifacts.
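
The refcount and garbage-collection rules can be checked against an in-memory model before wiring up MinIO and Postgres. A minimal sketch, assuming Node's crypto module. InMemoryBlobLayer is a hypothetical stand-in, and gc takes the minimum age as a parameter so a demo can run it with zero.

```typescript
import { createHash } from "node:crypto";

interface StoredBlob {
  bytes: Uint8Array;
  storedAt: number;    // epoch milliseconds
  owners: Set<string>; // agent ids currently holding a reference
}

// In-memory stand-in for MinIO plus the blob_refs table (illustration only).
class InMemoryBlobLayer {
  private blobs = new Map<string, StoredBlob>();

  put(bytes: Uint8Array, now: number = Date.now()): { key: string; size: number; dedup: boolean } {
    const key = createHash("sha256").update(bytes).digest("hex");
    const dedup = this.blobs.has(key);
    if (!dedup) this.blobs.set(key, { bytes, storedAt: now, owners: new Set<string>() });
    return { key, size: bytes.length, dedup };
  }

  ref(key: string, ownerAgentId: string): void {
    const blob = this.blobs.get(key);
    if (!blob) throw new Error(`unknown blob ${key}`);
    blob.owners.add(ownerAgentId);
  }

  // Returns the remaining reference count.
  unref(key: string, ownerAgentId: string): number {
    const blob = this.blobs.get(key);
    if (!blob) return 0;
    blob.owners.delete(ownerAgentId);
    return blob.owners.size;
  }

  // Delete blobs with no references that are at least minAgeMs old.
  gc(minAgeMs: number, now: number = Date.now()): string[] {
    const deleted: string[] = [];
    this.blobs.forEach((blob, key) => {
      if (blob.owners.size === 0 && now - blob.storedAt >= minAgeMs) {
        this.blobs.delete(key);
        deleted.push(key);
      }
    });
    return deleted;
  }

  has(key: string): boolean {
    return this.blobs.has(key);
  }
}
```

Tracking owners as a set rather than a bare counter makes a double unref from the same agent harmless, which matters when agents retry.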

Advanced · Storage

D4 · Build the Observability Layer with Continuous Aggregates

Set up the TimescaleDB observability layer. Create a hypertable agent_metrics(timestamp, agent_id, metric_name, value, labels jsonb) with chunk interval of 1 day. Add compression policy for chunks older than 7 days. Add retention policy dropping chunks older than 90 days. Create continuous aggregates: agent_metrics_hourly (avg, p50, p95, p99 per agent per metric per hour) and agent_metrics_daily. Build a StorageClient.metrics helper that writes to the hypertable with a batch buffer (flush every 1s or 1000 events). Provision Grafana dashboards (JSON in /infra/grafana/dashboards/): Fleet Overview (per-agent latency heatmap, error rate per agent, request rate per agent), Cost Tracker (tokens/hour per agent, cost/day breakdown), Storage Health (size per layer, growth rate). Demo by generating synthetic metrics for 10 agents over 24 hours of simulated time and showing the dashboards populate.

Pattern taught: Time-series done right. Compression, retention, and continuous aggregates turn observability from expensive to free.

Expert · Storage

D5 · Build an Automated Backup and Restore System

Build /infra/backup/ with automated backup and tested restore for the entire sovereign stack. Scripts (bash + Node): backup.sh runs nightly via cron, performs pg_dump of Postgres (custom format, compressed), redis-cli BGSAVE + copy of AOF file, mc mirror of MinIO to a backup bucket, snapshot of Qdrant collections to blob storage. Writes a manifest.json listing what was backed up with timestamps and checksums. rotation.sh keeps 7 daily, 4 weekly, 12 monthly. restore.sh <manifest_path> restores every layer from the specified backup, verifies checksums, reports any mismatches. verify.sh runs weekly: restores the latest backup into a parallel compose stack on alternate ports, runs a smoke test suite (read one row from each expected table, one key from Redis, one blob from MinIO), reports pass/fail via email or webhook. Document the full workflow in BACKUP.md including the quarterly restore drill. Demo by running backup.sh, then restore.sh into a parallel stack, then showing that the restored stack has identical data.

Pattern taught: Backups you have not tested are not backups. Sovereign means tested-restore discipline.

Module 19 · Self-Healing Systems

MetaCognition, Circuit Breakers, and Repair Routines

Any swarm that runs longer than a week without self-healing mechanisms will eventually degrade silently. Models drift, prompts rot, dependencies change, data shifts. The difference between a production AGI system and a demo is that the production system detects its own degradation and fixes itself before you notice.

The four mechanisms (layered, not alternatives)

Circuit breaker

After N consecutive failures (typical: 3), disable the affected agent for a cooldown period (typical: 5 minutes). Incoming tasks route to a sibling agent or fall into a retry queue. Prevents a failing agent from cascading its failure across the swarm.

Retry with exponential backoff and jitter

Transient failures (network blip, rate limit, timeout) should retry. Exponential delay between attempts (1s, 2s, 4s, 8s) with 20% random jitter to prevent thundering herd. Max attempts configurable per task type. Distinguish retryable errors from permanent ones — a schema validation failure is not retryable.
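
A minimal sketch of the delay schedule and retry loop. It assumes "20% jitter" means a random offset of up to plus or minus 20% of the delay, which is one reading of the text. The names backoffDelayMs and retry are hypothetical, and the isRetryable and sleep callbacks are injected by the caller.

```typescript
// Delay before the next attempt: 1s, 2s, 4s, 8s... with +/- jitterRatio randomness.
function backoffDelayMs(
  attempt: number, // 1-based number of the attempt that just failed
  baseMs = 1000,
  jitterRatio = 0.2,
  random: () => number = Math.random,
): number {
  const exponential = baseMs * Math.pow(2, attempt - 1);
  const jitter = exponential * jitterRatio * (random() * 2 - 1);
  return Math.round(exponential + jitter);
}

interface RetryOptions {
  maxAttempts: number;
  isRetryable: (err: unknown) => boolean; // false for permanent errors such as schema validation
  sleep: (ms: number) => Promise<void>;
}

async function retry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Permanent errors and exhausted budgets propagate immediately.
      if (!opts.isRetryable(err) || attempt >= opts.maxAttempts) throw err;
      await opts.sleep(backoffDelayMs(attempt));
    }
  }
}
```

Injecting sleep keeps the loop testable without real waiting; in production it would wrap setTimeout.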

Forensic audit log

Every input, output, duration, error stored in the deep current (Module 17). Not for viewing day-to-day — for the day an agent ships a bad outcome and you need to reconstruct why. This is the feedstock for every other mechanism.

MetaCognitor — quality drift detection

A specialized agent whose only job is watching other agents. Replays recent tasks, scores outputs against a rubric, compares rolling averages, triggers repair when drift exceeds threshold. The difference between reactive (fixes after a crash) and proactive (fixes before anyone notices).

What MetaCognitor actually does

Every 15 minutes (configurable), the MetaCognitor runs this loop for each registered agent:

Query the deep current for the last 20 completed tasks from this agent
For each task, re-run the agent's output through a scoring rubric — LLM-as-judge against the original input
Compute rolling 7-day average quality score; compare to rolling 30-day baseline
Compute error rate and p95 latency over last 24 hours; compare to baseline
If any metric drifts past its threshold (quality drops >10%, error rate >2x baseline, latency >3x baseline): emit agent.health.degraded on the tidal zone
If degraded, run the repair routine for that agent type
Emit agent.health.repair-attempted with the repair action taken
After next observation window, check if metrics recovered; if not, escalate to agent.health.failed and alert human
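
The threshold check in the loop above can be sketched as a pure comparison of current metrics against the baseline. The type and function names are hypothetical; the thresholds are the ones stated in the loop (quality down more than 10%, error rate above 2x, p95 latency above 3x).

```typescript
// Hypothetical metric snapshot; names are illustrative.
interface AgentMetrics {
  qualityScore: number; // rolling average from the judge rubric
  errorRate: number;    // fraction of failed tasks
  p95LatencyMs: number;
}

type DriftTrigger = "quality" | "error_rate" | "latency";

// Returns every metric that crossed its threshold; empty means healthy.
function detectDrift(current: AgentMetrics, baseline: AgentMetrics): DriftTrigger[] {
  const triggers: DriftTrigger[] = [];
  if (current.qualityScore < baseline.qualityScore * 0.9) triggers.push("quality");
  if (current.errorRate > baseline.errorRate * 2) triggers.push("error_rate");
  if (current.p95LatencyMs > baseline.p95LatencyMs * 3) triggers.push("latency");
  return triggers;
}
```

Returning the list of triggers rather than a boolean gives the repair routine what it needs to pick a starting step and gives the audit log its trigger field.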

The repair routine ladder

When MetaCognitor detects drift, it tries fixes in order from cheapest to most disruptive. Each step is logged. Most issues resolve at step 1 or 2.

Step	Repair	When
1	Temperature adjustment	Quality drift without error rate change. Lower temp for more consistency.
2	System prompt mutation	Inject recent failure examples as few-shot anti-examples into the system prompt.
3	Model swap	Switch from Gemini 3.1 Pro to Claude Opus 4.6 (or vice versa) for a second opinion on the task type.
4	Agent restart	Kill the process, clear its working memory, reload from last good state.
5	Pin to last-known-good version	If the agent has versioned prompts/configs, roll back to the version that produced the baseline metrics.
6	Quarantine	Disable the agent entirely. Emit high-priority alert. Humans take over until debugged.
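
Structurally the ladder is a loop: apply a step, check whether metrics recovered, stop at the first success. A minimal sketch in which every step is an injected async function. RepairStep, walkLadder, and the metricsRecovered callback are hypothetical names, and the callback is assumed to hide the wait for the next observation window.

```typescript
interface RepairStep {
  name: string;
  apply: () => Promise<void>;
}

interface LadderResult {
  resolvedBy: string | null; // null means every step ran without recovery
  attempted: string[];
}

// Walk the steps in order, cheapest first, stopping at the first recovery.
async function walkLadder(
  steps: RepairStep[],
  metricsRecovered: () => Promise<boolean>,
): Promise<LadderResult> {
  const attempted: string[] = [];
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    attempted.push(step.name);
    await step.apply();
    if (await metricsRecovered()) {
      return { resolvedBy: step.name, attempted };
    }
  }
  return { resolvedBy: null, attempted };
}
```

When resolvedBy comes back null the last step (quarantine) has already run, so the caller's only remaining job is to alert a human.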

The repair_audit.json discipline

Every repair attempt writes to /data/repair_audit.json (mirrored to the forensic log). Schema: timestamp, agent_id, trigger (which metric crossed threshold), action taken, action parameters, pre-action metrics, post-action metrics (recorded 30 min later), success (did metrics recover). This file is gold for post-mortems and for training better MetaCognitor thresholds.
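
A minimal sketch of that schema as a TypeScript type, with a helper that appends an entry to the file's contents. The field names are hypothetical camelCase renderings of the schema above, and the file is assumed to hold a single JSON array.

```typescript
interface MetricsSnapshot {
  qualityScore: number;
  errorRate: number;
  p95LatencyMs: number;
}

interface RepairAuditEntry {
  timestamp: string;                          // ISO 8601
  agentId: string;
  trigger: string;                            // which metric crossed its threshold
  action: string;                             // ladder step taken
  actionParameters: Record<string, unknown>;
  preActionMetrics: MetricsSnapshot;
  postActionMetrics: MetricsSnapshot | null;  // filled in about 30 minutes later
  success: boolean | null;                    // null until post-action metrics exist
}

// Takes the current file contents and returns the new contents with the entry appended.
function appendAuditEntry(fileContents: string, entry: RepairAuditEntry): string {
  const entries: RepairAuditEntry[] =
    fileContents.trim() === "" ? [] : JSON.parse(fileContents);
  entries.push(entry);
  return JSON.stringify(entries, null, 2);
}
```

Keeping the function pure (string in, string out) leaves the file I/O and the mirroring to the forensic log with the caller.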

⚠

Self-healing is not self-improvement. MetaCognitor restores agents to their baseline; it does not make them better than they were. Actual improvement requires offline fine-tuning or prompt engineering based on repair_audit.json patterns. Do not conflate the two — autonomous self-improvement is a much harder problem with different safety properties.

Build prompts for this module

Expert · Self-Healing

H1 · Build a Circuit Breaker Wrapper

Build /lib/resilience/circuit-breaker.ts as a generic wrapper around any async function. Class CircuitBreaker accepts: failureThreshold (default 3), cooldownMs (default 300000), halfOpenProbeCount (default 1). States: CLOSED (normal), OPEN (rejecting calls), HALF_OPEN (probing). On call: if CLOSED, execute normally and track consecutive failures; if threshold hit, transition to OPEN and schedule transition to HALF_OPEN after cooldown; if OPEN, reject immediately without invoking the wrapped function; if HALF_OPEN, allow up to halfOpenProbeCount calls through, success returns to CLOSED, failure returns to OPEN. Emit events on every transition. Wrap the Agent interface from S1 so every agent call is automatically breakered. Demo: create an agent that fails 3 times then succeeds, observe breaker open after 3 failures, reject for 5 min, half-open, probe succeed, close.

Pattern taught: Protects the swarm from cascading failure. The smallest mechanism with the biggest payoff.
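
A minimal sketch of the three-state machine, assuming a single probe in HALF_OPEN and an injectable clock so the cooldown can be tested without waiting. Event emission and halfOpenProbeCount are left out, and concurrent calls during HALF_OPEN are not serialized here.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker<T> {
  private state: BreakerState = "CLOSED";
  private consecutiveFailures = 0;
  private openedAt = 0;

  constructor(
    private readonly fn: () => Promise<T>,
    private readonly failureThreshold = 3,
    private readonly cooldownMs = 300000,
    private readonly now: () => number = Date.now,
  ) {}

  getState(): BreakerState {
    // Lazily move OPEN -> HALF_OPEN once the cooldown has elapsed.
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "HALF_OPEN";
    }
    return this.state;
  }

  async call(): Promise<T> {
    const stateAtCall = this.getState();
    // While OPEN, reject without invoking the wrapped function.
    if (stateAtCall === "OPEN") throw new Error("circuit open");
    try {
      const result = await this.fn();
      this.consecutiveFailures = 0;
      this.state = "CLOSED";
      return result;
    } catch (err) {
      this.consecutiveFailures++;
      // A failed probe, or hitting the threshold, (re)opens the circuit.
      if (stateAtCall === "HALF_OPEN" || this.consecutiveFailures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Checking the cooldown lazily on each call avoids a background timer, which keeps the breaker cheap enough to wrap every agent call.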

Expert · Self-Healing

H2 · Build the MetaCognitor Agent

Build a MetaCognitor agent implementing the monitoring loop described above. Runs every 15 minutes via cron or a scheduler. For each registered agent: query last 20 task events from the deep current, re-score outputs using Claude Opus 4.6 as judge with a 5-dimension rubric (correctness, completeness, format, relevance, cost-efficiency), compute rolling 7-day and 30-day quality averages, compute 24-hour error rate and p95 latency, compare to baselines, emit agent.health.degraded if any metric drifts past threshold. Write baselines to Postgres agent_baselines table so they survive restarts. Include a CLI: metacog status (print current health per agent), metacog rebaseline <agent-id> (force a new baseline), metacog disable <agent-id> (pause monitoring). Demo: run the MetaCognitor against 3 mock agents, one of which has injected degradation, observe correct detection.

Pattern taught: The watcher that makes the swarm reliable past the first week.

Expert · Self-Healing

H3 · Build the Repair Routine Ladder

Build /lib/resilience/repair.ts implementing the 6-step repair ladder. Function attemptRepair(agentId, trigger) walks the ladder from cheapest to most disruptive. Each step: record pre-action metrics snapshot, execute action, wait 30 min, record post-action metrics, compare, write full entry to repair_audit.json and the forensic log. Step implementations: step1 adjusts temperature via agent.setTemperature, step2 appends last 3 failure examples to agent.systemPrompt as anti-examples, step3 calls agent.setModel with the fallback model from a configured model map, step4 calls agent.restart, step5 restores agent.config from the last version tagged gold in a config history table, step6 calls registry.disable(agentId) and emits agent.health.failed at priority critical. Stop at the first step that restores metrics. Include a dry-run mode that logs what would be attempted without actually changing anything. Demo by injecting 3 different failure modes into mock agents and showing the ladder correctly picks the right repair.

Pattern taught: Structured self-repair. The difference between an outage and a non-event.

Module 20 · Shopify Vibe Coding

The Atelier 3.4.0 + SEO Magnet Playbook

Antigravity plus a well-authored Shopify Skill turns hours of Liquid work into minutes. But Shopify has sharp edges — the Atelier theme has conventions, the Basic plan has a performance ceiling, and Liquid breaks in ways that no amount of TypeScript strict-mode can prevent. This module is the playbook for shipping Shopify work at S-tier.

The five non-negotiable Shopify rules

Scope everything. All CSS under a unique section prefix. All IDs prefixed. All JS in an IIFE. No global pollution. Atelier has enough of its own CSS and JS to conflict with.
No !important on body, html, header, or footer. Theme-level overrides come back to bite you when Shopify updates Atelier.
No jQuery. Vanilla JS only. jQuery in one section conflicts with Atelier 3.4.0's native scripts.
No Google Fonts preconnects. DDS self-hosts Playfair Display and DM Sans as WOFF2. External font calls hurt Lighthouse and duplicate on every page.
Validate schema ranges. Shopify rejects a section if (max - min) / step is less than 3, or if default is not reachable from min by step. This breaks silently until you try to save section settings.
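
The range rule in the last item is easy to check before the section ever reaches the theme editor. A minimal sketch that encodes the two conditions exactly as stated above; RangeSetting and validateRange are hypothetical names, and the checks reflect this playbook's wording rather than an official Shopify validator.

```typescript
// Hypothetical subset of a range setting from a section's {% schema %} block.
interface RangeSetting {
  id: string;
  min: number;
  max: number;
  step: number;
  default: number;
}

// Returns a list of problems; an empty list means the setting passes both rules.
function validateRange(s: RangeSetting): string[] {
  const errors: string[] = [];
  if (s.step <= 0) {
    errors.push(`${s.id}: step must be positive`);
    return errors;
  }

  const steps = (s.max - s.min) / s.step;
  if (steps < 3) {
    errors.push(`${s.id}: (max - min) / step is ${steps}, needs at least 3`);
  }

  // The default must sit on the step grid starting at min (tolerating float error).
  const offset = (s.default - s.min) / s.step;
  const onGrid = Math.abs(offset - Math.round(offset)) < 1e-9;
  if (s.default < s.min || s.default > s.max || !onGrid) {
    errors.push(`${s.id}: default ${s.default} is not reachable from min ${s.min} in steps of ${s.step}`);
  }
  return errors;
}
```

Run it over every range setting in a section's schema as part of the skill, so the failure shows up in the agent's output instead of in the theme editor.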

The Shopify Antigravity workflow

Create an AGENTS.md for the theme

At the theme root, AGENTS.md declaring the theme version (Atelier 3.4.0), the DDS palette and fonts, the non-negotiable rules above, and the preferred patterns (section-first, Liquid variables for all dynamic content).

Install the seo-magnet skill

At .agent/skills/seo-magnet/, the DDS SEO Magnet V2 skill. Any page.liquid work triggers it automatically and produces 9+ schemas, speakable boxes, sticky TOC, FAQ accordion with commercial-intent questions.

Install the shopify-section-author skill

At .agent/skills/shopify-section-author/, a skill covering section scaffolding, schema validation, IIFE patterns, and DDS color tokens. Triggers when you ask for a section.

Author sections in Manager View with Planning Mode

Plan first, approve, build. The browser sub-agent verifies at 375, 768, 1280, 1920 before committing. Lighthouse score captured and embedded in the PR description.

Performance gate before commit

Shopify Basic ceiling is ~77–82 mobile, ~90+ desktop. Any section that drops mobile below 75 is rejected. The gate is enforced by an agent that runs the Lighthouse MCP before approving the commit.

The DDS Shopify Skill anatomy

The high-leverage Skills I maintain in every DDS workspace:

seo-magnet — 9+ JSON-LD schemas, OG+Twitter, speakable, commercial-intent FAQ, sticky TOC. All-in-One pattern.
shopify-section-author — scoped CSS discipline, schema validation, responsive breakpoints, accessibility. Reference + Few-Shot pattern.
dds-brand-tokens — palette, fonts, certifications (GOTS/GRS/OCS/PETA-Approved Vegan/Fair Trade; never OEKO-TEX), spacing, radius tokens. Reference pattern.
atelier-header-rules — DDS Header v1.0 at z-index 9000, native Atelier header disabled, sticky-nav conflicts to avoid. Basic Router pattern.
shopify-performance — Core Web Vitals gate, lazy-load rules, preconnect audit, judge.me crossorigin discipline. Tool-Use pattern with an executable Lighthouse script.

✓

Compounding leverage. Once these five Skills are in place, every subsequent Shopify task takes 10–20% of the time it used to. You stop typing the same rules every session. The agent internalizes them via Progressive Disclosure.

Module 21 · Debugging, Observability & Cost Control

When Things Go Wrong — Debugging a Swarm in Production

Every production AGI system fails in ways a single-agent script never does. Symptoms are diffuse (output quality dropped but no error logged). Causes are temporal (an event 3 hours ago caused an output now). Fixes require replaying the past. This module is the field manual.

The three classes of swarm failure

Class	Signature	First action
Loud failure	Error thrown, task marked failed, alert fires	Read the forensic event, fix the agent, replay the correlation_id
Silent quality drift	No errors but outputs getting worse over days	MetaCognitor should have caught it. If not, re-baseline and widen the rubric.
Cost runaway	Monthly spend suddenly 4x; no alert	Query cost_cents from deep current grouped by agent+day. Identify the agent or loop responsible.

The cost-control discipline (with the credit system reality)

Since March 2026 Antigravity has run on a credit system. Ultra subscribers report unpredictable throttling. The defensive pattern:

Cost ceilings per agent per day. Hard limit written into the Agent config. When hit, agent refuses new tasks and emits agent.cost.ceiling-hit. Prevents one runaway agent from burning the whole day's credits.
Model tier routing. Route obvious tasks (code completion, simple rewrites) to Gemini 3 Flash. Route hard tasks (architecture, security review) to Claude Opus 4.6 or Gemini 3.1 Pro. Use the prompt-eval harness (P43) to pick the cheapest model that hits your quality bar.
Cache aggressively. Semantic cache on prompt+model pairs. If a semantically similar prompt has been answered by the same model in the last 24 hours, return the cached answer. Can cut costs by 40% or more on repetitive work, depending on how repetitive the workload actually is.
Budget-of-the-day pre-flight. At the start of a Manager View mission, the Orchestrator estimates token cost. If the estimate exceeds remaining daily budget, surface the warning to you before spawning agents.
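
Three of these controls (ceilings, tier routing, pre-flight) fit in a few lines of plain logic. A minimal sketch; the task kinds, the CostCeiling class, and the preflight helper are hypothetical, and the model strings are display labels taken from the text, not API model identifiers.

```typescript
// Task kinds and model labels are placeholders, not real API model IDs.
type TaskKind = "code-completion" | "simple-rewrite" | "architecture" | "security-review";

const MODEL_FOR_TASK: Record<TaskKind, string> = {
  "code-completion": "Gemini 3 Flash",
  "simple-rewrite": "Gemini 3 Flash",
  "architecture": "Gemini 3.1 Pro",
  "security-review": "Claude Opus 4.6",
};

function routeModel(kind: TaskKind): string {
  return MODEL_FOR_TASK[kind];
}

// Hard per-agent daily ceiling, tracked in cents.
class CostCeiling {
  private spentCents = 0;

  constructor(private readonly dailyLimitCents: number) {}

  // Reserves budget for a task. Returns false, reserving nothing, if it would exceed the ceiling.
  tryReserve(estimatedCents: number): boolean {
    if (this.spentCents + estimatedCents > this.dailyLimitCents) return false;
    this.spentCents += estimatedCents;
    return true;
  }

  remainingCents(): number {
    return this.dailyLimitCents - this.spentCents;
  }
}

// Pre-flight check: does a mission estimate fit in what is left today?
function preflight(estimateCents: number, ceiling: CostCeiling): { ok: boolean; shortfallCents: number } {
  const shortfall = estimateCents - ceiling.remainingCents();
  return { ok: shortfall <= 0, shortfallCents: Math.max(0, shortfall) };
}
```

When tryReserve returns false, the agent would refuse the task and emit agent.cost.ceiling-hit. When preflight returns ok: false, the Orchestrator surfaces the shortfall before any agent spawns.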

The debugging flow chart

Bad outcome reported

Capture: the correlation_id (from the output metadata), the expected outcome, the actual outcome, the observed timestamp.

Trace the causal tree

Run replay trace <correlation_id> (the tool from Module 17 O4). Reveals every agent that touched this mission, in order, with timing.

Identify the suspect agent

Walk the causal tree from root. The suspect is the first agent whose output diverges from expected. Subsequent agents are amplifying or masking the problem.

Inspect inputs and outputs

Pull the suspect's event from deep current. Check: was the input malformed? Did the agent's prompt change recently? Did the model version change? Did a dependency produce different output than before?

Reproduce in staging

replay <correlation_id> --target=staging. Watch the bad outcome reproduce exactly. This confirms you understand the failure.

Fix + test

Modify the suspect agent. Replay again. Verify the outcome is now correct. Write a regression test that reproduces the original failure from the staging replay and passes with the fix. Deploy to production.

Update the MetaCognitor rubric

Add the failure pattern to the MetaCognitor's scoring rubric so this class of drift gets caught next time.

Observability dashboards every sovereign deployment needs

Fleet Overview. Status dot per agent, last-task timestamp, current task, error rate 24h, cost today.
Cost Tracker. Spend per agent per day over last 30 days, projected month-end cost, budget remaining.
Quality Trends. MetaCognitor quality score per agent over 30 days with baseline overlay.
Ocean Health. Per-stream length, per-group lag, dead-letter rate, priority lane throughput.
Storage Health. Size per layer, growth rate, vacuum status for Postgres, AOF rewrite status for Redis.

Module 22 · Antigravity vs Cursor vs Claude Code vs Copilot vs Cline

Honest Comparison — April 2026 Capability Levels

No single tool wins every dimension. I use Antigravity as primary, Claude Code for terminal-heavy work, and Cursor for pure-editor flow. The comparison below is the honest capability matrix as of April 2026 based on verified features.

Capability	Antigravity	Cursor	Claude Code	Copilot	Cline
Paradigm	Agent-first IDE	AI-enhanced editor	Terminal agent	Code completion	VS Code agent
Multi-agent parallel	Yes (Manager View)	No	No	No	No
Browser sub-agent	Built-in Chromium	No	No	No	No
Skills / Progressive Disclosure	Native (SKILL.md)	Rules only	CLAUDE.md only	No	Rules only
AGENTS.md support	Yes (v1.20.5+)	Yes	Via CLAUDE.md	No	Yes
Primary context window	1M (Gemini 3.1 Pro)	~200K	200K (Sonnet/Opus)	~8K	200K
Planning artifacts	Task lists, plans, screenshots	Partial	Inline plans	No	Inline plans
MCP integration	Native	Native	Native	Limited	Native
Model options	Gemini, Claude, GPT-OSS	Claude, GPT, Gemini	Claude only	GPT, Claude	Claude, Gemini, DeepSeek
Free tier	Yes (rate-limited)	Limited	No (API billing)	No	BYOK only
Pricing transparency	Credits (controversial)	$20 transparent	API metered	$10 flat	BYOK
Security (Strict Mode)	Yes (Q1 2026)	Partial	Partial	Basic	Partial
Best for	Multi-agent, full-stack, AGI	Editor power users	Terminal workflows	Quick completions	VS Code natives
Weakness	Credit unpredictability, resource drain	No browser agent, single-agent	No GUI	Shallow context	Less polished than Cursor

The honest recommendation

Building AGI systems or multi-agent workflows: Antigravity. Nothing else comes close on Manager View parallelism and browser sub-agent validation.
Solo editor work with strong flow state: Cursor. Most polished single-agent experience with predictable pricing.
Terminal-heavy infrastructure work: Claude Code. Lives in your shell, fast, direct.
Casual autocomplete: GitHub Copilot. $10/month for solid completions, nothing more needed.
VS Code with open-source agents: Cline. BYOK, strong community, no vendor lock-in.

I run Antigravity, Claude Code, and Cursor simultaneously on the same machine. Different tool for different work. No allegiance required.