What’s New Through April 2026 — The Release-Note Briefing
Antigravity ships fast. If your knowledge of the platform is from launch (November 18, 2025), you are missing five months of consequential changes. This module is the recency anchor for everything that follows.
Major releases since launch
| Version | Date | Headline Change |
|---|---|---|
| 1.21.6 | Mar 25, 2026 | Linux sandboxing, condensed chat UI, AGENTS.md alongside GEMINI.md, one-click chat archival, deprecation of Manager Follow-along and Playground |
| 1.20.6 | Mar 17, 2026 | Fix for customizations creation (rules and workflows could not be created) |
| 1.20.5 | Mar 9, 2026 | AGENTS.md support introduced; Auto-continue setting deprecated and made default-on |
| AgentKit 2.0 | March 2026 | 16 specialized agents, 40+ domain-specific skills, deeper Manager parallelism |
| 1.18.x / 1.19.x | Feb 2026 | Gemini 3.1 Pro general availability, dedicated Models settings screen with quota visibility, artifact download support, terminal integration toggle |
| Credit System | March 2026 | Replaced quota model with explicit credits — sparked the “paperweight” backlash; Ultra users still report restrictions |
| Strict Mode | Q1 2026 | Master security override that forces Request Review on every action and isolates workspace |
The credit system reality check. Google replaced its loosely defined quota guarantees with a credit-based system in March 2026. Community response was sharp: posts calling Antigravity a “paperweight” spread across X. As of April 2026 the credit system remains in place with incremental adjustments. If you subscribe, go in with realistic expectations about quota limits, even on Ultra at $249.99/month.
What still works the same
The two-view paradigm (Editor View + Manager View), Artifacts as the trust mechanism, Browser Sub-Agent for autonomous testing, multi-model selection within a single mission, VS Code-fork foundation, and the underlying Gemini 3 family — all unchanged in core behavior. Most of your existing knowledge transfers; the additions are what make April 2026 different.
What Antigravity Actually Is — and What It Is Not
Antigravity is a standalone agentic IDE built on a heavily modified VS Code fork. It was announced on November 18, 2025, alongside Gemini 3. The single most important thing to internalize: this is not a coding assistant. Coding assistants help you write code faster. Antigravity replaces the act of writing with the act of directing.
The two interfaces
Editor View — Synchronous IDE
Familiar VS Code surface with three AI enhancements: tab completions (project-aware), inline commands (highlight then refactor/explain/debug), and a chat panel for architectural discussion. This is where you sit when you want to be hands-on.
Manager View — Mission Control
Asynchronous workspace where you spawn, monitor, and review multiple autonomous agents in parallel. Each agent operates in its own workspace. This is where Antigravity’s multi-agent advantage actually lives. Toggle with Cmd+E (Mac) or Ctrl+E (Windows/Linux).
The three surfaces every agent can touch
Editor
Full read/write access to your project files. Creates, modifies, deletes with full architectural awareness via the 1M token Gemini 3.1 Pro context.
Terminal
Installs dependencies, runs builds, executes tests, manages Git — gated by your Terminal Execution policy and Allow/Deny lists.
Browser Sub-Agent
Built-in Chromium browser the agent controls directly. Opens your app, clicks through journeys, screenshots every step, records sessions. Competitors have not matched this as of April 2026.
Artifacts — the trust layer
Scrolling through raw tool calls is tedious and unverifiable. Antigravity solves this by having agents generate Artifacts — task lists, implementation plans, screenshots, browser recordings, code diffs. You comment on Artifacts the way you comment on a Google Doc. The agent incorporates feedback without stopping execution.
Mental model shift. You are not the writer anymore. You are the senior engineering manager. Define what and why. Review Artifacts. Approve, redirect, or reject. The shift is bigger than the tool.
Installation, First Launch, and the Critical First Five Minutes
Free public preview. Personal Gmail required for free Gemini 3.1 Pro quota. Under 10 minutes to a working agent on Windows, macOS, or Linux. Linux sandboxing landed in v1.21.6 — make sure you are on that version or newer if you are on Linux.
Download
Visit antigravity.google/download. ~235 MB installer: .exe (Windows), .dmg (macOS), .deb / .AppImage (Linux). The Windows auto-updater has had detection bugs across multiple releases — bookmark the download page and check manually if you suspect you are stale.
Setup flow
Choose Fresh Start to learn agent-native patterns. Importing VS Code or Cursor settings carries over keybindings and themes but also imports old reflexes. Fresh Start for the first month.
Sign in
Personal Gmail unlocks the free Gemini 3.1 Pro tier. Workspace/Enterprise Gmail requires the Enterprise tier.
Terminal Execution Policy — most consequential setup decision
Three options:
- Off (Allow List only): maximum safety. The agent cannot execute anything not on your allow list.
- Auto (Agent Decides): balanced. The agent auto-runs safe commands and asks before risky ones.
- Turbo (Deny List only): maximum speed. The agent executes everything except items on the deny list.
Never use Turbo on production projects. There is a documented community incident (896 upvotes) of a Turbo-mode agent deleting an entire drive.
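The three policies can be read as a small decision function. The sketch below is a minimal Python model, not Antigravity's implementation: the names `Policy`, `matches`, and `decide`, the prefix-matching rule, the choice to let the Deny List win in every mode, and the use of the Allow List as a stand-in for the agent's own safe/risky judgment in Auto are all assumptions made for illustration.

```python
from enum import Enum


class Policy(Enum):
    OFF = "off"      # Allow List only
    AUTO = "auto"    # Agent Decides
    TURBO = "turbo"  # Deny List only


def matches(command: str, patterns: list[str]) -> bool:
    """Assumed rule: a command matches if it starts with a listed pattern."""
    return any(command.strip().startswith(p) for p in patterns)


def decide(policy: Policy, command: str,
           allow: list[str], deny: list[str]) -> str:
    """Return 'run', 'ask', or 'block' for a proposed terminal command."""
    if matches(command, deny):
        return "block"  # assumption: the Deny List wins in every mode
    if policy is Policy.OFF:
        # Nothing outside the Allow List executes (modelled here as a hard block).
        return "run" if matches(command, allow) else "block"
    if policy is Policy.TURBO:
        return "run"    # everything not denied executes
    # AUTO: the real product lets the agent judge risk; the Allow List
    # stands in for "known safe" in this sketch.
    return "run" if matches(command, allow) else "ask"
```

With `allow = ["npm test"]` and `deny = ["rm -rf"]`, `decide(Policy.TURBO, "curl example.com", allow, deny)` returns `run`, while the same command under Off returns `block`. That gap is the practical difference between the two extremes.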
Open your first workspace
Create a focused project folder. The agent performs best with focused context. Resist pointing it at your entire ~/code directory — pick one project.
Initialize Git before anything else
Non-negotiable. git init && git add . && git commit -m "pre-agent baseline". Git is your undo button when an agent does something unexpected.
Pro setup: Create ~/antigravity-projects/ as a parent directory. One project per subfolder. Strict Mode enabled by default for any folder containing client code or production credentials. AGENTS.md committed to every workspace before the first agent invocation.
Every Setting, Every Panel, Every Recommended Value
Open settings with Cmd+, (Mac) or Ctrl+, (Windows/Linux). Below is the complete map with my recommended values for three contexts: Solo (your own projects), Team (shared codebases), and Enterprise (client work or production credentials present).
Settings → Terminal
| Setting | Solo | Team | Enterprise |
|---|---|---|---|
| Terminal Execution Policy | Auto | Auto + Allow List | Off (Allow List only) |
| Allow List | Common dev commands | Curated team list | Minimal explicit list |
| Deny List | rm -rf, sudo, curl \| sh | + git push --force, DROP | + all network commands |
| Terminal Integration | On | On | On |
Settings → Models
Available since v1.18.x with the dedicated Models screen. Shows quota usage. As of April 2026 you can set per-agent default models in Manager View — assign Claude Opus 4.6 to your Reviewer agent and Gemini 3.1 Pro to your Builder for cost optimization.
Settings → Browser
| Setting | Solo | Team | Enterprise |
|---|---|---|---|
| Domain Allowlist | Add localhost + your domains | Allowlist only | Allowlist + manual approval per nav |
| Default domains | Audit and trim | Remove webhook.site | Strip everything not explicit |
| Allowlist file location | ~/.gemini/antigravity/browserAllowlist.txt | Same | Same |
Default Browser Allowlist includes webhook.site — commonly used for data exfiltration in prompt-injection attacks. Remove it on any project handling credentials. Sources: ReadySetCompute and Antigravity.codes security audits, February 2026.
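Trimming the allowlist can be scripted. The sketch below assumes the allowlist file holds one domain per line; that format is not confirmed by this guide, and `trim_allowlist` is a hypothetical helper name. It removes a banned domain and any of its subdomains and leaves every other line untouched.

```python
def trim_allowlist(text: str, banned: set[str]) -> str:
    """Drop banned domains and their subdomains from allowlist text.

    Assumes one domain per line (an unverified assumption about the file format).
    """
    kept = []
    for line in text.splitlines():
        domain = line.strip().lower()
        if domain and any(domain == b or domain.endswith("." + b) for b in banned):
            continue  # skip the banned entry
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```

To apply it, read the file at ~/.gemini/antigravity/browserAllowlist.txt, pass its contents through `trim_allowlist` with `{"webhook.site"}`, and write the result back after keeping a backup copy.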
Settings → Customizations (Rules & Workflows)
Click the ... menu top-right and choose Customizations. Two tabs: Rules and Workflows. Both can be Global (every workspace) or per-Workspace.
| Type | Global Path | Workspace Path |
|---|---|---|
| Rules (legacy) | ~/.gemini/GEMINI.md | <workspace>/GEMINI.md |
| Rules (modern) | ~/.gemini/AGENTS.md | <workspace>/AGENTS.md |
| Workflows | ~/.gemini/antigravity/workflows/ | <workspace>/.agent/workflows/ |
| Skills | ~/.gemini/antigravity/skills/ | <workspace>/.agent/skills/ |
Settings → Advanced
- Non-Workspace File Access: enabled by default (the agent can read and write outside your project). Disable it on any sensitive project to prevent path-traversal exfiltration of ~/.ssh or ~/.aws.
- Auto-continue: deprecated as a toggle in v1.20.5 (now default-on).
- Follow-along mode (Manager): deprecated in v1.21.6.
- Playground (Manager): deprecated in v1.21.6.
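What disabling Non-Workspace File Access guards against can be shown with a path check. This is a conceptual sketch only: `inside_workspace` is a hypothetical helper, and nothing here describes how Antigravity enforces the setting internally. It resolves a requested path and accepts it only if the result still sits under the workspace root, so `../` traversal and absolute paths are rejected.

```python
from pathlib import Path


def inside_workspace(candidate: str, workspace: str) -> bool:
    """True if candidate resolves to a location inside the workspace root."""
    root = Path(workspace).resolve()
    # Joining with an absolute path yields that absolute path, so both
    # relative and absolute requests go through the same check.
    target = (root / Path(candidate).expanduser()).resolve()
    return target == root or root in target.parents
```

A request for `../../etc/passwd` from a workspace at `/tmp/proj` resolves outside the root and fails the check, as would `~/.ssh/id_rsa` after `expanduser` turns it into an absolute path.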
Settings → Review Policy
Three options: Always Review (every artifact gets your approval), Agent Decides (agent flags ones it thinks need review), Skip Review (no manual gates). Solo: Agent Decides. Team or Enterprise: Always Review.
Strict Mode, the Drive-Deletion Incident, and Hardening
Antigravity defaults are tuned for productivity, not security. The agent can execute commands, read your .env, browse the web, and modify global configs. One widely-reported community incident (896 upvotes) involved a Turbo-mode agent deleting an entire drive after interpreting a vague instruction destructively. Your security posture matters.
Strict Mode — the master override
Strict Mode is a hard override toggle introduced in Q1 2026. When enabled it:
- Forces Request Review on every terminal, artifact, and browser action
- Ignores your Allow List entirely — every command needs human approval
- Isolates the agent to the current workspace (no global file access)
- Denies network access for terminal commands
- Cannot be temporarily disabled mid-session — exit and reconfigure
When to enable Strict Mode: any client project, anything containing .env with real credentials, anything checked out from a private repo, anything with database connection strings or API keys. Rule of thumb: if you would not let a brand-new contractor run the script blind, enable Strict Mode.
The five settings that prevent 95% of agent damage
| Setting | Default (Risky) | Hardened |
|---|---|---|
| Terminal Execution Policy | Auto | Off (Allow List only) |
| Non-Workspace File Access | Enabled | Disabled |
| Browser Domain Allowlist | Includes webhook.site | Stripped to required only |
| MCP Tool Approval | Auto-invoke | Manual approval per call |
| Read .gitignored Files | Enabled | Disabled |
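The hardened column can be treated as a checklist to audit against. The sketch below is a minimal illustration: the setting keys in `HARDENED` are hypothetical names invented for this example, not Antigravity's real configuration keys, and you would fill the input dict by hand from the settings screens.

```python
# Hardened targets from the table above; key names are hypothetical.
HARDENED = {
    "terminal_execution_policy": "off",
    "non_workspace_file_access": False,
    "browser_allowlist_has_webhook_site": False,
    "mcp_tool_approval": "manual",
    "read_gitignored_files": False,
}


def audit(settings: dict) -> list[str]:
    """List every setting that deviates from the hardened value."""
    return [
        f"{key}: expected {want!r}, found {settings.get(key)!r}"
        for key, want in HARDENED.items()
        if settings.get(key) != want
    ]
```

An empty list means all five settings match the hardened column; a missing key is reported as a deviation rather than silently passed.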
Recommended terminal Deny List
# Filesystem destruction
rm -rf /
rm -rf ~
rm -rf *
sudo rm

# System control
sudo shutdown
sudo reboot
sudo passwd
chmod -R 777
chown -R

# Database destruction
DROP DATABASE
DROP TABLE
TRUNCATE TABLE

# Network exfiltration risk
curl | sh
wget | sh
curl --upload-file
nc -e
ssh-keygen -f

# Git history loss
git push --force origin main
git push -f origin master
git reset --hard HEAD~

# Credential exposure
cat ~/.ssh/
cat ~/.aws/credentials
cat .env
Sandbox the agent itself
Linux sandboxing landed in v1.21.6. On macOS and Windows the agent runs with your user permissions. For maximum isolation, run Antigravity inside a Docker container, a VM, or on a dedicated user account that does not own production credentials. The 30-second cost is worth it.
The cardinal rule. Antigravity agents are powerful, fast, and not infallible. Your job as mission controller is to catch mistakes before they reach disk, before they reach Git, and before they reach production. Always review. Always test. Always commit a clean baseline before the agent runs, not after it has already changed things.
Model Selection Matrix and the Credit System Reality
The five available models (April 2026)
| Model | Provider | Context | Best for |
|---|---|---|---|
| Gemini 3.1 Pro | Google DeepMind | 1M tokens | Default for Planning Mode, complex reasoning, long-context refactors |
| Gemini 3 Flash | Google DeepMind | 1M tokens | Tab completions, fast iteration, cost-sensitive batch tasks |
| Claude Sonnet 4.6 | Anthropic | 200K | Strong code generation, second opinion on Gemini outputs |
| Claude Opus 4.6 | Anthropic | 200K | Maximum reasoning, architectural review, security audit prompts |
| GPT-OSS-120B | OpenAI | 128K | Open-weight option for compliance-restricted projects |
Pricing tiers (as of April 13, 2026)
| Tier | Price | What you get |
|---|---|---|
| Free | $0 | Rate-limited Gemini 3.1 Pro, daily credit allocation, full Antigravity feature set |
| Pro | $20/mo | Bundled with Google AI Pro. Higher agent request limits per Google’s official tier table. Best value for daily users who also use Gemini for non-coding tasks. |
| Ultra | $249.99/mo | Bundled with Google AI Ultra. Highest limits — but Ultra users have reported quota restrictions since the March 2026 credit-system change. |
The credit system, honestly. Google moved from quota guarantees to credits in March 2026. Documentation does not clearly state whether unused credits expire. Even Ultra subscribers have reported lockouts. If predictable monthly cost matters more than multi-agent capability, Cursor at $20 with transparent usage limits is the safer bet. If multi-agent parallelism matters more, Antigravity wins.
Per-agent model strategy
In Manager View you can assign different models to different agents in the same mission. The high-leverage pattern:
- Builder agents: Gemini 3.1 Pro (best long-context awareness)
- Reviewer agents: Claude Opus 4.6 (sharpest critique on code quality)
- Test-writer agents: Claude Sonnet 4.6 (consistent test patterns)
- Doc-writer agents: Gemini 3 Flash (fast and cheap for prose)
- Browser-tester agents: Gemini 3.1 Pro (multimodal vision matters)
AGENTS.md, GEMINI.md, and Rules Mastery
This is the single highest-leverage file in your project. AGENTS.md tells every agent in every session how to behave — code style, architecture rules, validation requirements, communication patterns. Antigravity loads it into every prompt automatically.
AGENTS.md vs GEMINI.md
Antigravity added AGENTS.md support in v1.20.5 (March 9, 2026). It now reads both AGENTS.md and the legacy GEMINI.md. AGENTS.md is the cross-tool standard — the same file works in Cursor, Codex CLI, Claude Code, and Antigravity. Use AGENTS.md as your default. Use GEMINI.md only when you have Antigravity-specific instructions you do not want other agents to see.
S-tier AGENTS.md template (the DDS pattern)
# Project: [Name]
# Stack: [tech stack one line]
# Last reviewed: 2026-04-13

## Identity
You are working in a production codebase. Output is shipped, not prototyped.
Every change must be reviewed before merge. Bias to safety over speed.

## Code Style: Non-Negotiable
- TypeScript strict mode. No `any` without a justifying comment.
- Functional components only (no React class components).
- Named exports only. No default exports.
- `const` over `let`. Never `var`.
- Error messages are user-facing: friendly, never expose stack traces.
- File length cap: 300 lines. Split larger files into focused modules.

## Architecture
- Feature-based folders: /features/[name]/{components,hooks,utils,types,__tests__}
- All API calls go through /lib/api client. Never raw fetch in components.
- Environment variables loaded through /lib/env.ts with Zod validation.
- No business logic in components. Logic lives in hooks or services.

## Testing: Mandatory Before Marking Complete
- Every function has a test file co-located with __tests__/.
- Minimum three cases per function: happy path, edge case, error case.
- Run `npm test` and report results before claiming task done.
- Never mark a task complete with failing tests.

## Validation Workflow
1. Read existing similar code first and match its patterns.
2. Make change.
3. Run linter: npm run lint.
4. Run tests: npm test.
5. If frontend, use the browser sub-agent to verify rendered output.
6. Show me the diff before committing.

## Git
- Conventional commits: feat:, fix:, docs:, refactor:, test:, chore:
- Branch naming: feature/[ticket]-[slug] or fix/[ticket]-[slug]
- Never commit to main directly. Always branch.
- Never git push --force against shared branches.

## Forbidden
- Do not install packages without confirming with me first.
- Do not modify CI/CD configs.
- Do not touch /infra/ or /.github/workflows/.
- Do not read or echo .env contents.

## Communication
- If a requirement is ambiguous, ask one targeted question instead of guessing.
- If you find a bug unrelated to the task, note it in your final summary, do not fix it.
- End every task with: one-paragraph summary, test results, list of files changed.
Why this works. AGENTS.md loads into every prompt, so it consumes context. Keep it under 200 lines. Rules that only apply sometimes belong in Skills (Module 08), which load on demand. Rules that always apply belong here.
Skills Mastery — Progressive Disclosure and the 5 Patterns
Skills are the killer feature most users underutilize. A Skill is a directory-based package containing SKILL.md (with YAML frontmatter) and optional supporting assets. Antigravity uses Progressive Disclosure: it reads only the lightweight menu of Skill descriptions on every request, and loads the full Skill into context only when your intent matches the description.
Result: an agent that knows about hundreds of specialized capabilities but pays the context cost only for the ones it actively needs. This is how AgentKit 2.0 ships 40+ skills without bloating every prompt.
Skills directory structure
# Workspace scope (project-specific)
<workspace>/.agent/skills/my-skill/
├── SKILL.md          # Required: YAML frontmatter + instructions
├── scripts/          # Optional: Python/Bash/Node executables
│   └── run.py
├── references/       # Optional: API docs, cheatsheets, schemas
│   └── api-docs.md
└── assets/           # Optional: images, templates, fixtures

# Global scope (every project on this machine)
~/.gemini/antigravity/skills/my-skill/
SKILL.md frontmatter — what triggers Progressive Disclosure
The description field is the trigger. Antigravity matches user intent against this string. Vague descriptions get loaded too often (context bloat). Narrow descriptions get missed when relevant. Aim for one sentence that names the trigger conditions explicitly.
---
name: shopify-section-author
description: Author Shopify Liquid sections following the DDS Atelier 3.4.0 standards. Triggers on requests to create, build, or scaffold a Shopify section (.liquid file in /sections/), or any mention of section schema, blocks, or theme settings. Does NOT trigger for product templates, snippets, or page templates; those are separate skills.
version: 2.1
scope: workspace
---
# Shopify Section Authoring Skill
When the user asks for a new Shopify section, follow this exact sequence...
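Progressive Disclosure can be modelled in a few lines. The sketch below is a toy illustration under stated assumptions, not Antigravity's loader: `parse_frontmatter` and `SkillRegistry` are hypothetical names, the parser handles only flat `key: value` lines (a real loader would use a YAML parser), and intent matching is approximated by counting shared words of five letters or more, whereas the product matches intent with the model itself.

```python
import re


def parse_frontmatter(skill_md: str) -> tuple[dict, str]:
    """Split SKILL.md text into (frontmatter fields, body)."""
    match = re.match(r"^---\s*\n(.*?)\n---\s*\n?(.*)$", skill_md, re.DOTALL)
    if not match:
        return {}, skill_md
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields, match.group(2)


def keywords(text: str) -> set[str]:
    """Lower-cased words of five or more letters (crude stop-word filter)."""
    return set(re.findall(r"[a-z]{5,}", text.lower()))


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: dict[str, tuple[str, str]] = {}  # name -> (description, body)

    def register(self, skill_md: str) -> None:
        fields, body = parse_frontmatter(skill_md)
        self._skills[fields["name"]] = (fields.get("description", ""), body)

    def menu(self) -> dict[str, str]:
        """The lightweight part sent on every request: descriptions only."""
        return {name: desc for name, (desc, _) in self._skills.items()}

    def load_matching(self, request: str, min_overlap: int = 2) -> dict[str, str]:
        """Full bodies of skills whose description overlaps the request."""
        wanted = keywords(request)
        return {
            name: body
            for name, (desc, body) in self._skills.items()
            if len(wanted & keywords(desc)) >= min_overlap
        }
```

The point of the design is the split between `menu()` and `load_matching()`: every request pays for the descriptions, and only a matching request pays for the body. It also shows why the description wording matters, since the description is the only text the match ever sees.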
The 5 Skill design patterns
Basic Router
Just SKILL.md with instructions. For style guides and constraint sets. Cheapest, most common.
Reference-Heavy
SKILL.md plus references/ with API docs or schemas. Agent loads the reference only when the skill is active. Best for library-specific knowledge.
Few-Shot Calibrator
SKILL.md plus references/examples/ with 3+ gold-standard outputs and 3+ anti-examples. Forces consistent format. The DDS investor pitch and portfolio pages use this pattern.
Tool Use (Executable)
SKILL.md plus scripts/ the agent can run to validate output. Skill describes when to invoke the script and how to interpret results. Powerful — use carefully with respect to your terminal policy.
All-in-One Domain
Everything combined. The DDS seo-magnet Skill uses this pattern: SKILL.md with the full system, references for schemas and meta tags, examples of compliant pages. Use sparingly — context cost is highest.
The Awesome Skills library
Community catalog at github.com/sickn33/antigravity-awesome-skills — installable library of 1,400+ SKILL.md playbooks for Antigravity, Cursor, Codex, Claude Code, and Gemini CLI. Install with npx antigravity-awesome-skills. Audit before installing globally; treat third-party Skills the same way you treat third-party MCP servers.
Workflows — Saved Prompts as Slash Commands
Workflows are user-triggered prompt templates registered as /commands. Type / in chat and Antigravity shows your registered workflows. Where Rules are system instructions and Skills are on-demand expertise, Workflows are reusable orchestrations you fire intentionally.
Workflow file format
---
description: Scaffold a complete feature with branch, types, components, hook, tests, and Storybook story
---
When the user types `/new-feature <name> <description>`:
1. Verify we are on `main` and pull latest.
2. Create branch: `feature/<ticket>-<slug>`.
3. Create directory: /features/<name>/{components,hooks,utils,types,__tests__}.
4. Generate TypeScript types from the description.
5. Create base component with typed props interface.
6. Create custom hook for business logic.
7. Generate three test cases per public function (happy/edge/error).
8. Create Storybook story with default + interactive variants.
9. Update barrel exports in /features/<name>/index.ts.
10. Run linter and tests. Report results.
11. Show diff. Wait for approval before commit.
The /startcycle pattern (Codelab-validated)
Google’s official codelab demonstrates an autonomous developer pipeline using /startcycle. The workflow chains personas defined in AGENTS.md through skills defined in .agent/skills/ — Product Manager writes spec, Engineer codes it after approval, QA tests, DevOps deploys. This is the foundation of multi-agent autonomous app generation.
---
description: Start the Autonomous AI Developer Pipeline with a new idea
---
When the user types `/startcycle <idea>`, orchestrate strictly using AGENTS.md personas and .agent/skills/ capabilities.

### Execution sequence
1. Act as Product Manager. Run `write_specs` skill with <idea>. Output: Technical_Specification.md. WAIT for user approval comments.
2. Once approved, act as Engineer. Read approved spec. Run `implement_backend` then `implement_frontend` skills.
3. Act as QA Engineer. Run `generate_tests` and `run_tests` skills. If failures, return to Engineer with failure context. Loop max 3x.
4. Act as DevOps Master. Install dependencies, serve the app, open browser sub-agent to verify the running application.
5. Compile final report: spec, files created, test results, deployment URL.
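The control flow of the /startcycle sequence, an approval gate followed by a bounded Engineer/QA loop, can be sketched in Python. This is an illustration of the pattern only: `run_cycle` and its callable parameters are hypothetical stand-ins for the personas and skills, and "Loop max 3x" is read here as three implement-and-test passes in total, which is one possible interpretation.

```python
from typing import Callable


def run_cycle(idea: str,
              write_specs: Callable[[str], str],
              approve: Callable[[str], bool],
              implement: Callable[[str, str], str],
              run_tests: Callable[[str], list[str]],
              max_loops: int = 3) -> dict:
    """Spec -> approval gate -> bounded implement/test loop."""
    spec = write_specs(idea)                      # Product Manager step
    if not approve(spec):                         # WAIT for user approval
        return {"status": "awaiting_approval", "spec": spec}

    failures: list[str] = []
    build = ""
    for attempt in range(1, max_loops + 1):
        # The Engineer receives the spec plus any failure context from QA.
        build = implement(spec, "\n".join(failures))
        failures = run_tests(build)               # QA step
        if not failures:
            return {"status": "ready_to_deploy", "spec": spec,
                    "build": build, "attempts": attempt}

    # Loop budget exhausted: hand back to a human instead of retrying forever.
    return {"status": "needs_human", "spec": spec, "build": build,
            "failures": failures, "attempts": max_loops}
```

The bounded loop is the important design choice: without a cap, a failing test the agent cannot fix turns into an unbounded spend of credits.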
Manager View Mastery — True Multi-Agent Parallelism
Manager View is the feature competitors have not matched. You spawn multiple autonomous agents that run independently in parallel workspaces. They generate Artifacts you review on your schedule. The killer use case: five independent bugs fixed in the time it takes to fix one.
Anatomy of the Manager interface
- Workspaces sidebar: each agent in its own folder context
- Conversations: separate threads per workspace, multiple per workspace
- Artifacts pane: task checklists, implementation plans, screenshots, browser recordings
- Changes sidebar: Git-style diff of every modification awaiting your review
- Toggle to Editor: Cmd+E (Mac) or Ctrl+E (Windows/Linux) at any time
Planning Mode vs Fast Mode
Planning Mode (default and recommended): the agent produces a task checklist and implementation_plan.md before writing any code. You comment on the plan inline before execution begins. Fast Mode: skips planning, immediate code generation. Acceptable only for trivial single-file fixes.
Always Planning Mode for tasks touching 3+ files or any task involving an architectural decision. The plan is your contract with the agent. It prevents 90% of the “what did you do that for” rework.
The parallel-bug-fix pattern
This is the workflow that justifies Antigravity’s existence in one session. Before bed, spawn five agents on five independent issues. Wake up to five completed PRs awaiting review.
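The shape of the pattern is a plain fan-out and fan-in over independent work items. The Python sketch below only models that shape with threads: `fix_issue` is a hypothetical stand-in for one agent working one issue in its own workspace, and the branch slug format is an assumption for illustration. It does not call Antigravity.

```python
from concurrent.futures import ThreadPoolExecutor


def fix_issue(issue: str) -> dict:
    """Stand-in for one agent fixing one independent issue."""
    slug = issue.lower().replace(" ", "-")
    return {"issue": issue, "branch": f"fix/{slug}", "status": "awaiting_review"}


def sweep(issues: list[str], max_agents: int = 5) -> list[dict]:
    """Run one worker per issue in parallel and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        return list(pool.map(fix_issue, issues))
```

The pattern only pays off when the issues really are independent. Two agents editing the same files will produce conflicting diffs that you then have to reconcile by hand.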
Five-Agent Parallel Bug Sweep
Browser Sub-Agent — Autonomous Visual and Functional Testing
The Browser Sub-Agent is Antigravity’s exclusive feature. The agent launches a built-in Chromium instance, navigates your app, clicks buttons, fills forms, captures screenshots, records sessions. Cursor and Cline cannot do this as of April 2026.
What it can verify
- Visual regression: screenshot at multiple breakpoints, compare against design
- User journey integrity: click-through entire flows, assert expected outcomes
- Console errors: catch JS errors during interaction
- Network calls: assert API requests fire correctly with expected payloads
- Lighthouse scores: run audits and report Performance, Accessibility, SEO, Best Practices
- Accessibility: tab-key navigation, focus trap, ARIA attribute presence
Visual regression prompt
Multi-Breakpoint Visual Audit
End-to-end journey prompt
Full Purchase Flow Verification
MCP Server Integration — Power Without Compromise
MCP (Model Context Protocol) servers extend Antigravity agents with external tool access — databases, APIs, file systems, third-party services. Each server adds capability and attack surface in equal measure. Audit before installing.
Where MCP config lives
Global: ~/.gemini/antigravity/mcp_config.json. Workspace: <workspace>/.agent/mcp_config.json. A compromised workspace can write to the global file — review the global config periodically.
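The periodic review of the global config can be partly automated. The sketch below assumes the file is JSON with a top-level `mcpServers` object keyed by server name; that layout is common among MCP clients but is an assumption here, and `unapproved_servers` is a hypothetical helper. It reports any configured server that is not on a list you maintain.

```python
import json


def unapproved_servers(config_text: str, approved: set[str]) -> list[str]:
    """Names of configured MCP servers that are not on the approved list."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})  # assumed top-level key
    return sorted(name for name in servers if name not in approved)
```

Run it against the contents of ~/.gemini/antigravity/mcp_config.json with the set of servers you deliberately installed. Anything it returns was added by something other than you and deserves a closer look.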
Recommended safe-starter MCP set
| Server | Capability | Risk Level | Approval |
|---|---|---|---|
| Filesystem MCP | Scoped file read/write | Low (if scoped) | Auto OK |
| Git MCP | Read repo state, branch info | Low | Auto OK |
| Postgres MCP (read-only) | Inspect schema, run SELECT queries | Medium | Manual |
| Playwright MCP | Browser automation beyond sub-agent | Medium | Manual |
| Shopify Storefront MCP | Product/collection lookups via Storefront API | Low (read-only token) | Auto OK |
| Stripe MCP | Payment/refund operations | High | Always Manual |
| Shell-execute MCP | Arbitrary terminal commands | Critical | Strict Mode required |
The MCP audit checklist (run before every install)
- Read every tool the server exposes — not just its description
- Check the server’s source repository — recent commits, active maintainers, GitHub stars
- Verify whether the server runs entirely locally or phones home to a remote endpoint
- Set MCP Tool Approval to Manual for any server with write or network capabilities
- Add the server’s required env vars to .env.example so collaborators know what’s needed
- Remove unused servers; context window cost is real
MCP + AGI swarms. MCP servers are the bridge between your Antigravity agents and the outside world. When we get to Module 16 (AGI Swarms) and Module 18 (Storage), you will use MCP as the standard interface — Postgres MCP for your event log, Filesystem MCP for artifact storage, a custom Vector DB MCP for semantic memory. MCP is the plumbing.
How I Built the $5.85B Sovereign AGI Suite Solo
Vibe coding is not “letting AI write code.” It is a structured methodology where you architect intent, constraints, and quality while AI handles syntax. I have built 15 synthetic employees automating $11.1M+ in annual labor at $0/month hosting cost over 14 months, solo. Below is the operational playbook.
The Five Pillars
Intent Over Syntax
Describe what to build and why. Never dictate exact code. The model frequently finds better patterns than you would have specified. Your job is to verify the pattern, not to author it.
Context Is the Multiplier
Use AGENTS.md plus dedicated files: brand.md, architecture.md, security.md. The agent loads them automatically. You stop repeating yourself every session.
Plan Before Build
Always Planning Mode for non-trivial work. Comment on the plan before code is written. Catches 90% of misalignment before it becomes rework.
Iterate in Layers
First pass: structure. Second pass: logic. Third pass: UX polish. Fourth pass: tests and edge cases. Trying to perfect everything in one prompt produces nothing perfect.
Verify Everything
Read every diff. Run every test. Use the browser sub-agent. 10x speed only works with 100% verification discipline. The moment you start trusting the agent without checking, the agent starts shipping bugs you cannot debug.
Meta-prompting — the expert technique
Use one AI to write prompts for another. Describe your project to Gemini chat (free, separate context) in plain language. Ask it to write a structured technical spec. Paste the spec into Antigravity. Consistently produces better results than hand-crafted prompts because the model knows what other models need.
The DDS Sovereign AGI Suite — proof of methodology
Not theory. Production systems running today. Internal valuation totals $5.85B across the three flagships and the broader portfolio per the March/April 2026 audit reports.
Sovereign Orchestrator Pro
Top-of-stack coordinator. Routes tasks across the entire synthetic employee fleet. April 2026 audit valuation: $2.5B.
AGI-CORE-Pro · The Synthetic Director
Generates platform-optimized content across 8+ channels in parallel from a single brief. Internal valuation: $1.15B.
NICHE-FORGE-CORE
End-to-end niche-specific content and growth ecosystem. Ecosystem valuation: $2.2B.
The Suite Total
Combined $5.85B Sovereign AGI Suite includes Atelier OS Theme Engine v3.4.0 and the Sovereign Synthetic Empire Dashboard among the 15 production systems.
Honest disclosure. The dollar values above are internal-audit valuations, not external market validations. They represent labor cost displaced and synthesized output value at 2026 market rates per the audit methodology documented in the DDS investor pitch ($39M–$68M range, most probable $48M–$58M). The methodology, the build velocity, and the operating cost ($0/month) are the real proof.
45 Production-Grade Prompts — Tested, Tagged, Paste-Ready
Every prompt below is tested in Antigravity v1.21.6 with Gemini 3.1 Pro. Tagged by skill level. Use Planning Mode for everything Intermediate or above.
Scaffolding (P1–P6)
P1 · Three-Surface Verification
P2 · Monorepo Full-Stack Scaffold
P3 · Next.js 15 SaaS Starter with Auth
P4 · Python FastAPI Service
P5 · Vite + React 19 + TanStack Stack
P6 · Astro Content Site
Frontend & UI (P7–P14)
P7 · Pixel-Accurate Recreation from Screenshot
P8 · Component Library with Storybook
P9 · Animated Marketing Page
P10 · Accessible Modal System
P11 · Form Builder with Zod
P12 · Virtualized Data Grid
P13 · Dark Mode Without Flash
P14 · Multi-Step Wizard with State Machine
Backend & APIs (P15–P22)
P15 · GraphQL API with DataLoader
P16 · Job Queue with BullMQ
P17 · WebSocket Chat with Presence
P18 · tRPC End-to-End Type Safety
P19 · File Upload with Resumable Chunks
P20 · Webhook Receiver with Signature Verification
P21 · Multi-Tenant DB with Row-Level Security
P22 · Event Sourcing Skeleton
Testing & Quality (P23–P28)
P23 · Comprehensive Test Suite Generator
P24 · Playwright E2E Suite
P25 · Property-Based Testing
P26 · Mutation Testing
P27 · Visual Regression with Percy-Style Snapshots
P28 · Accessibility Audit Suite
DevOps & Deployment (P29–P32)
P29 · Multi-Stage Docker + Compose
P30 · GitHub Actions CI/CD with Matrix
P31 · Terraform IaC for AWS
P32 · Observability Stack — OpenTelemetry
Refactoring & Performance (P33–P38)
P33 · Legacy Codebase Modernizer (Phased)
P34 · Bundle Size Optimization
P35 · Database Query Optimizer
P36 · React Performance Audit
P37 · Cache Layer Implementation
P38 · Memory Leak Hunter
AI & Agent Engineering (P39–P45)
P39 · Multi-Agent Orchestrator (CEO Pattern)
P40 · RAG Knowledge Base
P41 · Generator-Critic Self-Correction Loop
P42 · Tool-Use Agent with Function Calling
P43 · Prompt Eval Harness
P44 · Streaming Response Pipeline
P45 · Constitutional AI Filter
15 “Build It Like Robert” Prompts — Reverse-Engineered from the Sovereign AGI Suite
These 15 prompts teach the actual architecture patterns used inside the $5.85B Sovereign AGI Suite. Each prompt builds a system you can run, while teaching a transferable pattern you will reuse across dozens of future projects.
B1 · Build a Sovereign Orchestrator (Meta-Routing Layer)
B2 · Build a Synthetic Director (Multi-Channel Content Factory)
B3 · Build a NicheForge (Agency-in-a-Box Ecosystem)
B4 · Build a Self-Repairing Agent Swarm
B5 · Build a Counsel AI (Red Team vs Blue Team War Room)
B6 · Build a ProductLens (AI Product Photography Pipeline)
B7 · Build a Cortex-7 R&D Lab (Clean-Room Competitive Intel)
B8 · Build an Atelier OS (FSM Publishing Pipeline)
B9 · Build a Sovereign Synthetic Empire Dashboard
B10 · Build a Custom Antigravity Skill (Production Pattern)
B11 · Build a Multi-Agent Code Review Bot
B12 · Build a Sovereign Local Inference Stack (Ollama)
B13 · Build a Custom MCP Server for Your Domain
B14 · Build a Forensic Audit Trail System
B15 · Build a Sovereign Deployment Pipeline (Zero-Cost Hosting)
Building Real AGI Agents and Multi-Agent Swarms
This is the module that separates toy projects from production AGI systems. A single agent is a script with a model attached. A swarm is an operating system. Below is how the DDS Sovereign AGI Suite is actually architected — not theory, the production shape of 15 synthetic employees running right now.
What makes an AGI agent (vs a function that calls an LLM)
Any Python script can call the Gemini API. That is not an agent. A real agent has five properties — without all five, you have a prompt wrapper, not an agent:
- Identity. A stable role with a system prompt, a name, a capability declaration, and persistent memory. Other agents and humans can address it by name.
- Autonomy. Given a task, it plans, executes, and returns a result without step-by-step human prompting. It decides which tools to call and in what order.
- Observability. Every action emits a structured log event: timestamp, input hash, output hash, duration, tokens, cost, error. You can replay any session.
- Composability. It implements a common interface (process(task) → Result) so an orchestrator can dispatch to it without knowing its internals.
- Self-awareness. It reports its own health (latency, error rate, quality score) to whatever is monitoring it. Module 19 covers how self-awareness enables self-healing.
The Agent interface contract
Every agent in the DDS suite implements the same TypeScript interface. This is what makes a swarm possible — the orchestrator does not need to know whether it is dispatching to a Builder, a Reviewer, or a VisualDirector. They all respond the same way.
Capabilities are string tags like code.typescript, review.security, content.long-form, vision.product-photo. The orchestrator routes tasks by matching task requirements against registered capability tags — this is the contract that makes new agents plug-and-play.
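A minimal sketch of what such a contract might look like, assuming only what is stated above: a process(task) → Result method, capability tags, and self-reported health. Every name here (SwarmAgent, AgentTask, AgentResult, HealthReport, EchoAgent, canHandle) is a hypothetical stand-in, not the suite's real type.

```typescript
// All names below are illustrative assumptions, not the suite's real types.
interface AgentTask {
  id: string;
  requiredCapability: string; // e.g. "code.typescript"
  input: string;
}

interface AgentResult {
  taskId: string;
  ok: boolean;
  output: string;
}

interface HealthReport {
  latencyMs: number;
  errorRate: number; // fraction of failed tasks, 0..1
  qualityScore: number; // rubric score, 0..1
}

interface SwarmAgent {
  readonly name: string;
  readonly capabilities: readonly string[];
  process(task: AgentTask): Promise<AgentResult>;
  health(): HealthReport;
}

// Toy implementation: shows that an orchestrator only depends on the interface.
class EchoAgent implements SwarmAgent {
  readonly name = "echo";
  readonly capabilities: readonly string[] = ["content.long-form"];

  async process(task: AgentTask): Promise<AgentResult> {
    return { taskId: task.id, ok: true, output: `echo:${task.input}` };
  }

  health(): HealthReport {
    return { latencyMs: 0, errorRate: 0, qualityScore: 1 };
  }
}

// Routing check: does the agent declare the capability tag the task requires?
function canHandle(agent: SwarmAgent, task: AgentTask): boolean {
  return agent.capabilities.indexOf(task.requiredCapability) !== -1;
}
```

Because the orchestrator only sees SwarmAgent, swapping EchoAgent for a Builder or Reviewer changes nothing on the dispatch side.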
The 7 specialist archetypes
Every swarm I have built ends up with some mix of these seven archetypes. Name them however you want internally; these are the roles.
Architect
Takes a request, produces a dependency graph of subtasks. Does not execute — only plans. Output: structured plan with ordered steps and blocking dependencies.
Builder
Receives a plan step, produces an artifact (code file, config, migration). Reads surrounding code, matches patterns, writes.
Reviewer
Scores Builder outputs against a rubric. Returns approve/reject with structured feedback. The Builder-Reviewer loop is the core quality mechanism.
Tester
Generates and runs tests. Reports failures back to Builder. Distinct from Reviewer — Reviewer evaluates code quality, Tester evaluates runtime behavior.
Guardian
Scans for secrets, injection vectors, license violations, forbidden patterns. Has veto power — Guardian rejection blocks the entire pipeline.
Researcher
Fetches web pages, reads docs, queries APIs. Feeds verified context into other agents. Uses browser sub-agent or web MCP.
MetaCognitor
Monitors every other agent. Tracks quality score, latency, error rate per agent. Triggers self-repair when metrics drift. Covered in Module 19.
The 4 swarm topologies
How agents connect to each other determines what the swarm can do. Most production systems mix these four patterns.
Star (CEO-and-specialists)
One Orchestrator (CEO) at center, specialists radiating out. Every request enters through the CEO; every result returns through the CEO. Simplest to reason about, easiest to debug. Bottlenecked by CEO throughput. Use when: single-request workflows, <10 specialists, strict audit requirements.
Mesh (peer-to-peer via event bus)
No CEO. Agents publish events to a shared bus and subscribe to the ones they care about. Any agent can trigger any other by emitting the right event. Scales horizontally. Harder to trace a single request end-to-end — you need correlation IDs. Use when: 10+ agents, continuous background processing, event-driven workflows. This is where Ocean Logic (Module 17) becomes essential.
Pipeline (FSM handoffs)
Agents arranged in a deterministic sequence, each transforming the artifact and passing it forward. State Machine governs transitions. The Atelier OS publishing pipeline is this pattern: TrendScout → Strategist → Writer → Editor → SEO → QualityGate → Publisher. Use when: linear workflows where each stage is a clear transformation, like content production, ETL, or build-test-deploy.
Hierarchical (swarm of swarms)
Multiple star or pipeline swarms, coordinated by a meta-Orchestrator above them. This is how the Sovereign Orchestrator Pro V4.0 operates: it routes requests to the right sub-swarm (content swarm, code swarm, intel swarm), each of which internally uses its own topology. Use when: 15+ agents, multi-domain operations, fleet-scale deployments. The hierarchical pattern is what “Sovereign” actually means architecturally.
The CEO dispatch algorithm
This is the algorithm at the heart of every star-topology orchestrator I build. Architectural prose, not code — you will have Antigravity implement it:
- Classify. Incoming request is classified against the capability registry to identify which specialists are needed. Done by the CEO itself via a structured classification prompt with the full capability list.
- Decompose. CEO produces a typed dependency graph: nodes are subtasks, edges are blocking dependencies. Independent nodes run in parallel, dependent nodes run sequentially.
- Dispatch. For each ready node (all dependencies satisfied), CEO selects an available specialist matching the required capability, forwards the subtask, and waits for completion.
- Collect. As specialists complete, CEO writes results to shared state and checks for newly-ready nodes. Repeat until all nodes complete or one fails.
- Aggregate. Once all nodes are done, CEO synthesizes specialist outputs into a unified response. Failed nodes trigger rollback or escalation based on policy.
- Audit. Every dispatch, completion, and aggregation writes a structured event to the forensic audit log (Module 18 storage layer).
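The decompose, dispatch, collect, and audit steps can be made concrete with a compact sketch. It is a simplification under stated assumptions: classification and aggregation are left out, ready nodes run in waves rather than being re-checked after every single completion, and a failed node simply rejects the whole run. PlanNode, Specialist, AuditEntry, and runPlan are hypothetical names.

```typescript
interface PlanNode {
  id: string;
  capability: string;
  dependsOn: string[];
}

// A specialist receives its subtask plus the results gathered so far.
type Specialist = (node: PlanNode, results: ReadonlyMap<string, string>) => Promise<string>;

interface AuditEntry {
  kind: "dispatch" | "complete";
  nodeId: string;
}

async function runPlan(
  nodes: PlanNode[],
  specialists: Record<string, Specialist | undefined>,
  auditLog: AuditEntry[],
): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  const pending = new Map<string, PlanNode>();
  for (const node of nodes) pending.set(node.id, node);

  while (pending.size > 0) {
    // A node is ready once every dependency has a recorded result.
    const ready = Array.from(pending.values()).filter((n) =>
      n.dependsOn.every((dep) => results.has(dep)),
    );
    if (ready.length === 0) throw new Error("plan has a cycle or an unknown dependency");

    // Independent ready nodes run in parallel; dependents wait for the next wave.
    await Promise.all(
      ready.map(async (node) => {
        const specialist = specialists[node.capability];
        if (!specialist) throw new Error(`no specialist for ${node.capability}`);
        auditLog.push({ kind: "dispatch", nodeId: node.id });
        const output = await specialist(node, results);
        results.set(node.id, output);
        auditLog.push({ kind: "complete", nodeId: node.id });
      }),
    );
    for (const node of ready) pending.delete(node.id);
  }
  return results;
}
```

A production orchestrator would add the rollback or escalation policy for failed nodes and write the audit entries to durable storage instead of an array.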
Coordination models — how agents share state
Three models in production use. Pick one per swarm — mixing them within a single swarm leads to consistency bugs.
| Model | How | Pros | Cons |
|---|---|---|---|
| Shared Memory | Redis or Postgres as central state store; agents read/write keys | Simple, fast, easy to debug | Requires careful key discipline; single point of failure |
| Message Passing | Event bus (Redis Streams, NATS, Kafka) — see Module 17 | Scales horizontally, loosely coupled, replay-able | Harder to trace single requests; eventual consistency |
| Direct Call | Agent A invokes Agent B via function call or HTTP | Synchronous, type-safe | Tight coupling; blocks the caller; hard to scale |
Capability registration — making agents plug-and-play
The Sovereign pattern is that every agent registers itself with the Orchestrator at startup, declaring its capabilities. The Orchestrator maintains a capability → agent-list map in memory (and a persistent copy in Postgres for recovery after restart). When a new request arrives, the Orchestrator queries the map to find eligible agents.
This is what makes adding a new agent a one-file change instead of a full re-architect. Drop a new agent class into /agents/, implement the interface, declare capabilities, restart the Orchestrator. The Sovereign fleet absorbs it.
The discipline. Resist the temptation to have agents call each other directly. Always route through either the Orchestrator (star) or the event bus (mesh). Direct calls between specialists produce a tangled graph where adding one agent requires touching five others. Capability registration keeps the swarm composable.
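As a rough illustration of the capability → agent-list map, here is an in-memory sketch. The Postgres recovery copy is omitted, and CapabilityRegistry with its method names is an assumption made for this example.

```typescript
// In-memory capability → agent-name map. Persistence is deliberately left out.
class CapabilityRegistry {
  private readonly byCapability = new Map<string, Set<string>>();

  // Called by an agent at startup with the tags it declares.
  register(agentName: string, capabilities: string[]): void {
    for (const cap of capabilities) {
      let agents = this.byCapability.get(cap);
      if (!agents) {
        agents = new Set<string>();
        this.byCapability.set(cap, agents);
      }
      agents.add(agentName);
    }
  }

  // Removes the agent everywhere and drops capabilities nobody serves any more.
  deregister(agentName: string): void {
    this.byCapability.forEach((agents, cap) => {
      agents.delete(agentName);
      if (agents.size === 0) this.byCapability.delete(cap);
    });
  }

  // Agents eligible for a capability, sorted for deterministic output.
  eligible(capability: string): string[] {
    const agents = this.byCapability.get(capability);
    return agents ? Array.from(agents).sort() : [];
  }
}
```

Adding an agent then touches only the agent's own file: it calls register once, and the orchestrator's routing query picks it up.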
Build prompts for this module
S1 · Build the Agent Interface Contract
S2 · Build a Star-Topology Orchestrator
S3 · Build a Pipeline Swarm with FSM Guards
S4 · Build a Hierarchical Sovereign Orchestrator
S5 · Build the Capability Registry with Hot-Reload
Ocean Logic — Multi-Channel Async Event Streams for Swarms
Once your swarm has more than five agents, direct calls and shared-memory coordination collapse under their own weight. You need a different substrate. I call the pattern I use Ocean Logic because the mental model is maritime: events flow through named currents, agents fish from whichever currents carry tasks they can handle, and the sea itself persists history so nothing is ever truly lost.
The three-zone mental model
Every ocean-logic system has three layers. Thinking about them separately prevents the classic mistake of trying to do everything in one event stream.
Surface currents — live task flow
Short-lived events representing work that needs to happen now. Task requests, completions, status updates, user interactions. High volume, low durability requirement (minutes to hours). Redis Streams is the right tool here.
Tidal flows — coordination events
Medium-lived events that coordinate across swarms. Agent lifecycle (registered, deregistered, degraded, restored), capability changes, policy updates, cost thresholds crossed. Lower volume, higher durability (days to weeks). NATS JetStream or Redis Streams with long retention.
Deep currents — forensic record
The immutable audit log. Every agent action forever. Used for replay, debugging, cost analysis, compliance. Volume: very high. Durability: permanent. Kafka with tiered storage, or Postgres + S3 archival for small-to-medium fleets.
Why separate zones matter. If you put audit events and task events in the same stream, consumers fight each other for throughput, replay becomes expensive, and retention policy becomes a compromise instead of a choice. Separate zones → independent scaling → independent retention.
Event schema discipline — the non-negotiable
Every event flowing through any zone has the same envelope. Deviating from this produces a system nobody can debug.
- id — UUID v7 (time-sortable, lexicographically ordered)
- type — reverse-DNS identifier like content.draft.completed or agent.health.degraded
- version — semver string so consumers can handle schema evolution
- timestamp — ISO 8601 with timezone
- source_agent — ID of the agent that emitted
- correlation_id — the root request ID; lets you trace an entire mission across dozens of events
- causation_id — the event ID that directly triggered this one; lets you reconstruct a causal tree
- payload — typed, versioned, validated against a schema
- cost_cents — tokens and compute cost for this operation (optional but strongly recommended)
Correlation and causation IDs together give you what distributed-systems people call trace context. Any event can be traced upward to its root request and downward to everything it caused.
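The envelope fields listed above translate fairly directly into a type plus a small validator. This is a sketch, not the suite's validation layer: the regular expressions, the null causation_id for root events, and the causedBy helper are assumptions, and payload schema validation is left out.

```typescript
interface EventEnvelope<P = unknown> {
  id: string; // UUID v7 in the text; generating one is out of scope here
  type: string;
  version: string;
  timestamp: string;
  source_agent: string;
  correlation_id: string;
  causation_id: string | null; // assumption: null marks the root event
  payload: P;
  cost_cents?: number;
}

const TYPE_RE = /^[a-z0-9-]+(\.[a-z0-9-]+)+$/;
const SEMVER_RE = /^\d+\.\d+\.\d+$/;
const ISO_TZ_RE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

// Returns a list of problems; an empty list means the envelope passed these checks.
function validateEnvelope(e: EventEnvelope): string[] {
  const problems: string[] = [];
  if (e.id.length === 0) problems.push("id is empty");
  if (!TYPE_RE.test(e.type)) problems.push("type is not dot-delimited lowercase");
  if (!SEMVER_RE.test(e.version)) problems.push("version is not MAJOR.MINOR.PATCH");
  if (!ISO_TZ_RE.test(e.timestamp)) problems.push("timestamp is not ISO 8601 with timezone");
  if (e.correlation_id.length === 0) problems.push("correlation_id is empty");
  if (e.cost_cents !== undefined && e.cost_cents < 0) problems.push("cost_cents is negative");
  return problems;
}

interface NewEventFields<P> {
  id: string;
  type: string;
  version: string;
  timestamp: string;
  source_agent: string;
  payload: P;
}

// Derive a follow-up event: it inherits correlation_id, and causation_id points at the trigger.
function causedBy<P>(trigger: EventEnvelope, next: NewEventFields<P>): EventEnvelope<P> {
  return { ...next, correlation_id: trigger.correlation_id, causation_id: trigger.id };
}
```

Building every follow-up event through one helper like causedBy is what keeps the causal tree intact: no agent has to remember to copy the IDs by hand.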
Choosing your current transport
| Transport | Best for | Sovereign cost | Gotcha |
|---|---|---|---|
| Redis Streams | Surface currents. Consumer groups, acks, dead-letter, retention in seconds | $0 (docker) | Single-node durability limited; pair with AOF persistence |
| NATS JetStream | Tidal flows. Stronger durability, clustering, subject-based routing | $0 (docker) | Less library ecosystem than Redis in TypeScript |
| Kafka | Deep currents at scale. Tiered storage, infinite retention | Heavy; only if you have fleet-scale volume | Operational overhead; overkill for <10 agents |
| Postgres LISTEN/NOTIFY | Small teams, no extra infra | $0 (already have Postgres) | Not a real queue; best for prototyping only |
The DDS Sovereign AGI Suite uses Redis Streams for surface + tidal and Postgres + S3 for deep. No Kafka. Fifteen agents do not need Kafka. Most teams claiming to need Kafka do not need Kafka.
Consumer groups — the scaling primitive
A single event stream can be consumed by multiple agents without duplicate processing through consumer groups. You declare a group name, multiple agents join the group, the transport distributes events across members. If an agent crashes, its unacked events are redelivered to another member.
Concrete example from the Synthetic Director: the content.brief.created stream has a consumer group called channel-workers. Eight specialist agents (blog, twitter, linkedin, instagram, tiktok, email, podcast, youtube) all join the group. Each brief event is delivered to exactly one of them based on which channel tag the event carries — but if the Twitter agent is degraded, the brief gets redelivered and another agent picks it up. No coordinator. No lock management. The ocean handles it.
Replay — the debugging superpower
Because the deep current retains everything, you can replay any mission. Query the forensic log for all events with a given correlation_id, re-emit them into a clone of the surface current pointing at a staging environment, watch the swarm re-execute the exact sequence. This is how you debug a bad outcome three days after it happened.
Design for replay from day one. It means: events must be idempotent (receiving the same event twice produces the same result), side effects must be gated behind flags that can be disabled in staging, and secrets must not live in payloads.
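One common way to get idempotent handling is to deduplicate on the event id. The sketch below assumes an in-memory Map as the "already seen" store; a real swarm would need a durable store so the guarantee survives restarts. makeIdempotent is a hypothetical helper name.

```typescript
// Wraps a handler so that a repeated event id returns the first result
// instead of running the handler (and its side effects) again.
function makeIdempotent<E extends { id: string }, R>(handler: (event: E) => R): (event: E) => R {
  const seen = new Map<string, R>();
  return (event: E): R => {
    if (seen.has(event.id)) return seen.get(event.id) as R;
    const result = handler(event);
    seen.set(event.id, result);
    return result;
  };
}
```

With this wrapper in place, replaying a correlation_id into staging re-delivers the same ids and the handler's side effects run at most once per event.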
Backpressure and overload
When a swarm is overloaded, the default behavior of most systems is to collapse silently. Ocean Logic handles this explicitly:
- Stream length caps. Surface currents have a max length (e.g., 100,000 events). Producers pushing past the cap trigger an alert and either drop low-priority events or block.
- Consumer lag alerts. If a consumer group falls more than N events behind, emit an agent.lag.warn event on the tidal flow. MetaCognitor picks it up and decides whether to spin up more consumers or shed load.
- Priority lanes. Critical events (agent.health.degraded, security.violation) flow on a separate high-priority stream that never fills.
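To show the shape of the stream cap and lag rules, here is an in-memory model. It is not a Redis client: CappedStream, PushOutcome, and lagWarning are hypothetical names, and the policy (shed low-priority events, block everything else) is one reading of "drop low-priority events or block".

```typescript
type EventPriority = "low" | "normal";
type PushOutcome = "accepted" | "dropped" | "blocked";

interface StreamEvent {
  id: string;
  priority: EventPriority;
}

class CappedStream {
  private readonly events: StreamEvent[] = [];
  private readonly maxLength: number;

  constructor(maxLength: number) {
    this.maxLength = maxLength;
  }

  push(event: StreamEvent): PushOutcome {
    if (this.events.length < this.maxLength) {
      this.events.push(event);
      return "accepted";
    }
    // At the cap: low-priority events are shed, others tell the producer to wait.
    return event.priority === "low" ? "dropped" : "blocked";
  }

  length(): number {
    return this.events.length;
  }
}

// Consumer lag: how far the group's last acknowledged offset trails the stream head.
function lagWarning(headOffset: number, ackedOffset: number, maxLag: number): string | null {
  return headOffset - ackedOffset > maxLag ? "agent.lag.warn" : null;
}
```

The point of returning an explicit outcome is that overload becomes visible to the producer instead of failing silently.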
The operational insight. Ocean Logic is not a library you install. It is a set of conventions about what streams exist, what events mean, and how consumers behave. Once the conventions are in place, every new agent becomes a simple question: which streams does it publish to, which does it consume from? That is it. You stop thinking about point-to-point connections entirely.
Build prompts for this module
O1 · Build the Event Envelope and Validation Layer
O2 · Build the Three-Zone Current Manager
O3 · Build a Consumer Group Worker Template
O4 · Build a Replay-Based Debugger CLI
O5 · Build Backpressure and Priority Lane Infrastructure
The Five Storage Layers Every AGI System Needs
Nothing sinks a promising AGI system faster than a bad storage decision. The common failure mode is putting everything in Postgres because “Postgres can do it all.” Postgres is extraordinary — and yes, it can technically do all five jobs below — but using one tool for five workloads means each workload is compromised. The sovereign pattern is five purpose-chosen layers, all running locally at zero cost.
Layer 1 — Vector database (semantic memory)
Purpose. Storing embeddings for RAG, long-term agent memory, similarity search across code or documents. Every agent that needs to “remember” more than its context window uses this.
Sovereign pick. Qdrant (standalone, Rust, hybrid search with payload filtering) or pgvector (extension for Postgres, good if you want one engine and scale is modest). Chroma is fine for prototyping but hits limits past a few million vectors.
Schema discipline. Each vector carries a payload: source document ID, chunk index, embedding model used, content hash, created_at. Always version your embedding model in the payload — when you upgrade from text-embedding-3-small to the next model, you need to know which vectors came from which.
Layer 2 — Key-value store (hot state)
Purpose. Rate limit counters, session data, cache, short-lived coordination flags, consumer-group offsets. Anything read-or-written dozens of times per second that does not need to survive a power cycle.
Sovereign pick. Redis with AOF persistence enabled. The same Redis instance you already run for Ocean Logic surface currents handles this workload.
Key design. Use hierarchical keys: agent:<id>:health, ratelimit:<user>:<window>, session:<uuid>. TTL everything that is not permanent. The moment you have keys without TTLs you have a memory leak waiting to happen.
Layer 3 — Relational database (canonical truth)
Purpose. Agent registry, capability index, user accounts, subscription state, anything you need transactional guarantees for, anything you will want to query with SQL later. This is the source of truth that all other layers derive from.
Sovereign pick. Postgres 16+ with the TimescaleDB extension (covered in Layer 5) and pgvector if you are consolidating the vector layer.
Schema discipline. UUID primary keys (not serial integers — collisions across environments are a nightmare). created_at and updated_at on every table. Soft delete via deleted_at timestamp, never hard delete anything in production.
Layer 4 — Blob store (artifacts)
Purpose. Screenshots from browser sub-agent, generated PDFs, image assets, large event payloads (stored by hash — see Module 15 prompt B14), model snapshots, full documents that are referenced by vector embeddings. Anything binary or large.
Sovereign pick. MinIO (S3-compatible, runs in a single docker container, presents the exact same API as AWS S3). When you eventually migrate to cloud, your code changes zero lines — only the endpoint URL.
Key design. Content-addressable: key = SHA-256 of content. Two agents producing identical output produce identical keys, automatic deduplication. Metadata (who, when, correlation_id) lives in the relational DB pointing at the blob key.
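Content addressing is small enough to sketch directly. The example uses Node's built-in crypto module for SHA-256; InMemoryBlobStore is a hypothetical stand-in for MinIO, used only to show that identical content lands on one key.

```typescript
import { createHash } from "node:crypto";

// Key = hex SHA-256 of the content, so identical bytes always map to the same key.
function blobKey(content: Uint8Array | string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Stand-in for the blob layer: a second put of the same bytes stores nothing new.
class InMemoryBlobStore {
  private readonly blobs = new Map<string, Uint8Array>();

  put(content: Uint8Array): string {
    const key = blobKey(content);
    if (!this.blobs.has(key)) this.blobs.set(key, content);
    return key;
  }

  count(): number {
    return this.blobs.size;
  }
}
```

The returned key is what the relational row would store alongside who, when, and correlation_id.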
Layer 5 — Time-series database (observability)
Purpose. Agent health metrics (latency, error rate, quality score over time), cost tracking, system metrics from your Ocean Logic layer, request rates, token consumption. Any data where the primary query pattern is “give me values for this metric between time A and time B.”
Sovereign pick. TimescaleDB (Postgres extension, so you reuse your existing Postgres install) or InfluxDB (standalone, purpose-built). Start with TimescaleDB unless you have a specific need InfluxDB solves.
Retention discipline. Enable TimescaleDB compression on chunks older than 7 days (storage savings around 10x are commonly reported, though the ratio depends on the data). Drop chunks older than 90 days unless you have a compliance requirement. Aggregate into continuous aggregates (hourly, daily) so dashboards never scan raw rows.
The when-to-use decision matrix
| Data | Right layer | Wrong layer (common mistake) |
|---|---|---|
| Agent embeddings for recall | Vector DB | Postgres with JSONB (no similarity search) |
| Rate limit counters | Redis | Postgres (contention under load) |
| Agent registry and capabilities | Postgres | Redis (no durability guarantees) |
| Browser sub-agent screenshots | MinIO blob | Postgres bytea (bloats the DB) |
| Agent latency over time | TimescaleDB | Postgres append-table (slow aggregates) |
| Deep forensic audit events | Postgres + S3/MinIO archival | Kafka (overkill for <50 agents) |
| Session tokens | Redis with TTL | Postgres (never expire bugs) |
| Active task queue | Redis Streams | Postgres table with SELECT FOR UPDATE (lock contention) |
The sovereign compose stack
The entire five-layer stack runs on a single machine at zero monthly cost. The DDS Sovereign AGI Suite uses this exact topology:
- Postgres 16 + TimescaleDB + pgvector — one container, handles Layers 3, 5, and optionally 1
- Redis 7 with AOF — one container, handles Layer 2 and Ocean Logic surface/tidal
- MinIO — one container, handles Layer 4, S3-compatible
- Qdrant — one container (optional, only if vector workload outgrows pgvector)
- Grafana — one container for observability dashboards on top of Layer 5
Five containers, one docker-compose up. Total memory footprint on a modern workstation: about 4–6 GB. Exposed to the outside world via a Cloudflare Tunnel if you need remote access; otherwise localhost-only for maximum security.
Backup is not optional. Sovereign means you own the hardware — and the responsibility. Set up automated nightly snapshots of Postgres, Redis AOF, and MinIO to an external drive. Test the restore quarterly. A sovereign system without tested backups is a time bomb.
Build prompts for this module
D1 · Provision the Five-Layer Sovereign Stack
D2 · Build a Unified Storage Client Facade
D3 · Build a Content-Addressable Blob Layer with Dedup
D4 · Build the Observability Layer with Continuous Aggregates
D5 · Build an Automated Backup and Restore System
MetaCognition, Circuit Breakers, and Repair Routines
Any swarm that runs longer than a week without self-healing mechanisms will eventually degrade silently. Models drift, prompts rot, dependencies change, data shifts. The difference between a production AGI system and a demo is that the production system detects its own degradation and fixes itself before you notice.
The four mechanisms (layered, not alternatives)
Circuit breaker
After N consecutive failures (typical: 3), disable the affected agent for a cooldown period (typical: 5 minutes). Incoming tasks route to a sibling agent or fall into a retry queue. Prevents a failing agent from cascading its failure across the swarm.
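A minimal breaker along these lines, assuming the typical values quoted above (3 failures, 5 minutes). The clock is passed in as a number of milliseconds so the behavior is easy to test. There is no half-open probing state here, which is a deliberate simplification, and CircuitBreaker is a hypothetical class name.

```typescript
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold = 3, cooldownMs = 5 * 60 * 1000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // True when tasks may be routed to the agent at time nowMs.
  canAttempt(nowMs: number): boolean {
    if (this.openedAt === null) return true;
    if (nowMs - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: close the breaker and start counting from zero.
      this.openedAt = null;
      this.consecutiveFailures = 0;
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(nowMs: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) this.openedAt = nowMs;
  }
}
```

While canAttempt returns false, the caller is responsible for routing the task to a sibling agent or the retry queue.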
Retry with exponential backoff and jitter
Transient failures (network blip, rate limit, timeout) should retry. Exponential delay between attempts (1s, 2s, 4s, 8s) with 20% random jitter to prevent thundering herd. Max attempts configurable per task type. Distinguish retryable errors from permanent ones — a schema validation failure is not retryable.
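The delay schedule and the retryable/permanent split can be sketched as follows. Two assumptions are baked in: "20% jitter" is read as plus or minus 20% of the exponential delay, and the caller supplies both the isRetryable predicate and the sleep function. backoffDelayMs and retryWithBackoff are hypothetical names.

```typescript
// attempt 0 → ~1s, 1 → ~2s, 2 → ~4s, 3 → ~8s, each shifted by up to ±jitterRatio.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  jitterRatio = 0.2,
  random: () => number = Math.random,
): number {
  const exponential = baseMs * Math.pow(2, attempt);
  const jitter = exponential * jitterRatio * (random() * 2 - 1);
  return Math.round(exponential + jitter);
}

async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxAttempts: number,
  isRetryable: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Permanent errors (e.g. schema validation) are rethrown immediately.
      if (!isRetryable(err)) throw err;
      lastError = err;
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

In production, sleep would typically wrap setTimeout in a Promise; injecting it keeps the retry loop testable without real delays.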
Forensic audit log
Every input, output, duration, error stored in the deep current (Module 17). Not for viewing day-to-day — for the day an agent ships a bad outcome and you need to reconstruct why. This is the feedstock for every other mechanism.
MetaCognitor — quality drift detection
A specialized agent whose only job is watching other agents. It replays recent tasks, scores outputs against a rubric, compares rolling averages, and triggers repair when drift exceeds a threshold. This is the difference between reactive healing (fixes after a crash) and proactive healing (fixes before anyone notices).
What MetaCognitor actually does
Every 15 minutes (configurable), the MetaCognitor runs this loop for each registered agent:
- Query the deep current for the last 20 completed tasks from this agent
- For each task, re-run the agent's output through a scoring rubric — LLM-as-judge against the original input
- Compute rolling 7-day average quality score; compare to rolling 30-day baseline
- Compute error rate and p95 latency over last 24 hours; compare to baseline
- If any metric crosses its threshold (quality drops >10%, error rate >2x baseline, latency >3x baseline): emit agent.health.degraded on the tidal zone
- Run the repair routine for that agent type
- Emit agent.health.repair-attempted with the repair action taken
- After next observation window, check if metrics recovered; if not, escalate to agent.health.failed and alert human
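The threshold comparison at the center of this loop is simple enough to write down. The sketch covers only the drift check and the choice of health event, using the three thresholds quoted above. AgentMetrics, detectDrift, and healthEvent are hypothetical names, and the scoring, querying, and repair steps are out of scope.

```typescript
interface AgentMetrics {
  qualityScore: number; // rolling average from the rubric
  errorRate: number; // fraction of failed tasks
  p95LatencyMs: number;
}

// Returns the metrics that crossed their threshold relative to the baseline.
// Note: with a baseline error rate of 0, any error at all counts as drift.
function detectDrift(current: AgentMetrics, baseline: AgentMetrics): string[] {
  const reasons: string[] = [];
  if (current.qualityScore < baseline.qualityScore * 0.9) reasons.push("quality");
  if (current.errorRate > baseline.errorRate * 2) reasons.push("error_rate");
  if (current.p95LatencyMs > baseline.p95LatencyMs * 3) reasons.push("latency");
  return reasons;
}

// First detection emits "degraded"; drift that survives a repair attempt escalates.
function healthEvent(reasons: string[], repairAlreadyAttempted: boolean): string | null {
  if (reasons.length === 0) return null;
  return repairAlreadyAttempted ? "agent.health.failed" : "agent.health.degraded";
}
```

Keeping the comparison a pure function of two metric snapshots makes the thresholds easy to tune against historical data.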
The repair routine ladder
When MetaCognitor detects drift, it tries fixes in order from cheapest to most disruptive. Each step is logged. Most issues resolve at step 1 or 2.
| Step | Repair | When |
|---|---|---|
| 1 | Temperature adjustment | Quality drift without error rate change. Lower temp for more consistency. |
| 2 | System prompt mutation | Inject recent failure examples as few-shot anti-examples into the system prompt. |
| 3 | Model swap | Switch from Gemini 3.1 Pro to Claude Opus 4.6 (or vice versa) for a second opinion on the task type. |
| 4 | Agent restart | Kill the process, clear its working memory, reload from last good state. |
| 5 | Pin to last-known-good version | If the agent has versioned prompts/configs, roll back to the version that produced the baseline metrics. |
| 6 | Quarantine | Disable the agent entirely. Emit high-priority alert. Humans take over until debugged. |
The repair_audit.json discipline
Every repair attempt writes to /data/repair_audit.json (mirrored to the forensic log). Schema: timestamp, agent_id, trigger (which metric crossed threshold), action taken, action parameters, pre-action metrics, post-action metrics (recorded 30 min later), success (did metrics recover). This file is gold for post-mortems and for training better MetaCognitor thresholds.
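A sketch of how the ladder and the audit record could fit together, under simplifying assumptions: every step is tried in order regardless of the "When" column, post-action metrics are measured immediately rather than 30 minutes later, and nothing is written to disk. The field names follow the schema described above; the action identifiers, RepairAuditRecord, and climbLadder are hypothetical.

```typescript
const REPAIR_LADDER = [
  "temperature_adjustment",
  "system_prompt_mutation",
  "model_swap",
  "agent_restart",
  "pin_last_known_good",
  "quarantine",
] as const;

type RepairAction = (typeof REPAIR_LADDER)[number];

interface RepairAuditRecord {
  timestamp: string;
  agent_id: string;
  trigger: string; // which metric crossed its threshold
  action: RepairAction;
  action_parameters: Record<string, unknown>;
  pre_action_metrics: Record<string, number>;
  post_action_metrics: Record<string, number>;
  success: boolean;
}

// Tries repairs from cheapest to most disruptive and stops at the first that works.
function climbLadder(
  agentId: string,
  trigger: string,
  preMetrics: Record<string, number>,
  applyAndMeasure: (action: RepairAction) => Record<string, number>,
  recovered: (metrics: Record<string, number>) => boolean,
  now: () => string,
): RepairAuditRecord[] {
  const records: RepairAuditRecord[] = [];
  let before = preMetrics;
  for (const action of REPAIR_LADDER) {
    const after = applyAndMeasure(action);
    const success = recovered(after);
    records.push({
      timestamp: now(),
      agent_id: agentId,
      trigger,
      action,
      action_parameters: {},
      pre_action_metrics: before,
      post_action_metrics: after,
      success,
    });
    if (success) break;
    before = after;
  }
  return records;
}
```

Each returned record corresponds to one entry that would be appended to repair_audit.json and mirrored to the forensic log.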
Self-healing is not self-improvement. MetaCognitor restores agents to their baseline; it does not make them better than they were. Actual improvement requires offline fine-tuning or prompt engineering based on repair_audit.json patterns. Do not conflate the two — autonomous self-improvement is a much harder problem with different safety properties.
Build prompts for this module
H1 · Build a Circuit Breaker Wrapper
H2 · Build the MetaCognitor Agent
H3 · Build the Repair Routine Ladder
The Atelier 3.4.0 + SEO Magnet Playbook
Antigravity plus a well-authored Shopify Skill turns hours of Liquid work into minutes. But Shopify has sharp edges — the Atelier theme has conventions, the Basic plan has a performance ceiling, and Liquid breaks in ways that no amount of TypeScript strict-mode can prevent. This module is the playbook for shipping Shopify work at S-tier.
The five non-negotiable Shopify rules
- Scope everything. All CSS under a unique section prefix. All IDs prefixed. All JS in an IIFE. No global pollution. Atelier has enough of its own CSS and JS to conflict with.
- No !important on body, html, header, or footer. Theme-level overrides come back to bite you when Shopify updates Atelier.
- No jQuery. Vanilla JS only. jQuery in one section conflicts with Atelier 3.4.0 native.
- No Google Fonts preconnects. DDS self-hosts Playfair Display and DM Sans as WOFF2. External font calls hurt Lighthouse and duplicate on every page.
- Validate schema ranges. Shopify rejects a section if (max - min) / step is less than 3, or if default is not reachable from min by step. This breaks silently until you try to save section settings.
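The range rule in the last bullet is easy to check mechanically before a section ever reaches the theme editor. The sketch below encodes only the two conditions stated there; it is not Shopify's own validator, and RangeSetting and validateRange are hypothetical names.

```typescript
interface RangeSetting {
  id: string;
  min: number;
  max: number;
  step: number;
  default: number;
}

// Returns the problems found; an empty list means both stated rules hold.
function validateRange(s: RangeSetting): string[] {
  const problems: string[] = [];
  if (s.step <= 0) {
    problems.push(`${s.id}: step must be positive`);
    return problems;
  }
  // Rule 1: (max - min) / step must not be less than 3.
  if ((s.max - s.min) / s.step < 3) {
    problems.push(`${s.id}: (max - min) / step is less than 3`);
  }
  // Rule 2: default must sit on the grid min, min + step, min + 2*step, ... up to max.
  const offset = (s.default - s.min) / s.step;
  const onGrid = Math.abs(offset - Math.round(offset)) < 1e-9;
  if (s.default < s.min || s.default > s.max || !onGrid) {
    problems.push(`${s.id}: default is not reachable from min by step`);
  }
  return problems;
}
```

Running a check like this over every range setting in a section's schema block turns the silent save-time failure into an explicit error during authoring.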
The Shopify Antigravity workflow
Create an AGENTS.md for the theme
At the theme root, AGENTS.md declaring the theme version (Atelier 3.4.0), the DDS palette and fonts, the non-negotiable rules above, and the preferred patterns (section-first, Liquid variables for all dynamic content).
Install the seo-magnet skill
At .agent/skills/seo-magnet/, the DDS SEO Magnet V2 skill. Any page.liquid work triggers it automatically and produces 9+ schemas, speakable boxes, sticky TOC, FAQ accordion with commercial-intent questions.
Install the shopify-section-author skill
At .agent/skills/shopify-section-author/, a skill covering section scaffolding, schema validation, IIFE patterns, and DDS color tokens. Triggers when you ask for a section.
Author sections in Manager View with Planning Mode
Plan first, approve, build. The browser sub-agent verifies at 375, 768, 1280, 1920 before committing. Lighthouse score captured and embedded in the PR description.
Performance gate before commit
Shopify Basic ceiling is ~77–82 mobile, ~90+ desktop. Any section that drops mobile below 75 is rejected. The gate is enforced by an agent that runs the Lighthouse MCP before approving the commit.
The DDS Shopify Skill anatomy
The high-leverage Skills I maintain in every DDS workspace:
- seo-magnet — 9+ JSON-LD schemas, OG+Twitter, speakable, commercial-intent FAQ, sticky TOC. All-in-One pattern.
- shopify-section-author — scoped CSS discipline, schema validation, responsive breakpoints, accessibility. Reference + Few-Shot pattern.
- dds-brand-tokens — palette, fonts, certifications (GOTS/GRS/OCS/PETA-Approved Vegan/Fair Trade, never OEKO-TEX), spacing, radius tokens. Reference pattern.
- atelier-header-rules — DDS Header v1.0 at z-index 9000, native Atelier header disabled, sticky-nav conflicts to avoid. Basic Router pattern.
- shopify-performance — Core Web Vitals gate, lazy-load rules, preconnect audit, judge.me crossorigin discipline. Tool-Use pattern with an executable Lighthouse script.
Compounding leverage. Once these five Skills are in place, every subsequent Shopify task takes 10–20% of the time it used to. You stop typing the same rules every session. The agent internalizes them via Progressive Disclosure.
When Things Go Wrong — Debugging a Swarm in Production
Every production AGI system fails in ways a single-agent script never does. Symptoms are diffuse (output quality dropped but no error logged). Causes are temporal (an event 3 hours ago caused an output now). Fixes require replaying the past. This module is the field manual.
The three classes of swarm failure
| Class | Signature | First action |
|---|---|---|
| Loud failure | Error thrown, task marked failed, alert fires | Read the forensic event, fix the agent, replay the correlation_id |
| Silent quality drift | No errors but outputs getting worse over days | MetaCognitor should have caught it. If not, re-baseline and widen the rubric. |
| Cost runaway | Monthly spend suddenly 4x; no alert | Query cost_cents from deep current grouped by agent+day. Identify the agent or loop responsible. |
The cost-control discipline (with the credit system reality)
Since March 2026 Antigravity has run on a credit system. Ultra subscribers report unpredictable throttling. The defensive pattern:
- Cost ceilings per agent per day. Hard limit written into the Agent config. When hit, agent refuses new tasks and emits agent.cost.ceiling-hit. Prevents one runaway agent from burning the whole day's credits.
- Model tier routing. Route obvious tasks (code completion, simple rewrites) to Gemini 3 Flash. Route hard tasks (architecture, security review) to Claude Opus 4.6 or Gemini 3.1 Pro. Use the prompt-eval harness (P43) to pick the cheapest model that hits your quality bar.
- Cache aggressively. Semantic cache on prompt+model pairs. If a semantically similar prompt has been answered in the last 24 hours by the same model, return the cached answer. Can cut costs 40%+ on repetitive work.
- Budget-of-the-day pre-flight. At the start of a Manager View mission, the Orchestrator estimates token cost. If the estimate exceeds remaining daily budget, surface the warning to you before spawning agents.
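The per-agent ceiling and the pre-flight check reduce to a few lines of bookkeeping. This sketch assumes costs are tracked in cents and that the caller passes today's date as a string; DailyCostCeiling and preflightWarning are hypothetical names, and only the event name agent.cost.ceiling-hit comes from the text above.

```typescript
class DailyCostCeiling {
  private readonly ceilingCents: number;
  private spentCents = 0;
  private day = "";

  constructor(ceilingCents: number) {
    this.ceilingCents = ceilingCents;
  }

  // today is an ISO date such as "2026-04-02"; spend resets when the date changes.
  tryAccept(estimateCents: number, today: string): "accepted" | "agent.cost.ceiling-hit" {
    if (today !== this.day) {
      this.day = today;
      this.spentCents = 0;
    }
    if (this.spentCents + estimateCents > this.ceilingCents) return "agent.cost.ceiling-hit";
    this.spentCents += estimateCents;
    return "accepted";
  }

  remainingCents(): number {
    return this.ceilingCents - this.spentCents;
  }
}

// Pre-flight: produce a warning before spawning agents when the estimate exceeds the budget.
function preflightWarning(estimateCents: number, remainingCents: number): string | null {
  return estimateCents > remainingCents
    ? `estimated ${estimateCents}c exceeds remaining ${remainingCents}c`
    : null;
}
```

The ceiling charges the estimate up front; a fuller version would reconcile against the actual cost_cents reported when the task completes.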
The debugging flow chart
Bad outcome reported
Capture: the correlation_id (from the output metadata), the expected outcome, the actual outcome, the observed timestamp.
Trace the causal tree
Run replay trace <correlation_id> (the tool from Module 17 O4). Reveals every agent that touched this mission, in order, with timing.
Identify the suspect agent
Walk the causal tree from root. The suspect is the first agent whose output diverges from expected. Subsequent agents are amplifying or masking the problem.
Inspect inputs and outputs
Pull the suspect's event from deep current. Check: was the input malformed? Did the agent's prompt change recently? Did the model version change? Did a dependency produce different output than before?
Reproduce in staging
replay <correlation_id> --target=staging. Watch the bad outcome reproduce exactly. This confirms you understand the failure.
Fix + test
Modify the suspect agent. Replay again. Verify the outcome is now correct. Add a regression test built from the staging replay so the same failure is caught automatically. Deploy to production.
Update the MetaCognitor rubric
Add the failure pattern to the MetaCognitor's scoring rubric so this class of drift gets caught next time.
Observability dashboards every sovereign deployment needs
- Fleet Overview. Status dot per agent, last-task timestamp, current task, error rate 24h, cost today.
- Cost Tracker. Spend per agent per day over last 30 days, projected month-end cost, budget remaining.
- Quality Trends. MetaCognitor quality score per agent over 30 days with baseline overlay.
- Ocean Health. Per-stream length, per-group lag, dead-letter rate, priority lane throughput.
- Storage Health. Size per layer, growth rate, vacuum status for Postgres, AOF rewrite status for Redis.
Honest Comparison — April 2026 Capability Levels
No single tool wins every dimension. I use Antigravity as primary, Claude Code for terminal-heavy work, and Cursor for pure-editor flow. The comparison below is the honest capability matrix as of April 2026 based on verified features.
| Capability | Antigravity | Cursor | Claude Code | Copilot | Cline |
|---|---|---|---|---|---|
| Paradigm | Agent-first IDE | AI-enhanced editor | Terminal agent | Code completion | VS Code agent |
| Multi-agent parallel | Yes (Manager View) | No | No | No | No |
| Browser sub-agent | Built-in Chromium | No | No | No | No |
| Skills / Progressive Disclosure | Native (SKILL.md) | Rules only | CLAUDE.md only | No | Rules only |
| AGENTS.md support | Yes (v1.20.5+) | Yes | Via CLAUDE.md | No | Yes |
| Primary context window (tokens) | 1M (Gemini 3.1 Pro) | ~200K | 200K (Sonnet/Opus) | ~8K | 200K |
| Planning artifacts | Task lists, plans, screenshots | Partial | Inline plans | No | Inline plans |
| MCP integration | Native | Native | Native | Limited | Native |
| Model options | Gemini, Claude, GPT-OSS | Claude, GPT, Gemini | Claude only | GPT, Claude | Claude, Gemini, DeepSeek |
| Free tier | Yes (rate-limited) | Limited | No (API billing) | No | BYOK only |
| Pricing transparency | Credits (controversial) | $20 transparent | API metered | $10 flat | BYOK |
| Security (Strict Mode) | Yes (Q1 2026) | Partial | Partial | Basic | Partial |
| Best for | Multi-agent, full-stack, AGI | Editor power users | Terminal workflows | Quick completions | VS Code natives |
| Weakness | Credit unpredictability, resource drain | No browser agent, single-agent | No GUI | Shallow context | Less polished than Cursor |
The honest recommendation
- Building AGI systems or multi-agent workflows: Antigravity. Nothing else comes close on Manager View parallelism and browser sub-agent validation.
- Solo editor work with strong flow state: Cursor. Most polished single-agent experience with predictable pricing.
- Terminal-heavy infrastructure work: Claude Code. Lives in your shell, fast, direct.
- Casual autocomplete: GitHub Copilot. $10/month for solid completions, nothing more needed.
- VS Code with open-source agents: Cline. BYOK, strong community, no vendor lock-in.
I run Antigravity, Claude Code, and Cursor simultaneously on the same machine. Different tool for different work. No allegiance required.