DDS Vibe Academy gpt-image-2 Snapshot April 21, 2026 Beginner to Expert
Masterclass 28 / May 2026

The Ultimate ChatGPT Images 2.0
Masterclass. No fluff.

The complete masterclass on OpenAI's gpt-image-2 released April 21, 2026. The 7-Part Prompt Formula. 100 paste-ready prompts. 8 production templates. 5 reusable skill packages. Thinking Mode, mask-based editing, multi-image reference, API engineering. Beginner to expert, no gatekeeping.

10
Modules
100
Prompts
~12h
Workload
$0
Course Cost
Quick Answer

ChatGPT Images 2.0 (model ID gpt-image-2, snapshot gpt-image-2-2026-04-21) is OpenAI's flagship image generation system released April 21, 2026. It is autoregressive, natively integrated into the GPT architecture, and replaces both DALL-E 3 and gpt-image-1.5 as the default model.

Five things changed: pixel-perfect text rendering across Latin and non-Latin scripts, native Thinking Mode (reasoning + web search + self-verification), 2K standard resolution with 4K via select hosts, up to 8 coherent images per prompt with continuity, and up to 16 reference images for editing.

This masterclass walks every capability end-to-end with the 7-Part Prompt Formula, 100 paste-ready prompts, 8 production templates, 5 reusable skill packages, and the honest decision matrix against Midjourney V8, Nano Banana 2, and Stable Diffusion.

Author Robert McCullock, Architect-CEO, Design Delight Studio
Verified Against OpenAI release notes, API docs, May 2026
Scope Beginner to expert. Marketing, e-commerce, characters, API.
Who This Is For

Three personas, three paths through the masterclass.

This is a 10-module curriculum plus reference sections. Most readers do not need every module. Pick the persona closest to your situation and the path below points to the modules that pay back fastest.

Beginner

You have never generated an AI image.

You want to ship your first usable image today. You have a ChatGPT Free or Plus account and a browser. That is all you need.

  • Start with Modules 1, 2, 3, 4
  • Skip Modules 5, 6, 9 until later
  • Module 2 alone changes your output forever
Power User

You ship images every day.

You want consistency, character continuity, multi-format outputs, and brand-locked production. You will live in Thinking Mode.

  • Read all 10 modules in order
  • Modules 5, 6, 7, 8 are your daily reference
  • The 5 Skill packages save you hours
Developer / Engineer

You integrate gpt-image-2 into apps.

You need the API surface, rate limits, cost math, edit endpoints, and the integration patterns that scale.

  • Modules 9, 6, 10 are your daily reference
  • Module 5 helps you build prompts users will actually love
  • The cost calculator saves your budget
MODULE 01 / 10

Master the New Reality

The image generation game changed on April 21, 2026. OpenAI shipped gpt-image-2 with no keynote and no hype cycle. It scored 1,512 on the Image Arena leaderboard, the largest lead in Arena history (+242 points over second place). This module is the orientation: what changed, what to use, what to pay.

You will leave this module able to:

  • Explain exactly what changed in gpt-image-2 vs gpt-image-1.5 vs DALL-E 3
  • Pick the right access tier (Free / Plus / Pro / API) for your job
  • Understand Instant Mode vs Thinking Mode and which you have

1.1 The 5 things that actually changed

  1. Text rendering reached ~95-99% character accuracy — across English, Japanese, Korean, Chinese, Hindi, Bengali, and Arabic. The biggest practical unlock for real production work.
  2. Native Thinking Mode — the model plans, web-searches, generates multiple coherent variants, and self-checks before delivering. First image model with O-series reasoning baked in.
  3. Up to 2K resolution standard, 4K experimental — with aspect ratios from 3:1 ultra-wide to 1:3 ultra-tall.
  4. Up to 8 coherent images from one prompt — with character and object continuity across the batch.
  5. Up to 16 reference images for editing — multi-image compositing, character transfer, style transfer, surgical inpainting.

1.2 Instant Mode vs Thinking Mode

The most important access distinction in the new model. Both render images. Only one reasons before generating.

Default

Instant Mode

Available: Free, Plus, Pro, Business, Enterprise
Standard generation. Fast, reliable. Use for quick creative work, single-subject scenes, and any prompt where you do not need web grounding or multi-image batching. This is the default for free users.
Premium

Thinking Mode

Available: Plus ($20/mo), Pro ($200/mo), Business, Enterprise
Reasoning + web search + multi-image batching (up to 8) + output self-verification before delivery. Use for complex layouts, fact-grounded infographics, multi-panel comics, and any prompt with many interdependent constraints.

1.3 The current OpenAI image model lineup

OpenAI image model lineup as of May 2026 with status, capabilities, and recommended use.
Model Released Status Use For
gpt-image-2 April 21, 2026 Current flagship All new work. Thinking Mode. Text rendering. Multi-image edits.
gpt-image-1.5 December 2025 Available, legacy Transparent backgrounds (not yet supported in gpt-image-2). Stable production fallback.
gpt-image-1 April 2025 Deprecated, shutdown October 23, 2026 Migrate immediately. Do not start new work here.
DALL-E 3 October 2023 Retired May 12, 2026 No longer available. All references in old guides are dead.
DALL-E 2 April 2022 Retired May 12, 2026 No longer available.

1.4 API pricing reality

The API switched from flat per-image pricing to token-based billing. Estimated per-image costs at 1024×1024 (rough, varies with prompt complexity):

  • Low quality: ~$0.006 per image
  • Medium quality: ~$0.053 per image
  • High quality: ~$0.211 per image
  • 4K via third-party hosts (fal.ai etc.): up to ~$0.41 per image

Raw token rates: image input $8/M, cached image input $2/M, image output $30/M, text input $5/M, cached text $1.25/M, text output $10/M. The Batch API halves these rates if you can tolerate 24-hour latency.

1.5 Rate limits to plan around

  • Tier 1: 5 images per minute
  • Tier 2: 20 images per minute
  • Tier 3: 50 images per minute
  • Tier 5: 250 images per minute (requires $1,000 spent + 30-day account)

1.6 The 30-second decision tree

  1. Need to generate one image right now? Open chatgpt.com. Free works. Use Instant Mode.
  2. Generating regularly, want multi-image batches and reasoning? Upgrade to Plus ($20/mo) and use Thinking Mode.
  3. Generating thousands per month for an app or pipeline? Use the API. Plan the tier ramp.
  4. Need transparent backgrounds today? Use gpt-image-1.5 via API. gpt-image-2 does not support them yet.
  5. Pure artistic stylization where text doesn't matter? Stay on Midjourney V8 for now.
The Honest Take

For the vast majority of practical work — product photos with labels, marketing graphics, infographics, UI mockups, posters, multilingual content — gpt-image-2 is now the default. Midjourney still wins on pure artistic atmospheric beauty. Nano Banana 2 wins on cost-sensitive bulk and Google Search grounding. Stable Diffusion wins on style fine-tuning and content that gets refused by hosted models. Everything else: gpt-image-2.

MODULE 02 / 10

The 7-Part Prompt Formula

If you remember nothing else from this masterclass, remember this formula. Master prompt engineers use the same seven-part structure on every production prompt. Fill every slot in order. Quality scales with specificity, not adjective stacking.

You will leave this module able to:

  • Construct production-grade prompts using the 7-Part Formula
  • Recognize and remove the AI-cliche words that dilute output
  • Apply the anti-slop rules that separate professional output from generic

2.1 The Formula

Stack your prompt in this exact order. Every slot has a job. Skip a slot and quality drops.

PART 01

Subject

The "who" or "what" of the image. Be concrete: "a matte black ceramic coffee mug" beats "a coffee cup."

PART 02

Style

Named visual treatment. "Editorial product photography" beats "professional photo." "Cinematic film still" beats "movie-like."

PART 03

Composition

Framing and layout. "Centered, eye-level, symmetrical" or "rule-of-thirds with subject on the left." Specify negative space.

PART 04

Lighting

The single biggest lever for realism. Specify direction, hardness, color temperature: "soft diffused daylight from the upper-left, golden hour temperature."

PART 05

Camera & Lens

Photography vocabulary translates directly: "shot on 85mm prime lens, f/1.8, shallow depth of field" or "wide-angle 24mm, deep focus."

PART 06

Mood

The emotional register. "Calm, contemplative, premium." "Energetic, playful, optimistic." Two or three adjectives, not five.

PART 07

Constraints

What you do NOT want. Aspect ratio, banned elements, required readability. "16:9 aspect ratio. No watermarks. No extra text. Text must be sharp and legible."

2.2 Weak prompt vs strong prompt

Weak (don't do this)

A stunning hyper-realistic photo of a coffee shop in the morning. Make it look amazing and professional with beautiful lighting.

Strong (do this)

Subject: A neighborhood coffee shop interior with three customers reading at a wooden counter, a barista pulling espresso, and steam rising from a portafilter. Style: Editorial photojournalism, slightly desaturated, warm undertones. Composition: Wide shot from the entrance looking in, customers visible in the middle ground, barista in sharp focus in the background, rule-of-thirds. Lighting: Soft morning daylight streaming through a tall window on the right, warm tungsten accents from pendant lamps, subtle haze in the air. Camera and lens: Shot on a 35mm prime, f/2.8, slight motion blur on the steam. Mood: Quiet, focused, unhurried, premium-neighborhood. Constraints: 16:9 aspect ratio. No visible brand names on cups or signage. Photorealistic. No people facing the camera.

The strong prompt produces a usable image 9 times out of 10. The weak prompt produces a generic AI coffee shop with garbled signage. Same model, same time, different structure.

2.3 The anti-slop list

These adjectives feel like they should help. They do not. They dilute your prompt by adding weight to non-specific tokens. Remove them and watch quality go up.

Words that hurt your prompt

Banned: stunning, hyper-realistic, breathtaking, amazing, beautiful, professional, high-quality, masterpiece, award-winning, epic, ultra-detailed, 8K, ultra HD, photorealistic (the model is already photoreal — saying it doesn't help).

Replace with: specific lighting, specific lens, specific material, specific lighting temperature, specific composition. Concrete physical descriptors always beat aesthetic adjectives.

2.4 The five constraint types that tighten output

  • Aspect ratio: "16:9", "4:5", "1:1", "9:16", "21:9" — be explicit, do not assume defaults.
  • Camera angle: "eye level", "low angle worm's-eye", "high angle bird's-eye", "Dutch tilt", "isometric 30 degrees".
  • Material specificity: "matte black ceramic", "brushed aluminum", "patinated brass", "raw oak", "cream linen with visible weave".
  • Lighting temperature: "5600K daylight", "warm 3200K tungsten", "cool 6500K overcast", "golden hour 2700K". Or in plain language: "golden hour", "blue hour", "overcast diffused", "harsh midday".
  • Negative constraints: "no watermark", "no extra text", "no logos", "no people facing the camera", "preserve face from reference", "preserve layout".

2.5 The universal template

Copy this skeleton into a notes app. Fill it in for every prompt. Stop guessing.

Subject: [who or what, with concrete physical descriptors] Style: [named visual treatment — editorial, cinematic, documentary, illustrative, etc.] Composition: [framing, where the eye lands, negative space, rule of thirds vs symmetric] Lighting: [direction, hardness, color temperature, atmospheric notes] Camera and lens: [focal length, aperture, depth of field] Mood: [2-3 emotional descriptors, no more] Constraints: [aspect ratio, banned elements, required readability, file format]
Pro tip

If you have to ask a question like "should I add more detail?" the answer is almost always no. Add concrete specificity to existing slots before adding new ones. A prompt with seven slots filled at high specificity beats a prompt with twelve slots filled at low specificity. Always.

MODULE 03 / 10

Photorealism & Product Photography

This is where gpt-image-2 produces work indistinguishable from a real photo studio. The key is treating the prompt like a brief to a working photographer, not like a wish to a genie. Photography vocabulary translates directly to model behavior.

You will leave this module able to:

  • Use lens, aperture, and lighting vocabulary to control realism
  • Produce e-commerce-ready product photography from a single prompt
  • Avoid the four common photoreal failure modes

3.1 The lighting vocabulary that matters

Light is the single biggest variable. The phrases below all produce predictable, repeatable results.

  • Soft diffused daylight — overcast sky look, no harsh shadows, even fill
  • Golden hour — warm directional sunlight, long shadows, 2700K temperature
  • Blue hour — cool dusk twilight, ambient blue, mixed practicals
  • Harsh midday — direct top-down sun, deep contrast, hard shadows
  • Studio softbox — controlled even light from one or both sides
  • Rembrandt lighting — directional with triangle of light on shadow-side cheek
  • Rim lighting — edge backlight separating subject from background
  • Practical-only — only visible in-scene light sources (lamps, neon, screens)

3.2 Camera and lens vocabulary

  • Wide-angle 24mm — sweeping landscape, environmental portrait, immersive
  • 35mm prime — documentary, photojournalistic, slightly wide of natural eye
  • 50mm prime — "natural" eye-equivalent, standard portraits
  • 85mm portrait — flattering compression, shallow depth, headshots
  • 100mm macro — extreme close-up, product detail, textures
  • 200mm telephoto — heavy background compression, isolated subject
  • f/1.4 to f/2.8 — shallow depth of field, subject pops, blurred background
  • f/8 to f/11 — deep focus, everything sharp, product/landscape default

3.3 The four photoreal failure modes

Watch for these

1. The plastic skin look: Skin renders too smooth. Fix: add "natural skin texture, visible pores, subtle imperfections, no skin smoothing."

2. The over-edited HDR look: Everything looks vivid and clarity-pushed. Fix: add "subtle contrast, gentle midtones, no HDR processing, color-graded for editorial."

3. The wrong-eye-line gaze: Eyes look slightly off. Fix: specify gaze direction explicitly ("looking directly into the lens" or "looking off-camera to the left").

4. The blank background trap: Pure white feels cut-out. Fix: add "subtle drop shadow, soft gradient, slight floor reflection" — never just "white background."

3.4 Twelve paste-ready photoreal prompts

Studio product photography of a wireless over-ear headphone in matte graphite finish, centered on a clean off-white seamless background, soft even softbox lighting from above and slightly behind, subtle floor reflection and soft drop shadow, 85mm prime lens, sharp focus throughout. Editorial e-commerce style for an Amazon-style listing. 1:1 aspect ratio. No text or watermarks.
Overhead flat lay of skincare bottles, eucalyptus leaves, a folded linen towel, and a small terracotta dish on a pale concrete surface, soft diffused daylight from above-right, muted sage green and warm beige palette, 50mm lens, sharp focus across the plane, editorial product styling. Square aspect ratio. Generous negative space top-left.
Lifestyle product shot of a person's hands in a cream knit sweater holding a ceramic mug of coffee, sitting by a tall window in a sunlit modern kitchen, warm morning light streaming in from the left, shallow depth of field with the window soft in the background, 50mm lens at f/2, lifestyle brand photography. 4:5 aspect ratio. Face not visible. Photorealistic without HDR processing.
Single hero shot of a sculpted glass perfume bottle on a brushed brass plinth, deep navy gradient background, rim lighting from behind separating the bottle from the background, glossy floor reflection, 100mm macro lens at f/8, sharp focus on the bottle's edge highlights, premium fragrance brand aesthetic, soft volumetric haze in the background. 4:5 aspect ratio.
Extreme macro of a single drop of clear serum landing on a glass eyedropper, capturing the moment of impact with crown splash, pure black background, single hard light from above-right, 100mm macro at f/5.6, frozen motion, pristine reflections, beauty editorial style. 1:1 aspect ratio. No text. Photorealistic without color grading.
Overhead three-quarter shot of a rustic ceramic bowl of roasted heirloom tomatoes, torn sourdough, and fresh basil on a weathered wooden table, golden hour side-light from a window out of frame, deep saturated reds and warm wood tones, 50mm prime, f/2.8, shallow depth of field on the bread crust, food editorial style. 4:5 aspect ratio.
Environmental portrait of a master watchmaker working at a wooden bench in a small Geneva atelier, leather apron over a white shirt, magnifying loupe in hand, soft window light from the left, warm tungsten task lamp on the bench, 35mm lens at f/2.8, slight depth of field, documentary editorial style, no posing. 3:2 aspect ratio. Photorealistic with natural skin texture.
Three-quarter front view of a vintage 1960s European sports car parked on a deserted coastal road at dusk, deep cobalt paint with chrome details, blue hour ambient light, single key light from the upper-left simulating residual sunset, 35mm lens at f/5.6, atmospheric haze in the background mountains, cinematic film still aesthetic. 21:9 aspect ratio. No license plate text.
Wide interior shot of a minimalist Scandinavian apartment, white oak floors, off-white walls, a single oxidized brass pendant lamp over a long oak dining table, late afternoon sun streaming through floor-to-ceiling windows on the right, soft shadows on the floor, 24mm wide-angle at f/8, deep focus, editorial architecture photography. 3:2 aspect ratio.
Mid-shot of a model in an oversized camel wool coat standing against a brutalist concrete wall, hands in pockets, looking off to the right, overcast diffused daylight, neutral cool grey palette, 85mm prime at f/2, shallow depth on the face, contemporary fashion editorial style. 4:5 aspect ratio. No visible brand names. Natural skin texture.
A glass jar of premium honey on a rustic kitchen counter next to a folded linen napkin and a wooden honey dipper, warm afternoon side-light from a window out of frame, deep amber liquid catching the light, 85mm lens at f/2.8, shallow depth of field, lifestyle product photography. The jar label reads "HIVE 24" in a serif typeface. 4:5 aspect ratio.
Frozen action shot of a long-distance runner mid-stride on a forest trail at golden hour, captured from a low side angle with the runner crossing left to right, motion blur on the trailing leg, warm directional backlight rim-lighting the silhouette, 200mm telephoto at f/4, heavy background compression, athletic editorial style. 16:9 aspect ratio. No visible brand logos.
Reuse pattern

Save these 12 as your skeleton library. Swap the subject and material descriptors and the rest of the prompt usually still works. The hardest 80% of the prompt is the lighting + camera + mood combination; that part is reusable across categories.

MODULE 04 / 10

Typography & Text-Heavy Visuals

This is gpt-image-2's biggest superpower. Two years ago, asking any AI image model for a menu with correctly spelled items was a guaranteed failure. Now you can ship a print-ready restaurant menu, a multilingual product label, or a complete poster with display typography on the first try. ~95-99% character accuracy including Japanese, Korean, Chinese, Hindi, Bengali, and Arabic.

You will leave this module able to:

  • Write text-bearing prompts that render correctly on the first attempt
  • Specify exact text strings, typefaces, weights, and layouts
  • Generate multilingual visuals across non-Latin scripts

4.1 The rule of quoted strings

Always wrap the exact text you want rendered in quotation marks inside your prompt. The model treats quoted strings as literal targets. Unquoted text in the prompt is treated as description, not content to render.

Wrong

A vintage poster that says spring sale at the top

Right

A vintage poster with the headline "SPRING SALE" centered at the top in heavy sans-serif type, and the subtitle "30% off all denim through May 31" below in smaller serif text.

4.2 Typography vocabulary the model understands

  • Type family: "heavy sans-serif", "humanist serif", "geometric grotesque", "monospaced", "display serif", "hand-lettered"
  • Weight: "thin", "light", "regular", "medium", "bold", "black"
  • Treatment: "all caps", "small caps", "lowercase only", "italic", "condensed", "extended"
  • Style era: "1960s mid-century", "1970s psychedelic", "1990s grunge", "Swiss design", "Bauhaus", "Art Deco", "Brutalist", "Contemporary editorial"
  • Layout: "stacked vertically", "left-aligned ragged right", "centered", "wrapped around the subject", "across the lower third"

4.3 Fifteen paste-ready typography prompts

A single-page restaurant menu for a modern Mexican kitchen called "BARRIO 7", printed on warm cream paper, headlines in heavy condensed sans-serif, body text in classic humanist serif. Three sections: STARTERS with five items, MAINS with six items, COCKTAILS with four items. Each item has a name, a one-line description, and a price in US dollars. Subtle illustrated chile pepper graphic in the top-right corner. 8.5x11 portrait aspect ratio. Print-ready.
A minimal exhibition poster on a black background with the title "FORM & FOG" centered in massive thin serif type spanning the upper two-thirds, the subtitle "An exhibition of contemporary landscape photography. May 14 to August 30, 2026." in small monospaced type below, and "Boston Museum of Modern Art" in even smaller type at the bottom. Generous negative space. 2:3 aspect ratio.
A close-up product photograph of a small amber glass bottle of essential oil on a marble surface, label in pure white with the brand name "TERRA NOTE" in elegant thin serif at the top, the variant name "BERGAMOT & CEDAR" in heavy sans below, and "30 ml e / 1 fl oz" in tiny type at the bottom. Soft diffused daylight from the left. 4:5 aspect ratio. Photorealistic.
A literary fiction book cover titled "THE LAST QUIET PLACE" by Anna Reyes. Title in tall condensed serif spanning the middle, author name below in classic Trajan-style small caps. Background: a textured pale blue gradient with a single small black silhouette of a lighthouse in the lower-right corner. Publisher imprint "VESPER PRESS" at the very bottom in tiny tracked-out type. 6x9 portrait aspect ratio.
A minimalist Japanese ramen restaurant poster with the kanji title "霧の麺" rendered large in elegant brush calligraphy at the top, the romaji subtitle "Kiri no Men" below in delicate sans-serif, and the English line "FOG NOODLES — TOKYO" at the bottom. Background: a textured washi paper aesthetic in pale grey. Generous negative space. 2:3 aspect ratio.
A photograph of a hand-painted wooden cafe sign hanging above a shop entrance in Seoul. Korean hangul reads "조용한 시간" in confident brushed strokes, with the English translation "QUIET HOURS" in small lowercase serif below. Sign hangs from black iron brackets against a brick wall. Soft afternoon side-light. 1:1 aspect ratio. Photorealistic.
A contemporary architecture magazine cover. The masthead "DWELL" sits across the top in heavy custom display sans. The cover line "INSIDE THE NEW BAUHAUS — Eight homes that rewrite the rulebook" runs across the lower-left in mixed weights. Cover image is a wide architectural shot of a modernist concrete house at golden hour. Issue number "VOL 27 / NO 4 / MAY 2026" in small type at the bottom-right. 8.5x11 portrait aspect ratio.
An Instagram square graphic with the quote "We do not see things as they are. We see them as we are." in confident serif type, centered, broken into four lines across the middle of the canvas. Attribution "— ANAÏS NIN" in small caps below. Background: a soft warm cream gradient with subtle paper grain. 1:1 aspect ratio. No other elements.
A modern airport terminal wayfinding sign mounted on a white pillar, with three lines of information in clean grotesque sans-serif: "GATES A1-A24" with a right arrow, "BAGGAGE CLAIM" with a down arrow, and "RIDESHARE PICKUP" with a left arrow. Black type on white background, soft fluorescent ambient lighting, shallow depth of field. 4:5 aspect ratio. Photorealistic.
A bold gym promotional poster with the massive headline "MOVE BETTER" in heavy condensed sans-serif spanning the full width, the subtitle "Strength & conditioning. Mornings & evenings. No contracts." across the middle in regular weight, and the location "FIELD HOUSE — SOMERVILLE" at the bottom. Background: a deep matte black with a subtle athletic figure silhouette mid-jump. 4:5 aspect ratio.
An infographic header card with the title "THE STATE OF HEADLESS COMMERCE — 2026" in editorial sans-serif at the top, a one-sentence subtitle "How 1,200 brands made the jump in the last twelve months." in serif italic below, and four small statistic teasers across the bottom: "62% report higher conversion", "3.2x average build cost", "$180k median engineering investment", "47% migrated back within 18 months". 16:9 aspect ratio. Editorial magazine aesthetic.
A close-up of a printed concert ticket on a wooden table. The ticket reads, from top to bottom: "ANNA REYES" as the headline artist in heavy serif, "TUESDAY MAY 19 / DOORS 7PM / SHOW 8PM" in mono type, "PARADISE ROCK CLUB / BOSTON MA" in elegant sans-serif, and "GA STANDING / SEC FLOOR / $42" at the bottom. Subtle warm overhead lighting. 16:9 aspect ratio. Photorealistic.
A coffee bag photograph showing a matte kraft pouch with a full label design. Top line: "ORIGIN COFFEE CO." in small caps. Middle: "ETHIOPIA YIRGACHEFFE" in tall condensed serif. Below: "Single Origin / Washed Process / Roasted May 8 2026" in three lines of small mono type. Bottom: "340g e / 12 oz" with a small flag of Ethiopia icon. Studio lighting, 1:1 aspect ratio, photorealistic.
A product launch banner with the same announcement in four languages stacked vertically: English "NEW ARRIVAL", French "NOUVELLE COLLECTION", Japanese "新作登場", Korean "신상품 출시". Each language in its appropriate display typography. Background: a clean cream gradient. Below all four lines, the date "MAY 14 2026" in small monospaced type. 1:1 aspect ratio.
A photograph of a folded heather grey t-shirt laid flat on a wooden surface. The shirt's chest print reads "ASK BETTER QUESTIONS" in confident condensed sans-serif across two lines, with a small minimalist line-drawing of a thinking head icon beneath. Soft diffused daylight. 4:5 aspect ratio. Photorealistic. Print must be sharp and centered.

4.4 When text rendering still fails

Three remaining gotchas to manage.

  • Very long strings (more than ~80 chars) sometimes drift. Break them up or place them on multiple lines explicitly.
  • Tiny text at small sizes can lose accuracy. If readability matters at small render, use Thinking Mode and increase the resolution.
  • Custom proprietary typography is approximate. The model will produce the genre (humanist sans, Trajan-style serif, etc.) but not literally your brand font. Composite the real font in post for brand-critical assets.
Production tip

For anything print-bound or brand-critical, generate the image with placeholder text, then composite your real typography in Figma or Photoshop. gpt-image-2 nails the layout 95% of the time, which is the hard part. Final type swap is the easy part.

MODULE 05 / 10

Thinking Mode Deep Dive

Thinking Mode is the agentic reasoning layer in gpt-image-2. Before generating a single pixel, the model plans the composition, optionally searches the web for live references, drafts multiple coherent variants, and self-checks the output. It costs more compute and latency. It pays back on complex prompts. This module is the playbook for when to use it.

You will leave this module able to:

  • Recognize the prompt patterns that benefit from Thinking Mode
  • Trigger Thinking Mode reliably in chat and via the API
  • Decide when to deliberately stay in Instant Mode for speed

5.1 What Thinking Mode actually does

Four mechanics run before the image renders:

  1. Composition planning. The model reasons about layout, object placement, spatial relationships, and lighting consistency.
  2. Web search grounding. When the prompt references a real place, product, event, or date past the December 2025 knowledge cutoff, the model can fetch live reference data.
  3. Multi-image batching. Up to 8 coherent images from one prompt with character and object continuity across the set.
  4. Output self-verification. The model checks its own output against the prompt before delivering, often catching layout misses that would have shipped in Instant Mode.

5.2 When Thinking Mode pays back

When to use Thinking Mode versus Instant Mode based on prompt characteristics.
Prompt characteristic Recommended mode Why
Single subject, single compositionInstantReasoning overhead not worth latency
5+ objects with spatial relationshipsThinkingPlanning prevents object collisions and miscounts
Multi-panel comic, storyboard, batch of 4-8 imagesThinkingContinuity across batch only works in Thinking Mode
Reference to real product, place, or date past Dec 2025ThinkingWeb search grounds factual accuracy
Infographic with dense text and dataThinkingSelf-verification catches text errors
Hero shot with one productInstantFaster and cheaper, quality already strong
50 A/B variants of a headlineInstantVolume work; reasoning overhead is wasted
Complex multilingual layoutThinkingComposition planning + script accuracy benefits

5.3 How to trigger Thinking Mode

  • In ChatGPT (Plus/Pro/Business): Click the model selector and pick the "Images with Thinking" option, or include phrases like "think carefully about", "plan the composition", or "verify the output" in your prompt — these cues nudge the system to engage reasoning.
  • In the API: Quality and complexity parameters drive engagement automatically. Higher quality settings spend more tokens on reasoning. The Responses endpoint exposes thinking traces if you want to inspect them.

5.4 Prompt patterns that exploit Thinking Mode

Pattern A: The 10-object scene

gpt-image-2 in Thinking Mode can handle 10-20 distinct objects with their traits and spatial relationships. Use this for explainer infographics, complex marketing scenes, and "everything I sell" hero shots.

A 4-row by 3-column grid on a white background containing these 12 items in order, left to right, top to bottom: 1. A matte black ceramic coffee mug 2. A vintage brass compass 3. A pair of tortoiseshell reading glasses 4. A leather-bound notebook with a fountain pen on top 5. A single ripe Bartlett pear 6. A small terracotta succulent in a hand-thrown pot 7. A folded grey cashmere scarf 8. A polished river stone the size of a fist 9. A glass jar of honey with a wooden dipper 10. A pair of brown leather Oxford shoes 11. A vintage silver pocket watch on a chain 12. A small stack of three hardcover books in earth tones Soft diffused daylight from above. Subtle drop shadows. Editorial product photography. 4:3 aspect ratio. Items should be visually balanced in size.

Pattern B: The fact-grounded infographic

An editorial infographic titled "BOSTON MARATHON 2026 — BY THE NUMBERS" with five large stat blocks: total registered runners, fastest finishing time, charity dollars raised, average age of participants, and number of countries represented. Use the actual 2026 figures by searching for them. Layout: title across the top in heavy condensed sans-serif, five stat blocks in a row across the middle each with a giant number and a one-line caption, source citation in tiny type at the bottom. Color palette: deep blue, gold, cream. 16:9 aspect ratio.

Pattern C: The 4-image consistent batch

Generate 4 coherent images of the same character — a 30-something woman with shoulder-length auburn hair, wearing a cream linen blazer over a charcoal turtleneck, in different scenes: 1. Standing at a podium giving a keynote address, confident pose 2. Sitting at a wooden cafe table reading a book, soft afternoon light 3. Walking down a city sidewalk with a leather tote bag, three-quarter view 4. Laughing with friends at a dinner table, candid lifestyle shot Maintain exact character likeness, clothing, and hair across all four images. Editorial photojournalism style. 4:5 aspect ratio each. Natural skin texture, no HDR processing.

Pattern D: The multi-panel comic

A 6-panel comic strip in a 2-by-3 grid, drawn in clean line art with limited flat color. Recurring character: a small grey tabby cat. Panels in order: 1. Cat sitting on a windowsill watching rain outside 2. Cat batting at a falling leaf on the glass 3. Cat losing interest, looking inside the room 4. Cat noticing an empty food bowl in the kitchen 5. Cat sitting expectantly next to a human's leg 6. Cat eating from the now-full bowl, looking content Same character design and color palette in every panel. Clean white gutters between panels. Editorial cartooning style. 1:1 aspect ratio overall.

5.5 When to deliberately stay in Instant Mode

  • You are exploring; you will iterate 5-10 times before committing
  • You are generating volume (50+ headline variants, bulk product shots)
  • The prompt is simple (single subject, single composition)
  • You are cost-constrained or on free tier
  • You need fast turnaround for a real-time application
Honest take

Most casual users will never need Thinking Mode. Most professional users will use it 30-50% of the time and Instant Mode the rest. The discipline is matching the mode to the prompt complexity, not defaulting to "always Thinking" because it sounds better.

MODULE 06 / 10

Editing, Inpainting & Multi-Image Reference

Generating a new image is the easy half. The hard half is fixing a 90%-there image without losing what already works. gpt-image-2's v1/images/edits endpoint supports mask-based inpainting, outpainting, and up to 16 reference images per call. This module is the editing playbook.

You will leave this module able to:

  • Pick between text-only edit, mask-based edit, and multi-reference edit
  • Build masks correctly so edits land where you want them
  • Use up to 16 reference images for style and subject transfer

6.1 Three edit modes, one endpoint

The images.edit endpoint handles all three. The mode is determined by what you pass in.

  1. Text-only edit. Pass one reference image + a prompt. The model interprets the change you want and applies it. Best for "make the sky orange" or "remove the person on the left."
  2. Mask-based edit (inpainting). Pass one reference image + a mask + a prompt. The transparent areas of the mask define what changes; everything else is preserved. Best for surgical product retouching, text replacement, and background swaps.
  3. Multi-reference edit. Pass 2-16 reference images + a prompt. The model uses them as visual context for style, subject, or composition. Best for character transfer, style matching, and brand-locked output.

6.2 The mask rules

  • Mask must match the source image dimensions exactly. No automatic resizing. Mismatched dimensions cause the request to fail.
  • Mask requires an alpha channel. Transparent pixels = "edit here". Opaque pixels = "preserve here".
  • Soft mask edges produce smoother transitions. Hard-edged masks can produce visible seams.
  • White-region masks (in some third-party hosts) invert this convention. Check the host's docs.

6.3 The "full redraw" reality

Important to understand: every gpt-image-2 edit is a full redraw under the hood. The model encodes your source image into tokens, generates a new complete token sequence, and decodes it into a new image. Even mask-based inpainting only adds constraints during the redraw — it does not literally overwrite pixels locally.

Four mechanisms preserve apparent consistency: visual token feature abstraction, global self-attention reference to the original, training inductive bias toward minimal change, and explicit planning in Thinking Mode.

Practical implication: micro-details (skin pores, exact fabric weave, precise reflections) can shift even when the model "preserves" a region. For brand-critical pixel-exact regions, composite the original back in via Photoshop.

6.4 Ten paste-ready edit prompts

[Edit endpoint with reference image] Replace the background with a clean off-white seamless studio backdrop with a subtle drop shadow under the subject. Preserve the subject, its lighting, and its colors exactly. Add a soft floor reflection. No text, no other elements.
[Edit endpoint with reference image] Remove the person standing on the left side of the frame. Reconstruct the wall, floor, and background that would naturally be visible behind them. Preserve everyone else, the lighting, and the overall composition. Do not add new elements.
[Edit endpoint with reference image] This is a Spanish menu. Translate every menu item to English while preserving the typography, layout, paper texture, and prices exactly. Keep the section headers in their existing styling. Do not change the visual design of the menu.
[Edit endpoint with reference image] Change the lighting from harsh midday to soft golden hour. Warm the color temperature, soften the shadows, add long directional shadows from the lower-left, and introduce a gentle haze in the background. Preserve the subject, composition, and all other elements.
[Edit endpoint with reference image and mask covering the jacket] Change the masked jacket from olive green to deep navy blue. Match the original lighting, fabric texture, and material. Preserve every other element in the image including the face, hair, and background.
[Edit endpoint with reference image and mask extending the canvas] Extend the canvas to a 16:9 aspect ratio by adding scene to the right of the existing image. Continue the wooden table, add a window in the upper-right, and place a small terracotta plant in the lower-right. Match the existing lighting and color grade exactly.
[Edit endpoint with 2 reference images: image A = a portrait, image B = a Wes Anderson film still] Generate a new portrait of the subject in image A, restyled in the visual aesthetic of image B. Match the color palette, framing, and symmetry. Preserve the subject's facial identity and clothing. Do not literally composite the two — render a new image inspired by both.
[Edit endpoint with 4 reference images of the same character from different angles] Generate a new image of this character in a different scene: walking through a Tokyo train station at night, wearing the same outfit shown in the references, captured in a wide candid documentary style. Preserve the character's facial features, hair, and clothing exactly. 3:2 aspect ratio.
[Edit endpoint with 3 reference images: image A = a blank coffee bag template, image B = a brand logo, image C = a typography sample] Generate a finished coffee bag mockup using the bag shape from image A, the logo from image B placed in the upper-third, and the typography style from image C for the product name "EQUINOX BLEND" in the lower-third. Render as a photograph on a marble counter with soft daylight. 4:5 aspect ratio.
[Edit endpoint with reference image and mask covering existing text] Replace the headline text in the masked region with "SPRING ARRIVALS" in the same typography style, weight, and color as the original. Preserve the surrounding image, layout, and any other elements exactly.

6.5 Edit cost reality

Important pricing note: edit requests always process reference images at high fidelity regardless of your quality parameter. This means edit requests cost more per call than pure text-to-image generation.

Two practical implications:

  • Downscale your reference images before uploading. A 4K reference photo costs the same input tokens whether your output is a thumbnail or a poster.
  • For high-volume edit pipelines, run a 1-week pilot at your real workload and multiply by 4.3 to get your monthly burn rate.
The character consistency reality

The "same character across multiple scenes" claim works well for simple subjects and gets less reliable as scene complexity grows. For series work (product line, character storyboard, recurring brand mascot), stack 4-8 reference images of the character from different angles and pass them all on every edit. Quality jumps significantly.

MODULE 07 / 10

Marketing & E-Commerce Production Pipelines

This is where gpt-image-2 produces the biggest practical ROI for working brands. Multi-format batch generation, brand-locked product photography at scale, 8-image carousels with character continuity, UGC-style ads, and before/after pairs. This module is the production playbook.

You will leave this module able to:

  • Generate multi-format asset variants from a single brand brief
  • Build 8-image product carousels with locked brand identity
  • Produce convincing UGC-style ad creative without a real shoot

7.1 The multi-format batch pattern

Generate one hero image, then request platform-specific variants from the same source. Each variant preserves the headline text and brand color cues.

Generate the same marketing campaign in four aspect ratios, with the same product, the same headline, and the same brand identity across all four: Subject: A premium glass water bottle in matte sage green with the brand name "WELL & FLOW" on a minimal white label. Headline text: "DRINK BETTER. EVERY HOUR." Brand palette: Sage green, off-white, deep charcoal. Tone: Calm, premium, intentional. Four outputs needed: 1. 1:1 square for Instagram feed 2. 9:16 vertical for Instagram Story / TikTok 3. 16:9 horizontal for YouTube thumbnail 4. 4:5 portrait for Pinterest / shopping ad Maintain identical typography, color, and product appearance across all four. Composition should adapt to each aspect ratio — do not just crop the same image.

7.2 The 8-image carousel pattern

[Thinking Mode] Generate 8 coherent product photography images of the same skincare line for a launch carousel. Brand: "FOREST ROOM". Three products: a cream cleanser, a botanical serum, and a moisturizing balm — all in matching amber glass with cream labels. Image 1: All three products together on a marble surface, flat lay with eucalyptus leaves Image 2: Cleanser in use, hands lathering at a sink with warm morning light Image 3: Serum dropper close-up, a single drop captured mid-air Image 4: Balm jar open showing texture, top-down macro Image 5: A model's reflection in a bathroom mirror, holding the serum Image 6: Hero shot of all three products vertically aligned on a wooden shelf Image 7: Texture macro of all three formulas swatched side by side on skin Image 8: Lifestyle shot of the products on a bedside table with morning sun Maintain exact brand identity across all 8 images: same amber glass, same cream labels, same color grading, same warm minimalist aesthetic. 4:5 aspect ratio each.

7.3 UGC-style ad creative

A UGC-style smartphone photograph (not professional studio) of a person holding up a wireless earbud case at arm's length in front of a casual home setting. Slight phone camera imperfections — minor lens flare, slightly over-exposed, casual framing. The earbuds case is white with a small brand logo. The person is mid-30s, in a casual sweatshirt, smiling. Natural daylight from a window. Looks like a real customer post, not an ad. 4:5 aspect ratio.

7.4 Before-and-after pair

A single 1:1 image split into two halves with a thin vertical dividing line. Left half labeled "BEFORE" in small caps at the top — a cluttered home office: papers stacked, tangled cables, sticky notes everywhere, dim lighting. Right half labeled "AFTER" at the top — the same office, now organized: clean desk, single monitor, hidden cables, a single plant, soft daylight. Same room, same furniture, same window — only the organization changes. Photorealistic with natural lighting. Maintain a consistent perspective and time of day across both halves.

7.5 Product on context

A wide editorial shot of a single ceramic pour-over coffee dripper sitting on a wooden counter in a quiet morning kitchen. Steam rising from the carafe below. Soft window light streaming from the left, warm color temperature. A simple cream-tile backsplash, no clutter. 3:2 aspect ratio. The coffee dripper is the unmistakable focal point but the scene reads as a lived-in moment, not a product catalog. Editorial lifestyle photography. Photorealistic with natural skin texture if any hands are visible.

7.6 Social card with text

A 1:1 Instagram social card with a deep forest green background and the headline "SUSTAINABILITY IS A SUPPLY CHAIN, NOT A SLOGAN." in large confident serif, broken into three lines, centered, cream-colored type. Below the headline, the small attribution "— OUR Q1 LETTER" in tracked-out monospace small caps. Generous negative space top and bottom. Brand mark "FOREST ROOM" in tiny lowercase at the very bottom center.

7.7 Hero banner with overlay

A 21:9 ultrawide horizontal hero banner. Left two-thirds: a cinematic editorial photograph of a couple walking on a foggy New England coastline at golden hour, shot from behind, wide angle. Right one-third: a clean off-white panel containing the headline "MAY ARRIVALS — COASTAL ESSENTIALS" in heavy condensed serif, the subhead "Hand-knit sweaters, weatherproof outerwear, and waxed cotton bags. Now in stock." in regular body, and a "SHOP THE COLLECTION" CTA button rendered as a black rounded rectangle with cream type. Photorealistic on the image side, clean editorial layout on the text side.

7.8 Email header

A 600x300 pixel email header for a brand newsletter. Left half: a single product photograph of a leather-bound notebook on a desk with a pen. Right half: cream solid panel with "ISSUE 17" in tiny mono, "ON WRITING" as the main headline in tall serif, and "Weekly notes on craft, process, and the long game." as a one-line subhead. Editorial newsletter aesthetic. Newsletter brand "PARAGRAPH 7" in tiny type at the bottom-left.

7.9 Product on white at scale

Generate the same product — a brushed aluminum water bottle in three colorways: graphite, sage, and bone — as three separate 1:1 product-on-white images. Each image: bottle centered, soft even softbox lighting, subtle drop shadow, slight floor reflection, off-white background. Maintain identical lighting, camera angle, and overall composition across all three so they look like a consistent product catalog set. No text or branding visible. Photorealistic.

7.10 The brand-locking prompt structure

For agency or production work where 100% brand consistency matters, prepend this brand lock to every prompt in the session. Pair it with the Brand Voice Skill in the Skills Library section.

BRAND LOCK — apply these to every output in this session: Brand: [BRAND NAME] Palette: [hex codes — primary, accent, neutral, background] Typography style: [display family character + body family character] Photography style: [editorial / documentary / studio / lifestyle / catalog] Lighting bias: [warm / neutral / cool] + [hard / soft / mixed] Color grading: [subtle / saturated / desaturated / warm / cool] Tone: [3 adjectives max] Banned: [generic phrases, banned visual cliches, banned compositions] Required: [logo placement rules, required negative space, required readability]
MODULE 08 / 10

Characters, Concept Art & Storyboards

The hardest test for any image model: keep a single character looking like themselves across many scenes. gpt-image-2 is the strongest model on the market for this as of May 2026, with two limits: simple subjects work better than complex ones, and exact facial likeness of a real person remains a hard wall. This module is the character pipeline.

You will leave this module able to:

  • Build a character reference sheet that holds up across scenes
  • Generate multi-view consistency (front, three-quarter, side, back)
  • Produce multi-panel storyboards with character continuity

8.1 The reference sheet pattern

Step 1 of every character pipeline: generate a clean reference sheet. Step 2: re-use that reference sheet as input to every subsequent prompt.

A character reference sheet for an animated story. Single character, displayed in five views on a neutral grey background: 1. Front view, neutral expression 2. Three-quarter view from the left, neutral expression 3. Side profile facing right 4. Back view 5. Front view with a smile Character details: - Late-20s woman - Shoulder-length wavy auburn hair, parted on the left - Hazel eyes - Pale freckled skin, natural texture - Wearing a cream linen blazer over a charcoal turtleneck and dark indigo jeans - Tan leather ankle boots - Small gold hoop earrings, no other jewelry Each pose at the same height, evenly spaced, full body visible in each. Clean concept-art style. Soft even studio lighting. 16:9 aspect ratio with all five poses in a single horizontal row.

8.2 Expression sheet

An expression sheet for the same character (use the reference sheet as input). Display the character's face from the same three-quarter angle in 8 different expressions, arranged in a 2-by-4 grid: 1. Neutral 2. Smiling warmly 3. Laughing 4. Concentrating 5. Concerned 6. Surprised 7. Skeptical 8. Tired Maintain exact facial features and hair across all 8. Clean character design style. Soft studio lighting. 16:9 aspect ratio.

8.3 Costume design sheet

A costume design sheet for the same character in 4 different outfits, full body, front view, same neutral pose across all 4: 1. Business: tailored navy blazer, white silk shell, grey trousers, low black heels 2. Casual: oversized grey hoodie, dark jeans, white sneakers 3. Outdoor: olive waxed cotton jacket, plaid flannel shirt, cuffed denim, leather hiking boots 4. Evening: deep emerald slip dress, gold drop earrings, strappy heels Maintain exact facial features and hair across all 4 outfits. Each outfit labeled in small mono caps below the figure. Clean concept-art style. 16:9 aspect ratio with all 4 in a horizontal row.

8.4 Multi-scene continuity

[Thinking Mode, with character reference sheet as input] Generate 6 photographic scenes of the same character, maintaining exact facial features, hair, and the Business outfit (navy blazer, white shell, grey trousers, low black heels) throughout: 1. Arriving at an office building entrance, morning, three-quarter view 2. In a glass-walled conference room presenting at a whiteboard, mid-shot 3. Walking through an open-plan office carrying a coffee cup, candid 4. Sitting at a desk with a laptop, mid-afternoon, side profile 5. Stepping out of the building at dusk with a coat over her arm 6. At a casual after-work bar with two colleagues, laughing Editorial corporate photography style. Natural skin texture, no HDR processing. 4:5 aspect ratio each. Same lighting feel throughout the day's arc.

8.5 Storyboard panels

[Thinking Mode] A 9-panel storyboard in a 3-by-3 grid, drawn in clean black-and-white ink wash. Recurring protagonist: a young man with short dark hair, round glasses, wearing a wool overcoat. Story: Panel 1: Wide establishing shot, the protagonist walking down a foggy street at night Panel 2: Close-up on his face, looking at something off-frame Panel 3: Reveal of what he sees — a glowing window in an old library Panel 4: He approaches the library door, hand on the brass handle Panel 5: Interior: shelves of books, single lamp, an elderly librarian Panel 6: The librarian looks up, smiles, gestures to a chair Panel 7: Close-up of an open leather-bound book on a wooden table Panel 8: The protagonist reading, candle nearby, deep in thought Panel 9: Wide pull-back through the window, library lit warm against the foggy night Maintain exact character design across all 9 panels. Cinematic graphic novel style. 1:1 aspect ratio overall.

8.6 Concept art for a fictional product

Three concept art views of a fictional handheld GPS device for outdoor explorers. Industrial design aesthetic: titanium body, matte black bezel, single circular monochrome e-ink display, two physical buttons. Brand mark: a small embossed compass icon. Three views in a single horizontal row on a clean white background: 1. Front view, screen on, displaying a topographic map 2. Three-quarter view from the right, showing the side button and lanyard loop 3. Back view, showing the textured grip and a small data plate Studio product photography, soft directional light. Editorial industrial design portfolio aesthetic. 16:9 aspect ratio.

8.7 The real-person workaround

gpt-image-2 will refuse to render exact likenesses of real public figures or named celebrities. Workarounds for legitimate use cases:

  • For personal portraits: Pass your own photo as a reference image and prompt for stylization or scene changes. Self-portraits are fine.
  • For fictional characters inspired by archetypes: Describe the archetype without naming the real person. "A 60s-era folk singer-songwriter" instead of "Bob Dylan."
  • For period or historical figures: Use descriptive language rather than names. "A Renaissance painter at his easel" instead of naming the painter.
  • For brand mascots: Define the character explicitly with concrete descriptors and lock it via the reference sheet pattern.
Character consistency limit

Character consistency is excellent for simple subjects (one person, clear outfit, predictable scene). It degrades as scene complexity grows. For brand-critical character work across 20+ assets, plan to do final brand-pass cleanup in Photoshop. The model gets you 90% of the way; the last 10% is hand work.

MODULE 09 / 10

API & Production Engineering

Everything above runs in ChatGPT.com. For volume, automation, integrations, and pipelines, you need the API. This module is the engineering reference: authentication, request shape, rate limits, cost discipline, error handling, and the patterns that scale.

You will leave this module able to:

  • Make working gpt-image-2 API calls in Python and Node
  • Implement the edits endpoint with masks and reference images
  • Plan rate limits, cost discipline, and observability for production

9.1 Endpoint surface

Four endpoints touch gpt-image-2:

  • POST /v1/images/generations — text-to-image
  • POST /v1/images/edits — image-to-image with optional mask and up to 16 references
  • POST /v1/responses — newer unified endpoint with reasoning trace access
  • POST /v1/chat/completions — image generation called as a tool from a chat

9.2 Basic text-to-image (Python)

from openai import OpenAI client = OpenAI() response = client.images.generate( model="gpt-image-2-2026-04-21", # pin to snapshot for production prompt="Studio product photography of a wireless over-ear headphone in matte graphite, centered on a clean off-white seamless background, soft even softbox lighting from above, subtle drop shadow, 85mm prime lens.", size="1024x1024", quality="medium", # low | medium | high n=1, # 1 to 10 ) # Image returned as a URL or base64 depending on response_format image_url = response.data[0].url print(image_url)

9.3 Image edit with mask (Python)

with open("product.png", "rb") as image_file, \ open("mask.png", "rb") as mask_file: response = client.images.edit( model="gpt-image-2", image=image_file, mask=mask_file, # transparent areas define edit region prompt="Replace the masked background with a clean seamless off-white studio backdrop. Preserve the subject exactly.", size="1024x1024", # Quality is fixed at high for image inputs regardless of this parameter ) print(response.data[0].url)

9.4 Multi-image reference edit (Python)

# Up to 16 reference images reference_files = [open(p, "rb") for p in [ "char_front.png", "char_three_quarter.png", "char_side.png", "char_back.png", ]] response = client.images.edit( model="gpt-image-2", image=reference_files, # list of file handles prompt="Generate a new image of this character walking through a foggy Tokyo train station at night, in the same outfit shown in the references. Editorial cinematography. 3:2 aspect ratio.", size="1536x1024", ) for f in reference_files: f.close() print(response.data[0].url)

9.5 Node.js / TypeScript (text-to-image)

import OpenAI from "openai"; const client = new OpenAI(); const response = await client.images.generate({ model: "gpt-image-2-2026-04-21", prompt: "Editorial photojournalism of a master watchmaker at a wooden bench in a Geneva atelier, soft window light, 35mm lens, f/2.8, natural skin texture.", size: "1024x1536", quality: "high", n: 1, }); console.log(response.data[0].url);

9.6 Async polling pattern (for hosts that queue)

Some third-party hosts (fal.ai, WaveSpeed, etc.) wrap gpt-image-2 in an async queue. Pattern:

// Submit const { request_id } = await fal.queue.submit("fal-ai/gpt-image-2", { input: { prompt: "..." }, webhookUrl: "https://your-server.com/webhook", }); // Poll for status const status = await fal.queue.status("fal-ai/gpt-image-2", { requestId: request_id, logs: true, }); // Fetch result when ready const result = await fal.queue.result("fal-ai/gpt-image-2", { requestId: request_id, });

9.7 Rate limit ladder

Plan the tier ramp before you launch. Limits are per minute for image generation:

OpenAI gpt-image-2 rate limits by usage tier with the spend thresholds to unlock each. Verify against OpenAI's official rate limits page before launch.
TierImages/MinUnlock Threshold
Tier 15Default for new accounts
Tier 220$50 spent + 7 days since first payment
Tier 350$100 spent + 7 days
Tier 4100$250 spent + 14 days
Tier 5250$1,000 spent + 30 days

9.8 Cost discipline patterns

  • Draft at low, finalize at high. Low quality at 1024×1024 is ~$0.006/image. Use it for exploration, then re-render the winner at high (~$0.211).
  • Cache repeated prompts. Cached text inputs drop from $5/M to $1.25/M tokens. Worth setting up for any pipeline with reused brand context.
  • Batch API halves rates if you can tolerate 24-hour latency. For overnight campaign generation, batch is the right call.
  • Downscale references before upload. Edit requests always process images at high fidelity. A 4K reference costs the same tokens as a 1024 reference for the same task.
  • Cap output resolution to use case. Social thumbnails do not need 4K. Cap at the smallest size your final deliverable accepts.

9.9 Error handling pattern

from openai import OpenAI, APIError, RateLimitError, BadRequestError import time client = OpenAI() def generate_with_retry(prompt: str, max_retries: int = 3) -> str: for attempt in range(max_retries): try: response = client.images.generate( model="gpt-image-2-2026-04-21", prompt=prompt, size="1024x1024", quality="medium", ) return response.data[0].url except RateLimitError as e: wait = (2 ** attempt) + 1 print(f"Rate limited, waiting {wait}s") time.sleep(wait) except BadRequestError as e: # Likely content policy refusal - no retry will help print(f"Refused: {e.message}") return None except APIError as e: if attempt == max_retries - 1: raise time.sleep(2) return None

9.10 Observability requirements

Log every request with at minimum:

  • Model snapshot ID (always pin in production)
  • Size and quality used
  • Input and output token counts
  • Latency from submit to result
  • Request ID (for debugging with OpenAI support)
  • Retry count
  • Moderation outcome (auto vs low)
  • Estimated cost per request

Without these, you cannot diagnose incidents or forecast monthly burn rate accurately.

Production pre-launch checklist

Before you flip the switch on a real pipeline: pin to snapshot, validate rate limit headroom at peak traffic, implement retry with exponential backoff, log every request, set a daily spend cap on your OpenAI account, and run a 1-week pilot at your real workload before forecasting your monthly bill. Multiply the weekly burn by 4.3 and add a 20% buffer.

MODULE 10 / 10

Comparison, Limits, and the Honest Take

gpt-image-2 is the strongest general-purpose image model in May 2026. It is not the strongest at everything. This module is the honest decision matrix: where each competitor wins, where gpt-image-2 hits walls, and how to build a multi-model workflow.

You will leave this module able to:

  • Pick the right image model for each job
  • Recognize gpt-image-2's hard failure modes
  • Build a multi-model workflow that uses each tool's strength

10.1 The four major image models in May 2026

Comparison of the four major image models as of May 2026 across capability, cost, and use case fit.
Axis gpt-image-2 Nano Banana 2 Midjourney V8 Stable Diffusion XL+
Text rendering Best (~99%) Mid (~70%) Weak Weak (better with LoRA)
Artistic style Strong Strong Best Strong (with fine-tunes)
Instruction following Best Strong Weakest (interprets liberally) Mid
Reference-driven editing Best (16 refs) Strong Weak Strong (ControlNet)
Web search grounding Yes (Thinking Mode) Yes (Google Search) No No
Cost at scale Moderate Best Subscription only Best (self-host)
Content policy strictness Strict Strict Moderate Permissive (self-host)
Transparent backgrounds No (yet) Yes No Yes (with workflow)

10.2 When to use each model

  • gpt-image-2 — Marketing assets, product photography, infographics, posters, packaging, UI mockups, multilingual content, character sheets, anything with readable text. Default starting point for most production work in 2026.
  • Nano Banana 2 — Cost-sensitive bulk generation, anything that needs Google Search grounding for geographic or news-grounded references, work where transparent backgrounds are required today, and high-volume e-commerce when budget is the constraint.
  • Midjourney V8 — Pure artistic atmospheric beauty, stylized art without readable text, cinematic moodboards, concept art where interpretation beats instruction-following, anything where "feeling" beats "accuracy."
  • Stable Diffusion XL+ (or successors) — Style fine-tuning with your own LoRA, self-hosted permissive content, custom workflows with ControlNet, locally-controlled pipelines where data must not leave your infrastructure.

10.3 gpt-image-2's hard limits

Be honest about where the model still fails. Pretending otherwise wastes time.

  • Facial likeness of real people. Strict content policy. Will refuse named celebrities, public figures, and politicians. Self-portraits via reference image are fine.
  • Transparent backgrounds. Not supported in gpt-image-2 yet. Use gpt-image-1.5 via the API as the legacy fallback or post-process with a background removal tool.
  • Very long text strings. Strings beyond ~80 characters can drift even with the new text rendering. Break long copy across multiple lines explicitly.
  • Brand-critical pixel-exact regions. Even mask-based edits do a full redraw under the hood. For exact pixel preservation (your real logo, brand-locked typography), composite the original back in via Photoshop after generation.
  • Very stylized artistic work where text doesn't matter. Midjourney V8 still has the edge on pure atmospheric beauty. Use the right tool.

10.4 The multi-model workflow

Most professional shops in May 2026 use 2-3 models in combination:

  1. gpt-image-2 for hero and primary assets. The default starting point. Text, layout, instruction-following, multi-image consistency.
  2. Midjourney V8 for atmospheric secondary content. Moodboards, social mood images, environmental backgrounds. Anything where instruction-following matters less than mood.
  3. Stable Diffusion for content that gets refused or needs fine-tuned style. Self-hosted, permissive, custom LoRA workflows for brand mascots and signature styles.
The honest final take

If you can only learn one model in 2026, learn gpt-image-2. The text rendering, the instruction following, the multi-image edits, the 8-image batches — these alone replace 80% of what working teams were doing across multiple models a year ago. Keep Midjourney for atmospheric work, keep Stable Diffusion for self-hosted needs, but make gpt-image-2 your daily driver. The rest of this masterclass — the 100 prompts, the 8 templates, the 5 skill packages — assumes that decision is already made.

Reference: 100 Paste-Ready Prompts

The ChatGPT Images 2.0 Prompt Library

One hundred production-tested prompts across ten categories. Each follows the 7-Part Prompt Formula from Module 2. Replace bracketed placeholders with your specifics. Save the ones that work for your brand as your personal prompt library.

Photorealism Prompts (01-10)

Studio-quality photoreal output. Pair with explicit lighting and lens vocabulary from Module 3.

01Photoreal

Editorial portrait of a [age] [gender] [profession] in [setting], shot on 85mm prime at f/2, soft window light from the [direction], natural skin texture with visible pores, slight desaturation, no HDR processing. 4:5 aspect ratio.

02Photoreal

Wide environmental shot of [location] at [time of day], 24mm wide-angle at f/8, deep focus, [warm/cool/neutral] color temperature, subtle atmospheric haze, documentary photography style. 3:2 aspect ratio.

03Photoreal

Macro detail shot of [subject] showing [specific texture or surface], 100mm macro lens at f/5.6, single hard light source from above-right, sharp focus on [focal point], frozen motion, beauty-editorial aesthetic. 1:1 aspect ratio.

04Photoreal

Three-quarter front view of a [object] on a [surface] in [environment], 50mm prime at f/2.8, golden hour side light, shallow depth of field, editorial commercial photography. 4:5 aspect ratio.

05Photoreal

Frozen action moment of [subject] mid-[action], 200mm telephoto at f/4, heavy background compression, warm rim backlight, slight motion blur on [moving element], athletic editorial style. 16:9 aspect ratio.

06Photoreal

Candid documentary moment of [scene description] in [location], 35mm lens at f/2.8, available light only, slight grain, no posing, photojournalism aesthetic. 3:2 aspect ratio. Natural skin texture if humans visible.

07Photoreal

Symmetric architectural interior of [space], 24mm tilt-shift lens corrected verticals, soft north-facing daylight, neutral cool palette, generous negative space, editorial architecture photography. 3:2 aspect ratio.

08Photoreal

Top-down flat lay of [items list] arranged on [surface] with [filler elements], 50mm directly overhead, soft diffused daylight, muted [palette] palette, editorial product styling. 1:1 aspect ratio.

09Photoreal

Cinematic film still of [character] in [location] at [time of day], anamorphic 2.39:1, soft directional key light, deep contrast, color-graded for editorial film aesthetic. 21:9 aspect ratio.

10Photoreal

Hands-only lifestyle shot of [action with product] on [surface], 50mm at f/2, warm morning side-light from the left, shallow depth on the hands, lifestyle brand photography, face not visible. 4:5 aspect ratio.

Product Photography Prompts (11-20)

E-commerce-ready, catalog-grade product shots. Default to clean lighting and subtle drop shadows.

11Product

Studio product photography of [product] in [color/material], centered on a clean off-white seamless background, soft even softbox lighting from above, subtle drop shadow, soft floor reflection, 85mm prime lens. 1:1 aspect ratio. No text, no watermarks.

12Product

Hero shot of a single [product] floating against a deep [color] gradient background, rim lighting from behind, 100mm macro at f/8, glossy floor reflection, premium [category] aesthetic, soft volumetric haze. 4:5 aspect ratio.

13Product

Three-quarter view of [product] on a [surface material] surface with [contextual prop], soft window light from the upper-left, 85mm at f/2.8, shallow depth on the product. Editorial lifestyle context, not pure catalog. 4:5 aspect ratio.

14Product

Same [product] shown in [number] colorways arranged in a horizontal row, identical lighting and camera angle across all variants for consistent catalog look. Soft softbox lighting, off-white background, subtle drop shadows. 16:9 aspect ratio.

15Product

Macro detail of [product]'s [specific component or texture], 100mm macro at f/5.6, single directional light source, sharp focus on [detail], premium product photography aesthetic. 1:1 aspect ratio.

16Product

Lifestyle context shot of [product] being used by [hands or person] in [environment], 35mm at f/2.8, warm golden hour light, candid moment not posed. Photorealistic with natural skin texture. 4:5 aspect ratio.

17Product

Exploded view of [product] showing its [number] components separated and floating in space, clean white background, soft even lighting, technical product illustration aesthetic with photorealistic rendering. 1:1 aspect ratio.

18Product

In-context packaging shot of [product] sitting on [contextual surface] with [supporting props], soft afternoon side-light, 85mm at f/2.8. The [label/box] is clearly readable with [brand name] in [typography style]. 4:5 aspect ratio.

19Product

Hands-in-frame demonstration shot of [product action], shallow depth of field on the action, soft window light, editorial product photography. Hands visible but face not visible. 4:5 aspect ratio.

20Product

Product line family portrait of [product family with variants] arranged in [composition — pyramid, row, grid], identical brand identity across all items, clean studio lighting, premium brand catalog aesthetic. 16:9 aspect ratio.

Typography & Text-Heavy Prompts (21-30)

gpt-image-2's strongest superpower. Always wrap exact text in quotation marks.

21Typography

A minimal poster with the headline "[YOUR HEADLINE]" centered in massive thin serif type spanning the upper two-thirds, the subtitle "[YOUR SUBTITLE]" in small monospace below, and "[YOUR FOOTER]" in tiny tracked-out type at the bottom. [Background color] background, generous negative space. 2:3 aspect ratio.

22Typography

A single-page menu for [restaurant name and concept]. Headlines in [type style], body in [type style]. Three sections: [section 1], [section 2], [section 3]. Each item has name, description, and price. Subtle illustrated motif in the corner. 8.5x11 portrait. Print-ready.

23Typography

A book cover titled "[BOOK TITLE]" by [author name]. Title in [typography style] spanning the middle, author below in [secondary typography]. Background: [visual treatment] with [single graphic element]. Publisher "[IMPRINT]" at the bottom in tiny type. 6x9 portrait.

24Typography

A product label photographed on the product itself. Top line "[BRAND NAME]" in [style]. Middle "[PRODUCT NAME]" in [style]. Bottom "[VARIANT / WEIGHT / DATE]" in small mono. Studio lighting, photorealistic, 1:1 aspect ratio.

25Typography

A magazine cover for "[MAGAZINE NAME]". Masthead across the top in heavy display sans. Cover line "[HEADLINE]" running across the lower-third. Hero image: [image description]. Issue indicator "[VOL/NO/DATE]" in small type. 8.5x11 portrait.

26Typography

A multilingual launch banner with the same announcement stacked vertically in [number] languages: [language 1] "[TEXT]", [language 2] "[TEXT]", [language 3] "[TEXT]". Each language in appropriate display typography. Background: [palette]. 1:1 aspect ratio.

27Typography

A wayfinding sign mounted on a [material] post or wall, reading "[DESTINATION]" with a [direction] arrow, in clean grotesque sans-serif. [Color combination] type. Soft ambient lighting, photorealistic. 4:5 aspect ratio.

28Typography

A 1:1 Instagram quote graphic with "[QUOTE TEXT IN QUOTES]" in [typography style], broken into [number] lines across the middle. Attribution "— [ATTRIBUTION]" below in small caps. Background: [color/gradient]. No other elements.

29Typography

A printed ticket close-up on a [surface]. Reads top to bottom: "[HEADLINE]" in heavy serif, "[DATE/TIME]" in mono, "[VENUE/LOCATION]" in elegant sans, "[SEATING/PRICE]" at bottom. Soft overhead lighting. Photorealistic. 16:9 aspect ratio.

30Typography

A bold promotional poster with massive headline "[HEADLINE]" in heavy condensed sans spanning the full width, subtitle "[SUBHEAD]" across the middle in regular weight, and location "[VENUE/LOCATION]" at the bottom. Background: [visual treatment]. 4:5 aspect ratio.

Marketing & Brand Prompts (31-40)

Campaign-ready hero shots, social cards, and brand-locked visuals.

31Marketing

A hero campaign image for [brand and product]. Headline "[HEADLINE]" overlaid in [typography]. Subject: [product/scene] in [setting] with [lighting]. Tone: [3 adjectives]. Brand palette: [palette]. 16:9 aspect ratio.

32Marketing

A 1:1 Instagram square with [product or scene] centered, headline "[HEADLINE]" in upper-third, CTA "[CTA TEXT]" rendered as a [button style] at the bottom. Brand palette: [palette]. Photorealistic.

33Marketing

A 9:16 vertical Story/TikTok creative with [product] as the focal point, headline "[HEADLINE]" stacked across the top, social handle "[@HANDLE]" in the bottom-right. Mobile-first composition. Brand palette: [palette].

34Marketing

A 21:9 ultrawide hero banner. Left two-thirds: cinematic editorial photo of [scene]. Right one-third: clean panel with "[HEADLINE]" in heavy serif, "[SUBHEAD]" below, and "[CTA]" as a button. Brand palette: [palette].

35Marketing

A before-and-after split image. Left "BEFORE": [problem state description]. Right "AFTER": [improved state description]. Same setting, same perspective, same time of day, only [variable] changes. Photorealistic with consistent lighting.

36Marketing

A UGC-style smartphone photo of [person profile] holding [product] at arm's length in [casual setting]. Slight phone camera imperfections, natural daylight, candid framing. Looks like a real customer post not an ad. 4:5 aspect ratio.

37Marketing

An email header 600x300 pixel. Left half: [product photograph]. Right half: cream panel with "[ISSUE NUMBER]" tiny mono, "[HEADLINE]" tall serif, "[SUBHEAD]" body. Brand "[NEWSLETTER NAME]" at the bottom.

38Marketing

A square social card with [background palette] and the headline "[STATEMENT TEXT]" in large [type style], broken into [number] lines, centered. Attribution "— [SOURCE]" below in small caps. Brand mark "[BRAND]" at the bottom.

39Marketing

A launch announcement creative for [product] showing the product hero shot with "[NEW]" or "[LAUNCH DATE]" badge in upper-right corner, headline "[HEADLINE]" overlaid, brand palette throughout. Editorial e-commerce aesthetic. 4:5.

40Marketing

A seasonal campaign image evoking [season] for [brand]. Subject: [season-appropriate scene] with [product] integrated naturally. Color palette: [seasonal palette]. Mood: [3 adjectives]. 16:9 aspect ratio.

Character & Concept Art Prompts (41-50)

Character pipelines. Reference sheets first, scene generation second.

41Character

A character reference sheet showing [character description] in 5 views in a single row: front, three-quarter left, side, back, smiling front. Neutral grey background, soft even studio lighting. 16:9 aspect ratio.

42Character

An expression sheet for [character], same three-quarter angle, 8 expressions in 2x4 grid: neutral, smiling, laughing, concentrating, concerned, surprised, skeptical, tired. Maintain exact facial features across all. 16:9.

43Character

A costume design sheet for [character] in [number] outfits, full body front view, neutral pose across all. Each outfit labeled in small mono caps below the figure. Maintain exact facial features. 16:9 horizontal row.

44Character

[Thinking Mode with reference sheet] Generate [number] scenes of [character], maintaining exact facial features and [outfit]: scene 1: [description], scene 2: [description], scene 3: [description]. Editorial photography style. 4:5 each.

45Character

[Thinking Mode] A 9-panel storyboard 3x3 grid in [style — ink wash / clean line / watercolor]. Recurring character: [character]. Story panels: [9 panel descriptions]. Maintain exact character design across all. 1:1 overall.

46Character

Concept art views of a fictional [object type] showing [number] views in a horizontal row on a clean white background: [view 1], [view 2], [view 3]. Industrial design portfolio aesthetic. Studio product photography. 16:9.

47Character

A character key art piece showing [character] in a dramatic [setting] with [dramatic lighting]. Mid-shot composition, [emotion] expression, [costume]. Cinematic film poster aesthetic. 2:3 portrait aspect ratio.

48Character

A side-by-side comparison of [character] in [number] different lighting setups for a film lookbook: [setup 1 — e.g., golden hour], [setup 2 — e.g., blue hour], [setup 3 — e.g., harsh midday]. Maintain identical character and costume.

49Character

A brand mascot character design sheet for [brand]. Character traits: [physical descriptors], personality: [3 adjectives], color palette: [palette]. Show the character in 4 poses: hero pose, action pose, friendly pose, thinking pose. Clean illustration style.

50Character

A multi-character group portrait showing [number] characters from a story side-by-side, each with distinct [physical features and costume]. Editorial group photo aesthetic, soft even studio lighting, neutral background. 21:9.

Infographic & Data Visualization Prompts (51-60)

Use Thinking Mode for fact-grounded infographics. The model can web-search for current data.

51Infographic

[Thinking Mode] An editorial infographic titled "[TITLE]" with [number] large stat blocks across the middle, each with a giant number and one-line caption. Web-search for current [topic] figures. Source citation in tiny type at the bottom. 16:9.

52Infographic

A process diagram showing [number] steps in a horizontal flow. Each step has a number, a one-line label, and a small icon. Connecting arrows between steps. Brand palette: [palette]. 21:9 aspect ratio.

53Infographic

A side-by-side comparison infographic of [option A] vs [option B] with [number] comparison rows. Each row labeled with a category, with A and B columns showing values or checkmarks. Editorial design aesthetic. 4:5 portrait.

54Infographic

A timeline infographic showing [topic] across [date range] with [number] key milestones plotted along a horizontal axis. Each milestone has a date, label, and brief description. Brand palette: [palette]. 21:9.

55Infographic

A pie chart or donut chart infographic showing [topic breakdown] with [number] segments. Each segment labeled with category and percentage. Clean editorial design. Brand palette: [palette]. 1:1 aspect ratio.

56Infographic

A bar chart comparing [number] [items] across a single metric. Y-axis labeled "[metric]", X-axis labeled with items. Tallest bar in accent color. Clean editorial chart aesthetic. 16:9 aspect ratio.

57Infographic

A geographic infographic showing [data] mapped across [region]. Heat map style with darker shades indicating higher values. Legend in the bottom-right corner. Brand palette: [palette]. 4:3 aspect ratio.

58Infographic

A how-it-works diagram of [system/process] showing components labeled with arrows pointing to their location. Technical illustration style with clean line art. Background: cream paper texture. 4:3 aspect ratio.

59Infographic

A decision tree infographic for "[QUESTION]" with [number] branches and leaf outcomes. Clean flowchart style, each node a rounded rectangle, connecting lines clearly visible. Editorial diagram aesthetic. 16:9.

60Infographic

A statistic hero card with a single massive number "[NUMBER]" centered, label "[METRIC NAME]" below in smaller type, and one-line context "[CONTEXT]" at the bottom. Background: bold solid color. 1:1.

UI & App Mockup Prompts (61-70)

Mobile and web UI mockups with readable interface text. Thinking Mode helps with layout precision.

61UI/UX

A mobile app screen mockup for [app concept], shown on a [device model] mockup. Screen shows [screen description with specific UI elements]. Clean modern design language, [palette]. 9:16 aspect ratio.

62UI/UX

A web dashboard mockup shown in a browser frame. Sidebar nav on left with [items], main panel showing [content], header with [user info]. Clean SaaS design aesthetic. 16:9 aspect ratio.

63UI/UX

A mobile app onboarding flow showing 3 screens side-by-side: screen 1 [welcome], screen 2 [feature highlight], screen 3 [sign up CTA]. Each on a phone mockup. Brand: [name]. 16:9 horizontal layout.

64UI/UX

An e-commerce product detail page mockup shown in a browser. Product image left, details right with title "[PRODUCT]", price "[PRICE]", description text, "ADD TO CART" button. Clean modern e-comm aesthetic. 16:9.

65UI/UX

A SaaS pricing page mockup with [number] tiers side-by-side. Each tier card has tier name, price, feature checklist, CTA button. Recommended tier highlighted. Clean editorial design. 16:9 aspect ratio.

66UI/UX

A mobile messaging interface mockup showing a conversation between [two parties] about [topic]. Show last 5 messages with timestamps. Phone status bar visible at top. 9:16.

67UI/UX

A settings screen UI for [app name] showing toggleable preferences: [setting 1], [setting 2], [setting 3]. Each row has label, current value, and chevron. Clean iOS or Android-style design. 9:16.

68UI/UX

A web checkout flow mockup showing 3 steps: cart review, shipping, payment. Each step header with step indicator. Form fields clearly labeled. Brand: [name]. Clean e-comm design. 16:9.

69UI/UX

An analytics dashboard mockup showing [number] chart widgets: [chart 1 description], [chart 2 description], [chart 3 description]. Side panel with filters. Header with date range picker. 16:9.

70UI/UX

A device mockup hero shot showing [app name] running on [phone + laptop + tablet] in a creative composition. App's hero screen visible on each device. Soft studio lighting, premium tech product photography. 16:9.

Social Media Content Prompts (71-80)

Scroll-stopping social content for Instagram, TikTok, LinkedIn, Pinterest.

71Social

An Instagram carousel cover slide with bold question "[QUESTION]" in heavy display type, intriguing visual hook in background, "SWIPE →" indicator in bottom-right. Brand: [name]. 1:1.

72Social

A LinkedIn post graphic with editorial business aesthetic. Headline "[INSIGHT]" in clean sans, supporting subhead, author byline with role at bottom. Brand palette: [palette]. 4:5 portrait.

73Social

A TikTok thumbnail 9:16 with explosive visual hook, massive text overlay "[HOOK COPY]" in white outlined heavy sans, subject [description] dominating frame. Mobile-optimized composition.

74Social

A Pinterest pin 2:3 portrait, recipe or how-to card style. Top image: [hero shot]. Bottom panel: title "[TITLE]" in heavy sans, source URL "[URL]" in small mono. Brand: [name].

75Social

An Instagram Reel cover 9:16 with subject [description] in dynamic action pose, text overlay "[HOOK]" massive at top, brand handle "@[HANDLE]" at bottom-right. High contrast for thumbnail visibility.

76Social

A Twitter/X header banner 3:1 ultrawide with editorial brand mood image, brand mark "[BRAND]" in upper-left, optional tagline "[TAGLINE]" across the lower-third.

77Social

A YouTube thumbnail 16:9 with subject's face dominating left half showing strong emotion, massive text "[HOOK]" filling right half in bold outlined sans, high contrast palette for click-through optimization.

78Social

An Instagram Story 9:16 quote card with text "[QUOTE]" in serif, attribution "— [SOURCE]" below in small caps, background: [palette gradient], swipe-up indicator at bottom.

79Social

A 1:1 testimonial graphic with customer quote "[QUOTE]" in serif, customer photo circle at bottom-left, customer name "[NAME]" and role "[ROLE]" right of photo, brand mark top-right.

80Social

A series of 5 connected Instagram Story frames each 9:16, telling a sequential story about [topic]. Frame 1: hook. Frames 2-4: content. Frame 5: CTA. Consistent visual style across all 5.

Poster & Print Design Prompts (81-90)

Print-ready posters, event flyers, packaging. Use Thinking Mode for complex multi-element layouts.

81Print

An event poster for "[EVENT NAME]" on [date] at [venue]. Massive title in [type style], date and venue below in smaller mono, hero illustration or photo dominating the upper half. 2:3 portrait. Print-ready.

82Print

A movie-poster-style key art for fictional film "[TITLE]". Main character mid-frame, dramatic [lighting] background, title in massive display below subject, tagline above title, credit block at bottom. 27x40 portrait.

83Print

A vintage-style travel poster for [destination] in 1960s mid-century illustration style. Title "[DESTINATION]" in tall thin serif at top, illustrated landscape in middle, subtitle "[TAGLINE]" at bottom. 2:3.

84Print

A minimal exhibition poster on [background color]. Title "[EXHIBITION TITLE]" massive thin serif spanning upper two-thirds. Subtitle "[VENUE / DATE]" small mono below. Generous negative space. 2:3.

85Print

A product packaging design for [product] in [container type]. Brand "[BRAND]" prominent, product name "[PRODUCT]" below, ingredient or feature callouts on side panel, regulatory text at bottom. Photorealistic mockup. 4:5.

86Print

A vinyl record album cover for fictional artist "[ARTIST]" album "[ALBUM]". Style: [music genre aesthetic]. Cover art: [description]. Artist name and album title in [typography]. 1:1 aspect ratio.

87Print

A business card design for [name] at [company], [role]. Front: name, role, contact details. Back: brand mark dominating. Clean editorial design, [palette]. Standard 3.5x2 inch card mockup.

88Print

A wine label design for "[VINEYARD]" [variety] [vintage]. Vineyard name in elegant serif at top, illustrated motif of [landscape element], variety name and vintage below, regulatory text in tiny type. Print-ready. 4:5.

89Print

A magazine spread mockup showing two facing pages. Left page: full-bleed photograph. Right page: editorial article layout with headline "[HEADLINE]", deck "[DECK]", body copy in 2 columns, pull quote. 16:9 spread.

90Print

A typographic poster with only text — no images. Massive quote "[QUOTE]" in display serif filling most of the canvas, attribution in small caps at bottom. Background: solid [color]. Editorial design. 2:3 portrait.

Edit & Transform Prompts (91-100)

For the images.edit endpoint. Each assumes one or more reference images attached.

91Edit

[Edit endpoint, 1 reference] Replace the background with a clean off-white seamless studio backdrop with subtle drop shadow under the subject. Preserve subject, lighting, and colors exactly.

92Edit

[Edit endpoint, 1 reference] Remove the [object/person] in the [location] of the frame. Reconstruct the background that would naturally be visible. Preserve everything else exactly.

93Edit

[Edit endpoint, 1 reference + mask] Change the masked [element] from [current color] to [new color]. Match original lighting, texture, and material. Preserve every other element including face, hair, and background.

94Edit

[Edit endpoint, 1 reference] Change the lighting from [current condition] to [target condition]. Adjust color temperature, shadow direction, and atmospheric quality accordingly. Preserve subject and composition.

95Edit

[Edit endpoint, 1 reference + outpaint mask] Extend the canvas to [new aspect ratio] by adding scene to the [direction]. Continue the existing environment naturally. Match existing lighting and color grade.

96Edit

[Edit endpoint, 2 references: portrait + style] Generate a new portrait of the subject in image A, restyled in the visual aesthetic of image B. Match palette, framing, and mood. Preserve subject's identity.

97Edit

[Edit endpoint, 4 references of same character] Generate a new image of this character in [new scene description] wearing [specified outfit from references]. Preserve facial features, hair, and clothing exactly.

98Edit

[Edit endpoint, 3 references: template + logo + typography] Compose a product mockup using shape from image A, logo from image B placed [location], typography style from image C for text "[PRODUCT NAME]" placed [location].

99Edit

[Edit endpoint, 1 reference + mask on text region] Replace the masked text with "[NEW TEXT]" in the same typography style, weight, and color as the original. Preserve surrounding image exactly.

100Edit

[Edit endpoint, 1 reference of menu/document in source language] Translate every text element to [target language] preserving typography, layout, paper texture, and prices exactly. Do not change visual design.

Pro tip

Copy all 100 into a markdown file at /prompts/chatgpt-images-2.md. Reference them by number when working: "Run prompt 14 with my brand details." Pair with the Skills Library below to compress your daily workflow.

Reference: 5 Reusable Skill Packages

The ChatGPT Images 2.0 Skill Library

Five paste-ready instruction blocks. Drop each into a ChatGPT Custom GPT's instructions field, into the system message of an API call, or as a pinned message at the top of a chat session. Each skill encodes a repeatable workflow so you stop re-explaining the same constraints on every prompt.

Skill 01

Brand Voice Locker

Pins your visual brand identity to every image generated in the session. Replace bracketed values with your specifics. Paste once at the start of a chat or save as a Custom GPT.

You are an image art director for [BRAND NAME]. BRAND IDENTITY (apply to every image): - Brand name: [BRAND NAME] - Positioning: [one-sentence positioning] - Audience: [target audience] - Tone: [3 adjectives — e.g., calm, premium, intentional] VISUAL SYSTEM (apply to every image): - Primary color: #[hex] - Accent color: #[hex] - Neutral: #[hex] - Background: #[hex] - Typography family for display: [type style — e.g., heavy condensed serif] - Typography family for body: [type style] - Photography style: [editorial / documentary / studio / lifestyle] - Lighting bias: [warm / neutral / cool] + [hard / soft / mixed] - Color grading: [subtle / saturated / desaturated] BANNED in every image: - AI-cliche adjectives (stunning, hyper-realistic, breathtaking, amazing, masterpiece, award-winning) - Generic stock-photo compositions - Excessive HDR processing - Plastic-smooth skin if humans visible - [Add brand-specific banned patterns] REQUIRED in every image: - Natural skin texture if humans visible - Subtle drop shadow on white-background product shots - [Add brand-specific required patterns] When the user describes an image, apply the 7-Part Prompt Formula (subject, style, composition, lighting, camera, mood, constraints) and silently incorporate the brand identity above into the prompt before generating. If a user request conflicts with the brand voice, ask one clarifying question before proceeding.
Skill 02

Photorealism Enforcer

Forces specificity in lighting, lens, and material every time. Eliminates the four common photoreal failure modes.

You are a senior commercial photographer turning written briefs into image prompts. For every image request, ensure the final prompt includes ALL of these: 1. LIGHTING (must specify): - Direction (from where: above, left, right, behind) - Hardness (hard, soft, diffused) - Color temperature (golden hour 2700K / daylight 5600K / overcast 6500K / blue hour) - Atmospheric quality (clean, hazy, foggy, dusty) 2. CAMERA AND LENS (must specify): - Focal length (24mm wide / 35mm documentary / 50mm natural / 85mm portrait / 100mm macro / 200mm telephoto) - Aperture (f/1.4 to f/2.8 shallow / f/4 mid / f/8 deep) - Camera height (low angle / eye level / high angle / overhead) 3. MATERIAL DESCRIPTORS (replace generic adjectives with concrete materials): - Replace "metal" with: brushed aluminum, patinated brass, polished chrome, oxidized copper - Replace "wood" with: raw oak, weathered walnut, lacquered cherry, reclaimed pine - Replace "fabric" with: cream linen with visible weave, charcoal merino wool, raw silk, brushed canvas 4. ANTI-FAILURE PATTERNS (always add): - "Natural skin texture, visible pores, subtle imperfections, no skin smoothing" if humans visible - "Subtle contrast, gentle midtones, no HDR processing, color-graded for editorial" - "Specify gaze direction explicitly" if humans visible - "Subtle drop shadow, soft floor reflection" instead of pure white background 5. BANNED ADJECTIVES (remove from every prompt before generating): stunning, hyper-realistic, breathtaking, amazing, beautiful, professional, high-quality, masterpiece, award-winning, epic, ultra-detailed, 8K, photorealistic (redundant — model is already photoreal) Apply the 7-Part Prompt Formula and rewrite any user request that is missing the lighting/camera/material specificity above. Ask before generating only if the subject is ambiguous; otherwise rewrite silently and generate.
Skill 03

Typography Validator

Maximizes gpt-image-2's biggest superpower. Ensures every text-bearing prompt renders correctly on the first try.

You are a typography-focused art director for image generation. RULE 1: Every literal text string the user wants rendered must be wrapped in quotation marks in the final prompt. Unquoted text is description. Quoted text is content. RULE 2: For every quoted text string, also specify: - Type family character (heavy sans / humanist serif / geometric grotesque / monospace / display serif / hand-lettered) - Weight (thin / light / regular / medium / bold / black) - Treatment (all caps / small caps / lowercase / italic / condensed) - Layout position (centered / left-aligned / upper-third / lower-right / etc.) - Approximate scale relative to other elements (massive / large / small / tiny) RULE 3: For multi-element layouts (poster, menu, magazine cover): - List every text element with its own position and styling - Specify hierarchy: which element is the focal point, which is secondary - Mention negative space requirements explicitly RULE 4: For text strings longer than 80 characters: - Break into multiple shorter lines explicitly - Or move to a body-text region with specified line breaks - Or warn the user that long text may drift and offer to break it RULE 5: For multilingual prompts: - Confirm the target language(s) - Specify appropriate display typography for each script (e.g., brush calligraphy for Japanese, classical seal for Chinese) - Render each language in its native script, not transliterated RULE 6: For brand-critical proprietary typography: - Warn the user that gpt-image-2 produces the genre (humanist sans, Trajan-style serif) but not their literal brand font - Suggest generating with placeholder text and compositing real typography in post When a user describes a text-bearing image, apply the 7-Part Prompt Formula plus all 6 rules above. Ask one clarifying question only if the literal text string is missing. Otherwise generate.
Skill 04

Character Bible

For multi-scene character work. Locks character identity so every subsequent image preserves the same person across scenes.

You are a character continuity supervisor for a multi-scene image project. CHARACTER BIBLE (apply to every image in this session): PRIMARY CHARACTER — [CHARACTER NAME]: - Age range: [age] - Gender presentation: [presentation] - Hair: [length, texture, color, part] - Eyes: [color, shape] - Skin: [tone, texture details — freckles, etc.] - Build: [height, build description] - Distinguishing features: [scars, tattoos, glasses, jewelry] DEFAULT OUTFIT: - Top: [garment with material and color] - Bottom: [garment with material and color] - Footwear: [type, material, color] - Accessories: [items] VOICE / PERSONALITY (informs facial expression and body language): - 3 personality adjectives: [list] - Typical expression: [neutral / warm / focused / wry] WORKFLOW RULES: 1. The first request in any new project must generate a reference sheet showing the character in 5 views (front, three-quarter left, side profile right, back, smiling front). 2. Every subsequent prompt must reference the character bible above. Restate the key features in each prompt (do not assume the model remembers across calls). 3. When generating multi-scene continuity (more than 2 images), use Thinking Mode and pass the reference sheet as input. 4. For new outfits, generate a costume sheet first, then use those outfits as locked variants in subsequent prompts. 5. For brand-critical character work, plan a final Photoshop pass to fix the inevitable drift in the last 10% of detail. When the user requests a new scene, build the prompt using the 7-Part Formula with the character bible incorporated.
Skill 05

Batch Consistency Rules

For multi-image campaigns. Ensures every image in a batch holds the same brand identity, lighting, and aesthetic.

You are a campaign batch supervisor for multi-image marketing projects. BATCH IDENTITY (apply to every image in this batch): VISUAL CONSTANTS: - Brand: [BRAND NAME] - Color palette: [3-4 hex codes] - Typography style: [display + body characters] - Photography style: [editorial / documentary / studio / lifestyle] - Lighting feel: [warm / cool / neutral] + [hard / soft] - Color grading: [subtle / saturated / desaturated] - Camera POV: [eye-level / overhead / low angle / mixed] VARIABLE ELEMENTS (these change per image): - Subject / product - Aspect ratio - Specific composition BATCH WORKFLOW: 1. Before generating, list every image in the batch with: - Image number - Subject of that image - Aspect ratio - One-line composition note - Confirm with the user before generating 2. Use Thinking Mode for any batch of 4 or more images. Pass the first finished image as a style anchor reference into subsequent prompts. 3. For each prompt, structure the prompt as: [BATCH IDENTITY constants restated] [Image-specific subject + composition] [Aspect ratio] [Match the visual feel of image 01] 4. After every batch of 4 images, audit the set for drift: - Color grading consistency - Lighting direction consistency - Brand palette adherence - Typography style consistency Note any drift and suggest fixes before continuing. 5. For long campaigns (8+ images), build a brand mood board first by generating 1-2 anchor images, then reference those as style inputs for every subsequent generation. When the user requests a campaign, lay out the full batch plan before generating any image. Ask one clarifying question only if the batch scope is ambiguous.
Installation

Three ways to use each skill: (1) Paste into the instructions field of a ChatGPT Custom GPT and save. (2) Paste as the system message in your API request. (3) Paste as the first message in a new ChatGPT chat session. Method 1 is the most repeatable for daily use. Method 2 is the right path for scripted pipelines. Method 3 is fine for one-off projects.

Reference: 8 Production Templates

The ChatGPT Images 2.0 Template Library

Eight tested production patterns covering the most common image generation jobs. Each template is a complete structure you can ask Claude or ChatGPT to fill in with your specifics. Save them as your daily reference.

Template 01

Product Packaging Mockup

Product hero on contextual surface with readable label, ingredient panel, and regulatory text. Print-ready packaging mockup.

Best for: DTC product launches, packaging design
Template 02

UGC-Style Ad

Smartphone-camera-feel customer photo with slight imperfections. Reads as a real user post, not a studio shoot.

Best for: Meta/TikTok ads, social proof creative
Template 03

Magazine Cover

Editorial masthead with cover lines, hero image, issue indicator, and full typography hierarchy.

Best for: Brand storytelling, content covers
Template 04

Infographic Hero Card

Title + stat blocks + source citation. Thinking Mode + web search for fact-grounded numbers.

Best for: Data-driven content, B2B marketing
Template 05

App Screen Mockup

Mobile or web UI shown on device frame with realistic readable interface text and navigation.

Best for: SaaS pitch decks, app launch materials
Template 06

Character Reference Sheet

Five-view character lineup on neutral background. Foundation for every multi-scene character pipeline.

Best for: Story art, brand mascot work, concept design
Template 07

Before/After Pair

Split-frame transformation story with consistent lighting and perspective across both halves.

Best for: Skincare, home services, fitness, renovation
Template 08

Multi-Panel Comic Strip

6-9 panel sequential storytelling with character continuity. Thinking Mode required for true continuity.

Best for: Explainer content, brand storytelling, education

How to invoke a template

To use a template by name in any ChatGPT or Claude conversation, paste this pattern:

Generate an image using Template 01 (Product Packaging Mockup) from the DVA ChatGPT Images 2.0 Masterclass. Product: [your product] Container type: [box / bottle / jar / pouch] Brand: [name, palette, typography style] Context surface: [marble / wood / fabric / etc.] Required text: [exact strings to render on label] Apply the Brand Voice Locker skill if it has been loaded. Output as a single high-quality image at [aspect ratio].
Reference: Cost Calculator

What gpt-image-2 Actually Costs

Token-based billing means the per-image number changes with prompt complexity. The estimates below are pulled from OpenAI's pricing calculator for 1024×1024 outputs. Run your own workload through the calculator before forecasting a production budget. Edit requests cost more than text-to-image due to high-fidelity reference processing.

Per-image cost estimates by quality and resolution for gpt-image-2 as of May 2026. Verify current rates on the official OpenAI pricing page before budgeting.
Quality 1024x1024 1024x1536 (portrait) 4K via fal.ai Use case
Low ~$0.006 ~$0.005 ~$0.01 Drafts, exploration, bulk variant generation
Medium ~$0.053 ~$0.041 ~$0.10 Day-to-day production assets
High ~$0.211 ~$0.165 ~$0.41 Final deliverables, print-ready, hero shots

Raw token rates

  • Image input: $8 per million tokens
  • Cached image input: $2 per million tokens
  • Image output: $30 per million tokens
  • Text input: $5 per million tokens
  • Cached text input: $1.25 per million tokens
  • Text output: $10 per million tokens
  • Batch API: halves all rates if you can tolerate 24-hour latency

Monthly budget examples (rough)

  • Solo creator, 100 images/month at medium quality: ~$5 in API costs
  • Marketing team, 1,000 images/month mixed quality: ~$80-150 in API costs
  • Agency, 5,000 images/month with edits: ~$500-1,200 in API costs
  • Production pipeline, 50,000 images/month with batching: ~$2,500-5,000 in API costs

Compare to a single stock photo license at $50-500 or a single commercial photoshoot at $2,000-50,000 and the math is obvious.

Reference: 35-Item Production Checklist

The Pre-Ship Quality Audit

Run every item before any gpt-image-2 output ships to a real customer-facing channel. If you cannot tick every relevant box, the image is not ready. Print this or paste into your project's quality control doc.

  • PromptThe 7-Part Formula was used (subject, style, composition, lighting, camera, mood, constraints)
  • PromptZero AI-cliche adjectives (stunning, hyper-realistic, breathtaking, masterpiece)
  • PromptConcrete materials specified instead of generic categories (brushed aluminum not metal)
  • PromptLighting direction, hardness, and color temperature all specified
  • PromptCamera focal length and aperture specified
  • CompositionNegative space intentional, not accidental
  • CompositionFocal point clear within 2 seconds of viewing
  • CompositionAspect ratio matches the intended deployment surface
  • CompositionNo accidental object collisions or weird overlaps
  • LightingLight direction is consistent with shadows
  • LightingNo HDR over-processing artifacts
  • LightingColor temperature appropriate to scene (no warm window light + cool ambient mixing accidentally)
  • Text RenderingEvery text string in the image is spelled correctly
  • Text RenderingTypography hierarchy is clear (headline larger than subhead)
  • Text RenderingLong strings broken into appropriate line lengths
  • Text RenderingMultilingual text uses appropriate native script and typography
  • EditingIf using edits endpoint, masks dimensions match source exactly
  • EditingReference images downscaled to appropriate size before upload
  • EditingFor character continuity, reference sheet attached as input
  • EditingBrand-critical regions composited in via Photoshop where pixel-exact needed
  • QualityResolution matches deployment requirement (no 4K for thumbnails)
  • QualityNatural skin texture if humans visible
  • QualityNo plastic-smooth artifacts
  • QualityNo anatomical errors (extra fingers, asymmetric eyes, etc.)
  • CostDrafted at low quality before finalizing at high
  • CostProduction pipeline uses Batch API where 24h latency acceptable
  • CostRepeated brand context cached via prompt caching
  • CostAPI model pinned to snapshot ID for production
  • CostRate limit headroom verified for peak traffic
  • OutputImage carries C2PA provenance metadata (AI-generated flag)
  • OutputCommercial use confirmed under OpenAI terms for the deployment context
  • OutputLocal advertising regulations checked for regulated categories
  • OutputImage file format and compression appropriate to deployment
  • OutputAlt text drafted for accessibility before posting
  • OutputSource prompt saved for reproducibility and future iteration
Frequently Asked Questions

Common questions about ChatGPT Images 2.0.

Continue Your Vibe Coding Path

You finished a 10-module masterclass. Now generate something.

The DDS Vibe Academy ships flagship masterclasses on AI-first development, AI image generation, headless commerce, and synthetic-employee architectures. Return to the hub for the full library, or dive into the companion masterclasses on Claude and Hydrogen.