Open framework
Tombernail is a two-step prompt framework for YouTube creators. Generate the background and title text first — no faces. Then lock the base and composite your real photo on top. Identity preserved. Every time.
The Problem
One-shot AI thumbnail generation has a consistent failure mode: the model doesn't composite your face; it invents one from its training data. Same rough structure, wrong identity.
One-shot generation
Prompt: "YouTube thumbnail with my face, dark background, bold title." The model generates a face that looks vaguely like you, but it is not you. Wrong bone structure. Wrong skin tone. The more you describe your face, the more confident the model gets, and the more wrong it gets.
The Tombernail way
Step 1: generate everything except faces. Dark backdrop, title text, UI elements — iterate freely with zero identity drift risk. Step 2: lock the base image and composite your real face reference on top. The model treats it as a photographic cutout, not source material to redraw.
The Framework
Tombernail splits the generation problem from the identity problem. Run Step 1 and Step 2 as two separate gpt-image-1 sessions.
Step 1 — Base layer. Describe your backdrop, title text, badge, and any UI elements. Explicitly exclude all faces. Call out the empty zone where your face will go. Run this prompt with no attachments. Save the output image — this is your locked base.
Step 2 — Composite. Attach your Step 1 base image + your face reference photo. Instruct the model to preserve the base exactly and treat your face as a photographic cutout — not source material. Describe lighting integration and position. Include the Face Preservation Directive.
Run Step 1 first with no attachments. Open image edit mode. Attach the base image + your face reference. Paste the composite prompt. If the face drifts, re-run Step 2 — your base is already locked and won't change.
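The two sessions above can be scripted. This is a hypothetical sketch using the OpenAI Python SDK; the model name, output size, multi-image edit call, and file paths are assumptions to verify against the current Images API documentation, and the prompts are trimmed stand-ins for the full templates.

```python
# Hypothetical sketch of the two-step workflow via the OpenAI Images API.
# Model name, size, and the multi-image edit call are assumptions.
import base64

BASE_PROMPT = (
    "YouTube thumbnail BASE LAYER, 16:9. NO PEOPLE. NO FACES. "
    "Right 40% of frame: intentionally dark, reserved for face composite."
)
COMPOSITE_PROMPT = (
    "Composite the attached face reference onto the attached base thumbnail. "
    "Treat the face as a photographic cutout. Do NOT regenerate it."
)

def run_two_step(base_out="base.png", face_ref="face.jpg", final_out="final.png"):
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK
    client = OpenAI()

    # Step 1: generate the face-free base layer (no attachments).
    base = client.images.generate(
        model="gpt-image-1", prompt=BASE_PROMPT, size="1536x1024")
    with open(base_out, "wb") as f:
        f.write(base64.b64decode(base.data[0].b64_json))

    # Step 2: edit session. Attach the locked base plus the real face photo.
    final = client.images.edit(
        model="gpt-image-1",
        image=[open(base_out, "rb"), open(face_ref, "rb")],
        prompt=COMPOSITE_PROMPT)
    with open(final_out, "wb") as f:
        f.write(base64.b64decode(final.data[0].b64_json))
```

Because the base is saved to disk after Step 1, re-running only the edit call reuses the identical locked base — matching the "re-run Step 2" advice above.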
Prompt Templates
Both templates use [PLACEHOLDERS] for anything channel-specific. Replace them with your own style, colors, and name.
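Filling the [PLACEHOLDERS] can be automated with plain string substitution. A minimal sketch, assuming a simple `[UPPER_CASE]` token convention; the helper name and the leftover-check regex are my own, not part of the framework:

```python
# Minimal sketch: fill [PLACEHOLDER] tokens in a prompt template.
import re

def fill(template: str, values: dict) -> str:
    """Replace every [KEY] token; fail loudly if any placeholder is left over."""
    out = template
    for key, val in values.items():
        out = out.replace(f"[{key}]", val)
    leftover = re.findall(r"\[([A-Z_]+)\]", out)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return out

base = fill(
    "BACKDROP: [YOUR_BACKGROUND] STYLE: [YOUR_AESTHETIC]",
    {"YOUR_BACKGROUND": "Pure black #0A0A0A",
     "YOUR_AESTHETIC": "photoreal cinematic"},
)
# base == "BACKDROP: Pure black #0A0A0A STYLE: photoreal cinematic"
```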
YouTube thumbnail BASE LAYER, 16:9. NO PEOPLE. NO FACES.

BACKDROP: [YOUR_BACKGROUND — e.g., "Pure black #0A0A0A, no texture" or "Dark cinematic data center, warm amber light from above, heavy vignette at edges"]

TEXT — TITLE (two lines, bold condensed sans-serif, ALL CAPS, [center/right]-aligned):
- Line 1 ([white or your color]): "[SETUP LINE — the context or situation]"
- Line 2 ([YOUR_ACCENT_COLOR]): "[PAYOFF LINE — the hook or punchline]"
Each line ~25–30% of canvas height. Tight tracking.

[OPTIONAL — BADGE/LABEL:]
[POSITION, e.g., top-right corner]: solid [YOUR_COLOR] rounded pill, "[LABEL TEXT]" in bold black condensed sans-serif. ~8% canvas height.

[OPTIONAL — LOGO:]
[Top-left: composite the attached channel wordmark PNG at ~12% frame width, inset 3% from edges. Do NOT redraw — use the attached image as-is.]

[OPTIONAL — SCENE / UI ELEMENTS:]
[Floating UI cards, data panels, atmospheric effects. Be specific about size, position, tilt, and what UI content to show.]

CRITICAL NEGATIVES:
- NO human face or figure of any kind
- [LEFT / RIGHT] [X]% of frame: intentionally dark — reserved for face composite
- NO text other than the title lines and badge

STYLE: [YOUR_AESTHETIC — e.g., "photoreal cinematic, editorial" or "bold kinetic content-creator energy, dark background with warm yellow glow"]
Composite [YOUR_NAME]'s face onto the attached base thumbnail without altering the title, backdrop, or any other base layer elements.

ATTACHED IMAGES:
1. Base layer (Step 1 output — locked canvas, preserve exactly)
2. [YOUR_NAME] face reference — clean shoulders-up portrait

FACE TO ADD:
[YOUR_NAME] — positioned in the [LEFT / RIGHT] [X]% of the frame, shoulders-up, facing slightly [left / right] toward the title. Expression: [describe emotion — e.g., "wide excited grin", "open-mouth shocked face", "confident direct camera look"].

IDENTITY PRESERVATION — non-negotiable:
Use the attached face reference as the identity-locked source. Preserve [YOUR_NAME]'s facial identity EXACTLY — bone structure, eye shape and color, nose, mouth, skin tone, hair. Do NOT stylize, slim, smooth, or beautify. Treat as a photographic cutout. Relight with the scene's existing [warm / cool] light from the [direction — e.g., "right"]. If you cannot preserve identity perfectly, leave the face area blank.

LIGHTING INTEGRATION:
[YOUR_NAME]'s face is lit by [the scene's light source — e.g., "the laptop screen, cool blue from the right — left side in shadow"]. No flat lighting. No green spill. Clean cutout edges.

DO NOT regenerate or move:
- Title text (both lines, exact text, font, colors)
- [Badge / label]
- Background scene and lighting
- [Any other base layer elements]

OUTPUT: 16:9, same dimensions as base.
Face Preservation Directive
Drop this block at the top of any composite prompt. It explicitly instructs the model not to regenerate — only to composite. Removes ambiguity about what "use my photo" means.
🔒 FACE PRESERVATION (highest priority):
Treat this as a COMPOSITE task, not a generation. Use the person from the attached reference photo EXACTLY as they appear. Do NOT regenerate, redraw, stylize, smooth, AI-enhance, or alter their face, skin, hair, or features in any way. Only the background, lighting environment, and color grading may be generated. The face must be pixel-faithful to the reference.
Final check: if the face looks "AI-rendered" or "stylized," regenerate.
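If you assemble composite prompts programmatically, prepending the directive is a one-liner. A trivial sketch; the helper name and the shortened directive text are illustrative, not part of the framework:

```python
# Sketch: prepend the Face Preservation Directive to any composite prompt.
DIRECTIVE = (
    "FACE PRESERVATION (highest priority): Treat this as a COMPOSITE task, "
    "not a generation. Do NOT regenerate, redraw, stylize, or alter the face."
)

def with_directive(composite_prompt: str) -> str:
    # The directive goes first so it leads every composite prompt.
    return f"{DIRECTIVE}\n\n{composite_prompt}"
```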
Steal This Skill
This is the full installable skill file — a generic version of the framework, stripped of any channel-specific branding. Drop it in and type /tombernail in any Claude Code conversation to generate your thumbnail prompts.
Copy the skill file below and save it as tombernail.md.
Drop it in ~/.claude/skills/tombernail.md for global access, or .claude/skills/tombernail.md inside any project.
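The global install is two commands. A minimal shell sketch; the heredoc writes a stub here, so paste the full skill file content in its place:

```shell
# Create the global skills directory and install the file.
mkdir -p ~/.claude/skills
cat > ~/.claude/skills/tombernail.md <<'EOF'
# /tombernail skill (stub: paste the full skill file content here)
EOF
# Per-project instead: save it to .claude/skills/tombernail.md in the project root.
```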
Open Claude Code and type /tombernail followed by your video title or script path. Claude generates base + composite prompts ready to paste into gpt-image-1.
# /tombernail — Two-step AI thumbnail framework

**Trigger:** `/tombernail`, `thumbnail prompts for [video]`, `make thumbnail for [title]`
**Input:** Video title + one of: script path, topic summary, or key talking points
**Output:** Two paste-ready prompts per concept (base layer + composite) for gpt-image-1 or gpt-image-2
**Model:** Claude Sonnet

---

## Core Principle

**Title and thumbnail are a two-part joke. They do not repeat each other.**

| Surface | Job | Sells |
|---|---|---|
| **Title** | Table of contents | Breadth — the specific topics/beats |
| **Thumbnail** | One provocative hook | Curiosity gap — one question or bold claim |

Before writing any prompt, define:

1. What does the title say?
2. What should the thumbnail say DIFFERENTLY?
3. What is the one desire, pain, or curiosity this video triggers?

---

## Why Two Steps?

One-shot AI generation always fails on faces. The model regenerates your face from training data — wrong identity. You lose.

Step 1 (Base Layer): Generate everything except faces. Iterate freely.
Step 2 (Composite): Lock the base. Attach real face reference. The model treats it as a photographic cutout, not source material.

---

## Workflow

Step 1: ANALYZE
Read the script, outline, or key talking points. Extract: hook tension, biggest story, title↔thumbnail split. What is the ONE thing the thumbnail should say that the title doesn't?

Step 2: WRITE TWO PROMPTS PER CONCEPT
BASE LAYER: backdrop + text + UI elements — NO faces, NO logos
COMPOSITE: lock base exactly, add face cutout + optional brand logo

Step 3: USER RUNS IN gpt-image-1 / gpt-image-2
Run Step 1 prompt first (no attachments)
Save the output as the base image
Open image edit mode, attach base + face reference photo
Paste Step 2 prompt → final thumbnail

Produce 2–3 concepts. Each concept gets its own pair of prompts.

---

## Base Layer Template

YouTube thumbnail BASE LAYER, 16:9. NO PEOPLE. NO FACES.

BACKDROP: [YOUR_BACKGROUND — e.g., "Pure black #0A0A0A" or "Dark cinematic scene, warm amber lighting, heavy vignette at edges"]

TEXT — TITLE (two lines, bold condensed sans-serif, ALL CAPS, [center/right]-aligned):
- Line 1 ([white or your color]): "[SETUP LINE — context or situation]"
- Line 2 ([YOUR_ACCENT_COLOR]): "[PAYOFF LINE — hook or punchline]"
Each line ~25–30% canvas height. Tight tracking.

[OPTIONAL — BADGE:]
[POSITION]: solid [YOUR_COLOR] rounded pill, "[LABEL]" in bold black condensed sans-serif. ~8% canvas height.

[OPTIONAL — LOGO:]
Composite attached wordmark PNG at ~12% frame width, top-left, inset 3%. Do NOT redraw — use attached image as-is.

[OPTIONAL — SCENE/UI:]
[Describe floating UI cards, data panels, atmospheric effects — specific about size, position, tilt.]

CRITICAL NEGATIVES:
- NO human face or figure
- [LEFT/RIGHT] [X]% of frame: intentionally dark — reserved for composite
- NO text other than title and badge
- [YOUR_EXCLUDED_ELEMENTS]

STYLE: [YOUR_AESTHETIC]

---

## Composite Template

Composite [YOUR_NAME]'s face onto the attached base thumbnail without altering the title, backdrop, or any base layer elements.

ATTACHED IMAGES:
1. Base layer (Step 1 output — locked canvas, preserve exactly)
2. [YOUR_NAME] face reference — clean shoulders-up portrait

FACE TO ADD:
[YOUR_NAME] — [LEFT/RIGHT] [X]% of frame, shoulders-up, facing slightly [left/right] toward the title. Expression: [emotion matching the hook].

IDENTITY PRESERVATION — non-negotiable:
Preserve [YOUR_NAME]'s facial identity EXACTLY — bone structure, eye shape and color, nose, mouth, skin tone, hair. Treat as a photographic cutout. Relight with scene's existing [warm/cool] light from [direction]. If identity cannot be preserved, leave the face area blank.

LIGHTING:
Face lit by [scene's light source]. No flat lighting. No green spill. Clean cutout edges.

DO NOT regenerate or move: title text, [badge], background scene, [all other base layer elements].

OUTPUT: 16:9, same dimensions as base.

---

## Face Preservation Directive

Paste at the top of any composite prompt when face fidelity is at risk:

🔒 FACE PRESERVATION (highest priority):
Treat this as a COMPOSITE task, not a generation. Use the person from the attached reference photo EXACTLY as they appear. Do NOT regenerate, redraw, stylize, smooth, AI-enhance, or alter their face, skin, hair, or features. Only background, lighting, and color grading may be generated. The face must be pixel-faithful to the reference. Final check: if the face looks "AI-rendered" or "stylized," regenerate.

---

## Quality Checklist

- [ ] Title and thumbnail say different things — thumbnail adds a new hook
- [ ] Base layer has zero faces and zero title text
- [ ] Badge / wordmark in base layer if applicable
- [ ] Empty zone called out (left/right X% intentionally dark)
- [ ] Identity preservation directive in composite prompt
- [ ] Lighting integration described
- [ ] DO NOT touch list covers all base layer elements
- [ ] Text readable at 320×180px (mobile thumbnail minimum)
- [ ] Max 3 visual elements in base layer
Works with any version of Claude Code (claude-sonnet-4-6 or newer recommended). No API keys or setup beyond saving the file.
What You Need