Open framework
Tombernail is a two-step prompt framework for YouTube creators. Generate the background and title text first — no faces. Then lock the base and composite your real photo on top. Identity preserved. Every time.
The Problem
One-shot AI thumbnail generation has a consistent failure mode: the model doesn't composite your face; it invents one from its training data. Same rough structure, wrong identity.
One-shot generation
Prompt: "YouTube thumbnail with my face, dark background, bold title." The model generates a face that looks vaguely like you, but it is not you. Wrong bone structure. Wrong skin tone. The more you describe your face, the more confident the model gets, and the more wrong it gets.
The Tombernail way
Step 1: generate everything except faces. Dark backdrop, title text, UI elements — iterate freely with zero identity drift risk. Step 2: lock the base image and composite your real face reference on top. The model treats it as a photographic cutout, not source material to redraw.
The Framework
Tombernail splits the generation problem from the identity problem. Run Step 1 and Step 2 as two separate gpt-image-1 sessions.
Step 1 — Base layer. Describe your backdrop, title text, badge, and any UI elements. Explicitly exclude all faces. Call out the empty zone where your face will go. Run this prompt with no attachments. Save the output image — this is your locked base.
Step 2 — Composite. Attach your Step 1 base image + your face reference photo. Instruct the model to preserve the base exactly and treat your face as a photographic cutout — not source material. Describe lighting integration and position. Include the Face Preservation Directive.
Run Step 1 first with no attachments. Open image edit mode. Attach the base image + your face reference. Paste the composite prompt. If the face drifts, re-run Step 2 — your base is already locked and won't change.
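The two sessions above can be scripted. This is a hypothetical sketch using the OpenAI Python SDK; the model name, output size, multi-image edit call, and file paths are assumptions to verify against the current Images API documentation, and the prompts are trimmed stand-ins for the full templates.

```python
# Hypothetical sketch of the two-step workflow via the OpenAI Images API.
# Model name, size, and the multi-image edit call are assumptions.
import base64

BASE_PROMPT = (
    "YouTube thumbnail BASE LAYER, 16:9. NO PEOPLE. NO FACES. "
    "Right 40% of frame: intentionally dark, reserved for face composite."
)
COMPOSITE_PROMPT = (
    "Composite the attached face reference onto the attached base thumbnail. "
    "Treat the face as a photographic cutout. Do NOT regenerate it."
)

def run_two_step(base_out="base.png", face_ref="face.jpg", final_out="final.png"):
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK
    client = OpenAI()

    # Step 1: generate the face-free base layer (no attachments).
    base = client.images.generate(
        model="gpt-image-1", prompt=BASE_PROMPT, size="1536x1024")
    with open(base_out, "wb") as f:
        f.write(base64.b64decode(base.data[0].b64_json))

    # Step 2: edit session. Attach the locked base plus the real face photo.
    final = client.images.edit(
        model="gpt-image-1",
        image=[open(base_out, "rb"), open(face_ref, "rb")],
        prompt=COMPOSITE_PROMPT)
    with open(final_out, "wb") as f:
        f.write(base64.b64decode(final.data[0].b64_json))
```

Because the base is saved to disk after Step 1, re-running only the edit call reuses the identical locked base — matching the "re-run Step 2" advice above.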
Prompt Templates
Both templates use [PLACEHOLDERS] for anything channel-specific. Replace them with your own style, colors, and name.
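Filling the [PLACEHOLDERS] can be automated with plain string substitution. A minimal sketch, assuming a simple `[UPPER_CASE]` token convention; the helper name and the leftover-check regex are my own, not part of the framework:

```python
# Minimal sketch: fill [PLACEHOLDER] tokens in a prompt template.
import re

def fill(template: str, values: dict) -> str:
    """Replace every [KEY] token; fail loudly if any placeholder is left over."""
    out = template
    for key, val in values.items():
        out = out.replace(f"[{key}]", val)
    leftover = re.findall(r"\[([A-Z_]+)\]", out)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return out

base = fill(
    "BACKDROP: [YOUR_BACKGROUND] STYLE: [YOUR_AESTHETIC]",
    {"YOUR_BACKGROUND": "Pure black #0A0A0A",
     "YOUR_AESTHETIC": "photoreal cinematic"},
)
# base == "BACKDROP: Pure black #0A0A0A STYLE: photoreal cinematic"
```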
YouTube thumbnail BASE LAYER, 16:9. NO PEOPLE. NO FACES.

BACKDROP: [YOUR_BACKGROUND — e.g., "Pure black #0A0A0A, no texture" or "Dark cinematic data center, warm amber light from above, heavy vignette at edges"]

TEXT — TITLE (two lines, bold condensed sans-serif, ALL CAPS, [center/right]-aligned):
- Line 1 ([white or your color]): "[SETUP LINE — the context or situation]"
- Line 2 ([YOUR_ACCENT_COLOR]): "[PAYOFF LINE — the hook or punchline]"
Each line ~25–30% of canvas height. Tight tracking.

[OPTIONAL — BADGE/LABEL:]
[POSITION, e.g., top-right corner]: solid [YOUR_COLOR] rounded pill, "[LABEL TEXT]" in bold black condensed sans-serif. ~8% canvas height.

[OPTIONAL — LOGO:]
[Top-left: composite the attached channel wordmark PNG at ~12% frame width, inset 3% from edges. Do NOT redraw — use the attached image as-is.]

[OPTIONAL — SCENE / UI ELEMENTS:]
[Floating UI cards, data panels, atmospheric effects. Be specific about size, position, tilt, and what UI content to show.]

CRITICAL NEGATIVES:
- NO human face or figure of any kind
- [LEFT / RIGHT] [X]% of frame: intentionally dark — reserved for face composite
- NO text other than the title lines and badge

STYLE: [YOUR_AESTHETIC — e.g., "photoreal cinematic, editorial" or "bold kinetic content-creator energy, dark background with warm yellow glow"]
Composite [YOUR_NAME]'s face onto the attached base thumbnail without altering the title, backdrop, or any other base layer elements.

ATTACHED IMAGES:
1. Base layer (Step 1 output — locked canvas, preserve exactly)
2. [YOUR_NAME] face reference — clean shoulders-up portrait

FACE TO ADD:
[YOUR_NAME] — positioned in the [LEFT / RIGHT] [X]% of the frame, shoulders-up, facing slightly [left / right] toward the title. Expression: [describe emotion — e.g., "wide excited grin", "open-mouth shocked face", "confident direct camera look"].

IDENTITY PRESERVATION — non-negotiable:
Use the attached face reference as the identity-locked source. Preserve [YOUR_NAME]'s facial identity EXACTLY — bone structure, eye shape and color, nose, mouth, skin tone, hair. Do NOT stylize, slim, smooth, or beautify. Treat as a photographic cutout. Relight with the scene's existing [warm / cool] light from the [direction — e.g., "right"]. If you cannot preserve identity perfectly, leave the face area blank.

LIGHTING INTEGRATION:
[YOUR_NAME]'s face is lit by [the scene's light source — e.g., "the laptop screen, cool blue from the right — left side in shadow"]. No flat lighting. No green spill. Clean cutout edges.

DO NOT regenerate or move:
- Title text (both lines, exact text, font, colors)
- [Badge / label]
- Background scene and lighting
- [Any other base layer elements]

OUTPUT: 16:9, same dimensions as base.
Face Preservation Directive
Drop this block at the top of any composite prompt. It explicitly instructs the model not to regenerate — only to composite. Removes ambiguity about what "use my photo" means.
🔒 FACE PRESERVATION (highest priority):
Treat this as a COMPOSITE task, not a generation. Use the person from the attached reference photo EXACTLY as they appear. Do NOT regenerate, redraw, stylize, smooth, AI-enhance, or alter their face, skin, hair, or features in any way. Only the background, lighting environment, and color grading may be generated. The face must be pixel-faithful to the reference.
Final check: if the face looks "AI-rendered" or "stylized," regenerate.
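If you assemble composite prompts programmatically, prepending the directive is a one-liner. A trivial sketch; the helper name and the shortened directive text are illustrative, not part of the framework:

```python
# Sketch: prepend the Face Preservation Directive to any composite prompt.
DIRECTIVE = (
    "FACE PRESERVATION (highest priority): Treat this as a COMPOSITE task, "
    "not a generation. Do NOT regenerate, redraw, stylize, or alter the face."
)

def with_directive(composite_prompt: str) -> str:
    # The directive goes first so it leads every composite prompt.
    return f"{DIRECTIVE}\n\n{composite_prompt}"
```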
Steal This Skill
This is the full installable skill file — a generic version of the framework, stripped of any channel-specific branding. Drop it in and type /tombernail in any Claude Code conversation to generate your thumbnail prompts.
Copy the skill file below and save it as tombernail.md.
Drop it in ~/.claude/skills/tombernail.md for global access, or .claude/skills/tombernail.md inside any project.
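The global install is two commands. A minimal shell sketch; the heredoc writes a stub here, so paste the full skill file content in its place:

```shell
# Create the global skills directory and install the file.
mkdir -p ~/.claude/skills
cat > ~/.claude/skills/tombernail.md <<'EOF'
# /tombernail skill (stub: paste the full skill file content here)
EOF
# Per-project instead: save it to .claude/skills/tombernail.md in the project root.
```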
Open Claude Code and type /tombernail followed by your video title or script path. Claude generates base + composite prompts ready to paste into gpt-image-1.
# /tombernail — Two-step AI thumbnail framework

**Trigger:** `/tombernail`, `thumbnail prompts for [video]`, `make thumbnail for [title]`
**Input:** Video title + one of: script path, topic summary, or key talking points
**Output:** Two paste-ready prompts per concept (base layer + composite) for gpt-image-1 or gpt-image-2
**Model:** Claude Sonnet

---

## Core Principle

**Title and thumbnail are a two-part joke. They do not repeat each other.**

| Surface | Job | Sells |
|---|---|---|
| **Title** | Table of contents | Breadth — the specific topics/beats |
| **Thumbnail** | One provocative hook | Curiosity gap — one question or bold claim |

Before writing any prompt, define:

1. What does the title say?
2. What should the thumbnail say DIFFERENTLY?
3. What is the one desire, pain, or curiosity this video triggers?

---

## Why Two Steps?

One-shot AI generation always fails on faces. The model regenerates your face from training data — wrong identity. You lose.

Step 1 (Base Layer): Generate everything except faces. Iterate freely.
Step 2 (Composite): Lock the base. Attach real face reference. The model treats it as a photographic cutout, not source material.

---

## Workflow

Step 1: ANALYZE
Read the script, outline, or key talking points. Extract: hook tension, biggest story, title↔thumbnail split. What is the ONE thing the thumbnail should say that the title doesn't?

Step 2: WRITE TWO PROMPTS PER CONCEPT
BASE LAYER: backdrop + text + UI elements — NO faces, NO logos
COMPOSITE: lock base exactly, add face cutout + optional brand logo

Step 3: USER RUNS IN gpt-image-1 / gpt-image-2
Run Step 1 prompt first (no attachments)
Save the output as the base image
Open image edit mode, attach base + face reference photo
Paste Step 2 prompt → final thumbnail

Produce 2–3 concepts. Each concept gets its own pair of prompts.

---

## Base Layer Template

YouTube thumbnail BASE LAYER, 16:9. NO PEOPLE. NO FACES.

BACKDROP: [YOUR_BACKGROUND — e.g., "Pure black #0A0A0A" or "Dark cinematic scene, warm amber lighting, heavy vignette at edges"]

TEXT — TITLE (two lines, bold condensed sans-serif, ALL CAPS, [center/right]-aligned):
- Line 1 ([white or your color]): "[SETUP LINE — context or situation]"
- Line 2 ([YOUR_ACCENT_COLOR]): "[PAYOFF LINE — hook or punchline]"
Each line ~25–30% canvas height. Tight tracking.

[OPTIONAL — BADGE:]
[POSITION]: solid [YOUR_COLOR] rounded pill, "[LABEL]" in bold black condensed sans-serif. ~8% canvas height.

[OPTIONAL — LOGO:]
Composite attached wordmark PNG at ~12% frame width, top-left, inset 3%. Do NOT redraw — use attached image as-is.

[OPTIONAL — SCENE/UI:]
[Describe floating UI cards, data panels, atmospheric effects — specific about size, position, tilt.]

CRITICAL NEGATIVES:
- NO human face or figure
- [LEFT/RIGHT] [X]% of frame: intentionally dark — reserved for composite
- NO text other than title and badge
- [YOUR_EXCLUDED_ELEMENTS]

STYLE: [YOUR_AESTHETIC]

---

## Composite Template

Composite [YOUR_NAME]'s face onto the attached base thumbnail without altering the title, backdrop, or any base layer elements.

ATTACHED IMAGES:
1. Base layer (Step 1 output — locked canvas, preserve exactly)
2. [YOUR_NAME] face reference — clean shoulders-up portrait

FACE TO ADD:
[YOUR_NAME] — [LEFT/RIGHT] [X]% of frame, shoulders-up, facing slightly [left/right] toward the title. Expression: [emotion matching the hook].

IDENTITY PRESERVATION — non-negotiable:
Preserve [YOUR_NAME]'s facial identity EXACTLY — bone structure, eye shape and color, nose, mouth, skin tone, hair. Treat as a photographic cutout. Relight with scene's existing [warm/cool] light from [direction]. If identity cannot be preserved, leave the face area blank.

LIGHTING:
Face lit by [scene's light source]. No flat lighting. No green spill. Clean cutout edges.

DO NOT regenerate or move: title text, [badge], background scene, [all other base layer elements].

OUTPUT: 16:9, same dimensions as base.

---

## Face Preservation Directive

Paste at the top of any composite prompt when face fidelity is at risk:

🔒 FACE PRESERVATION (highest priority):
Treat this as a COMPOSITE task, not a generation. Use the person from the attached reference photo EXACTLY as they appear. Do NOT regenerate, redraw, stylize, smooth, AI-enhance, or alter their face, skin, hair, or features. Only background, lighting, and color grading may be generated. The face must be pixel-faithful to the reference. Final check: if the face looks "AI-rendered" or "stylized," regenerate.

---

## Quality Checklist

- [ ] Title and thumbnail say different things — thumbnail adds a new hook
- [ ] Base layer has zero faces and zero title text
- [ ] Badge / wordmark in base layer if applicable
- [ ] Empty zone called out (left/right X% intentionally dark)
- [ ] Identity preservation directive in composite prompt
- [ ] Lighting integration described
- [ ] DO NOT touch list covers all base layer elements
- [ ] Text readable at 320×180px (mobile thumbnail minimum)
- [ ] Max 3 visual elements in base layer
Works with any version of Claude Code (claude-sonnet-4-6 or newer recommended). No API keys or setup beyond saving the file.
What You Need