April 18, 2026 · 9 min read · ImageAPI Team

Build an AI Thumbnail Generator with Node.js (Full Code)

A complete Node.js tutorial for an AI thumbnail generator. Covers prompt presets, batch calls, caching, and a small Express endpoint you can ship today.

Node.js · Tutorials · YouTube · Marketing

Thumbnails sell the click. If you publish more than two videos a week, or you run a content factory for clients, hand-drawing each thumbnail in Photoshop does not scale. This guide builds a small Node.js service that takes a video title and returns a ready-to-use thumbnail in under ten seconds.

What we are building

An Express endpoint at POST /thumbnail. You send a JSON body with the video title and an optional style, and the service returns a 1280 by 720 PNG. Behind the scenes it expands the title into a richer prompt, calls the image API, and caches the result so duplicate titles do not cost twice.
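Before writing any code, it helps to pin down the contract. The shapes below are illustrative examples (the field names match the handler built later in this article, but the sample values are made up):

```javascript
// Hypothetical request body for POST /thumbnail.
const request = {
  title: "How I Automated My Newsletter with Node.js",
  style: "tech", // optional; the prompt template falls back to "tech"
};

// Shape of a successful response. The image field is assumed to be
// base64-encoded PNG data; cached tells you whether it was a cache hit.
const response = {
  image: "<base64 PNG data>",
  cached: false,
};
```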

Install the dependencies

mkdir thumbnail-service && cd thumbnail-service
npm init -y
npm install express node-fetch lru-cache dotenv

The thumbnail prompt template

A good thumbnail prompt has four parts: the subject, an emotion, a background style, and a composition rule. Encode them as a template so every call starts from a consistent baseline.

export function buildPrompt(title, style = "tech") {
  const styles = {
    tech: "vibrant blue and orange tones, neon accents, glowing edges",
    finance: "clean gradient background, gold accents, sharp typography vibe",
    gaming: "high contrast, cinematic glow, dark background, intense colors",
    cooking: "warm tones, top down composition, natural light",
  };

  const look = styles[style] || styles.tech;

  return [
    `A bold YouTube thumbnail about: ${title}.`,
    "Centered subject, large empty space on the left for text overlay,",
    `background style: ${look}.`,
    "Highly detailed, sharp focus, eye catching, professional thumbnail composition,",
    "no watermark, no text in the image.",
  ].join(" ");
}

The image API call

import fetch from "node-fetch";

const ENDPOINT = "https://api.imageapi.org/api/v1/generate";

export async function generateThumbnail(prompt) {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.IMAGEAPI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      prompt,
      model: "flux-2-klein-9b",
      width: 1280,
      height: 720,
      num_inference_steps: 28,
    }),
  });

  if (!res.ok) {
    throw new Error(`Image API failed: ${res.status}`);
  }

  const data = await res.json();
  return data.data.image;
}
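Image APIs occasionally return transient 429 or 5xx errors under load, so it is worth wrapping the call in a retry. This is a hedged sketch, not part of the service above; `withRetry` is a hypothetical helper, not from any library:

```javascript
// Retry an async function with exponential backoff: 500ms, 1s, 2s, ...
// Rethrows the last error once all attempts are exhausted.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Back off before the next attempt, but not after the final one.
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Usage would look like `const image = await withRetry(() => generateThumbnail(prompt));`, leaving `generateThumbnail` itself unchanged.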

Wire it up to Express with caching

Cache by a hash of the title and style. Most users edit the title two or three times before settling on one. Caching saves you actual money and keeps the response under a second on repeat calls.

import express from "express";
import { LRUCache } from "lru-cache";
import crypto from "crypto";
import { generateThumbnail } from "./image.js";
import { buildPrompt } from "./prompt.js";

const app = express();
app.use(express.json());

const cache = new LRUCache({ max: 1000, ttl: 1000 * 60 * 60 * 24 });

function key(title, style) {
  return crypto.createHash("sha1").update(`${style}::${title}`).digest("hex");
}

app.post("/thumbnail", async (req, res) => {
  const { title, style } = req.body || {};
  if (!title) return res.status(400).json({ error: "title required" });

  const cacheKey = key(title, style);
  if (cache.has(cacheKey)) return res.json({ image: cache.get(cacheKey), cached: true });

  try {
    const prompt = buildPrompt(title, style);
    const image = await generateThumbnail(prompt);
    cache.set(cacheKey, image);
    res.json({ image, cached: false });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3000, () => console.log("Thumbnail service on :3000"));

Adding text overlay yourself

Diffusion models are still inconsistent at rendering long text. The trick is to ask the model for a clean composition with empty space on one side, then composite the title yourself with a library like node-canvas or sharp. You get pixel-perfect text and you save a generation pass.
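One dependency-free way to prepare the overlay is to render the headline as an SVG buffer, which sharp accepts as a composite input. This is a hedged sketch under that assumption; `titleSvg` is an illustrative helper, not a library function:

```javascript
// Build an SVG buffer containing the headline, sized to the thumbnail.
// Escapes &, <, > so titles with those characters don't break the XML.
function titleSvg(title, width = 1280, height = 720) {
  const safe = title
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return Buffer.from(
    `<svg width="${width}" height="${height}" xmlns="http://www.w3.org/2000/svg">
      <text x="60" y="${height / 2}" font-family="sans-serif" font-size="96"
            font-weight="bold" fill="white" stroke="black" stroke-width="4">
        ${safe}
      </text>
    </svg>`
  );
}

// With sharp installed, compositing would look roughly like:
//   const final = await sharp(generatedPng)
//     .composite([{ input: titleSvg("My Title"), top: 0, left: 0 }])
//     .png()
//     .toBuffer();
```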

Going further

  • Add a /thumbnail/batch endpoint that accepts an array of titles and returns four variations per title.
  • Push the generated images to S3 or R2 and return URLs instead of base64 to keep payloads small.
  • Add a feedback loop. Track which thumbnails earn the highest CTR and feed those back into the style template.
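For the batch endpoint idea above, the main thing to get right is a concurrency cap, so a 50-title batch does not open 50 simultaneous API calls. A minimal sketch, assuming you are happy with a fixed worker-pool pattern (`mapWithConcurrency` is a hypothetical helper, not from any library):

```javascript
// Map fn over items with at most `limit` calls in flight at once.
// Results come back in the same order as the input.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    // Each worker pulls the next unclaimed index until the list is drained.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}
```

Inside a `/thumbnail/batch` handler this would be called roughly as `await mapWithConcurrency(titles, 3, (t) => generateThumbnail(buildPrompt(t, style)))`.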

Frequently asked questions

Can I use this for YouTube without breaking their rules?
Yes. YouTube allows AI generated thumbnails as long as they are not deceptive. Make sure the thumbnail represents the video content and you are fine.
What is the best image size for a YouTube thumbnail?
1280 by 720 pixels at a 16 by 9 ratio. That is what the example uses. YouTube will downscale anything larger.
How do I get readable text on the thumbnail?
Generate the background and subject with the API, then composite the headline yourself using a Node library like sharp. Diffusion models are still poor at long readable text.

Try the API used in this article

Free tier, transparent pricing, and a single REST endpoint for FLUX, Stable Diffusion, and Leonardo models.
