# Image generation (/ai-capabilities/image-generation)

## Overview

Image generation runs on a **customized Diffusion engine** ([`qvac-ext-stable-diffusion.cpp`](https://github.com/tetherto/qvac-ext-stable-diffusion.cpp)). Load a supported model using `modelType: "diffusion"`. Then, provide a text `prompt` describing the image to generate. For image-to-image, also pass `init_image` (a `Uint8Array` of PNG or JPEG bytes); the model then transforms the input, guided by the prompt, instead of starting from noise. `diffusion()` returns one or more PNG images as `Uint8Array` buffers. Use `progressStream` to track generation progress step-by-step.

For higher-resolution outputs, you can chain a Real-ESRGAN upscaler. Either attach it as a post-processing step at load time and trigger it per generation via `diffusion({ upscale })`, or load an ESRGAN model standalone with `modelConfig.mode: "upscale"` and call `upscale()` on any PNG/JPEG buffer, generated or not. Both paths return PNG `Uint8Array` buffers and accept a `repeats` option that compounds the scale factor across sequential passes.

## Functions

Use the following sequence of function calls:

1. [`loadModel()`](/reference/api#loadmodel)
2. [`diffusion()`](/reference/api#diffusion) or `upscale()`
3. [`unloadModel()`](/reference/api#unloadmodel)

For how to use each function, see [SDK: API reference](/reference/api/).

## Models

Supported model families and their file layouts:

* **FLUX.2-klein**: split layout with a diffusion model `*.gguf`, an LLM text encoder `*.gguf` (via `llmModelSrc`), and a VAE `*.safetensors` (via `vaeModelSrc`).
* **SD1.x, SD2.x**: single all-in-one `*.gguf` file. No companion files needed.
* **SDXL, SD3**: may require separate CLIP/T5 text encoder files (`clipLModelSrc`, `clipGModelSrc`, `t5XxlModelSrc`) in `modelConfig` depending on the model variant.
* **ESRGAN**: post-generation upscaler, `*.pth` format.
  Two ways to load:
  * **Paired with a diffusion model**: set `modelConfig.upscaler.model_src` at `loadModel()` time alongside the diffusion `modelSrc`.
  * **Standalone**: pass the ESRGAN model as the top-level `modelSrc` with `modelConfig.mode: "upscale"`.

For models available as constants, see [SDK: Models](/introduction#models).

**On upscaling:** ESRGAN can be used standalone via `modelConfig.mode: "upscale"` + `upscale()`, or optionally paired with any supported diffusion family to upscale generated images in the same call via `diffusion({ upscale })`. Available constants: `REALESRGAN_X4PLUS_ANIME_6B`, `REALESRGAN_X4PLUS`.

## Examples

### FLUX.2-klein

The following script shows text-to-image generation using FLUX.2-klein with its split-layout model (separate diffusion model, LLM text encoder, and VAE):

```js file=/packages/sdk/dist/examples/diffusion-flux2-klein.js title="diffusion-flux2-klein.js" lineNumbers
import { loadModel, unloadModel, diffusion, FLUX_2_KLEIN_4B_Q4_0, FLUX_2_KLEIN_4B_VAE, QWEN3_4B_Q4_K_M } from "@qvac/sdk";
import fs from "fs";
import path from "path";

// FLUX.2 [klein] uses a split-layout: separate diffusion model + LLM text encoder + VAE
const diffusionModelSrc = process.argv[2] || FLUX_2_KLEIN_4B_Q4_0;
const llmModelSrc = process.argv[3] || QWEN3_4B_Q4_K_M;
const vaeModelSrc = process.argv[4] || FLUX_2_KLEIN_4B_VAE;
const prompt = process.argv[5] || "a futuristic city at sunset, photorealistic";
const outputDir = process.argv[6] || ".";

console.log("Loading FLUX.2 [klein] split-layout model...");
const modelId = await loadModel({
  modelSrc: diffusionModelSrc,
  modelType: "diffusion",
  modelConfig: {
    device: "gpu",
    threads: 4,
    llmModelSrc,
    vaeModelSrc,
  },
  onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
});
console.log(`Model loaded: ${modelId}`);

console.log(`\nGenerating: "${prompt}"`);
const { progressStream, outputs, stats } = diffusion({
  modelId,
  prompt,
  width: 512,
  height: 512,
  steps: 20,
  guidance: 3.5,
  cfg_scale: 1,
  seed: -1,
});

for await (const { step, totalSteps } of progressStream) {
  process.stdout.write(`\rStep ${step}/${totalSteps}`);
}
console.log();

const buffers = await outputs;
for (let i = 0; i < buffers.length; i++) {
  const outputPath = path.join(outputDir, `flux2_${i}.png`);
  fs.writeFileSync(outputPath, buffers[i]);
  console.log(`Saved: ${outputPath}`);
}

console.log("\nStats:", await stats);

await unloadModel({ modelId, clearStorage: false });
console.log("Done.");
process.exit(0);
```

```ts file=/packages/sdk/examples/diffusion-flux2-klein.ts title="diffusion-flux2-klein.ts" lineNumbers
import { loadModel, unloadModel, diffusion, FLUX_2_KLEIN_4B_Q4_0, FLUX_2_KLEIN_4B_VAE, QWEN3_4B_Q4_K_M } from "@qvac/sdk";
import fs from "fs";
import path from "path";

// FLUX.2 [klein] uses a split-layout: separate diffusion model + LLM text encoder + VAE
const diffusionModelSrc = process.argv[2] || FLUX_2_KLEIN_4B_Q4_0;
const llmModelSrc = process.argv[3] || QWEN3_4B_Q4_K_M;
const vaeModelSrc = process.argv[4] || FLUX_2_KLEIN_4B_VAE;
const prompt = process.argv[5] || "a futuristic city at sunset, photorealistic";
const outputDir = process.argv[6] || ".";

console.log("Loading FLUX.2 [klein] split-layout model...");
const modelId = await loadModel({
  modelSrc: diffusionModelSrc,
  modelType: "diffusion",
  modelConfig: {
    device: "gpu",
    threads: 4,
    llmModelSrc,
    vaeModelSrc,
  },
  onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
});
console.log(`Model loaded: ${modelId}`);

console.log(`\nGenerating: "${prompt}"`);
const { progressStream, outputs, stats } = diffusion({
  modelId,
  prompt,
  width: 512,
  height: 512,
  steps: 20,
  guidance: 3.5,
  cfg_scale: 1,
  seed: -1,
});

for await (const { step, totalSteps } of progressStream) {
  process.stdout.write(`\rStep ${step}/${totalSteps}`);
}
console.log();

const buffers = await outputs;
for (let i = 0; i < buffers.length; i++) {
  const outputPath = path.join(outputDir, `flux2_${i}.png`);
  fs.writeFileSync(
    outputPath,
    buffers[i]!,
  );
  console.log(`Saved: ${outputPath}`);
}

console.log("\nStats:", await stats);

await unloadModel({ modelId, clearStorage: false });
console.log("Done.");
process.exit(0);
```

### Stable Diffusion

The following script shows a minimal text-to-image generation example using a single all-in-one SD 2.1 model:

```js file=/packages/sdk/dist/examples/diffusion-simple.js title="diffusion-simple.js" lineNumbers
import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0 } from "@qvac/sdk";
import fs from "fs";

// Minimal diffusion example: single GGUF model, no companion files needed.
// Works with SD 1.x / 2.x all-in-one models.
const modelSrc = process.argv[2] || SD_V2_1_1B_Q8_0;
const prompt = process.argv[3] || "a photo of a cat sitting on a windowsill";

const modelId = await loadModel({
  modelSrc,
  modelType: "diffusion",
  modelConfig: { prediction: "v" },
});

const { outputs } = diffusion({ modelId, prompt });
const buffers = await outputs;
fs.writeFileSync("output.png", buffers[0]);
console.log("Saved: output.png");

await unloadModel({ modelId, clearStorage: false });
process.exit(0);
```

```ts file=/packages/sdk/examples/diffusion-simple.ts title="diffusion-simple.ts" lineNumbers
import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0 } from "@qvac/sdk";
import fs from "fs";

// Minimal diffusion example: single GGUF model, no companion files needed.
// Works with SD 1.x / 2.x all-in-one models.
const modelSrc = process.argv[2] || SD_V2_1_1B_Q8_0;
const prompt = process.argv[3] || "a photo of a cat sitting on a windowsill";

const modelId = await loadModel({
  modelSrc,
  modelType: "diffusion",
  modelConfig: { prediction: "v" },
});

const { outputs } = diffusion({ modelId, prompt });
const buffers = await outputs;
fs.writeFileSync("output.png", buffers[0]!);
console.log("Saved: output.png");

await unloadModel({ modelId, clearStorage: false });
process.exit(0);
```

### Image-to-image

Pass `init_image` to transform an existing image guided by a text prompt.
Behavior depends on the model family:

* **FLUX.2**: in-context conditioning. Requires `prediction: "flux2_flow"` in `modelConfig` at `loadModel()` time; `strength` is ignored on this path.
* **SD / SDXL / SD3**: SDEdit-style. Use `strength` to control how much the source is preserved (`0` = keep source, `1` = ignore source).

The following script loads FLUX.2-klein in split-layout and transforms an input image using in-context conditioning (`prediction: "flux2_flow"`):

```js file=/packages/sdk/dist/examples/diffusion-flux2-klein-img2img.js title="diffusion-flux2-klein-img2img.js" lineNumbers
import {
  loadModel,
  unloadModel,
  diffusion,
  FLUX_2_KLEIN_4B_Q4_0,
  FLUX_2_KLEIN_4B_VAE,
  QWEN3_4B_Q4_K_M,
} from "@qvac/sdk";
import fs from "fs";
import path from "path";

// img2img with FLUX.2 [klein] split-layout, using in-context conditioning ("flux2_flow").
const inputPath = process.argv[2];
const prompt = process.argv[3] || "oil painting style, vibrant colors";
const outputDir = process.argv[4] || ".";
const diffusionModelSrc = process.argv[5] || FLUX_2_KLEIN_4B_Q4_0;
const llmModelSrc = process.argv[6] || QWEN3_4B_Q4_K_M;
const vaeModelSrc = process.argv[7] || FLUX_2_KLEIN_4B_VAE;

if (!inputPath) {
  console.error("❌ Error: input image path is required");
  console.error("Usage: bun run bare:example dist/examples/diffusion-flux2-klein-img2img.js <inputPath> [prompt] [outputDir] [diffusionModelSrc] [llmModelSrc] [vaeModelSrc]");
  process.exit(1);
}

try {
  console.log("Loading FLUX.2 [klein] split-layout model...");
  const modelId = await loadModel({
    modelSrc: diffusionModelSrc,
    modelType: "diffusion",
    modelConfig: {
      device: "gpu",
      threads: 4,
      llmModelSrc,
      vaeModelSrc,
      prediction: "flux2_flow",
    },
    onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
  });
  console.log(`Model loaded: ${modelId}`);

  const init_image = new Uint8Array(fs.readFileSync(inputPath));

  console.log(`\nTransforming "${inputPath}" with prompt: "${prompt}"`);
  const { progressStream, outputs, stats } =
    diffusion({
      modelId,
      prompt,
      init_image,
      steps: 20,
      guidance: 3.5,
      cfg_scale: 1,
      seed: -1,
    });

  for await (const { step, totalSteps } of progressStream) {
    process.stdout.write(`\rStep ${step}/${totalSteps}`);
  }
  console.log();

  const buffers = await outputs;
  for (let i = 0; i < buffers.length; i++) {
    const outputPath = path.join(outputDir, `flux2_img2img_${i}.png`);
    fs.writeFileSync(outputPath, buffers[i]);
    console.log(`Saved: ${outputPath}`);
  }

  console.log("\nStats:", await stats);

  await unloadModel({ modelId, clearStorage: false });
  console.log("Done.");
  process.exit(0);
} catch (error) {
  console.error("❌ Error:", error);
  process.exit(1);
}
```

```ts file=/packages/sdk/examples/diffusion-flux2-klein-img2img.ts title="diffusion-flux2-klein-img2img.ts" lineNumbers
import {
  loadModel,
  unloadModel,
  diffusion,
  FLUX_2_KLEIN_4B_Q4_0,
  FLUX_2_KLEIN_4B_VAE,
  QWEN3_4B_Q4_K_M,
} from "@qvac/sdk";
import fs from "fs";
import path from "path";

// img2img with FLUX.2 [klein] split-layout, using in-context conditioning ("flux2_flow").
const inputPath = process.argv[2];
const prompt = process.argv[3] || "oil painting style, vibrant colors";
const outputDir = process.argv[4] || ".";
const diffusionModelSrc = process.argv[5] || FLUX_2_KLEIN_4B_Q4_0;
const llmModelSrc = process.argv[6] || QWEN3_4B_Q4_K_M;
const vaeModelSrc = process.argv[7] || FLUX_2_KLEIN_4B_VAE;

if (!inputPath) {
  console.error("❌ Error: input image path is required");
  console.error(
    "Usage: bun run bare:example dist/examples/diffusion-flux2-klein-img2img.js <inputPath> [prompt] [outputDir] [diffusionModelSrc] [llmModelSrc] [vaeModelSrc]",
  );
  process.exit(1);
}

try {
  console.log("Loading FLUX.2 [klein] split-layout model...");
  const modelId = await loadModel({
    modelSrc: diffusionModelSrc,
    modelType: "diffusion",
    modelConfig: {
      device: "gpu",
      threads: 4,
      llmModelSrc,
      vaeModelSrc,
      prediction: "flux2_flow",
    },
    onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
  });
  console.log(`Model loaded: ${modelId}`);

  const init_image = new Uint8Array(fs.readFileSync(inputPath));

  console.log(`\nTransforming "${inputPath}" with prompt: "${prompt}"`);
  const { progressStream, outputs, stats } = diffusion({
    modelId,
    prompt,
    init_image,
    steps: 20,
    guidance: 3.5,
    cfg_scale: 1,
    seed: -1,
  });

  for await (const { step, totalSteps } of progressStream) {
    process.stdout.write(`\rStep ${step}/${totalSteps}`);
  }
  console.log();

  const buffers = await outputs;
  for (let i = 0; i < buffers.length; i++) {
    const outputPath = path.join(outputDir, `flux2_img2img_${i}.png`);
    fs.writeFileSync(outputPath, buffers[i]!);
    console.log(`Saved: ${outputPath}`);
  }

  console.log("\nStats:", await stats);

  await unloadModel({ modelId, clearStorage: false });
  console.log("Done.");
  process.exit(0);
} catch (error) {
  console.error("❌ Error:", error);
  process.exit(1);
}
```

### Upscaling

#### Post-processing

Pass `upscale` to `diffusion()` to upscale generated images in the same call. Requires `modelConfig.upscaler = { type: "esrgan", model_src, tile_size? }` at `loadModel()` time.
Behavior depends on the value passed:

* `true` (or `{}` / `{ repeats: 1 }`): single pass at the model's native scale factor (e.g. `x4` for RealESRGAN\_x4plus).
* `{ repeats: N }`: N sequential passes; each pass multiplies the output dimensions by the model's scale factor (e.g. `repeats: 2` with an `x4` model → `x16`).
* When `batch_count > 1`, every output image is upscaled independently.

The following script loads SD 2.1 with an ESRGAN upscaler and generates both a single-pass (`x4`) and a two-pass (`x16`) upscale from the same prompt:

```js file=/packages/sdk/dist/examples/diffusion-esrgan-upscale.js title="diffusion-esrgan-upscale.js" lineNumbers
import {
  loadModel,
  unloadModel,
  diffusion,
  SD_V2_1_1B_Q8_0,
  REALESRGAN_X4PLUS_ANIME_6B,
} from "@qvac/sdk";
import fs from "fs";
import path from "path";

// ESRGAN upscale example.
//
// Usage:
//   bun run examples/diffusion-esrgan-upscale.ts [esrganSrc] [prompt] [outputDir]
const esrganArg = process.argv[2];
const promptArg = process.argv[3];
const outputDirArg = process.argv[4];

const esrganModelSrc = esrganArg ?? REALESRGAN_X4PLUS_ANIME_6B;
const prompt =
  promptArg ??
  "an illustrated red fox portrait, clean line art, soft watercolor background, detailed fur, crisp eyes";
const negative_prompt = "blurry, low quality, watermark, text";
const outputDir = outputDirArg ?? ".";
const seed = 42;

try {
  console.log("Loading SD 2.1 + ESRGAN upscaler...");
  const modelId = await loadModel({
    modelSrc: SD_V2_1_1B_Q8_0,
    modelType: "diffusion",
    modelConfig: {
      prediction: "v",
      upscaler: {
        type: "esrgan",
        model_src: esrganModelSrc,
        tile_size: 128,
      },
    },
    onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
  });
  console.log(`Model loaded: ${modelId}`);

  // Source size is intentionally small: each ESRGAN repeat multiplies dimensions.
  const baseParams = {
    modelId,
    prompt,
    negative_prompt,
    width: 128,
    height: 128,
    steps: 5,
    cfg_scale: 7.5,
    seed,
  };

  console.log(`\nGenerating ESRGAN x4 upscale: "${prompt}"`);
  const single = diffusion({ ...baseParams, upscale: true });
  for await (const { step, totalSteps } of single.progressStream) {
    process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
  }
  console.log();

  const singleBuffers = await single.outputs;
  for (let i = 0; i < singleBuffers.length; i++) {
    const out = path.join(outputDir, `sd2_esrgan_x4_seed${seed}_${i}.png`);
    fs.writeFileSync(out, singleBuffers[i]);
    console.log(`Saved: ${out}`);
  }
  console.log("Stats:", await single.stats);

  console.log("\nGenerating ESRGAN two-pass x16 upscale...");
  const twoPass = diffusion({ ...baseParams, upscale: { repeats: 2 } });
  for await (const { step, totalSteps } of twoPass.progressStream) {
    process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
  }
  console.log();

  const twoPassBuffers = await twoPass.outputs;
  for (let i = 0; i < twoPassBuffers.length; i++) {
    const out = path.join(outputDir, `sd2_esrgan_x16_seed${seed}_${i}.png`);
    fs.writeFileSync(out, twoPassBuffers[i]);
    console.log(`Saved: ${out}`);
  }
  console.log("Stats:", await twoPass.stats);

  await unloadModel({ modelId, clearStorage: false });
  console.log("Done.");
  process.exit(0);
} catch (error) {
  console.error("❌ Error:", error);
  process.exit(1);
}
```

```ts file=/packages/sdk/examples/diffusion-esrgan-upscale.ts title="diffusion-esrgan-upscale.ts" lineNumbers
import {
  loadModel,
  unloadModel,
  diffusion,
  SD_V2_1_1B_Q8_0,
  REALESRGAN_X4PLUS_ANIME_6B,
} from "@qvac/sdk";
import fs from "fs";
import path from "path";

// ESRGAN upscale example.
//
// Usage:
//   bun run examples/diffusion-esrgan-upscale.ts [esrganSrc] [prompt] [outputDir]
const esrganArg: string | undefined = process.argv[2];
const promptArg: string | undefined = process.argv[3];
const outputDirArg: string | undefined = process.argv[4];

const esrganModelSrc = esrganArg ??
  REALESRGAN_X4PLUS_ANIME_6B;
const prompt =
  promptArg ??
  "an illustrated red fox portrait, clean line art, soft watercolor background, detailed fur, crisp eyes";
const negative_prompt = "blurry, low quality, watermark, text";
const outputDir = outputDirArg ?? ".";
const seed = 42;

try {
  console.log("Loading SD 2.1 + ESRGAN upscaler...");
  const modelId = await loadModel({
    modelSrc: SD_V2_1_1B_Q8_0,
    modelType: "diffusion",
    modelConfig: {
      prediction: "v",
      upscaler: {
        type: "esrgan",
        model_src: esrganModelSrc,
        tile_size: 128,
      },
    },
    onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
  });
  console.log(`Model loaded: ${modelId}`);

  // Source size is intentionally small: each ESRGAN repeat multiplies dimensions.
  const baseParams = {
    modelId,
    prompt,
    negative_prompt,
    width: 128,
    height: 128,
    steps: 5,
    cfg_scale: 7.5,
    seed,
  };

  console.log(`\nGenerating ESRGAN x4 upscale: "${prompt}"`);
  const single = diffusion({ ...baseParams, upscale: true });
  for await (const { step, totalSteps } of single.progressStream) {
    process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
  }
  console.log();

  const singleBuffers = await single.outputs;
  for (let i = 0; i < singleBuffers.length; i++) {
    const out = path.join(outputDir, `sd2_esrgan_x4_seed${seed}_${i}.png`);
    fs.writeFileSync(out, singleBuffers[i]!);
    console.log(`Saved: ${out}`);
  }
  console.log("Stats:", await single.stats);

  console.log("\nGenerating ESRGAN two-pass x16 upscale...");
  const twoPass = diffusion({ ...baseParams, upscale: { repeats: 2 } });
  for await (const { step, totalSteps } of twoPass.progressStream) {
    process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
  }
  console.log();

  const twoPassBuffers = await twoPass.outputs;
  for (let i = 0; i < twoPassBuffers.length; i++) {
    const out = path.join(outputDir, `sd2_esrgan_x16_seed${seed}_${i}.png`);
    fs.writeFileSync(out, twoPassBuffers[i]!);
    console.log(`Saved: ${out}`);
  }
  console.log("Stats:", await twoPass.stats);

  await unloadModel({ modelId, clearStorage:
    false });
  console.log("Done.");
  process.exit(0);
} catch (error) {
  console.error("❌ Error:", error);
  process.exit(1);
}
```

#### Standalone

The SDK also exposes `upscale()`, a standalone path that upscales any PNG/JPEG image without running diffusion. Load the ESRGAN model directly with `modelConfig.mode: "upscale"`:

```ts
import { loadModel, upscale, REALESRGAN_X4PLUS_ANIME_6B } from "@qvac/sdk";
import fs from "fs";

// Any PNG or JPEG bytes work as input; "input.png" is a placeholder path.
const pngBytes = new Uint8Array(fs.readFileSync("input.png"));

const modelId = await loadModel({
  modelSrc: REALESRGAN_X4PLUS_ANIME_6B,
  modelType: "diffusion",
  modelConfig: { mode: "upscale", upscaler: { tile_size: 128 } },
});
const { outputs } = upscale({ modelId, image: pngBytes, repeats: 2 });
const [upscaledPng] = await outputs;
```

`tile_size` defaults to 128. Increase it to reduce tile seams on large inputs, at the cost of more memory per pass.

**Tip:** all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see [SDK quickstart](/quickstart).
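
As a rough sanity check on the `repeats` arithmetic described under Upscaling, the sketch below computes the output dimensions after N sequential passes. `Size` and `upscaledSize` are hypothetical names for illustration and are not part of `@qvac/sdk`. The sketch assumes the upscaler multiplies width and height by the same integer factor on every pass.

```typescript
// Hypothetical helper for illustration only; not part of @qvac/sdk.
interface Size {
  width: number;
  height: number;
}

// Each pass multiplies both dimensions by `scale`, so N passes
// compound to scale^N (assumed uniform across passes).
function upscaledSize(source: Size, scale: number, repeats: number = 1): Size {
  if (!Number.isInteger(repeats) || repeats < 1) {
    throw new RangeError("repeats must be a positive integer");
  }
  const factor = scale ** repeats;
  return { width: source.width * factor, height: source.height * factor };
}

const source: Size = { width: 128, height: 128 };
console.log(upscaledSize(source, 4, 1)); // { width: 512, height: 512 }
console.log(upscaledSize(source, 4, 2)); // { width: 2048, height: 2048 }
```

Under the same assumption, a 512×512 source with `repeats: 2` on an `x4` model would come out at 8192×8192, which is why the post-processing example starts from a deliberately small 128×128 image.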