Text-to-image and image-to-image generation using a customized Diffusion engine.

Overview

Image generation runs on a customized Diffusion engine (qvac-ext-stable-diffusion.cpp). Load a supported model using modelType: "diffusion". Then, provide a text prompt describing the image to generate.

For image-to-image, also pass init_image (a Uint8Array of PNG or JPEG bytes); the model then transforms the input guided by the prompt instead of starting from noise. In both modes, diffusion() returns one or more PNG images as Uint8Array buffers. Use progressStream to track generation progress step by step.
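
Because init_image must contain PNG or JPEG bytes, it can help to check the leading bytes of a buffer before handing it to diffusion(). The sketch below is a hypothetical helper (sniffImageFormat is not part of @qvac/sdk) that relies only on the standard PNG and JPEG file signatures:

```javascript
// Hypothetical helper, not part of @qvac/sdk: identify PNG or JPEG bytes
// by their leading file signature before using them as init_image.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];

// True when `bytes` begins with every value in `signature`.
function startsWith(bytes, signature) {
  if (bytes.length < signature.length) return false;
  return signature.every((value, i) => bytes[i] === value);
}

// Returns "png", "jpeg", or null when the bytes match neither signature.
function sniffImageFormat(bytes) {
  if (!(bytes instanceof Uint8Array)) {
    throw new TypeError("expected a Uint8Array");
  }
  if (startsWith(bytes, PNG_SIGNATURE)) return "png";
  if (startsWith(bytes, JPEG_SIGNATURE)) return "jpeg";
  return null;
}
```

A Node.js Buffer returned by fs.readFileSync is a Uint8Array subclass, so it passes the type check directly. A null result is a reasonable point to stop with a clear error instead of starting generation.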

For higher-resolution outputs, you can chain a Real-ESRGAN upscaler. Either attach it as a post-processing step at load time and trigger it per generation via diffusion({ upscale }), or load an ESRGAN model standalone with modelConfig.mode: "upscale" and call upscale() on any PNG/JPEG buffer — generated or not. Both paths return PNG Uint8Array buffers and accept a repeats option that compounds the scale factor across sequential passes.

Functions

Use the following sequence of function calls: loadModel(), then diffusion() (or upscale() for a standalone ESRGAN model), then unloadModel().

For how to use each function, see SDK — API reference.
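
One way to keep the load, generate, unload lifecycle tidy is a small wrapper that always unloads the model, even when generation throws. The sketch below is a hypothetical helper, not an SDK function. It assumes only that the sdk argument exposes loadModel and unloadModel with the call shapes used in the examples on this page:

```javascript
// Hypothetical wrapper, not part of @qvac/sdk. `sdk` is any object exposing
// loadModel(options) and unloadModel({ modelId, clearStorage }).
async function withModel(sdk, loadOptions, run) {
  const modelId = await sdk.loadModel(loadOptions);
  try {
    // `run` receives the model id and performs diffusion() or upscale() calls.
    return await run(modelId);
  } finally {
    // Runs on success and on failure, so the model is never left loaded.
    await sdk.unloadModel({ modelId, clearStorage: false });
  }
}
```

Passing the SDK in as an argument keeps the wrapper easy to exercise with stubs. In an application, the namespace object from import * as sdk from "@qvac/sdk" would presumably fill that role.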

Models

Supported model families and their file layouts:

FLUX.2-klein: split layout — diffusion model *.gguf + LLM text encoder *.gguf (via llmModelSrc) + VAE *.safetensors (via vaeModelSrc).
SD1.x, SD2.x: single all-in-one *.gguf file. No companion files needed.
SDXL, SD3: may require separate CLIP/T5 text encoder files (clipLModelSrc, clipGModelSrc, t5XxlModelSrc) in modelConfig depending on the model variant.
ESRGAN: post-generation upscaler, *.pth format. Two ways to load:
- Paired with a diffusion model — set modelConfig.upscaler.model_src at loadModel() time alongside the diffusion modelSrc.
- Standalone — pass the ESRGAN model as the top-level modelSrc with modelConfig.mode: "upscale".
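
The file layouts above can be turned into a quick pre-flight check on modelConfig. The sketch below is illustrative only: the family labels and the checkCompanions helper are hypothetical, while the field names (llmModelSrc, vaeModelSrc, clipLModelSrc, clipGModelSrc, t5XxlModelSrc) are the ones listed above. It treats the FLUX.2-klein companions as required and the SDXL/SD3 encoders as variant-dependent:

```javascript
// Family labels are hypothetical keys chosen for this sketch; the
// modelConfig field names come from the model list above.
const CLIP_T5_FIELDS = ["clipLModelSrc", "clipGModelSrc", "t5XxlModelSrc"];
const COMPANION_FIELDS = {
  "flux2-klein": { required: ["llmModelSrc", "vaeModelSrc"], variantDependent: [] },
  sd1: { required: [], variantDependent: [] },
  sd2: { required: [], variantDependent: [] },
  sdxl: { required: [], variantDependent: [...CLIP_T5_FIELDS] },
  sd3: { required: [], variantDependent: [...CLIP_T5_FIELDS] },
};

// Reports which companion fields are absent from modelConfig.
// `missing` lists fields the split layout needs; `maybeNeeded` lists
// fields that some variants of the family require.
function checkCompanions(family, modelConfig = {}) {
  const spec = COMPANION_FIELDS[family];
  if (!spec) throw new Error(`unknown model family: ${family}`);
  const isMissing = (field) => modelConfig[field] === undefined;
  return {
    missing: spec.required.filter(isMissing),
    maybeNeeded: spec.variantDependent.filter(isMissing),
  };
}
```

A non-empty missing list is worth failing on before loadModel(). A non-empty maybeNeeded list is only a hint to check the specific model variant.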

For models available as constants, see SDK — Models.

On upscaling: ESRGAN can be used standalone via modelConfig.mode: "upscale" + upscale(), or optionally paired with any supported diffusion family to upscale generated images in the same call via diffusion({ upscale }). Available constants: REALESRGAN_X4PLUS_ANIME_6B, REALESRGAN_X4PLUS.

Examples

FLUX.2-klein

The following script shows text-to-image generation using FLUX.2-klein with its split-layout model (separate diffusion model, LLM text encoder, and VAE):

diffusion-flux2-klein.js

import { loadModel, unloadModel, diffusion, FLUX_2_KLEIN_4B_Q4_0, FLUX_2_KLEIN_4B_VAE, QWEN3_4B_Q4_K_M } from "@qvac/sdk";
import fs from "fs";
import path from "path";
// FLUX.2 [klein] uses a split-layout: separate diffusion model + LLM text encoder + VAE
const diffusionModelSrc = process.argv[2] || FLUX_2_KLEIN_4B_Q4_0;
const llmModelSrc = process.argv[3] || QWEN3_4B_Q4_K_M;
const vaeModelSrc = process.argv[4] || FLUX_2_KLEIN_4B_VAE;
const prompt = process.argv[5] || "a futuristic city at sunset, photorealistic";
const outputDir = process.argv[6] || ".";
console.log("Loading FLUX.2 [klein] split-layout model...");
const modelId = await loadModel({
    modelSrc: diffusionModelSrc,
    modelType: "diffusion",
    modelConfig: {
        device: "gpu",
        threads: 4,
        llmModelSrc,
        vaeModelSrc,
    },
    onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
});
console.log(`Model loaded: ${modelId}`);
console.log(`\nGenerating: "${prompt}"`);
const { progressStream, outputs, stats } = diffusion({
    modelId,
    prompt,
    width: 512,
    height: 512,
    steps: 20,
    guidance: 3.5,
    cfg_scale: 1,
    seed: -1,
});
for await (const { step, totalSteps } of progressStream) {
    process.stdout.write(`\rStep ${step}/${totalSteps}`);
}
console.log();
const buffers = await outputs;
for (let i = 0; i < buffers.length; i++) {
    const outputPath = path.join(outputDir, `flux2_${i}.png`);
    fs.writeFileSync(outputPath, buffers[i]);
    console.log(`Saved: ${outputPath}`);
}
console.log("\nStats:", await stats);
await unloadModel({ modelId, clearStorage: false });
console.log("Done.");
process.exit(0);

Stable Diffusion

The following script shows a minimal text-to-image generation example using a single all-in-one SD 2.1 model:

diffusion-simple.js

import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0 } from "@qvac/sdk";
import fs from "fs";
// Minimal diffusion example — single GGUF model, no companion files needed.
// Works with SD 1.x / 2.x all-in-one models.
const modelSrc = process.argv[2] || SD_V2_1_1B_Q8_0;
const prompt = process.argv[3] || "a photo of a cat sitting on a windowsill";
const modelId = await loadModel({
    modelSrc,
    modelType: "diffusion",
    modelConfig: { prediction: "v" },
});
const { outputs } = diffusion({ modelId, prompt });
const buffers = await outputs;
fs.writeFileSync("output.png", buffers[0]);
console.log("Saved: output.png");
await unloadModel({ modelId, clearStorage: false });
process.exit(0);

Image-to-image

Pass init_image to transform an existing image guided by a text prompt. Behavior depends on the model family:

FLUX.2: in-context conditioning. Requires prediction: "flux2_flow" in modelConfig at loadModel() time; strength is ignored on this path.
SD / SDXL / SD3: SDEdit-style. Use strength to control how much the source is preserved (0 = keep source, 1 = ignore source).
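
The difference between the two paths can be captured when assembling the arguments for diffusion(). The sketch below is a hypothetical helper (buildImg2ImgParams and its family labels are not part of the SDK). It drops strength on the FLUX.2 path, where it is ignored, and validates the 0 to 1 range on the SDEdit-style path:

```javascript
// Hypothetical helper, not part of @qvac/sdk: build the argument object for
// an image-to-image diffusion() call. `family` is "flux2" or any other label.
function buildImg2ImgParams({ family, modelId, prompt, init_image, strength }) {
  if (!(init_image instanceof Uint8Array)) {
    throw new TypeError("init_image must be a Uint8Array of PNG or JPEG bytes");
  }
  const params = { modelId, prompt, init_image };
  // FLUX.2 uses in-context conditioning, so strength has no effect there.
  if (family === "flux2") return params;
  if (strength !== undefined) {
    // 0 keeps the source image, 1 ignores it.
    if (typeof strength !== "number" || strength < 0 || strength > 1) {
      throw new RangeError("strength must be a number between 0 and 1");
    }
    params.strength = strength;
  }
  return params;
}
```

The returned object can be spread into a diffusion() call alongside options such as steps or seed.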

The following script loads FLUX.2-klein in split-layout and transforms an input image using in-context conditioning (prediction: "flux2_flow"):

diffusion-flux2-klein-img2img.js

import { loadModel, unloadModel, diffusion, FLUX_2_KLEIN_4B_Q4_0, FLUX_2_KLEIN_4B_VAE, QWEN3_4B_Q4_K_M, } from "@qvac/sdk";
import fs from "fs";
import path from "path";
// img2img with FLUX.2 [klein] split-layout — uses in-context conditioning ("flux2_flow").
const inputPath = process.argv[2];
const prompt = process.argv[3] || "oil painting style, vibrant colors";
const outputDir = process.argv[4] || ".";
const diffusionModelSrc = process.argv[5] || FLUX_2_KLEIN_4B_Q4_0;
const llmModelSrc = process.argv[6] || QWEN3_4B_Q4_K_M;
const vaeModelSrc = process.argv[7] || FLUX_2_KLEIN_4B_VAE;
if (!inputPath) {
    console.error("❌ Error: input image path is required");
    console.error("Usage: bun run bare:example dist/examples/diffusion-flux2-klein-img2img.js <inputImage> [prompt] [outputDir] [diffusionModelSrc] [llmModelSrc] [vaeModelSrc]");
    process.exit(1);
}
try {
    console.log("Loading FLUX.2 [klein] split-layout model...");
    const modelId = await loadModel({
        modelSrc: diffusionModelSrc,
        modelType: "diffusion",
        modelConfig: {
            device: "gpu",
            threads: 4,
            llmModelSrc,
            vaeModelSrc,
            prediction: "flux2_flow",
        },
        onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
    });
    console.log(`Model loaded: ${modelId}`);
    const init_image = new Uint8Array(fs.readFileSync(inputPath));
    console.log(`\nTransforming "${inputPath}" with prompt: "${prompt}"`);
    const { progressStream, outputs, stats } = diffusion({
        modelId,
        prompt,
        init_image,
        steps: 20,
        guidance: 3.5,
        cfg_scale: 1,
        seed: -1,
    });
    for await (const { step, totalSteps } of progressStream) {
        process.stdout.write(`\rStep ${step}/${totalSteps}`);
    }
    console.log();
    const buffers = await outputs;
    for (let i = 0; i < buffers.length; i++) {
        const outputPath = path.join(outputDir, `flux2_img2img_${i}.png`);
        fs.writeFileSync(outputPath, buffers[i]);
        console.log(`Saved: ${outputPath}`);
    }
    console.log("\nStats:", await stats);
    await unloadModel({ modelId, clearStorage: false });
    console.log("Done.");
    process.exit(0);
}
catch (error) {
    console.error("❌ Error:", error);
    process.exit(1);
}

Upscaling

Post-processing

Pass upscale to diffusion() to upscale generated images in the same call. Requires modelConfig.upscaler = { type: "esrgan", model_src, tile_size? } at loadModel() time. Behavior depends on the value passed:

true (or {} / { repeats: 1 }): single pass at the model's native scale factor (e.g. x4 for RealESRGAN_x4plus).
{ repeats: N }: N sequential passes — each pass multiplies the output dimensions by the model's scale factor (e.g. repeats: 2 with an x4 model → x16).
When batch_count > 1, every output image is upscaled independently.
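
To make the compounding concrete, the sketch below computes the expected output size for a given number of passes. It is a hypothetical helper (upscaledSize is not an SDK function) and assumes the scale factor is known by the caller, defaulting to 4 to match the x4 models named on this page:

```javascript
// Hypothetical helper, not part of @qvac/sdk: predicted output dimensions
// after `repeats` sequential upscale passes at a fixed per-pass scale factor.
function upscaledSize({ width, height, scale = 4, repeats = 1 }) {
  if (!Number.isInteger(repeats) || repeats < 1) {
    throw new RangeError("repeats must be a positive integer");
  }
  // Each pass multiplies both dimensions by `scale`, so the factor compounds.
  const factor = scale ** repeats;
  return { width: width * factor, height: height * factor, factor };
}
```

For a 128×128 source and an x4 model, repeats: 2 gives a factor of 16 and a 2048×2048 result, which is 256 times as many pixels as the source. That growth is the reason to start from a small generation size when using several passes.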

The following script loads SD 2.1 with an ESRGAN upscaler and generates both a single-pass (x4) and a two-pass (x16) upscale from the same prompt:

diffusion-esrgan-upscale.js

import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0, REALESRGAN_X4PLUS_ANIME_6B, } from "@qvac/sdk";
import fs from "fs";
import path from "path";
// ESRGAN upscale example.
//
// Usage:
//   bun run bare:example dist/examples/diffusion-esrgan-upscale.js [esrganSrc] [prompt] [outputDir]
const esrganArg = process.argv[2];
const promptArg = process.argv[3];
const outputDirArg = process.argv[4];
const esrganModelSrc = esrganArg ??
    REALESRGAN_X4PLUS_ANIME_6B;
const prompt = promptArg ??
    "an illustrated red fox portrait, clean line art, soft watercolor background, detailed fur, crisp eyes";
const negative_prompt = "blurry, low quality, watermark, text";
const outputDir = outputDirArg ?? ".";
const seed = 42;
try {
    console.log("Loading SD 2.1 + ESRGAN upscaler...");
    const modelId = await loadModel({
        modelSrc: SD_V2_1_1B_Q8_0,
        modelType: "diffusion",
        modelConfig: {
            prediction: "v",
            upscaler: {
                type: "esrgan",
                model_src: esrganModelSrc,
                tile_size: 128,
            },
        },
        onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
    });
    console.log(`Model loaded: ${modelId}`);
    // Source size is intentionally small — each ESRGAN repeat multiplies dimensions.
    const baseParams = {
        modelId,
        prompt,
        negative_prompt,
        width: 128,
        height: 128,
        steps: 5,
        cfg_scale: 7.5,
        seed,
    };
    console.log(`\nGenerating ESRGAN x4 upscale: "${prompt}"`);
    const single = diffusion({ ...baseParams, upscale: true });
    for await (const { step, totalSteps } of single.progressStream) {
        process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
    }
    console.log();
    const singleBuffers = await single.outputs;
    for (let i = 0; i < singleBuffers.length; i++) {
        const out = path.join(outputDir, `sd2_esrgan_x4_seed${seed}_${i}.png`);
        fs.writeFileSync(out, singleBuffers[i]);
        console.log(`Saved: ${out}`);
    }
    console.log("Stats:", await single.stats);
    console.log("\nGenerating ESRGAN two-pass x16 upscale...");
    const twoPass = diffusion({ ...baseParams, upscale: { repeats: 2 } });
    for await (const { step, totalSteps } of twoPass.progressStream) {
        process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
    }
    console.log();
    const twoPassBuffers = await twoPass.outputs;
    for (let i = 0; i < twoPassBuffers.length; i++) {
        const out = path.join(outputDir, `sd2_esrgan_x16_seed${seed}_${i}.png`);
        fs.writeFileSync(out, twoPassBuffers[i]);
        console.log(`Saved: ${out}`);
    }
    console.log("Stats:", await twoPass.stats);
    await unloadModel({ modelId, clearStorage: false });
    console.log("Done.");
    process.exit(0);
}
catch (error) {
    console.error("❌ Error:", error);
    process.exit(1);
}

Standalone

The SDK also exposes upscale() — a standalone path that upscales any PNG/JPEG image without running diffusion. Load the ESRGAN model directly with modelConfig.mode: "upscale":

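import { loadModel, unloadModel, upscale, REALESRGAN_X4PLUS_ANIME_6B } from "@qvac/sdk";
import fs from "fs";
// Any PNG or JPEG bytes work as input; "input.png" is a placeholder path.
const pngBytes = new Uint8Array(fs.readFileSync("input.png"));
const modelId = await loadModel({
  modelSrc: REALESRGAN_X4PLUS_ANIME_6B,
  modelType: "diffusion",
  modelConfig: { mode: "upscale", upscaler: { tile_size: 128 } },
});
const { outputs } = upscale({ modelId, image: pngBytes, repeats: 2 });
const [upscaledPng] = await outputs;
await unloadModel({ modelId, clearStorage: false });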

tile_size defaults to 128. Increase it to reduce tile seams on large inputs, at the cost of more memory per pass.
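
The trade-off can be estimated by counting how many tiles an input splits into. The sketch below is a rough, hypothetical estimate (estimateTileCount is not an SDK function). It assumes square, non-overlapping tiles, which may not match how the engine actually tiles or overlaps its passes:

```javascript
// Hypothetical estimate, not part of @qvac/sdk: number of square tiles needed
// to cover an image, assuming no overlap between neighbouring tiles.
function estimateTileCount(width, height, tileSize = 128) {
  if (!(tileSize > 0)) throw new RangeError("tileSize must be positive");
  const columns = Math.ceil(width / tileSize);
  const rows = Math.ceil(height / tileSize);
  return columns * rows;
}
```

Under these assumptions a 512×512 input splits into 16 tiles at tile_size 128 and 4 tiles at tile_size 256. Fewer tiles means fewer internal boundaries where seams can show, while each tile needs more memory to process.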

Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.
