# Image generation



## Overview

Image generation runs on a **customized Diffusion engine** ([`qvac-ext-stable-diffusion.cpp`](https://github.com/tetherto/qvac-ext-stable-diffusion.cpp)). Load a supported model using `modelType: "diffusion"`. Then, provide a text `prompt` describing the image to generate.

For image-to-image, also pass `init_image` (a `Uint8Array` of PNG or JPEG bytes); the model then transforms the input guided by the prompt instead of starting from noise. `diffusion()` returns `outputs`, which resolves to one or more PNG images as `Uint8Array` buffers, and `progressStream`, which reports generation progress step by step.
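Since `init_image` must hold PNG or JPEG bytes, it can help to check the file signature before calling `diffusion()`. The sketch below is a hypothetical helper, not part of the SDK. It inspects only the leading magic bytes and doesn't validate the rest of the file.

```ts
// Hypothetical helper (not part of @qvac/sdk): identify PNG or JPEG input
// by its leading magic bytes before passing it as `init_image`.
type ImageFormat = "png" | "jpeg" | "unknown";

// Standard 8-byte PNG signature and 3-byte JPEG start-of-image prefix.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];

function startsWith(bytes: Uint8Array, signature: number[]): boolean {
  if (bytes.length < signature.length) return false;
  return signature.every((value, index) => bytes[index] === value);
}

function detectImageFormat(bytes: Uint8Array): ImageFormat {
  if (startsWith(bytes, PNG_SIGNATURE)) return "png";
  if (startsWith(bytes, JPEG_SIGNATURE)) return "jpeg";
  return "unknown";
}
```

A caller could reject `"unknown"` inputs with a clear error instead of waiting for the engine to fail on them.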

For higher-resolution outputs, you can chain a Real-ESRGAN upscaler. Either attach it as a post-processing step at load time and trigger it per generation via `diffusion({ upscale })`, or load an ESRGAN model standalone with `modelConfig.mode: "upscale"` and call `upscale()` on any PNG/JPEG buffer — generated or not. Both paths return PNG `Uint8Array` buffers and accept a `repeats` option that compounds the scale factor across sequential passes.

## Functions

Use the following sequence of function calls:

1. [`loadModel()`](/reference/api#loadmodel)
2. [`diffusion()`](/reference/api#diffusion) or `upscale()`
3. [`unloadModel()`](/reference/api#unloadmodel)

For how to use each function, see [SDK — API reference](/reference/api/).
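One way to keep the load, run, unload sequence intact when generation throws is to wrap it in `try`/`finally`. The sketch below is a hypothetical wrapper, not an SDK function. Its `load` and `unload` parameters stand in for calls such as `loadModel(...)` and `unloadModel({ modelId, clearStorage: false })`.

```ts
// Hypothetical wrapper (not part of @qvac/sdk). `load` and `unload` are
// supplied by the caller and stand in for loadModel() / unloadModel().
async function withModel<Id, T>(
  load: () => Promise<Id>,
  unload: (modelId: Id) => Promise<void>,
  run: (modelId: Id) => Promise<T>,
): Promise<T> {
  const modelId = await load();
  try {
    return await run(modelId);
  } finally {
    // Runs whether `run` resolved or threw, so the model is always released.
    await unload(modelId);
  }
}
```

With this shape, the body passed as `run` would contain the `diffusion()` or `upscale()` call and return its awaited `outputs`.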

## Models

Supported model families and their file layouts:

* **FLUX.2-klein**: split layout — diffusion model `*.gguf` + LLM text encoder `*.gguf` (via `llmModelSrc`) + VAE `*.safetensors` (via `vaeModelSrc`).
* **SD1.x, SD2.x**: single all-in-one `*.gguf` file. No companion files needed.
* **SDXL, SD3**: may require separate CLIP/T5 text encoder files (`clipLModelSrc`, `clipGModelSrc`, `t5XxlModelSrc`) in `modelConfig` depending on the model variant.
* **ESRGAN**: post-generation upscaler, `*.pth` format. Two ways to load:
  * **Paired with a diffusion model** — set `modelConfig.upscaler.model_src` at `loadModel()` time alongside the diffusion `modelSrc`.
  * **Standalone** — pass the ESRGAN model as the top-level `modelSrc` with `modelConfig.mode: "upscale"`.

For models available as constants, see [SDK — Models](/introduction#models).
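The layouts above map onto `loadModel()` options roughly as sketched here. The field names (`llmModelSrc`, `vaeModelSrc`, `upscaler.model_src`, `mode`) come from this page; the `Layout` type and `toLoadOptions` helper are hypothetical and exist only to show which source goes where. SDXL/SD3 encoder fields and `prediction` settings are left out because they depend on the model variant.

```ts
// Hypothetical types and helper (not part of @qvac/sdk).
// `S` is whatever a model source is in your code (path, URL, or SDK constant).
type Layout<S> =
  | { family: "flux2"; modelSrc: S; llmModelSrc: S; vaeModelSrc: S }
  | { family: "sd"; modelSrc: S; upscalerSrc?: S }
  | { family: "esrgan"; modelSrc: S };

interface LoadOptions<S> {
  modelSrc: S;
  modelType: "diffusion";
  modelConfig: Record<string, unknown>;
}

function toLoadOptions<S>(layout: Layout<S>): LoadOptions<S> {
  switch (layout.family) {
    case "flux2":
      // Split layout: diffusion model plus LLM text encoder and VAE.
      return {
        modelSrc: layout.modelSrc,
        modelType: "diffusion",
        modelConfig: { llmModelSrc: layout.llmModelSrc, vaeModelSrc: layout.vaeModelSrc },
      };
    case "sd": {
      // All-in-one file; optionally paired with an ESRGAN upscaler.
      const modelConfig: Record<string, unknown> =
        layout.upscalerSrc === undefined
          ? {}
          : { upscaler: { type: "esrgan", model_src: layout.upscalerSrc } };
      return { modelSrc: layout.modelSrc, modelType: "diffusion", modelConfig };
    }
    case "esrgan":
      // Standalone upscaler: the ESRGAN file is the top-level modelSrc.
      return { modelSrc: layout.modelSrc, modelType: "diffusion", modelConfig: { mode: "upscale" } };
  }
}
```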

<Callout type="info">
  **On upscaling:** ESRGAN can be used standalone via `modelConfig.mode: "upscale"` + `upscale()`, or optionally paired with any supported diffusion family to upscale generated images in the same call via `diffusion({ upscale })`. Available constants: `REALESRGAN_X4PLUS_ANIME_6B`, `REALESRGAN_X4PLUS`.
</Callout>

## Examples

### FLUX.2-klein

The following script shows text-to-image generation using FLUX.2-klein with its split-layout model (separate diffusion model, LLM text encoder, and VAE):

<Tabs>
  <Tab value="js" label="JavaScript" default>
    <WrapCode>
      ```js file=<rootDir>/packages/sdk/dist/examples/diffusion-flux2-klein.js title="diffusion-flux2-klein.js" lineNumbers
      import { loadModel, unloadModel, diffusion, FLUX_2_KLEIN_4B_Q4_0, FLUX_2_KLEIN_4B_VAE, QWEN3_4B_Q4_K_M } from "@qvac/sdk";
      import fs from "fs";
      import path from "path";
      // FLUX.2 [klein] uses a split-layout: separate diffusion model + LLM text encoder + VAE
      const diffusionModelSrc = process.argv[2] || FLUX_2_KLEIN_4B_Q4_0;
      const llmModelSrc = process.argv[3] || QWEN3_4B_Q4_K_M;
      const vaeModelSrc = process.argv[4] || FLUX_2_KLEIN_4B_VAE;
      const prompt = process.argv[5] || "a futuristic city at sunset, photorealistic";
      const outputDir = process.argv[6] || ".";
      console.log("Loading FLUX.2 [klein] split-layout model...");
      const modelId = await loadModel({
          modelSrc: diffusionModelSrc,
          modelType: "diffusion",
          modelConfig: {
              device: "gpu",
              threads: 4,
              llmModelSrc,
              vaeModelSrc,
          },
          onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
      });
      console.log(`Model loaded: ${modelId}`);
      console.log(`\nGenerating: "${prompt}"`);
      const { progressStream, outputs, stats } = diffusion({
          modelId,
          prompt,
          width: 512,
          height: 512,
          steps: 20,
          guidance: 3.5,
          cfg_scale: 1,
          seed: -1,
      });
      for await (const { step, totalSteps } of progressStream) {
          process.stdout.write(`\rStep ${step}/${totalSteps}`);
      }
      console.log();
      const buffers = await outputs;
      for (let i = 0; i < buffers.length; i++) {
          const outputPath = path.join(outputDir, `flux2_${i}.png`);
          fs.writeFileSync(outputPath, buffers[i]);
          console.log(`Saved: ${outputPath}`);
      }
      console.log("\nStats:", await stats);
      await unloadModel({ modelId, clearStorage: false });
      console.log("Done.");
      process.exit(0);
      ```
    </WrapCode>
  </Tab>

  <Tab value="ts" label="TypeScript">
    <WrapCode>
      ```ts file=<rootDir>/packages/sdk/examples/diffusion-flux2-klein.ts title="diffusion-flux2-klein.ts" lineNumbers
      import { loadModel, unloadModel, diffusion, FLUX_2_KLEIN_4B_Q4_0, FLUX_2_KLEIN_4B_VAE, QWEN3_4B_Q4_K_M } from "@qvac/sdk";
      import fs from "fs";
      import path from "path";

      // FLUX.2 [klein] uses a split-layout: separate diffusion model + LLM text encoder + VAE
      const diffusionModelSrc = process.argv[2] || FLUX_2_KLEIN_4B_Q4_0;
      const llmModelSrc = process.argv[3] || QWEN3_4B_Q4_K_M;
      const vaeModelSrc = process.argv[4] || FLUX_2_KLEIN_4B_VAE;
      const prompt = process.argv[5] || "a futuristic city at sunset, photorealistic";
      const outputDir = process.argv[6] || ".";

      console.log("Loading FLUX.2 [klein] split-layout model...");

      const modelId = await loadModel({
        modelSrc: diffusionModelSrc,
        modelType: "diffusion",
        modelConfig: {
          device: "gpu",
          threads: 4,
          llmModelSrc,
          vaeModelSrc,
        },
        onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
      });
      console.log(`Model loaded: ${modelId}`);

      console.log(`\nGenerating: "${prompt}"`);

      const { progressStream, outputs, stats } = diffusion({
        modelId,
        prompt,
        width: 512,
        height: 512,
        steps: 20,
        guidance: 3.5,
        cfg_scale: 1,
        seed: -1,
      });

      for await (const { step, totalSteps } of progressStream) {
        process.stdout.write(`\rStep ${step}/${totalSteps}`);
      }
      console.log();

      const buffers = await outputs;
      for (let i = 0; i < buffers.length; i++) {
        const outputPath = path.join(outputDir, `flux2_${i}.png`);
        fs.writeFileSync(outputPath, buffers[i]!);
        console.log(`Saved: ${outputPath}`);
      }

      console.log("\nStats:", await stats);
      await unloadModel({ modelId, clearStorage: false });
      console.log("Done.");
      process.exit(0);
      ```
    </WrapCode>
  </Tab>
</Tabs>

### Stable Diffusion

The following script shows a minimal text-to-image generation example using a single all-in-one SD 2.1 model:

<Tabs>
  <Tab value="js" label="JavaScript" default>
    <WrapCode>
      ```js file=<rootDir>/packages/sdk/dist/examples/diffusion-simple.js title="diffusion-simple.js" lineNumbers
      import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0 } from "@qvac/sdk";
      import fs from "fs";
      // Minimal diffusion example — single GGUF model, no companion files needed.
      // Works with SD 1.x / 2.x all-in-one models.
      const modelSrc = process.argv[2] || SD_V2_1_1B_Q8_0;
      const prompt = process.argv[3] || "a photo of a cat sitting on a windowsill";
      const modelId = await loadModel({
          modelSrc,
          modelType: "diffusion",
          modelConfig: { prediction: "v" },
      });
      const { outputs } = diffusion({ modelId, prompt });
      const buffers = await outputs;
      fs.writeFileSync("output.png", buffers[0]);
      console.log("Saved: output.png");
      await unloadModel({ modelId, clearStorage: false });
      process.exit(0);
      ```
    </WrapCode>
  </Tab>

  <Tab value="ts" label="TypeScript">
    <WrapCode>
      ```ts file=<rootDir>/packages/sdk/examples/diffusion-simple.ts title="diffusion-simple.ts" lineNumbers
      import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0 } from "@qvac/sdk";
      import fs from "fs";

      // Minimal diffusion example — single GGUF model, no companion files needed.
      // Works with SD 1.x / 2.x all-in-one models.
      const modelSrc = process.argv[2] || SD_V2_1_1B_Q8_0;
      const prompt = process.argv[3] || "a photo of a cat sitting on a windowsill";

      const modelId = await loadModel({
        modelSrc,
        modelType: "diffusion",
        modelConfig: { prediction: "v" },
      });

      const { outputs } = diffusion({ modelId, prompt });
      const buffers = await outputs;

      fs.writeFileSync("output.png", buffers[0]!);
      console.log("Saved: output.png");

      await unloadModel({ modelId, clearStorage: false });
      process.exit(0);
      ```
    </WrapCode>
  </Tab>
</Tabs>

### Image-to-image

Pass `init_image` to transform an existing image guided by a text prompt. Behavior depends on the model family:

* **FLUX.2**: in-context conditioning. Requires `prediction: "flux2_flow"` in `modelConfig` at `loadModel()` time; `strength` is ignored on this path.
* **SD / SDXL / SD3**: SDEdit-style. Use `strength` to control how much the source is preserved (`0` = keep source, `1` = ignore source).
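The per-family difference can be captured in a small helper that builds only the image-to-image fields for `diffusion()`. This is a hypothetical sketch, not SDK code, and the default `strength` of `0.75` is an arbitrary placeholder rather than a documented default.

```ts
// Hypothetical helper (not part of @qvac/sdk): image-to-image fields per family.
type Img2ImgFamily = "flux2" | "sd";

function img2imgFields(family: Img2ImgFamily, init_image: Uint8Array, strength = 0.75) {
  if (family === "flux2") {
    // FLUX.2 uses in-context conditioning and ignores `strength`, so omit it.
    return { init_image };
  }
  // SDEdit-style families: clamp to [0, 1], where 0 keeps the source
  // and 1 ignores it.
  return { init_image, strength: Math.min(1, Math.max(0, strength)) };
}
```

The result would be spread into the `diffusion()` parameters next to `modelId` and `prompt`.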

The following script loads FLUX.2-klein in split-layout and transforms an input image using in-context conditioning (`prediction: "flux2_flow"`):

<Tabs>
  <Tab value="js" label="JavaScript" default>
    <WrapCode>
      ```js file=<rootDir>/packages/sdk/dist/examples/diffusion-flux2-klein-img2img.js title="diffusion-flux2-klein-img2img.js" lineNumbers
      import { loadModel, unloadModel, diffusion, FLUX_2_KLEIN_4B_Q4_0, FLUX_2_KLEIN_4B_VAE, QWEN3_4B_Q4_K_M, } from "@qvac/sdk";
      import fs from "fs";
      import path from "path";
      // img2img with FLUX.2 [klein] split-layout — uses in-context conditioning ("flux2_flow").
      const inputPath = process.argv[2];
      const prompt = process.argv[3] || "oil painting style, vibrant colors";
      const outputDir = process.argv[4] || ".";
      const diffusionModelSrc = process.argv[5] || FLUX_2_KLEIN_4B_Q4_0;
      const llmModelSrc = process.argv[6] || QWEN3_4B_Q4_K_M;
      const vaeModelSrc = process.argv[7] || FLUX_2_KLEIN_4B_VAE;
      if (!inputPath) {
          console.error("❌ Error: input image path is required");
          console.error("Usage: bun run bare:example dist/examples/diffusion-flux2-klein-img2img.js <inputImage> [prompt] [outputDir] [diffusionModelSrc] [llmModelSrc] [vaeModelSrc]");
          process.exit(1);
      }
      try {
          console.log("Loading FLUX.2 [klein] split-layout model...");
          const modelId = await loadModel({
              modelSrc: diffusionModelSrc,
              modelType: "diffusion",
              modelConfig: {
                  device: "gpu",
                  threads: 4,
                  llmModelSrc,
                  vaeModelSrc,
                  prediction: "flux2_flow",
              },
              onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
          });
          console.log(`Model loaded: ${modelId}`);
          const init_image = new Uint8Array(fs.readFileSync(inputPath));
          console.log(`\nTransforming "${inputPath}" with prompt: "${prompt}"`);
          const { progressStream, outputs, stats } = diffusion({
              modelId,
              prompt,
              init_image,
              steps: 20,
              guidance: 3.5,
              cfg_scale: 1,
              seed: -1,
          });
          for await (const { step, totalSteps } of progressStream) {
              process.stdout.write(`\rStep ${step}/${totalSteps}`);
          }
          console.log();
          const buffers = await outputs;
          for (let i = 0; i < buffers.length; i++) {
              const outputPath = path.join(outputDir, `flux2_img2img_${i}.png`);
              fs.writeFileSync(outputPath, buffers[i]);
              console.log(`Saved: ${outputPath}`);
          }
          console.log("\nStats:", await stats);
          await unloadModel({ modelId, clearStorage: false });
          console.log("Done.");
          process.exit(0);
      }
      catch (error) {
          console.error("❌ Error:", error);
          process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>

  <Tab value="ts" label="TypeScript">
    <WrapCode>
      ```ts file=<rootDir>/packages/sdk/examples/diffusion-flux2-klein-img2img.ts title="diffusion-flux2-klein-img2img.ts" lineNumbers
      import {
        loadModel,
        unloadModel,
        diffusion,
        FLUX_2_KLEIN_4B_Q4_0,
        FLUX_2_KLEIN_4B_VAE,
        QWEN3_4B_Q4_K_M,
      } from "@qvac/sdk";
      import fs from "fs";
      import path from "path";

      // img2img with FLUX.2 [klein] split-layout — uses in-context conditioning ("flux2_flow").

      const inputPath = process.argv[2];
      const prompt = process.argv[3] || "oil painting style, vibrant colors";
      const outputDir = process.argv[4] || ".";
      const diffusionModelSrc = process.argv[5] || FLUX_2_KLEIN_4B_Q4_0;
      const llmModelSrc = process.argv[6] || QWEN3_4B_Q4_K_M;
      const vaeModelSrc = process.argv[7] || FLUX_2_KLEIN_4B_VAE;

      if (!inputPath) {
        console.error("❌ Error: input image path is required");
        console.error(
          "Usage: bun run bare:example dist/examples/diffusion-flux2-klein-img2img.js <inputImage> [prompt] [outputDir] [diffusionModelSrc] [llmModelSrc] [vaeModelSrc]",
        );
        process.exit(1);
      }

      try {
        console.log("Loading FLUX.2 [klein] split-layout model...");
        const modelId = await loadModel({
          modelSrc: diffusionModelSrc,
          modelType: "diffusion",
          modelConfig: {
            device: "gpu",
            threads: 4,
            llmModelSrc,
            vaeModelSrc,
            prediction: "flux2_flow",
          },
          onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
        });
        console.log(`Model loaded: ${modelId}`);

        const init_image = new Uint8Array(fs.readFileSync(inputPath));
        console.log(`\nTransforming "${inputPath}" with prompt: "${prompt}"`);

        const { progressStream, outputs, stats } = diffusion({
          modelId,
          prompt,
          init_image,
          steps: 20,
          guidance: 3.5,
          cfg_scale: 1,
          seed: -1,
        });

        for await (const { step, totalSteps } of progressStream) {
          process.stdout.write(`\rStep ${step}/${totalSteps}`);
        }
        console.log();

        const buffers = await outputs;
        for (let i = 0; i < buffers.length; i++) {
          const outputPath = path.join(outputDir, `flux2_img2img_${i}.png`);
          fs.writeFileSync(outputPath, buffers[i]!);
          console.log(`Saved: ${outputPath}`);
        }

        console.log("\nStats:", await stats);
        await unloadModel({ modelId, clearStorage: false });
        console.log("Done.");
        process.exit(0);
      } catch (error) {
        console.error("❌ Error:", error);
        process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>
</Tabs>

### Upscaling

#### Post-processing

Pass `upscale` to `diffusion()` to upscale generated images in the same call. Requires `modelConfig.upscaler = { type: "esrgan", model_src, tile_size? }` at `loadModel()` time. Behavior depends on the value passed:

* `true` (or `{}` / `{ repeats: 1 }`): single pass at the model's native scale factor (e.g. `x4` for RealESRGAN\_x4plus).
* `{ repeats: N }`: N sequential passes — each pass multiplies the output dimensions by the model's scale factor (e.g. `repeats: 2` with an `x4` model → `x16`).
* When `batch_count > 1`, every output image is upscaled independently.
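Because each pass multiplies both dimensions by the model's scale factor, the final size grows as the factor raised to the power of `repeats`. The helper below is a hypothetical illustration of that arithmetic, not an SDK function.

```ts
// Hypothetical helper: output size after `repeats` sequential ESRGAN passes.
function upscaledSize(width: number, height: number, scale: number, repeats = 1) {
  const factor = Math.pow(scale, repeats); // e.g. 4^2 = 16
  return { width: width * factor, height: height * factor, factor };
}

const once = upscaledSize(128, 128, 4);     // 512 x 512 (x4)
const twice = upscaledSize(128, 128, 4, 2); // 2048 x 2048 (x16)
```

Pixel count grows with the square of that factor, which is one reason to keep the source size small when using `repeats`.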

The following script loads SD 2.1 with an ESRGAN upscaler and generates both a single-pass (`x4`) and a two-pass (`x16`) upscale from the same prompt:

<Tabs>
  <Tab value="js" label="JavaScript" default>
    <WrapCode>
      ```js file=<rootDir>/packages/sdk/dist/examples/diffusion-esrgan-upscale.js title="diffusion-esrgan-upscale.js" lineNumbers
      import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0, REALESRGAN_X4PLUS_ANIME_6B, } from "@qvac/sdk";
      import fs from "fs";
      import path from "path";
      // ESRGAN upscale example.
      //
      // Usage:
      //   bun run examples/diffusion-esrgan-upscale.ts [esrganSrc] [prompt] [outputDir]
      const esrganArg = process.argv[2];
      const promptArg = process.argv[3];
      const outputDirArg = process.argv[4];
      const esrganModelSrc = esrganArg ??
          REALESRGAN_X4PLUS_ANIME_6B;
      const prompt = promptArg ??
          "an illustrated red fox portrait, clean line art, soft watercolor background, detailed fur, crisp eyes";
      const negative_prompt = "blurry, low quality, watermark, text";
      const outputDir = outputDirArg ?? ".";
      const seed = 42;
      try {
          console.log("Loading SD 2.1 + ESRGAN upscaler...");
          const modelId = await loadModel({
              modelSrc: SD_V2_1_1B_Q8_0,
              modelType: "diffusion",
              modelConfig: {
                  prediction: "v",
                  upscaler: {
                      type: "esrgan",
                      model_src: esrganModelSrc,
                      tile_size: 128,
                  },
              },
              onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
          });
          console.log(`Model loaded: ${modelId}`);
          // Source size is intentionally small — each ESRGAN repeat multiplies dimensions.
          const baseParams = {
              modelId,
              prompt,
              negative_prompt,
              width: 128,
              height: 128,
              steps: 5,
              cfg_scale: 7.5,
              seed,
          };
          console.log(`\nGenerating ESRGAN x4 upscale: "${prompt}"`);
          const single = diffusion({ ...baseParams, upscale: true });
          for await (const { step, totalSteps } of single.progressStream) {
              process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
          }
          console.log();
          const singleBuffers = await single.outputs;
          for (let i = 0; i < singleBuffers.length; i++) {
              const out = path.join(outputDir, `sd2_esrgan_x4_seed${seed}_${i}.png`);
              fs.writeFileSync(out, singleBuffers[i]);
              console.log(`Saved: ${out}`);
          }
          console.log("Stats:", await single.stats);
          console.log("\nGenerating ESRGAN two-pass x16 upscale...");
          const twoPass = diffusion({ ...baseParams, upscale: { repeats: 2 } });
          for await (const { step, totalSteps } of twoPass.progressStream) {
              process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
          }
          console.log();
          const twoPassBuffers = await twoPass.outputs;
          for (let i = 0; i < twoPassBuffers.length; i++) {
              const out = path.join(outputDir, `sd2_esrgan_x16_seed${seed}_${i}.png`);
              fs.writeFileSync(out, twoPassBuffers[i]);
              console.log(`Saved: ${out}`);
          }
          console.log("Stats:", await twoPass.stats);
          await unloadModel({ modelId, clearStorage: false });
          console.log("Done.");
          process.exit(0);
      }
      catch (error) {
          console.error("❌ Error:", error);
          process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>

  <Tab value="ts" label="TypeScript">
    <WrapCode>
      ```ts file=<rootDir>/packages/sdk/examples/diffusion-esrgan-upscale.ts title="diffusion-esrgan-upscale.ts" lineNumbers
      import {
        loadModel,
        unloadModel,
        diffusion,
        SD_V2_1_1B_Q8_0,
        REALESRGAN_X4PLUS_ANIME_6B,
      } from "@qvac/sdk";
      import fs from "fs";
      import path from "path";

      // ESRGAN upscale example.
      //
      // Usage:
      //   bun run examples/diffusion-esrgan-upscale.ts [esrganSrc] [prompt] [outputDir]

      const esrganArg: string | undefined = process.argv[2];
      const promptArg: string | undefined = process.argv[3];
      const outputDirArg: string | undefined = process.argv[4];

      const esrganModelSrc =
        esrganArg ??
        REALESRGAN_X4PLUS_ANIME_6B;

      const prompt =
        promptArg ??
        "an illustrated red fox portrait, clean line art, soft watercolor background, detailed fur, crisp eyes";
      const negative_prompt = "blurry, low quality, watermark, text";
      const outputDir = outputDirArg ?? ".";
      const seed = 42;

      try {
        console.log("Loading SD 2.1 + ESRGAN upscaler...");
        const modelId = await loadModel({
          modelSrc: SD_V2_1_1B_Q8_0,
          modelType: "diffusion",
          modelConfig: {
            prediction: "v",
            upscaler: {
              type: "esrgan",
              model_src: esrganModelSrc,
              tile_size: 128,
            },
          },
          onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
        });
        console.log(`Model loaded: ${modelId}`);

        // Source size is intentionally small — each ESRGAN repeat multiplies dimensions.
        const baseParams = {
          modelId,
          prompt,
          negative_prompt,
          width: 128,
          height: 128,
          steps: 5,
          cfg_scale: 7.5,
          seed,
        };

        console.log(`\nGenerating ESRGAN x4 upscale: "${prompt}"`);
        const single = diffusion({ ...baseParams, upscale: true });
        for await (const { step, totalSteps } of single.progressStream) {
          process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
        }
        console.log();

        const singleBuffers = await single.outputs;
        for (let i = 0; i < singleBuffers.length; i++) {
          const out = path.join(outputDir, `sd2_esrgan_x4_seed${seed}_${i}.png`);
          fs.writeFileSync(out, singleBuffers[i]!);
          console.log(`Saved: ${out}`);
        }
        console.log("Stats:", await single.stats);

        console.log("\nGenerating ESRGAN two-pass x16 upscale...");
        const twoPass = diffusion({ ...baseParams, upscale: { repeats: 2 } });
        for await (const { step, totalSteps } of twoPass.progressStream) {
          process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
        }
        console.log();

        const twoPassBuffers = await twoPass.outputs;
        for (let i = 0; i < twoPassBuffers.length; i++) {
          const out = path.join(outputDir, `sd2_esrgan_x16_seed${seed}_${i}.png`);
          fs.writeFileSync(out, twoPassBuffers[i]!);
          console.log(`Saved: ${out}`);
        }
        console.log("Stats:", await twoPass.stats);

        await unloadModel({ modelId, clearStorage: false });
        console.log("Done.");
        process.exit(0);
      } catch (error) {
        console.error("❌ Error:", error);
        process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>
</Tabs>

#### Standalone

The SDK also exposes `upscale()` — a standalone path that upscales any PNG/JPEG image without running diffusion. Load the ESRGAN model directly with `modelConfig.mode: "upscale"`:

```ts
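import fs from "fs";
import { loadModel, upscale, REALESRGAN_X4PLUS_ANIME_6B } from "@qvac/sdk";

// Any PNG or JPEG bytes work as input; "input.png" is a placeholder path.
const pngBytes = new Uint8Array(fs.readFileSync("input.png"));

// The ESRGAN model is the top-level modelSrc, as in the other examples.
const modelId = await loadModel({
  modelSrc: REALESRGAN_X4PLUS_ANIME_6B,
  modelType: "diffusion",
  modelConfig: { mode: "upscale", upscaler: { tile_size: 128 } },
});
const { outputs } = upscale({ modelId, image: pngBytes, repeats: 2 });
const [upscaledPng] = await outputs;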
```

`tile_size` defaults to 128. Increase it to reduce tile seams on large inputs, at the cost of more memory per pass.
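As a rough illustration of the trade-off, the number of tiles (and so the number of tile boundaries) drops quickly as `tile_size` grows. The sketch assumes a plain grid of non-overlapping tiles; the engine's actual tiling, including any overlap or padding, isn't described here and may differ.

```ts
// Hypothetical estimate: tiles needed to cover an image, assuming a simple
// non-overlapping grid. The real engine may tile differently.
function tileCount(width: number, height: number, tileSize: number): number {
  return Math.ceil(width / tileSize) * Math.ceil(height / tileSize);
}

const smallTiles = tileCount(512, 512, 128); // 4 * 4 = 16 tiles
const largeTiles = tileCount(512, 512, 256); // 2 * 2 = 4 tiles
```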

<Callout type="success">
  **Tip:** all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see [SDK quickstart](/quickstart).
</Callout>
