Image generation
Text-to-image and image-to-image generation using a customized Diffusion engine.
Overview
Image generation runs on a customized Diffusion engine (qvac-ext-stable-diffusion.cpp). Load a supported model using modelType: "diffusion". Then, provide a text prompt describing the image to generate.
For image-to-image, also pass init_image (a Uint8Array of PNG or JPEG bytes) — the model transforms the input guided by the prompt instead of starting from noise. diffusion() returns one or more PNG images as Uint8Array buffers. Use progressStream to track generation progress step-by-step.
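Since init_image takes raw PNG or JPEG bytes and diffusion() returns PNG buffers, it can be useful to sanity-check a buffer before passing it along. The following is a minimal sketch and not part of the SDK: detectImageFormat is a hypothetical helper that only inspects the standard file signatures (PNG begins with the bytes 89 50 4E 47 0D 0A 1A 0A, JPEG begins with FF D8 FF).

```typescript
// Hypothetical helper (not part of @qvac/sdk): classify a byte buffer by its
// file signature before using it as init_image or as upscale() input.
type ImageFormat = "png" | "jpeg" | "unknown";

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];

// True when `bytes` begins with every byte of `signature`, in order.
function startsWith(bytes: Uint8Array, signature: number[]): boolean {
  if (bytes.length < signature.length) return false;
  return signature.every((value, index) => bytes[index] === value);
}

function detectImageFormat(bytes: Uint8Array): ImageFormat {
  if (startsWith(bytes, PNG_SIGNATURE)) return "png";
  if (startsWith(bytes, JPEG_SIGNATURE)) return "jpeg";
  return "unknown";
}
```

A check like this only confirms the header, not that the rest of the file decodes correctly.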
For higher-resolution outputs, you can chain a Real-ESRGAN upscaler. Either attach it as a post-processing step at load time and trigger it per generation via diffusion({ upscale }), or load an ESRGAN model standalone with modelConfig.mode: "upscale" and call upscale() on any PNG/JPEG buffer — generated or not. Both paths return PNG Uint8Array buffers and accept a repeats option that compounds the scale factor across sequential passes.
Functions
Use the following sequence of function calls:
1. loadModel()
2. diffusion() or upscale()
3. unloadModel()
For how to use each function, see SDK — API reference.
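One way to keep this load, run, unload sequence intact when generation fails is to put the unload step in a finally block. The sketch below is an illustration only: withModel is a hypothetical helper, not an SDK function, and it takes the load and unload steps as plain callbacks so it makes no assumption about the SDK's types.

```typescript
// Hypothetical helper (not part of @qvac/sdk): run `work` against a loaded
// model and always unload it afterwards, even if `work` throws.
async function withModel<Id, T>(
  load: () => Promise<Id>,
  unload: (modelId: Id) => Promise<void>,
  work: (modelId: Id) => Promise<T>,
): Promise<T> {
  const modelId = await load();
  try {
    return await work(modelId);
  } finally {
    // Runs on success and on failure, so the model is not left loaded.
    await unload(modelId);
  }
}
```

With the SDK, load would wrap a loadModel({ ... }) call and unload would wrap unloadModel({ modelId, clearStorage: false }), as in the examples on this page.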
Models
Supported model families and their file layouts:
- FLUX.2-klein: split layout: diffusion model *.gguf + LLM text encoder *.gguf (via llmModelSrc) + VAE *.safetensors (via vaeModelSrc).
- SD1.x, SD2.x: single all-in-one *.gguf file. No companion files needed.
- SDXL, SD3: may require separate CLIP/T5 text encoder files (clipLModelSrc, clipGModelSrc, t5XxlModelSrc) in modelConfig depending on the model variant.
- ESRGAN: post-generation upscaler, *.pth format. Two ways to load:
  - Paired with a diffusion model: set modelConfig.upscaler.model_src at loadModel() time alongside the diffusion modelSrc.
  - Standalone: pass the ESRGAN model as the top-level modelSrc with modelConfig.mode: "upscale".
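The diffusion-family layouts above can be summarized as a small lookup table. This is a sketch under stated assumptions, not an SDK API: the family labels, COMPANION_LAYOUTS, and missingCompanions are hypothetical names, and the SDXL/SD3 encoder fields are treated as optional because the source only says they may be required depending on the variant.

```typescript
// Hypothetical lookup (not an SDK API): which companion modelConfig fields
// each diffusion family uses, based on the layouts listed above.
type ModelFamily = "flux2-klein" | "sd1.x" | "sd2.x" | "sdxl" | "sd3";

interface CompanionLayout {
  required: readonly string[];
  optional: readonly string[];
}

const TEXT_ENCODER_FIELDS = ["clipLModelSrc", "clipGModelSrc", "t5XxlModelSrc"] as const;

const COMPANION_LAYOUTS: Record<ModelFamily, CompanionLayout> = {
  "flux2-klein": { required: ["llmModelSrc", "vaeModelSrc"], optional: [] },
  "sd1.x": { required: [], optional: [] },
  "sd2.x": { required: [], optional: [] },
  // Variant-dependent: some SDXL/SD3 files may need none or only some of these.
  sdxl: { required: [], optional: TEXT_ENCODER_FIELDS },
  sd3: { required: [], optional: TEXT_ENCODER_FIELDS },
};

// Returns the required companion fields that are absent from modelConfig.
function missingCompanions(
  family: ModelFamily,
  modelConfig: Record<string, unknown>,
): string[] {
  return COMPANION_LAYOUTS[family].required.filter(
    (field) => modelConfig[field] === undefined,
  );
}
```

A check like this could run before loadModel() to catch a split-layout model that is missing its text encoder or VAE.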
For models available as constants, see SDK — Models.
On upscaling: ESRGAN can be used standalone via modelConfig.mode: "upscale" + upscale(), or optionally paired with any supported diffusion family to upscale generated images in the same call via diffusion({ upscale }). Available constants: REALESRGAN_X4PLUS_ANIME_6B, REALESRGAN_X4PLUS.
Examples
FLUX.2-klein
The following script shows text-to-image generation using FLUX.2-klein with its split-layout model (separate diffusion model, LLM text encoder, and VAE):
import { loadModel, unloadModel, diffusion, FLUX_2_KLEIN_4B_Q4_0, FLUX_2_KLEIN_4B_VAE, QWEN3_4B_Q4_K_M } from "@qvac/sdk";
import fs from "fs";
import path from "path";
// FLUX.2 [klein] uses a split-layout: separate diffusion model + LLM text encoder + VAE
const diffusionModelSrc = process.argv[2] || FLUX_2_KLEIN_4B_Q4_0;
const llmModelSrc = process.argv[3] || QWEN3_4B_Q4_K_M;
const vaeModelSrc = process.argv[4] || FLUX_2_KLEIN_4B_VAE;
const prompt = process.argv[5] || "a futuristic city at sunset, photorealistic";
const outputDir = process.argv[6] || ".";
console.log("Loading FLUX.2 [klein] split-layout model...");
const modelId = await loadModel({
modelSrc: diffusionModelSrc,
modelType: "diffusion",
modelConfig: {
device: "gpu",
threads: 4,
llmModelSrc,
vaeModelSrc,
},
onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
});
console.log(`Model loaded: ${modelId}`);
console.log(`\nGenerating: "${prompt}"`);
const { progressStream, outputs, stats } = diffusion({
modelId,
prompt,
width: 512,
height: 512,
steps: 20,
guidance: 3.5,
cfg_scale: 1,
seed: -1,
});
for await (const { step, totalSteps } of progressStream) {
process.stdout.write(`\rStep ${step}/${totalSteps}`);
}
console.log();
const buffers = await outputs;
for (let i = 0; i < buffers.length; i++) {
const outputPath = path.join(outputDir, `flux2_${i}.png`);
fs.writeFileSync(outputPath, buffers[i]);
console.log(`Saved: ${outputPath}`);
}
console.log("\nStats:", await stats);
await unloadModel({ modelId, clearStorage: false });
console.log("Done.");
process.exit(0);
Stable Diffusion
The following script shows a minimal text-to-image generation example using a single all-in-one SD 2.1 model:
import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0 } from "@qvac/sdk";
import fs from "fs";
// Minimal diffusion example — single GGUF model, no companion files needed.
// Works with SD 1.x / 2.x all-in-one models.
const modelSrc = process.argv[2] || SD_V2_1_1B_Q8_0;
const prompt = process.argv[3] || "a photo of a cat sitting on a windowsill";
const modelId = await loadModel({
modelSrc,
modelType: "diffusion",
modelConfig: { prediction: "v" },
});
const { outputs } = diffusion({ modelId, prompt });
const buffers = await outputs;
fs.writeFileSync("output.png", buffers[0]);
console.log("Saved: output.png");
await unloadModel({ modelId, clearStorage: false });
process.exit(0);
Image-to-image
Pass init_image to transform an existing image guided by a text prompt. Behavior depends on the model family:
- FLUX.2: in-context conditioning. Requires prediction: "flux2_flow" in modelConfig at loadModel() time; strength is ignored on this path.
- SD / SDXL / SD3: SDEdit-style. Use strength to control how much the source is preserved (0 = keep source, 1 = ignore source).
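To build intuition for strength on the SDEdit-style path: many SDEdit-style implementations add noise to the source in proportion to strength and then run only about steps × strength denoising steps. Whether this engine uses exactly that formula is an assumption, and sdeditStepsToRun below is a hypothetical helper for illustration, not an SDK function.

```typescript
// Hypothetical illustration (not an SDK API): approximate number of denoising
// steps an SDEdit-style img2img run performs for a given strength.
// Assumes the common floor(steps * strength) rule; the engine may differ.
function sdeditStepsToRun(steps: number, strength: number): number {
  // Clamp to the documented range: 0 = keep source, 1 = ignore source.
  const clamped = Math.min(1, Math.max(0, strength));
  return Math.floor(steps * clamped);
}
```

Under this rule, steps: 20 with strength: 0.5 runs about 10 steps, strength: 0 runs none (the source is kept), and strength: 1 runs all 20 (the source is effectively replaced by noise).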
The following script loads FLUX.2-klein in split-layout and transforms an input image using in-context conditioning (prediction: "flux2_flow"):
import { loadModel, unloadModel, diffusion, FLUX_2_KLEIN_4B_Q4_0, FLUX_2_KLEIN_4B_VAE, QWEN3_4B_Q4_K_M, } from "@qvac/sdk";
import fs from "fs";
import path from "path";
// img2img with FLUX.2 [klein] split-layout — uses in-context conditioning ("flux2_flow").
const inputPath = process.argv[2];
const prompt = process.argv[3] || "oil painting style, vibrant colors";
const outputDir = process.argv[4] || ".";
const diffusionModelSrc = process.argv[5] || FLUX_2_KLEIN_4B_Q4_0;
const llmModelSrc = process.argv[6] || QWEN3_4B_Q4_K_M;
const vaeModelSrc = process.argv[7] || FLUX_2_KLEIN_4B_VAE;
if (!inputPath) {
console.error("❌ Error: input image path is required");
console.error("Usage: bun run bare:example dist/examples/diffusion-flux2-klein-img2img.js <inputImage> [prompt] [outputDir] [diffusionModelSrc] [llmModelSrc] [vaeModelSrc]");
process.exit(1);
}
try {
console.log("Loading FLUX.2 [klein] split-layout model...");
const modelId = await loadModel({
modelSrc: diffusionModelSrc,
modelType: "diffusion",
modelConfig: {
device: "gpu",
threads: 4,
llmModelSrc,
vaeModelSrc,
prediction: "flux2_flow",
},
onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
});
console.log(`Model loaded: ${modelId}`);
const init_image = new Uint8Array(fs.readFileSync(inputPath));
console.log(`\nTransforming "${inputPath}" with prompt: "${prompt}"`);
const { progressStream, outputs, stats } = diffusion({
modelId,
prompt,
init_image,
steps: 20,
guidance: 3.5,
cfg_scale: 1,
seed: -1,
});
for await (const { step, totalSteps } of progressStream) {
process.stdout.write(`\rStep ${step}/${totalSteps}`);
}
console.log();
const buffers = await outputs;
for (let i = 0; i < buffers.length; i++) {
const outputPath = path.join(outputDir, `flux2_img2img_${i}.png`);
fs.writeFileSync(outputPath, buffers[i]);
console.log(`Saved: ${outputPath}`);
}
console.log("\nStats:", await stats);
await unloadModel({ modelId, clearStorage: false });
console.log("Done.");
process.exit(0);
}
catch (error) {
console.error("❌ Error:", error);
process.exit(1);
}
Upscaling
Post-processing
Pass upscale to diffusion() to upscale generated images in the same call. Requires modelConfig.upscaler = { type: "esrgan", model_src, tile_size? } at loadModel() time. Behavior depends on the value passed:
- true (or {} / { repeats: 1 }): single pass at the model's native scale factor (e.g. x4 for RealESRGAN_x4plus).
- { repeats: N }: N sequential passes; each pass multiplies the output dimensions by the model's scale factor (e.g. repeats: 2 with an x4 model → x16).
- When batch_count > 1, every output image is upscaled independently.
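The compounding described above is plain exponentiation: the total factor is the model's scale raised to the number of repeats. The sketch below works through that arithmetic. upscaleFactor and upscaledSize are hypothetical helpers, not SDK functions, and treating an omitted or false upscale value as "no upscaling" is an assumption.

```typescript
// Hypothetical helpers (not part of @qvac/sdk) that mirror the documented
// meaning of the `upscale` option.
type UpscaleOption = boolean | { repeats?: number };

// Total scale factor: modelScale ** repeats.
function upscaleFactor(option: UpscaleOption | undefined, modelScale: number): number {
  // Assumption: omitted or false means the image is not upscaled.
  if (option === undefined || option === false) return 1;
  const repeats = option === true ? 1 : (option.repeats ?? 1);
  return modelScale ** repeats;
}

// Final dimensions for a source image of width x height.
function upscaledSize(width: number, height: number, factor: number) {
  return { width: width * factor, height: height * factor };
}
```

For a 128×128 source and an x4 model, upscale: true gives 512×512 and upscale: { repeats: 2 } gives 2048×2048, so output size grows quickly with each repeat.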
The following script loads SD 2.1 with an ESRGAN upscaler and generates both a single-pass (x4) and a two-pass (x16) upscale from the same prompt:
import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0, REALESRGAN_X4PLUS_ANIME_6B, } from "@qvac/sdk";
import fs from "fs";
import path from "path";
// ESRGAN upscale example.
//
// Usage:
// bun run examples/diffusion-esrgan-upscale.ts [esrganSrc] [prompt] [outputDir]
const esrganArg = process.argv[2];
const promptArg = process.argv[3];
const outputDirArg = process.argv[4];
const esrganModelSrc = esrganArg ??
REALESRGAN_X4PLUS_ANIME_6B;
const prompt = promptArg ??
"an illustrated red fox portrait, clean line art, soft watercolor background, detailed fur, crisp eyes";
const negative_prompt = "blurry, low quality, watermark, text";
const outputDir = outputDirArg ?? ".";
const seed = 42;
try {
console.log("Loading SD 2.1 + ESRGAN upscaler...");
const modelId = await loadModel({
modelSrc: SD_V2_1_1B_Q8_0,
modelType: "diffusion",
modelConfig: {
prediction: "v",
upscaler: {
type: "esrgan",
model_src: esrganModelSrc,
tile_size: 128,
},
},
onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
});
console.log(`Model loaded: ${modelId}`);
// Source size is intentionally small — each ESRGAN repeat multiplies dimensions.
const baseParams = {
modelId,
prompt,
negative_prompt,
width: 128,
height: 128,
steps: 5,
cfg_scale: 7.5,
seed,
};
console.log(`\nGenerating ESRGAN x4 upscale: "${prompt}"`);
const single = diffusion({ ...baseParams, upscale: true });
for await (const { step, totalSteps } of single.progressStream) {
process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
}
console.log();
const singleBuffers = await single.outputs;
for (let i = 0; i < singleBuffers.length; i++) {
const out = path.join(outputDir, `sd2_esrgan_x4_seed${seed}_${i}.png`);
fs.writeFileSync(out, singleBuffers[i]);
console.log(`Saved: ${out}`);
}
console.log("Stats:", await single.stats);
console.log("\nGenerating ESRGAN two-pass x16 upscale...");
const twoPass = diffusion({ ...baseParams, upscale: { repeats: 2 } });
for await (const { step, totalSteps } of twoPass.progressStream) {
process.stdout.write(`\rStep ${step}/${totalSteps}\x1b[K`);
}
console.log();
const twoPassBuffers = await twoPass.outputs;
for (let i = 0; i < twoPassBuffers.length; i++) {
const out = path.join(outputDir, `sd2_esrgan_x16_seed${seed}_${i}.png`);
fs.writeFileSync(out, twoPassBuffers[i]);
console.log(`Saved: ${out}`);
}
console.log("Stats:", await twoPass.stats);
await unloadModel({ modelId, clearStorage: false });
console.log("Done.");
process.exit(0);
}
catch (error) {
console.error("❌ Error:", error);
process.exit(1);
}
Standalone
The SDK also exposes upscale() — a standalone path that upscales any PNG/JPEG image without running diffusion. Load the ESRGAN model directly with modelConfig.mode: "upscale":
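import { loadModel, unloadModel, upscale, REALESRGAN_X4PLUS_ANIME_6B } from "@qvac/sdk";
import fs from "fs";
// Any PNG/JPEG bytes work here; "input.png" is a placeholder path.
const pngBytes = new Uint8Array(fs.readFileSync("input.png"));
const modelId = await loadModel({
  modelSrc: REALESRGAN_X4PLUS_ANIME_6B,
  modelType: "diffusion",
  modelConfig: { mode: "upscale", upscaler: { tile_size: 128 } },
});
const { outputs } = upscale({ modelId, image: pngBytes, repeats: 2 });
const [upscaledPng] = await outputs;
await unloadModel({ modelId, clearStorage: false });
tile_size defaults to 128. Increase it to reduce tile seams on large inputs, at the cost of more memory per pass.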
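To see why a larger tile_size means fewer seams, it helps to count tiles. The sketch below assumes the upscaler splits the input into square, non-overlapping tiles of tile_size pixels. That is a simplification (tiled upscalers often overlap tiles), and tileCount is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical illustration (not an SDK API): number of tiles needed to cover
// an input image, assuming square, non-overlapping tiles.
function tileCount(width: number, height: number, tileSize = 128): number {
  if (tileSize <= 0) throw new RangeError("tileSize must be positive");
  // Partial tiles at the right and bottom edges still count as tiles.
  return Math.ceil(width / tileSize) * Math.ceil(height / tileSize);
}
```

Under this assumption, a 512×512 input needs 16 tiles at the default tile_size of 128 and only 4 tiles at 256, so there are fewer tile boundaries where seams can appear, while each tile costs more memory to process.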
Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.