@qvac/diffusion-cpp

Overview

Bare module that adds support for text-to-image generation in QVAC using qvac-ext-stable-diffusion.cpp as the inference engine.

Models

Supports FLUX.2-klein, SD1.x, SD2.x, SDXL, and SD3 model families.

FLUX.2-klein

Three separate components are required:

Diffusion model (flux-2-klein-4b-Q8_0.gguf) — the main image transformer.
Text encoder (Qwen3-4B-Q4_K_M.gguf) — Qwen3 4B in standard GGML Q4_K_M format.
VAE (flux2-vae.safetensors) — standard safetensors format, compatible as-is.

Model file reference — FLUX.2-klein 4B

Role	File	Source
Diffusion model	`flux-2-klein-4b-Q8_0.gguf`	leejet/FLUX.2-klein-4B-GGUF
Text encoder	`Qwen3-4B-Q4_K_M.gguf`	unsloth/Qwen3-4B-GGUF
VAE	`flux2-vae.safetensors`	black-forest-labs/FLUX.2-klein-4B

Stable Diffusion

Stable Diffusion 1.x / 2.x — all-in-one checkpoint as a single *.gguf file
Stable Diffusion XL — all-in-one *.gguf or split CLIP encoders
Stable Diffusion 3 — safetensors with separate CLIP encoders

Requirements

Memory: 16 GB unified memory on Apple Silicon, or 8 GB VRAM on GPU.
Bare >= v1.24

Installation

npm i @qvac/diffusion-cpp

Quickstart

If you don't have the Bare runtime, install it:

npm i -g bare

Create a new project:

mkdir qvac-diffusion-quickstart
cd qvac-diffusion-quickstart
npm init -y

Install dependencies:

npm i @qvac/diffusion-cpp bare-path bare-process bare-fs

Download the FLUX.2 [klein] 4B model files (~6.8 GB total):

mkdir -p models

curl -L -C - -o models/flux-2-klein-4b-Q8_0.gguf \
  https://huggingface.co/leejet/FLUX.2-klein-4B-GGUF/resolve/main/flux-2-klein-4b-Q8_0.gguf

curl -L -C - -o models/Qwen3-4B-Q4_K_M.gguf \
  https://huggingface.co/unsloth/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-Q4_K_M.gguf

curl -L -C - -o models/flux2-vae.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/resolve/main/vae/diffusion_pytorch_model.safetensors

Create index.js:

'use strict'

const path = require('bare-path')
const fs = require('bare-fs')
const process = require('bare-process')
const ImgStableDiffusion = require('@qvac/diffusion-cpp')

async function main () {
  const MODELS_DIR = path.resolve(__dirname, './models')

  const args = {
    logger: console,
    files: {
      model: path.join(MODELS_DIR, 'flux-2-klein-4b-Q8_0.gguf'),
      llm:   path.join(MODELS_DIR, 'Qwen3-4B-Q4_K_M.gguf'),
      vae:   path.join(MODELS_DIR, 'flux2-vae.safetensors')
    },
    config: { threads: 8 }
  }

  const model = new ImgStableDiffusion(args)
  await model.load()

  try {
    const images = []

    const response = await model.run({
      prompt: 'a majestic red fox in a snowy forest, golden light, photorealistic',
      steps: 20,
      width: 512,
      height: 512,
      guidance: 3.5,
      seed: 42
    })

    await response
      .onUpdate(data => {
        if (data instanceof Uint8Array) {
          images.push(data)
        } else if (typeof data === 'string') {
          try {
            const tick = JSON.parse(data)
            if ('step' in tick) process.stdout.write(`\rStep ${tick.step}/${tick.total}`)
          } catch (_) {}
        }
      })
      .await()

    console.log('\n')

    if (images.length > 0) {
      fs.writeFileSync('output.png', images[0])
      console.log('Saved → output.png')
    }
  } catch (error) {
    console.error('Error occurred:', error.message || error)
  } finally {
    await model.unload()
  }
}

main().catch(error => {
  console.error('Fatal error:', error.message || error)
  process.exit(1)
})

Run index.js:

bare index.js

Usage

1. Import the model class

const ImgStableDiffusion = require('@qvac/diffusion-cpp')

2. Create the `args` object

const path = require('bare-path')

const MODELS_DIR = path.resolve(__dirname, './models')
const args = {
  logger: console,
  files: {
    model: path.join(MODELS_DIR, 'flux-2-klein-4b-Q8_0.gguf'),
    llm:   path.join(MODELS_DIR, 'Qwen3-4B-Q4_K_M.gguf'),
    vae:   path.join(MODELS_DIR, 'flux2-vae.safetensors')
  },
  config: { threads: 8 },
  opts: { stats: true }
}

Property	Required	Description
`files`	✅	Object of absolute paths to model files (see below)
`files.model`	✅	Absolute path to diffusion model file (diffusion-only GGUF for FLUX.2; all-in-one for SD1.x/2.x)
`files.clipL`	—	Absolute path to separate CLIP-L text encoder (SD3)
`files.clipG`	—	Absolute path to separate CLIP-G text encoder (SDXL / SD3)
`files.t5Xxl`	—	Absolute path to separate T5-XXL text encoder (SD3)
`files.llm`	—	Absolute path to Qwen3 LLM text encoder (FLUX.2 [klein])
`files.vae`	—	Absolute path to separate VAE file
`files.esrgan`	—	Absolute path to ESRGAN upscaler model for post-generation upscale
`config`	—	Native backend configuration object (see next section)
`logger`	—	Logger instance for JS wrapper logs (e.g. `console`)
`opts`	—	Additional options (e.g. `{ stats: true }`)
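The table above implies two things worth checking before calling load(): `files.model` must be present, and every entry must be an absolute path. The following is a minimal sketch of such a pre-flight check. `checkFiles` and its `flux2` option are hypothetical helpers, not part of @qvac/diffusion-cpp, and the absolute-path test assumes POSIX or Windows drive-letter paths.

```javascript
// Hypothetical pre-flight check; not part of @qvac/diffusion-cpp.
// Returns a list of human-readable problems (empty when the object looks fine).
function checkFiles (files, { flux2 = false } = {}) {
  const problems = []
  // `model` is always required; FLUX.2 [klein] additionally needs `llm` and `vae`.
  const required = flux2 ? ['model', 'llm', 'vae'] : ['model']

  for (const key of required) {
    if (!files || typeof files[key] !== 'string' || files[key] === '') {
      problems.push(`files.${key} is missing`)
    }
  }

  // Treat "/..." (POSIX) and "C:\..." or "C:/..." (Windows) as absolute.
  const isAbsolute = p => /^(\/|[A-Za-z]:[\\/])/.test(p)

  for (const [key, value] of Object.entries(files || {})) {
    if (typeof value === 'string' && value !== '' && !isAbsolute(value)) {
      problems.push(`files.${key} is not an absolute path: ${value}`)
    }
  }
  return problems
}
```

Calling `checkFiles(args.files, { flux2: true })` before constructing the model would surface a relative path or a missing text encoder as a readable message rather than a native load failure.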

Native C++ logs are process-global. Configure native log routing once with require('@qvac/diffusion-cpp/addonLogging').setLogger(...).

3. Configure the native backend (`args.config`)

config is a field on the args object built in step 2 — there is no separate constructor argument. The native backend reads it during load().

args.config = {
  threads: 8  // CPU threads for tensor operations (Metal handles GPU automatically)
}

Config values are coerced to strings internally. Generation parameters (prompt, steps, seed, etc.) are JSON-serialized with their native types preserved.

Parameter	Type	Default	Description
`threads`	number	auto	Number of CPU threads for model loading and CPU ops
`type`	`'f32'` \| `'f16'` \| `'q4_0'` \| `'q8_0'` \| …	auto	Override weight quantisation type
`rng`	`'cpu'` \| `'cuda'` \| `'std_default'`	`'cuda'`	RNG backend (`'cuda'` = philox RNG — not GPU-specific despite the name; recommended)
`clip_on_cpu`	`true` \| `false`	`false`	Force CLIP encoder to run on CPU
`vae_on_cpu`	`true` \| `false`	`false`	Force VAE to run on CPU
`flash_attn`	`true` \| `false`	`false`	Enable flash attention (reduces memory)
`upscaler_tile_size`	number	`128`	ESRGAN upscaler tile size
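The note above that config values are coerced to strings, while generation parameters keep their native types, can be illustrated with a short sketch. `stringifyConfig` is a hypothetical stand-in for whatever the wrapper does internally; the library's real conversion code is not shown in this document.

```javascript
// Hypothetical illustration only; the library's internal conversion is not shown here.
function stringifyConfig (config) {
  const out = {}
  for (const [key, value] of Object.entries(config)) {
    out[key] = String(value) // 8 -> '8', true -> 'true'
  }
  return out
}

const config = stringifyConfig({ threads: 8, flash_attn: true, rng: 'cuda' })
// config is { threads: '8', flash_attn: 'true', rng: 'cuda' }

// Generation parameters, by contrast, survive a JSON round trip with their types intact.
const params = JSON.parse(JSON.stringify({ steps: 20, seed: 42, vae_tiling: false }))
// typeof params.steps === 'number', typeof params.vae_tiling === 'boolean'
```

If the coercion works as sketched, `flash_attn: true` and `flash_attn: 'true'` would reach the backend as the same string, which is why the table lists boolean options as `true` | `false` rather than as a strict type.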

4. Create a model instance

const model = new ImgStableDiffusion(args)

The constructor takes a single object containing files, config, logger, and opts. It stores configuration only — no memory is allocated yet.

5. Load the model

await model.load()

This creates the native sd_ctx_t and loads all weights into memory. It can take 10–30 seconds depending on disk speed and model size. All model files must be passed as absolute paths via the files object.

6. Run inference

run() is the primary API. It returns a QvacResponse that streams step-progress ticks and the final PNG:

const process = require('bare-process')

const images = []

const response = await model.run({
  prompt: 'a majestic red fox in a snowy forest, golden light, photorealistic',
  steps: 20,
  width: 512,
  height: 512,
  guidance: 3.5,
  seed: 42
})

await response
  .onUpdate(data => {
    if (data instanceof Uint8Array) {
      images.push(data)
    } else if (typeof data === 'string') {
      try {
        const tick = JSON.parse(data)
        if ('step' in tick) process.stdout.write(`\rStep ${tick.step}/${tick.total}`)
      } catch (_) {}
    }
  })
  .await()

// Only write a file if the run actually produced an image
if (images.length > 0) {
  require('bare-fs').writeFileSync('output.png', images[0])
}
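The onUpdate callback above handles two kinds of payload: a Uint8Array carrying PNG bytes, and a JSON string carrying a progress tick. One way to keep that branching testable is to pull it into a pure function. This is a sketch; `classifyUpdate` is a hypothetical helper, and the `step` and `total` field names are taken from the snippet above rather than from a documented schema.

```javascript
// Hypothetical helper; mirrors the branching in the onUpdate callback above.
function classifyUpdate (data) {
  if (data instanceof Uint8Array) {
    return { kind: 'image', bytes: data }
  }
  if (typeof data === 'string') {
    try {
      const tick = JSON.parse(data)
      if (tick !== null && typeof tick === 'object' && 'step' in tick) {
        return { kind: 'progress', step: tick.step, total: tick.total }
      }
    } catch (_) {
      // Not JSON: fall through and report it as unrecognised.
    }
  }
  return { kind: 'other', raw: data }
}
```

Inside onUpdate, the callback can then push `update.bytes` when `update.kind === 'image'` and print `update.step` and `update.total` when `update.kind === 'progress'`, ignoring everything else.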

Generation parameters:

Parameter	Type	Default	Description
`prompt`	string	—	Text prompt
`negative_prompt`	string	`''`	Things to avoid in the output
`width`	number	`512`	Output width in pixels (multiple of 8)
`height`	number	`512`	Output height in pixels (multiple of 8)
`steps`	number	`20`	Number of diffusion steps
`guidance`	number	`3.5`	Distilled guidance scale (FLUX.2)
`cfg_scale`	number	`7.0`	Classifier-free guidance scale (SD1.x / SD2.x)
`sampling_method`	string	auto	Sampler name; auto-selects `euler` for FLUX.2, `euler_a` for SD1.x
`scheduler`	string	auto	Scheduler; auto-selected per model family
`seed`	number	`-1`	Random seed (-1 for random)
`batch_count`	number	`1`	Number of images to generate
`vae_tiling`	boolean	`false`	Enable VAE tiling (required for large images on machines with 16 GB of memory)
`cache_preset`	string	—	Step-caching preset: `slow`, `medium`, `fast`, `ultra`
`upscale`	boolean \| `{ repeats?: number }`	`false`	Post-generation ESRGAN upscale. Requires `files.esrgan`; `repeats` defaults to `1`

Do not set sampling_method: 'euler_a' for FLUX.2 models — it will produce random noise. Leave the field unset to let the library auto-select euler for flow-matching models.
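Several constraints from the table and the warning above can be checked before a run starts: width and height must be multiples of 8, `euler_a` should not be used with FLUX.2, and `upscale` needs `files.esrgan`. The sketch below gathers them in one place. `checkParams` and `snapTo8` are hypothetical helpers, and the `isFlux2` flag is supplied by the caller because the library's own model-family detection is not documented here.

```javascript
// Hypothetical pre-run check built from the constraints listed above.
function checkParams (params, { isFlux2 = false, files = {} } = {}) {
  const problems = []

  for (const side of ['width', 'height']) {
    const value = params[side]
    // Unset sides fall back to the documented default of 512.
    if (value !== undefined && (!Number.isInteger(value) || value <= 0 || value % 8 !== 0)) {
      problems.push(`${side} must be a positive multiple of 8, got ${value}`)
    }
  }

  if (isFlux2 && params.sampling_method === 'euler_a') {
    problems.push("sampling_method 'euler_a' produces noise with FLUX.2; leave it unset")
  }

  if (params.upscale && !files.esrgan) {
    problems.push('upscale requires files.esrgan')
  }

  return problems
}

// Round a requested size down to the nearest multiple of 8 (minimum 8).
function snapTo8 (n) {
  return Math.max(8, Math.floor(n / 8) * 8)
}
```

For example, a requested width of 500 fails the check, and `snapTo8(500)` gives 496 as the nearest valid size below it.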

7. Release resources

await model.unload()

unload() calls free_sd_ctx, which releases all GPU and CPU memory. The JS object can safely be garbage-collected afterwards.
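Because unload() is what frees native memory, it helps to guarantee that it runs even when generation throws, as the Quickstart does with try/finally. A small wrapper can make that pattern reusable. This is a sketch; `withModel` is a hypothetical helper that relies only on the load() and unload() methods described above.

```javascript
// Hypothetical wrapper; relies only on the load() and unload() methods described above.
async function withModel (model, work) {
  await model.load()
  try {
    return await work(model)
  } finally {
    // Runs on success and on failure, so native memory is always released.
    await model.unload()
  }
}
```

A caller would pass `new ImgStableDiffusion(args)` as the first argument and an async function containing the run() and onUpdate logic as the second. As in the Quickstart, a failure inside load() itself is not followed by unload().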

More resources

Package at npm

@qvac/diffusion-cpp
