@qvac/diffusion-cpp

Image generation from text prompts.

Overview

A Bare module that adds text-to-image generation to QVAC, using qvac-ext-stable-diffusion.cpp as the inference engine.

Models

Supports FLUX.2-klein, SD1.x, SD2.x, SDXL, and SD3 model families.

FLUX.2-klein

Three separate components are required:

  • Diffusion model (flux-2-klein-4b-Q8_0.gguf) — the main image transformer.
  • Text encoder (Qwen3-4B-Q4_K_M.gguf) — Qwen3 4B in standard GGML Q4_K_M format.
  • VAE (flux2-vae.safetensors) — standard safetensors format, compatible as-is.

Model file reference — FLUX.2-klein 4B

| Role | File | Source |
|---|---|---|
| Diffusion model | flux-2-klein-4b-Q8_0.gguf | leejet/FLUX.2-klein-4B-GGUF |
| Text encoder | Qwen3-4B-Q4_K_M.gguf | unsloth/Qwen3-4B-GGUF |
| VAE | flux2-vae.safetensors | black-forest-labs/FLUX.2-klein-4B |

Stable Diffusion

  • Stable Diffusion 1.x / 2.x — all-in-one checkpoint as a single *.gguf file
  • Stable Diffusion XL — all-in-one *.gguf or split CLIP encoders
  • Stable Diffusion 3 — safetensors with separate CLIP encoders

Requirements

  • Memory: 16 GB unified memory on Apple Silicon, or 8 GB VRAM on GPU.
  • Bare ≥ v1.24
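
The runtime version requirement can be checked in a setup script. The following is a minimal sketch, assuming you already have the runtime's version string in hand (however your environment reports it); `parseVersion` and `meetsMinimum` are hypothetical helper names, not part of this package.

```javascript
// Parse "v1.24.3" or "1.24" into [major, minor, patch]; missing parts become 0.
function parseVersion (version) {
  const parts = String(version).trim().replace(/^v/, '').split('.')
  return [0, 1, 2].map(i => parseInt(parts[i], 10) || 0)
}

// True when `actual` is greater than or equal to `minimum`.
function meetsMinimum (actual, minimum) {
  const a = parseVersion(actual)
  const m = parseVersion(minimum)
  for (let i = 0; i < 3; i++) {
    if (a[i] !== m[i]) return a[i] > m[i]
  }
  return true
}

console.log(meetsMinimum('v1.24.0', '1.24')) // true
console.log(meetsMinimum('v1.9.2', '1.24'))  // false
```

Comparing numeric components avoids the classic string-comparison trap where '1.9' sorts after '1.24'.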

Installation

npm i @qvac/diffusion-cpp

Quickstart

If you don't have the Bare runtime installed, install it:

npm i -g bare

Create a new project:

mkdir qvac-diffusion-quickstart
cd qvac-diffusion-quickstart
npm init -y

Install dependencies:

npm i @qvac/diffusion-cpp bare-path bare-process bare-fs

Download the FLUX.2 [klein] 4B model files (~6.8 GB total):

mkdir -p models

curl -L -C - -o models/flux-2-klein-4b-Q8_0.gguf \
  https://huggingface.co/leejet/FLUX.2-klein-4B-GGUF/resolve/main/flux-2-klein-4b-Q8_0.gguf

curl -L -C - -o models/Qwen3-4B-Q4_K_M.gguf \
  https://huggingface.co/unsloth/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-Q4_K_M.gguf

curl -L -C - -o models/flux2-vae.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/resolve/main/vae/diffusion_pytorch_model.safetensors

Create index.js:

'use strict'

const path = require('bare-path')
const fs = require('bare-fs')
const process = require('bare-process')
const ImgStableDiffusion = require('@qvac/diffusion-cpp')

async function main () {
  const MODELS_DIR = path.resolve(__dirname, './models')

  const args = {
    logger: console,
    files: {
      model: path.join(MODELS_DIR, 'flux-2-klein-4b-Q8_0.gguf'),
      llm:   path.join(MODELS_DIR, 'Qwen3-4B-Q4_K_M.gguf'),
      vae:   path.join(MODELS_DIR, 'flux2-vae.safetensors')
    },
    config: { threads: 8 }
  }

  const model = new ImgStableDiffusion(args)
  await model.load()

  try {
    const images = []

    const response = await model.run({
      prompt: 'a majestic red fox in a snowy forest, golden light, photorealistic',
      steps: 20,
      width: 512,
      height: 512,
      guidance: 3.5,
      seed: 42
    })

    await response
      .onUpdate(data => {
        if (data instanceof Uint8Array) {
          images.push(data)
        } else if (typeof data === 'string') {
          try {
            const tick = JSON.parse(data)
            if ('step' in tick) process.stdout.write(`\rStep ${tick.step}/${tick.total}`)
          } catch (_) {}
        }
      })
      .await()

    console.log('\n')

    if (images.length > 0) {
      fs.writeFileSync('output.png', images[0])
      console.log('Saved → output.png')
    }
  } catch (error) {
    console.error('Error occurred:', error.message || error)
  } finally {
    await model.unload()
  }
}

main().catch(error => {
  console.error('Fatal error:', error.message)
  process.exit(1)
})

Run index.js:

bare index.js

Usage

1. Import the model class

const ImgStableDiffusion = require('@qvac/diffusion-cpp')

2. Create the args object

const path = require('bare-path')

const MODELS_DIR = path.resolve(__dirname, './models')
const args = {
  logger: console,
  files: {
    model: path.join(MODELS_DIR, 'flux-2-klein-4b-Q8_0.gguf'),
    llm:   path.join(MODELS_DIR, 'Qwen3-4B-Q4_K_M.gguf'),
    vae:   path.join(MODELS_DIR, 'flux2-vae.safetensors')
  },
  config: { threads: 8 },
  opts: { stats: true }
}

| Property | Required | Description |
|---|---|---|
| files | | Object of absolute paths to model files (see below) |
| files.model | | Absolute path to diffusion model file (diffusion-only GGUF for FLUX.2; all-in-one for SD1.x/2.x) |
| files.clipL | | Absolute path to separate CLIP-L text encoder (SD3) |
| files.clipG | | Absolute path to separate CLIP-G text encoder (SDXL / SD3) |
| files.t5Xxl | | Absolute path to separate T5-XXL text encoder (SD3) |
| files.llm | | Absolute path to Qwen3 LLM text encoder (FLUX.2 [klein]) |
| files.vae | | Absolute path to separate VAE file |
| files.esrgan | | Absolute path to ESRGAN upscaler model for post-generation upscale |
| config | | Native backend configuration object (see next section) |
| logger | | Logger instance for JS wrapper logs (e.g. console) |
| opts | | Additional options (e.g. { stats: true }) |
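
The property table associates each text-encoder key with specific model families. A small lint helper can catch a `files` object that mixes encoders from the wrong family. This is only a sketch: the family labels (`'flux2-klein'`, `'sd3'`, and so on) and the helper name are hypothetical, and the mapping simply mirrors the parenthesised notes in the table rather than any check the library itself performs.

```javascript
// Encoder keys the property table associates with each model family.
// The family labels are illustrative names chosen for this sketch.
const ENCODER_KEYS_BY_FAMILY = {
  'flux2-klein': ['llm'],
  sd1: [],
  sd2: [],
  sdxl: ['clipG'],
  sd3: ['clipL', 'clipG', 't5Xxl']
}

const ALL_ENCODER_KEYS = ['clipL', 'clipG', 't5Xxl', 'llm']

// Return encoder keys present in `files` that the table does not list for `family`.
function unexpectedEncoderKeys (family, files) {
  const expected = ENCODER_KEYS_BY_FAMILY[family]
  if (!expected) throw new Error(`Unknown model family: ${family}`)
  return ALL_ENCODER_KEYS.filter(key => key in files && !expected.includes(key))
}

const files = {
  model: '/models/flux-2-klein-4b-Q8_0.gguf',
  llm: '/models/Qwen3-4B-Q4_K_M.gguf',
  vae: '/models/flux2-vae.safetensors'
}

console.log(unexpectedEncoderKeys('flux2-klein', files)) // []
console.log(unexpectedEncoderKeys('sd3', files))         // [ 'llm' ]
```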

Native C++ logs are process-global. Configure native log routing once with require('@qvac/diffusion-cpp/addonLogging').setLogger(...).

3. Configure the native backend (args.config)

config is a field on the args object built in step 2 — there is no separate constructor argument. The native backend reads it during load().

args.config = {
  threads: 8  // CPU threads for tensor operations (Metal handles GPU automatically)
}

Config values are coerced to strings internally. Generation parameters (prompt, steps, seed, etc.) are JSON-serialized with their native types preserved.
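
To make the distinction concrete, the sketch below imitates the two behaviours described above in plain JavaScript. It is an illustration of the effect only, not the library's internal code; `coerceConfig` is a hypothetical name.

```javascript
// Illustration only: mimic "config values are coerced to strings".
function coerceConfig (config) {
  return Object.fromEntries(
    Object.entries(config).map(([key, value]) => [key, String(value)])
  )
}

const config = { threads: 8, flash_attn: true }
const params = { prompt: 'a red fox', steps: 20, vae_tiling: false }

// Config: every value becomes a string.
console.log(coerceConfig(config)) // { threads: '8', flash_attn: 'true' }

// Generation parameters: JSON keeps numbers and booleans as-is.
console.log(JSON.stringify(params)) // {"prompt":"a red fox","steps":20,"vae_tiling":false}
```

One practical consequence is that `true` and `'true'` end up equivalent in `config`, while in generation parameters a number and its string form are different values.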

| Parameter | Type | Default | Description |
|---|---|---|---|
| threads | number | auto | Number of CPU threads for model loading and CPU ops |
| type | 'f32', 'f16', 'q4_0', 'q8_0', … | auto | Override weight quantisation type |
| rng | 'cpu', 'cuda', 'std_default' | 'cuda' | RNG backend. 'cuda' selects the Philox RNG, which is not GPU-specific despite the name; recommended |
| clip_on_cpu | boolean | false | Force CLIP encoder to run on CPU |
| vae_on_cpu | boolean | false | Force VAE to run on CPU |
| flash_attn | boolean | false | Enable flash attention (reduces memory) |
| upscaler_tile_size | number | 128 | ESRGAN upscaler tile size |
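
Because config keys are plain strings, a typo such as `thread` instead of `threads` is easy to miss. The sketch below checks a config object against the parameters listed in the table. It assumes the table is the complete set of keys, which this document does not guarantee, so treat an "unknown key" result as a hint rather than an error; `configProblems` is a hypothetical helper.

```javascript
// Keys and rng values taken from the config table; the list may not be exhaustive.
const KNOWN_CONFIG_KEYS = [
  'threads', 'type', 'rng', 'clip_on_cpu',
  'vae_on_cpu', 'flash_attn', 'upscaler_tile_size'
]
const RNG_VALUES = ['cpu', 'cuda', 'std_default']

// Return a list of human-readable problems; an empty list means nothing suspicious.
function configProblems (config) {
  const problems = []
  for (const key of Object.keys(config)) {
    if (!KNOWN_CONFIG_KEYS.includes(key)) problems.push(`unknown key: ${key}`)
  }
  if ('rng' in config && !RNG_VALUES.includes(config.rng)) {
    problems.push(`invalid rng: ${config.rng}`)
  }
  if ('threads' in config && !(Number.isInteger(config.threads) && config.threads > 0)) {
    problems.push('threads must be a positive integer')
  }
  return problems
}

console.log(configProblems({ threads: 8, flash_attn: true })) // []
console.log(configProblems({ thread: 8, rng: 'gpu' }))
// [ 'unknown key: thread', 'invalid rng: gpu' ]
```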

4. Create a model instance

const model = new ImgStableDiffusion(args)

The constructor takes a single object containing files, config, logger, and opts. It stores configuration only — no memory is allocated yet.

5. Load the model

await model.load()

This creates the native sd_ctx_t and loads all weights into memory. It can take 10–30 seconds depending on disk speed and model size. All model files must be passed as absolute paths via the files object.
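
Since every entry in `files` must be an absolute path, it can help to verify that before calling `load()`. The sketch below takes the path predicate as an argument so it is not tied to a particular module; it assumes `bare-path` exposes an `isAbsolute` function like Node's `path` module does, which you should confirm for your version. `relativeEntries` is a hypothetical helper.

```javascript
// Return the keys of `files` whose paths are not absolute.
// `isAbsolute` is injected, e.g. require('bare-path').isAbsolute (assumed to exist).
function relativeEntries (files, isAbsolute) {
  return Object.entries(files)
    .filter(([, filePath]) => !isAbsolute(filePath))
    .map(([key]) => key)
}

// Simple POSIX-style predicate used here only for demonstration.
const isAbsolutePosix = p => typeof p === 'string' && p.startsWith('/')

const files = {
  model: '/models/flux-2-klein-4b-Q8_0.gguf',
  vae: './models/flux2-vae.safetensors'
}

console.log(relativeEntries(files, isAbsolutePosix)) // [ 'vae' ]
```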

6. Run inference

run() is the primary API. It returns a QvacResponse that streams step-progress ticks and the final PNG:

const process = require('bare-process') // needed for process.stdout below

const images = []

const response = await model.run({
  prompt: 'a majestic red fox in a snowy forest, golden light, photorealistic',
  steps: 20,
  width: 512,
  height: 512,
  guidance: 3.5,
  seed: 42
})

await response
  .onUpdate(data => {
    if (data instanceof Uint8Array) {
      images.push(data)
    } else if (typeof data === 'string') {
      try {
        const tick = JSON.parse(data)
        if ('step' in tick) process.stdout.write(`\rStep ${tick.step}/${tick.total}`)
      } catch (_) {}
    }
  })
  .await()

require('bare-fs').writeFileSync('output.png', images[0])
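
The `onUpdate` callback receives two kinds of payload: `Uint8Array` image bytes and JSON strings carrying progress ticks. Factoring that branching into a small function keeps the callback short and makes it testable without a model. This is a sketch based only on the payload shapes used in the example above; `classifyUpdate` and the returned `kind` labels are hypothetical.

```javascript
// Classify one onUpdate payload as an image, a progress tick, or something else.
function classifyUpdate (data) {
  if (data instanceof Uint8Array) return { kind: 'image', bytes: data }
  if (typeof data === 'string') {
    try {
      const tick = JSON.parse(data)
      // Guard against valid JSON that is not an object (e.g. "5" or "null").
      if (tick !== null && typeof tick === 'object' && 'step' in tick) {
        return { kind: 'progress', step: tick.step, total: tick.total }
      }
    } catch (_) {
      // Not JSON: fall through and report it as "other".
    }
  }
  return { kind: 'other' }
}

console.log(classifyUpdate('{"step":3,"total":20}')) // { kind: 'progress', step: 3, total: 20 }
console.log(classifyUpdate('not json').kind)         // other
console.log(classifyUpdate(new Uint8Array([137, 80])).kind) // image
```

Inside `onUpdate`, the handler then reduces to a switch on `kind`: push `bytes` for images and print `step`/`total` for progress.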

Generation parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | | Text prompt |
| negative_prompt | string | '' | Things to avoid in the output |
| width | number | 512 | Output width in pixels (multiple of 8) |
| height | number | 512 | Output height in pixels (multiple of 8) |
| steps | number | 20 | Number of diffusion steps |
| guidance | number | 3.5 | Distilled guidance scale (FLUX.2) |
| cfg_scale | number | 7.0 | Classifier-free guidance scale (SD1.x / SD2.x) |
| sampling_method | string | auto | Sampler name; auto-selects euler for FLUX.2, euler_a for SD1.x |
| scheduler | string | auto | Scheduler; auto-selected per model family |
| seed | number | -1 | Random seed (-1 for random) |
| batch_count | number | 1 | Number of images to generate |
| vae_tiling | boolean | false | Enable VAE tiling (required for large images on 16 GB systems) |
| cache_preset | string | | Step-caching preset: slow, medium, fast, ultra |
| upscale | boolean or { repeats?: number } | false | Post-generation ESRGAN upscale. Requires files.esrgan; repeats defaults to 1 |
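
Width and height must be multiples of 8. When dimensions come from user input, one option is to snap them to the nearest valid value before calling `run()`. The helper below is a hypothetical sketch; this document does not say how the library itself reacts to other sizes.

```javascript
// Round a pixel dimension to the nearest multiple of 8, with a floor of 8.
function snapToMultipleOf8 (value) {
  return Math.max(8, Math.round(value / 8) * 8)
}

console.log(snapToMultipleOf8(513))  // 512
console.log(snapToMultipleOf8(1022)) // 1024
console.log(snapToMultipleOf8(768))  // 768
```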

Do not set sampling_method: 'euler_a' for FLUX.2 models — it will produce random noise. Leave the field unset to let the library auto-select euler for flow-matching models.
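
If generation parameters are assembled from presets or user settings, a guard can drop an incompatible sampler before the request is sent. The following sketch removes `sampling_method` when it is `'euler_a'` and the caller says the model is FLUX.2, so the auto-selection described above applies. The `isFlux2` flag and the helper name are assumptions; how you determine the model family is up to your application.

```javascript
// Return params without sampling_method when it is 'euler_a' on a FLUX.2 model.
// `isFlux2` is supplied by the caller; no library detection is implied.
function withSafeSampler (params, isFlux2) {
  if (isFlux2 && params.sampling_method === 'euler_a') {
    const { sampling_method, ...rest } = params // drop the field, keep the rest
    return rest
  }
  return params
}

const params = { prompt: 'a red fox', steps: 20, sampling_method: 'euler_a' }

console.log(withSafeSampler(params, true))  // { prompt: 'a red fox', steps: 20 }
console.log(withSafeSampler(params, false)) // unchanged, euler_a kept for SD models
```

The input object is never mutated, so the same preset can be reused across model families.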

7. Release resources

await model.unload()

unload() calls free_sd_ctx, which releases all GPU and CPU memory. The JS object can be safely garbage collected afterwards.
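
To guarantee that `unload()` runs even when generation throws, the load/run/unload sequence can be wrapped in a helper built on `try`/`finally`, the same pattern the Quickstart uses. The sketch below is exercised with a stub object rather than a real model; `withLoadedModel` is a hypothetical name, and it relies only on the `load()` and `unload()` methods described in this section.

```javascript
// Load the model, run `task`, and always unload afterwards.
async function withLoadedModel (model, task) {
  await model.load()
  try {
    return await task(model)
  } finally {
    await model.unload()
  }
}

// Stub standing in for a real model, to show the call order.
const calls = []
const stub = {
  load: async () => { calls.push('load') },
  unload: async () => { calls.push('unload') }
}

withLoadedModel(stub, async () => {
  calls.push('run')
  throw new Error('boom')
}).catch(error => {
  console.log(error.message, calls.join(',')) // boom load,run,unload
})
```

With a real instance, `task` would call `model.run(...)` and consume the response; the error still propagates to the caller after memory has been released.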

More resources

Package on npm