# @qvac/diffusion-cpp


## Overview

A [Bare module](https://bare.pears.com) that adds text-to-image generation to QVAC, using [`qvac-ext-stable-diffusion.cpp`](https://github.com/tetherto/qvac-ext-stable-diffusion.cpp) as the inference engine.

## Models

Supports FLUX.2-klein, SD1.x, SD2.x, SDXL, and SD3 model families.

### FLUX.2-klein

Three separate components are required:

* **Diffusion model** (`flux-2-klein-4b-Q8_0.gguf`) — the main image transformer.
* **Text encoder** (`Qwen3-4B-Q4_K_M.gguf`) — Qwen3 4B in standard GGML Q4\_K\_M format.
* **VAE** (`flux2-vae.safetensors`) — standard safetensors format, compatible as-is.

### Model file reference — FLUX.2-klein 4B

| Role            | File                        | Source                                                                                                                                          |
| --------------- | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| Diffusion model | `flux-2-klein-4b-Q8_0.gguf` | [leejet/FLUX.2-klein-4B-GGUF](https://huggingface.co/leejet/FLUX.2-klein-4B-GGUF)                                                               |
| Text encoder    | `Qwen3-4B-Q4_K_M.gguf`      | [unsloth/Qwen3-4B-GGUF](https://huggingface.co/unsloth/Qwen3-4B-GGUF)                                                                           |
| VAE             | `flux2-vae.safetensors`     | [black-forest-labs/FLUX.2-klein-4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/blob/main/vae/diffusion_pytorch_model.safetensors) |
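Because all three files are needed before loading, a pre-flight check can catch an incomplete download early. The sketch below is illustrative only: `findMissingFiles` is a hypothetical helper, not part of `@qvac/diffusion-cpp`, and the `exists` callback is injected so that any filesystem module can supply it (for example `existsSync` from `bare-fs`, assuming it mirrors Node's `fs.existsSync`). The directory in the example is a placeholder.

```js
'use strict'

// Hypothetical helper (not part of @qvac/diffusion-cpp): returns the roles
// whose file path fails the injected `exists(path)` check.
function findMissingFiles (files, exists) {
  return Object.entries(files)
    .filter(([, filePath]) => !exists(filePath))
    .map(([role]) => role)
}

// File names come from the table above; the directory is an example.
const fluxFiles = {
  model: '/abs/models/flux-2-klein-4b-Q8_0.gguf',
  llm: '/abs/models/Qwen3-4B-Q4_K_M.gguf',
  vae: '/abs/models/flux2-vae.safetensors'
}

// Stand-in for a real filesystem check: pretend only the VAE is missing.
const onDisk = new Set([fluxFiles.model, fluxFiles.llm])
const missing = findMissingFiles(fluxFiles, p => onDisk.has(p))
console.log(missing) // [ 'vae' ]
```

An empty result means every listed component was found and the paths can be passed on as `files`.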

### Stable Diffusion

* **Stable Diffusion 1.x / 2.x** — all-in-one checkpoint as a single `*.gguf` file
* **Stable Diffusion XL** — all-in-one `*.gguf` or split CLIP encoders
* **Stable Diffusion 3** — safetensors with separate CLIP encoders

## Requirements

* Memory: 16 GB unified memory on Apple Silicon, or 8 GB VRAM on GPU.
* Bare ≥ v1.24

## Installation

```bash
npm i @qvac/diffusion-cpp
```

## Quickstart

<Steps>
  <Step>
    If you don't have Bare runtime, install it:

    ```bash
    npm i -g bare
    ```
  </Step>

  <Step>
    Create a new project:

    ```bash
    mkdir qvac-diffusion-quickstart
    cd qvac-diffusion-quickstart
    npm init -y
    ```
  </Step>

  <Step>
    Install dependencies:

    ```bash
    npm i @qvac/diffusion-cpp bare-path bare-process bare-fs
    ```
  </Step>

  <Step>
    Download the FLUX.2 \[klein] 4B model files (\~6.8 GB total):

    ```bash
    mkdir -p models

    curl -L -C - -o models/flux-2-klein-4b-Q8_0.gguf \
      https://huggingface.co/leejet/FLUX.2-klein-4B-GGUF/resolve/main/flux-2-klein-4b-Q8_0.gguf

    curl -L -C - -o models/Qwen3-4B-Q4_K_M.gguf \
      https://huggingface.co/unsloth/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-Q4_K_M.gguf

    curl -L -C - -o models/flux2-vae.safetensors \
      https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/resolve/main/vae/diffusion_pytorch_model.safetensors
    ```
  </Step>

  <Step>
    Create `index.js`:
  </Step>

  <WrapCode>
    ```js title="index.js" lineNumbers
    'use strict'

    const path = require('bare-path')
    const fs = require('bare-fs')
    const process = require('bare-process')
    const ImgStableDiffusion = require('@qvac/diffusion-cpp')

    async function main () {
      const MODELS_DIR = path.resolve(__dirname, './models')

      const args = {
        logger: console,
        files: {
          model: path.join(MODELS_DIR, 'flux-2-klein-4b-Q8_0.gguf'),
          llm:   path.join(MODELS_DIR, 'Qwen3-4B-Q4_K_M.gguf'),
          vae:   path.join(MODELS_DIR, 'flux2-vae.safetensors')
        },
        config: { threads: 8 }
      }

      const model = new ImgStableDiffusion(args)
      await model.load()

      try {
        const images = []

        const response = await model.run({
          prompt: 'a majestic red fox in a snowy forest, golden light, photorealistic',
          steps: 20,
          width: 512,
          height: 512,
          guidance: 3.5,
          seed: 42
        })

        await response
          .onUpdate(data => {
            if (data instanceof Uint8Array) {
              images.push(data)
            } else if (typeof data === 'string') {
              try {
                const tick = JSON.parse(data)
                if ('step' in tick) process.stdout.write(`\rStep ${tick.step}/${tick.total}`)
              } catch (_) {}
            }
          })
          .await()

        console.log('\n')

        if (images.length > 0) {
          fs.writeFileSync('output.png', images[0])
          console.log('Saved → output.png')
        }
      } catch (error) {
        console.error('Error occurred:', error.message || error)
      } finally {
        await model.unload()
      }
    }

    main().catch(error => {
      console.error('Fatal error:', error.message)
      process.exit(1)
    })
    ```
  </WrapCode>

  <Step>
    Run `index.js`:

    ```bash
    bare index.js
    ```
  </Step>
</Steps>

## Usage

### 1. Import the model class

```js
const ImgStableDiffusion = require('@qvac/diffusion-cpp')
```

### 2. Create the `args` object

```js
const path = require('bare-path')

const MODELS_DIR = path.resolve(__dirname, './models')
const args = {
  logger: console,
  files: {
    model: path.join(MODELS_DIR, 'flux-2-klein-4b-Q8_0.gguf'),
    llm:   path.join(MODELS_DIR, 'Qwen3-4B-Q4_K_M.gguf'),
    vae:   path.join(MODELS_DIR, 'flux2-vae.safetensors')
  },
  config: { threads: 8 },
  opts: { stats: true }
}
```

| Property       | Required | Description                                                                                      |
| -------------- | -------- | ------------------------------------------------------------------------------------------------ |
| `files`        | ✅        | Object of absolute paths to model files (see below)                                              |
| `files.model`  | ✅        | Absolute path to diffusion model file (diffusion-only GGUF for FLUX.2; all-in-one for SD1.x/2.x) |
| `files.clipL`  | —        | Absolute path to separate CLIP-L text encoder (SDXL / SD3)                                       |
| `files.clipG`  | —        | Absolute path to separate CLIP-G text encoder (SDXL / SD3)                                       |
| `files.t5Xxl`  | —        | Absolute path to separate T5-XXL text encoder (SD3)                                              |
| `files.llm`    | —        | Absolute path to Qwen3 LLM text encoder (FLUX.2 \[klein])                                        |
| `files.vae`    | —        | Absolute path to separate VAE file                                                               |
| `files.esrgan` | —        | Absolute path to ESRGAN upscaler model for post-generation upscale                               |
| `config`       | —        | Native backend configuration object (see next section)                                           |
| `logger`       | —        | Logger instance for JS wrapper logs (e.g. `console`)                                             |
| `opts`         | —        | Additional options (e.g. `{ stats: true }`)                                                      |

Native C++ logs are process-global. Configure native log routing once with `require('@qvac/diffusion-cpp/addonLogging').setLogger(...)`.

### 3. Configure the native backend (`args.config`)

`config` is a field on the `args` object built in step 2 — there is no separate constructor argument. The native backend reads it during `load()`.

```js
args.config = {
  threads: 8  // CPU threads for tensor operations (Metal handles GPU automatically)
}
```

Config values are coerced to strings internally. Generation parameters (prompt, steps, seed, etc.) are JSON-serialized with their native types preserved.
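To make the distinction concrete, the sketch below contrasts the two treatments. It illustrates the behavior described above and is not the addon's internal code; `stringifyConfig` is a hypothetical name.

```js
'use strict'

// Hypothetical illustration of "config values are coerced to strings".
function stringifyConfig (config) {
  return Object.fromEntries(
    Object.entries(config).map(([key, value]) => [key, String(value)])
  )
}

const config = stringifyConfig({ threads: 8, flash_attn: true })
console.log(config) // { threads: '8', flash_attn: 'true' }

// Generation parameters keep their types through a JSON round trip.
const params = JSON.parse(JSON.stringify({ steps: 20, vae_tiling: false }))
console.log(typeof params.steps, typeof params.vae_tiling) // number boolean
```

The practical consequence is that generation parameters should be passed with their proper types (numbers as numbers, booleans as booleans) rather than as strings.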

| Parameter            | Type                                            | Default  | Description                                                                          |
| -------------------- | ----------------------------------------------- | -------- | ------------------------------------------------------------------------------------ |
| `threads`            | number                                          | auto     | Number of CPU threads for model loading and CPU ops                                  |
| `type`               | `'f32'` \| `'f16'` \| `'q4_0'` \| `'q8_0'` \| … | auto     | Override weight quantisation type                                                    |
| `rng`                | `'cpu'` \| `'cuda'` \| `'std_default'`          | `'cuda'` | RNG backend (`'cuda'` = philox RNG — not GPU-specific despite the name; recommended) |
| `clip_on_cpu`        | `true` \| `false`                               | `false`  | Force CLIP encoder to run on CPU                                                     |
| `vae_on_cpu`         | `true` \| `false`                               | `false`  | Force VAE to run on CPU                                                              |
| `flash_attn`         | `true` \| `false`                               | `false`  | Enable flash attention (reduces memory)                                              |
| `upscaler_tile_size` | number                                          | `128`    | ESRGAN upscaler tile size                                                            |
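As an example of combining these options, the following sketch builds a config aimed at memory-constrained machines. The helper name `lowMemoryConfig` and the choice of flags are assumptions for illustration. The option names come from the table above, and whether each flag helps depends on the hardware and model family.

```js
'use strict'

// Hypothetical preset: option names are from the table above; the
// combination is only a starting point to experiment with.
function lowMemoryConfig (threads) {
  return {
    threads,
    flash_attn: true, // described in the table as reducing memory
    clip_on_cpu: true, // force the CLIP encoder onto the CPU
    vae_on_cpu: false // leave the VAE on the default device
  }
}

const args = {
  files: { model: '/abs/path/to/model.gguf' }, // placeholder path
  config: lowMemoryConfig(4)
}
console.log(args.config.flash_attn) // true
```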

### 4. Create a model instance

```js
const model = new ImgStableDiffusion(args)
```

The constructor takes a single object containing `files`, `config`, `logger`, and `opts`. It stores configuration only — no memory is allocated yet.

### 5. Load the model

```js
await model.load()
```

This creates the native `sd_ctx_t` and loads all weights into memory. It can take 10–30 seconds depending on disk speed and model size. All model files must be passed as absolute paths via the `files` object.
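Since every entry in `files` must be an absolute path, it can help to verify that before calling `load()`. The sketch below is a minimal check under stated assumptions: `findRelativePaths` is a hypothetical helper, and the `isAbsolute` predicate is injected (with Bare it could plausibly be `require('bare-path').isAbsolute`, assuming `bare-path` exposes it like Node's `path`).

```js
'use strict'

// Hypothetical helper: lists the `files` roles whose path is not absolute.
function findRelativePaths (files, isAbsolute) {
  return Object.keys(files).filter(role => !isAbsolute(files[role]))
}

// Minimal POSIX-style stand-in so the example runs on its own.
const isAbsolutePosix = p => p.startsWith('/')

const relative = findRelativePaths(
  {
    model: '/models/flux-2-klein-4b-Q8_0.gguf',
    vae: './models/flux2-vae.safetensors'
  },
  isAbsolutePosix
)
console.log(relative) // [ 'vae' ]
```

Resolving each path with `path.resolve(...)`, as the earlier steps do, avoids the problem altogether.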

### 6. Run inference

`run()` is the primary API. It returns a `QvacResponse` that streams step-progress ticks followed by the final PNG:

```js
const fs = require('bare-fs')
const process = require('bare-process')

const images = []

const response = await model.run({
  prompt: 'a majestic red fox in a snowy forest, golden light, photorealistic',
  steps: 20,
  width: 512,
  height: 512,
  guidance: 3.5,
  seed: 42
})

await response
  .onUpdate(data => {
    if (data instanceof Uint8Array) {
      // Binary update: PNG image data
      images.push(data)
    } else if (typeof data === 'string') {
      // String update: JSON step-progress tick
      try {
        const tick = JSON.parse(data)
        if ('step' in tick) process.stdout.write(`\rStep ${tick.step}/${tick.total}`)
      } catch (_) {
        // Ignore strings that are not valid JSON
      }
    }
  })
  .await()

// Write the image only if one was actually produced
if (images.length > 0) fs.writeFileSync('output.png', images[0])
```
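The update callback mixes two concerns: collecting binary image data and parsing progress ticks. One way to separate them is a small factory like the one sketched below. `createUpdateHandler` is a hypothetical helper, not part of the package, and the simulated updates only mimic the two data shapes handled in the example above.

```js
'use strict'

// Hypothetical helper: routes each update to `onImage` (binary PNG data)
// or `onProgress` (parsed JSON tick that has a `step` field).
function createUpdateHandler ({ onImage, onProgress }) {
  return data => {
    if (data instanceof Uint8Array) {
      onImage(data)
      return
    }
    if (typeof data !== 'string') return
    let tick
    try {
      tick = JSON.parse(data)
    } catch (_) {
      return // ignore strings that are not JSON
    }
    if (tick !== null && typeof tick === 'object' && 'step' in tick) {
      onProgress(tick)
    }
  }
}

// Simulated updates standing in for what `response.onUpdate` would deliver.
const images = []
const ticks = []
const handle = createUpdateHandler({
  onImage: png => images.push(png),
  onProgress: tick => ticks.push(`${tick.step}/${tick.total}`)
})

handle('{"step":1,"total":20}')
handle('not json')
handle(new Uint8Array([137, 80, 78, 71]))
console.log(ticks, images.length) // [ '1/20' ] 1
```

With a real response, the handler would be passed as `response.onUpdate(handle)`.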

**Generation parameters:**

| Parameter         | Type                              | Default | Description                                                                        |
| ----------------- | --------------------------------- | ------- | ---------------------------------------------------------------------------------- |
| `prompt`          | string                            | —       | Text prompt                                                                        |
| `negative_prompt` | string                            | `''`    | Things to avoid in the output                                                      |
| `width`           | number                            | `512`   | Output width in pixels (multiple of 8)                                             |
| `height`          | number                            | `512`   | Output height in pixels (multiple of 8)                                            |
| `steps`           | number                            | `20`    | Number of diffusion steps                                                          |
| `guidance`        | number                            | `3.5`   | Distilled guidance scale (FLUX.2)                                                  |
| `cfg_scale`       | number                            | `7.0`   | Classifier-free guidance scale (SD1.x / SD2.x)                                     |
| `sampling_method` | string                            | auto    | Sampler name; auto-selects `euler` for FLUX.2, `euler_a` for SD1.x                 |
| `scheduler`       | string                            | auto    | Scheduler; auto-selected per model family                                          |
| `seed`            | number                            | `-1`    | Random seed (-1 for random)                                                        |
| `batch_count`     | number                            | `1`     | Number of images to generate                                                       |
| `vae_tiling`      | boolean                           | `false` | Enable VAE tiling (required for large images on 16 GB systems)                     |
| `cache_preset`    | string                            | —       | Step-caching preset: `slow`, `medium`, `fast`, `ultra`                             |
| `upscale`         | boolean \| `{ repeats?: number }` | `false` | Post-generation ESRGAN upscale. Requires `files.esrgan`; `repeats` defaults to `1` |
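Because `width` and `height` are expected to be multiples of 8, arbitrary sizes can be snapped before calling `run()`. The helper below is a hypothetical sketch; rounding down, and the lower bound of 8, are choices made for illustration.

```js
'use strict'

// Hypothetical helper: rounds a requested size down to a multiple of 8,
// never going below 8.
function snapToMultipleOf8 (pixels) {
  return Math.max(8, Math.floor(pixels / 8) * 8)
}

const params = {
  prompt: 'a lighthouse at dusk',
  width: snapToMultipleOf8(1000), // 1000 is already a multiple of 8
  height: snapToMultipleOf8(515) // 515 -> 512
}
console.log(params.width, params.height) // 1000 512
```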

<Callout type="info">
  Do not set `sampling_method: 'euler_a'` for FLUX.2 models — it will produce random noise. Leave the field unset to let the library auto-select `euler` for flow-matching models.
</Callout>
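If generation parameters come from user input or a shared preset, a guard can strip the problematic sampler before the call. This is a minimal sketch: `withSafeSampler` and the family label `'flux2'` are hypothetical names chosen for the example, not identifiers from the library.

```js
'use strict'

// Hypothetical guard: for FLUX.2, drop `sampling_method: 'euler_a'` so the
// field is left unset and the library can auto-select a sampler.
function withSafeSampler (family, params) {
  if (family === 'flux2' && params.sampling_method === 'euler_a') {
    const { sampling_method: _ignored, ...rest } = params
    return rest
  }
  return params
}

const preset = { prompt: 'a red fox', steps: 20, sampling_method: 'euler_a' }

const fluxParams = withSafeSampler('flux2', preset)
console.log('sampling_method' in fluxParams) // false

const sdParams = withSafeSampler('sd1', preset)
console.log(sdParams.sampling_method) // euler_a
```

The input object is never mutated, so the same preset can be reused across model families.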

### 7. Release resources

```js
await model.unload()
```

`unload()` calls `free_sd_ctx`, which releases all GPU and CPU memory held by the model. The JS object can safely be garbage-collected afterwards.
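To make sure `unload()` runs even when generation throws, the load/run/unload sequence can be wrapped in a `try`/`finally`, as the Quickstart does. The sketch below packages that pattern in a hypothetical `withModel` helper and exercises it with a stand-in object, so it runs without the native addon.

```js
'use strict'

// Hypothetical helper: loads the model, runs `task`, and always unloads
// once loading has succeeded.
async function withModel (model, task) {
  await model.load()
  try {
    return await task(model)
  } finally {
    await model.unload()
  }
}

// Stand-in object with the same load/unload shape as the real model.
const calls = []
const fakeModel = {
  load: async () => { calls.push('load') },
  unload: async () => { calls.push('unload') }
}

const done = withModel(fakeModel, async () => {
  calls.push('task')
  throw new Error('generation failed')
}).catch(err => {
  calls.push(err.message)
  return calls
})

done.then(c => console.log(c.join(' > ')))
// load > task > unload > generation failed
```

With the real class, the call would look like `withModel(new ImgStableDiffusion(args), model => model.run({ ... }))`.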

## More resources

[Package on npm](https://www.npmjs.com/package/@qvac/diffusion-cpp)