# CLI (/cli)

## Overview

QVAC CLI is provided through the `@qvac/cli` npm package and requires `@qvac/sdk` to be installed alongside it in your project. At the moment, it provides the following functionality:

* OpenAI-compatible HTTP server
* SDK bundling

## Usage

Install the SDK and CLI in your project:

```bash
npm install @qvac/sdk @qvac/cli
```

See [Installation](/sdk/getting-started/installation) for environment-specific installation instructions for the SDK.

Create a `qvac.config.*` file in the root of your project and add the configuration required for the functionality you want to use. See [Configuration](/sdk/getting-started/configuration) to learn how to do this.

Run a command:

```bash
qvac --help
```

## OpenAI-compatible HTTP server

QVAC CLI provides an HTTP server that is compatible with the [OpenAI REST API](https://developers.openai.com/api/reference/overview), enabling broad integration with the AI ecosystem. It internally translates HTTP requests into SDK calls. As a result, any system compatible with the OpenAI REST API can point to `http://localhost:11434/v1/` and work without changes.

See [HTTP server](/http-server) to learn how to run a local HTTP server that exposes an OpenAI-compatible API.
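Because the server is OpenAI-compatible, existing client libraries work by changing only the base URL. As a minimal sketch, assuming the server is running and `my-llm` is a model alias declared in your `qvac.config.*`, the official `openai` npm client can be pointed at it like this:

```ts
import OpenAI from "openai";

// Point the standard OpenAI client at the local QVAC server.
// The apiKey value is a placeholder; a real token is only needed
// when the server is started with --api-key.
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "unused",
});

const response = await client.chat.completions.create({
  model: "my-llm", // a model alias from qvac.config.*
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);
```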
## SDK bundling

The `qvac bundle sdk` command allows you to select only the plugins you need in your project and reduce your application's final bundle size. See [Plugin system](/sdk/examples/utilities/plugin-system) to learn how to use it.

## Reference

Run `qvac --help` to see all available commands and `qvac <command> --help` for command-specific options.

### `qvac bundle sdk`

Generate a tree-shaken Bare worker bundle with selected plugins.

| Option                | Description                                                  |
| --------------------- | ------------------------------------------------------------ |
| `-c, --config <path>` | Config file path (default: auto-detect `qvac.config.*`)      |
| `--sdk-path <path>`   | Path to SDK package (default: auto-detect in `node_modules`) |
| `--host <host>`       | Target host (repeatable)                                     |
| `--defer <module>`    | Defer a module (repeatable)                                  |
| `-q, --quiet`         | Minimal output                                               |
| `-v, --verbose`       | Detailed output                                              |

### `qvac serve openai`

Start an OpenAI-compatible REST API server.

| Option                | Description                                              |
| --------------------- | -------------------------------------------------------- |
| `-c, --config <path>` | Config file path (default: auto-detect `qvac.config.*`)  |
| `-p, --port <port>`   | Port to listen on (default: `11434`)                     |
| `-H, --host <host>`   | Host to bind to (default: `127.0.0.1`)                   |
| `--model <alias>`     | Model alias to preload (repeatable, must be in config)   |
| `--api-key <key>`     | Require Bearer token authentication                      |
| `--cors`              | Enable CORS headers                                      |
| `-v, --verbose`       | Detailed output                                          |
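For example, `qvac serve openai --port 8080 --model my-llm --api-key my-secret-token --cors` starts the server on port 8080, preloads the `my-llm` alias, requires Bearer authentication, and enables CORS; the alias and token here are placeholders for values from your own configuration.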
# HTTP server (/http-server)

## Overview

To run the server, you need the `@qvac/sdk` and `@qvac/cli` npm packages installed in your project. The server is provided by `@qvac/cli` and internally translates HTTP requests into SDK calls. As a result, any system compatible with the [OpenAI REST API](https://developers.openai.com/api/reference/overview) can point to `http://localhost:11434/v1/` and work without changes.

### AI capabilities

At present, the HTTP server supports the following QVAC AI capabilities:

* [Completion](/sdk/examples/ai-tasks/completion)
* [Text embeddings](/sdk/examples/ai-tasks/text-embeddings)
* [Transcription](/sdk/examples/ai-tasks/transcription)

### Compatible tools

The following tools have been verified to work as drop-in replacements by pointing their base URL to the QVAC server:

| Tool                                            | Required endpoints                                            |
| ----------------------------------------------- | ------------------------------------------------------------- |
| [Continue.dev](https://continue.dev)            | `/v1/chat/completions` (streaming SSE), `/v1/models`          |
| [LangChain](https://langchain.com)              | `/v1/chat/completions`, `/v1/embeddings`, `/v1/models`        |
| [Open Interpreter](https://openinterpreter.com) | `/v1/chat/completions` (streaming, tool calls), `/v1/models`  |

## Running the server

Install the SDK and CLI in your project:

```bash
npm install @qvac/sdk @qvac/cli
```

See [Installation](/sdk/getting-started/installation) for environment-specific installation instructions for the SDK.

Create the `qvac.config.*` file at the root of your project declaring which models the server can load. For example:

```json title="qvac.config.json"
{
  "serve": {
    "models": {
      "my-llm": {
        "model": "QWEN3_600M_INST_Q4",
        "default": true,
        "config": { "ctx_size": 8192 }
      }
    }
  }
}
```

Start the server:

```bash
qvac serve openai
```

Send a request:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-llm",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Configuration

Models are declared in `qvac.config.*` under the `serve.models` key. The server can only load models listed under this key — requests for unlisted models return `404`.

Each key in `serve.models` is a **model alias** — the name that HTTP clients use in the `model` field of their requests.

For the full schema of `serve.models`, see [Configuration — `ServeConfig`](/sdk/getting-started/configuration#serveconfig).

### Example

```json title="qvac.config.json"
{
  "serve": {
    "models": {
      "my-llm": {
        "model": "QWEN3_600M_INST_Q4",
        "default": true,
        "preload": true,
        "config": { "ctx_size": 8192, "tools": true }
      },
      "my-embed": {
        "model": "GTE_LARGE_FP16",
        "default": true
      },
      "whisper": {
        "model": "WHISPER_TINY",
        "default": true,
        "preload": true,
        "config": { "language": "en", "strategy": "greedy" }
      }
    }
  }
}
```

* **`model`**: SDK model constant name (e.g., `QWEN3_600M_INST_Q4`). The server resolves it to a download source and addon type automatically.
* **`default`**: when `true`, marks this model as the default for its endpoint category. This does not make the server auto-select the model for requests that omit `model`.
* **`preload`**: when `true`, the model is loaded into memory on server startup. When `false`, it is loaded on first request (cold start). Defaults to `true` for constant model entries.
* **`config`**: model config overrides passed to the underlying addon. Same options as [`modelConfig` in `loadModel()`](/sdk/api/loadModel/#modelconfig-varies-by-model-type).

The `default` field **does not** act as a fallback when an API request omits `model`. Requests must still include a `model` field; otherwise, the server returns `400`.

## CLI

```
qvac serve openai [options]

  -c, --config <path>   Config file path (default: auto-detect qvac.config.*)
  -p, --port <port>     Port to listen on (default: 11434)
  -H, --host <host>     Host to bind to (default: 127.0.0.1)
  --model <alias>       Model alias to preload (repeatable, must be in config)
  --api-key <key>       Require Bearer token authentication
  --cors                Enable CORS headers
  -v, --verbose         Detailed output
```

## API

All endpoints follow the [OpenAI API](https://platform.openai.com/docs/api-reference) request and response format. Base path: `/v1`.

### Endpoints

* `GET /v1/models` — list loaded models
* `GET /v1/models/:id` — get model details
* `DELETE /v1/models/:id` — unload a model
* `POST /v1/chat/completions` — chat completions (blocking + SSE streaming)
* `POST /v1/embeddings` — text embeddings (single + batch)
* `POST /v1/audio/transcriptions` — audio transcription

### `GET /v1/models`

List all loaded models.

```bash
curl http://localhost:11434/v1/models
```

Response:

```json
{
  "object": "list",
  "data": [
    {
      "id": "my-llm",
      "object": "model",
      "created": 1718000000,
      "owned_by": "qvac"
    }
  ]
}
```

### `GET /v1/models/:id`

Get details of a specific loaded model.

```bash
curl http://localhost:11434/v1/models/my-llm
```

### `DELETE /v1/models/:id`

Unload a model, releasing its resources.

```bash
curl -X DELETE http://localhost:11434/v1/models/my-llm
```

Response:

```json
{
  "id": "my-llm",
  "object": "model",
  "deleted": true
}
```

### `POST /v1/chat/completions`

Generate a chat completion. Supports both blocking and streaming (SSE) modes, tool/function calling, and per-request generation parameters.

**Blocking request:**

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-llm",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```

**Streaming request (server-sent events):**

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-llm",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "stream": true
  }'
```

**Tool calling:**

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-llm",
    "messages": [{"role": "user", "content": "What is the weather in London?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }]
  }'
```

#### Generation parameters

The following OpenAI parameters are forwarded to the model on each request:

| OpenAI parameter        | SDK parameter       | Description                          |
| ----------------------- | ------------------- | ------------------------------------ |
| `temperature`           | `temp`              | Sampling temperature                 |
| `max_tokens`            | `predict`           | Maximum tokens to generate           |
| `max_completion_tokens` | `predict`           | Alias for `max_tokens`               |
| `top_p`                 | `top_p`             | Nucleus sampling threshold           |
| `seed`                  | `seed`              | Random seed for deterministic output |
| `frequency_penalty`     | `frequency_penalty` | Penalize frequent tokens             |
| `presence_penalty`      | `presence_penalty`  | Penalize already-present tokens      |

#### Unsupported parameters

The following OpenAI parameters are accepted but ignored (a warning is logged): `n`, `logprobs`, `response_format`, `stop`, `top_logprobs`, `logit_bias`, `parallel_tool_calls`, `stream_options`.
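Streaming also works through standard OpenAI client libraries, which consume the SSE stream for you. A minimal TypeScript sketch, assuming the `openai` npm package and a configured `my-llm` alias:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "unused", // placeholder; only needed with --api-key
});

// Request a streamed completion; the server responds with SSE chunks.
const stream = await client.chat.completions.create({
  model: "my-llm",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  stream: true,
});

// Each chunk carries a delta with the next piece of generated text.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```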
### `POST /v1/embeddings`

Generate text embeddings. Accepts a single string or a batch of strings.

**Single input:**

```bash
curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{ "model": "my-embed", "input": "The quick brown fox" }'
```

**Batch input:**

```bash
curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{ "model": "my-embed", "input": ["First sentence", "Second sentence"] }'
```

Response:

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.012, -0.034, ...]
    }
  ],
  "model": "my-embed",
  "usage": { "prompt_tokens": 0, "total_tokens": 0 }
}
```

`encoding_format` (only `float` is supported) and `dimensions` are accepted but ignored.
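A common use of this endpoint is semantic similarity: embed two texts and compare the vectors with cosine similarity. A minimal sketch using the built-in `fetch`, assuming a configured `my-embed` alias; the `cosine` helper below is illustrative and not part of the SDK:

```ts
// Fetch embeddings for a batch of sentences from the local server.
const res = await fetch("http://localhost:11434/v1/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "my-embed",
    input: ["First sentence", "Second sentence"],
  }),
});
const { data } = await res.json();

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

console.log(cosine(data[0].embedding, data[1].embedding));
```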
### `POST /v1/audio/transcriptions`

Transcribe audio using Whisper or Parakeet models. This endpoint uses `multipart/form-data` (not JSON). Maximum file size: 25 MB.

**JSON response (default):**

```bash
curl http://localhost:11434/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=whisper" \
  -F "response_format=json"
```

Response:

```json
{ "text": "transcribed text here" }
```

**Plain text response:**

```bash
curl http://localhost:11434/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=whisper" \
  -F "response_format=text"
```

**With prompt** (Whisper uses it as `initial_prompt`):

```bash
curl http://localhost:11434/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=whisper" \
  -F "prompt=President Kennedy speech about space exploration"
```

#### Parameters

| Parameter         | Description                             | Required |
| ----------------- | --------------------------------------- | -------- |
| `file`            | Audio file to transcribe.               | Yes      |
| `model`           | Model alias (must be in config).        | Yes      |
| `response_format` | `json` (default) or `text`.             | No       |
| `prompt`          | Optional prompt forwarded to the model. | No       |

Unsupported `response_format` values (`srt`, `vtt`, `verbose_json`) return a `400` error. `language` and `temperature` are accepted but currently only configurable at model load time (via the `serve.models` config), not per-request. A warning is logged when these are sent.
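From JS/TS, the multipart request can be built with the standard `FormData` API. A minimal sketch, assuming Node.js 20+ (where `fetch` is built in and `openAsBlob` is available) and a configured `whisper` alias:

```ts
import { openAsBlob } from "node:fs";

// Build the multipart/form-data body with the standard FormData API.
const form = new FormData();
form.append("file", await openAsBlob("audio.wav"), "audio.wav");
form.append("model", "whisper");
form.append("response_format", "json");

const res = await fetch("http://localhost:11434/v1/audio/transcriptions", {
  method: "POST",
  body: form, // fetch sets the multipart boundary header automatically
});

const { text } = await res.json();
console.log(text);
```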
## Authentication

By default, the server accepts unauthenticated requests on `127.0.0.1`. To require a Bearer token, run the server with the `--api-key` flag:

```bash
qvac serve openai --api-key my-secret-token
```

Clients must then include the token in the `Authorization` header:

```bash
curl http://localhost:11434/v1/models \
  -H "Authorization: Bearer my-secret-token"
```

Requests without a valid token receive a `401` response.

# QVAC by Tether: the Infinite Stable Intelligence Platform (/)

## Discover

* **Welcome** — Learn what QVAC is, why Tether built it, and what makes it different.
* **Our vision** — Learn about Tether's long-term vision for QVAC.

## Build

* **Get started with QVAC SDK** — Use the JS/TS SDK to build local and P2P AI applications and systems.
* **Quickstart** — Run your first example using the JS/TS SDK.
* **Installation** — Supported environments and how to install the SDK for each one: Node.js, Bare, and Expo.
* **JS API reference** — The `@qvac/sdk` npm package exposes a function-centric, typed JS API.
* **CLI** — The `@qvac/cli` npm package provides tools and an HTTP server that exposes an OpenAI-compatible API.

## Explore

* **Flagship applications** — Desktop and mobile apps to empower users and showcase QVAC capabilities.
* **Addons** — Reference for the QVAC components that provide AI capabilities.
* **Research** — Ongoing research focused on advancing the state of the art in local AI. See the research outputs on Hugging Face.

## Learn

* **Build an Electron app** — Hands-on tutorial on using QVAC SDK with Electron.
* **Build an Expo app** — Hands-on tutorial on using QVAC SDK with Expo.

## Community

* **Discord** — Where we gather. Get real-time help from the team and community, meet peers, join in, discuss QVAC.
* **Keet** — Another place where we gather, using the [Keet P2P app](https://keet.io). Copy this invite link [pear://keet/nfo35...](pear://keet/nfo35rbknsh35m184tmww7s4ssjh1hftiysx8yqreyi5njwsknuxs47nwhsbzt6wijoxfzmi6a8fh7b49gun3qxea3mumixnti8spizeysthyxdm968t6zof93yqpgyygb39rmoobki7fo7rh5hod71map4gayedt3qiwhkjakycqnfa5kswn58mtt8qg) into your Keet app.

## Support

[*Use the appropriate channel in our **Discord** to talk directly with us.*](https://discord.com/invite/tetherdev) Get real-time help, report bugs, ask questions, share feedback, request and vote for features, take part in QVAC's evolution.

## Miscellaneous

Other resources:

* [Main website](https://qvac.tether.io)

# Flagship applications (/about-qvac/flagship-apps)

## Overview

AI is an extension of human intelligence, and you should own yours. With QVAC apps, your AI runs on your own devices. No subscriptions. No sharing of private data.

You can run QVAC apps 100% offline. But if you're on mobile and need more computing power for an AI task, you can connect to your PC over the internet. No one can stop you from using your AI.

### Benefits

* Total privacy
* Total ownership
* No subscriptions
* Unstoppable

## Catalog

To date, the following applications have been released. More coming soon.

* **Workbench** — Desktop app to run AI models on your device or remotely and query your documents.
* **Health** — Mobile app that empowers users to own their health and fitness data entirely on-device.

# How it works (/about-qvac/how-it-works)

## Overview

The SDK supports multiple JS runtimes, but its [underlying components](/addons) run only on [Bare](https://bare.pears.com). When the SDK runs in a runtime other than Bare, it spawns a Bare worker where all AI operations take place. The worker is started lazily on the first RPC call and can be explicitly shut down with `close()`.

## Phase 1: initialization

The first time you call `loadModel()` (or any function other than `close()`), the SDK performs a complete initialization sequence. It initializes a runtime-specific RPC client and sends configuration to the worker via the internal `__init_config` message. The worker process is spawned once and reused for subsequent calls until you explicitly close it. In the Bare runtime, no separate worker process is spawned; requests are handled in-process.

## Phase 2: model loading

There is only a single RPC client and Bare worker per application, not per model — i.e., a singleton pattern. The model is downloaded, loaded into memory, and registered with a unique ID. From that point on, it is available for AI inference until you unload it — call `unloadModel()` to free its memory.

## Phase 3: inference

You can call `loadModel()` multiple times to make multiple models ready for use simultaneously, and you can perform AI inference with all of them. When you no longer need a model, call `unloadModel()` to free up resources.

## Phase 4: shutdown

`close()` explicitly shuts down the worker and releases the RPC connection. In Node/Expo, this terminates the worker process; in Bare, the call is a no-op since there is no separate worker process. After `close()`, the next SDK call will reinitialize the RPC client and spawn a fresh worker.

`unloadModel()` will automatically close the RPC connection when there are no active models or providers, but `close()` is the explicit way to shut down the SDK instance.
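The lifecycle above can be sketched in TypeScript. This is illustrative only — the exact `loadModel()` arguments and return shape are assumptions here, not the documented API; see the [JS API reference](/sdk/api/loadModel/) for the real signatures:

```ts
import { loadModel, unloadModel, close } from "@qvac/sdk";

// Phases 1 + 2: the first loadModel() call spawns the Bare worker
// (when not running on Bare) and loads the model into it. The
// arguments shown (model constant name, modelConfig) are assumptions.
const model = await loadModel("QWEN3_600M_INST_Q4", { ctx_size: 8192 });

// Phase 3: run inference as many times as needed with the loaded
// model (inference calls omitted; see the SDK examples), then free
// the model's memory when done.
await unloadModel(model);

// Phase 4: explicitly shut down the worker and the RPC client.
await close();
```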
# Public launch (/about-qvac/public-launch)

Tether publicly launched QVAC in October 2025 at the Plan B Forum in Lugano. During the forum, Tether CEO Paolo Ardoino presented QVAC's future vision in his keynote, *Fiat Lux*: