# QVAC by Tether: the Infinite Stable Intelligence Platform (/)

import { MessagesSquare, Hash, Languages, Mic, Speech, Volume2, ScanText, Image as ImageIcon, Video, GalleryHorizontal, FlaskConical, ScanSearch, Shapes, Eye, Map, Rocket, Server, MonitorPlay } from 'lucide-react'

## Why QVAC?

QVAC is Tether's answer to centralized AI: it ensures that AI is not tied to massive data centers in the hands of a few *but is free to run on everyone's devices, without a central point of failure or arbitrary censorship*.

## Features

## AI capabilities

* **Text generation:** LLM inference for text generation and chat via Fabric LLM.
* **Text embeddings:** Vector embedding generation for semantic search, clustering, and retrieval, via Fabric LLM.
* **RAG:** Out-of-the-box retrieval-augmented generation workflow.
* **Fine-tuning:** Adapting LLMs to domain-specific tasks via LoRA.
* **Multimodal:** LLM inference over text, images, and other media within a single conversation context.
* **Image generation:** Text-to-image and image-to-image generation via a customized Diffusion backend.
* **Transcription:** Automatic speech recognition (ASR) via a customized Whisper backend or NVIDIA Parakeet.
* **Text-to-Speech:** Speech synthesis (TTS) via a customized GGML backend.
* **Voice assistant:** Real-time voice pipeline: transcription, text generation, and speech synthesis in one loop.
* **Translation:** Text-to-text neural machine translation (NMT), via Fabric LLM and Bergamot.
* **VLA:** Vision-language-action for robot control via a customized GGML backend.
* **OCR:** Optical character recognition for extracting text from images via ONNX Runtime.
* **Image classification:** Classify images into labels with confidence scores via a customized GGML backend.

## System overview

{/*
Two theme-specific copies of the same diagram, swapped by the site's `.dark` class.
The diagram is a draw.io SVG that colors strokes/text via `light-dark()`, which resolves against the SVG document's own `color-scheme`.
Because the SVG is embedded in an isolated embedded document, the site's class-based theme does NOT reach into it, and Safari (unlike Chromium) won't propagate the embedding context's color-scheme, so dark theme rendered black-on-black.
Each copy hard-codes its root `color-scheme` (light vs dark) so `light-dark()` resolves deterministically in every browser; we then show the matching copy via `dark:`/`hidden`.
NOTE: temporary fix. The two SVGs differ only by that one `color-scheme` token (regenerate the dark one from the light one). A single-source inline solution is the planned follow-up.
*/}

*Figure: System overview diagram.*

*The SDK is the main entry point for using QVAC*. It is type-safe and exposes all QVAC capabilities through a unified interface. It runs on Node.js, [Bare runtime](https://bare.pears.com), and [Expo](https://expo.dev).

Additionally, QVAC provides a CLI with development tools, as well as an HTTP server that wraps QVAC and exposes an [**OpenAI-compatible API**](https://platform.openai.com/docs/api-reference). *By implementing the OpenAI API format, QVAC can integrate with the broader AI ecosystem.*

Finally, QVAC also encompasses desktop and mobile [**flagship applications**](https://qvac.tether.io#products) to empower users and showcase QVAC capabilities, as well as [**research initiatives**](https://huggingface.co/qvac) to advance the state of the art in local AI.

## Next steps

Choose how you want to start with QVAC:

* **Get started with the SDK:** Learn the essentials for using the SDK: system requirements, compatibility matrix, setup basics, and core usage flows.
* **Quickstart:** Run your first example using the JS/TS SDK. At the end, you'll find instructions to run any example in this documentation.
* **See QVAC in practice:** Try QVAC Workbench, our flagship desktop and mobile app built with QVAC.
* **Integrate an OpenAI-compatible client:** Run the QVAC HTTP server and connect any existing OpenAI-compatible system to local AI.

# Installation (/installation)

import { TrackCopy } from '@/components/track-copy'

## Supported environments

QVAC SDK is distributed as the npm package `@qvac/sdk` for JavaScript/TypeScript projects.

### JS environments

* Node.js $\geq$ v22.17
* [Bare](https://bare.pears.com) $\geq$ v1.24
* [Expo](https://expo.dev) $\geq$ v54

### Compatibility matrix

| Platform | Min Version | Architecture | GPU API/Backend | Notes |
| -------- | ----------- | ------------ | ---------------------------- | --------------------------------------------------------------------------- |
| macOS | 14.0+ | arm64 | Metal | Arch x64 supports CPU inference only; Intel iGPU acceleration not supported |
| iOS | 17.0+ | arm64 | Metal | Requires Expo |
| Linux | Ubuntu 22+ | arm64, x64 | Vulkan | Vulkan runtime required |
| Android | 12+ | arm64 | Vulkan, OpenCL (Adreno 700+) | Requires Expo |
| Windows | 10+ | x64 | Vulkan | Vulkan-capable GPU + vendor drivers required |

## Installation

```bash
npm i @qvac/sdk
```

### Linux

Requirements:

* Ubuntu 22 requires [g++](https://github.com/gcc-mirror/gcc) $\geq$ 13.
* Vulkan runtime: Vulkan loader + a GPU driver with Vulkan support

On desktop Linux distributions (e.g., Ubuntu Desktop), these requirements are typically satisfied out of the box. On PCs, the Vulkan runtime is usually installed along with the GPU drivers.
In other words, *if you've installed the correct driver for your GPU (with Vulkan support), you typically don't need to install anything else.* To verify it, install Vulkan tools and run `vulkaninfo`:

```bash
# Debian/Ubuntu (apt)
sudo apt update
sudo apt install -y vulkan-tools
vulkaninfo --summary
```

```bash
# Fedora (dnf)
sudo dnf install -y vulkan-tools vulkan-devel
vulkaninfo --summary
```

In minimalist/headless installations (e.g., Ubuntu Server), you may need to manually install the Vulkan loader, and ensure a Vulkan-capable GPU driver (ICD) is installed. The exact packages vary by distro and GPU vendor. For example:

```bash
# Debian/Ubuntu (apt)
sudo apt update
sudo apt install -y libvulkan1 mesa-vulkan-drivers
vulkaninfo --summary
```

```bash
# Fedora (dnf)
sudo dnf install -y vulkan-loader mesa-vulkan-drivers
vulkaninfo --summary
```

Ensure QVAC can detect the GPU Vulkan driver by adding your user to the `render` and `video` groups:

```bash
sudo usermod -aG render,video $USER
```

Log out and back in for the group change to take effect.

### Expo

Install peer dependencies:

```bash
npm i 'react-native-bare-kit@^0.11.5'
npm i -D 'bare-pack@^1.5.1'
npx expo install expo-file-system expo-build-properties expo-device
```

**Tip:** use `npx expo install` for all `expo-*` packages to ensure compatibility with your project's Expo SDK version.

Configure `expo-build-properties` and add `@qvac/sdk/expo-plugin` to the `plugins` array in your `app.json`:

```json title="app.json"
{
  "expo": {
    "plugins": [
      ["expo-build-properties", { // [!code ++]
        "android": { "minSdkVersion": 29 } // [!code ++]
      }], // [!code ++]
      "@qvac/sdk/expo-plugin" // [!code ++]
    ]
  }
}
```

Prebuild your project to generate the native files:

```bash
npx expo prebuild
```

Build and run it on a **physical device**:

```bash
npx expo run:ios --device
# or
npx expo run:android --device
```

Due to limitations with `llamacpp`, QVAC currently does not run on emulators. You **must** use a physical device.
### Windows

Requirement:

* Vulkan runtime: Vulkan loader + a GPU driver with Vulkan support

This requirement is typically satisfied out of the box after installing the correct GPU vendor drivers. To verify it, [install Vulkan SDK](https://vulkan.lunarg.com) and run:

```powershell
vulkaninfo --summary
```

# Introduction (/introduction)

## Overview

Install the npm package `@qvac/sdk` in your project. Then, load models and use them to perform AI inference locally, or delegate inference to peers using the built-in P2P capability.

{/*
## Releases
- [Latest version: v0.7.0](https://www.npmjs.com/package/@qvac/sdk)
- [Release notes for this version](https://github.com/tetherto/qvac-sdk/releases/tag/v0.5.0)
*/}

## Description

The JS SDK is cross-platform, type-safe, and pluggable, exposing all QVAC capabilities through a unified interface.

### Key features

* **Cross-platform:** portable code across Linux, macOS, and Windows (Node.js / [Bare runtime](https://bare.pears.com)); Android and iOS ([Expo](https://expo.dev)).
* **Pluggable**: build lean apps by including only what you need, and extend the SDK with custom plugins.
* **Type-safe:** typed JS API.
* **Unified interface:** multiple AI tasks, one single npm package to install in your project.

## Quickstart

At the end, you’ll find instructions for running all examples in this documentation.

## Installation

Supported environments and how to install the SDK for each one.

## Functionalities

### AI tasks

* [**Text generation:**](/ai-capabilities/text-generation) LLM inference for text generation and chat via [`qvac-fabric-llm.cpp`](https://github.com/tetherto/qvac-fabric-llm.cpp).
* [**Text embeddings:**](/ai-capabilities/text-embeddings) vector embedding generation for semantic search, clustering, and retrieval, via `qvac-fabric-llm.cpp`.
* [**RAG:**](/ai-capabilities/rag) out-of-the-box retrieval-augmented generation workflow.
* [**Fine-tuning:**](/ai-capabilities/fine-tuning) adapting LLMs to domain-specific tasks via LoRA.
* [**Multimodal:**](/ai-capabilities/multimodal) LLM inference over text, images, and other media within a single conversation context.
* [**Image generation:**](/ai-capabilities/image-generation) text-to-image and image-to-image generation via a customized Diffusion engine.
* [**Video generation:**](/ai-capabilities/video-generation) text-to-video generation via a customized Diffusion engine.
* [**Transcription:**](/ai-capabilities/transcription) automatic speech recognition (ASR) for speech-to-text via a customized Whisper engine or [NVIDIA Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2).
* [**Text-to-Speech:**](/ai-capabilities/text-to-speech) speech synthesis for text-to-speech (TTS) via [a customized GGML backend](https://github.com/tetherto/qvac/tree/main/packages/tts-ggml).
* [**Voice assistant:**](/ai-capabilities/voice-assistant) real-time voice conversation pipeline chaining transcription, text generation, and text-to-speech.
* [**Translation:**](/ai-capabilities/translation) text-to-text neural machine translation (NMT), via `qvac-fabric-llm.cpp` and [Bergamot](https://browser.mt).
* [**VLA:**](/ai-capabilities/vla) vision-language-action that turns camera frames, robot state, and natural-language instruction into action chunks for robot control, via [a customized GGML backend](https://github.com/tetherto/qvac/tree/main/packages/vla-ggml).
* [**OCR:**](/ai-capabilities/ocr) optical character recognition (OCR) for extracting text from images via ONNX Runtime.
* [**Image classification:**](/ai-capabilities/image-classification) assigning class labels with confidence scores to images, via [a customized GGML backend](https://github.com/tetherto/qvac/tree/main/packages/classification-ggml).
### P2P capabilities

* [**Delegated inference:**](/p2p-capabilities/delegated-inference) delegate inference to peers via the [Holepunch stack](https://holepunch.to), enabling resource sharing.
* **Fetch models:** download AI models from peers via the distributed model registry.
* [**Blind relays:**](/p2p-capabilities/blind-relays) connect peers across NATs/firewalls by routing traffic through relay nodes.

### Utilities

* [**Logging:**](/runtime/logging) visibility into what's happening during loading, inference, and other operations.
* [**Profiler:**](/runtime/profiler) measure and export timing metrics across model loading, inference, and P2P delegation.
* [**Download lifecycle:**](/models/download-lifecycle) pause and resume model downloads.
* [**Runtime lifecycle:**](/runtime/lifecycle) suspend and resume the SDK runtime (e.g., on app background/foreground) and query lifecycle state.
* [**Cancellation:**](/runtime/cancellation) cancel any in-flight inference, model load, or download by `requestId`, or broad-cancel by `modelId` for unload/shutdown.
* [**Sharded models:**](/models/sharded-models) download a model that is sharded into multiple parts.

## Flow

Before you can use a model, you need to load it from some location into memory. Flow for performing AI inference:

1. Call [`loadModel()`](/reference/api#loadmodel) to initialize the SDK and load one model. You can load multiple models simultaneously by calling `loadModel()` again.
2. Perform AI tasks by calling the appropriate functions from the SDK API (e.g., `completion()`).
3. When you are done with a model, call [`unloadModel()`](/reference/api#unloadmodel) to release system resources.
4. Finally, close the SDK instance by calling [`close()`](/reference/api).

## Models

Each [AI task](#ai-tasks) works with different model families, and among the supported ones, you can choose which to use and how to obtain them.
`loadModel()` manages the download and caching of models (one or multiple files), and their loading from disk into memory, preparing them for use.

`loadModel()` supports loading models from three different locations:

* Local filesystem, by providing a path.
* HTTP server, by providing an HTTP URL.
* Our distributed model registry.

The SDK package does not ship with built-in models, **but** its API exposes constants representing preconfigured models (e.g., `LLAMA_3_2_1B_INST_Q4_0`). Each constant maps to a model already published to our model registry. When calling `loadModel()`, you can provide one of these constants instead of a location, making model retrieval transparent.

See the index of models available in our distributed model registry.

For more on querying the model registry, see [`modelRegistryList()`](/reference/api#modelregistrylist), [`modelRegistrySearch()`](/reference/api#modelregistrysearch), and [`modelRegistryGetModel()`](/reference/api#modelregistrygetmodel).

For more on loading models, see [`loadModel()` at `@qvac/sdk` API reference](/reference/api#loadmodel).

## Configuration

Use `qvac.config.*` to configure QVAC's overall behavior.

### Plugin system

Enable and disable built-in AI capabilities, and add new ones via custom plugins.

Guidelines to ship your custom plugin as a single npm package.

## JS API

The `@qvac/sdk` npm package exposes a function-centric, typed JS API.
## How it works

Understand what happens under the hood when you use QVAC SDK in your application.

## Other resources

* [SDK landing page](https://qvac.tether.io/dev/sdk/)
* [Package at npm](https://www.npmjs.com/package/@qvac/sdk)

# Quickstart (/quickstart)

import { TrackCopy } from '@/components/track-copy'

## Requirements

* Node.js $\geq$ v22.17
* npm $\geq$ v10.9

## Step-by-step

Create the examples workspace:

```bash
mkdir qvac-examples
cd qvac-examples
npm init -y && npm pkg set type=module
```

Install the SDK:

```bash
npm i @qvac/sdk
```

Create the quickstart script:

```js file=/packages/sdk/dist/examples/quickstart.js title="quickstart.js" lineNumbers
import {
  loadModel,
  LLAMA_3_2_1B_INST_Q4_0,
  completion,
  unloadModel,
} from "@qvac/sdk";

try {
  // Load a model into memory
  const modelId = await loadModel({
    modelSrc: LLAMA_3_2_1B_INST_Q4_0,
    onProgress: (progress) => {
      console.log(progress);
    },
  });

  // You can use the loaded model multiple times
  const history = [
    {
      role: "user",
      content: "Explain quantum computing in one sentence",
    },
  ];
  const result = completion({ modelId, history, stream: true });
  for await (const token of result.tokenStream) {
    process.stdout.write(token);
  }

  // Unload model to free up system resources
  await unloadModel({ modelId });
} catch (error) {
  console.error("❌ Error:", error);
  process.exit(1);
}
```

Run the quickstart script with Node.js:

```bash
node quickstart.js
```

Or with the [Bare](https://bare.pears.com) runtime (same script; the SDK installs a Node-compatible `process` global on Bare so `process.stdout` and `process.exit` work without importing `bare-process` in your own file):

```bash
bare quickstart.js
```

You still need Bare and the SDK’s **Bare peer dependencies** (including `bare-process` and the other `bare-*` packages listed for `@qvac/sdk`) installed in the project. Recent npm versions resolve peers when you run `npm i @qvac/sdk`; if resolution fails, install the peers your package manager reports as missing.
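The quickstart script writes each token to stdout as it arrives. If you would rather collect the whole response into a single string, a small helper can drain any async iterable of string chunks. The sketch below is self-contained and makes two assumptions: `collectTokens` and `fakeTokenStream` are hypothetical names that are not part of `@qvac/sdk`, and the fake stream stands in for `result.tokenStream` so the snippet runs without loading a model.

```javascript
// Hypothetical helper: drains an async iterable of string chunks into one string.
// Assumes every yielded item is a string, as in the quickstart's for-await loop.
async function collectTokens(tokenStream) {
  let text = "";
  for await (const token of tokenStream) {
    text += token;
  }
  return text;
}

// Stand-in for `result.tokenStream` (illustrative only, not part of @qvac/sdk).
async function* fakeTokenStream() {
  yield "Hello";
  yield ", ";
  yield "world";
}

collectTokens(fakeTokenStream()).then((answer) => {
  console.log(answer); // prints "Hello, world"
});
```

In the quickstart script, the same helper would presumably be called as `await collectTokens(result.tokenStream)` in place of the `for await` loop.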
## Running examples

Follow these instructions to run any example in this documentation:

* All examples are self-contained, runnable JavaScript scripts. Use the `qvac-examples` workspace created in this quickstart to store and run them as you explore this documentation.
* Run each example with the indicated compatible JavaScript environment. QVAC supports multiple environments (Node.js, Bare, and Expo). After you `import` from `@qvac/sdk`, Bare has a `process` global (via `bare-process`) for typical CLI patterns; some examples still use other Node-specific APIs (e.g. `fs` without Bare shims) and note their compatible environment.
* Some examples also provide a TypeScript version. If you want to run TS directly, install the required dev dependencies:

  ```bash
  npm i -D tsx typescript
  ```

# System requirements (/system-requirements)

## Overview

Minimum host requirements for running `@qvac/sdk` and `@qvac/cli`. You can validate your environment against this list with:

```bash
qvac doctor
```

Use `--json` for machine-readable output and `--quiet` to set the exit code only (`0` when all required checks pass, `1` otherwise).

## Scope

The `qvac` CLI itself runs on desktops only. The SDK additionally targets Android and iOS via Expo/BareKit; those appear here as **deploy targets** with host-toolchain checks (`adb`, `xcodebuild`) but never cause a non-zero exit.

## Required

| Requirement | Notes |
| --- | --- |
| Node.js `>= 18.0.0` | Node 18 is end-of-life; prefer `>= 20`. Matches `engines.node` of `@qvac/cli`. |
| Supported CLI host | `darwin-arm64`, `darwin-x64`, `linux-arm64`, `linux-x64`, `win32-x64`. The CLI cannot run on mobile; those are deploy targets only. |
| Total RAM `>= 2 GB` (recommended `>= 4 GB`) | Below 4 GB, most LLMs will fail to load. |

## Recommended

| Requirement | When it is needed |
| --- | --- |
| Available RAM `>= 2 GB` | Needed when loading a model. Checked via `os.availableMemory()` on Node 22+, falling back to `os.freemem()` on older Node.js versions. |
| GPU acceleration (Metal on macOS, Vulkan on Linux/Windows) | QVAC inference backends use Metal (always present on macOS) or Vulkan on Linux/Windows. Without a Vulkan ICD, LLM and Whisper inference fall back to CPU and are significantly slower. |
| Free disk `>= 5 GB` in the working directory | Model artifacts are typically multi-GB per model. Uses `fs.statfsSync` (Node 18.15+) with a POSIX `df` fallback. |

## Deploy targets

These checks are informational. They never cause `qvac doctor` to exit non-zero, because cross-bundling is always supported via bare-pack prebuilt binaries. What is checked here is the host toolchain needed to install/deploy to each target class.

| Target | Check | Status when missing |
| --- | --- | --- |
| `darwin-{arm64,x64}`, `linux-{arm64,x64}`, `win32-x64` | Listed under "Desktop"; native host flagged with `(native)`. | Always `pass`: cross-bundling is built in. |
| `android-arm64` | `adb --version` | `warn`: install [Android platform tools](https://developer.android.com/tools/releases/platform-tools) to deploy to devices. |
| `ios-arm64` + simulators | `xcodebuild -version` (macOS only) | `warn` on macOS without Xcode, `info` on non-macOS hosts. |

## Optional tools

Only required if you use the corresponding feature.
The checker warns when they are missing but does not fail.

| Tool | Required for |
| --- | --- |
| `ffmpeg` | Microphone capture, transcription examples, and the built-in audio decoder. Install from [ffmpeg.org](https://ffmpeg.org/download.html). |
| [Bare](https://bare.pears.com) runtime | Running the SDK under Bare directly. |
| [Bun](https://bun.sh) | Building the SDK from source or running the monorepo development workflow. |

## Project

| Check | Notes |
| --- | --- |
| `@qvac/sdk` resolvable from project | Resolved with `require.resolve('@qvac/sdk/package.json')` rooted at the working directory; hoisted installs (monorepos, Yarn/Bun workspaces) are correctly detected. |

## Exit codes

* `0`: all required checks passed. Warnings, skips, and informational rows may still be present.
* `1`: one or more required checks failed (unsupported Node version, unsupported CLI host, insufficient total RAM, ...). See the printed hints for remediation steps.

## JSON schema

```ts
interface DoctorReport {
  ok: boolean;
  platform: string; // e.g. "darwin"
  arch: string; // e.g. "arm64"
  nodeVersion: string; // e.g. "20.19.5"
  sections: Array<{
    id: 'runtime' | 'hardware' | 'targets' | 'tools' | 'project';
    title: string;
    checks: Array<{
      id: string;
      label: string;
      status: 'pass' | 'warn' | 'fail' | 'skip' | 'info';
      severity: 'required' | 'recommended' | 'informational';
      value?: string;
      detail?: string;
      hint?: string;
    }>;
  }>;
}
```

### Status semantics

* `pass`: check ran and the requirement is satisfied.
* `warn`: recommended requirement not met, or a deploy-target toolchain is missing; does not cause a non-zero exit.
* `fail`: required check not met; causes exit code `1`.
* `skip`: the check could not be executed on this host (missing Node API and no fallback, etc.).
* `info`: informational row with no pass/fail judgment (e.g. iOS deploy target on a non-macOS host).

# Troubleshooting (/troubleshooting)

## CLI error: command not found

### Situation

You ran a `qvac` command (e.g., `qvac doctor`) and the shell responded with:

```
zsh: command not found: qvac
```

(On bash: `bash: qvac: command not found`.)

### Cause

The `qvac` binary is shipped by the `@qvac/cli` npm package. It is only added to your shell `PATH` when `@qvac/cli` is installed globally. Without a global install, the binary is not discoverable by name from your shell.

### Solution

Install `@qvac/cli` globally:

```bash
npm install -g @qvac/cli
```

Then re-run your command. For example:

```bash
qvac doctor
```

**Tip:** If you cannot install the CLI globally, you can run it with `npx` instead:

```bash
npx --package "@qvac/cli" qvac doctor
```

See [CLI → Usage](/cli#usage) for the full setup, including installing `@qvac/sdk` in your project.

## Startup crash: requested module does not provide a default export

### Situation

You ran `qvac bundle sdk`, started your app, and then the worklet crashed with:

```
SyntaxError: The requested module '@qvac/sdk/tts-ggml/plugin' does not provide an export named 'default'
```

### Cause

This usually means there is version skew between `@qvac/cli` and `@qvac/sdk`. An older CLI version may generate a bundle using outdated built-in plugin metadata, while a newer SDK version exports the renamed plugin only through its current named export shape. Starting with `@qvac/cli` 0.6.0, `qvac bundle sdk` delegates to `@qvac/sdk/commands`, which keeps the bundling logic aligned with the SDK.

### Solution

Upgrade `@qvac/cli`, then rebuild the SDK bundle:

```bash
npm install -g @qvac/cli@latest
qvac bundle sdk
```

If you do not install the CLI globally, run the latest version with `npx` instead.
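Because a CLI older than 0.6.0 is the likely culprit for this crash, it can help to compare the installed `@qvac/cli` version against that threshold before rebuilding. The sketch below is a minimal, self-contained comparison of `major.minor.patch` strings. `isAtLeast` is a hypothetical helper, pre-release and build suffixes are ignored, and obtaining the installed version string (for example from the output of `npm ls -g @qvac/cli`) is left to you.

```javascript
// Hypothetical helper: true when `version` >= `minimum`, comparing only the
// numeric major.minor.patch parts. Suffixes such as "-beta.1" are ignored.
function isAtLeast(version, minimum) {
  const parse = (v) =>
    v
      .replace(/^v/, "")
      .split(/[-+]/)[0]
      .split(".")
      .map((part) => Number.parseInt(part, 10) || 0);
  const a = parse(version);
  const b = parse(minimum);
  for (let i = 0; i < 3; i += 1) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true; // all three parts are equal
}

console.log(isAtLeast("0.5.2", "0.6.0")); // false
console.log(isAtLeast("0.6.0", "0.6.0")); // true
console.log(isAtLeast("0.10.1", "0.6.0")); // true
```

Passing this check does not rule out version skew on its own; it only tells you whether the CLI predates the release in which bundling began delegating to the SDK.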
# How it works (/about/how-it-works)

## Overview

The SDK supports multiple JS runtimes, but its [underlying components](https://github.com/tetherto/qvac/tree/main/packages) run only on [Bare](https://bare.pears.com). When the SDK runs in a runtime other than Bare, it spawns a Bare worker where all AI operations take place. The worker is started lazily on the first RPC call and can be explicitly shut down with `close()`.

## Phase 1: initialization

The first time you call `loadModel()` (or any function other than `close()`), the SDK performs a complete initialization sequence. It initializes a runtime-specific RPC client and sends configuration to the worker via the internal `__init_config` message. The worker process is spawned once and reused for subsequent calls until you explicitly close it. In Bare runtime, no separate worker process is spawned; requests are handled in-process.

## Phase 2: model loading

There is only a single RPC client and Bare worker per application, not per model (i.e., a singleton pattern). The model is downloaded, loaded into memory, and registered with a unique ID. From that point on, it is available for AI inference until you unload it by calling `unloadModel()`, which frees its memory.

## Phase 3: inference

You can call `loadModel()` multiple times to make multiple models ready for use simultaneously. Additionally, you can perform AI inference multiple times with all of them. When you no longer need a model, call `unloadModel()` to free up resources.

## Phase 4: shutdown

`close()` explicitly shuts down the worker and releases the RPC connection. In Node/Expo, this terminates the worker process; in Bare, the call is a no-op since there is no separate worker process. After `close()`, the next SDK call will reinitialize the RPC client and spawn a fresh worker. `unloadModel()` will automatically close the RPC connection when there are no active models or providers, but `close()` is the explicit way to shut down the SDK instance.
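The lifecycle described above (lazy start on the first call, one shared worker, explicit shutdown, and a fresh worker after `close()`) can be modeled with a small class. This is an illustrative sketch of the pattern only, not the SDK's implementation: `LazyWorker`, `ensureWorker`, `call`, and `spawnCount` are hypothetical names, and the "worker" here is a plain object rather than a Bare process.

```javascript
// Illustrative model of a lazily started singleton worker (not SDK code).
class LazyWorker {
  constructor() {
    this.worker = null; // no worker exists until the first call
    this.spawnCount = 0; // how many times a worker has been started
  }

  // Start the worker on demand and reuse it for later calls.
  ensureWorker() {
    if (this.worker === null) {
      this.spawnCount += 1;
      this.worker = { id: this.spawnCount, handled: 0 };
    }
    return this.worker;
  }

  // Any call other than close() triggers initialization if needed.
  call(method) {
    const worker = this.ensureWorker();
    worker.handled += 1;
    return `${method} handled by worker #${worker.id}`;
  }

  // Explicit shutdown; closing when no worker is running does nothing.
  close() {
    this.worker = null;
  }
}

const sdk = new LazyWorker();
console.log(sdk.call("loadModel")); // loadModel handled by worker #1
console.log(sdk.call("completion")); // completion handled by worker #1
sdk.close();
console.log(sdk.call("loadModel")); // loadModel handled by worker #2
```

The second `loadModel` call lands on worker #2 because `close()` discarded the first one, which mirrors the statement that the next SDK call after `close()` spawns a fresh worker.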
# Public launch (/about/public-launch) Tether publicly launched QVAC in October 2025 at the Plan B Forum in Lugano. During the forum, Tether CEO Paolo Ardoino presented QVAC's future vision in his keynote, *Fiat Lux*: