# Introduction (/introduction)



## Overview

Install the npm package `@qvac/sdk` in your project. Then, load models and use them to perform AI inference locally, or delegate inference to peers using the built-in P2P capability.

{/*
  ## Releases

  - [Latest version: v0.7.0](https://www.npmjs.com/package/@qvac/sdk)
  - [Release notes for this version](https://github.com/tetherto/qvac-sdk/releases/tag/v0.5.0)
  */}

## Description

The JS SDK is cross-platform, type-safe, and pluggable, exposing all QVAC capabilities through a unified interface.

### Key features

* **Cross-platform:** portable code across Linux, macOS, and Windows (Node.js / [Bare runtime](https://bare.pears.com)); Android and iOS ([Expo](https://expo.dev)).
* **Pluggable:** build lean apps by including only what you need, and extend the SDK with custom plugins.
* **Type-safe:** typed JS API.
* **Unified interface:** multiple AI tasks, one single npm package to install in your project.

## Quickstart

<Card href="/quickstart" title="Run your first example using the JS SDK">
  At the end, you’ll find instructions for running all examples in this documentation.
</Card>

## Installation

<Card href="/installation" title="Install and run on Node.js, Bare, or Expo">
  Supported environments and how to install the SDK for each one.
</Card>

## Functionalities

### AI tasks

{/* <Task name>: <what computation is performed> for <what the developer achieves> via <engine> */}

{/* <task>: <tech process> for <use case>, via <engine> */}

* [**Text generation:**](/ai-capabilities/text-generation) LLM inference for text generation and chat via [`qvac-fabric-llm.cpp`](https://github.com/tetherto/qvac-fabric-llm.cpp).
* [**Text embeddings:**](/ai-capabilities/text-embeddings) vector embedding generation for semantic search, clustering, and retrieval, via `qvac-fabric-llm.cpp`.
* [**RAG:**](/ai-capabilities/rag) out-of-the-box retrieval-augmented generation workflow.
* [**Fine-tuning:**](/ai-capabilities/fine-tuning) adapting LLMs to domain-specific tasks via LoRA.
* [**Multimodal:**](/ai-capabilities/multimodal) LLM inference over text, images, and other media within a single conversation context.
* [**Image generation:**](/ai-capabilities/image-generation) text-to-image and image-to-image generation via a customized Diffusion engine.
* [**Video generation:**](/ai-capabilities/video-generation) text-to-video generation via a customized Diffusion engine.
* [**Transcription:**](/ai-capabilities/transcription) automatic speech recognition (ASR) for speech-to-text via a customized Whisper engine or [NVIDIA Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2).
* [**Text-to-Speech:**](/ai-capabilities/text-to-speech) speech synthesis for text-to-speech (TTS) via [a customized GGML backend](https://github.com/tetherto/qvac/tree/main/packages/tts-ggml).
* [**Voice assistant:**](/ai-capabilities/voice-assistant) real-time voice conversation pipeline chaining transcription, text generation, and text-to-speech.
* [**Translation:**](/ai-capabilities/translation) text-to-text neural machine translation (NMT), via `qvac-fabric-llm.cpp` and [Bergamot](https://browser.mt).
* [**VLA:**](/ai-capabilities/vla) vision-language-action that turns camera frames, robot state, and natural-language instruction into action chunks for robot control, via [a customized GGML backend](https://github.com/tetherto/qvac/tree/main/packages/vla-ggml).
* [**OCR:**](/ai-capabilities/ocr) optical character recognition (OCR) for extracting text from images via ONNX runtime.
* [**Image classification:**](/ai-capabilities/image-classification) assigning class labels with confidence scores to images, via [a customized GGML backend](https://github.com/tetherto/qvac/tree/main/packages/classification-ggml).

### P2P capabilities

* [**Delegated inference:**](/p2p-capabilities/delegated-inference) delegate inference to peers via the [Holepunch stack](https://holepunch.to), enabling resource sharing.
* **Fetch models:** download AI models from peers via the distributed model registry.
* [**Blind relays:**](/p2p-capabilities/blind-relays) connect peers across NATs/firewalls by routing traffic through relay nodes.

### Utilities

* [**Logging:**](/runtime/logging) visibility into what's happening during loading, inference, and other operations.
* [**Profiler:**](/runtime/profiler) measure and export timing metrics across model loading, inference, and P2P delegation.
* [**Download lifecycle:**](/models/download-lifecycle) pause and resume model downloads.
* [**Runtime lifecycle:**](/runtime/lifecycle) suspend and resume the SDK runtime (e.g., on app background/foreground) and query lifecycle state.
* [**Cancellation:**](/runtime/cancellation) cancel any in-flight inference, model load, or download by `requestId`, or broad-cancel by `modelId` for unload/shutdown.
* [**Sharded models:**](/models/sharded-models) download a model that is sharded into multiple parts.

## Flow

A model must be loaded into memory before you can use it. The typical flow for performing AI inference is:

1. Call [`loadModel()`](/reference/api#loadmodel) to initialize the SDK and load a model. To keep several models loaded at the same time, call `loadModel()` again for each one.
2. Perform AI tasks by calling the appropriate functions from the SDK API, e.g., `completion()`.
3. When you are done with a model, call [`unloadModel()`](/reference/api#unloadmodel) to release the resources it holds.
4. Finally, close the SDK instance by calling [`close()`](/reference/api).
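
The steps above can be sketched as a small wrapper that guarantees cleanup. This is a minimal sketch, not the SDK's own code: `withModel`, `runOnce`, and `createFakeSdk` are hypothetical helpers, and the argument shapes passed to `loadModel()` and `unloadModel()` are assumptions (check the API reference for the real signatures). The `sdk` parameter is any object exposing `loadModel()`, `unloadModel()`, and `close()`; a stand-in is used here so the sketch runs without the real package.

```javascript
// Hypothetical helper: load a model, run a task, and always unload afterwards.
// Assumption: loadModel() resolves to a handle that unloadModel() accepts.
async function withModel(sdk, loadOptions, task) {
  const model = await sdk.loadModel(loadOptions); // step 1: load
  try {
    return await task(model); // step 2: perform AI tasks
  } finally {
    await sdk.unloadModel(model); // step 3: release resources, even on error
  }
}

// Hypothetical helper: same as above, then close the SDK instance (step 4).
async function runOnce(sdk, loadOptions, task) {
  try {
    return await withModel(sdk, loadOptions, task);
  } finally {
    await sdk.close();
  }
}

// Stand-in SDK that only records the order of calls. It is NOT the real API.
function createFakeSdk(log) {
  return {
    loadModel: async (options) => {
      log.push("load");
      return { id: "model-1", options };
    },
    unloadModel: async () => {
      log.push("unload");
    },
    close: async () => {
      log.push("close");
    },
  };
}
```

The `try`/`finally` blocks are the point of the sketch: if an inference call throws, the model is still unloaded and the SDK instance is still closed. In an application you would pass an object built from the functions exported by `@qvac/sdk` instead of the stand-in, adapting the arguments to the documented signatures.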

## Models

Each [AI task](#ai-tasks) works with different model families, and among the supported ones, you can choose which to use and how to obtain them. `loadModel()` manages the download and caching of models (one or multiple files), and their loading from disk into memory, preparing them for use.

`loadModel()` supports loading models from three different locations:

* Local filesystem, by providing a path.
* HTTP server, by providing an HTTP URL.
* Our distributed model registry.

The SDK package does not ship with built-in models, but its API exposes constants representing preconfigured models (e.g., `LLAMA_3_2_1B_INST_Q4_0`). Each constant maps to a model already published to our model registry. When calling `loadModel()`, you can provide one of these constants instead of a location, making model retrieval transparent.
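
To make the three kinds of model source concrete, here is a purely illustrative sketch. `classifyModelSource` is a hypothetical helper, not part of `@qvac/sdk`, and it rests on two assumptions: that filesystem paths and HTTP URLs are given as strings, and that registry constants are non-string values. How the SDK itself resolves a source is not described on this page.

```javascript
// Hypothetical helper: label a model source by the kind of location it names.
// Assumption: registry constants are not plain strings; paths and URLs are.
function classifyModelSource(source) {
  if (typeof source !== "string") {
    return "registry"; // e.g., a preconfigured model constant
  }
  // Strings starting with http:// or https:// are treated as HTTP URLs.
  if (/^https?:\/\//i.test(source)) {
    return "http";
  }
  // Any other string is treated as a local filesystem path.
  return "filesystem";
}

// Example inputs (the file names and URL are made up for illustration).
const examples = [
  "./models/example.gguf",
  "https://example.com/models/example.gguf",
  { name: "EXAMPLE_REGISTRY_MODEL" },
];
const kinds = examples.map(classifyModelSource);
```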

<Card href="https://github.com/tetherto/qvac/blob/main/packages/sdk/models/registry/models.ts" title="Model registry index">
  See the index of models available in our distributed model registry.
</Card>

For more on querying the model registry, see [`modelRegistryList()`](/reference/api#modelregistrylist), [`modelRegistrySearch()`](/reference/api#modelregistrysearch), and [`modelRegistryGetModel()`](/reference/api#modelregistrygetmodel).

For more on loading models, see [`loadModel()` in the `@qvac/sdk` API reference](/reference/api#loadmodel).

## Configuration

<Card href="/configuration" title="qvac.config.*">
  Use `qvac.config.*` to configure QVAC's overall behavior.
</Card>

### Plugin system

<Cards className="grid-cols-1">
  <Card href="/configuration/plugins" title="Built-in and custom plugins">
    Enable and disable built-in AI capabilities, and add new ones via custom plugins.
  </Card>

  <Card href="/configuration/plugins/write-custom-plugin" title="Write a custom plugin">
    Guidelines to ship your custom plugin as a single npm package.
  </Card>
</Cards>

## JS API

<Card href="/reference/api" title="API reference">
  The `@qvac/sdk` npm package exposes a function-centric, typed JS API.
</Card>

## How it works

<Card href="/about/how-it-works" title="How it works">
  Understand what happens under the hood when you use the QVAC SDK in your application.
</Card>

## Other resources

* [SDK landing page](https://qvac.tether.io/dev/sdk/)
* [Package at npm](https://www.npmjs.com/package/@qvac/sdk)
