# Introduction

## Overview

Install the npm package `@qvac/sdk` in your project. Then load models and use them to perform AI inference locally, or delegate inference to peers using the built-in P2P capability.

{/*
## Releases
- [Latest version: v0.7.0](https://www.npmjs.com/package/@qvac/sdk)
- [Release notes for this version](https://github.com/tetherto/qvac-sdk/releases/tag/v0.5.0)
*/}

## Description

The JS SDK is cross-platform, type-safe, and pluggable, exposing all QVAC capabilities through a unified interface.

### Key features

* **Cross-platform:** portable code across Linux, macOS, and Windows (Node.js / [Bare runtime](https://bare.pears.com)); Android and iOS ([Expo](https://expo.dev)).
* **Pluggable:** build lean apps by including only what you need, and extend the SDK with custom plugins.
* **Type-safe:** typed JS API.
* **Unified interface:** multiple AI tasks through a single npm package installed in your project.

## Quickstart

At the end, you’ll find instructions for running all examples in this documentation.

## Installation

Supported environments and how to install the SDK for each one.

## Functionalities

### AI tasks

* [**Text generation:**](/ai-capabilities/text-generation) LLM inference for text generation and chat via [`qvac-fabric-llm.cpp`](https://github.com/tetherto/qvac-fabric-llm.cpp).
* [**Text embeddings:**](/ai-capabilities/text-embeddings) vector embedding generation for semantic search, clustering, and retrieval via `qvac-fabric-llm.cpp`.
* [**RAG:**](/ai-capabilities/rag) out-of-the-box retrieval-augmented generation workflow.
* [**Fine-tuning:**](/ai-capabilities/fine-tuning) adapting LLMs to domain-specific tasks via LoRA.
* [**Multimodal:**](/ai-capabilities/multimodal) LLM inference over text, images, and other media within a single conversation context.
* [**Image generation:**](/ai-capabilities/image-generation) text-to-image and image-to-image generation via a customized Diffusion engine.
* [**Video generation:**](/ai-capabilities/video-generation) text-to-video generation via a customized Diffusion engine.
* [**Transcription:**](/ai-capabilities/transcription) automatic speech recognition (ASR) for speech-to-text via a customized Whisper engine or [NVIDIA Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2).
* [**Text-to-speech:**](/ai-capabilities/text-to-speech) speech synthesis (TTS) via [a customized GGML backend](https://github.com/tetherto/qvac/tree/main/packages/tts-ggml).
* [**Voice assistant:**](/ai-capabilities/voice-assistant) real-time voice conversation pipeline chaining transcription, text generation, and text-to-speech.
* [**Translation:**](/ai-capabilities/translation) text-to-text neural machine translation (NMT) via `qvac-fabric-llm.cpp` and [Bergamot](https://browser.mt).
* [**VLA:**](/ai-capabilities/vla) vision-language-action inference that turns camera frames, robot state, and a natural-language instruction into action chunks for robot control, via [a customized GGML backend](https://github.com/tetherto/qvac/tree/main/packages/vla-ggml).
* [**OCR:**](/ai-capabilities/ocr) optical character recognition for extracting text from images via ONNX Runtime.
* [**Image classification:**](/ai-capabilities/image-classification) assigning class labels with confidence scores to images via [a customized GGML backend](https://github.com/tetherto/qvac/tree/main/packages/classification-ggml).

### P2P capabilities

* [**Delegated inference:**](/p2p-capabilities/delegated-inference) delegate inference to peers via the [Holepunch stack](https://holepunch.to), enabling resource sharing.
* **Fetch models:** download AI models from peers via the distributed model registry.
* [**Blind relays:**](/p2p-capabilities/blind-relays) connect peers across NATs/firewalls by routing traffic through relay nodes.

### Utilities

* [**Logging:**](/runtime/logging) visibility into what's happening during model loading, inference, and other operations.
* [**Profiler:**](/runtime/profiler) measure and export timing metrics across model loading, inference, and P2P delegation.
* [**Download lifecycle:**](/models/download-lifecycle) pause and resume model downloads.
* [**Runtime lifecycle:**](/runtime/lifecycle) suspend and resume the SDK runtime (e.g., on app background/foreground) and query lifecycle state.
* [**Cancellation:**](/runtime/cancellation) cancel any in-flight inference, model load, or download by `requestId`, or broad-cancel by `modelId` for unload/shutdown.
* [**Sharded models:**](/models/sharded-models) download a model that is split into multiple shards.

## Flow

Before you can use a model, you must load it from some location into memory. To perform AI inference:

1. Call [`loadModel()`](/reference/api#loadmodel) to initialize the SDK and load a model. To keep multiple models loaded at the same time, call `loadModel()` once per model.
2. Perform AI tasks by calling the appropriate SDK API functions (e.g., `completion()`).
3. When you are done with a model, call [`unloadModel()`](/reference/api#unloadmodel) to release its resources.
4. Finally, close the SDK instance by calling [`close()`](/reference/api).

## Models

Each [AI task](#ai-tasks) supports different model families. Among the supported ones, you choose which model to use and how to obtain it.

`loadModel()` handles downloading and caching models (single-file or multi-file) and loading them from disk into memory, so they are ready for use. It can load models from three kinds of location:

* Local filesystem, by providing a path.
* HTTP server, by providing an HTTP URL.
* Our distributed model registry.
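The flow and the model locations described above can be sketched in TypeScript as follows. This is **not** the `@qvac/sdk` API. `SdkLike`, `ModelSource`, `runOnce()`, and `makeFakeSdk()` are hypothetical names, and their method signatures are assumptions. The sketch only shows the ordering of `loadModel()`, `completion()`, `unloadModel()`, and `close()`, with cleanup guaranteed by `try`/`finally`. An in-memory fake stands in for the SDK so that the sketch runs on its own. Check the API reference for the real signatures.

```typescript
// HYPOTHETICAL sketch: SdkLike and ModelSource are illustrative stand-ins,
// not the real @qvac/sdk API. Real signatures may differ.

// The three kinds of model location, modelled here as a tagged union.
type ModelSource =
  | { kind: "file"; path: string } // local filesystem path
  | { kind: "http"; url: string } // file served over HTTP
  | { kind: "registry"; name: string }; // entry in the distributed registry

// Hypothetical SDK surface covering the four steps of the flow.
interface SdkLike {
  loadModel(source: ModelSource): Promise<string>; // resolves to a model ID
  completion(modelId: string, prompt: string): Promise<string>;
  unloadModel(modelId: string): Promise<void>;
  close(): Promise<void>;
}

// Loads one model and runs a single task on it. The model is always
// unloaded and the SDK always closed, even if loading or the task throws.
async function runOnce<T>(
  sdk: SdkLike,
  source: ModelSource,
  task: (modelId: string) => Promise<T>,
): Promise<T> {
  try {
    const modelId = await sdk.loadModel(source); // 1. load into memory
    try {
      return await task(modelId); // 2. perform the AI task
    } finally {
      await sdk.unloadModel(modelId); // 3. release the model's resources
    }
  } finally {
    await sdk.close(); // 4. close the SDK instance
  }
}

// In-memory fake that records the order of calls (for illustration only).
function makeFakeSdk(log: string[]): SdkLike {
  return {
    async loadModel(source) {
      log.push(`load:${source.kind}`);
      return "model-1";
    },
    async completion(modelId, prompt) {
      log.push(`completion:${modelId}`);
      return `echo: ${prompt}`;
    },
    async unloadModel(modelId) {
      log.push(`unload:${modelId}`);
    },
    async close() {
      log.push("close");
    },
  };
}

// Usage: run one completion against a hypothetical local model file.
const log: string[] = [];
const sdk = makeFakeSdk(log);
void runOnce(sdk, { kind: "file", path: "./models/example.gguf" }, (id) =>
  sdk.completion(id, "Hello"),
).then((text) => console.log(text, log));
```

`runOnce()` closes the SDK after a single task, so it fits a one-shot script. A long-running app would more likely keep the SDK open, load several models, and call `unloadModel()` on each model when it is no longer needed.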
The SDK package does not ship with built-in models, **but** its API exposes constants representing preconfigured models (e.g., `LLAMA_3_2_1B_INST_Q4_0`). Each constant maps to a model already published to our model registry. When calling `loadModel()`, you can pass one of these constants instead of a location, which makes model retrieval transparent.

See the index of models available in our distributed model registry. For more on querying the model registry, see [`modelRegistryList()`](/reference/api#modelregistrylist), [`modelRegistrySearch()`](/reference/api#modelregistrysearch), and [`modelRegistryGetModel()`](/reference/api#modelregistrygetmodel).

For more on loading models, see [`loadModel()` in the `@qvac/sdk` API reference](/reference/api#loadmodel).

## Configuration

Use `qvac.config.*` to configure QVAC's overall behavior.

### Plugin system

Enable and disable built-in AI capabilities, and add new ones via custom plugins. Guidelines are also available for shipping your custom plugin as a single npm package.

## JS API

The `@qvac/sdk` npm package exposes a function-centric, typed JS API.

## How it works

Understand what happens under the hood when you use QVAC SDK in your application.

## Other resources

* [SDK landing page](https://qvac.tether.io/dev/sdk/)
* [Package on npm](https://www.npmjs.com/package/@qvac/sdk)