# CLI (/cli)

## Overview

QVAC CLI is provided through the `@qvac/cli` npm package and requires `@qvac/sdk` to be installed alongside it in your project. At the moment, it provides the following functionality:

* OpenAI-compatible HTTP server
* SDK bundling

## Usage

Install the SDK and CLI in your project:

```bash
npm install @qvac/sdk @qvac/cli
```

See [Installation](/sdk/getting-started/installation) for environment-specific installation instructions for the SDK.

Create a `qvac.config.*` file in the root of your project and add the configuration required for the functionality you want to use. See [Configuration](/sdk/getting-started/configuration) to learn how to do this.

Run a command:

```bash
qvac --help
```

## OpenAI-compatible HTTP server

QVAC CLI provides an HTTP server that is compatible with the [OpenAI REST API](https://developers.openai.com/api/reference/overview), enabling broad integration with the AI ecosystem. It internally translates HTTP requests into SDK calls. As a result, any system compatible with the OpenAI REST API can point to `http://localhost:11434/v1/` and work without changes.

See [HTTP server](/http-server) to learn how to run a local HTTP server that exposes an OpenAI-compatible API.
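Because the server is OpenAI-compatible, existing client libraries work by changing only the base URL. As a minimal sketch, assuming the server is running and `my-llm` is a model alias declared in your `qvac.config.*`, the official `openai` npm client can be pointed at it like this:

```ts
import OpenAI from "openai";

// Point the standard OpenAI client at the local QVAC server.
// The apiKey value is a placeholder; a real token is only needed
// when the server is started with --api-key.
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "unused",
});

const response = await client.chat.completions.create({
  model: "my-llm", // a model alias from qvac.config.*
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);
```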
## SDK bundling

The `qvac bundle sdk` command allows you to select only the plugins you need in your project and reduce your application's final bundle size. See [Plugin system](/sdk/examples/utilities/plugin-system) to learn how to use it.

## Reference

Run `qvac --help` to see all available commands and `qvac <command> --help` for command-specific options.

### `qvac bundle sdk`

Generate a tree-shaken Bare worker bundle with selected plugins.

| Option                | Description                                                  |
| --------------------- | ------------------------------------------------------------ |
| `-c, --config <path>` | Config file path (default: auto-detect `qvac.config.*`)      |
| `--sdk-path <path>`   | Path to SDK package (default: auto-detect in `node_modules`) |
| `--host <host>`       | Target host (repeatable)                                     |
| `--defer <module>`    | Defer a module (repeatable)                                  |
| `-q, --quiet`         | Minimal output                                               |
| `-v, --verbose`       | Detailed output                                              |

### `qvac serve openai`

Start an OpenAI-compatible REST API server.

| Option                | Description                                              |
| --------------------- | -------------------------------------------------------- |
| `-c, --config <path>` | Config file path (default: auto-detect `qvac.config.*`)  |
| `-p, --port <port>`   | Port to listen on (default: `11434`)                     |
| `-H, --host <host>`   | Host to bind to (default: `127.0.0.1`)                   |
| `--model <alias>`     | Model alias to preload (repeatable, must be in config)   |
| `--api-key <key>`     | Require Bearer token authentication                      |
| `--cors`              | Enable CORS headers                                      |
| `-v, --verbose`       | Detailed output                                          |
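For example, `qvac serve openai --port 8080 --model my-llm --api-key my-secret-token --cors` starts the server on port 8080, preloads the `my-llm` alias, requires Bearer authentication, and enables CORS; the alias and token here are placeholders for values from your own configuration.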
# HTTP server (/http-server)

## Overview

To run the server, you need the `@qvac/sdk` and `@qvac/cli` npm packages installed in your project. The server is provided by `@qvac/cli` and internally translates HTTP requests into SDK calls. As a result, any system compatible with the [OpenAI REST API](https://developers.openai.com/api/reference/overview) can point to `http://localhost:11434/v1/` and work without changes.

### AI capabilities

At present, the HTTP server supports the following QVAC AI capabilities:

* [Completion](/sdk/examples/ai-tasks/completion)
* [Text embeddings](/sdk/examples/ai-tasks/text-embeddings)
* [Transcription](/sdk/examples/ai-tasks/transcription)

### Compatible tools

The following tools have been verified to work as drop-in replacements by pointing their base URL to the QVAC server:

| Tool                                            | Required endpoints                                            |
| ----------------------------------------------- | ------------------------------------------------------------- |
| [Continue.dev](https://continue.dev)            | `/v1/chat/completions` (streaming SSE), `/v1/models`          |
| [LangChain](https://langchain.com)              | `/v1/chat/completions`, `/v1/embeddings`, `/v1/models`        |
| [Open Interpreter](https://openinterpreter.com) | `/v1/chat/completions` (streaming, tool calls), `/v1/models`  |

## Running the server

Install the SDK and CLI in your project:

```bash
npm install @qvac/sdk @qvac/cli
```

See [Installation](/sdk/getting-started/installation) for environment-specific installation instructions for the SDK.

Create the `qvac.config.*` file at the root of your project declaring which models the server can load. For example:

```json title="qvac.config.json"
{
  "serve": {
    "models": {
      "my-llm": {
        "model": "QWEN3_600M_INST_Q4",
        "default": true,
        "config": { "ctx_size": 8192 }
      }
    }
  }
}
```

Start the server:

```bash
qvac serve openai
```

Send a request:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-llm",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Configuration

Models are declared in `qvac.config.*` under the `serve.models` key. The server can only load models listed under this key — requests for unlisted models return `404`.

Each key in `serve.models` is a **model alias** — the name that HTTP clients use in the `model` field of their requests.

For the full schema of `serve.models`, see [Configuration — `ServeConfig`](/sdk/getting-started/configuration#serveconfig).

### Example

```json title="qvac.config.json"
{
  "serve": {
    "models": {
      "my-llm": {
        "model": "QWEN3_600M_INST_Q4",
        "default": true,
        "preload": true,
        "config": { "ctx_size": 8192, "tools": true }
      },
      "my-embed": {
        "model": "GTE_LARGE_FP16",
        "default": true
      },
      "whisper": {
        "model": "WHISPER_TINY",
        "default": true,
        "preload": true,
        "config": { "language": "en", "strategy": "greedy" }
      }
    }
  }
}
```

* **`model`**: SDK model constant name (e.g., `QWEN3_600M_INST_Q4`). The server resolves it to a download source and addon type automatically.
* **`default`**: when `true`, marks this model as the default for its endpoint category. This does not make the server auto-select the model for requests that omit `model`.
* **`preload`**: when `true`, the model is loaded into memory on server startup. When `false`, it is loaded on first request (cold start). Defaults to `true` for constant model entries.
* **`config`**: model config overrides passed to the underlying addon. Same options as [`modelConfig` in `loadModel()`](/sdk/api/loadModel/#modelconfig-varies-by-model-type).

The `default` field **does not** act as a fallback when an API request omits `model`. Requests must still include a `model` field; otherwise, the server returns `400`.

## CLI

```
qvac serve openai [options]

  -c, --config <path>   Config file path (default: auto-detect qvac.config.*)
  -p, --port <port>     Port to listen on (default: 11434)
  -H, --host <host>     Host to bind to (default: 127.0.0.1)
  --model <alias>       Model alias to preload (repeatable, must be in config)
  --api-key <key>       Require Bearer token authentication
  --cors                Enable CORS headers
  -v, --verbose         Detailed output
```

## API

All endpoints follow the [OpenAI API](https://platform.openai.com/docs/api-reference) request and response format. Base path: `/v1`.

### Endpoints

* `GET /v1/models` — list loaded models
* `GET /v1/models/:id` — get model details
* `DELETE /v1/models/:id` — unload a model
* `POST /v1/chat/completions` — chat completions (blocking + SSE streaming)
* `POST /v1/embeddings` — text embeddings (single + batch)
* `POST /v1/audio/transcriptions` — audio transcription

### `GET /v1/models`

List all loaded models.

```bash
curl http://localhost:11434/v1/models
```

Response:

```json
{
  "object": "list",
  "data": [
    {
      "id": "my-llm",
      "object": "model",
      "created": 1718000000,
      "owned_by": "qvac"
    }
  ]
}
```

### `GET /v1/models/:id`

Get details of a specific loaded model.

```bash
curl http://localhost:11434/v1/models/my-llm
```

### `DELETE /v1/models/:id`

Unload a model, releasing its resources.

```bash
curl -X DELETE http://localhost:11434/v1/models/my-llm
```

Response:

```json
{
  "id": "my-llm",
  "object": "model",
  "deleted": true
}
```

### `POST /v1/chat/completions`

Generate a chat completion. Supports both blocking and streaming (SSE) modes, tool/function calling, and per-request generation parameters.

**Blocking request:**

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-llm",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```

**Streaming request (server-sent events):**

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-llm",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "stream": true
  }'
```

**Tool calling:**

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-llm",
    "messages": [{"role": "user", "content": "What is the weather in London?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }]
  }'
```

#### Generation parameters

The following OpenAI parameters are forwarded to the model on each request:

| OpenAI parameter        | SDK parameter       | Description                          |
| ----------------------- | ------------------- | ------------------------------------ |
| `temperature`           | `temp`              | Sampling temperature                 |
| `max_tokens`            | `predict`           | Maximum tokens to generate           |
| `max_completion_tokens` | `predict`           | Alias for `max_tokens`               |
| `top_p`                 | `top_p`             | Nucleus sampling threshold           |
| `seed`                  | `seed`              | Random seed for deterministic output |
| `frequency_penalty`     | `frequency_penalty` | Penalize frequent tokens             |
| `presence_penalty`      | `presence_penalty`  | Penalize already-present tokens      |

#### Unsupported parameters

The following OpenAI parameters are accepted but ignored (a warning is logged): `n`, `logprobs`, `response_format`, `stop`, `top_logprobs`, `logit_bias`, `parallel_tool_calls`, `stream_options`.
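Streaming also works through standard OpenAI client libraries, which consume the SSE stream for you. A minimal TypeScript sketch, assuming the `openai` npm package and a configured `my-llm` alias:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "unused", // placeholder; only needed with --api-key
});

// Request a streamed completion; the server responds with SSE chunks.
const stream = await client.chat.completions.create({
  model: "my-llm",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  stream: true,
});

// Each chunk carries a delta with the next piece of generated text.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```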
### `POST /v1/embeddings`

Generate text embeddings. Accepts a single string or a batch of strings.

**Single input:**

```bash
curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{ "model": "my-embed", "input": "The quick brown fox" }'
```

**Batch input:**

```bash
curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{ "model": "my-embed", "input": ["First sentence", "Second sentence"] }'
```

Response:

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.012, -0.034, ...]
    }
  ],
  "model": "my-embed",
  "usage": { "prompt_tokens": 0, "total_tokens": 0 }
}
```

`encoding_format` (only `float` is supported) and `dimensions` are accepted but ignored.
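A common use of this endpoint is semantic similarity: embed two texts and compare the vectors with cosine similarity. A minimal sketch using the built-in `fetch`, assuming a configured `my-embed` alias; the `cosine` helper below is illustrative and not part of the SDK:

```ts
// Fetch embeddings for a batch of sentences from the local server.
const res = await fetch("http://localhost:11434/v1/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "my-embed",
    input: ["First sentence", "Second sentence"],
  }),
});
const { data } = await res.json();

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

console.log(cosine(data[0].embedding, data[1].embedding));
```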
### `POST /v1/audio/transcriptions`

Transcribe audio using Whisper or Parakeet models. This endpoint uses `multipart/form-data` (not JSON). Maximum file size: 25 MB.

**JSON response (default):**

```bash
curl http://localhost:11434/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=whisper" \
  -F "response_format=json"
```

Response:

```json
{ "text": "transcribed text here" }
```

**Plain text response:**

```bash
curl http://localhost:11434/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=whisper" \
  -F "response_format=text"
```

**With prompt** (Whisper uses it as `initial_prompt`):

```bash
curl http://localhost:11434/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=whisper" \
  -F "prompt=President Kennedy speech about space exploration"
```

#### Parameters

| Parameter         | Description                             | Required |
| ----------------- | --------------------------------------- | -------- |
| `file`            | Audio file to transcribe.               | Yes      |
| `model`           | Model alias (must be in config).        | Yes      |
| `response_format` | `json` (default) or `text`.             | No       |
| `prompt`          | Optional prompt forwarded to the model. | No       |

Unsupported `response_format` values (`srt`, `vtt`, `verbose_json`) return a `400` error. `language` and `temperature` are accepted but currently only configurable at model load time (via the `serve.models` config), not per-request. A warning is logged when these are sent.
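From JS/TS, the multipart request can be built with the standard `FormData` API. A minimal sketch, assuming Node.js 20+ (where `fetch` is built in and `openAsBlob` is available) and a configured `whisper` alias:

```ts
import { openAsBlob } from "node:fs";

// Build the multipart/form-data body with the standard FormData API.
const form = new FormData();
form.append("file", await openAsBlob("audio.wav"), "audio.wav");
form.append("model", "whisper");
form.append("response_format", "json");

const res = await fetch("http://localhost:11434/v1/audio/transcriptions", {
  method: "POST",
  body: form, // fetch sets the multipart boundary header automatically
});

const { text } = await res.json();
console.log(text);
```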
## Authentication

By default, the server accepts unauthenticated requests on `127.0.0.1`. To require a Bearer token, run the server with the `--api-key` flag:

```bash
qvac serve openai --api-key my-secret-token
```

Clients must then include the token in the `Authorization` header:

```bash
curl http://localhost:11434/v1/models \
  -H "Authorization: Bearer my-secret-token"
```

Requests without a valid token receive a `401` response.

# QVAC by Tether: the Infinite Stable Intelligence Platform (/)

## Discover

* **Welcome** — Learn what QVAC is, why Tether built it, and what makes it different.
* **Our vision** — Learn about Tether's long-term vision for QVAC.

## Build

* **Get started with QVAC SDK** — Use the JS/TS SDK to build local and P2P AI applications and systems.
* **Quickstart** — Run your first example using the JS/TS SDK.
* **Installation** — Supported environments and how to install the SDK for each one: Node.js, Bare, and Expo.
* **JS API reference** — The `@qvac/sdk` npm package exposes a function-centric, typed JS API.
* **CLI** — The `@qvac/cli` npm package provides tools and an HTTP server that exposes an OpenAI-compatible API.

## Explore

* **Flagship applications** — Desktop and mobile apps to empower users and showcase QVAC capabilities.
* **Addons** — Reference for the QVAC components that provide AI capabilities.
* **Research** — Ongoing research focused on advancing the state of the art in local AI. See the research outputs on Hugging Face.

## Learn

* **Build an Electron app** — Hands-on tutorial on using QVAC SDK with Electron.
* **Build an Expo app** — Hands-on tutorial on using QVAC SDK with Expo.

## Community

* **Discord** — Where we gather. Get real-time help from the team and community, meet peers, join in, discuss QVAC.
* **Keet** — Another place where we gather, using the [Keet P2P app](https://keet.io). Copy this invite link [pear://keet/nfo35...](pear://keet/nfo35rbknsh35m184tmww7s4ssjh1hftiysx8yqreyi5njwsknuxs47nwhsbzt6wijoxfzmi6a8fh7b49gun3qxea3mumixnti8spizeysthyxdm968t6zof93yqpgyygb39rmoobki7fo7rh5hod71map4gayedt3qiwhkjakycqnfa5kswn58mtt8qg) into your Keet app.

## Support

[*Use the appropriate channel in our **Discord** to talk directly with us.*](https://discord.com/invite/tetherdev) Get real-time help, report bugs, ask questions, share feedback, request and vote for features, take part in QVAC's evolution.

## Miscellaneous

Other resources:

* [Main website](https://qvac.tether.io)

# Flagship applications (/about-qvac/flagship-apps)

## Overview

AI is an extension of human intelligence, and you should own yours. With QVAC apps, your AI runs on your own devices. No subscriptions. No sharing of private data.

You can run QVAC apps 100% offline. But if you're on mobile and need more computing power for an AI task, you can connect to your PC over the internet. No one can stop you from using your AI.

### Benefits

* Total privacy
* Total ownership
* No subscriptions
* Unstoppable

## Catalog

To date, the following applications have been released. More coming soon.

* **Workbench** — Desktop app to run AI models on your device or remotely and query your documents.
* **Health** — Mobile app that empowers users to own their health and fitness data entirely on-device.

# How it works (/about-qvac/how-it-works)

## Overview

The SDK supports multiple JS runtimes, but its [underlying components](/addons) run only on [Bare](https://bare.pears.com). When the SDK runs in a runtime other than Bare, it spawns a Bare worker where all AI operations take place. The worker is started lazily on the first RPC call and can be explicitly shut down with `close()`.

## Phase 1: initialization

The first time you call `loadModel()` (or any function other than `close()`), the SDK performs a complete initialization sequence. It initializes a runtime-specific RPC client and sends configuration to the worker via the internal `__init_config` message. The worker process is spawned once and reused for subsequent calls until you explicitly close it. In the Bare runtime, no separate worker process is spawned; requests are handled in-process.

## Phase 2: model loading

There is only a single RPC client and Bare worker per application, not per model — i.e., a singleton pattern. The model is downloaded, loaded into memory, and registered with a unique ID. From that point on, it is available for AI inference until you unload it — call `unloadModel()` to free its memory.

## Phase 3: inference

You can call `loadModel()` multiple times to make multiple models ready for use simultaneously, and you can perform AI inference with all of them. When you no longer need a model, call `unloadModel()` to free up resources.

## Phase 4: shutdown

`close()` explicitly shuts down the worker and releases the RPC connection. In Node/Expo, this terminates the worker process; in Bare, the call is a no-op since there is no separate worker process. After `close()`, the next SDK call will reinitialize the RPC client and spawn a fresh worker.

`unloadModel()` will automatically close the RPC connection when there are no active models or providers, but `close()` is the explicit way to shut down the SDK instance.
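The lifecycle above can be sketched in TypeScript. This is illustrative only — the exact `loadModel()` arguments and return shape are assumptions here, not the documented API; see the [JS API reference](/sdk/api/loadModel/) for the real signatures:

```ts
import { loadModel, unloadModel, close } from "@qvac/sdk";

// Phases 1 + 2: the first loadModel() call spawns the Bare worker
// (when not running on Bare) and loads the model into it. The
// arguments shown (model constant name, modelConfig) are assumptions.
const model = await loadModel("QWEN3_600M_INST_Q4", { ctx_size: 8192 });

// Phase 3: run inference as many times as needed with the loaded
// model (inference calls omitted; see the SDK examples), then free
// the model's memory when done.
await unloadModel(model);

// Phase 4: explicitly shut down the worker and the RPC client.
await close();
```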
# Public launch (/about-qvac/public-launch)

Tether publicly launched QVAC in October 2025 at the Plan B Forum in Lugano. During the forum, Tether CEO Paolo Ardoino presented QVAC's future vision in his keynote, *Fiat Lux*: