# SDK Release Notes — v0.11.x (/reference/release-notes/v0.11.x)


## v0.11.0

### @qvac/sdk

📦 **NPM:** [https://www.npmjs.com/package/@qvac/sdk/v/0.11.0](https://www.npmjs.com/package/@qvac/sdk/v/0.11.0)

This release completes the request-lifecycle and cancellation overhaul that began in
0.10.0: every long-running SDK call — `completion`, `embed`, `transcribe`,
`transcribeStream`, `translate`, `finetune`, `loadModel`, `downloadAsset`, and the
cancellable `rag` operations — now flows through a unified `RequestRegistry`, exposes
its `requestId` synchronously on the returned promise, and can be cancelled
individually with `cancel({ requestId })`. The wire envelope for `cancel(...)` is
consolidated to two shapes, two legacy call signatures are removed, and the SDK gains
typed `instanceof` for policy/cancel errors across the RPC boundary. Alongside the
lifecycle work, this release adds Harmony / Qwen3.5 / Gemma4 tool-call dialects,
FLUX.2 multi-reference fusion and per-call LoRA on diffusion, ESRGAN upscaling (both
as a post-step and as a standalone `upscale()` API), Whisper VAD and end-of-turn
events, multi-GPU `split-mode` / `tensor-split` / `main-gpu` on the LLM and embed
plugins, a `reasoning_budget` knob for Qwen/Gemma reasoning, and a fresh Parakeet
0.4.0 GGUF backend with duplex streaming. The mobile build flow now auto-verifies the
worker bundle through `qvac verify bundle`, and the model registry was regenerated
against the upstream `base-memory` Bergamot fix (dropping the deprecated Marian Opus
constants on the way).

#### Breaking Changes

##### `unloadModel` no longer auto-closes the Bare worker

On Bare, `unloadModel` used to call `close()` whenever no models or providers were
left, which terminated the worker host on every routine unload. Long-lived Bare
workers either had to avoid `unloadModel` or work around the auto-close.

The default now flips by runtime: Node and Electron preserve the existing
auto-close behaviour (`autoClose: true` by default), while Bare leaves the
connection open (`autoClose: false` by default). Pass the field explicitly to
override.

**Before (Bare):**

```typescript
import { unloadModel } from "@qvac/sdk";

await unloadModel({ modelId });
// RPC connection closed → Bare worker host terminated.
```

**After (Bare):**

```typescript
import { unloadModel } from "@qvac/sdk";

await unloadModel({ modelId });
// Worker survives; opt in to closing explicitly:
await unloadModel({ modelId, autoClose: true });
```

##### Parakeet plugin moves to the 0.4.0 single-file GGUF API

`@qvac/transcription-parakeet` 0.4.0 replaced the legacy multi-file ONNX bundle
(encoder + decoder + vocab + preprocessor, plus the CTC / Sortformer variants)
with a single GGUF backed by `qvac-parakeet.cpp`. The SDK plugin now follows
suit: every per-variant `parakeet*Src` field on `modelConfig` is gone, the
`modelType` discriminator is gone, and the addon auto-detects TDT / CTC / EOU /
Sortformer from GGUF metadata.

**Before:**

```typescript
await loadModel({
  modelSrc: PARAKEET_TDT_ENCODER_INT8,
  modelType: "parakeet",
  modelConfig: {
    parakeetEncoderSrc: PARAKEET_TDT_ENCODER_INT8,
    parakeetDecoderSrc: PARAKEET_TDT_DECODER_INT8,
    parakeetVocabSrc: PARAKEET_TDT_VOCAB,
    parakeetPreprocessorSrc: PARAKEET_TDT_PREPROCESSOR_INT8,
  },
});

await loadModel({
  modelSrc: PARAKEET_CTC_FP32,
  modelType: "parakeet",
  modelConfig: {
    modelType: "ctc",
    parakeetCtcModelSrc: PARAKEET_CTC_FP32,
    parakeetTokenizerSrc: PARAKEET_CTC_TOKENIZER,
  },
});
```

**After:**

```typescript
await loadModel({
  modelSrc: PARAKEET_TDT_0_6B_V3_Q8_0,
  modelType: "parakeet",
});

await loadModel({
  modelSrc: PARAKEET_CTC_0_6B_Q8_0,
  modelType: "parakeet",
});
```

The new GGUF constants (`PARAKEET_TDT_0_6B_V3_Q8_0`, `PARAKEET_CTC_0_6B_Q8_0`,
`PARAKEET_SORTFORMER_4SPK_V1_Q8_0`, `PARAKEET_EOU_120M_V1_Q8_0`) are added in
this release; the legacy multi-file constants are gone.

##### Two legacy `cancel(...)` call shapes are removed

`cancel({ operation: "downloadAsset", downloadKey, clearCache })` and
`cancel({ operation: "rag", workspace })` are removed because neither carried a
`requestId` and neither can be mechanically back-mapped onto the new two-arm
cancel wire envelope. Callers must migrate to the `requestId`-targeted cancel
path (the primary one in 0.11.0) or to the broad cancel-by-`modelId` escape
hatch.

**Before — downloadAsset:**

```typescript
import { downloadAsset, cancel } from "@qvac/sdk";

const op = downloadAsset({ assetSrc, onProgress });
await cancel({ operation: "downloadAsset", downloadKey: assetSrc.key, clearCache: true });
```

**After — downloadAsset:** the decorated promise now exposes `op.requestId`
synchronously, and `clearCache` is honoured on the `requestId` path.

```typescript
import { downloadAsset, cancel } from "@qvac/sdk";

const op = downloadAsset({ assetSrc, onProgress });
await cancel({ requestId: op.requestId, clearCache: true });
```

**Before — rag:**

```typescript
import { ragIngest, cancel } from "@qvac/sdk";

ragIngest({ workspace: "my-workspace", documents });
await cancel({ operation: "rag", workspace: "my-workspace" });
```

**After — rag (primary path, by `requestId`):**

```typescript
import { ragIngest, cancel } from "@qvac/sdk";

const op = ragIngest({ workspace: "my-workspace", documents });
await cancel({ requestId: op.requestId });
```

**After — rag (broad escape hatch, no `requestId` to hand):**

```typescript
import { cancel } from "@qvac/sdk";

// Cancel every in-flight RAG operation running on the embedding model:
await cancel({ modelId: ragEmbeddingModelId, kind: "rag" });
```

Every other `cancel(...)` shape still works: `cancel({ operation: "inference",
modelId })`, `cancel({ operation: "embeddings", modelId })`, `cancel({ modelId
})`, `cancel({ modelId, kind })`, and `cancel({ requestId })` are all preserved
by the client-side normalisation layer.

#### New APIs and Capabilities

##### `requestId` exposed synchronously on every cancellable call

Every long-running SDK call now returns a decorated promise (or run handle) that
carries a `requestId` you can read on the same tick the call is dispatched.
That lets you wire a Stop button to a specific in-flight call without racing the
network round-trip. The pattern covers `completion`, `loadModel`, `embed`,
`transcribe`, `transcribeStream`, `translate`, `finetune`, `downloadAsset`, and
the three cancellable RAG ops (`ragIngest`, `ragSaveEmbeddings`, `ragReindex`).

```typescript
import {
  completion,
  loadModel,
  embed,
  downloadAsset,
  ragIngest,
  cancel,
} from "@qvac/sdk";

const run = completion({ modelId, history });
console.log(run.requestId);

const op = loadModel({ modelSrc: "..." });
console.log(op.requestId);                    // synchronously, before await
const modelId = await op;                     // legacy unwrap still works

const handle = embed({ modelId, text: "hello" });
console.log(handle.requestId);
await handle;

const download = downloadAsset({ assetSrc, onProgress });
stopButton.onclick = () => cancel({ requestId: download.requestId });
await download;                                // rejects with InferenceCancelledError if cancelled

const ingest = ragIngest({ workspace: "ws-a", modelId, documents });
console.log(ingest.requestId);
await ingest;
```

The non-cancellable RAG ops (`ragChunk`, `ragSearch`, `ragDeleteEmbeddings`,
`ragListWorkspaces`, `ragCloseWorkspace`, `ragDeleteWorkspace`) intentionally do
not decorate — they're fast-path operations that don't register with the
server-side request registry, so a `requestId` would point at nothing.

##### Typed errors that survive the RPC boundary

`InferenceCancelledError`, `RequestRejectedByPolicyError`,
`RequestIdConflictError`, and `RequestNotFoundError` are now re-exported from
`@qvac/sdk` and reconstructed on the client side with their typed fields
intact, so `err instanceof RequestRejectedByPolicyError` actually narrows and
`err.modelId` / `err.reason` / `err.requestId` are populated from the
server-side throw.

`RequestRejectedByPolicyError` (code 52420) fires when an admission policy
blocks the request — for example, the worker's default
`oneAtATimePerModel: true` rule for `completion` kind, which promotes the
llama.cpp addon's opaque "job already set" error into a typed framework-level
rejection.

```typescript
import { completion, RequestRejectedByPolicyError } from "@qvac/sdk";

try {
  const run = completion({ modelId, history });
  for await (const event of run.events) { /* ... */ }
} catch (err) {
  if (err instanceof RequestRejectedByPolicyError) {
    showBusy({ modelId: err.modelId, reason: err.reason });
    return;
  }
  throw err;
}
```

##### New broad-cancel sugar and consolidated wire envelope

The `cancel` wire envelope shrinks to two shapes — request-targeted (`{
operation: "request", requestId }`) and broad-by-model (`{ operation: "broad",
modelId, kind? }`). Two new client sugars wrap the broad shape so callers don't
have to think about the wire representation:

```typescript
import { cancel } from "@qvac/sdk";

await cancel({ modelId: "llama-3.2-1b", kind: "completion" });
await cancel({ modelId: "llama-3.2-1b" });
```

##### Plugin authors: declare cancel scope per handler

`PluginHandlerDefinition` gains an optional `cancel: { scope, hard? }` field so
plugin authors can declare upfront whether each handler accepts a per-request
cancel token, whether it cancels by model, or whether it has no addon-level
cancel surface at all (soft-cancel only — the registry aborts the signal, the
stream stops yielding, the C++ work runs to completion in the background).

`scope` is `"request" | "model" | "none"`; `hard: true` documents that the
addon-side cancel actually interrupts compute. Plugin manifests that omit the
field still load — it's optional.

```typescript
import { definePlugin, defineHandler } from "@qvac/sdk";

definePlugin({
  manifestVersion: 1,
  handlers: {
    myStream: defineHandler({
      requestSchema,
      responseSchema,
      streaming: true,
      cancel: { scope: "model", hard: true },
      handler: async function* (request, ctx) { /* ... */ },
    }),
  },
});
```

##### Multi-GPU `split-mode`, `tensor-split`, and `main-gpu` on LLM and embed

LLM and embed model configs now expose the underlying llamacpp multi-GPU knobs.
LLM uses the canonical hyphenated keys (`"split-mode"`, `"tensor-split"`,
`"main-gpu"`) to mirror the llama.cpp CLI; embed uses the existing camelCase
convention (`splitMode`, `tensorSplit`, `mainGpu`).

```typescript
// LLM
await loadModel({
  modelSrc: LLAMA_3_2_1B_INST_Q4_0,
  modelType: "llm",
  modelConfig: {
    "split-mode": "layer",        // "none" | "layer" | "row"
    "tensor-split": "1,1",        // proportional split across GPUs
    "main-gpu": 0,                // integer index or "integrated" | "dedicated"
  },
});

// Embed
await loadModel({
  modelSrc: EMBEDDING_GEMMA_300M_Q8_0,
  modelType: "embed",
  modelConfig: {
    splitMode: "layer",
    tensorSplit: "1,1",
    mainGpu: 0,
  },
});
```

##### Whisper VAD and end-of-turn events on `transcribeStream`

`transcribeStream` gains a conversational mode opted into via `emitVadEvents:
true`. The session yields a discriminated event stream that includes live
voice-activity probabilities and turn boundaries, so apps can build push-to-talk
or barge-in UX without poll-the-text hacks.

```typescript
import { transcribeStream } from "@qvac/sdk";

const session = await transcribeStream({
  modelId: "whisper-base",
  emitVadEvents: true,
  endOfTurnSilenceMs: 800,
  vadRunIntervalMs: 100,
});

for await (const event of session) {
  if (event.type === "vad") console.log("speaking:", event.speaking, event.probability);
  else if (event.type === "endOfTurn") console.log("turn ended after", event.silenceDurationMs, "ms");
  else if (event.type === "text") process.stdout.write(event.text);
}

session.write(audioChunk);
session.end();
```

`TranscribeStreamEvent`, `VadStateEvent`, `EndOfTurnEvent`, and
`TranscribeStreamConversationSession` are new exported types. The existing
text-only, segment, and audio-chunk overloads are unchanged.

##### Parakeet duplex streaming with EOU events

The new parakeet plugin (see Breaking Changes) ships a duplex
`transcribeStream` session that mirrors the whisper one. EOU model checkpoints
surface as `{ type: "endOfTurn" }` events on the same iterator as `{ type:
"text" }`.

```typescript
const session = await transcribeStream({
  modelId,
  parakeetStreamingConfig: {
    chunkMs: 1000,
    emitPartials: true,
  },
});

ffmpeg.stdout.on("data", (chunk: Buffer) => session.write(chunk));

for await (const event of session) {
  switch (event.type) {
    case "text":
      process.stdout.write(event.text);
      break;
    case "endOfTurn":
      console.log("\n[endOfTurn] turn boundary detected\n");
      break;
  }
}
```

Per-call streaming overrides — `chunkMs`, `historyMs`, `leftContextMs`,
`rightLookaheadMs`, `emitPartials`, `emitEnergyVad` — are accepted on
`parakeetStreamingConfig` and fall back to their `parakeetConfig.streaming*`
load-time counterparts.

##### Harmony, Qwen3.5, and Gemma4 tool-call dialects

`toolDialect` now covers `"hermes" | "pythonic" | "json" | "harmony"`, plus
auto-detected `qwen35` and `gemma4` parsers. Harmony adds first-class support
for GPT-OSS models — including streaming the final-channel content
incrementally instead of buffering until `<|return|>`, so long GPT-OSS responses
no longer stall — and fixes a regression where protocol markers
(`<|channel|>analysis<|message|>...`, `<|start|>assistant`, `<|return|>`) were
leaking into `contentDelta`. Qwen3.5 covers the Pythonic-XML
`<tool_call><function=NAME><parameter=KEY>...</parameter></function></tool_call>`
framing; Gemma4 covers the native
`<|tool_call>call:NAME{key:<|"|>val<|"|>,...}<tool_call|>` framing.

```typescript
import { completion, type ToolDialect } from "@qvac/sdk";

const result = completion({
  modelId,                         // gpt-oss-20b-Q4_K_M auto-routes to "harmony"
  history,
  tools,
  toolDialect: "harmony",          // optional explicit override
});

const dialect: ToolDialect = "harmony";
```

Qwen3.5 / Qwen3.6 and Gemma4 are auto-detected from the model name; the parsers
ship without any caller-side wiring. The `harmony` parser also surfaces
malformed-JSON, unknown-tool, and non-object payloads as structured
`ToolCallError`s instead of silently dropping the event.

##### `reasoning_budget` knob for thinking models

`@qvac/llm-llamacpp@0.20.0` introduced a `reasoning_budget` parameter that gates
how much "thinking" a reasoning model is allowed to produce: `-1` =
unrestricted, `0` = disabled. The SDK exposes it both as a load-time default on
`LlmConfig` and as a per-request override on `GenerationParams`.

```typescript
import { loadModel, completion } from "@qvac/sdk";

const modelId = await loadModel({
  modelSrc: "/models/Qwen3.5-7B-Instruct-Q4_K_M.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 4096, reasoning_budget: -1 },
});

const run = completion({
  modelId,
  history: [{ role: "user", content: "Think step by step." }],
  generationParams: { reasoning_budget: 0 }, // override per-request
});
```

The same bump fixes a regression where `system_prompt` (a JS-only
`completion-stream.ts` field) was being forwarded to the C++ arg parser as
`--system-prompt`, which had been removed in llamacpp 8189+ — model loads were
failing outright until this fix landed.

##### FLUX.2 multi-reference fusion and per-call LoRA for diffusion

The diffusion API gains FLUX.2 multi-reference fusion (`init_images:
Uint8Array[]`, mutually exclusive with the existing single `init_image`), FLUX.2
reference-image tunables (`increase_ref_index`, `auto_resize_ref_image`), and a
per-call `lora` field that takes an absolute filesystem path. A new load-time
`lora_apply_mode` controls whether the adapter is fused permanently into the
model or applied per-call (`"auto" | "immediately" | "at_runtime"`).

```typescript
const refA = fs.readFileSync("scientist-a.jpg");
const refB = fs.readFileSync("scientist-b.jpg");
const { outputs } = diffusion({
  modelId,
  prompt: "a portrait using most visual traits from @image1 and the eyes from @image2",
  init_images: [refA, refB],
  width: 768,
  height: 768,
});

const { outputs: loraOutputs } = diffusion({
  modelId,
  prompt: "a watercolor cat",
  lora: "/home/user/loras/watercolor.safetensors",
});

await loadModel(modelSrc, {
  modelType: "diffusion",
  modelConfig: { prediction: "flux2_flow", lora_apply_mode: "immediately" },
});
```

Relative LoRA paths are rejected: the SDK runs across processes with differing
cwds, so absolute paths (POSIX, Windows drive-letter, or UNC) are the only safe
shape.

##### ESRGAN upscaling — post-step and standalone

Two new paths land for ESRGAN upscalers. The first attaches an upscaler to a
diffusion model and runs it as a post-step on every generated image. The second
loads an ESRGAN file as a standalone `upscale()`-only model so consumers can
feed an arbitrary PNG or JPEG into the SDK and get an upscaled image back —
without standing up a full diffusion pipeline.

```typescript
// Post-step upscale during diffusion
const modelId = await loadModel({
  modelSrc: SD_V2_1_1B_Q8_0,
  modelType: "diffusion",
  modelConfig: {
    prediction: "v",
    upscaler: {
      type: "esrgan",
      model_src: "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth",
      tile_size: 128,
    },
  },
});

const { outputs } = diffusion({
  modelId,
  prompt: "an illustrated red fox portrait",
  width: 128,
  height: 128,
  upscale: { repeats: 2 },
});
```

```typescript
// Standalone upscale — no diffusion model required
import { upscale, loadModel, REALESRGAN_X4PLUS_ANIME_6B } from "@qvac/sdk";

const modelId = await loadModel(REALESRGAN_X4PLUS_ANIME_6B, {
  modelType: "diffusion",
  modelConfig: {
    mode: "upscale",
    upscaler: { tile_size: 128 },
  },
});

const { outputs, stats } = upscale({
  modelId,
  image: pngBytes,        // Uint8Array (PNG/JPEG)
  repeats: 1,             // each pass multiplies dims by the model's native scale factor
});

const [upscaledPng] = await outputs;
console.log(await stats); // { upscaleMs, totalUpscaleMs, width, height, totalPixels, repeats, ... }
```

Calling `upscale()` against a model that wasn't loaded with `mode: "upscale"`
raises `ModelOperationNotSupportedError` upfront, and loading a diffusion model
with `modelConfig.upscaler` set but `model_src` missing fails fast with a
structured `ModelLoadFailedError` instead of letting the native addon error
mid-load.

##### Mobile build flow auto-verifies the worker bundle

Expo prebuild now runs `qvac verify bundle` against the emitted
`worker.mobile.bundle.js` before copying it into the SDK's `dist/`. The flow is
`runBundler → assert bundle exists → runVerifier → copyFileSync`, so the SDK
`dist/worker.mobile.bundle.js` only updates when verification passes. Failure
preserves the last known-good artifact and fails Expo prebuild fast with a new
`BundleVerificationFailedError` (code 50609).

If a `qvac.config.{json,js,mjs,ts}` is present, ABI checks are pinned to its
`bareRuntimeVersion`; without config, the CLI auto-detects from `node_modules`
(`bare-runtime` → `bare`) and falls through to a warning only when neither is
installed. `@qvac/cli` peer dep range moves from `^0.2.4` to `^0.4.0` — the
first version that ships `qvac verify bundle` — with the existing npx fallback
still in place for consumers that don't pin the dep.

Pear consumers aren't auto-wired yet — run `qvac verify bundle --addons-source
./node_modules --host <host> --config qvac.config.json` manually before `pear
stage` / `pear run`.

##### `cancelFinetune` is now fire-and-forget

`cancelFinetune(modelId)` used to await the addon's cancel flip before
resolving. It now fires a synchronous registry cancel and returns immediately;
the actual `model.cancel()` runs out-of-band via the new context's abort
listener. The result shape is unchanged (`status: "CANCELLED"` still
populated), but workbench / CLI / external consumers that gated subsequent
calls on cancel-resolution timing should switch to `await cancel({ requestId
})`, which has been synchronous-after-abort since the lifecycle work began.

#### Bug Fixes

##### Delegated inference connect is fast again

The `loadModel.delegation.connection` regression introduced in 0.10.0 — where
`@qvac/sdk` 0.9.0 → 0.10.0 took the consumer-side connect from ≈2.5s to ≈8.3s
on first delegated call — is fixed by dropping the explicit `await
swarm.dht.fullyBootstrapped()` block before `dht.connect()` in
`ensureRPCConnection`. The SDK's normal init path already warms the routing
table via `getSwarm()` during registry initialisation, so the explicit guard
was redundant. Measured cold-start connect mean drops from 3.82s back down to
1.18s (≈3.2× faster) on the same hardware and network.

##### KV cache priming no longer wastes a token

`initSystemPromptCache` used to start generation and then race the first output
token against a cancel. That always produced one token of unnecessary work and
relied on a fragile output/cancel race. The SDK now uses the addon's new
`prefill: true` runtime option (in `@qvac/llm-llamacpp ^0.17.3`) so priming
ingests the prompt and tools into the KV cache without producing any output
tokens. `initSystemPromptCache` resolves as soon as priming finishes.

##### React Native duplex RPC no longer uses Node-only `Buffer`

The RN duplex RPC path was using Node's `Buffer` global, which isn't available
on Hermes. The path now uses `Uint8Array` end-to-end so mobile consumers can
use duplex streaming (transcribe, parakeet, tts) without hitting `Buffer is
not defined`.

##### SDK bundles its own `worker.js` for packaged consumers

Apps consuming the SDK from a bundler (Metro, esbuild, webpack) were missing
`worker.js` from the published package, so consumers had to hand-copy it. The
package now ships `worker.js` alongside `worker.mobile.bundle.js` so packaged
consumers no longer need extra bundling steps.

##### Dedup of stateful Holepunch singletons

The SDK and `@qvac/registry-client` had drifting declarations for
`corestore`, `hyperblobs`, `hyperdb`, and `hyperswarm`: the SDK declared them
as `peerDependencies`, the registry client declared them as hard
`dependencies`, and the version ranges didn't match. The mismatch caused npm
to install duplicate copies, producing separate DHT nodes and broken
connectivity. Bumping `@qvac/registry-client` to `^0.5.0` (where those libs
move to `peerDependencies`) and `@qvac/embed-llamacpp` to `^0.16.0` and
`@qvac/transcription-whispercpp` to `^0.7.0` completes the dedup chain.

#### Model Registry Changes

The Bergamot translation pairs `BERGAMOT_EN_IT` and `BERGAMOT_ES_EN` were
pinned to the buggy `tiny` variant, which caused leading `"- "` hallucinations
on short inputs and an en→it quality regression (\~3 pp `chrF++` drop direct,
\~33 pp via Spanish pivot). The registry was regenerated against the upstream
`base-memory` Bergamot fix (synced to the DHT on 2026-05-05); paths now point
at `bergamot-{enit,esen}/2026-04-28/...` and `expectedSize` flipped from
17.1 MB to 30.1 MB on both pairs, confirming the switch landed.

The regeneration also picked up the auto-deprecation of the 32 Marian Opus NMT
entries (`NMT_Q0F16` through `NMT_Q0F16_9`, `NMT_Q4_0` through `NMT_Q4_0_21`)
that were superseded earlier in the release line. A separate fix corrects the
Bergamot vocab being re-downloaded on every `loadModel` for shared-vocab pairs
— the shared vocab is now cached and reused.

##### Added

```
PARAKEET_TDT_0_6B_V3_Q8_0
PARAKEET_CTC_0_6B_Q8_0
PARAKEET_SORTFORMER_4SPK_V1_Q8_0
PARAKEET_EOU_120M_V1_Q8_0
```

##### Removed

```
NMT_Q0F16 through NMT_Q0F16_9 (10 entries)
NMT_Q4_0 through NMT_Q4_0_21 (22 entries)
```

The legacy multi-file Parakeet constants (`PARAKEET_TDT_ENCODER_INT8`,
`PARAKEET_TDT_DECODER_INT8`, `PARAKEET_TDT_VOCAB`, `PARAKEET_TDT_PREPROCESSOR_INT8`,
`PARAKEET_CTC_FP32`, `PARAKEET_CTC_TOKENIZER`, etc.) are gone alongside the
plugin migration — see Breaking Changes above for the migration path.

#### Tests and Infrastructure

* E2E bootstrap was scoped down to only the dependencies required by the
  filtered test set, shortening cold CI runs.
* Multi-GPU integration tests are now skipped on mobile (real multi-GPU hardware
  isn't represented in the mobile farm; tests are validated against the
  shared-dev `2× RTX 5090` rig in CI logs).
* `@qvac/tts-onnx` bumped to `0.9.0` and `@qvac/transcription-parakeet` to
  `0.5.0` to match the addon-side releases that land in this SDK version.