SDK Release Notes — v0.8.0

📦 NPM: https://www.npmjs.com/package/@qvac/sdk/v/0.8.0

This release adds Parakeet as a new transcription engine alongside Whisper, including CTC and Sortformer model variants. It also decouples built-in plugins from the core load path, introduces a built-in profiler for diagnosing performance, and adds Bergamot pivot translation support. Several addon logging and delegation bugs have been fixed.

Breaking Changes

@qvac/sdk

Embedding Model Config Uses Structured Fields

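The embedding addon no longer accepts the raw config string (tab-delimited flag/value pairs, one per line) passed as modelConfig.rawConfig. Use structured fields such as gpuLayers, device, and batchSize instead.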

Before:

await loadModel({
  modelSrc: EMBEDDINGS_NOMIC_EMBED_TEXT_V1_5,
  modelType: "embeddings",
  modelConfig: {
    rawConfig: "-ngl\t99\n-dev\tgpu\n--batch_size\t1024",
  },
});

After:

await loadModel({
  modelSrc: EMBEDDINGS_NOMIC_EMBED_TEXT_V1_5,
  modelType: "embeddings",
  modelConfig: {
    gpuLayers: 99,
    device: "gpu",
    batchSize: 1024,
  },
});
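
If your code still builds the raw string, a small helper can convert it during migration. This is a minimal sketch, not an SDK API: parseRawEmbeddingConfig and EmbeddingConfig are hypothetical names, and the mapping covers only the three flags shown in the example above. Any other flag is reported back for manual review rather than guessed.

```typescript
// Hypothetical migration helper; not part of @qvac/sdk.
// Maps only the three flags from the example above.
interface EmbeddingConfig {
  gpuLayers?: number;
  device?: string;
  batchSize?: number;
}

function parseRawEmbeddingConfig(raw: string): {
  config: EmbeddingConfig;
  unknownFlags: string[];
} {
  const config: EmbeddingConfig = {};
  const unknownFlags: string[] = [];

  for (const line of raw.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const parts = line.split("\t"); // flag<TAB>value
    const flag = parts[0] ?? "";
    const value = parts[1] ?? "";
    switch (flag) {
      case "-ngl":
        config.gpuLayers = Number(value);
        break;
      case "-dev":
        config.device = value;
        break;
      case "--batch_size":
        config.batchSize = Number(value);
        break;
      default:
        unknownFlags.push(flag); // unmapped flag: review by hand
    }
  }
  return { config, unknownFlags };
}

const migratedEmbedding = parseRawEmbeddingConfig(
  "-ngl\t99\n-dev\tgpu\n--batch_size\t1024",
);
// migratedEmbedding.config is { gpuLayers: 99, device: "gpu", batchSize: 1024 }
```

The resulting config object can then be passed as modelConfig once unknownFlags is confirmed to be empty.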

Companion Model Sources Moved Into modelConfig

Top-level companion source fields (projectionModelSrc, vadModelSrc, srcVocabSrc, dstVocabSrc) have been removed from loadModel. They now live inside modelConfig. The unused toolFormat field has also been removed.

Before:

await loadModel({
  modelType: "llm",
  modelSrc: ".../model.gguf",
  projectionModelSrc: ".../mmproj.gguf",
});

await loadModel({
  modelType: "whisper",
  modelSrc: ".../model.bin",
  vadModelSrc: ".../vad.bin",
});

await loadModel({
  modelType: "nmt",
  modelSrc: ".../model.intgemm.alphas.bin",
  srcVocabSrc: ".../srcvocab.spm",
  dstVocabSrc: ".../trgvocab.spm",
});

After:

await loadModel({
  modelType: "llm",
  modelSrc: ".../model.gguf",
  modelConfig: {
    projectionModelSrc: ".../mmproj.gguf",
  },
});

await loadModel({
  modelType: "whisper",
  modelSrc: ".../model.bin",
  modelConfig: {
    vadModelSrc: ".../vad.bin",
  },
});

await loadModel({
  modelType: "nmt",
  modelSrc: ".../model.intgemm.alphas.bin",
  modelConfig: {
    srcVocabSrc: ".../srcvocab.spm",
    dstVocabSrc: ".../trgvocab.spm",
  },
});
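
For call sites that assemble their options dynamically, the move can be automated. The following is a sketch under assumptions: migrateLoadOptions is a hypothetical helper, not an SDK API, and it treats the options as a plain object. It relocates the four companion fields named above into modelConfig and drops toolFormat.

```typescript
// Hypothetical migration helper; not part of @qvac/sdk.
type LoadOptions = Record<string, unknown>;

// The four top-level fields that moved into modelConfig.
const COMPANION_KEYS: readonly string[] = [
  "projectionModelSrc",
  "vadModelSrc",
  "srcVocabSrc",
  "dstVocabSrc",
];

function migrateLoadOptions(old: LoadOptions): LoadOptions {
  // Start from a copy of any modelConfig the caller already provides.
  const existing = old["modelConfig"];
  const modelConfig: Record<string, unknown> =
    typeof existing === "object" && existing !== null
      ? { ...(existing as Record<string, unknown>) }
      : {};
  const next: LoadOptions = {};

  for (const [key, value] of Object.entries(old)) {
    if (key === "toolFormat" || key === "modelConfig") continue; // removed or handled above
    if (COMPANION_KEYS.includes(key)) modelConfig[key] = value;
    else next[key] = value;
  }
  // Only attach modelConfig when it has something in it.
  if (Object.keys(modelConfig).length > 0) next["modelConfig"] = modelConfig;
  return next;
}

const migratedWhisper = migrateLoadOptions({
  modelType: "whisper",
  modelSrc: ".../model.bin",
  vadModelSrc: ".../vad.bin",
});
// migratedWhisper now carries vadModelSrc inside modelConfig
```

Copying the existing modelConfig first means a moved top-level field overrides a same-named field already inside it; swap the order if you prefer the opposite precedence.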

Additionally, custom plugins must now provide a loadConfigSchema. For plugins that accept any config, use a passthrough schema:

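import { z } from "zod";

const myPlugin: QvacPlugin = {
  modelType: "my-plugin",
  displayName: "My Plugin",
  addonPackage: "@my/addon",
  // Passthrough: accept any keys and leave their values unvalidated.
  loadConfigSchema: z.object({}).catchall(z.unknown()),
  // `model` and `loader` stand in for the instances your addon creates.
  createModel: (params) => ({ model, loader }),
  handlers: {},
};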

Parakeet Model Constants Renamed

All existing Parakeet model constants now include a variant prefix (TDT_) to distinguish them from the new CTC and Sortformer variants. This is a find-and-replace migration:

| Old Constant | New Constant |
| --- | --- |
| PARAKEET_ENCODER_* | PARAKEET_TDT_ENCODER_* |
| PARAKEET_DECODER_* | PARAKEET_TDT_DECODER_* |
| PARAKEET_VOCAB | PARAKEET_TDT_VOCAB |
| PARAKEET_PREPROCESSOR_* | PARAKEET_TDT_PREPROCESSOR_* |
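
The renames listed above can be applied with a single regular expression. This is a minimal sketch; renameParakeetConstants is a hypothetical helper name, and how you read and write your source files is left out. The pattern matches only the four unprefixed families, so the new CTC, Sortformer, and EOU constants and already-migrated TDT names are left alone.

```typescript
// Hypothetical codemod helper; not part of @qvac/sdk.
// Matches PARAKEET_ENCODER*, PARAKEET_DECODER*, PARAKEET_VOCAB, PARAKEET_PREPROCESSOR*
// only when the family name ends the identifier or is followed by "_".
const OLD_PARAKEET_CONSTANT =
  /\bPARAKEET_(ENCODER|DECODER|VOCAB|PREPROCESSOR)(?=_|\b)/g;

function renameParakeetConstants(source: string): string {
  return source.replace(OLD_PARAKEET_CONSTANT, "PARAKEET_TDT_$1");
}

const renamed = renameParakeetConstants(
  "import { PARAKEET_ENCODER_FP32, PARAKEET_VOCAB, PARAKEET_CTC_VOCAB } from \"@qvac/sdk\";",
);
// PARAKEET_ENCODER_FP32 and PARAKEET_VOCAB gain the TDT_ prefix; PARAKEET_CTC_VOCAB is unchanged
```

Running the helper twice is safe, because names that already contain TDT_ no longer match the pattern.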

Features

@qvac/sdk

Parakeet Transcription Plugin

NVIDIA Parakeet ONNX models are now supported as an alternative transcription engine alongside Whisper.cpp. Parakeet uses a multi-file model architecture (encoder, encoder-data, decoder, vocab, preprocessor) resolved through the QVAC Registry.

import {
  loadModel,
  transcribeStream,
  PARAKEET_TDT_ENCODER_FP32,
  PARAKEET_TDT_ENCODER_DATA_FP32,
  PARAKEET_TDT_DECODER_FP32,
  PARAKEET_TDT_VOCAB,
  PARAKEET_TDT_PREPROCESSOR_FP32,
} from "@qvac/sdk";

const modelId = await loadModel({
  modelType: "parakeet",
  modelSrc: PARAKEET_TDT_ENCODER_FP32,
  modelConfig: {
    parakeetEncoderSrc: PARAKEET_TDT_ENCODER_FP32,
    parakeetEncoderDataSrc: PARAKEET_TDT_ENCODER_DATA_FP32,
    parakeetDecoderSrc: PARAKEET_TDT_DECODER_FP32,
    parakeetVocabSrc: PARAKEET_TDT_VOCAB,
    parakeetPreprocessorSrc: PARAKEET_TDT_PREPROCESSOR_FP32,
  },
});

const stream = transcribeStream({ modelId, audioInput: "./audio.mp3" });
for await (const chunk of stream) {
  process.stdout.write(chunk);
}

The plugin supports MP3, WAV, OGG, M4A, RAW, and FLAC audio. It automatically filters out [No speech detected] chunks, mirroring Whisper's [BLANK_AUDIO] filtering.

Bergamot Pivot Translation

Translate between language pairs that don't have a direct model by chaining through a pivot language (typically English). Configure the pivot model inside modelConfig:

await loadModel({
  modelSrc: BERGAMOT_ES_EN,
  modelType: "nmt",
  modelConfig: {
    engine: "Bergamot",
    from: "es",
    to: "it",
    pivotModel: {
      modelSrc: BERGAMOT_EN_IT,
      beamsize: 4,
      temperature: 0.3,
      topk: 100,
      normalize: 1,
      lengthpenalty: 1.2,
    },
  },
});
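
Conceptually, pivoting composes two translation steps: source to pivot, then pivot to target. The sketch below illustrates only that idea with toy lookup tables. Translator and withPivot are hypothetical names, not SDK APIs, and the tables stand in for real models.

```typescript
// Illustrative only: toy translators, not the SDK's implementation.
type Translator = (text: string) => Promise<string>;

// Compose two translators through a shared pivot language.
function withPivot(toPivot: Translator, fromPivot: Translator): Translator {
  return async (text) => fromPivot(await toPivot(text));
}

// Toy dictionaries standing in for an es->en and an en->it model.
const ES_EN: Record<string, string> = { hola: "hello" };
const EN_IT: Record<string, string> = { hello: "ciao" };

const esToEn: Translator = async (text) => ES_EN[text] ?? text;
const enToIt: Translator = async (text) => EN_IT[text] ?? text;

const esToIt = withPivot(esToEn, enToIt);
// esToIt("hola") resolves to "ciao"
```

Because the second step only sees the pivot text, anything lost in the first step cannot be recovered later, which is why a direct model is preferable when one exists.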

SDK Profiler

A built-in profiler lets you measure SDK operation performance at runtime. Enable it globally or on a per-call basis:

import { profiler } from "@qvac/sdk";

profiler.enable({
  mode: "verbose",
  includeServerBreakdown: true,
});

// Run operations, then export results
console.log(profiler.exportSummary());
console.log(profiler.exportTable());
console.log(profiler.exportJSON({ includeRecentEvents: true }));

profiler.disable();

Per-call profiling control is also available on embed, completion, transcribe, translate, ragSearch, and invokePlugin:

await embed(
  { modelId, text: "hello" },
  { profiling: { enabled: true } },
);

CTC and Sortformer Transcription Models

The SDK now supports two additional Parakeet model variants beyond the existing TDT models:

CTC transcription for fast, streaming-friendly speech-to-text:

const modelId = await loadModel({
  modelSrc: PARAKEET_CTC_FP32_1,
  modelType: "parakeet",
  modelConfig: {
    modelType: "ctc",
    parakeetCtcModelSrc: PARAKEET_CTC_FP32_1,
    parakeetCtcModelDataSrc: PARAKEET_CTC_DATA_FP32_1,
    parakeetTokenizerSrc: PARAKEET_CTC_TOKENIZER,
  },
});
const text = await transcribe({ modelId, audioChunk: "/path/to/audio.wav" });

Sortformer diarization for speaker identification:

const modelId = await loadModel({
  modelSrc: PARAKEET_SORTFORMER_FP32,
  modelType: "parakeet",
  modelConfig: {
    modelType: "sortformer",
    parakeetSortformerSrc: PARAKEET_SORTFORMER_FP32,
  },
});
const diarization = await transcribe({ modelId, audioChunk: "/path/to/audio.wav" });
// Returns: "Speaker 0: 0.00s - 4.24s\nSpeaker 1: 4.65s - 14.32s\n..."
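
Since the diarization result is a plain string, you will likely want to parse it into structured segments. This is a minimal sketch that assumes every line follows the "Speaker N: START s - END s" shape shown in the comment above. parseDiarization and SpeakerSegment are hypothetical names, and lines that do not match are skipped.

```typescript
// Hypothetical parser; assumes the line format shown in the example above.
interface SpeakerSegment {
  speaker: number;
  startSec: number;
  endSec: number;
}

// Example line: "Speaker 0: 0.00s - 4.24s"
const SEGMENT_LINE = /^Speaker (\d+): (\d+(?:\.\d+)?)s - (\d+(?:\.\d+)?)s$/;

function parseDiarization(output: string): SpeakerSegment[] {
  const segments: SpeakerSegment[] = [];
  for (const line of output.split("\n")) {
    const match = SEGMENT_LINE.exec(line.trim());
    if (match === null) continue; // skip blank or unrecognized lines
    segments.push({
      speaker: Number(match[1]),
      startSec: Number(match[2]),
      endSec: Number(match[3]),
    });
  }
  return segments;
}

const parsedSegments = parseDiarization(
  "Speaker 0: 0.00s - 4.24s\nSpeaker 1: 4.65s - 14.32s\n",
);
// parsedSegments[1] is { speaker: 1, startSec: 4.65, endSec: 14.32 }
```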

AfriqueGemma Support

AfriqueGemma models for African language translation are now available through the SDK.

Profiler Operation Transport and Metrics

The profiler now captures detailed load/download metrics and stream profiling data, giving deeper visibility into SDK operation performance.

Bug Fixes

@qvac/sdk

  • Addon logging fixed across all plugins — Logging callbacks were not being properly attached to some addon plugins, resulting in missing addon-level logs. All plugins now correctly route native addon logs through the SDK logging system.
  • Parakeet addon logger now uses the correct params.modelId for log routing.
  • RPC race condition in getRPCInstance resolved — concurrent initialization calls no longer create duplicate worker instances.
  • Delegated model unload now correctly unloads the model on the provider device instead of only removing the local registry entry.
  • Stale delegated model entries are replaced on re-registration instead of accumulating.
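
To illustrate the kind of race described in the getRPCInstance fix above: when initialization is asynchronous, two callers can both see "no instance yet" and each start their own worker. One common remedy is to cache the in-flight promise rather than the resolved value. The sketch below shows that general pattern only. It is not the SDK's actual code, and every name in it is hypothetical.

```typescript
// Illustrative pattern only; not the SDK's implementation.
interface RpcWorker {
  id: number;
}

let workersCreated = 0;

async function createWorker(): Promise<RpcWorker> {
  await Promise.resolve(); // yield once to simulate asynchronous startup
  workersCreated += 1;
  return { id: workersCreated };
}

let pendingInstance: Promise<RpcWorker> | undefined;

function getInstance(): Promise<RpcWorker> {
  // Store the promise itself, so callers that arrive while startup is
  // still in flight share the same worker instead of creating another.
  if (pendingInstance === undefined) pendingInstance = createWorker();
  return pendingInstance;
}

const concurrentCalls = Promise.all([getInstance(), getInstance()]);
// Both calls resolve to the same worker object, and createWorker runs once.
```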

📦 Model Changes

Added Models

OCR:

  • OCR_DETECTOR_DB_MOBILENET_V3_LARGE
  • OCR_DETECTOR_DB_RESNET50
  • OCR_RECOGNIZER_CRNN_MOBILENET_V3_SMALL
  • OCR_RECOGNIZER_PARSEQ

Parakeet (CTC):

  • PARAKEET_CTC_FP32, PARAKEET_CTC_DATA_FP32, PARAKEET_CTC_VOCAB, PARAKEET_CTC_INT8, PARAKEET_CTC_CONFIG
  • PARAKEET_CTC_FP32_1, PARAKEET_CTC_DATA_FP32_1, PARAKEET_CTC_TOKENIZER

Parakeet (Sortformer / EOU):

  • PARAKEET_SORTFORMER_FP32
  • PARAKEET_EOU_ENCODER_FP32, PARAKEET_EOU_DECODER_FP32, PARAKEET_EOU_TOKENIZER

Parakeet (TDT — renamed from unprefixed):

  • PARAKEET_TDT_ENCODER_FP32, PARAKEET_TDT_ENCODER_DATA_FP32, PARAKEET_TDT_DECODER_FP32, PARAKEET_TDT_VOCAB, PARAKEET_TDT_PREPROCESSOR_FP32
  • PARAKEET_TDT_ENCODER_INT8, PARAKEET_TDT_DECODER_INT8, PARAKEET_TDT_PREPROCESSOR_INT8

Removed Models

The following unprefixed Parakeet constants were replaced by their TDT_ equivalents (see "Parakeet Model Constants Renamed" under Breaking Changes):

  • PARAKEET_ENCODER_FP32, PARAKEET_ENCODER_DATA_FP32, PARAKEET_DECODER_FP32, PARAKEET_VOCAB, PARAKEET_PREPROCESSOR_FP32
  • PARAKEET_ENCODER_INT8, PARAKEET_DECODER_INT8, PARAKEET_PREPROCESSOR_INT8

🧹 Other Changes

  • Consolidated transcription schemas and shared ops into a unified structure.
  • Updated @qvac/tts-onnx to v0.6.1 and updated the @qvac/transcription-whispercpp addon.
  • Aligned TTS plugin artifact patterns.
  • Ported Android E2E test workflow to the monorepo.
  • Improved changelog aggregation to prefer CHANGELOG_LLM.md when available.
  • Reorganized monorepo documentation structure.
  • Ported logging executor to direct assertions and added missing test handlers.
  • Enabled profiling in test-qvac desktop and mobile consumers.
