# Transcription (/ai-capabilities/transcription)



## Overview

Transcription runs on one of two inference engines: [`qvac-ext-lib-whisper.cpp`](https://github.com/tetherto/qvac-ext-lib-whisper.cpp) or [NVIDIA Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) (via the GGML-based [`parakeet-cpp`](https://github.com/tetherto/qvac-ext-lib-whisper.cpp/tree/main/parakeet-cpp) engine). Load a model with `modelType: "whisper"` for `qvac-ext-lib-whisper.cpp`, or `modelType: "parakeet"` for Parakeet. Parakeet supports multilingual transcription (TDT), English-only transcription (CTC), speaker diarization (Sortformer), and end-of-utterance detection (EOU) for duplex streaming.

Provide audio input as `audioChunk`, either as a file path (string) or an in-memory audio buffer.
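When audio is captured in memory as signed 16-bit PCM, it may need converting before it is passed as a buffer. The sketch below assumes the target layout is 32-bit float little-endian samples in the range [-1, 1], matching the `audio_format: "f32le"` setting used in the whisper example later on this page. The helper name `int16ToF32le` is hypothetical and not part of the SDK, and the buffer type the SDK actually accepts should be checked against the API reference.

```typescript
// Hypothetical helper (not an SDK function): converts signed 16-bit PCM
// samples into raw 32-bit float little-endian bytes.
function int16ToF32le(pcm: Int16Array): Uint8Array {
  const bytes = new Uint8Array(pcm.length * 4);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < pcm.length; i++) {
    // Dividing by 32768 maps the int16 range [-32768, 32767] onto [-1, 1).
    const sample = (pcm[i] ?? 0) / 32768;
    // The final `true` selects little-endian byte order.
    view.setFloat32(i * 4, sample, true);
  }
  return bytes;
}
```

Writing through a `DataView` keeps the byte order explicit instead of relying on the platform's native endianness.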

`transcribe()` returns the full transcription as a single `string`. If you need partial results as they become available, use `transcribeStream()` to receive text chunks in real time. Both whisper and parakeet expose duplex `transcribeStream()` sessions; see "Streaming with `transcribeStream()`" below.

## Functions

Use the following sequence of function calls:

1. [`loadModel()`](/reference/api#loadmodel)
2. [`transcribe()`](/reference/api#transcribe) or [`transcribeStream()`](/reference/api#transcribestream)
3. [`unloadModel()`](/reference/api#unloadmodel)

For details on each function, see [SDK — API reference](/reference/api/).
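One way to make sure step 3 always runs, even when transcription throws, is to wrap the sequence in `try`/`finally`. The sketch below is a hypothetical helper named `withModel`, not an SDK function. Its `load`, `run`, and `unload` parameters stand in for calls to `loadModel()`, `transcribe()`, and `unloadModel()`, so the pattern can be read without assuming their exact signatures.

```typescript
// Hypothetical wrapper: guarantees the unload step runs after the model is
// loaded, whether `run` resolves or rejects.
async function withModel<Id, T>(
  load: () => Promise<Id>,
  run: (modelId: Id) => Promise<T>,
  unload: (modelId: Id) => Promise<void>,
): Promise<T> {
  const modelId = await load();
  try {
    return await run(modelId);
  } finally {
    // Runs on success and on failure, so the model is not left loaded.
    await unload(modelId);
  }
}
```

In application code, `load` would typically wrap `loadModel({ ... })`, `run` would wrap `transcribe({ modelId, audioChunk })`, and `unload` would wrap `unloadModel({ modelId })`.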

## Models

### `qvac-ext-lib-whisper.cpp`

Load a transcription model and, optionally, a VAD model:

* a [`whisper.cpp`](https://github.com/ggml-org/whisper.cpp)-compatible model for transcription. Model file format: `*.bin`; and
* a VAD model (e.g., Silero) converted to GGML. Model file format: `*.bin` *(optional, recommended)*.

### Parakeet

As of `@qvac/transcription-parakeet` 0.6.0, Parakeet ships as a **single GGUF** per variant — the addon auto-detects TDT / CTC / Sortformer / EOU from `parakeet.model.type` GGUF metadata. There is no `modelConfig.modelType` discriminator, no per-variant `parakeet*Src` artifact fields, and no `ParakeetArtifactsRequiredError`. Just supply the GGUF via the top-level `modelSrc`:

```ts
await loadModel({
  modelSrc: PARAKEET_TDT_0_6B_V3_Q8_0,    // multilingual, ~750MB
  modelType: "parakeet",
});

await loadModel({
  modelSrc: PARAKEET_CTC_0_6B_Q8_0,       // English-only, streaming-capable
  modelType: "parakeet",
});

await loadModel({
  modelSrc: PARAKEET_SORTFORMER_4SPK_V1_Q8_0,  // 4-speaker diarization
  modelType: "parakeet",
});

await loadModel({
  modelSrc: PARAKEET_EOU_120M_V1_Q8_0,    // end-of-utterance detection
  modelType: "parakeet",
});
```

For model artifacts available as constants, see [SDK — Models](/introduction#models).

<Callout type="info">
  **Migrating from pre-0.6 Parakeet (ONNX multi-file):** the legacy multi-file ONNX `modelConfig` shape (`parakeetEncoderSrc` / `parakeetDecoderSrc` / `parakeetVocabSrc` / `parakeetPreprocessorSrc`, plus `parakeetCtcModelSrc` / `parakeetTokenizerSrc` and `parakeetSortformerSrc` for the CTC/Sortformer variants) is no longer supported. Passing any of those fields raises a structured `LegacyParakeetModelDeprecatedError` with a migration message. The legacy ONNX constants (e.g. `PARAKEET_TDT_ENCODER_INT8`, `PARAKEET_CTC_FP32`, `PARAKEET_SORTFORMER_FP32`) remain exported for one minor cycle for codemod migrations only and will be removed in a future release.
</Callout>
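During a migration it can help to scan existing configuration objects for the legacy fields before they reach `loadModel()`. The sketch below uses the field names listed in the callout above. The helper name `findLegacyParakeetFields` is hypothetical, and the check is only a pre-flight aid; it does not replicate the SDK's own validation.

```typescript
// Legacy ONNX-era field names, copied from the migration note above.
const LEGACY_PARAKEET_FIELDS = [
  "parakeetEncoderSrc",
  "parakeetDecoderSrc",
  "parakeetVocabSrc",
  "parakeetPreprocessorSrc",
  "parakeetCtcModelSrc",
  "parakeetTokenizerSrc",
  "parakeetSortformerSrc",
] as const;

// Hypothetical helper: returns the legacy fields present in a config object.
function findLegacyParakeetFields(
  modelConfig: Record<string, unknown>,
): string[] {
  return LEGACY_PARAKEET_FIELDS.filter((field) => field in modelConfig);
}
```

An empty result means none of the listed legacy fields are present in that config object.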

<Callout type="info">
  **On VAD:** when using `qvac-ext-lib-whisper.cpp`, you can optionally provide a separate model for voice activity detection (VAD); this is recommended. By contrast, Parakeet handles VAD internally, so no additional model or configuration is required.
</Callout>

## Streaming with `transcribeStream()`

`transcribeStream()` opens a duplex session for both engines: write audio chunks via `session.write(...)`, and iterate events with `for await (const event of session) { ... }`. Events form a discriminated union keyed on `type`:

* `{ type: "text", text }` — incremental transcript text.
* `{ type: "segment", segment }` — segment metadata (whisper-only when `metadata: true`).
* `{ type: "vad", speaking, probability }` — voice-activity-detection state (whisper-only).
* `{ type: "endOfTurn", source: "whisper", silenceDurationMs }` — turn boundary detected from a measured silence window (whisper).
* `{ type: "endOfTurn", source: "parakeet" }` — turn boundary detected from the EOU model's `<EOU>` token (parakeet; no silence window — the event is token-driven).

The `source` field on `endOfTurn` lets consumers narrow the union: whisper events always carry a numeric `silenceDurationMs`; parakeet events never do.
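The narrowing described above can be sketched with a locally declared union. The type below mirrors the event list on this page but is an illustration rather than the SDK's exported type: the `segment` payload is left as `unknown`, and the `boolean`/`number` types on the `vad` event are assumptions.

```typescript
// Local stand-in for the stream event union (not imported from the SDK).
type TranscribeStreamEvent =
  | { type: "text"; text: string }
  | { type: "segment"; segment: unknown }
  | { type: "vad"; speaking: boolean; probability: number }
  | { type: "endOfTurn"; source: "whisper"; silenceDurationMs: number }
  | { type: "endOfTurn"; source: "parakeet" };

function describeEvent(event: TranscribeStreamEvent): string {
  switch (event.type) {
    case "text":
      return `text: ${event.text}`;
    case "segment":
      return "segment metadata";
    case "vad":
      return `vad: speaking=${event.speaking} p=${event.probability}`;
    case "endOfTurn":
      // Checking `source` narrows the union, so `silenceDurationMs` is only
      // accessible on the whisper branch.
      if (event.source === "whisper") {
        return `endOfTurn after ${event.silenceDurationMs} ms of silence`;
      }
      return "endOfTurn from <EOU> token";
  }
}
```

Accessing `event.silenceDurationMs` outside the `source === "whisper"` branch would be a compile-time error, which is the point of carrying `source` on every `endOfTurn` event.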

<Callout type="info">
  **Wire compatibility:** post-0.6 servers emit `source` on every `endOfTurn` frame. SDK parsers still accept the legacy whisper wire shape `{ silenceDurationMs }` (no `source`) and normalize it to `source: "whisper"`. Upgrade client and server together when using parakeet `source: "parakeet"` events — older servers never emit that branch.
</Callout>
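The normalization described in the callout above could look roughly like the following. This is a sketch, not the SDK's parser: the `EndOfTurnFrame` type and the `normalizeEndOfTurn` name are hypothetical, and it assumes the wire frame already carries `type: "endOfTurn"`.

```typescript
type EndOfTurnEvent =
  | { type: "endOfTurn"; source: "whisper"; silenceDurationMs: number }
  | { type: "endOfTurn"; source: "parakeet" };

// Hypothetical wire shape: `source` is absent on legacy whisper frames.
type EndOfTurnFrame = {
  type: "endOfTurn";
  source?: "whisper" | "parakeet";
  silenceDurationMs?: number;
};

function normalizeEndOfTurn(frame: EndOfTurnFrame): EndOfTurnEvent {
  if (frame.source === "parakeet") {
    return { type: "endOfTurn", source: "parakeet" };
  }
  // Anything else is treated as whisper, including legacy frames without
  // `source`; those must still carry a numeric silence duration.
  if (typeof frame.silenceDurationMs !== "number") {
    throw new Error("whisper endOfTurn frame is missing silenceDurationMs");
  }
  return {
    type: "endOfTurn",
    source: "whisper",
    silenceDurationMs: frame.silenceDurationMs,
  };
}
```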

### Parakeet duplex streaming

Pass `parakeetStreamingConfig` to `transcribeStream()` to override per-call streaming knobs (each falls back to its `parakeetConfig.streaming*` load-time counterpart):

```ts
const session = await transcribeStream({
  modelId,
  parakeetStreamingConfig: {
    chunkMs: 1000,            // encoder cadence
    historyMs: 30000,         // sortformer rolling-history window
    leftContextMs: 500,       // ASR encoder left-context window
    rightLookaheadMs: 200,    // ASR encoder right-lookahead window
    emitPartials: true,       // emit partial segments before chunk boundaries
    emitEnergyVad: false,     // CTC/TDT energy-based VAD hint (engine-internal)
  },
});

for await (const event of session) {
  switch (event.type) {
    case "text":
      process.stdout.write(event.text);
      break;
    case "endOfTurn":
      // event.source: "whisper" | "parakeet"
      console.log("\n[endOfTurn] turn boundary detected\n");
      break;
  }
}
```
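The per-call fallback can be pictured as a field-by-field merge. The sketch below is an assumption about the behaviour, not the SDK's implementation: the load-time values are modelled as an object with the same field names, whereas the real load-time counterparts live under `parakeetConfig.streaming*`, and the `resolveStreamingConfig` name is hypothetical.

```typescript
// Field names mirror the `parakeetStreamingConfig` snippet above; treating
// every field as optional is an assumption.
type ParakeetStreamingConfig = {
  chunkMs?: number;
  historyMs?: number;
  leftContextMs?: number;
  rightLookaheadMs?: number;
  emitPartials?: boolean;
  emitEnergyVad?: boolean;
};

// Hypothetical resolver: per-call values win, load-time values fill the gaps.
function resolveStreamingConfig(
  perCall: ParakeetStreamingConfig,
  loadTime: Required<ParakeetStreamingConfig>,
): Required<ParakeetStreamingConfig> {
  return {
    chunkMs: perCall.chunkMs ?? loadTime.chunkMs,
    historyMs: perCall.historyMs ?? loadTime.historyMs,
    leftContextMs: perCall.leftContextMs ?? loadTime.leftContextMs,
    rightLookaheadMs: perCall.rightLookaheadMs ?? loadTime.rightLookaheadMs,
    emitPartials: perCall.emitPartials ?? loadTime.emitPartials,
    emitEnergyVad: perCall.emitEnergyVad ?? loadTime.emitEnergyVad,
  };
}
```

The sketch uses `??` rather than `||` so that an explicit `false` or `0` in the per-call config is kept instead of being replaced by the load-time value.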

The synthetic `{ type: "endOfTurn", source: "parakeet" }` event surfaces whenever the EOU model emits an `<EOU>` token, and is the parakeet equivalent of whisper's silence-window EOU. Pair it with the `PARAKEET_EOU_120M_V1_Q8_0` checkpoint when you need explicit turn boundaries from parakeet.
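A consumer that wants whole turns rather than incremental text can buffer `text` events until an `endOfTurn` arrives. The sketch below works on a simplified subset of the event union, declared locally. The `TurnAccumulator` class is hypothetical and not part of the SDK.

```typescript
// Simplified local subset of the stream events (other event types omitted).
type TurnEvent =
  | { type: "text"; text: string }
  | { type: "endOfTurn"; source: "whisper" | "parakeet" };

// Hypothetical helper: collects text until a turn boundary is seen.
class TurnAccumulator {
  private buffer = "";

  // Returns the completed turn on `endOfTurn`, or null while still buffering.
  push(event: TurnEvent): string | null {
    if (event.type === "text") {
      this.buffer += event.text;
      return null;
    }
    const turn = this.buffer.trim();
    this.buffer = "";
    // A boundary with no buffered text yields null rather than an empty turn.
    return turn.length > 0 ? turn : null;
  }
}
```

Inside the `for await (const event of session)` loop, `push()` would be called for each `text` or `endOfTurn` event, and any non-null return value handled as one finished turn.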

## Examples

### `qvac-ext-lib-whisper.cpp`

The following script shows an example of `qvac-ext-lib-whisper.cpp` transcription with prompt-guided decoding, VAD, and GPU acceleration:

<Tabs>
  <Tab value="js" label="JavaScript" default>
    <WrapCode>
      ```js file=<rootDir>/packages/sdk/dist/examples/transcription/whispercpp-prompt.js title="whispercpp-prompt.js" lineNumbers
      /**
       * Whisper transcription with prompt example.
       *
       * Usage:
       *   bun examples/transcription/whispercpp-prompt.ts
       *
       * This example requires a test audio file (default: examples/audio/sample-16khz.wav).
       * Sample audio files are available in the QVAC source repository, but not included in the published npm package.
       * Set audioChunk to a custom WAV, or download the default audio into examples/audio/:
       *   https://github.com/tetherto/qvac/blob/main/packages/sdk/examples/audio/sample-16khz.wav
       */
      import { loadModel, unloadModel, transcribe, WHISPER_TINY } from "@qvac/sdk";
      try {
          console.log("▸ Starting Whisper transcription with prompt example...");
          // Load the Whisper model
          console.log("▸ Loading Whisper model...");
          const modelId = await loadModel({
              modelSrc: WHISPER_TINY,
              modelConfig: {
                  audio_format: "f32le",
                  // Sampling strategy
                  strategy: "greedy",
                  n_threads: 4,
                  // Transcription options
                  language: "en",
                  translate: false,
                  no_timestamps: false,
                  single_segment: false,
                  print_timestamps: true,
                  token_timestamps: true,
                  // Quality settings
                  temperature: 0.0,
                  suppress_blank: true,
                  suppress_nst: true,
                  // Advanced tuning
                  entropy_thold: 2.4,
                  logprob_thold: -1.0,
                  // VAD configuration
                  vad_params: {
                      threshold: 0.35,
                      min_speech_duration_ms: 200,
                      min_silence_duration_ms: 150,
                      max_speech_duration_s: 30.0,
                      speech_pad_ms: 600,
                      samples_overlap: 0.3,
                  },
                  // Context parameters for GPU
                  contextParams: {
                      use_gpu: true,
                      flash_attn: true,
                      gpu_device: 0,
                  },
              },
              onProgress: (p) => {
                  const mb = (n) => (n / 1e6).toFixed(1);
                  const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
                  process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
                  if (p.percentage >= 100)
                      process.stderr.write("\n");
              },
          });
          console.log(`▸ Whisper model loaded with ID: ${modelId}`);
          // Perform transcription
          console.log("▸ Transcribing audio...");
          const text = await transcribe({
              modelId,
              audioChunk: "examples/audio/sample-16khz.wav",
              prompt: "This is a test recording with clear speech and proper punctuation.",
          });
          console.log("▸ Transcription result:");
          console.log(text);
          // Unload the model when done
          console.log("▸ Unloading Whisper model...");
          await unloadModel({ modelId });
          console.log("▸ Whisper model unloaded successfully");
          process.exit(0);
      }
      catch (error) {
          console.error("✖", error);
          process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>

  <Tab value="ts" label="TypeScript">
    <WrapCode>
      ```ts file=<rootDir>/packages/sdk/examples/transcription/whispercpp-prompt.ts title="whispercpp-prompt.ts" lineNumbers
      /**
       * Whisper transcription with prompt example.
       *
       * Usage:
       *   bun examples/transcription/whispercpp-prompt.ts
       *
       * This example requires a test audio file (default: examples/audio/sample-16khz.wav).
       * Sample audio files are available in the QVAC source repository, but not included in the published npm package.
       * Set audioChunk to a custom WAV, or download the default audio into examples/audio/:
       *   https://github.com/tetherto/qvac/blob/main/packages/sdk/examples/audio/sample-16khz.wav
       */
      import { loadModel, unloadModel, transcribe, WHISPER_TINY } from "@qvac/sdk";

      try {
        console.log("▸ Starting Whisper transcription with prompt example...");

        // Load the Whisper model
        console.log("▸ Loading Whisper model...");
        const modelId = await loadModel({
          modelSrc: WHISPER_TINY,
          modelConfig: {
            audio_format: "f32le",
            // Sampling strategy
            strategy: "greedy",
            n_threads: 4,
            // Transcription options
            language: "en",
            translate: false,
            no_timestamps: false,
            single_segment: false,
            print_timestamps: true,
            token_timestamps: true,
            // Quality settings
            temperature: 0.0,
            suppress_blank: true,
            suppress_nst: true,
            // Advanced tuning
            entropy_thold: 2.4,
            logprob_thold: -1.0,
            // VAD configuration
            vad_params: {
              threshold: 0.35,
              min_speech_duration_ms: 200,
              min_silence_duration_ms: 150,
              max_speech_duration_s: 30.0,
              speech_pad_ms: 600,
              samples_overlap: 0.3,
            },
            // Context parameters for GPU
            contextParams: {
              use_gpu: true,
              flash_attn: true,
              gpu_device: 0,
            },
          },
          onProgress: (p) => {
            const mb = (n: number) => (n / 1e6).toFixed(1);
            const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
            process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
            if (p.percentage >= 100) process.stderr.write("\n");
          },
        });

        console.log(`▸ Whisper model loaded with ID: ${modelId}`);

        // Perform transcription
        console.log("▸ Transcribing audio...");
        const text = await transcribe({
          modelId,
          audioChunk: "examples/audio/sample-16khz.wav",
          prompt:
            "This is a test recording with clear speech and proper punctuation.",
        });

        console.log("▸ Transcription result:");
        console.log(text);

        // Unload the model when done
        console.log("▸ Unloading Whisper model...");
        await unloadModel({ modelId });
        console.log("▸ Whisper model unloaded successfully");
        process.exit(0);
      } catch (error) {
        console.error("✖", error);
        process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>
</Tabs>

### Parakeet TDT

The following script shows an example of multilingual transcription using the Parakeet TDT model from a WAV file:

<Tabs>
  <Tab value="js" label="JavaScript" default>
    <WrapCode>
      ```js file=<rootDir>/packages/sdk/dist/examples/transcription/parakeet-tdt-filesystem.js title="parakeet-tdt-filesystem.js" lineNumbers
      /**
       * Parakeet TDT transcription from a WAV file.
       *
       * Usage:
       *   bun run examples/transcription/parakeet-tdt-filesystem.ts <wav-file> [parakeet-tdt-gguf]
       *
       * Loads a single GGUF checkpoint (`PARAKEET_TDT_0_6B_V3_Q8_0` by default) and
       * transcribes the file with the batch `transcribe` API. Omit the model
       * argument to use the registry constant.
       *
       * Audio should be 16 kHz mono PCM in a WAV container.
       */
      import { loadModel, unloadModel, transcribe, PARAKEET_TDT_0_6B_V3_Q8_0, } from "@qvac/sdk";
      const args = process.argv.slice(2);
      if (!args[0]) {
          console.error("Usage: bun run examples/transcription/parakeet-tdt-filesystem.ts <wav-file-path> " +
              "[parakeet-tdt-gguf]");
          console.error("\nIf the model path is omitted, defaults to the registry model.");
          process.exit(1);
      }
      const audioFilePath = args[0];
      const parakeetModelSrc = args[1] ?? PARAKEET_TDT_0_6B_V3_Q8_0;
      try {
          console.log("▸ Starting Parakeet transcription example...");
          console.log("▸ Loading Parakeet model...");
          const modelId = await loadModel({
              modelSrc: parakeetModelSrc,
              modelType: "parakeet-transcription",
              onProgress: (p) => {
                  const mb = (n) => (n / 1e6).toFixed(1);
                  const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
                  process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
                  if (p.percentage >= 100)
                      process.stderr.write("\n");
              },
          });
          console.log(`▸ Parakeet model loaded with ID: ${modelId}`);
          console.log("▸ Transcribing audio...");
          const text = await transcribe({ modelId, audioChunk: audioFilePath });
          console.log(text);
          console.log("▸ Unloading Parakeet model...");
          await unloadModel({ modelId });
          console.log("▸ Parakeet model unloaded successfully");
      }
      catch (error) {
          console.error("✖", error);
          process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>

  <Tab value="ts" label="TypeScript">
    <WrapCode>
      ```ts file=<rootDir>/packages/sdk/examples/transcription/parakeet-tdt-filesystem.ts title="parakeet-tdt-filesystem.ts" lineNumbers
      /**
       * Parakeet TDT transcription from a WAV file.
       *
       * Usage:
       *   bun run examples/transcription/parakeet-tdt-filesystem.ts <wav-file> [parakeet-tdt-gguf]
       *
       * Loads a single GGUF checkpoint (`PARAKEET_TDT_0_6B_V3_Q8_0` by default) and
       * transcribes the file with the batch `transcribe` API. Omit the model
       * argument to use the registry constant.
       *
       * Audio should be 16 kHz mono PCM in a WAV container.
       */
      import {
        loadModel,
        unloadModel,
        transcribe,
        PARAKEET_TDT_0_6B_V3_Q8_0,
      } from "@qvac/sdk";

      const args = process.argv.slice(2);

      if (!args[0]) {
        console.error(
          "Usage: bun run examples/transcription/parakeet-tdt-filesystem.ts <wav-file-path> " +
            "[parakeet-tdt-gguf]",
        );
        console.error(
          "\nIf the model path is omitted, defaults to the registry model.",
        );
        process.exit(1);
      }

      const audioFilePath = args[0];
      const parakeetModelSrc = args[1] ?? PARAKEET_TDT_0_6B_V3_Q8_0;

      try {
        console.log("▸ Starting Parakeet transcription example...");

        console.log("▸ Loading Parakeet model...");
        const modelId = await loadModel({
          modelSrc: parakeetModelSrc,
          modelType: "parakeet-transcription",
          onProgress: (p) => {
            const mb = (n: number) => (n / 1e6).toFixed(1);
            const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
            process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
            if (p.percentage >= 100) process.stderr.write("\n");
          },
        });

        console.log(`▸ Parakeet model loaded with ID: ${modelId}`);

        console.log("▸ Transcribing audio...");
        const text = await transcribe({ modelId, audioChunk: audioFilePath });

        console.log(text);

        console.log("▸ Unloading Parakeet model...");
        await unloadModel({ modelId });
        console.log("▸ Parakeet model unloaded successfully");
      } catch (error) {
        console.error("✖", error);
        process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>
</Tabs>

### Parakeet CTC

The following script shows an example of English-only transcription using the Parakeet CTC model from a WAV file:

<Tabs>
  <Tab value="js" label="JavaScript" default>
    <WrapCode>
      ```js file=<rootDir>/packages/sdk/dist/examples/transcription/parakeet-ctc-filesystem.js title="parakeet-ctc-filesystem.js" lineNumbers
      /**
       * Parakeet CTC transcription from a WAV file.
       *
       * Usage:
       *   bun run examples/transcription/parakeet-ctc-filesystem.ts <wav-file> [parakeet-ctc-gguf]
       *
       * Loads a single GGUF checkpoint (`PARAKEET_CTC_0_6B_Q8_0` by default) and
       * transcribes the file with the batch `transcribe` API. Omit the model
       * argument to use the registry constant.
       *
       * Audio should be 16 kHz mono PCM in a WAV container.
       */
      import { loadModel, unloadModel, transcribe, PARAKEET_CTC_0_6B_Q8_0, } from "@qvac/sdk";
      const args = process.argv.slice(2);
      if (!args[0]) {
          console.error("Usage: bun run examples/transcription/parakeet-ctc-filesystem.ts <wav-file> " +
              "[parakeet-ctc-gguf]");
          console.error("\nIf the model path is omitted, defaults to the registry model.");
          process.exit(1);
      }
      const audioFilePath = args[0];
      const parakeetModelSrc = args[1] ?? PARAKEET_CTC_0_6B_Q8_0;
      try {
          console.log("▸ Loading Parakeet CTC model...");
          const modelId = await loadModel({
              modelSrc: parakeetModelSrc,
              modelType: "parakeet-transcription",
              onProgress: (p) => {
                  const mb = (n) => (n / 1e6).toFixed(1);
                  const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
                  process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
                  if (p.percentage >= 100)
                      process.stderr.write("\n");
              },
          });
          console.log(`▸ Parakeet CTC model loaded with ID: ${modelId}`);
          console.log("▸ Transcribing audio...");
          const text = await transcribe({ modelId, audioChunk: audioFilePath });
          console.log(text);
          console.log("▸ Unloading model...");
          await unloadModel({ modelId });
          console.log("▸ Done");
      }
      catch (error) {
          console.error("✖", error);
          process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>

  <Tab value="ts" label="TypeScript">
    <WrapCode>
      ```ts file=<rootDir>/packages/sdk/examples/transcription/parakeet-ctc-filesystem.ts title="parakeet-ctc-filesystem.ts" lineNumbers
      /**
       * Parakeet CTC transcription from a WAV file.
       *
       * Usage:
       *   bun run examples/transcription/parakeet-ctc-filesystem.ts <wav-file> [parakeet-ctc-gguf]
       *
       * Loads a single GGUF checkpoint (`PARAKEET_CTC_0_6B_Q8_0` by default) and
       * transcribes the file with the batch `transcribe` API. Omit the model
       * argument to use the registry constant.
       *
       * Audio should be 16 kHz mono PCM in a WAV container.
       */
      import {
        loadModel,
        unloadModel,
        transcribe,
        PARAKEET_CTC_0_6B_Q8_0,
      } from "@qvac/sdk";

      const args = process.argv.slice(2);

      if (!args[0]) {
        console.error(
          "Usage: bun run examples/transcription/parakeet-ctc-filesystem.ts <wav-file> " +
            "[parakeet-ctc-gguf]",
        );
        console.error(
          "\nIf the model path is omitted, defaults to the registry model.",
        );
        process.exit(1);
      }

      const audioFilePath = args[0];
      const parakeetModelSrc = args[1] ?? PARAKEET_CTC_0_6B_Q8_0;

      try {
        console.log("▸ Loading Parakeet CTC model...");
        const modelId = await loadModel({
          modelSrc: parakeetModelSrc,
          modelType: "parakeet-transcription",
          onProgress: (p) => {
            const mb = (n: number) => (n / 1e6).toFixed(1);
            const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
            process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
            if (p.percentage >= 100) process.stderr.write("\n");
          },
        });

        console.log(`▸ Parakeet CTC model loaded with ID: ${modelId}`);

        console.log("▸ Transcribing audio...");
        const text = await transcribe({ modelId, audioChunk: audioFilePath });

        console.log(text);

        console.log("▸ Unloading model...");
        await unloadModel({ modelId });
        console.log("▸ Done");
      } catch (error) {
        console.error("✖", error);
        process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>
</Tabs>

### Parakeet Sortformer

The following script shows an example of speaker diarization using the Parakeet Sortformer model, followed by per-segment transcription with the TDT model:

<Tabs>
  <Tab value="js" label="JavaScript" default>
    <WrapCode>
      ```js file=<rootDir>/packages/sdk/dist/examples/transcription/parakeet-sortformer.js title="parakeet-sortformer.js" lineNumbers
      /**
       * Parakeet Sortformer diarization + TDT transcription pipeline.
       *
       * Usage:
       *   bun run examples/transcription/parakeet-sortformer.ts [sortformer-gguf] [wav-file]
       *
       * Two-step flow: Sortformer v2.1 diarizes the audio, then TDT transcribes each
       * speaker segment. Defaults to registry GGUFs and
       * `examples/audio/diarization-sample-16k.wav`. For live streaming + AOSC, see
       * `parakeet-sortformer-streaming.ts`.
       *
       * Sample audio is in the QVAC source repo but not the published npm package.
       * Download the default file into `examples/audio/`:
       *   https://github.com/tetherto/qvac/blob/main/packages/sdk/examples/audio/diarization-sample-16k.wav
       */
      import { loadModel, unloadModel, transcribe, PARAKEET_TDT_0_6B_V3_Q8_0, PARAKEET_SORTFORMER_4SPK_V2_1_Q8_0, } from "@qvac/sdk";
      import { dirname, join } from "path";
      import { fileURLToPath } from "url";
      import { readFileSync, writeFileSync, mkdirSync } from "fs";
      import { tmpdir } from "os";
      const __dirname = dirname(fileURLToPath(import.meta.url));
      const args = process.argv.slice(2);
      const sortformerSrc = args[0] ?? PARAKEET_SORTFORMER_4SPK_V2_1_Q8_0;
      const defaultAudioPath = join(__dirname, "..", "audio", "diarization-sample-16k.wav");
      const audioFilePath = args[1] ?? defaultAudioPath;
      try {
          // ── Step 1: Diarize with Sortformer ──
          const sfModelId = await loadModel({
              modelSrc: sortformerSrc,
              modelType: "parakeet-transcription",
          });
          const diarization = await transcribe({
              modelId: sfModelId,
              audioChunk: audioFilePath,
          });
          await unloadModel({ modelId: sfModelId });
          const segments = parseDiarization(diarization);
          // ── Step 2: Transcribe each segment with TDT ──
          const tdtModelId = await loadModel({
              modelSrc: PARAKEET_TDT_0_6B_V3_Q8_0,
          });
          const pcm = readPcm(audioFilePath);
          const sliceDir = join(tmpdir(), `qvac-diarize-${Date.now()}`);
          mkdirSync(sliceDir, { recursive: true });
          const results = [];
          for (let i = 0; i < segments.length; i++) {
              const seg = segments[i];
              const slicePath = join(sliceDir, `seg-${i}.wav`);
              if (!writeWavSlice(pcm, seg.start, seg.end, slicePath)) {
                  results.push({ ...seg, text: "[No speech detected]" });
                  continue;
              }
              const text = await transcribe({
                  modelId: tdtModelId,
                  audioChunk: slicePath,
              });
              results.push({ ...seg, text: text.trim() || "[No speech detected]" });
          }
          await unloadModel({ modelId: tdtModelId });
          // ── Step 3: Merge consecutive same-speaker segments and print ──
          const merged = mergeSpeakers(results);
          console.log("\n▸ Diarized transcription");
          for (const entry of merged) {
              console.log(`Speaker ${entry.speaker} (${entry.start.toFixed(2)}s - ${entry.end.toFixed(2)}s):`);
              console.log(`  ${entry.text}\n`);
          }
          console.log("▸ Done");
      }
      catch (error) {
          console.error("✖", error);
          process.exit(1);
      }
      // ── Helpers ──
      function parseDiarization(text) {
          const segs = [];
          for (const line of text.split("\n")) {
              const m = line.match(/Speaker (\d+): ([\d.]+)s - ([\d.]+)s/);
              if (m)
                  segs.push({ speaker: +m[1], start: +m[2], end: +m[3] });
          }
          return segs.sort((a, b) => a.start - b.start);
      }
      function readPcm(wavPath) {
          const buf = readFileSync(wavPath);
          const dataOffset = buf.indexOf("data") + 4;
          return buf.subarray(dataOffset + 4, dataOffset + 4 + buf.readUInt32LE(dataOffset));
      }
      function writeWavSlice(pcm, startSec, endSec, outPath) {
          const SR = 16000;
          const BPS = 2;
          const startByte = Math.floor(startSec * SR) * BPS;
          const endByte = Math.min(Math.ceil(endSec * SR) * BPS, pcm.length);
          if (startByte >= endByte)
              return false;
          const slice = pcm.subarray(startByte, endByte);
          const hdr = Buffer.alloc(44);
          hdr.write("RIFF", 0);
          hdr.writeUInt32LE(36 + slice.length, 4);
          hdr.write("WAVEfmt ", 8);
          hdr.writeUInt32LE(16, 16);
          hdr.writeUInt16LE(1, 20);
          hdr.writeUInt16LE(1, 22);
          hdr.writeUInt32LE(SR, 24);
          hdr.writeUInt32LE(SR * BPS, 28);
          hdr.writeUInt16LE(BPS, 32);
          hdr.writeUInt16LE(16, 34);
          hdr.write("data", 36);
          hdr.writeUInt32LE(slice.length, 40);
          writeFileSync(outPath, Buffer.concat([hdr, slice]));
          return true;
      }
      function mergeSpeakers(entries) {
          const out = [];
          for (const e of entries) {
              const last = out[out.length - 1];
              if (last && last.speaker === e.speaker) {
                  last.text += " " + e.text;
                  last.end = e.end;
              }
              else {
                  out.push({ ...e });
              }
          }
          return out;
      }
      ```
    </WrapCode>
  </Tab>

  <Tab value="ts" label="TypeScript">
    <WrapCode>
      ```ts file=<rootDir>/packages/sdk/examples/transcription/parakeet-sortformer.ts title="parakeet-sortformer.ts" lineNumbers
      /**
       * Parakeet Sortformer diarization + TDT transcription pipeline.
       *
       * Usage:
       *   bun run examples/transcription/parakeet-sortformer.ts [sortformer-gguf] [wav-file]
       *
       * Two-step flow: Sortformer v2.1 diarizes the audio, then TDT transcribes each
       * speaker segment. Defaults to registry GGUFs and
       * `examples/audio/diarization-sample-16k.wav`. For live streaming + AOSC, see
       * `parakeet-sortformer-streaming.ts`.
       *
       * Sample audio is in the QVAC source repo but not the published npm package.
       * Download the default file into `examples/audio/`:
       *   https://github.com/tetherto/qvac/blob/main/packages/sdk/examples/audio/diarization-sample-16k.wav
       */
      import {
        loadModel,
        unloadModel,
        transcribe,
        PARAKEET_TDT_0_6B_V3_Q8_0,
        PARAKEET_SORTFORMER_4SPK_V2_1_Q8_0,
      } from "@qvac/sdk";
      import { dirname, join } from "path";
      import { fileURLToPath } from "url";
      import { readFileSync, writeFileSync, mkdirSync } from "fs";
      import { tmpdir } from "os";

      const __dirname = dirname(fileURLToPath(import.meta.url));

      const args = process.argv.slice(2);
      const sortformerSrc = args[0] ?? PARAKEET_SORTFORMER_4SPK_V2_1_Q8_0;

      const defaultAudioPath = join(
        __dirname,
        "..",
        "audio",
        "diarization-sample-16k.wav",
      );
      const audioFilePath = args[1] ?? defaultAudioPath;

      try {
        // ── Step 1: Diarize with Sortformer ──

        const sfModelId = await loadModel({
          modelSrc: sortformerSrc,
          modelType: "parakeet-transcription",
        });

        const diarization = await transcribe({
          modelId: sfModelId,
          audioChunk: audioFilePath,
        });
        await unloadModel({ modelId: sfModelId });

        const segments = parseDiarization(diarization);

        // ── Step 2: Transcribe each segment with TDT ──

        const tdtModelId = await loadModel({
          modelSrc: PARAKEET_TDT_0_6B_V3_Q8_0,
        });

        const pcm = readPcm(audioFilePath);
        const sliceDir = join(tmpdir(), `qvac-diarize-${Date.now()}`);
        mkdirSync(sliceDir, { recursive: true });

        const results: {
          speaker: number;
          start: number;
          end: number;
          text: string;
        }[] = [];

        for (let i = 0; i < segments.length; i++) {
          const seg = segments[i]!;
          const slicePath = join(sliceDir, `seg-${i}.wav`);

          if (!writeWavSlice(pcm, seg.start, seg.end, slicePath)) {
            results.push({ ...seg, text: "[No speech detected]" });
            continue;
          }

          const text = await transcribe({
            modelId: tdtModelId,
            audioChunk: slicePath,
          });
          results.push({ ...seg, text: text.trim() || "[No speech detected]" });
        }

        await unloadModel({ modelId: tdtModelId });

        // ── Step 3: Merge consecutive same-speaker segments and print ──

        const merged = mergeSpeakers(results);

        console.log("\n▸ Diarized transcription");
        for (const entry of merged) {
          console.log(
            `Speaker ${entry.speaker} (${entry.start.toFixed(2)}s - ${entry.end.toFixed(2)}s):`,
          );
          console.log(`  ${entry.text}\n`);
        }
        console.log("▸ Done");
      } catch (error) {
        console.error("✖", error);
        process.exit(1);
      }

      // ── Helpers ──

      function parseDiarization(text: string) {
        const segs: { speaker: number; start: number; end: number }[] = [];
        for (const line of text.split("\n")) {
          const m = line.match(/Speaker (\d+): ([\d.]+)s - ([\d.]+)s/);
          if (m) segs.push({ speaker: +m[1]!, start: +m[2]!, end: +m[3]! });
        }
        return segs.sort((a, b) => a.start - b.start);
      }

      function readPcm(wavPath: string): Buffer {
        const buf = readFileSync(wavPath);
        const dataOffset = buf.indexOf("data") + 4;
        return buf.subarray(
          dataOffset + 4,
          dataOffset + 4 + buf.readUInt32LE(dataOffset),
        );
      }

      function writeWavSlice(
        pcm: Buffer,
        startSec: number,
        endSec: number,
        outPath: string,
      ): boolean {
        const SR = 16000;
        const BPS = 2;
        const startByte = Math.floor(startSec * SR) * BPS;
        const endByte = Math.min(Math.ceil(endSec * SR) * BPS, pcm.length);
        if (startByte >= endByte) return false;

        const slice = pcm.subarray(startByte, endByte);
        const hdr = Buffer.alloc(44);
        hdr.write("RIFF", 0);
        hdr.writeUInt32LE(36 + slice.length, 4);
        hdr.write("WAVEfmt ", 8);
        hdr.writeUInt32LE(16, 16);
        hdr.writeUInt16LE(1, 20);
        hdr.writeUInt16LE(1, 22);
        hdr.writeUInt32LE(SR, 24);
        hdr.writeUInt32LE(SR * BPS, 28);
        hdr.writeUInt16LE(BPS, 32);
        hdr.writeUInt16LE(16, 34);
        hdr.write("data", 36);
        hdr.writeUInt32LE(slice.length, 40);

        writeFileSync(outPath, Buffer.concat([hdr, slice]));
        return true;
      }

      function mergeSpeakers<
        T extends { speaker: number; start: number; end: number; text: string },
      >(entries: T[]): T[] {
        const out: T[] = [];
        for (const e of entries) {
          const last = out[out.length - 1];
          if (last && last.speaker === e.speaker) {
            last.text += " " + e.text;
            last.end = e.end;
          } else {
            out.push({ ...e });
          }
        }
        return out;
      }
      ```
    </WrapCode>
  </Tab>
</Tabs>
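The `readPcm` helper in the example above locates the payload with `buf.indexOf("data")`, which works for simple files but can match the same four bytes inside an earlier chunk. A more defensive approach is to walk the RIFF chunk list. The sketch below does that over a plain `Uint8Array`; the function names are hypothetical, and it only locates the payload without validating the `fmt ` chunk.

```typescript
// Reads a four-character chunk tag at the given offset.
function fourCc(bytes: Uint8Array, offset: number): string {
  let tag = "";
  for (let i = 0; i < 4; i++) {
    tag += String.fromCharCode(bytes[offset + i] ?? 0);
  }
  return tag;
}

// Walks RIFF chunks and returns the payload of the first "data" chunk.
function findWavData(wav: Uint8Array): Uint8Array {
  if (wav.length < 12 || fourCc(wav, 0) !== "RIFF" || fourCc(wav, 8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength);
  let offset = 12; // first chunk follows the 12-byte RIFF header
  while (offset + 8 <= wav.length) {
    const id = fourCc(wav, offset);
    const size = view.getUint32(offset + 4, true); // little-endian chunk size
    const start = offset + 8;
    if (id === "data") {
      // Clamp in case the header declares more bytes than the file holds.
      return wav.subarray(start, Math.min(start + size, wav.length));
    }
    // Chunks are padded to an even number of bytes.
    offset = start + size + (size % 2);
  }
  throw new Error('no "data" chunk found');
}
```

Because `subarray` returns a view rather than a copy, the result can be sliced further by sample offset in the same way the example's `writeWavSlice` does.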

<Callout type="success">
  **Tip:** all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see [SDK quickstart](/quickstart).
</Callout>
