
Text-to-Speech

Text-to-speech (TTS) synthesis: generate spoken audio from written input, including in custom voices.

Overview

Text-to-Speech uses @qvac/tts-ggml (GGML) as the inference engine. Load any supported model using modelType: "tts". Then, provide text as input (with inputType: "text") to generate speech audio.

textToSpeech() returns an object containing buffer and, when streaming is enabled, a bufferStream for incremental audio output.
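
As a minimal sketch of consuming incremental output, the helper below gathers chunks from an async iterable into a single Int16Array. It assumes that bufferStream can be iterated with `for await` and that each chunk is an array of 16-bit PCM samples; neither detail is confirmed on this page. `collectSamples` and `fakeStream` are hypothetical names, not SDK exports.

```javascript
// Hypothetical helper: gather streamed PCM chunks into one Int16Array.
// Assumption: `stream` is an async iterable yielding arrays of int16 samples.
async function collectSamples(stream) {
    const chunks = [];
    let total = 0;
    for await (const chunk of stream) {
        const samples = Int16Array.from(chunk);
        chunks.push(samples);
        total += samples.length;
    }
    // Copy every chunk into one contiguous array, preserving order.
    const out = new Int16Array(total);
    let offset = 0;
    for (const samples of chunks) {
        out.set(samples, offset);
        offset += samples.length;
    }
    return out;
}

// Stand-in for a streaming result, used only to exercise the helper.
async function* fakeStream() {
    yield [1, 2, 3];
    yield [4, 5];
}
```

Under the same assumptions, a call against the SDK might look like `await collectSamples(result.bufferStream)` after requesting `stream: true`; check the API reference for the actual stream type first.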

Functions

Use the following sequence of function calls:

  1. loadModel()
  2. textToSpeech()
  3. unloadModel()
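
One way to keep this sequence intact when synthesis fails is to wrap it in try/finally. The sketch below is hypothetical (`withModel` is not an SDK function) and takes the load and unload functions as parameters so it can run without the SDK installed.

```javascript
// Hypothetical wrapper: run `use(modelId)` between load and unload.
// `load` and `unload` are passed in so the sketch runs without the SDK.
async function withModel(load, unload, use) {
    const modelId = await load();
    try {
        return await use(modelId);
    } finally {
        // Runs whether `use` resolved or threw, so the model is always released.
        await unload({ modelId });
    }
}
```

With the SDK, this could be called as `withModel(() => loadModel({ ... }), unloadModel, (modelId) => textToSpeech({ modelId, text, inputType: "text", stream: false }).buffer)`, mirroring the call shapes used in the examples on this page.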

For how to use each function, see SDK — API reference.

Models

Chatterbox

Chatterbox uses a T3 GGUF as the top-level modelSrc and an S3Gen companion GGUF via modelConfig.s3genModelSrc. Optional referenceAudioSrc supplies a WAV for voice cloning.

await loadModel({
  modelSrc: TTS_T3_TURBO_EN_CHATTERBOX_Q8_0,
  modelType: "tts",
  modelConfig: {
    ttsEngine: "chatterbox",
    language: "en",
    s3genModelSrc: TTS_S3GEN_EN_CHATTERBOX,
  },
});

If ttsEngine is omitted, it defaults to Chatterbox, matching the behavior of the former ONNX plugin.
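
The optional reference audio can be handled with a small config builder. The following is a sketch only: `buildChatterboxConfig` is a hypothetical helper, the field names are taken from this page, and treating s3genModelSrc as required is an assumption based on the description above.

```javascript
// Hypothetical helper: build a Chatterbox modelConfig.
// Field names come from this page; the helper itself is not part of the SDK.
function buildChatterboxConfig({ s3genModelSrc, language = "en", referenceAudioSrc } = {}) {
    if (!s3genModelSrc) {
        // Assumption: the S3Gen companion model is always needed.
        throw new Error("s3genModelSrc is required for Chatterbox");
    }
    return {
        ttsEngine: "chatterbox",
        language,
        s3genModelSrc,
        // Only include referenceAudioSrc when a WAV source was given.
        ...(referenceAudioSrc ? { referenceAudioSrc } : {}),
    };
}
```

The returned object would be passed as modelConfig to loadModel(), so voice cloning is enabled only when a reference WAV is supplied.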

Supertonic

Supertonic uses a single GGUF via top-level modelSrc. Set voice, ttsSpeed, and ttsNumInferenceSteps in modelConfig as needed. Multilingual output is selected by the choice of GGUF (e.g. TTS_MULTILINGUAL_SUPERTONIC2_Q8_0) together with language, not by a separate runtime flag.

await loadModel({
  modelSrc: TTS_EN_SUPERTONIC_Q8_0,
  modelType: "tts",
  modelConfig: {
    ttsEngine: "supertonic",
    language: "en",
    voice: "F1",
  },
});

For model constants, see SDK — Models.

Migrating from @qvac/tts-onnx (multi-file ONNX): the legacy modelConfig shape is no longer supported. This covers ttsSpeechEncoderSrc, ttsEmbedTokensSrc, ttsConditionalDecoderSrc, ttsLanguageModelSrc, ttsTokenizerSrc, the Supertonic fields ttsTextEncoderSrc / ttsDurationPredictorSrc / …, and ttsSupertonicMultilingual. Passing any of those fields raises a structured LegacyTtsModelDeprecatedError with a migration message. The plugin import path is @qvac/sdk/tts-ggml/plugin (a temporary @qvac/sdk/onnx-tts/plugin alias exists for older CLI bundles). Legacy ONNX model constants remain exported for one minor cycle, for codemod migrations only.
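
During migration it can help to scan an existing modelConfig for legacy fields before it reaches loadModel(). The sketch below is hypothetical (`findLegacyTtsFields` is not an SDK function) and checks only the field names spelled out above; the list on this page is elided, so the check is knowingly incomplete.

```javascript
// Legacy field names listed on this page. Further Supertonic fields are
// elided there ("…"), so this list is knowingly incomplete.
const LEGACY_TTS_FIELDS = [
    "ttsSpeechEncoderSrc",
    "ttsEmbedTokensSrc",
    "ttsConditionalDecoderSrc",
    "ttsLanguageModelSrc",
    "ttsTokenizerSrc",
    "ttsTextEncoderSrc",
    "ttsDurationPredictorSrc",
    "ttsSupertonicMultilingual",
];

// Hypothetical helper: return the legacy field names present in a config.
function findLegacyTtsFields(modelConfig) {
    const keys = new Set(Object.keys(modelConfig ?? {}));
    return LEGACY_TTS_FIELDS.filter((name) => keys.has(name));
}
```

An empty result means none of the listed fields is present; it does not guarantee that the config is free of every legacy field.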

Example

Chatterbox

The following script shows an example of Chatterbox TTS with voice cloning from a reference audio file. Use it with utils.js / utils.ts:

tts-chatterbox.js
import { loadModel, textToSpeech, unloadModel, TTS_T3_TURBO_EN_CHATTERBOX_Q8_0, TTS_S3GEN_EN_CHATTERBOX, } from "@qvac/sdk";
import { createWav, playAudio, int16ArrayToBuffer, createWavHeader, } from "./utils";
// Chatterbox TTS (GGML): voice cloning with optional reference audio.
// Uses registry model constants — downloads automatically from QVAC Registry.
// Usage: node tts-chatterbox.js [referenceAudioSrc]
const [referenceAudioSrc] = process.argv.slice(2);
const CHATTERBOX_SAMPLE_RATE = 24000;
try {
    const modelId = await loadModel({
        modelSrc: TTS_T3_TURBO_EN_CHATTERBOX_Q8_0,
        modelConfig: {
            ttsEngine: "chatterbox",
            language: "en",
            s3genModelSrc: TTS_S3GEN_EN_CHATTERBOX.src,
            ...(referenceAudioSrc ? { referenceAudioSrc } : {}),
        },
        onProgress: (progress) => {
            console.log(progress);
        },
    });
    console.log(`Model loaded: ${modelId}`);
    console.log("🎵 Testing Text-to-Speech...");
    const result = textToSpeech({
        modelId,
        text: `QVAC SDK is the canonical entry point to QVAC. Written in TypeScript, it provides all QVAC capabilities through a unified interface while also abstracting away the complexity of running your application in a JS environment other than Bare. Supported JS environments include Bare, Node.js, Expo and Bun.`,
        inputType: "text",
        stream: false,
    });
    const audioBuffer = await result.buffer;
    console.log(`TTS complete. Total samples: ${audioBuffer.length}`);
    console.log("💾 Saving audio to file...");
    createWav(audioBuffer, CHATTERBOX_SAMPLE_RATE, "tts-output.wav");
    console.log("✅ Audio saved to tts-output.wav");
    console.log("🔊 Playing audio...");
    const audioData = int16ArrayToBuffer(audioBuffer);
    const wavBuffer = Buffer.concat([
        createWavHeader(audioData.length, CHATTERBOX_SAMPLE_RATE),
        audioData,
    ]);
    playAudio(wavBuffer);
    console.log("✅ Audio playback complete");
    await unloadModel({ modelId });
    console.log("Model unloaded");
    process.exit(0);
}
catch (error) {
    console.error("❌ Error:", error);
    process.exit(1);
}

Supertonic

The following script shows an example of Supertonic TTS for general-purpose speech synthesis. Use it with utils.js / utils.ts:

tts-supertonic.js
import { loadModel, textToSpeech, unloadModel, TTS_EN_SUPERTONIC_Q8_0, } from "@qvac/sdk";
import { createWav, playAudio, int16ArrayToBuffer, createWavHeader, } from "./utils";
// Supertonic TTS (GGML): fast English synthesis with baked-in voices.
// Uses registry model constants — downloads automatically from QVAC Registry.
const SUPERTONIC_SAMPLE_RATE = 44100;
try {
    const modelId = await loadModel({
        modelSrc: TTS_EN_SUPERTONIC_Q8_0,
        modelConfig: {
            ttsEngine: "supertonic",
            language: "en",
            voice: "F1",
            ttsSpeed: 1.05,
            ttsNumInferenceSteps: 5,
        },
        onProgress: (progress) => {
            console.log(progress);
        },
    });
    console.log(`Model loaded: ${modelId}`);
    console.log("🎵 Testing Text-to-Speech...");
    const result = textToSpeech({
        modelId,
        text: `QVAC SDK is the canonical entry point to QVAC. Written in TypeScript, it provides all QVAC capabilities through a unified interface while also abstracting away the complexity of running your application in a JS environment other than Bare. Supported JS environments include Bare, Node.js, Expo and Bun.`,
        inputType: "text",
        stream: false,
    });
    const audioBuffer = await result.buffer;
    console.log(`TTS complete. Total samples: ${audioBuffer.length}`);
    console.log("💾 Saving audio to file...");
    createWav(audioBuffer, SUPERTONIC_SAMPLE_RATE, "supertonic-output.wav");
    console.log("✅ Audio saved to supertonic-output.wav");
    console.log("🔊 Playing audio...");
    const audioData = int16ArrayToBuffer(audioBuffer);
    const wavBuffer = Buffer.concat([
        createWavHeader(audioData.length, SUPERTONIC_SAMPLE_RATE),
        audioData,
    ]);
    playAudio(wavBuffer);
    console.log("✅ Audio playback complete");
    await unloadModel({ modelId });
    console.log("Model unloaded");
    process.exit(0);
}
catch (error) {
    console.error("❌ Error:", error);
    process.exit(1);
}

Utils

The following helper script is used by both examples above to convert the raw PCM samples returned by textToSpeech() into a WAV file and play it back:

utils.js
import { writeFileSync, unlinkSync } from "fs";
import { spawn, spawnSync } from "child_process";
import { platform, tmpdir } from "os";
import { join } from "path";
/**
 * Create WAV header for 16-bit PCM audio
 */
export function createWavHeader(dataLength, sampleRate) {
    const header = Buffer.alloc(44);
    // RIFF header
    header.write("RIFF", 0);
    header.writeUInt32LE(36 + dataLength, 4);
    header.write("WAVE", 8);
    // fmt chunk
    header.write("fmt ", 12);
    header.writeUInt32LE(16, 16); // fmt chunk size
    header.writeUInt16LE(1, 20); // PCM format
    header.writeUInt16LE(1, 22); // mono
    header.writeUInt32LE(sampleRate, 24);
    header.writeUInt32LE(sampleRate * 2, 28); // byte rate
    header.writeUInt16LE(2, 32); // block align
    header.writeUInt16LE(16, 34); // bits per sample
    // data chunk
    header.write("data", 36);
    header.writeUInt32LE(dataLength, 40);
    return header;
}
/**
 * Convert Int16Array to Buffer
 */
export function int16ArrayToBuffer(samples) {
    const buffer = Buffer.alloc(samples.length * 2);
    for (let i = 0; i < samples.length; i++) {
        const value = Math.max(-32768, Math.min(32767, Math.round(samples[i] ?? 0)));
        buffer.writeInt16LE(value, i * 2);
    }
    return buffer;
}
/**
 * Create and save WAV file
 */
export function createWav(audioBuffer, sampleRate, filename) {
    const audioData = int16ArrayToBuffer(audioBuffer);
    const wavHeader = createWavHeader(audioData.length, sampleRate);
    const wavFile = Buffer.concat([wavHeader, audioData]);
    writeFileSync(filename, wavFile);
    console.log(`WAV file saved as: ${filename}`);
}
/**
 * Play one mono s16le PCM chunk (as a minimal WAV) and wait for the player to finish.
 * Chunks are played sequentially when awaited in order, which suits streaming TTS output.
 *
 * Unlike playAudio() (which pipes into ffplay over stdin), this writes a temp file
 * and shells out to a platform player: afplay (macOS), aplay (Linux), or PowerShell (Windows).
 */
export function playPcmInt16Chunk(samples, sampleRate) {
    if (samples.length === 0) {
        return Promise.resolve();
    }
    const audioData = int16ArrayToBuffer(samples);
    const wavHeader = createWavHeader(audioData.length, sampleRate);
    const wavFile = Buffer.concat([wavHeader, audioData]);
    // `os.tmpdir()` resolves to the OS-specific temp directory (e.g. `%TEMP%`
    // on Windows), so no POSIX-only `/tmp/...` path is hardcoded.
    const tempFile = join(tmpdir(), `qvac-tts-chunk-${Date.now()}-${Math.random().toString(16).slice(2)}.wav`);
    writeFileSync(tempFile, wavFile);
    const currentPlatform = platform();
    let audioPlayer;
    let args;
    switch (currentPlatform) {
        case "darwin":
            audioPlayer = "afplay";
            args = [tempFile];
            break;
        case "linux":
            audioPlayer = "aplay";
            args = [tempFile];
            break;
        case "win32":
            audioPlayer = "powershell";
            args = [
                "-Command",
                `(New-Object System.Media.SoundPlayer '${tempFile}').PlaySync()`,
            ];
            break;
        default:
            audioPlayer = "aplay";
            args = [tempFile];
    }
    return new Promise(function (resolve, reject) {
        const proc = spawn(audioPlayer, args, { stdio: "ignore" });
        proc.on("error", function (err) {
            try {
                unlinkSync(tempFile);
            }
            catch {
                // ignore
            }
            reject(err);
        });
        proc.on("close", function (code) {
            try {
                unlinkSync(tempFile);
            }
            catch {
                // ignore
            }
            if (code === 0) {
                resolve();
            }
            else {
                reject(new Error(`Audio player exited with code ${code}`));
            }
        });
    });
}
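/**
 * Play a WAV buffer by piping it into ffplay over stdin, blocking until playback ends.
 *
 * ffplay ships with ffmpeg and is cross-platform (macOS/Linux/Windows), so no temp
 * file or per-platform player is needed. Requires ffplay on PATH.
 */
export function playAudio(audioBuffer) {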
    const result = spawnSync("ffplay", [
        "-hide_banner",
        "-loglevel",
        "error",
        "-autoexit",
        "-nodisp",
        "-i",
        "pipe:0",
    ], {
        input: audioBuffer,
        stdio: ["pipe", "inherit", "inherit"],
    });
    if (result.error) {
        const code = result.error.code;
        if (code === "ENOENT") {
            throw new Error("ffplay not found on PATH. Install ffmpeg (ffplay ships with it) and retry.");
        }
        throw new Error(`ffplay failed: ${result.error.message}`);
    }
    if (result.status !== 0) {
        throw new Error(`ffplay exited with code ${result.status}`);
    }
}
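
To sanity-check a generated file, the header fields can be read back and the duration derived from them. The sketch below assumes the fixed 44-byte PCM layout that createWavHeader() writes; `readWavHeader` is a hypothetical helper and will not handle WAV files with extra chunks before the data chunk.

```javascript
// Hypothetical helper: read back the fields of a 44-byte PCM WAV header.
// Assumes the fixed layout written by createWavHeader() (no extra chunks).
function readWavHeader(wav) {
    const isRiffWave =
        wav.length >= 44 &&
        wav.toString("ascii", 0, 4) === "RIFF" &&
        wav.toString("ascii", 8, 12) === "WAVE";
    if (!isRiffWave) {
        throw new Error("Not a RIFF/WAVE buffer");
    }
    const channels = wav.readUInt16LE(22);
    const sampleRate = wav.readUInt32LE(24);
    const bitsPerSample = wav.readUInt16LE(34);
    const dataLength = wav.readUInt32LE(40);
    // One frame holds one sample per channel.
    const bytesPerFrame = channels * (bitsPerSample / 8);
    const durationSeconds = dataLength / bytesPerFrame / sampleRate;
    return { channels, sampleRate, bitsPerSample, dataLength, durationSeconds };
}
```

For mono 16-bit audio this reduces to sample count divided by sample rate, so 24000 samples at 24000 Hz correspond to one second of audio.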

Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.
