@qvac/translation-nmtcpp

Text-to-text neural machine translation (NMT).

Migration Note (v1.0.0+): Opus/Marian model support has been removed. Only IndicTrans2 and Bergamot backends are supported. If you were using Opus models, migrate to Bergamot for European language pairs.

Overview

Bare module that adds support for translation in QVAC using either qvac-fabric-llm.cpp or Bergamot as the inference engine.

Models

You should load a model compatible with your chosen inference engine:

  • qvac-fabric-llm.cpp (default): IndicTrans2, converted to GGML. Model file format: *.bin.
  • Bergamot: Bergamot model bundle. Required files: model *.bin + vocab*.spm.

Requirement

Bare ≥ v1.24

Installation

npm i @qvac/translation-nmtcpp

Quickstart

If you don't have Bare runtime, install it:

npm i -g bare

Create a new project:

mkdir qvac-translation-quickstart
cd qvac-translation-quickstart
npm init -y

Install dependencies:

npm i @qvac/translation-nmtcpp

Create example.js:

/**
 * Quickstart Example — Bergamot Backend
 *
 * This example demonstrates translation using the Bergamot backend
 * with local model files or auto-download via Firefox CDN (English to Italian).
 *
 * Usage:
 *   bare example.js
 *   BERGAMOT_MODEL_PATH=/path/to/bergamot/model bare example.js
 *
 * Enable verbose C++ logging:
 *   VERBOSE=1 bare example.js
 */

const TranslationNmtcpp = require('@qvac/translation-nmtcpp')
const {
  ensureBergamotModelFiles,
  getBergamotFileNames
} = require('@qvac/translation-nmtcpp/lib/bergamot-model-fetcher')
const path = require('bare-path')
const process = require('bare-process')

// ============================================================
// LOGGING CONFIGURATION
// Set VERBOSE=1 environment variable to enable C++ debug logs
// ============================================================
const VERBOSE = process.env.VERBOSE === '1' || process.env.VERBOSE === 'true'

const logger = VERBOSE
  ? {
      info: (msg) => console.log('[C++ INFO]', msg),
      warn: (msg) => console.warn('[C++ WARN]', msg),
      error: (msg) => console.error('[C++ ERROR]', msg),
      debug: (msg) => console.log('[C++ DEBUG]', msg)
    }
  : null // null = suppress all C++ logs

const text = 'Machine translation has revolutionized how we communicate across language barriers in the modern digital world.'

async function testBergamot () {
  console.log('\n=== Testing Bergamot Backend ===\n')

  const srcLang = 'en'
  const dstLang = 'it'

  // Use local model path if provided, otherwise auto-download
  const bergamotPath = process.env.BERGAMOT_MODEL_PATH || './model/bergamot/enit'

  // Ensure model files are present (downloads from Firefox CDN if not)
  const modelDir = await ensureBergamotModelFiles(srcLang, dstLang, bergamotPath)
  console.log('Model directory:', modelDir)

  const fileNames = getBergamotFileNames(srcLang, dstLang)

  console.log('Loading model...')

  // Create the model with resolved file paths
  const model = new TranslationNmtcpp({
    files: {
      model: path.join(modelDir, fileNames.modelName),
      srcVocab: path.join(modelDir, fileNames.srcVocabName),
      dstVocab: path.join(modelDir, fileNames.dstVocabName)
    },
    params: { mode: 'full', dstLang, srcLang },
    config: {
      modelType: TranslationNmtcpp.ModelTypes.Bergamot
    },
    logger // Pass the logger
  })

  // Load model
  await model.load()
  console.log('Model loaded successfully!')

  try {
    console.log('Running translation...')
    console.log('Input text:', text)

    // Run the Model
    const response = await model.run(text)

    await response
      .onUpdate(data => {
        console.log('Translation output:', data)
      })
      .await()

    console.log('Bergamot translation finished!')
  } finally {
    console.log('Unloading model...')
    await model.unload()
    console.log('Done!')
  }
}

async function main () {
  try {
    await testBergamot()

    console.log('\n=== All Tests Completed Successfully! ===\n')
  } catch (error) {
    console.error('Test failed:', error)
    throw error
  }
}

main()

Run example.js:

bare example.js

Usage

The translation workflow is the same regardless of the backend you choose:

1. Obtain the model files

The model class is files-based: you pass it the resolved on-disk paths of the model weights (and, for Bergamot, the vocab files). The package ships fetcher helpers that download the files on demand and return the directory they were written to:

  • IndicTrans2 — ensureIndicTransModelFile() downloads the GGML model from the QVAC model registry (via @qvac/registry-client).
  • Bergamot — ensureBergamotModelFiles() downloads the model + vocab bundle from Mozilla's Firefox Remote Settings CDN.

Both helpers are idempotent: if a valid file already exists at the destination, they return immediately without re-downloading. You can also point the model class at files you have downloaded yourself — the fetchers are a convenience, not a requirement.
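The idempotent behaviour described above can be sketched as follows. This is a minimal illustration, not the package's implementation: ensureFile, exists, and download are hypothetical names, and the file validation that the real fetchers perform is left out.

```javascript
// Hypothetical sketch of an idempotent "ensure" helper; the real fetchers may differ.
// `exists` and `download` are injected so the sketch does not depend on any fs or network API.
async function ensureFile (destPath, { exists, download }) {
  if (await exists(destPath)) return destPath // already present: skip the download
  await download(destPath)
  return destPath
}

// In-memory stand-ins for the file system and the downloader.
const present = new Set()
let downloads = 0
const deps = {
  exists: async (p) => present.has(p),
  download: async (p) => { downloads++; present.add(p) }
}

async function demo () {
  await ensureFile('./model/demo.bin', deps)
  await ensureFile('./model/demo.bin', deps) // second call does no work
  return downloads
}

demo().then((count) => console.log('downloads performed:', count))
```

Injecting the two callbacks keeps the control flow visible and lets the same logic be exercised without touching the disk.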

// IndicTrans2: fetch from the QVAC model registry
const {
  ensureIndicTransModelFile,
  getIndicTransFileName
} = require('@qvac/translation-nmtcpp/lib/indictrans-model-fetcher')

const path = require('bare-path')

const modelPath = path.join('./model/indictrans', getIndicTransFileName())
await ensureIndicTransModelFile(modelPath) // downloads if not already present

// Bergamot: fetch from the Firefox Remote Settings CDN
const {
  ensureBergamotModelFiles,
  getBergamotFileNames
} = require('@qvac/translation-nmtcpp/lib/bergamot-model-fetcher')

const modelDir = await ensureBergamotModelFiles('en', 'it', './model/bergamot/enit')
const fileNames = getBergamotFileNames('en', 'it') // { modelName, srcVocabName, dstVocabName }

2. Create the args object

The model is constructed from a single options object: files (resolved paths from Step 1), params (languages and mode), config (model type and decoding options), and an optional logger.

The shape of files varies slightly depending on which backend you're using.

IndicTrans2

For Indic language translation (English ↔ Hindi, Bengali, Tamil, etc.), IndicTrans2 needs only the model weights file:

const args = {
  files: {
    model: modelPath // resolved path from Step 1
  },
  params: {
    mode: 'full',
    srcLang: 'eng_Latn',   // Source language (ISO 639-3 language code + ISO 15924 script code)
    dstLang: 'hin_Deva'    // Target language (ISO 639-3 language code + ISO 15924 script code)
  },
  config: {
    modelType: TranslationNmtcpp.ModelTypes.IndicTrans
  }
}

Key Parameters:

| Parameter | Description | Example |
|---|---|---|
| srcLang | Source language (ISO 639-3 language code + ISO 15924 script code) | 'eng_Latn', 'hin_Deva', 'ben_Beng' |
| dstLang | Target language (ISO 639-3 language code + ISO 15924 script code) | 'eng_Latn', 'hin_Deva', 'tam_Taml' |
| files.model | Path to the model weights file | './model/indictrans/ggml-indictrans2-en-indic-dist-200M-q4_0.bin' |
| modelType | Required in config: TranslationNmtcpp.ModelTypes.IndicTrans | - |

IndicTrans2 model naming pattern:

  • ggml-indictrans2-{direction}-{size}.bin for q0f32 quantization
  • ggml-indictrans2-{direction}-{size}-q0f16.bin for q0f16 quantization
  • ggml-indictrans2-{direction}-{size}-q4_0.bin for q4_0 quantization

Where direction is en-indic, indic-en, or indic-indic, and size is dist-200M, dist-320M, or 1B.
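The naming pattern can be applied mechanically. The sketch below is illustrative only: indicTransFileName is a hypothetical helper, not a package export, and it does not check whether a given combination is actually published in the registry.

```javascript
// Hypothetical helper that applies the naming pattern described above.
const DIRECTIONS = ['en-indic', 'indic-en', 'indic-indic']
const SIZES = ['dist-200M', 'dist-320M', '1B']
// Suffix per quantization; the file without a suffix is the q0f32 variant.
const QUANT_SUFFIX = { q0f32: '', q0f16: '-q0f16', q4_0: '-q4_0' }

function indicTransFileName (direction, size, quant = 'q0f32') {
  if (!DIRECTIONS.includes(direction)) throw new Error(`Unknown direction: ${direction}`)
  if (!SIZES.includes(size)) throw new Error(`Unknown size: ${size}`)
  if (!Object.prototype.hasOwnProperty.call(QUANT_SUFFIX, quant)) {
    throw new Error(`Unknown quantization: ${quant}`)
  }
  return `ggml-indictrans2-${direction}-${size}${QUANT_SUFFIX[quant]}.bin`
}

console.log(indicTransFileName('en-indic', 'dist-200M', 'q4_0'))
// ggml-indictrans2-en-indic-dist-200M-q4_0.bin
```

In application code, getIndicTransFileName() from the fetcher module remains the simpler choice.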

Bergamot

Bergamot needs the model weights plus a source and target vocabulary file. Use the paths returned by ensureBergamotModelFiles() / getBergamotFileNames(), or point at files you downloaded yourself:

const path = require('bare-path')

const args = {
  files: {
    model: path.join(modelDir, fileNames.modelName),
    srcVocab: path.join(modelDir, fileNames.srcVocabName),
    dstVocab: path.join(modelDir, fileNames.dstVocabName)
  },
  params: {
    mode: 'full',
    srcLang: 'en',    // Source language (ISO 639-1 code)
    dstLang: 'it'     // Target language (ISO 639-1 code)
  },
  config: {
    modelType: TranslationNmtcpp.ModelTypes.Bergamot
  }
}

Bergamot Model Files by Language Pair:

| Language Pair | Model File | Vocab File(s) |
|---|---|---|
| en→it | model.enit.intgemm.alphas.bin | vocab.enit.spm |
| it→en | model.iten.intgemm.alphas.bin | vocab.iten.spm |
| en→es | model.enes.intgemm.alphas.bin | vocab.enes.spm |
| es→en | model.esen.intgemm.alphas.bin | vocab.esen.spm |
| en→fr | model.enfr.intgemm.alphas.bin | vocab.enfr.spm |
| fr→en | model.fren.intgemm.alphas.bin | (see Firefox Translations models) |
| en→de | model.ende.intgemm.alphas.bin | vocab.ende.spm |
| en→ru | model.enru.intgemm.alphas.bin | vocab.enru.spm |
| ru→en | model.ruen.intgemm.alphas.bin | vocab.ruen.spm |
| en→zh | model.enzh.intgemm.alphas.bin | srcvocab.enzh.spm, trgvocab.enzh.spm |
| zh→en | model.zhen.intgemm.alphas.bin | vocab.zhen.spm |
| en→ja | model.enja.intgemm.alphas.bin | srcvocab.enja.spm, trgvocab.enja.spm |
| ja→en | model.jaen.intgemm.alphas.bin | vocab.jaen.spm |

getBergamotFileNames(srcLang, dstLang) returns the correct modelName, srcVocabName, and dstVocabName for each pair (including the separate source/target vocabs used by CJK languages), so you normally don't need to hard-code these.

Key Parameters:

| Parameter | Description | Example |
|---|---|---|
| srcLang | Source language (ISO 639-1) | 'en', 'es', 'de' |
| dstLang | Target language (ISO 639-1) | 'it', 'fr', 'de' |
| files.model | Path to the model weights file | './model/bergamot/enit/model.enit.intgemm.alphas.bin' |
| files.srcVocab | Path to the source vocabulary file | './model/bergamot/enit/vocab.enit.spm' |
| files.dstVocab | Path to the target vocabulary file | './model/bergamot/enit/vocab.enit.spm' |
| modelType | Required in config: TranslationNmtcpp.ModelTypes.Bergamot | - |

Bergamot model file naming convention:

  • model.{srctgt}.intgemm.alphas.bin - Model weights (e.g., model.enit.intgemm.alphas.bin)
  • vocab.{srctgt}.spm - Shared vocabulary for most language pairs
  • srcvocab.{srctgt}.spm + trgvocab.{srctgt}.spm - Separate source and target vocabs, used by some CJK pairs (en→zh and en→ja in the table above)
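For illustration, the convention can be written out as a small function. This is a hypothetical re-implementation, not the package's getBergamotFileNames(); it assumes, based only on the table above, that en→zh and en→ja are the pairs with separate vocabularies.

```javascript
// Hypothetical re-implementation of the naming convention, for illustration only.
// Assumption (taken from the table above): only en→zh and en→ja use separate vocab files.
const SPLIT_VOCAB_PAIRS = new Set(['enzh', 'enja'])

function bergamotFileNames (srcLang, dstLang) {
  const pair = `${srcLang}${dstLang}`
  const split = SPLIT_VOCAB_PAIRS.has(pair)
  return {
    modelName: `model.${pair}.intgemm.alphas.bin`,
    srcVocabName: split ? `srcvocab.${pair}.spm` : `vocab.${pair}.spm`,
    dstVocabName: split ? `trgvocab.${pair}.spm` : `vocab.${pair}.spm`
  }
}

console.log(bergamotFileNames('en', 'it'))
console.log(bergamotFileNames('en', 'zh'))
```

Because the set of split-vocab pairs may change as new models are published, real code should rely on getBergamotFileNames() rather than a hard-coded list like this one.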

Model directory layout

Use a unique directory per model to avoid file conflicts when using multiple models:

  • ./model/indictrans for IndicTrans English→Hindi
  • ./model/bergamot/enit for Bergamot English→Italian

The list of supported languages for the srcLang and dstLang parameters differs by model type.

3. Create the config object

The config object contains two types of parameters:

  1. Model-specific parameters (required for some backends)
  2. Generation/decoding parameters (optional, controls output quality)

Model-Specific Parameters

| Parameter | IndicTrans2 | Bergamot |
|---|---|---|
| config.modelType | Required | Required |
| files.srcVocab | Not needed | Required (passed via files, see Step 2) |
| files.dstVocab | Not needed | Required (passed via files, see Step 2) |
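The requirements in this table can be checked before constructing the model. The following is a minimal sketch; missingFileKeys and REQUIRED_FILES are hypothetical names and not part of the package.

```javascript
// Hypothetical helper, not part of @qvac/translation-nmtcpp.
// Maps each backend to the `files` keys that the table above marks as required.
const REQUIRED_FILES = {
  IndicTrans: ['model'],
  Bergamot: ['model', 'srcVocab', 'dstVocab']
}

// Returns the required keys that are missing or empty in `files`.
function missingFileKeys (modelType, files = {}) {
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_FILES, modelType)) {
    throw new Error(`Unknown model type: ${modelType}`)
  }
  return REQUIRED_FILES[modelType].filter(
    (key) => typeof files[key] !== 'string' || files[key] === ''
  )
}

// Logs the two vocab keys, because only `model` was supplied.
console.log(missingFileKeys('Bergamot', {
  model: './model/bergamot/enit/model.enit.intgemm.alphas.bin'
}))
```

The check only looks at the shape of the object; it does not verify that the files exist on disk.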

Generation/Decoding Parameters (IndicTrans Only)

These parameters control how the model generates output. Full parameter support is available only for IndicTrans2 models; Bergamot supports only beamsize (see Step 4).

// Generation parameters for IndicTrans2
const generationParams = {
  beamsize: 4,            // Beam search width (>=1). 1 disables beam search
  lengthpenalty: 0.6,     // Length normalization strength (>=0)
  maxlength: 128,         // Maximum generated tokens (>0)
  repetitionpenalty: 1.2, // Penalize previously generated tokens (0..2)
  norepeatngramsize: 2,   // Disallow repeating n-grams of this size (0..10)
  temperature: 0.8,       // Sampling temperature [0..2]
  topk: 40,               // Keep top-K logits [0..vocab_size]
  topp: 0.9               // Nucleus sampling threshold (0 < p <= 1)
}
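The documented ranges can be checked up front. The validator below is a sketch under stated assumptions: invalidGenerationParams is a hypothetical helper, range endpoints written as 0..2 or 0..10 are treated as inclusive, and count-like parameters (beamsize, maxlength, norepeatngramsize, topk) are assumed to be integers.

```javascript
// Hypothetical validator; the ranges mirror the comments in the snippet above.
const RULES = {
  beamsize: (v) => Number.isInteger(v) && v >= 1,
  lengthpenalty: (v) => v >= 0,
  maxlength: (v) => Number.isInteger(v) && v > 0,
  repetitionpenalty: (v) => v >= 0 && v <= 2,
  norepeatngramsize: (v) => Number.isInteger(v) && v >= 0 && v <= 10,
  temperature: (v) => v >= 0 && v <= 2,
  topk: (v) => Number.isInteger(v) && v >= 0, // upper bound (vocab size) depends on the model
  topp: (v) => v > 0 && v <= 1
}

// Returns the names of parameters that are unknown, non-numeric, or out of range.
function invalidGenerationParams (params) {
  return Object.keys(params).filter((name) => {
    if (!Object.prototype.hasOwnProperty.call(RULES, name)) return true
    const value = params[name]
    return typeof value !== 'number' || Number.isNaN(value) || !RULES[name](value)
  })
}

console.log(invalidGenerationParams({ beamsize: 4, topp: 0.9 })) // no invalid names
console.log(invalidGenerationParams({ beamsize: 0, topp: 1.5 })) // both names are reported
```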

4. Create Model Instance

Import TranslationNmtcpp and create an instance from the single options object built in Step 2. Decoding options from Step 3 are merged into args.config:

const TranslationNmtcpp = require('@qvac/translation-nmtcpp')

IndicTrans2

// IndicTrans - must specify modelType + generation parameters
const model = new TranslationNmtcpp({
  ...args, // files + params from Step 2
  config: {
    modelType: TranslationNmtcpp.ModelTypes.IndicTrans,
    ...generationParams,  // Spread generation params from Step 3
    maxlength: 256        // Override for longer outputs
  }
})

Bergamot

// Bergamot - vocab files are passed via `files` (see Step 2); limited generation params support
const model = new TranslationNmtcpp({
  ...args, // files (model + srcVocab + dstVocab) + params from Step 2
  config: {
    modelType: TranslationNmtcpp.ModelTypes.Bergamot,
    beamsize: 4 // Only beamsize supported for Bergamot
  }
})
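The merge performed in both snippets relies on object spread order: later keys override earlier ones. The sketch below wraps that in a hypothetical helper, withDecoding, which also drops everything except beamsize for Bergamot, following the note above. The model type strings are written out literally so the block runs without the package.

```javascript
// Hypothetical helper; the strings match the TranslationNmtcpp.ModelTypes values.
function withDecoding (args, decoding = {}) {
  const modelType = args.config.modelType
  // Per this README, Bergamot supports only `beamsize`; drop the rest for that backend.
  const allowed = modelType === 'Bergamot'
    ? Object.fromEntries(Object.entries(decoding).filter(([key]) => key === 'beamsize'))
    : decoding
  // Later spreads win, so `allowed` overrides values already present in args.config.
  return { ...args, config: { ...args.config, ...allowed } }
}

const base = {
  files: { model: './model/indictrans/model.bin' }, // placeholder path
  params: { mode: 'full', srcLang: 'eng_Latn', dstLang: 'hin_Deva' },
  config: { modelType: 'IndicTrans', maxlength: 128 }
}
const merged = withDecoding(base, { beamsize: 4, maxlength: 256 })
console.log(merged.config.maxlength, base.config.maxlength) // 256 128
```

Returning a new object instead of mutating args keeps the Step 2 object reusable for several model instances.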

Available Model Types:

TranslationNmtcpp.ModelTypes = {
  IndicTrans: 'IndicTrans', // Indic language models
  Bergamot: 'Bergamot'      // Firefox Translations models
}

5. Load Model

try {
  // Basic usage
  await model.load()
} catch (error) {
  console.error('Failed to load model:', error)
}

6. Run the Model

We can perform inference on the input text using the run() method. This method returns a QVACResponse object.

try {
  // Execute translation on input text
  const response = await model.run('Hello world! Welcome to the internet of peers!')

  // Process streamed output using callback
  await response
    .onUpdate(outputChunk => {
      // Handle each new piece of translated text
      console.log(outputChunk)
    })
    .await() // Wait for translation to complete

  // Access performance statistics (if enabled with opts.stats)
  if (response.stats) {
    console.log('Translation completed in:', response.stats.totalTime, 'ms')
  }
} catch (error) {
  console.error('Translation failed:', error)
}
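When the whole translation is needed as one string, the streamed chunks can be accumulated. This is a minimal sketch: collectTranslation is a hypothetical helper, it assumes each onUpdate payload is a string fragment, and fakeResponse stands in for a real QVACResponse so the block runs without a model.

```javascript
// Hypothetical helper: gathers streamed chunks into a single string.
// Assumes each onUpdate payload is a string fragment of the translation.
async function collectTranslation (response) {
  const chunks = []
  await response
    .onUpdate((chunk) => { chunks.push(chunk) })
    .await()
  return chunks.join('')
}

// Stand-in for a QVACResponse so the sketch runs on its own.
function fakeResponse (parts) {
  let handler = () => {}
  return {
    onUpdate (fn) { handler = fn; return this },
    await () { parts.forEach((part) => handler(part)); return Promise.resolve() }
  }
}

collectTranslation(fakeResponse(['Ciao ', 'mondo!'])).then((text) => console.log(text))
```

With a loaded model, the call would be collectTranslation(await model.run(text)).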

7. Batch Translation (Bergamot Only)

For translating multiple texts efficiently, use the runBatch() method instead of calling run() multiple times.

runBatch() is only available with the Bergamot backend. IndicTrans2 models should use sequential run() calls.

// Array of texts to translate (English)
const textsToTranslate = [
  'Hello world!',
  'How are you today?',
  'Machine translation has revolutionized communication.'
]

try {
  // Batch translation - returns array of translated strings
  const translations = await model.runBatch(textsToTranslate)

  // Output each translation
  translations.forEach((translatedText, index) => {
    console.log(`Original: ${textsToTranslate[index]}`)
    console.log(`Translated: ${translatedText}\n`)
  })
} catch (error) {
  console.error('Batch translation failed:', error)
}

runBatch() vs run():

| Method | Input | Output | Backend Support |
|---|---|---|---|
| run(text) | Single string | QVACResponse with streaming | All (IndicTrans, Bergamot) |
| runBatch(texts) | Array of strings | Array of strings | Bergamot only |

Because runBatch() processes all inputs in a single batch operation, it is typically faster than calling run() once per text.
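Code that must work with either backend can hide the difference behind one function. The wrapper below is a sketch: translateAll is a hypothetical name, the model type is passed in explicitly instead of being read from the model object, and the sequential path assumes string chunks from onUpdate. A stub model is included so the block runs on its own.

```javascript
// Hypothetical wrapper: one entry point for both backends.
async function translateAll (model, modelType, texts) {
  if (modelType === 'Bergamot') return model.runBatch(texts)

  // IndicTrans2: sequential run() calls, collecting each streamed result.
  const results = []
  for (const text of texts) {
    const chunks = []
    const response = await model.run(text)
    await response.onUpdate((chunk) => { chunks.push(chunk) }).await()
    results.push(chunks.join(''))
  }
  return results
}

// Stub model for the demo; it upper-cases the input instead of translating it.
const stubModel = {
  async run (text) {
    let handler = () => {}
    return {
      onUpdate (fn) { handler = fn; return this },
      await () { handler(text.toUpperCase()); return Promise.resolve() }
    }
  },
  async runBatch (texts) { return texts.map((t) => t.toUpperCase()) }
}

translateAll(stubModel, 'IndicTrans', ['hello', 'world']).then((out) => console.log(out))
```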

8. Unload the Model

// Always unload the model when finished to free memory
try {
  await model.unload()
} catch (error) {
  console.error('Failed to unload model:', error)
}
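To make sure unload() always runs, the load, use, and unload steps can be wrapped together. This is a sketch with a hypothetical helper name, withModel; it only assumes the load() and unload() methods shown in this README. A stub model is used so the block runs without the package.

```javascript
// Hypothetical helper: load a model, run a task, and always unload afterwards.
async function withModel (model, task) {
  await model.load()
  try {
    return await task(model)
  } finally {
    await model.unload() // runs whether `task` resolved or threw
  }
}

// Stub model that records the calls it receives.
const calls = []
const stub = {
  load: async () => { calls.push('load') },
  unload: async () => { calls.push('unload') }
}

withModel(stub, async () => { calls.push('task'); return 'done' })
  .then((result) => console.log(result, calls.join(' > ')))
```

If load() itself throws, unload() is not called, since there is nothing to release in that case.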

Supported Languages

IndicTrans2 Models (QVAC registry)

IndicTrans2 supports translation between English and 22 Indic languages. The following directions are available via the QVAC model registry:

| Direction | Available | Sizes |
|---|---|---|
| English → Indic | Yes | 200M, 1B |
| Indic → English | Yes | 200M, 1B |
| Indic → Indic | Yes | 320M, 1B |

Supported Indic Languages:

| Language (code) | Language (code) | Language (code) |
|---|---|---|
| Assamese (asm_Beng) | Kashmiri (Arabic) (kas_Arab) | Punjabi (pan_Guru) |
| Bengali (ben_Beng) | Kashmiri (Devanagari) (kas_Deva) | Sanskrit (san_Deva) |
| Bodo (brx_Deva) | Maithili (mai_Deva) | Santali (sat_Olck) |
| Dogri (doi_Deva) | Malayalam (mal_Mlym) | Sindhi (Arabic) (snd_Arab) |
| English (eng_Latn) | Marathi (mar_Deva) | Sindhi (Devanagari) (snd_Deva) |
| Konkani (gom_Deva) | Manipuri (Bengali) (mni_Beng) | Tamil (tam_Taml) |
| Gujarati (guj_Gujr) | Manipuri (Meitei) (mni_Mtei) | Telugu (tel_Telu) |
| Hindi (hin_Deva) | Nepali (npi_Deva) | Urdu (urd_Arab) |
| Kannada (kan_Knda) | Odia (ory_Orya) | |
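The language codes determine which model direction (en-indic, indic-en, or indic-indic) is needed. The helper below is a hypothetical sketch: it checks only the shape of each code, not whether the language is in the table above, and it rejects identical source and target codes as an assumption of this example.

```javascript
// Hypothetical helper: picks the model direction for a language pair.
// The pattern checks the code shape only, e.g. eng_Latn or hin_Deva.
const CODE_PATTERN = /^[a-z]{3}_[A-Z][a-z]{3}$/

function indicTransDirection (srcLang, dstLang) {
  for (const code of [srcLang, dstLang]) {
    if (!CODE_PATTERN.test(code)) throw new Error(`Malformed language code: ${code}`)
  }
  // Assumption of this sketch: translating a language into itself is an error.
  if (srcLang === dstLang) throw new Error('Source and target languages must differ')
  if (srcLang === 'eng_Latn') return 'en-indic'
  if (dstLang === 'eng_Latn') return 'indic-en'
  return 'indic-indic'
}

console.log(indicTransDirection('eng_Latn', 'hin_Deva')) // en-indic
console.log(indicTransDirection('tam_Taml', 'eng_Latn')) // indic-en
console.log(indicTransDirection('ben_Beng', 'mar_Deva')) // indic-indic
```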

Bergamot Models (Firefox Translations)

Language pairs available via the Firefox Remote Settings CDN:

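| Language | Code | en→X | X→en |
|---|---|---|---|
| Arabic | ar | Yes | Yes |
| Czech | cs | Yes | Yes |
| Spanish | es | Yes | Yes |
| French | fr | Yes | Yes |
| Italian | it | Yes | Yes |
| Japanese | ja | Yes | Yes |
| Portuguese | pt | Yes | Yes |
| Russian | ru | Yes | Yes |
| Chinese | zh | Yes | Yes |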

The Bergamot backend supports all language pairs available in Firefox Translations. See the Firefox Translations models repository for the complete and up-to-date list of supported language pairs.

ModelClasses and Packages

ModelClass

The main class exported by this library is TranslationNmtcpp, which supports multiple translation backends:

const TranslationNmtcpp = require('@qvac/translation-nmtcpp')

// Available model types
TranslationNmtcpp.ModelTypes = {
  IndicTrans: 'IndicTrans',  // For Indic language translations
  Bergamot: 'Bergamot'       // For Bergamot/Firefox translations
}

Available Packages

Main Package

| Package | Description | Backends | Languages |
|---|---|---|---|
| @qvac/translation-nmtcpp | Main translation package | Bergamot, IndicTrans | See Supported Languages |

The main package supports both backends and all their respective languages. See Supported Languages for the complete list.

Logging

The library supports configurable logging for both JavaScript and C++ (native) components. By default, C++ logs are suppressed for cleaner output.

Enabling C++ Logs

To enable verbose C++ logging, pass a logger object in the options:

// Enable C++ logging
const logger = {
  info: (msg) => console.log('[C++ INFO]', msg),
  warn: (msg) => console.warn('[C++ WARN]', msg),
  error: (msg) => console.error('[C++ ERROR]', msg),
  debug: (msg) => console.log('[C++ DEBUG]', msg)
}

const args = {
  files: { model: modelPath, srcVocab, dstVocab }, // resolved paths (see Step 1)
  params: { mode: 'full', srcLang: 'en', dstLang: 'it' },
  config: { modelType: TranslationNmtcpp.ModelTypes.Bergamot },
  logger  // Pass logger to enable C++ logs
}

Disabling C++ Logs

To suppress all C++ logs, either omit the logger parameter or set it to null:

const args = {
  files: { model: modelPath, srcVocab, dstVocab }, // resolved paths (see Step 1)
  params: { mode: 'full', srcLang: 'en', dstLang: 'it' },
  config: { modelType: TranslationNmtcpp.ModelTypes.Bergamot }
  // No logger = suppress C++ logs
}

All examples support the VERBOSE environment variable:

# Run with C++ logging disabled (default)
bare examples/quickstart.js

# Run with C++ logging enabled
VERBOSE=1 bare examples/quickstart.js

Log Levels

The C++ backend supports these log levels (mapped from native priority):

| Priority | Level | Description |
|---|---|---|
| 0 | error | Critical errors |
| 1 | warn | Warnings |
| 2 | info | Informational messages |
| 3 | debug | Debug/trace messages |
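The priority table can be used to build a logger that forwards only messages at or below a chosen level. This is a sketch: createLogger and its sink parameter are hypothetical, and the filtering happens purely on the JavaScript side; whether the native code filters as well is not covered here.

```javascript
// Priorities as listed in the table above.
const PRIORITY = { error: 0, warn: 1, info: 2, debug: 3 }

// Hypothetical factory: returns a logger that drops messages above `maxLevel`.
// `sink` receives (level, msg) and defaults to console output.
function createLogger (maxLevel = 'info', sink = (level, msg) => console.log(`[C++ ${level.toUpperCase()}]`, msg)) {
  if (!Object.keys(PRIORITY).includes(maxLevel)) throw new Error(`Unknown log level: ${maxLevel}`)
  const threshold = PRIORITY[maxLevel]
  const logger = {}
  for (const level of Object.keys(PRIORITY)) {
    logger[level] = (msg) => {
      if (PRIORITY[level] <= threshold) sink(level, msg)
    }
  }
  return logger
}

const logger = createLogger('warn')
logger.error('model file not found') // forwarded
logger.debug('tokenizer state dump') // dropped
```

The returned object has the same four methods (info, warn, error, debug) as the logger shown earlier, so it can be passed in the logger field of the options object.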

More resources

Package at npm
