@qvac/translation-nmtcpp
Text-to-text neural machine translation (NMT).
Migration Note (v1.0.0+): Opus/Marian model support has been removed. Only IndicTrans2 and Bergamot backends are supported. If you were using Opus models, migrate to Bergamot for European language pairs.
Overview
Bare module that adds support for translation in QVAC using either qvac-fabric-llm.cpp or Bergamot as the inference engine.
Models
You should load a model compatible with your chosen inference engine:
- qvac-fabric-llm.cpp (default): IndicTrans2, converted to GGML. Model file format: *.bin.
- Bergamot: Bergamot model bundle. Required files: model *.bin + vocab *.spm.
Requirement
Bare v1.24
Installation
npm i @qvac/translation-nmtcpp
Quickstart
If you don't have Bare runtime, install it:
npm i -g bare
Create a new project:
mkdir qvac-translation-quickstart
cd qvac-translation-quickstart
npm init -y
Install dependencies:
npm i @qvac/translation-nmtcpp
Create example.js:
/**
 * Quickstart Example: Bergamot Backend
 *
 * This example demonstrates translation using the Bergamot backend
 * with local model files or auto-download via Firefox CDN (English to Italian).
 *
 * Usage:
 *   bare example.js
 *   BERGAMOT_MODEL_PATH=/path/to/bergamot/model bare example.js
 *
 * Enable verbose C++ logging:
 *   VERBOSE=1 bare example.js
 */
const TranslationNmtcpp = require('@qvac/translation-nmtcpp')
const {
  ensureBergamotModelFiles,
  getBergamotFileNames
} = require('@qvac/translation-nmtcpp/lib/bergamot-model-fetcher')
const path = require('bare-path')
const process = require('bare-process')

// ============================================================
// LOGGING CONFIGURATION
// Set VERBOSE=1 environment variable to enable C++ debug logs
// ============================================================
const VERBOSE = process.env.VERBOSE === '1' || process.env.VERBOSE === 'true'
const logger = VERBOSE
  ? {
      info: (msg) => console.log('[C++ INFO]', msg),
      warn: (msg) => console.warn('[C++ WARN]', msg),
      error: (msg) => console.error('[C++ ERROR]', msg),
      debug: (msg) => console.log('[C++ DEBUG]', msg)
    }
  : null // null = suppress all C++ logs

const text = 'Machine translation has revolutionized how we communicate across language barriers in the modern digital world.'

async function testBergamot () {
  console.log('\n=== Testing Bergamot Backend ===\n')
  const srcLang = 'en'
  const dstLang = 'it'

  // Use local model path if provided, otherwise auto-download
  const bergamotPath = process.env.BERGAMOT_MODEL_PATH || './model/bergamot/enit'

  // Ensure model files are present (downloads from Firefox CDN if not)
  const modelDir = await ensureBergamotModelFiles(srcLang, dstLang, bergamotPath)
  console.log('Model directory:', modelDir)
  const fileNames = getBergamotFileNames(srcLang, dstLang)

  console.log('Loading model...')
  // Create the model with resolved file paths
  const model = new TranslationNmtcpp({
    files: {
      model: path.join(modelDir, fileNames.modelName),
      srcVocab: path.join(modelDir, fileNames.srcVocabName),
      dstVocab: path.join(modelDir, fileNames.dstVocabName)
    },
    params: { mode: 'full', dstLang, srcLang },
    config: {
      modelType: TranslationNmtcpp.ModelTypes.Bergamot
    },
    logger // Pass the logger
  })

  // Load model
  await model.load()
  console.log('Model loaded successfully!')

  try {
    console.log('Running translation...')
    console.log('Input text:', text)

    // Run the model
    const response = await model.run(text)
    await response
      .onUpdate(data => {
        console.log('Translation output:', data)
      })
      .await()
    console.log('Bergamot translation finished!')
  } finally {
    console.log('Unloading model...')
    await model.unload()
    console.log('Done!')
  }
}

async function main () {
  try {
    await testBergamot()
    console.log('\n=== All Tests Completed Successfully! ===\n')
  } catch (error) {
    console.error('Test failed:', error)
    throw error
  }
}
main()
Run example.js:
bare example.js
Usage
The translation workflow is the same regardless of the chosen backend:
1. Obtain the model files
The model class is files-based: you pass it the resolved on-disk paths of the model weights (and, for Bergamot, the vocab files). The package ships fetcher helpers that download the files on demand and return the directory they were written to:
- IndicTrans2: ensureIndicTransModelFile() downloads the GGML model from the QVAC model registry (via @qvac/registry-client).
- Bergamot: ensureBergamotModelFiles() downloads the model + vocab bundle from Mozilla's Firefox Remote Settings CDN.
Both helpers are idempotent: if a valid file already exists at the destination, they return immediately without re-downloading. You can also point the model class at files you have downloaded yourself — the fetchers are a convenience, not a requirement.
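To make the idempotency concrete, here is a minimal sketch of the check-then-download pattern. It is not the package's implementation: the name ensureFile and the injected exists/download callbacks are hypothetical, chosen so the snippet runs without file system or network access, and it checks only for presence rather than validating the file.

```javascript
// Hypothetical sketch of an idempotent "ensure" helper.
// `exists` and `download` are injected stand-ins, not part of @qvac/translation-nmtcpp.
async function ensureFile (filePath, { exists, download }) {
  if (await exists(filePath)) {
    return filePath // already present: skip the download
  }
  await download(filePath)
  return filePath
}

// Demo with an in-memory "disk" standing in for the real file system
async function demo () {
  const disk = new Set()
  let downloads = 0
  const io = {
    exists: async (p) => disk.has(p),
    download: async (p) => { downloads++; disk.add(p) }
  }
  await ensureFile('./model/demo.bin', io)
  await ensureFile('./model/demo.bin', io) // second call does not download again
  return downloads
}
```

Calling demo() resolves to the number of downloads performed, which stays at one however many times the same path is ensured.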
// IndicTrans2: fetch from the QVAC model registry
const {
  ensureIndicTransModelFile,
  getIndicTransFileName
} = require('@qvac/translation-nmtcpp/lib/indictrans-model-fetcher')
const path = require('bare-path')

const modelPath = path.join('./model/indictrans', getIndicTransFileName())
await ensureIndicTransModelFile(modelPath) // downloads if not already present

// Bergamot: fetch from the Firefox Remote Settings CDN
const {
  ensureBergamotModelFiles,
  getBergamotFileNames
} = require('@qvac/translation-nmtcpp/lib/bergamot-model-fetcher')

const modelDir = await ensureBergamotModelFiles('en', 'it', './model/bergamot/enit')
const fileNames = getBergamotFileNames('en', 'it') // { modelName, srcVocabName, dstVocabName }
2. Create the args object
The model is constructed from a single options object: files (resolved paths from Step 1), params (languages and mode), config (model type and decoding options), and an optional logger.
The shape of files varies slightly depending on which backend you're using.
IndicTrans2
For Indic language translations (English ↔ Hindi, Bengali, Tamil, etc.) IndicTrans2 needs only the model weights file:
const args = {
  files: {
    model: modelPath // resolved path from Step 1
  },
  params: {
    mode: 'full',
    srcLang: 'eng_Latn', // Source language (ISO 639-3 language code + ISO 15924 script code)
    dstLang: 'hin_Deva' // Target language (ISO 639-3 language code + ISO 15924 script code)
  },
  config: {
    modelType: TranslationNmtcpp.ModelTypes.IndicTrans
  }
}
Key Parameters:
| Parameter | Description | Example |
|---|---|---|
| srcLang | Source language (ISO 639-3 language code + ISO 15924 script code) | 'eng_Latn', 'hin_Deva', 'ben_Beng' |
| dstLang | Target language (ISO 639-3 language code + ISO 15924 script code) | 'eng_Latn', 'hin_Deva', 'tam_Taml' |
| files.model | Path to the model weights file | './model/indictrans/ggml-indictrans2-en-indic-dist-200M-q4_0.bin' |
| modelType | Required in config: TranslationNmtcpp.ModelTypes.IndicTrans | - |
IndicTrans2 model naming pattern:
- ggml-indictrans2-{direction}-{size}.bin for q0f32 quantization
- ggml-indictrans2-{direction}-{size}-q0f16.bin for q0f16 quantization
- ggml-indictrans2-{direction}-{size}-q4_0.bin for q4_0 quantization
Where direction is en-indic, indic-en, or indic-indic, and size is dist-200M, dist-320M, or 1B.
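The naming pattern can be expressed as a small function. This is a sketch only: buildIndicTransFileName is a hypothetical name, not an export of the package (which ships its own getIndicTransFileName()), and the allowed values are copied from the pattern described above.

```javascript
// Hypothetical helper: builds a file name from the documented naming pattern.
const QUANT_SUFFIX = new Map([
  ['q0f32', ''], // default quantization has no suffix
  ['q0f16', '-q0f16'],
  ['q4_0', '-q4_0']
])
const DIRECTIONS = ['en-indic', 'indic-en', 'indic-indic']
const SIZES = ['dist-200M', 'dist-320M', '1B']

function buildIndicTransFileName (direction, size, quant = 'q0f32') {
  if (!DIRECTIONS.includes(direction)) throw new Error(`Unknown direction: ${direction}`)
  if (!SIZES.includes(size)) throw new Error(`Unknown size: ${size}`)
  if (!QUANT_SUFFIX.has(quant)) throw new Error(`Unknown quantization: ${quant}`)
  return `ggml-indictrans2-${direction}-${size}${QUANT_SUFFIX.get(quant)}.bin`
}

// Reproduces the example file name from the Key Parameters table
const exampleName = buildIndicTransFileName('en-indic', 'dist-200M', 'q4_0')
```

Not every direction and size combination is necessarily published; the sketch only checks that a name is well-formed.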
Bergamot
Bergamot needs the model weights plus a source and target vocabulary file. Use the paths returned by ensureBergamotModelFiles() / getBergamotFileNames(), or point at files you downloaded yourself:
const path = require('bare-path')

const args = {
  files: {
    model: path.join(modelDir, fileNames.modelName),
    srcVocab: path.join(modelDir, fileNames.srcVocabName),
    dstVocab: path.join(modelDir, fileNames.dstVocabName)
  },
  params: {
    mode: 'full',
    srcLang: 'en', // Source language (ISO 639-1 code)
    dstLang: 'it' // Target language (ISO 639-1 code)
  },
  config: {
    modelType: TranslationNmtcpp.ModelTypes.Bergamot
  }
}
Bergamot Model Files by Language Pair:
| Language Pair | Model File | Vocab File(s) |
|---|---|---|
| en→it | model.enit.intgemm.alphas.bin | vocab.enit.spm |
| it→en | model.iten.intgemm.alphas.bin | vocab.iten.spm |
| en→es | model.enes.intgemm.alphas.bin | vocab.enes.spm |
| es→en | model.esen.intgemm.alphas.bin | vocab.esen.spm |
| en→fr | model.enfr.intgemm.alphas.bin | vocab.enfr.spm |
| fr→en | model.fren.intgemm.alphas.bin | (see Firefox Translations models) |
| en→de | model.ende.intgemm.alphas.bin | vocab.ende.spm |
| en→ru | model.enru.intgemm.alphas.bin | vocab.enru.spm |
| ru→en | model.ruen.intgemm.alphas.bin | vocab.ruen.spm |
| en→zh | model.enzh.intgemm.alphas.bin | srcvocab.enzh.spm, trgvocab.enzh.spm |
| zh→en | model.zhen.intgemm.alphas.bin | vocab.zhen.spm |
| en→ja | model.enja.intgemm.alphas.bin | srcvocab.enja.spm, trgvocab.enja.spm |
| ja→en | model.jaen.intgemm.alphas.bin | vocab.jaen.spm |
getBergamotFileNames(srcLang, dstLang) returns the correct modelName, srcVocabName, and dstVocabName for each pair (including the separate source/target vocabs used by CJK languages), so you normally don't need to hard-code these.
Key Parameters:
| Parameter | Description | Example |
|---|---|---|
| srcLang | Source language (ISO 639-1) | 'en', 'es', 'de' |
| dstLang | Target language (ISO 639-1) | 'it', 'fr', 'de' |
| files.model | Path to the model weights file | './model/bergamot/enit/model.enit.intgemm.alphas.bin' |
| files.srcVocab | Path to the source vocabulary file | './model/bergamot/enit/vocab.enit.spm' |
| files.dstVocab | Path to the target vocabulary file | './model/bergamot/enit/vocab.enit.spm' |
| modelType | Required in config: TranslationNmtcpp.ModelTypes.Bergamot | - |
Bergamot model file naming convention:
- model.{srctgt}.intgemm.alphas.bin - Model weights (e.g., model.enit.intgemm.alphas.bin)
- vocab.{srctgt}.spm - Shared vocabulary for most language pairs
- srcvocab.{srctgt}.spm + trgvocab.{srctgt}.spm - Separate vocabs for CJK languages (zh, ja)
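As an illustration of the convention, the sketch below derives the three file names for a language pair. It is hypothetical: sketchBergamotFileNames is not part of the package, and the set of split-vocab pairs is an assumption taken from the language-pair table above (en→zh and en→ja only). In real code, prefer getBergamotFileNames(), which is the authoritative source.

```javascript
// Hypothetical sketch of the Bergamot naming convention.
// Assumption: only the pairs listed with srcvocab/trgvocab in the table use split vocabs.
const SPLIT_VOCAB_PAIRS = new Set(['enzh', 'enja'])

function sketchBergamotFileNames (srcLang, dstLang) {
  const pair = `${srcLang}${dstLang}`
  const modelName = `model.${pair}.intgemm.alphas.bin`
  if (SPLIT_VOCAB_PAIRS.has(pair)) {
    return {
      modelName,
      srcVocabName: `srcvocab.${pair}.spm`,
      dstVocabName: `trgvocab.${pair}.spm`
    }
  }
  // Shared vocabulary: the same file serves as both source and target vocab
  const shared = `vocab.${pair}.spm`
  return { modelName, srcVocabName: shared, dstVocabName: shared }
}
```

The returned keys mirror the modelName, srcVocabName, and dstVocabName fields used in Step 2, so the result can be joined with a model directory in the same way.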
Model directory layout
Use a unique directory per model to avoid file conflicts when using multiple models:
- ./model/indictrans for IndicTrans English→Hindi
- ./model/bergamot/enit for Bergamot English→Italian
The list of supported languages for the srcLang and dstLang parameters differs by model type.
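Because the two backends use differently shaped language codes, a quick format check can catch a mismatched code before a model is loaded. The sketch below is an assumption-laden illustration: hasExpectedCodeFormat is a hypothetical helper, and the patterns are inferred from the examples in this document (three-letter language plus four-letter script for IndicTrans2, two-letter codes for Bergamot). It checks shape only, not whether a model exists for the code.

```javascript
// Hypothetical format check; patterns are inferred from the examples in this document.
const CODE_PATTERNS = new Map([
  ['IndicTrans', /^[a-z]{3}_[A-Z][a-z]{3}$/], // e.g. 'eng_Latn', 'hin_Deva'
  ['Bergamot', /^[a-z]{2}$/] // e.g. 'en', 'it'
])

function hasExpectedCodeFormat (modelType, code) {
  const pattern = CODE_PATTERNS.get(modelType)
  if (!pattern) throw new Error(`Unknown model type: ${modelType}`)
  return typeof code === 'string' && pattern.test(code)
}
```

For example, hasExpectedCodeFormat('Bergamot', 'eng_Latn') returns false, which flags an IndicTrans2-style code passed to the Bergamot backend.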
3. Create the config object
The config object contains two types of parameters:
- Model-specific parameters (required for some backends)
- Generation/decoding parameters (optional, controls output quality)
Model-Specific Parameters
| Parameter | IndicTrans2 | Bergamot |
|---|---|---|
| config.modelType | Required | Required |
| files.srcVocab | Not needed | Required (passed via files, see Step 2) |
| files.dstVocab | Not needed | Required (passed via files, see Step 2) |
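The per-backend file requirements can be checked before constructing the model. The following is a sketch under stated assumptions: missingFileKeys is a hypothetical helper (not a package export), and the required keys come from the table above plus the files.model entry from Step 2.

```javascript
// Hypothetical pre-flight check; not part of the package API.
const REQUIRED_FILES = new Map([
  ['IndicTrans', ['model']],
  ['Bergamot', ['model', 'srcVocab', 'dstVocab']]
])

function missingFileKeys (modelType, files = {}) {
  const required = REQUIRED_FILES.get(modelType)
  if (!required) throw new Error(`Unknown model type: ${modelType}`)
  // A key counts as missing unless it holds a non-empty string path
  return required.filter((key) => typeof files[key] !== 'string' || files[key] === '')
}

// srcVocab and dstVocab are reported as missing for this Bergamot files object
const missing = missingFileKeys('Bergamot', {
  model: './model/bergamot/enit/model.enit.intgemm.alphas.bin'
})
```

The check looks only at the shape of the files object; it does not confirm that the paths exist on disk.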
Generation/Decoding Parameters (IndicTrans Only)
These parameters control how the model generates output. Note: Full parameter support is only available for IndicTrans2 models. Bergamot has limited parameter support.
// Generation parameters for IndicTrans2
const generationParams = {
  beamsize: 4, // Beam search width (>=1). 1 disables beam search
  lengthpenalty: 0.6, // Length normalization strength (>=0)
  maxlength: 128, // Maximum generated tokens (>0)
  repetitionpenalty: 1.2, // Penalize previously generated tokens (0..2)
  norepeatngramsize: 2, // Disallow repeating n-grams of this size (0..10)
  temperature: 0.8, // Sampling temperature [0..2]
  topk: 40, // Keep top-K logits [0..vocab_size]
  topp: 0.9 // Nucleus sampling threshold (0 < p <= 1)
}
4. Create Model Instance
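Before constructing the model, the decoding options from Step 3 can be sanity-checked against the ranges noted in their comments. This is a hypothetical validator, not something the package exports; the integer requirements for beamsize, maxlength, norepeatngramsize, and topk are assumptions, and the upper bound for topk (the vocabulary size) is model-specific and left unchecked.

```javascript
// Hypothetical validator; ranges are copied from the comments in Step 3.
const RANGE_CHECKS = new Map([
  ['beamsize', (v) => Number.isInteger(v) && v >= 1],
  ['lengthpenalty', (v) => v >= 0],
  ['maxlength', (v) => Number.isInteger(v) && v > 0],
  ['repetitionpenalty', (v) => v >= 0 && v <= 2],
  ['norepeatngramsize', (v) => Number.isInteger(v) && v >= 0 && v <= 10],
  ['temperature', (v) => v >= 0 && v <= 2],
  ['topk', (v) => Number.isInteger(v) && v >= 0], // upper bound depends on the model
  ['topp', (v) => v > 0 && v <= 1]
])

// Returns the names of parameters that are unknown or out of range
function invalidGenerationParams (params) {
  return Object.keys(params).filter((key) => {
    const check = RANGE_CHECKS.get(key)
    const value = params[key]
    return !check || typeof value !== 'number' || !check(value)
  })
}
```

An empty result means every supplied parameter is recognized and within its documented range; how the native backend itself treats out-of-range values is not covered here.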
Import TranslationNmtcpp and create an instance from the single options object built in Step 2. Decoding options from Step 3 are merged into args.config:
const TranslationNmtcpp = require('@qvac/translation-nmtcpp')
IndicTrans2
// IndicTrans - must specify modelType + generation parameters
const model = new TranslationNmtcpp({
  ...args, // files + params from Step 2
  config: {
    modelType: TranslationNmtcpp.ModelTypes.IndicTrans,
    ...generationParams, // Spread generation params from Step 3
    maxlength: 256 // Override for longer outputs
  }
})
Bergamot
// Bergamot - vocab files are passed via `files` (see Step 2); limited generation params support
const model = new TranslationNmtcpp({
  ...args, // files (model + srcVocab + dstVocab) + params from Step 2
  config: {
    modelType: TranslationNmtcpp.ModelTypes.Bergamot,
    beamsize: 4 // Only beamsize supported for Bergamot
  }
})
Available Model Types:
TranslationNmtcpp.ModelTypes = {
  IndicTrans: 'IndicTrans', // Indic language models
  Bergamot: 'Bergamot' // Firefox Translations models
}
5. Load Model
try {
  // Basic usage
  await model.load()
} catch (error) {
  console.error('Failed to load model:', error)
}
6. Run the Model
We can perform inference on the input text using the run() method. This method returns a QVACResponse object.
try {
  // Execute translation on input text
  const response = await model.run('Hello world! Welcome to the internet of peers!')

  // Process streamed output using callback
  await response
    .onUpdate(outputChunk => {
      // Handle each new piece of translated text
      console.log(outputChunk)
    })
    .await() // Wait for translation to complete

  // Access performance statistics (if enabled with opts.stats)
  if (response.stats) {
    console.log('Translation completed in:', response.stats.totalTime, 'ms')
  }
} catch (error) {
  console.error('Translation failed:', error)
}
7. Batch Translation (Bergamot Only)
For translating multiple texts efficiently, use the runBatch() method instead of calling run() multiple times.
runBatch() is only available with the Bergamot backend. IndicTrans2 models should use sequential run() calls.
// Array of texts to translate (English)
const textsToTranslate = [
  'Hello world!',
  'How are you today?',
  'Machine translation has revolutionized communication.'
]

try {
  // Batch translation - returns array of translated strings
  const translations = await model.runBatch(textsToTranslate)

  // Output each translation
  translations.forEach((translatedText, index) => {
    console.log(`Original: ${textsToTranslate[index]}`)
    console.log(`Translated: ${translatedText}\n`)
  })
} catch (error) {
  console.error('Batch translation failed:', error)
}
runBatch() vs run():
| Method | Input | Output | Backend Support |
|---|---|---|---|
| run(text) | Single string | QVACResponse with streaming | All (IndicTrans, Bergamot) |
| runBatch(texts) | Array of strings | Array of strings | Bergamot only |
runBatch() is significantly faster when translating multiple texts as it processes them in a single batch operation.
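When the same code has to work with either backend, the choice between the two methods can be wrapped in one function. The sketch below is hypothetical: translateAll and its useBatch option are illustrative names, and it assumes each onUpdate chunk is a string that can be concatenated into the final translation.

```javascript
// Hypothetical wrapper: uses runBatch() when the caller indicates a Bergamot model,
// otherwise falls back to sequential run() calls.
// Assumption: each onUpdate chunk is a string.
async function translateAll (model, texts, { useBatch = false } = {}) {
  if (useBatch) {
    return model.runBatch(texts) // single batch operation (Bergamot only)
  }
  const results = []
  for (const text of texts) {
    let translated = ''
    const response = await model.run(text)
    await response
      .onUpdate((chunk) => { translated += chunk })
      .await()
    results.push(translated)
  }
  return results
}
```

With a loaded Bergamot model this would be called as translateAll(model, texts, { useBatch: true }); for IndicTrans2, omit the option so the texts are translated one at a time.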
8. Unload the Model
// Always unload the model when finished to free memory
try {
  await model.unload()
} catch (error) {
  console.error('Failed to unload model:', error)
}
Supported Languages
IndicTrans2 Models (QVAC registry)
IndicTrans2 supports translation between English and 22 Indic languages. The following directions are available via the QVAC model registry:
| Direction | Available | Sizes |
|---|---|---|
| English → Indic | Yes | 200M, 1B |
| Indic → English | Yes | 200M, 1B |
| Indic → Indic | Yes | 320M, 1B |
Supported Indic Languages:
| Language (code) | Language (code) | Language (code) |
|---|---|---|
| Assamese (asm_Beng) | Kashmiri (Arabic) (kas_Arab) | Punjabi (pan_Guru) |
| Bengali (ben_Beng) | Kashmiri (Devanagari) (kas_Deva) | Sanskrit (san_Deva) |
| Bodo (brx_Deva) | Maithili (mai_Deva) | Santali (sat_Olck) |
| Dogri (doi_Deva) | Malayalam (mal_Mlym) | Sindhi (Arabic) (snd_Arab) |
| English (eng_Latn) | Marathi (mar_Deva) | Sindhi (Devanagari) (snd_Deva) |
| Konkani (gom_Deva) | Manipuri (Bengali) (mni_Beng) | Tamil (tam_Taml) |
| Gujarati (guj_Gujr) | Manipuri (Meitei) (mni_Mtei) | Telugu (tel_Telu) |
| Hindi (hin_Deva) | Nepali (npi_Deva) | Urdu (urd_Arab) |
| Kannada (kan_Knda) | Odia (ory_Orya) | |
Bergamot Models (Firefox Translations)
Language pairs available via the Firefox Remote Settings CDN:
| Language | Code | en→X | X→en |
|---|---|---|---|
| Arabic | ar | Yes | Yes |
| Czech | cs | Yes | Yes |
| Spanish | es | Yes | Yes |
| French | fr | Yes | Yes |
| Italian | it | Yes | Yes |
| Japanese | ja | Yes | Yes |
| Portuguese | pt | Yes | Yes |
| Russian | ru | Yes | Yes |
| Chinese | zh | Yes | Yes |
The Bergamot backend supports all language pairs available in Firefox Translations. See the Firefox Translations models repository for the complete and up-to-date list of supported language pairs.
ModelClasses and Packages
ModelClass
The main class exported by this library is TranslationNmtcpp, which supports multiple translation backends:
const TranslationNmtcpp = require('@qvac/translation-nmtcpp')

// Available model types
TranslationNmtcpp.ModelTypes = {
  IndicTrans: 'IndicTrans', // For Indic language translations
  Bergamot: 'Bergamot' // For Bergamot/Firefox translations
}
Available Packages
Main Package
| Package | Description | Backends | Languages |
|---|---|---|---|
| @qvac/translation-nmtcpp | Main translation package | Bergamot, IndicTrans | See Supported Languages |
The main package supports both backends and all their respective languages. See Supported Languages for the complete list.
Logging
The library supports configurable logging for both JavaScript and C++ (native) components. By default, C++ logs are suppressed for cleaner output.
Enabling C++ Logs
To enable verbose C++ logging, pass a logger object in the options:
// Enable C++ logging
const logger = {
  info: (msg) => console.log('[C++ INFO]', msg),
  warn: (msg) => console.warn('[C++ WARN]', msg),
  error: (msg) => console.error('[C++ ERROR]', msg),
  debug: (msg) => console.log('[C++ DEBUG]', msg)
}

const args = {
  files: { model: modelPath, srcVocab, dstVocab }, // resolved paths (see Step 1)
  params: { mode: 'full', srcLang: 'en', dstLang: 'it' },
  config: { modelType: TranslationNmtcpp.ModelTypes.Bergamot },
  logger // Pass logger to enable C++ logs
}
Disabling C++ Logs
To suppress all C++ logs, either omit the logger parameter or set it to null:
const args = {
  files: { model: modelPath, srcVocab, dstVocab }, // resolved paths (see Step 1)
  params: { mode: 'full', srcLang: 'en', dstLang: 'it' },
  config: { modelType: TranslationNmtcpp.ModelTypes.Bergamot }
  // No logger = suppress C++ logs
}
Using Environment Variables (Recommended for Examples)
All examples support the VERBOSE environment variable:
# Run with C++ logging disabled (default)
bare examples/quickstart.js

# Run with C++ logging enabled
VERBOSE=1 bare examples/quickstart.js
Log Levels
The C++ backend supports these log levels (mapped from native priority):
| Priority | Level | Description |
|---|---|---|
| 0 | error | Critical errors |
| 1 | warn | Warnings |
| 2 | info | Informational messages |
| 3 | debug | Debug/trace messages |
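The mapping in the table can be pictured as a small dispatcher that routes a native priority to the matching logger method. This is an illustrative sketch, not the library's internal code: dispatchNativeLog is a hypothetical name, and the behaviour for a missing logger simply mirrors the "no logger means no C++ logs" rule described above.

```javascript
// Hypothetical dispatcher illustrating the priority-to-level mapping.
const LEVEL_BY_PRIORITY = ['error', 'warn', 'info', 'debug'] // index = native priority

function dispatchNativeLog (logger, priority, message) {
  if (!logger) return false // null or omitted logger: suppress the message
  const level = LEVEL_BY_PRIORITY[priority]
  if (!level || typeof logger[level] !== 'function') return false
  logger[level](message)
  return true
}
```

The function returns true only when a message was actually delivered, which makes it easy to confirm in tests that a null logger suppresses everything.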