# @qvac/ocr-onnx



## Overview

[Bare module](https://bare.pears.com) that adds OCR support to QVAC, using [ONNX Runtime](https://onnxruntime.ai) as the inference engine. It runs a two-stage pipeline and requires a compatible model for each stage:

* **Text detection**: locate text regions in an image
* **Text recognition**: decode characters in detected regions

## Models

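You can load any ONNX Runtime-compatible OCR pipeline. Two files in ONNX format (`*.onnx`) are required: the detector, `detector_craft.onnx`, and one recognizer, `recognizer_<lang>.onnx`, where `<lang>` is a recognizer name from [Supported Languages](#supported-languages), for example `recognizer_latin.onnx`.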

## Requirement

Bare $\geq$ v1.24

## Installation

```bash
npm i @qvac/ocr-onnx
```

## Quickstart

<Steps>
  <Step>
    If you don't have Bare runtime, install it:

    ```bash
    npm i -g bare
    ```
  </Step>

  <Step>
    Create a new project:

    ```bash
    mkdir qvac-ocr-quickstart
    cd qvac-ocr-quickstart
    npm init -y
    ```
  </Step>

  <Step>
    Install dependencies:

    ```bash
    npm i @qvac/ocr-onnx bare-path
    ```
  </Step>

  <Step>
    Place the OCR model files (`detector_craft.onnx` and `recognizer_latin.onnx`) into `models/ocr/`. These are available from our model registry.
  </Step>

  <Step>
    Create `index.js`:
  </Step>

  <WrapCode>
    ```js title="index.js" lineNumbers
    'use strict'

    const path = require('bare-path')
    const { ONNXOcr } = require('@qvac/ocr-onnx')

    const imagePath = path.resolve('./my-image.jpg')

    async function main () {
      const detectorPath = './models/ocr/detector_craft.onnx'
      const recognizerPath = './models/ocr/recognizer_latin.onnx'

      const model = new ONNXOcr({
        params: {
          langList: ['en'],
          pathDetector: detectorPath,
          pathRecognizer: recognizerPath,
          useGPU: false
        },
        opts: { stats: true }
      })

      try {
        console.log('Loading OCR model...')
        await model.load()
        console.log('Model loaded.')

        console.log(`Running OCR on: ${imagePath}`)
        const response = await model.run({
          path: imagePath
        })

        console.log('Waiting for OCR results...')
        await response
          .onUpdate(data => {
            console.log('--- OCR Update ---')
            console.log('Output: ' + JSON.stringify(data.map(o => o[1])))
            console.log('--- data ---')
            console.log(JSON.stringify(data, null, 2))
            console.log('------------------')
          })
          .await()

        console.log('OCR finished!')
        if (response.stats) {
          console.log(`Inference stats: ${JSON.stringify(response.stats)}`)
        }
      } catch (err) {
        console.error('Error during OCR processing:', err)
      } finally {
        console.log('Unloading model...')
        await model.unload()
        console.log('Model unloaded.')
      }
    }

    main().catch(console.error)
    ```
  </WrapCode>

  <Step>
    Run `index.js`:

    ```bash
    bare index.js
    ```
  </Step>
</Steps>

## Usage

The library provides a straightforward workflow for image-based text recognition:

### 1. Configure Parameters

Define the arguments for the OCR instance, including paths to the ONNX models and the list of languages to recognize.

```javascript
const args = {
  params: {
    // Required parameters
    langList: ['en'],                              // Language codes (ISO 639-1)
    pathDetector: './models/ocr/detector_craft.onnx',
    pathRecognizer: './models/ocr/recognizer_latin.onnx',
    // Or use prefix: pathRecognizerPrefix: './models/ocr/recognizer_',

    // Optional parameters
    useGPU: true,                    // Enable GPU acceleration (default: true)
    timeout: 120,                    // Inference timeout in seconds (default: 120)

    // Performance tuning (optional)
    magRatio: 1.5,                   // Detection magnification ratio (default: 1.5)
    defaultRotationAngles: [90, 270], // Rotation angles to try (default: [90, 270])
    contrastRetry: false,            // Retry low-confidence with contrast adjustment (default: false)
    lowConfidenceThreshold: 0.4,     // Threshold for contrast retry (default: 0.4)
    recognizerBatchSize: 32          // Batch size for recognizer inference (default: 32)
  },
  opts: {
    stats: true                      // Enable performance statistics logging
  }
}
```

#### Required Parameters

| Parameter              | Type       | Description                                                                                                                                                                |
| ---------------------- | ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `langList`             | `string[]` | List of language codes (ISO 639-1). The first supported language determines the recognizer model. See [Supported Languages](#supported-languages).                         |
| `pathDetector`         | `string`   | Path to the detector ONNX model file.                                                                                                                                      |
| `pathRecognizer`       | `string`   | Path to the recognizer ONNX model file. **Required if `pathRecognizerPrefix` is not provided.**                                                                            |
| `pathRecognizerPrefix` | `string`   | Prefix path for recognizer model. The library appends the language suffix automatically (e.g., `recognizer_latin.onnx`). **Required if `pathRecognizer` is not provided.** |

#### Optional Parameters

| Parameter                | Type       | Default     | Description                                                                                                          |
| ------------------------ | ---------- | ----------- | -------------------------------------------------------------------------------------------------------------------- |
| `useGPU`                 | `boolean`  | `true`      | Enable GPU/NPU/TPU acceleration. Falls back to CPU if unavailable.                                                   |
| `timeout`                | `number`   | `120`       | Maximum inference time in seconds. Increase for complex images or slower devices.                                    |
| `magRatio`               | `number`   | `1.5`       | Detection magnification ratio (1.0-2.0). Higher values improve detection of small text but increase processing time. |
| `defaultRotationAngles`  | `number[]` | `[90, 270]` | Rotation angles to try for text detection. Use `[]` to disable rotation variants.                                    |
| `contrastRetry`          | `boolean`  | `false`     | Re-process low-confidence regions with adjusted contrast. Improves accuracy but increases memory usage.              |
| `lowConfidenceThreshold` | `number`   | `0.4`       | Confidence threshold (0-1) below which contrast retry is triggered (when `contrastRetry` is enabled).                |
| `recognizerBatchSize`    | `number`   | `32`        | Number of text regions processed per batch. Lower values reduce memory usage on mobile devices.                      |
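
The constraints in the two tables above can be checked before constructing the model. The following is a minimal sketch, not part of `@qvac/ocr-onnx`: `validateOcrParams` is a hypothetical helper, and the rule that `recognizerBatchSize` must be a positive integer is an assumption rather than documented behavior.

```javascript
'use strict'

// Hypothetical helper, not part of @qvac/ocr-onnx: checks a params object
// against the documented constraints and returns a list of problems.
function validateOcrParams (params) {
  const problems = []

  if (!Array.isArray(params.langList) || params.langList.length === 0) {
    problems.push('langList must be a non-empty array of language codes')
  }
  if (typeof params.pathDetector !== 'string') {
    problems.push('pathDetector is required')
  }
  // At least one of the two recognizer settings must be present.
  if (typeof params.pathRecognizer !== 'string' &&
      typeof params.pathRecognizerPrefix !== 'string') {
    problems.push('provide pathRecognizer or pathRecognizerPrefix')
  }
  // Optional values are only checked when they are set.
  if (params.magRatio !== undefined &&
      (params.magRatio < 1.0 || params.magRatio > 2.0)) {
    problems.push('magRatio should be between 1.0 and 2.0')
  }
  if (params.lowConfidenceThreshold !== undefined &&
      (params.lowConfidenceThreshold < 0 || params.lowConfidenceThreshold > 1)) {
    problems.push('lowConfidenceThreshold should be between 0 and 1')
  }
  // Assumption: a batch size only makes sense as a positive integer.
  if (params.recognizerBatchSize !== undefined &&
      (!Number.isInteger(params.recognizerBatchSize) || params.recognizerBatchSize < 1)) {
    problems.push('recognizerBatchSize should be a positive integer')
  }
  return problems
}

const problems = validateOcrParams({
  langList: ['en'],
  pathDetector: './models/ocr/detector_craft.onnx',
  pathRecognizerPrefix: './models/ocr/recognizer_',
  magRatio: 2.5
})
console.log(problems) // [ 'magRatio should be between 1.0 and 2.0' ]
```

Returning a list instead of throwing on the first problem lets a caller report every misconfiguration at once.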

### 2. Create Model Instance

Import the library and create a new instance with the configured arguments.

```javascript
const { ONNXOcr } = require('@qvac/ocr-onnx')

const model = new ONNXOcr(args)
```

### 3. Load Model

Asynchronously load the ONNX models specified in the parameters.

```javascript
try {
  await model.load()
  console.log('OCR model loaded successfully.')
} catch (error) {
  console.error('Failed to load OCR model:', error)
}
```

### 4. Run OCR

Pass the path to the input image file to the `run` method. Supported formats: **BMP**, **JPEG**, and **PNG**.

```javascript
const imagePath = 'path/to/your/image.jpg'

try {
  const response = await model.run({
     path: imagePath,
     options: {
       paragraph: true,           // Group results into paragraphs (default: false)
       rotationAngles: [90, 270], // Override default rotation angles for this run
       boxMarginMultiplier: 1.0   // Adjust bounding box margins
     }
  })
  // ... process the response (see step 5)
} catch (error) {
  console.error('OCR failed:', error)
}
```
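
Since only BMP, JPEG, and PNG inputs are listed as supported, it can be useful to reject other files before calling `run`. This is a sketch under assumptions: `hasSupportedExtension` is a hypothetical helper, it looks only at the file extension (not the file contents), and treating `.jpeg` as an alias of `.jpg` is an assumption.

```javascript
'use strict'

// Extensions for the formats named in this section (BMP, JPEG, PNG).
// Accepting both '.jpg' and '.jpeg' is an assumption of this sketch.
const SUPPORTED_EXTENSIONS = ['.bmp', '.jpg', '.jpeg', '.png']

// Hypothetical helper: true when the path ends in a supported extension.
function hasSupportedExtension (imagePath) {
  const dot = imagePath.lastIndexOf('.')
  if (dot === -1) return false
  const ext = imagePath.slice(dot).toLowerCase()
  return SUPPORTED_EXTENSIONS.includes(ext)
}

console.log(hasSupportedExtension('scans/receipt.PNG')) // true
console.log(hasSupportedExtension('scans/receipt.tiff')) // false
```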

#### Runtime Options

| Option                | Type       | Default                      | Description                                                     |
| --------------------- | ---------- | ---------------------------- | --------------------------------------------------------------- |
| `paragraph`           | `boolean`  | `false`                      | Group detected text regions into paragraphs based on proximity. |
| `rotationAngles`      | `number[]` | Uses `defaultRotationAngles` | Override default rotation angles for this specific run.         |
| `boxMarginMultiplier` | `number`   | `1.0`                        | Multiplier for bounding box margins around detected text.       |
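
The defaults in the table can be read as a simple fallback chain: a per-run value wins, otherwise the instance-level or documented default applies. The sketch below only illustrates that reading; `resolveRunOptions` is a hypothetical function and does not claim to reproduce how the library merges options internally.

```javascript
'use strict'

// Hypothetical illustration of the documented defaults, not library code.
function resolveRunOptions (params, options = {}) {
  return {
    paragraph: options.paragraph ?? false,
    // Per-run angles override the instance-level defaultRotationAngles.
    rotationAngles: options.rotationAngles ?? params.defaultRotationAngles ?? [90, 270],
    boxMarginMultiplier: options.boxMarginMultiplier ?? 1.0
  }
}

const resolved = resolveRunOptions(
  { defaultRotationAngles: [90, 270] },
  { rotationAngles: [] } // an empty list is kept as-is, it is not replaced by the default
)
console.log(resolved)
```

Using `??` rather than `||` matters here: `[]` and `false` are valid explicit choices and should not be replaced by defaults.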

### 5. Process Output

The `run` method returns a `QvacResponse` object. Use its methods to handle the OCR results as they become available.

```javascript
// Option 1: Using onUpdate callback
await response
  .onUpdate(data => {
    // data contains OCR results for a chunk or the final result
    console.log('OCR Update:', JSON.stringify(data))
  })
  .await() // Wait for the entire process to complete

// Option 2: Using async iterator (if supported by QvacResponse in the future)
// for await (const data of response.iterate()) {
//   console.log('OCR Chunk:', JSON.stringify(data))
// }

// Access performance stats if enabled
if (response.stats) {
  console.log(`Inference stats: ${JSON.stringify(response.stats)}`)
}
```

See [Output Format](#output-format) for the structure of the results.
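
If the results are needed as a single array rather than per update, the `onUpdate(...).await()` chain shown above can be wrapped in a small helper. This is a sketch: `collectBlocks` and `fakeResponse` are hypothetical names, and `fakeResponse` is only a stand-in that mimics the two methods used in this section so the example runs without a model.

```javascript
'use strict'

// Hypothetical helper: gathers every update into one flat array of text blocks.
// It relies only on the onUpdate(...).await() chain used in this section.
async function collectBlocks (response) {
  const blocks = []
  await response
    .onUpdate(data => {
      for (const block of data) blocks.push(block)
    })
    .await()
  return blocks
}

// Stand-in for a QvacResponse (an assumption made for this example,
// not the real class): replays a fixed list of updates.
function fakeResponse (updates) {
  let handler = () => {}
  return {
    onUpdate (fn) { handler = fn; return this },
    await: async function () { for (const update of updates) handler(update) }
  }
}

const pending = collectBlocks(fakeResponse([
  [[[[0, 0], [10, 0], [10, 5], [0, 5]], 'Hello', 0.9]],
  [[[[0, 8], [10, 8], [10, 13], [0, 13]], 'World', 0.8]]
]))
pending.then(blocks => console.log(blocks.map(b => b[1]))) // [ 'Hello', 'World' ]
```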

### 6. Release Resources

Unload the model and free up resources when done.

```javascript
try {
  await model.unload()
  console.log('OCR model unloaded.')
} catch (error) {
  console.error('Failed to unload model:', error)
}
```

## Output Format

The output is typically received via the `onUpdate` callback of the `QvacResponse` object. It's a JSON array where each element represents a detected text block.

Each text block contains:

1. **Bounding Box:** An array of four `[x, y]` coordinate pairs defining the corners of the box around the detected text. Coordinates are clockwise, starting from the top-left relative to the text orientation.
2. **Detected Text:** The recognized text string.
3. **Confidence Score:** A numerical value indicating the model's confidence in the recognition (range may vary, often 0-1).

```json
[ // Array of detected text blocks
  [ // First text block
    [ // Bounding Box
      [x1, y1], // Top-left corner
      [x2, y2], // Top-right corner
      [x3, y3], // Bottom-right corner
      [x4, y4]  // Bottom-left corner
    ],
    "Detected Text String", // Recognized text
    0.95 // Confidence score
  ],
  [ // Second text block
    [ /* Bounding Box */ ],
    "Another piece of text",
    0.88
  ]
  // ... more text blocks
]
```

**Example:**

```json
[[
  [
    [10, 10],
    [150, 12],
    [149, 30],
    [9, 28]
  ],
  "Example Text",
  0.85
]]
```
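
Positional tuples are compact but easy to misread, so a consumer may prefer named fields. The sketch below assumes the `[box, text, confidence]` layout described above; `toRecords` and `confident` are hypothetical helpers, and the 0.5 cutoff is an arbitrary illustration value, not a recommended threshold.

```javascript
'use strict'

// Hypothetical helper: turns [box, text, confidence] tuples into named fields.
function toRecords (blocks) {
  return blocks.map(([box, text, confidence]) => ({ box, text, confidence }))
}

// Keeps only records at or above a caller-chosen confidence (0.5 is arbitrary).
function confident (records, minConfidence = 0.5) {
  return records.filter(record => record.confidence >= minConfidence)
}

const sample = [
  [[[10, 10], [150, 12], [149, 30], [9, 28]], 'Example Text', 0.85],
  [[[10, 40], [60, 40], [60, 55], [10, 55]], 'smudge', 0.2]
]

const kept = confident(toRecords(sample))
console.log(kept.map(record => record.text)) // [ 'Example Text' ]
```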

Because the corners are always listed clockwise starting from the top-left of the text itself (not of the image), the position of the first corner within the image indicates how the extracted text is rotated.
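
One way to turn the corner ordering into a rotation estimate is to measure the angle of the edge from the first corner (top-left of the text) to the second (top-right of the text). This is a sketch, not a library feature: `textAngleDegrees` is a hypothetical helper, and it assumes image coordinates where `y` grows downward, so positive angles correspond to clockwise rotation on screen.

```javascript
'use strict'

// Hypothetical helper: angle of the text's top edge (top-left -> top-right), in degrees.
// Assumes image coordinates with y growing downward.
function textAngleDegrees (box) {
  const [[x1, y1], [x2, y2]] = box
  return Math.atan2(y2 - y1, x2 - x1) * 180 / Math.PI
}

// Box from the example above: nearly horizontal text.
const angle = textAngleDegrees([[10, 10], [150, 12], [149, 30], [9, 28]])
console.log(angle.toFixed(2)) // 0.82

// Text whose top edge runs straight down the image.
const rotated = textAngleDegrees([[50, 0], [50, 100], [30, 100], [30, 0]])
console.log(rotated) // 90
```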

*(Note: The exact structure and timing of updates might depend on internal buffering and the `paragraph` option.)*

## Supported Languages

Language support is determined by the recognizer model used. Each recognizer model supports a specific set of languages. The library automatically selects the appropriate model based on the `langList` parameter.

| Recognizer Model             | Languages                                                                                                                                                                 |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `recognizer_latin.onnx`      | af, az, bs, cs, cy, da, de, en, es, et, fr, ga, hr, hu, id, is, it, ku, la, lt, lv, mi, ms, mt, nl, no, oc, pi, pl, pt, ro, rs\_latin, sk, sl, sq, sv, sw, tl, tr, uz, vi |
| `recognizer_arabic.onnx`     | ar, fa, ug, ur                                                                                                                                                            |
| `recognizer_cyrillic.onnx`   | ru, rs\_cyrillic, be, bg, uk, mn, abq, ady, kbd, ava, dar, inh, che, lbe, lez, tab, tjk                                                                                   |
| `recognizer_devanagari.onnx` | hi, mr, ne, bh, mai, ang, bho, mah, sck, new, gom, sa, bgc                                                                                                                |
| `recognizer_bengali.onnx`    | bn, as, mni                                                                                                                                                               |
| `recognizer_thai.onnx`       | th                                                                                                                                                                        |
| `recognizer_zh_sim.onnx`     | ch\_sim                                                                                                                                                                   |
| `recognizer_zh_tra.onnx`     | ch\_tra                                                                                                                                                                   |
| `recognizer_japanese.onnx`   | ja                                                                                                                                                                        |
| `recognizer_korean.onnx`     | ko                                                                                                                                                                        |
| `recognizer_tamil.onnx`      | ta                                                                                                                                                                        |
| `recognizer_telugu.onnx`     | te                                                                                                                                                                        |
| `recognizer_kannada.onnx`    | kn                                                                                                                                                                        |

See `supportedLanguages.js` for the complete language definitions.
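
To make the selection rule concrete, the sketch below picks a recognizer file for the first language in `langList` that appears in a lookup table. It is an illustration only: `RECOGNIZER_LANGS` is a partial, hand-copied subset of the table above, `recognizerFileFor` is a hypothetical helper, and the package's own `supportedLanguages.js` remains the authoritative source.

```javascript
'use strict'

// Partial copy of the table above, for illustration only.
const RECOGNIZER_LANGS = {
  latin: ['en', 'de', 'es', 'fr', 'it', 'pt'],
  arabic: ['ar', 'fa', 'ug', 'ur'],
  cyrillic: ['ru', 'be', 'bg', 'uk', 'mn'],
  japanese: ['ja'],
  korean: ['ko']
}

// Hypothetical helper: returns the recognizer file for the first language in
// langList that appears in the map, or null if none is listed.
function recognizerFileFor (langList, prefix = './models/ocr/recognizer_') {
  for (const lang of langList) {
    for (const [suffix, langs] of Object.entries(RECOGNIZER_LANGS)) {
      if (langs.includes(lang)) return `${prefix}${suffix}.onnx`
    }
  }
  return null
}

console.log(recognizerFileFor(['xx', 'ru', 'en'])) // ./models/ocr/recognizer_cyrillic.onnx
```

Note that `'xx'` is skipped because it is not in the map, and `'ru'` wins over `'en'` because it comes first among the supported entries.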

## More resources

[Package at npm](https://www.npmjs.com/package/@qvac/ocr-onnx)
