Overview

RAG (retrieval-augmented generation) uses text embeddings: you embed (vectorize) documents, persist them in a vector store, and later retrieve the most relevant chunks for a query using similarity search.

Compared to generating text embeddings only, the key differences are:

- You must persist embeddings (vectors) alongside the original text (and optional metadata) in a vector store.
- At query time, you embed the query and run top‑K vector search to fetch the most relevant documents/chunks.
- You typically pass the retrieved text to completion() as context to ground the model's answer (this retrieval step is what makes it RAG).
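
To make the query-time step concrete, here is a minimal sketch of top-K similarity search over vectors held in memory. It is plain JavaScript for illustration only, not how ragSearch() or any particular vector store is implemented; the helper names (cosineSimilarity, topK), the record shape, and the toy 3-dimensional vectors are assumptions made for this example.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored records most similar to the query vector.
// `store` is a hypothetical array of { text, embedding } records.
function topK(store, queryEmbedding, k) {
  return store
    .map((record) => ({
      text: record.text,
      score: cosineSimilarity(record.embedding, queryEmbedding),
    }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}

// Toy 3-dimensional vectors stand in for real embeddings.
const toyStore = [
  { text: "doc about cats", embedding: [1, 0, 0] },
  { text: "doc about dogs", embedding: [0.9, 0.1, 0] },
  { text: "doc about taxes", embedding: [0, 0, 1] },
];
const hits = topK(toyStore, [1, 0, 0], 2);
console.log(hits.map((h) => h.text));
```

A linear scan like this is fine for a handful of documents; a vector store earns its place once the collection is large enough to need an index.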

Functions

- loadModel() (with modelType: "embeddings")
- embed(): generate vectors
- Use any combination of the RAG functions below as needed:
  - ragChunk(): chunk documents
  - ragIngest(): embed and store (requires model)
  - ragSaveEmbeddings(): save pre-computed vectors
  - ragSearch(): query similar documents (requires model)
  - ragReindex(): optimize search index
  - ragDeleteEmbeddings(): remove documents
  - ragListWorkspaces(): list workspaces
  - ragCloseWorkspace(): release resources
  - ragDeleteWorkspace(): delete workspace and data
- unloadModel()

For how to use each function, see SDK — API reference.
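
As a rough illustration of what "chunking" means before embedding, the sketch below splits text into fixed-size character windows with overlap. This is an assumption-laden toy, not the algorithm ragChunk() uses; the function name chunkText and its parameters are hypothetical.

```javascript
// Split `text` into windows of `size` characters, where each window
// repeats the last `overlap` characters of the previous one.
function chunkText(text, size, overlap) {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("size must be positive and overlap must be in [0, size)");
  }
  const pieces = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    pieces.push(text.slice(start, start + size));
    // Stop once a window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return pieces;
}

const pieces = chunkText("abcdefghij", 4, 1);
console.log(pieces);
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.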

Pipeline

Create your RAG pipeline using text embeddings, the functions above, and completion().
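
The last stage of the pipeline, turning retrieved chunks into context for the model, might look like the sketch below. The buildPrompt helper and the prompt wording are hypothetical; the `content` field mirrors the search results printed in the built-in example on this page. The resulting string is what you would hand to completion(), whose exact parameters are documented in the API reference.

```javascript
// Assemble a grounded prompt from a question and retrieved chunks.
// `retrieved` is assumed to be an array of objects with a `content` string.
function buildPrompt(question, retrieved) {
  // Number each chunk so the answer can point back to its source.
  const context = retrieved
    .map((item, index) => `[${index + 1}] ${item.content}`)
    .join("\n");
  return [
    "Answer the question using only the context below.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildPrompt("What is deep learning?", [
  { content: "Deep learning uses neural networks with multiple layers.", score: 0.91 },
  { content: "Machine learning is a subset of artificial intelligence.", score: 0.84 },
]);
console.log(prompt);
```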

Regarding vector storage, you may use:

- Built-in vector store: use the RAG workspace functions (ragIngest(), ragSearch(), etc.). QVAC persists the vectors for you, so this is the simplest path.
- External vector DB: use the embedding primitives (ragChunk(), embed()) and store the vectors wherever you want (e.g., LanceDB, ChromaDB, SQLite-Vector). Choose this when you need custom persistence, filtering, or integration with an existing database.

Important: when using an external vector DB, make sure its schema matches the embedding dimensionality produced by your model (e.g., GTE Large embeddings used in the example are 1024‑dimensional).
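
One way to catch a mismatch early is to validate each vector before writing it to the external store. The following is a minimal sketch; assertDimension and EXPECTED_DIMENSION are hypothetical names, and 1024 is the GTE Large dimensionality quoted above.

```javascript
// Dimensionality the vector column or index was created with (assumed: GTE Large).
const EXPECTED_DIMENSION = 1024;

// Throw if `embedding` is not array-like or has the wrong length;
// otherwise return it unchanged so the call can be chained.
function assertDimension(embedding, expected) {
  if (!Array.isArray(embedding) && !ArrayBuffer.isView(embedding)) {
    throw new TypeError("embedding must be an array or typed array");
  }
  if (embedding.length !== expected) {
    throw new Error(`Expected ${expected} dimensions, got ${embedding.length}`);
  }
  return embedding;
}

// A placeholder vector stands in for the output of embed().
const placeholder = new Array(EXPECTED_DIMENSION).fill(0);
const checked = assertDimension(placeholder, EXPECTED_DIMENSION);
console.log(`Vector accepted with ${checked.length} dimensions`);
```

Failing at insert time is easier to debug than a search that quietly returns poor matches because the index was built for a different model.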

Examples

Built-in vector store

The following script shows how to ingest documents into a built-in RAG workspace with ragIngest() and query them with ragSearch(), without setting up an external vector DB:

rag-managed.js

import { loadModel, unloadModel, GTE_LARGE_FP16, ragIngest, ragSearch, ragCloseWorkspace } from "@qvac/sdk";
try {
    // Get query from command line or use default
    const query = process.argv[2] || "machine learning algorithms";
    const workspace = "ingest-example";
    console.log(`▸ Query: "${query}"`);
    const modelId = await loadModel({
        modelSrc: GTE_LARGE_FP16,
        onProgress: (p) => {
            const mb = (n) => (n / 1e6).toFixed(1);
            const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
            process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
            if (p.percentage >= 100)
                process.stderr.write("\n");
        },
    });
    const samples = [
        "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn and make predictions from data without being explicitly programmed for every task.",
        "Deep learning uses neural networks with multiple layers to process and learn from complex data patterns, enabling breakthroughs in image recognition and natural language processing.",
        "Natural language processing combines computational linguistics with machine learning to help computers understand, interpret, and generate human language in a meaningful way.",
        "Computer vision enables machines to interpret and understand visual information from the world, using techniques like image classification, object detection, and facial recognition.",
        "Quantum computing leverages quantum mechanical phenomena to process information in fundamentally different ways than classical computers, potentially solving certain problems exponentially faster.",
        "Blockchain technology creates decentralized, immutable ledgers that enable secure peer-to-peer transactions without requiring a central authority or intermediary.",
        "Cloud computing delivers computing services over the internet, allowing users to access resources like storage, processing power, and applications on-demand from anywhere.",
        "Cybersecurity protects digital systems, networks, and data from malicious attacks, unauthorized access, and various forms of cyber threats through multiple layers of defense.",
    ];
    console.log("▸ Ingesting documents...");
    const result = await ragIngest({
        modelId,
        workspace,
        documents: samples,
        chunk: false,
    });
    console.log(`▸ Ingested ${result.processed.length} documents`);
    console.log("▸ Searching for similar documents...");
    const results = await ragSearch({
        modelId,
        workspace,
        query,
        topK: 3,
    });
    console.log("▸ Top 3 most similar documents:");
    results.forEach((result, index) => {
        console.log(`${index + 1}. (Score: ${result.score})`);
        console.log(`   ${result.content}`);
        console.log();
    });
    // Cleanup: close and delete workspace
    await ragCloseWorkspace({ workspace, deleteOnClose: true });
    console.log(`▸ Deleted '${workspace}' workspace`);
    await unloadModel({ modelId });
}
catch (error) {
    console.error("✖", error);
    process.exit(1);
}

External vector DB

The following script shows the same workflow using embed() plus a SQLite vector index:

rag-sqlite.js

import { embed, loadModel, unloadModel, GTE_LARGE_FP16 } from "@qvac/sdk";
import sqlite3InitModule from "@sqliteai/sqlite-wasm";
try {
    // Get query from command line or use default
    const query = process.argv[2] || "machine learning algorithms";
    console.log(`▸ Query: "${query}"`);
    // Initialize SQLite with vector extension
    const sqlite3 = await sqlite3InitModule();
    const db = new sqlite3.oo1.DB(":memory:", "c");
    const modelId = await loadModel({
        modelSrc: GTE_LARGE_FP16,
        onProgress: (p) => {
            const mb = (n) => (n / 1e6).toFixed(1);
            const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
            process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
            if (p.percentage >= 100)
                process.stderr.write("\n");
        },
    });
    const samples = [
        {
            id: 1,
            text: "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn and make predictions from data without being explicitly programmed for every task.",
        },
        {
            id: 2,
            text: "Deep learning uses neural networks with multiple layers to process and learn from complex data patterns, enabling breakthroughs in image recognition and natural language processing.",
        },
        {
            id: 3,
            text: "Natural language processing combines computational linguistics with machine learning to help computers understand, interpret, and generate human language in a meaningful way.",
        },
        {
            id: 4,
            text: "Computer vision enables machines to interpret and understand visual information from the world, using techniques like image classification, object detection, and facial recognition.",
        },
        {
            id: 5,
            text: "Quantum computing leverages quantum mechanical phenomena to process information in fundamentally different ways than classical computers, potentially solving certain problems exponentially faster.",
        },
        {
            id: 6,
            text: "Blockchain technology creates decentralized, immutable ledgers that enable secure peer-to-peer transactions without requiring a central authority or intermediary.",
        },
        {
            id: 7,
            text: "Cloud computing delivers computing services over the internet, allowing users to access resources like storage, processing power, and applications on-demand from anywhere.",
        },
        {
            id: 8,
            text: "Cybersecurity protects digital systems, networks, and data from malicious attacks, unauthorized access, and various forms of cyber threats through multiple layers of defense.",
        },
    ];
    // Create table for documents with vector storage
    db.exec(`
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            text TEXT NOT NULL,
            embedding BLOB NOT NULL
        )
    `);
    console.log("▸ Embedding documents...");
    for (const sample of samples) {
        const { embedding } = await embed({ modelId, text: sample.text });
        db.exec({
            sql: "INSERT INTO documents VALUES (?, ?, vector_as_f32(?))",
            bind: [sample.id, sample.text, JSON.stringify(embedding)],
        });
    }
    // Initialize and optimize vector index
    db.exec(`SELECT vector_init('documents', 'embedding', 'type=FLOAT32,dimension=1024')`);
    // Quantize vectors
    db.exec(`SELECT vector_quantize('documents', 'embedding')`);
    // [Optional] Preload quantized vectors in memory for optimal performance
    db.exec(`SELECT vector_quantize_preload('documents', 'embedding')`);
    // Search for similar documents
    console.log("▸ Searching for similar documents...");
    const { embedding: queryEmbedding } = await embed({ modelId, text: query });
    const results = [];
    // Perform vector search; ORDER BY returns the closest matches first,
    // since a JOIN alone does not guarantee row order
    db.exec({
        sql: `
            SELECT d.id, d.text, v.distance
            FROM documents d
            JOIN vector_quantize_scan('documents', 'embedding', vector_as_f32(?), 3) v
            ON d.id = v.rowid
            ORDER BY v.distance
        `,
        bind: [JSON.stringify(queryEmbedding)],
        rowMode: "object",
        callback: (row) => {
            results.push(row);
        },
    });
    console.log("\n▸ Top 3 most similar documents:");
    results.forEach((result, index) => {
        // The scan reports a distance, so smaller values mean closer matches
        console.log(`${index + 1}. [ID: ${result.id}] (Distance: ${result.distance.toFixed(4)})`);
        console.log(`   ${result.text}`);
        console.log();
    });
    await unloadModel({ modelId });
    db.close();
}
catch (error) {
    console.error("✖", error);
    process.exit(1);
}

Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.
