RAG
Out-of-the-box retrieval-augmented generation workflow.
Overview
RAG (retrieval-augmented generation) uses text embeddings: you embed (vectorize) documents, persist them in a vector store, and later retrieve the most relevant chunks for a query using similarity search.
Compared to generating text embeddings only, the key differences are:
- You must persist embeddings (vectors) alongside the original text (and optional metadata) in a vector store.
- At query time, you embed the query and run top‑K vector search to fetch the most relevant documents/chunks.
- You typically pass the retrieved text to completion() as context to ground the model's answer (this retrieval step is what makes it RAG).
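As a rough illustration of the query-time retrieval step, the sketch below ranks stored vectors against a query vector by cosine similarity and keeps the top K. It is plain JavaScript with made-up 3-dimensional vectors; `cosineSimilarity`, `topK`, and the `store` array are hypothetical names used only for illustration and are not part of the QVAC SDK. A real vector store would typically use an index instead of a linear scan.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Linear-scan top-K search: score every entry, sort by descending score, keep k.
function topK(store, queryVector, k) {
  return store
    .map((entry) => ({
      text: entry.text,
      score: cosineSimilarity(entry.vector, queryVector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy "vector store": original text kept alongside its (made-up) embedding.
const store = [
  { text: "Machine learning learns from data.", vector: [0.9, 0.1, 0.0] },
  { text: "Blockchain is a decentralized ledger.", vector: [0.0, 0.2, 0.9] },
  { text: "Deep learning uses neural networks.", vector: [0.8, 0.3, 0.1] },
];

// In a real pipeline this would be the embedding of the user's query.
const queryVector = [1.0, 0.2, 0.0];
const hits = topK(store, queryVector, 2);
console.log(hits.map((h) => h.text));
```

The two machine-learning sentences rank above the blockchain one because their toy vectors point in nearly the same direction as the query vector.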
Functions
- loadModel() (with modelType: "embeddings")
- embed(): generate vectors
- Use any combination of the RAG functions below as needed:
  - ragChunk(): chunk documents
  - ragIngest(): embed and store (requires model)
  - ragSaveEmbeddings(): save pre-computed vectors
  - ragSearch(): query similar documents (requires model)
  - ragReindex(): optimize search index
  - ragDeleteEmbeddings(): remove documents
  - ragListWorkspaces(): list workspaces
  - ragCloseWorkspace(): release resources
  - ragDeleteWorkspace(): delete workspace and data
- unloadModel()
For how to use each function, see SDK — API reference.
Pipeline
Create your RAG pipeline using text embeddings, the functions above, and completion.
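The last stage of such a pipeline, turning retrieved chunks into context for the model, might look like the sketch below. It assumes search results shaped like `{ content, score }`, which is how the built-in example on this page reads them. `buildGroundedPrompt` and the prompt wording are hypothetical, and the sample scores are invented. How the resulting string is handed to completion() is left out, since its signature is documented in the API reference rather than here.

```javascript
// Build a grounded prompt from retrieved chunks.
// `results` is assumed to be an array of { content, score } objects.
function buildGroundedPrompt(query, results) {
  // Number each chunk so the model (and the reader) can refer back to it.
  const context = results
    .map((r, i) => `[${i + 1}] ${r.content}`)
    .join("\n");
  return [
    "Answer the question using only the context below.",
    "",
    "Context:",
    context,
    "",
    `Question: ${query}`,
  ].join("\n");
}

// Stand-in for search output; the scores here are invented for illustration.
const retrieved = [
  { content: "Machine learning is a subset of artificial intelligence.", score: 0.91 },
  { content: "Deep learning uses neural networks with multiple layers.", score: 0.87 },
];

const prompt = buildGroundedPrompt("What is machine learning?", retrieved);
console.log(prompt);
```

Keeping prompt assembly in a small pure function makes it easy to swap the retrieval backend (built-in workspace or external DB) without touching the generation step.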
Regarding vector storage, you may use:
- Built-in vector store: use the RAG workspace functions (ragIngest(), ragSearch(), etc.). QVAC persists the vectors for you, so this is the simplest path.
- External vector DB: use the embedding primitives (ragChunk(), embed()) and store the vectors wherever you want (e.g., LanceDB, ChromaDB, SQLite-Vector). Choose this when you need custom persistence, filtering, or integration with an existing database.
Important: when using an external vector DB, make sure its schema matches the embedding dimensionality produced by your model (e.g., GTE Large embeddings used in the example are 1024‑dimensional).
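One way to catch a dimensionality mismatch early is to validate each vector before writing it to the external store. The sketch below is a hypothetical guard (`assertDimension` and `EXPECTED_DIMENSION` are illustrative names, not SDK functions), and the zero-filled arrays stand in for real embed() output.

```javascript
// Expected dimensionality of the vector column (1024 for the GTE Large example).
const EXPECTED_DIMENSION = 1024;

// Returns the embedding unchanged, or throws if it cannot fit the column.
function assertDimension(embedding, expected = EXPECTED_DIMENSION) {
  if (!Array.isArray(embedding) && !ArrayBuffer.isView(embedding)) {
    throw new TypeError("embedding must be an array or typed array of numbers");
  }
  if (embedding.length !== expected) {
    throw new RangeError(
      `embedding has ${embedding.length} dimensions, expected ${expected}`
    );
  }
  return embedding;
}

// Stand-ins for real embeddings: one of the right size, one of the wrong size.
const good = new Array(1024).fill(0);
const bad = new Array(768).fill(0);

assertDimension(good); // passes silently

let message = "";
try {
  assertDimension(bad);
} catch (error) {
  message = error.message;
}
console.log(message);
```

Failing at insert time is usually easier to debug than a search that silently returns poor or empty results because the index was built for a different vector size.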
Examples
Built-in vector store
The following script shows how to ingest documents into a built-in RAG workspace with ragIngest() and query them with ragSearch(), without setting up an external vector DB:
import {
  loadModel,
  unloadModel,
  GTE_LARGE_FP16,
  ragIngest,
  ragSearch,
  ragCloseWorkspace,
} from "@qvac/sdk";

try {
  // Get query from command line or use default
  const query = process.argv[2] || "machine learning algorithms";
  const workspace = "ingest-example";
  console.log(`▸ Query: "${query}"`);

  // Load the embedding model, reporting download progress on stderr
  const modelId = await loadModel({
    modelSrc: GTE_LARGE_FP16,
    onProgress: (p) => {
      const mb = (n) => (n / 1e6).toFixed(1);
      const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
      process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
      if (p.percentage >= 100)
        process.stderr.write("\n");
    },
  });

  const samples = [
    "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn and make predictions from data without being explicitly programmed for every task.",
    "Deep learning uses neural networks with multiple layers to process and learn from complex data patterns, enabling breakthroughs in image recognition and natural language processing.",
    "Natural language processing combines computational linguistics with machine learning to help computers understand, interpret, and generate human language in a meaningful way.",
    "Computer vision enables machines to interpret and understand visual information from the world, using techniques like image classification, object detection, and facial recognition.",
    "Quantum computing leverages quantum mechanical phenomena to process information in fundamentally different ways than classical computers, potentially solving certain problems exponentially faster.",
    "Blockchain technology creates decentralized, immutable ledgers that enable secure peer-to-peer transactions without requiring a central authority or intermediary.",
    "Cloud computing delivers computing services over the internet, allowing users to access resources like storage, processing power, and applications on-demand from anywhere.",
    "Cybersecurity protects digital systems, networks, and data from malicious attacks, unauthorized access, and various forms of cyber threats through multiple layers of defense.",
  ];

  // Embed and store the documents in the built-in workspace
  console.log("▸ Ingesting documents...");
  const result = await ragIngest({
    modelId,
    workspace,
    documents: samples,
    chunk: false,
  });
  console.log(`▸ Ingested ${result.processed.length} documents`);

  // Embed the query and fetch the 3 most similar documents
  console.log("▸ Searching for similar documents...");
  const results = await ragSearch({
    modelId,
    workspace,
    query,
    topK: 3,
  });
  console.log("▸ Top 3 most similar documents:");
  results.forEach((result, index) => {
    console.log(`${index + 1}. (Score: ${result.score})`);
    console.log(` ${result.content}`);
    console.log();
  });

  // Cleanup: close and delete workspace
  await ragCloseWorkspace({ workspace, deleteOnClose: true });
  console.log(`▸ Deleted '${workspace}' workspace`);
  await unloadModel({ modelId });
} catch (error) {
  console.error("✖", error);
  process.exit(1);
}

External vector DB
The following script shows the same workflow using embed() plus a SQLite vector index:
import { embed, loadModel, unloadModel, GTE_LARGE_FP16 } from "@qvac/sdk";
import sqlite3InitModule from "@sqliteai/sqlite-wasm";

try {
  // Get query from command line or use default
  const query = process.argv[2] || "machine learning algorithms";
  console.log(`▸ Query: "${query}"`);

  // Initialize SQLite with vector extension
  const sqlite3 = await sqlite3InitModule();
  const db = new sqlite3.oo1.DB(":memory:", "c");

  // Load the embedding model, reporting download progress on stderr
  const modelId = await loadModel({
    modelSrc: GTE_LARGE_FP16,
    onProgress: (p) => {
      const mb = (n) => (n / 1e6).toFixed(1);
      const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
      process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
      if (p.percentage >= 100)
        process.stderr.write("\n");
    },
  });

  const samples = [
    {
      id: 1,
      text: "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn and make predictions from data without being explicitly programmed for every task.",
    },
    {
      id: 2,
      text: "Deep learning uses neural networks with multiple layers to process and learn from complex data patterns, enabling breakthroughs in image recognition and natural language processing.",
    },
    {
      id: 3,
      text: "Natural language processing combines computational linguistics with machine learning to help computers understand, interpret, and generate human language in a meaningful way.",
    },
    {
      id: 4,
      text: "Computer vision enables machines to interpret and understand visual information from the world, using techniques like image classification, object detection, and facial recognition.",
    },
    {
      id: 5,
      text: "Quantum computing leverages quantum mechanical phenomena to process information in fundamentally different ways than classical computers, potentially solving certain problems exponentially faster.",
    },
    {
      id: 6,
      text: "Blockchain technology creates decentralized, immutable ledgers that enable secure peer-to-peer transactions without requiring a central authority or intermediary.",
    },
    {
      id: 7,
      text: "Cloud computing delivers computing services over the internet, allowing users to access resources like storage, processing power, and applications on-demand from anywhere.",
    },
    {
      id: 8,
      text: "Cybersecurity protects digital systems, networks, and data from malicious attacks, unauthorized access, and various forms of cyber threats through multiple layers of defense.",
    },
  ];

  // Create table for documents with vector storage
  db.exec(`
    CREATE TABLE IF NOT EXISTS documents (
      id INTEGER PRIMARY KEY,
      text TEXT NOT NULL,
      embedding BLOB NOT NULL
    )
  `);

  // Embed each document and store its text together with its vector
  console.log("▸ Embedding documents...");
  for (const sample of samples) {
    const { embedding } = await embed({ modelId, text: sample.text });
    db.exec({
      sql: "INSERT INTO documents VALUES (?, ?, vector_as_f32(?))",
      bind: [sample.id, sample.text, JSON.stringify(embedding)],
    });
  }

  // Initialize and optimize vector index (dimension must match the model: 1024 for GTE Large)
  db.exec(`SELECT vector_init('documents', 'embedding', 'type=FLOAT32,dimension=1024')`);
  // Quantize vectors
  db.exec(`SELECT vector_quantize('documents', 'embedding')`);
  // [Optional] Preload quantized vectors in memory for optimal performance
  db.exec(`SELECT vector_quantize_preload('documents', 'embedding')`);

  // Search for similar documents
  console.log("▸ Searching for similar documents...");
  const { embedding: queryEmbedding } = await embed({ modelId, text: query });
  const results = [];

  // Perform vector search; order explicitly so the closest match is listed first
  db.exec({
    sql: `
      SELECT d.id, d.text, v.distance
      FROM documents d
      JOIN vector_quantize_scan('documents', 'embedding', vector_as_f32(?), 3) v
      ON d.id = v.rowid
      ORDER BY v.distance
    `,
    bind: [JSON.stringify(queryEmbedding)],
    rowMode: "object",
    callback: (row) => {
      results.push(row);
    },
  });

  // The value returned is a distance, so lower means more similar
  console.log("\n▸ Top 3 most similar documents:");
  results.forEach((result, index) => {
    console.log("=".repeat(50) + " Top result:");
    console.log(`\n${index + 1}. [ID: ${result.id}] (Distance: ${result.distance.toFixed(4)})`);
    console.log(` ${result.text}`);
    console.log("=".repeat(100));
    console.log();
  });

  await unloadModel({ modelId });
  db.close();
} catch (error) {
  console.error("✖", error);
  process.exit(1);
}

Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.