# Multimodal

## Overview

[Text generation](/ai-capabilities/text-generation) supports multimodal prompts: you can attach media files to the inputs of [`completion()`](/reference/api#completion). A single request can include multiple attachments (e.g., to compare two images).

Compared with text-only completion, the key differences are:

* You must load a multimodal-capable LLM **and** its matching projection model, passed as `modelConfig.projectionModelSrc`, via [`loadModel()`](/reference/api#loadmodel).
* Your `history` messages can include `attachments: [{ path: "/path/to/image.jpg" }]`. Each attached file must exist on disk.
* Apart from attachments, you call `completion({ modelId, history, stream })` the same way and consume the same streaming output.

## Functions

Use the following sequence of function calls:

1. [`loadModel()`](/reference/api#loadmodel)
2. [`completion()`](/reference/api#completion)
3. [`unloadModel()`](/reference/api#unloadmodel)

For how to use each function, see [SDK — API reference](/reference/api/).

## Models

You must load two models:

* a `llama.cpp`-compatible, multimodal-capable LLM (file format: `*.gguf`); and
* a matching projection model (`mmproj-*.gguf`).

Recommended pairs:

* SmolVLM2 + mmproj-\*
* Qwen2.5-Omni + mmproj-\* (or Qwen3-VL + mmproj-\*)

For models available as constants, see [SDK — Models](/introduction#models).
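Because every attachment is a `{ path }` object that must point to an existing file, it can help to validate paths before calling `completion()`. The following is a minimal sketch, assuming the message shape used in the examples on this page; `buildImageMessage`, its `fileExists` parameter, and the `Attachment`/`HistoryMessage` types are hypothetical illustrations, not part of `@qvac/sdk`.

```typescript
import { existsSync } from "node:fs";

// One attachment, mirroring the `{ path: "..." }` objects used on this page.
// Any other fields the SDK may accept are not modeled here (assumption).
interface Attachment {
  path: string;
}

// One chat turn as shown in the examples; `role` is kept as a plain string
// because this page only shows "user".
interface HistoryMessage {
  role: string;
  content: string;
  attachments?: Attachment[];
}

// Hypothetical helper: builds a user message with one or more image
// attachments and throws if any file is missing. `fileExists` defaults to
// Node's existsSync and can be swapped out (e.g., in tests).
function buildImageMessage(
  prompt: string,
  imagePaths: string[],
  fileExists: (p: string) => boolean = existsSync,
): HistoryMessage {
  const missing = imagePaths.filter((p) => !fileExists(p));
  if (missing.length > 0) {
    throw new Error(`Attachment file(s) not found: ${missing.join(", ")}`);
  }
  return {
    role: "user",
    content: prompt,
    attachments: imagePaths.map((path) => ({ path })),
  };
}
```

You could then pass `[buildImageMessage("Compare these two images", [pathA, pathB])]` as `history` to `completion({ modelId, history, stream: true })`. Checking paths up front turns a missing file into a clear error raised before `completion()` is ever called.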
## Example

The following script runs multimodal completion on one image and, if you pass a second image path, compares the two images in a second request:

```js file=/packages/sdk/dist/examples/llamacpp-multimodal.js title="multimodal.js" lineNumbers
import {
  completion,
  loadModel,
  SMOLVLM2_500M_MULTIMODAL_Q8_0,
  MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
  unloadModel,
} from "@qvac/sdk";

if (process.argv.length < 3) {
  console.error(
    `▸ Specify an image file path as the first argument and, optionally, a second image file path as the second argument`,
  );
  process.exit(1);
}

try {
  const imageFilePath = process.argv[2];

  // Load the main model together with its projection model in a single step
  const modelId = await loadModel({
    modelSrc: SMOLVLM2_500M_MULTIMODAL_Q8_0,
    modelConfig: {
      ctx_size: 1024,
      projectionModelSrc: MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
    },
    onProgress: (p) => {
      // Render download progress in MB on stderr
      const mb = (n) => (n / 1e6).toFixed(1);
      const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
      process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
      if (p.percentage >= 100) process.stderr.write("\n");
    },
  });

  // Single image: describe it
  const history = [
    {
      role: "user",
      content: "What's in this image?",
      attachments: [{ path: imageFilePath }],
    },
  ];

  const result = completion({ modelId, history, stream: true });
  for await (const token of result.tokenStream) {
    process.stdout.write(token);
  }
  const stats = await result.stats;
  console.log("\n▸ Performance Stats:", stats);
  console.log("▸ --------------------------------");

  // Two images in one request: compare them (only if a second path was given)
  const imageFilePath2 = process.argv[3];
  if (imageFilePath2) {
    const history2 = [
      {
        role: "user",
        content: "Compare these two images",
        attachments: [{ path: imageFilePath }, { path: imageFilePath2 }],
      },
    ];

    const result2 = completion({ modelId, history: history2, stream: true });
    for await (const token of result2.tokenStream) {
      process.stdout.write(token);
    }
    const stats2 = await result2.stats;
    console.log("\n▸ Performance Stats:", stats2);
    console.log("▸ --------------------------------");
  } else {
    console.log(`▸ Only one image provided, skipping the comparison`);
  }

  // Unload the model in both cases
  await unloadModel({ modelId, clearStorage: false });
} catch (error) {
  console.error("✖", error);
  process.exit(1);
}
```

```ts file=/packages/sdk/examples/llamacpp-multimodal.ts title="multimodal.ts" lineNumbers
import {
  completion,
  loadModel,
  SMOLVLM2_500M_MULTIMODAL_Q8_0,
  MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
  unloadModel,
} from "@qvac/sdk";

if (process.argv.length < 3) {
  console.error(
    `▸ Specify an image file path as the first argument and, optionally, a second image file path as the second argument`,
  );
  process.exit(1);
}

try {
  const imageFilePath = process.argv[2]!;

  // Load the main model together with its projection model in a single step
  const modelId = await loadModel({
    modelSrc: SMOLVLM2_500M_MULTIMODAL_Q8_0,
    modelConfig: {
      ctx_size: 1024,
      projectionModelSrc: MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
    },
    onProgress: (p) => {
      // Render download progress in MB on stderr
      const mb = (n: number) => (n / 1e6).toFixed(1);
      const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
      process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
      if (p.percentage >= 100) process.stderr.write("\n");
    },
  });

  // Single image: describe it
  const history = [
    {
      role: "user",
      content: "What's in this image?",
      attachments: [{ path: imageFilePath }],
    },
  ];

  const result = completion({ modelId, history, stream: true });
  for await (const token of result.tokenStream) {
    process.stdout.write(token);
  }
  const stats = await result.stats;
  console.log("\n▸ Performance Stats:", stats);
  console.log("▸ --------------------------------");

  // Two images in one request: compare them (only if a second path was given)
  const imageFilePath2 = process.argv[3];
  if (imageFilePath2) {
    const history2 = [
      {
        role: "user",
        content: "Compare these two images",
        attachments: [{ path: imageFilePath }, { path: imageFilePath2 }],
      },
    ];

    const result2 = completion({ modelId, history: history2, stream: true });
    for await (const token of result2.tokenStream) {
      process.stdout.write(token);
    }
    const stats2 = await result2.stats;
    console.log("\n▸ Performance Stats:", stats2);
    console.log("▸ --------------------------------");
  } else {
    console.log(`▸ Only one image provided, skipping the comparison`);
  }

  // Unload the model in both cases
  await unloadModel({ modelId, clearStorage: false });
} catch (error) {
  console.error("✖", error);
  process.exit(1);
}
```

**Tip:** All examples in this documentation are self-contained and runnable. For instructions on how to run them, see [SDK quickstart](/quickstart).