# Multimodal (/ai-capabilities/multimodal)



## Overview

[Text generation](/ai-capabilities/text-generation) supports multimodal prompts: you can attach media files to the messages you pass to [`completion()`](/reference/api#completion). A single request can carry multiple attachments (e.g., to compare two images).

Compared to text-only completion, the key differences are:

* You must load a multimodal-capable LLM **and** its matching `projectionModelSrc` via [`loadModel()`](/reference/api#loadmodel).
* Your `history` messages can include `attachments: [{ path: "/path/to/image.jpg" }]` (the file must exist on disk).
* Aside from attachments, you still call `completion({ modelId, history, stream })` the same way and consume the same streaming output.
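As a minimal sketch of the message shape described above, the snippet below builds a `history` entry with zero or more attachments. `buildUserMessage` is a hypothetical helper written for this page, not part of the SDK, and it assumes that a text-only message can simply omit the `attachments` field.

```js
// Hypothetical helper (not part of the SDK): builds one user message
// in the shape used on this page.
function buildUserMessage(content, imagePaths = []) {
  const message = { role: "user", content };
  if (imagePaths.length > 0) {
    // Each attachment is an object with a `path` pointing to a file on disk.
    message.attachments = imagePaths.map((path) => ({ path }));
  }
  return message;
}

// Placeholder paths: replace them with files that exist on your machine.
const history = [
  buildUserMessage("Compare these two images.", ["/path/to/a.jpg", "/path/to/b.jpg"]),
];
console.log(JSON.stringify(history, null, 2));
```

The resulting `history` array can then be passed to `completion({ modelId, history, stream })` as in the full example further down.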

## Functions

Use the following sequence of function calls:

1. [`loadModel()`](/reference/api#loadmodel)
2. [`completion()`](/reference/api#completion)
3. [`unloadModel()`](/reference/api#unloadmodel)

For details on each function, see [SDK — API reference](/reference/api/).
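One way to keep this load, use, unload sequence intact even when a completion throws is to wrap it in `try`/`finally`. The sketch below is an illustration under assumptions: `withModel` is a hypothetical helper, not an SDK function, and it receives `loadModel` and `unloadModel` as arguments so that the pattern is visible on its own. The `unloadModel({ modelId, clearStorage: false })` call mirrors the one used in the example on this page.

```js
// Hypothetical helper (not part of the SDK). `sdk` is any object exposing
// loadModel and unloadModel, e.g. the functions imported from "@qvac/sdk".
async function withModel(sdk, loadOptions, run) {
  const modelId = await sdk.loadModel(loadOptions);
  try {
    // `run` receives the modelId and performs one or more completions.
    return await run(modelId);
  } finally {
    // Runs whether `run` resolved or threw.
    await sdk.unloadModel({ modelId, clearStorage: false });
  }
}

// Usage sketch (inside an ES module; `modelSrc`, `modelConfig`, and `history`
// are yours to define):
//   import { completion, loadModel, unloadModel } from "@qvac/sdk";
//   await withModel({ loadModel, unloadModel }, { modelSrc, modelConfig }, async (modelId) => {
//     const result = completion({ modelId, history, stream: true });
//     for await (const token of result.tokenStream) process.stdout.write(token);
//   });
```

Note that calling `process.exit()` inside the callback ends the process before the `finally` block runs, so return from the callback instead if the model should be unloaded first.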

## Models

You need two model files, both in GGUF format (`*.gguf`):

* a `llama.cpp`-compatible, multimodal-capable LLM; and
* a matching projection model (`mmproj-*.gguf`).

Recommended pairs:

* SmolVLM2 + mmproj-\*
* Qwen2.5-Omni + mmproj-\* (or Qwen3-VL + mmproj-\*)

For models available as constants, see [SDK — Models](/introduction#models).

## Example

The following script runs a multimodal completion on one image and, if a second image path is given, a second completion that compares both images:

<Tabs>
  <Tab value="js" label="JavaScript" default>
    <WrapCode>
      ```js file=<rootDir>/packages/sdk/dist/examples/llamacpp-multimodal.js title="multimodal.js" lineNumbers
      import { completion, loadModel, SMOLVLM2_500M_MULTIMODAL_Q8_0, MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0, unloadModel, } from "@qvac/sdk";
      if (process.argv.length < 3) {
          console.error(`▸ Specify an image file path as the first argument and a second image file path as the second (optional) argument`);
          process.exit(1);
      }
      try {
          const imageFilePath = process.argv[2];
          // Load the LLM and its projection model in a single loadModel() call
          const modelId = await loadModel({
              modelSrc: SMOLVLM2_500M_MULTIMODAL_Q8_0,
              modelConfig: {
                  ctx_size: 1024,
                  projectionModelSrc: MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
              },
              onProgress: (p) => {
                  const mb = (n) => (n / 1e6).toFixed(1);
                  const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
                  process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
                  if (p.percentage >= 100)
                      process.stderr.write("\n");
              },
          });
          // First request: a single attachment
          const history = [
              {
                  role: "user",
                  content: "What's in this image?",
                  attachments: [{ path: imageFilePath }],
              },
          ];
          const result = completion({ modelId, history, stream: true });
          for await (const token of result.tokenStream) {
              process.stdout.write(token);
          }
          const stats = await result.stats;
          console.log("\n▸ Performance Stats:", stats);
          console.log("▸ --------------------------------");
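          // Second request: multiple attachments (runs only if a second image path is given)
          if (process.argv.length < 4) {
              console.log(`▸ Only one image provided, terminating`);
              // Release the model before exiting early
              await unloadModel({ modelId, clearStorage: false });
              process.exit(0);
          }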
          const imageFilePath2 = process.argv[3];
          const history2 = [
              {
                  role: "user",
                  content: "Compare the two newspaper articles",
                  attachments: [{ path: imageFilePath }, { path: imageFilePath2 }],
              },
          ];
          const result2 = completion({ modelId, history: history2, stream: true });
          for await (const token of result2.tokenStream) {
              process.stdout.write(token);
          }
          const stats2 = await result2.stats;
          console.log("\n▸ Performance Stats:", stats2);
          console.log("▸ --------------------------------");
          await unloadModel({ modelId, clearStorage: false });
      }
      catch (error) {
          console.error("✖", error);
          process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>

  <Tab value="ts" label="TypeScript">
    <WrapCode>
      ```ts file=<rootDir>/packages/sdk/examples/llamacpp-multimodal.ts title="multimodal.ts" lineNumbers
      import {
        completion,
        loadModel,
        SMOLVLM2_500M_MULTIMODAL_Q8_0,
        MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
        unloadModel,
      } from "@qvac/sdk";

      if (process.argv.length < 3) {
        console.error(
          `▸ Specify an image file path as the first argument and a second image file path as the second (optional) argument`,
        );
        process.exit(1);
      }

      try {
        const imageFilePath = process.argv[2]!;

        // Load the LLM and its projection model in a single loadModel() call
        const modelId = await loadModel({
          modelSrc: SMOLVLM2_500M_MULTIMODAL_Q8_0,
          modelConfig: {
            ctx_size: 1024,
            projectionModelSrc: MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
          },
          onProgress: (p) => {
            const mb = (n: number) => (n / 1e6).toFixed(1);
            const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
            process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
            if (p.percentage >= 100) process.stderr.write("\n");
          },
        });

        // First request: a single attachment
        const history = [
          {
            role: "user",
            content: "What's in this image?",
            attachments: [{ path: imageFilePath }],
          },
        ];
        const result = completion({ modelId, history, stream: true });

        for await (const token of result.tokenStream) {
          process.stdout.write(token);
        }

        const stats = await result.stats;

        console.log("\n▸ Performance Stats:", stats);

        console.log("▸ --------------------------------");

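        // Second request: multiple attachments (runs only if a second image path is given)
        if (process.argv.length < 4) {
          console.log(`▸ Only one image provided, terminating`);
          // Release the model before exiting early
          await unloadModel({ modelId, clearStorage: false });
          process.exit(0);
        }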

        const imageFilePath2 = process.argv[3]!;

        const history2 = [
          {
            role: "user",
            content: "Compare the two newspaper articles",
            attachments: [{ path: imageFilePath }, { path: imageFilePath2 }],
          },
        ];

        const result2 = completion({ modelId, history: history2, stream: true });

        for await (const token of result2.tokenStream) {
          process.stdout.write(token);
        }

        const stats2 = await result2.stats;

        console.log("\n▸ Performance Stats:", stats2);

        console.log("▸ --------------------------------");

        await unloadModel({ modelId, clearStorage: false });
      } catch (error) {
        console.error("✖", error);
        process.exit(1);
      }
      ```
    </WrapCode>
  </Tab>
</Tabs>
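The script above writes each token to stdout as it arrives. If you also need the full reply as a single string (for example, to store it), a small helper along these lines may be enough. `collectTokens` is a hypothetical name, not an SDK function; the only assumption it makes is the one visible in the script, namely that `tokenStream` is an async iterable of strings.

```js
// Hypothetical helper (not part of the SDK): drains an async iterable of
// string tokens, such as `result.tokenStream`, and returns the joined text.
async function collectTokens(tokenStream, onToken = () => {}) {
  let text = "";
  for await (const token of tokenStream) {
    onToken(token); // optional per-token callback, e.g. for live printing
    text += token;
  }
  return text;
}

// Usage sketch:
//   const result = completion({ modelId, history, stream: true });
//   const reply = await collectTokens(result.tokenStream, (t) => process.stdout.write(t));
```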

<Callout type="success">
  **Tip:** all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see [SDK quickstart](/quickstart).
</Callout>
