Delegated inference
Perform peer-to-peer inference delegation out of the box via the Holepunch stack, enabling resource sharing.
Overview
Delegated inference lets a consumer send inference requests to a remote provider peer-to-peer over the Hyperswarm DHT. Use it when an inference requires more resources than the local device can provide.
Connectivity is direct: the consumer opens a dht.connect(providerPublicKey) connection straight to the provider, with no topic or discovery phase. The provider exposes itself on the DHT via its keypair, and consumers connect by public key.
Delegation is configured at model-load time by passing a delegate object to loadModel(). The provider is started separately using startQVACProvider().
Functions
Provider:
- startQVACProvider(): bind the DHT server on the provider's keypair and start accepting delegated requests
- stopQVACProvider(): stop accepting requests
Consumer:
- loadModel(): with the delegate option
- completion() / transcribe() / translate() / etc.: same as local
- unloadModel()
For how to use each function, see SDK — API reference.
Provider
Binds the DHT server on its keypair and serves delegated requests. It publishes its public key; consumers use that key to connect directly.
Consumer
Creates a delegated model via loadModel({ delegate: ... }).
Main delegate options:
- providerPublicKey: provider public key (required)
- timeout: request timeout in ms (optional). Use a generous value (e.g. 60_000) on the first call, because cold-DHT bootstrap can take 15–45s. Subsequent calls reuse the open socket and are sub-second.
- fallbackToLocal: if true, run locally when delegation fails (optional)
- forceNewConnection: if true, do not reuse cached connections (optional)
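The options above can be assembled and sanity-checked before they are handed to loadModel(). The following is a minimal sketch, not part of the SDK: the DelegateOptions interface and the buildDelegateOptions and isHexKey helpers are hypothetical names, and the check assumes public keys are passed as 64-character hex strings (32 bytes), which this page does not state explicitly.

```typescript
// Hypothetical shape mirroring the options listed above; not the SDK's exported type.
interface DelegateOptions {
  providerPublicKey: string;
  timeout?: number;
  fallbackToLocal?: boolean;
  forceNewConnection?: boolean;
}

// Generous first-call timeout suggested by this page for a cold DHT.
const COLD_START_TIMEOUT_MS = 60_000;

// Assumption: keys are 64-character hex strings (32 bytes).
function isHexKey(value: string): boolean {
  return /^[0-9a-fA-F]{64}$/.test(value);
}

// Builds a delegate object, defaulting to the cold-start timeout unless overridden.
function buildDelegateOptions(
  providerPublicKey: string,
  overrides: Partial<Omit<DelegateOptions, "providerPublicKey">> = {},
): DelegateOptions {
  if (!isHexKey(providerPublicKey)) {
    throw new Error("providerPublicKey must be a 64-character hex string");
  }
  return { providerPublicKey, timeout: COLD_START_TIMEOUT_MS, ...overrides };
}
```

The result would be passed as the delegate field, for example loadModel({ modelSrc, delegate: buildDelegateOptions(key, { fallbackToLocal: true }) }). Failing early on a malformed key avoids waiting out the full timeout for a connection that cannot succeed.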
Examples
Consumer
The following script shows an example of a consumer that delegates completion() requests to a provider:
import { completion, LLAMA_3_2_1B_INST_Q4_0, loadModel, close } from "@qvac/sdk";
const providerPublicKey = process.argv[2];
if (!providerPublicKey) {
  console.error("✖ Provider public key is required. Usage: node consumer.ts <provider-public-key> [consumer-seed]");
  process.exit(1);
}
try {
  // Optional: consumer seed for a deterministic consumer identity (for firewall testing)
  const consumerSeed = process.argv[3];
  console.log(`▸ Testing delegated inference`);
  console.log(`▸ Provider: ${providerPublicKey}`);
  if (consumerSeed) {
    // Set the variable only when a seed was given; assigning undefined
    // to process.env would store the literal string "undefined".
    process.env["QVAC_HYPERSWARM_SEED"] = consumerSeed;
    console.log(`▸ Consumer seed: ${consumerSeed.substring(0, 16)}... (deterministic identity)`);
  } else {
    console.log(`▸ No consumer seed provided (random identity)`);
  }
  const modelId = await loadModel({
    modelSrc: LLAMA_3_2_1B_INST_Q4_0,
    delegate: {
      providerPublicKey,
      // Generous timeout for the first call on a cold DHT: bootstrapping
      // hyperdht and looking up the provider's key can take 15–45s on the
      // very first run. Subsequent connections in the same process are
      // sub-second because the DHT is already warm.
      timeout: 60_000,
      fallbackToLocal: true, // Optional: fall back to local inference if delegation fails
      // forceNewConnection: true, // Optional: force a new connection instead of reusing a cached one
    },
    // Render download progress on a single line when stderr is a TTY
    onProgress: (p) => {
      const mb = (n) => (n / 1e6).toFixed(1);
      const line = `▸ Downloading ${p.percentage.toFixed(0)}% (${mb(p.downloaded)}/${mb(p.total)} MB)`;
      process.stderr.write(process.stderr.isTTY ? `\r${line}` : `${line}\n`);
      if (p.percentage >= 100) process.stderr.write("\n");
    },
  });
  console.log(`▸ Delegated model registered: ${modelId}`);
  const response = completion({
    modelId,
    history: [{ role: "user", content: "Hello!" }],
    stream: true,
  });
  // Print tokens as they arrive from the provider
  for await (const token of response.tokenStream) {
    process.stdout.write(token);
  }
  console.log("\n▸ Stats:", await response.stats);
  console.log("▸ Delegation infrastructure working! Server correctly detected and routed the delegated request.");
  void close();
} catch (error) {
  console.error("✖", error);
  process.exit(1);
}
Provider
The following script shows an example of starting a provider and printing its publicKey for consumers:
import { startQVACProvider } from "@qvac/sdk";
// Optional: Seed for deterministic provider identity (64-character hex string)
const seed = process.argv[2];
if (seed) {
  process.env["QVAC_HYPERSWARM_SEED"] = seed;
}
// Optional: Consumer public key for firewall (allow only this consumer)
const allowedConsumerPublicKey = process.argv[3];
console.log(`▸ Starting provider service...`);
try {
  if (allowedConsumerPublicKey) {
    console.log(`▸ Firewall enabled: only allowing consumer ${allowedConsumerPublicKey}`);
  }
  const response = await startQVACProvider({
    firewall: allowedConsumerPublicKey
      ? {
          mode: "allow",
          publicKeys: [allowedConsumerPublicKey],
        }
      : undefined,
  });
  console.log("▸ Provider service started successfully!");
  console.log("▸ Provider is now available for delegated inference requests");
  console.log("");
  console.log("▸ Connection Details:");
  console.log(`▸ Provider Public Key: ${response.publicKey}`);
  console.log("");
  console.log("▸ Consumer command:");
  console.log(` node consumer.ts ${response.publicKey}`);
  console.log("");
  console.log("▸ To reproduce this provider identity:");
  console.log(` node provider.ts ${seed || "<random-seed>"}`);
  if (!seed) {
    console.log(" (Note: seed was random this time, set one for reproducible identity)");
  }
  console.log("");
  console.log("▸ For firewall testing:");
  console.log(" 1. Generate a consumer seed (64-char hex)");
  console.log(" 2. Get consumer public key: getConsumerPublicKey(consumerSeed)");
  console.log(" 3. Restart provider with consumer public key as 2nd argument");
  console.log(` 4. Run consumer with: node consumer.ts ${response.publicKey} <consumer-seed>`);
  console.log("▸ Provider is running... Press Ctrl+C to stop");
  // Keep the process alive until the user interrupts it
  process.on("SIGINT", () => {
    console.log("\n▸ Provider service stopped");
    process.exit(0);
  });
  process.stdin.resume();
} catch (error) {
  console.error("✖", error);
  process.exit(1);
}
Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.
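Both scripts accept an optional 64-character hex seed for a deterministic identity. One way to produce such a seed is sketched below using Node's built-in crypto module. The generateSeed helper is a hypothetical name, and the sketch assumes that any 32 random bytes, hex-encoded, are an acceptable value for QVAC_HYPERSWARM_SEED.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: 32 cryptographically random bytes, hex-encoded,
// which yields the 64-character hex string the example scripts expect.
function generateSeed(): string {
  return randomBytes(32).toString("hex");
}

const exampleSeed = generateSeed();
console.log(`Seed: ${exampleSeed}`);
```

Store the seed somewhere private and pass the same value on every run to keep the same identity. Anyone who holds the seed can presumably reproduce that identity, so it is best treated like a secret key.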
Notes
- Consumers do not handle reconnection automatically yet. If the provider restarts, restart the consumer.
- To stop a running provider, call stopQVACProvider().
- When starting the provider, you can optionally set a firewall rule to allow/deny specific consumer public keys.
- Cold-start DHT bootstrap on the first connect can take 15–45s; subsequent connections in the same process are sub-second.
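Because consumers do not reconnect automatically, calling code that wants to tolerate transient failures has to retry on its own. The following is a generic retry-with-backoff sketch, not an SDK feature: withRetry and its parameters are hypothetical. This page does not say whether retrying in the same process recovers from a provider restart, so the sketch only illustrates the pattern.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
// The delay doubles after each failed attempt: base, 2x base, 4x base, ...
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}

// Usage (assumption: loadDelegatedModel is your own wrapper around loadModel):
//   const modelId = await withRetry(() => loadDelegatedModel(), 3, 2_000);
```

Keep the attempt count small when the per-call timeout is already generous, since each failed attempt may wait out the full timeout before the next one starts.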