OpOffload

OpOffload toggles offloading of host-side tensor operations to the GPU device. It supplements GpuLayers: it affects specific small operations that would otherwise run on the CPU even when layer offload is active.

Quick reference


Type	`bool?`
Default	`null` (use native default)
Category	GPU offload (auxiliary)
Field on	`ContextParameters.OpOffload`

What it does

Some tensor operations (for example embedding lookups and small reductions) are relatively cheap and traditionally run on the host. OpOffload lets the engine run them on the device as well, at the cost of a small amount of host-device overhead.

null — native default. Modern GPU backends usually benefit from true.
true — offload.
false — keep on host.
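
The three states above can be sketched as a plain nullable-bool resolution. This is a minimal illustration, not the library's implementation: `OpOffloadSketch`, `Resolve`, and the `nativeDefault` value are hypothetical names and assumptions introduced here, and the real native default depends on the backend.

```csharp
using System;

// Hypothetical helper, not part of the library: models how a nullable
// OpOffload value could be resolved against a backend's native default.
static class OpOffloadSketch
{
    public static bool Resolve(bool? opOffload, bool nativeDefault)
    {
        // null defers to the native default; true or false overrides it.
        return opOffload ?? nativeDefault;
    }

    static void Main()
    {
        // Assumed for illustration only; the actual default is backend-specific.
        bool nativeDefault = true;

        Console.WriteLine(Resolve(null, nativeDefault));  // follows the native default
        Console.WriteLine(Resolve(true, nativeDefault));  // explicit offload
        Console.WriteLine(Resolve(false, nativeDefault)); // explicit keep-on-host
    }
}
```

The point of the `bool?` type is that leaving the field unset is distinct from setting it to false: only an explicit value overrides whatever the native layer would choose.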

When to change it

Scenario	Value
Default	`null`
Benchmarking GPU-centric paths	`true`
Debugging device-specific issues	`false`

Example

var preset = new Qwen25Preset();
preset.BaseModelInferenceParameters.GpuLayers = 999;  // large value to request offload of all layers
preset.ContextParameters.OpOffload = true;  // also offload host-side tensor operations to the device

Interactions

GpuLayers — primary offload control.
OffloadKqv — offload control specific to the KV cache.

What’s next

GpuLayers — primary layer offload.
Context parameters hub — all context knobs.
