OpOffload
OpOffload toggles offloading of host-side tensor operations to the GPU device. This is supplementary to GpuLayers and affects specific small operations that would otherwise run on CPU even with GPU offload active.
Quick reference
| Property | Value |
|---|---|
| Type | `bool?` |
| Default | `null` (use native default) |
| Category | GPU offload (auxiliary) |
| Field on | `ContextParameters.OpOffload` |
What it does
Some tensor operations (embedding lookups, small reductions) are relatively cheap and traditionally run on the host. OpOffload lets the engine offload them to the device as well, at the cost of a small amount of host-device transfer overhead.
- `null`: native default. Modern GPU backends usually benefit from `true`.
- `true`: offload these operations to the device.
- `false`: keep them on the host.
When to change it
| Scenario | Value |
|---|---|
| Default | null |
| Benchmarking GPU-centric paths | true |
| Debugging device-specific issues | false |
Example
```csharp
var preset = new Qwen25Preset();
preset.BaseModelInferenceParameters.GpuLayers = 999;
preset.ContextParameters.OpOffload = true; // ensure small host-side ops also run on the device
```
Interactions
- `GpuLayers`: primary offload control.
- `OffloadKqv`: KV cache specific.
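The three knobs above can be set together. The following is a minimal configuration sketch, assuming the preset type and property paths shown earlier on this page; it is illustrative, not the library's only valid combination:

```csharp
// Hedged sketch: assumes Qwen25Preset and the property paths documented above.
var preset = new Qwen25Preset();

// Primary control: offload all transformer layers to the GPU.
preset.BaseModelInferenceParameters.GpuLayers = 999;

// Keep the KV cache on the device (OffloadKqv, referenced in Interactions).
preset.ContextParameters.OffloadKqv = true;

// Auxiliary control: also offload small host-side tensor operations.
preset.ContextParameters.OpOffload = true;

// To accept the backend's native default instead, leave it unset:
// preset.ContextParameters.OpOffload = null;
```

Setting `OpOffload` without GPU layers offloaded has little effect, since the bulk of the work would still run on the CPU.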
What’s next
- GpuLayers — primary layer offload.
- Context parameters hub — all context knobs.