KvUnified

KvUnified is an internal llama.cpp flag that controls whether the engine uses a single unified KV cache buffer across input sequences when computing attention. Leave it at the default unless SDK guidance specifically tells you to change it.

Quick reference


Type	`bool?`
Default	`null` (use native default)
Category	KV cache (internal)
Field on	`ContextParameters.KvUnified`

What it does

The unified layout keeps the keys and values for all sequences in one shared buffer instead of separate per-sequence buffers. This can speed up some multi-sequence workloads and slow down others; the outcome depends on the backend and the workload.

null — native default. Usually correct.
true — force unified buffer.
false — force separate buffers.
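The three-state behavior above can be sketched in a few lines of C#. This is a minimal illustration, not the SDK's implementation: `KvUnifiedSketch`, `Resolve`, `Describe`, and `assumedNativeDefault` are hypothetical names, and the native default used here is an arbitrary placeholder rather than llama.cpp's actual value.

```csharp
using System;

// Placeholder only: the real native default is decided by llama.cpp, not by this sketch.
bool assumedNativeDefault = true;

// Walk through the three possible settings and show what each resolves to.
foreach (bool? setting in new bool?[] { null, true, false })
{
    bool effective = KvUnifiedSketch.Resolve(setting, assumedNativeDefault);
    Console.WriteLine($"{KvUnifiedSketch.Describe(setting)} -> unified = {effective}");
}

// Hypothetical helper, not part of the SDK or llama.cpp.
static class KvUnifiedSketch
{
    // null defers to the native default; true or false overrides it.
    public static bool Resolve(bool? setting, bool nativeDefault) => setting ?? nativeDefault;

    // Human-readable label for each of the three states.
    public static string Describe(bool? setting) => setting switch
    {
        true => "true (force unified buffer)",
        false => "false (force separate buffers)",
        _ => "null (native default)",
    };
}
```

The nullable `bool?` is what makes "no opinion" distinguishable from an explicit `false`, which is why the field is not a plain `bool`.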

When to change it

Scenario	Value
Default	`null`
Specific backend-tuning guidance from SDK docs	As instructed

Most workloads never touch this.

Example

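var preset = new Qwen25Preset();
// Default: KvUnified stays null, so the native default applies.
// preset.ContextParameters.KvUnified = null;
// preset.ContextParameters.KvUnified = false; // only if SDK guidance says to force separate buffers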

Interactions

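NSeqMax: the buffer layout only matters when more than one sequence is in play. The llama.cpp header comment for this flag suggests trying to disable the unified buffer when n_seq_max > 1 and the sequences do not share a large prefix.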

What’s next

Context parameters hub — all context knobs.
