KvUnified
KvUnified is an internal llama.cpp flag that controls whether the engine uses a single unified buffer for the KV cache across input sequences when computing attention. Leave it at the default unless SDK guidance specifically instructs otherwise.
Quick reference
| Property | Value |
|---|---|
| Type | bool? |
| Default | null (use native default) |
| Category | KV cache (internal) |
| Field on | ContextParameters.KvUnified |
What it does
The unified buffer layout can optimize some multi-sequence scenarios by colocating K and V for all sequences in one memory block. Whether this helps or hurts depends on backend and workload.
- null: native default. Usually correct.
- true: force unified buffer.
- false: force separate buffers.
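The three states of the nullable flag can be sketched as a small resolution step. This is only an illustration of the tri-state semantics described above, not the SDK's implementation: `KvUnifiedSketch`, `Resolve`, `Describe`, and the `nativeDefault` value are hypothetical names and placeholders introduced here.

```csharp
using System;

// Hypothetical stand-in for illustration only; not an SDK type.
public static class KvUnifiedSketch
{
    // null defers to the native default; true or false overrides it.
    public static bool Resolve(bool? requested, bool nativeDefault)
        => requested ?? nativeDefault;

    // Human-readable label for each of the three states.
    public static string Describe(bool? requested) => requested switch
    {
        null  => "native default",
        true  => "force unified buffer",
        false => "force separate buffers",
    };
}

public static class Program
{
    public static void Main()
    {
        // Assumed placeholder: the real default is chosen by the native engine.
        const bool nativeDefault = true;

        foreach (bool? requested in new bool?[] { null, true, false })
        {
            bool effective = KvUnifiedSketch.Resolve(requested, nativeDefault);
            Console.WriteLine($"{KvUnifiedSketch.Describe(requested)} -> unified = {effective}");
        }
    }
}
```

The point of the nullable type is that "not set" stays distinguishable from an explicit false, so the native layer can keep its own default when the application expresses no preference.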
When to change it
| Scenario | Value |
|---|---|
| Default | null |
| Specific backend-tuning guidance from SDK docs | As instructed |
Most workloads never touch this.
Example
```csharp
var preset = new Qwen25Preset();
// preset.ContextParameters.KvUnified = null; // default
```
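If SDK guidance for a specific backend does call for an explicit value, the override would presumably be a one-line assignment on the same property. The fragment below is a configuration sketch that reuses only the names from the example above. The value `false` is illustrative, not a recommendation, and the namespace imports for `Qwen25Preset` are omitted because they are not given here.

```csharp
var preset = new Qwen25Preset();

// Illustrative only: force separate per-sequence buffers.
// Apply an explicit value only when SDK guidance calls for it.
preset.ContextParameters.KvUnified = false;

// Assigning null again restores the native default.
// preset.ContextParameters.KvUnified = null;
```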
Interactions
- NSeqMax: multi-sequence scenarios may interact with the unified-buffer layout.
What’s next
- Context parameters hub — all context knobs.