KvUnified

KvUnified is an internal llama.cpp flag that controls whether the engine uses a single unified KV-cache buffer across input sequences when computing attention. Leave it at the default unless SDK guidance specifically instructs otherwise.

Quick reference

Type        bool?
Default     null (use native default)
Category    KV cache (internal)
Field on    ContextParameters.KvUnified

What it does

The unified layout keeps K and V for all sequences in one memory block, which can help some multi-sequence workloads. Whether it helps or hurts depends on the backend and the workload, so there is no universally better setting.

  • null — native default. Usually correct.
  • true — force unified buffer.
  • false — force separate buffers.
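
The three values above behave like a tri-state switch: null defers to the engine, while true and false override it. The following is a minimal sketch of that resolution, not the SDK's actual implementation. ContextParametersStub, KvUnifiedSketch, Resolve, and the assumedNativeDefault value are all hypothetical names introduced for illustration; the real native default is chosen by llama.cpp and is not stated here.

```csharp
using System;

// Hypothetical stand-in for the SDK's context parameters; only the one field
// discussed here is modeled. The real type lives in the SDK.
public sealed class ContextParametersStub
{
    // null = defer to the native default; true/false = explicit override.
    public bool? KvUnified { get; set; }
}

public static class KvUnifiedSketch
{
    // Collapses the tri-state setting to a plain bool.
    // nativeDefault is a placeholder: the actual default is decided by the
    // native library and is an assumption wherever it appears in this sketch.
    public static bool Resolve(bool? requested, bool nativeDefault)
        => requested ?? nativeDefault;

    public static void Main()
    {
        var parameters = new ContextParametersStub();  // KvUnified starts as null
        const bool assumedNativeDefault = false;       // assumption for the demo only

        // null: the (assumed) native default is used.
        Console.WriteLine(Resolve(parameters.KvUnified, assumedNativeDefault));

        // Explicit value: the override wins regardless of the native default.
        parameters.KvUnified = true;
        Console.WriteLine(Resolve(parameters.KvUnified, assumedNativeDefault));
    }
}
```

Modeling the field as bool? rather than bool is what lets "not set" stay distinguishable from an explicit false, so the engine's own default can change between releases without the preset silently pinning an old value.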

When to change it

Scenario                                          Value
Default                                           null
Specific backend-tuning guidance from SDK docs    As instructed

Most workloads never touch this.

Example

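var preset = new Qwen25Preset();
// KvUnified is null by default, so the native default applies; no assignment is needed.
// preset.ContextParameters.KvUnified = null; // default
// preset.ContextParameters.KvUnified = true; // only when SDK guidance says to force the unified buffer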

Interactions

  • NSeqMax — multi-sequence scenarios may interact with unified-buffer layout.

What’s next