SwaFull

SwaFull controls whether the engine stores the full, uncompressed sliding-window attention (SWA) cache. It only matters for models that have SWA layers.

Quick reference


Type	`bool?`
Default	`null` (use native default)
Category	KV cache (SWA-specific)
Field on	`ContextParameters.SwaFull`

What it does

Sliding-window attention (used by some Mistral, Gemma, and other architectures) attends only to a bounded window of recent tokens. The engine can store the cache for this window in one of two forms:

Compressed (SwaFull = false, and typically what null resolves to): smaller memory footprint.
Full (SwaFull = true): uncompressed, larger memory footprint; may be faster in some workloads, so measure before relying on it.

For models without SWA, this field has no effect.
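The three states of the `bool?` field can be sketched as follows. This is a minimal illustration, not the library's implementation: `SwaFullSketch`, `Resolve`, and the `nativeDefault` value are hypothetical stand-ins, and the real native default is whatever the underlying engine chooses.

```csharp
using System;

// Hypothetical illustration of the tri-state field; not the library's code.
// nativeDefault is an assumed placeholder for whatever the engine picks.
bool nativeDefault = false;

foreach (bool? swaFull in new bool?[] { null, false, true })
{
    bool effective = SwaFullSketch.Resolve(swaFull, nativeDefault);
    string label = swaFull.HasValue ? swaFull.Value.ToString() : "null";
    Console.WriteLine($"SwaFull = {label} -> full SWA cache: {effective}");
}

static class SwaFullSketch
{
    // null defers to the native default; true or false overrides it.
    public static bool Resolve(bool? swaFull, bool nativeDefault) =>
        swaFull ?? nativeDefault;
}
```

The point of the sketch is that `null` is not a synonym for `false`: it leaves the decision to the engine, while an explicit value pins it.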

When to change it

Scenario	Value
Default	`null`
Benchmarking SWA performance	`true` to test the uncompressed path
Memory-constrained on an SWA model	`null` or `false`

Few of the models on the built-in preset list currently rely heavily on SWA. If you are unsure, leave the field as `null`.
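For the benchmarking scenario, one way to compare the settings is to time the same workload under each value. The sketch below is an assumption-laden outline: `SwaFullBench`, `Measure`, and the empty `workload` delegate are hypothetical, and the delegate body would need to be replaced with code that builds a context from an SWA preset with `ContextParameters.SwaFull` set to the given value and runs a fixed prompt.

```csharp
using System;
using System.Diagnostics;

// Hypothetical workload: replace the body with code that builds a context
// with ContextParameters.SwaFull = swaFull and generates from a fixed prompt.
Action<bool?> workload = swaFull => { /* load model, run a fixed prompt */ };

foreach (bool? setting in new bool?[] { null, false, true })
{
    TimeSpan elapsed = SwaFullBench.Measure(setting, workload);
    string label = setting.HasValue ? setting.Value.ToString() : "null";
    Console.WriteLine($"SwaFull = {label}: {elapsed.TotalMilliseconds:F1} ms");
}

static class SwaFullBench
{
    // Times one run of the workload under the given SwaFull setting.
    public static TimeSpan Measure(bool? swaFull, Action<bool?> workload)
    {
        var stopwatch = Stopwatch.StartNew();
        workload(swaFull);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A single run is noisy; in practice it helps to repeat each setting several times and to record peak memory alongside the timings, since the trade-off described above is memory against speed.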

Example

var preset = new Qwen25Preset();  // not an SWA model, so SwaFull has no effect
preset.ContextParameters.SwaFull = null;  // default: defer to the native default

Interactions

SwaFull only takes effect on models with SWA layers.
TypeK, TypeV: the cache data types they select apply regardless of the SwaFull setting.

What’s next

Context parameters hub — all context knobs.
Supported presets — check which presets use SWA.
