KvOverrides

KvOverrides lets you patch specific keys in the GGUF metadata at load time. Each override targets one metadata key and provides a typed replacement value. Use it to fix missing or incorrect metadata on a custom GGUF without rebuilding the file.

Quick reference


| Property | Value |
| --- | --- |
| Type | `ModelKeyValueOverride[]?` |
| Default | `null` (no overrides) |
| Category | Model metadata |
| Field on | `ModelInferenceParameters.KvOverrides` |

What it does

The engine reads model configuration from GGUF metadata when it loads the model. KvOverrides intercepts specific keys and substitutes your values. Common targets are the context length, the RoPE frequency base, and the RoPE scaling type.

Each override has:

| Field | Type and meaning |
| --- | --- |
| `Key` | `string`: metadata key (e.g., `llama.context_length`) |
| `Type` | `ModelKvOverrideType`: `Int`, `Float`, `Bool`, or `String` |
| `IntValue`, `FloatValue`, `BoolValue`, `StringValue` | Typed value slots, one per `Type` member |

Only the slot matching `Type` is read; the other slots are ignored.
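As a minimal sketch of the slot rule, the snippet below builds one `Float` and one `Bool` override using only the fields listed above. The values are illustrative. `tokenizer.ggml.add_bos_token` is a GGUF tokenizer key that is not listed on this page, so treat it as an assumption and confirm it exists in your file. The exact numeric type of `FloatValue` is not stated here, so the literal is written to compile whether the slot is `float` or `double`.

```csharp
using Aspose.LLM.Abstractions.Parameters;

// Float override: Type is Float, so only FloatValue is read.
var ropeBase = new ModelKeyValueOverride
{
    Key = "llama.rope.freq_base",
    Type = ModelKvOverrideType.Float,
    FloatValue = 1000000f, // illustrative value; converts implicitly if the slot is double
};

// Bool override: Type is Bool, so only BoolValue is read.
// The key below is an assumption, not taken from this page.
var addBos = new ModelKeyValueOverride
{
    Key = "tokenizer.ggml.add_bos_token",
    Type = ModelKvOverrideType.Bool,
    BoolValue = true,
};

// Both overrides go into one array, the shape KvOverrides expects.
var overrides = new[] { ropeBase, addBos };
```

Setting a slot that does not match `Type` has no effect, so a value placed in `IntValue` on a `Float` override would simply be ignored.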

When to change it

| Scenario | Value |
| --- | --- |
| Default: trust GGUF metadata | `null` |
| GGUF missing expected metadata | One override per missing key |
| Force a specific YaRN/RoPE recipe | Overrides for `llama.rope.*` keys |
| Diagnostic: test different metadata | Temporary overrides |

Wrong overrides silently break the model. Only patch metadata you have a clear reason to change.
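One way to keep diagnostic overrides temporary is to gate them behind an opt-in switch, so the default path still leaves `KvOverrides` at `null`. The sketch below assumes a hypothetical helper `DiagnosticOverrides` and a hypothetical environment variable `LLM_DIAG_CONTEXT_LENGTH`; neither is part of the library. `Qwen25Preset` and `BaseModelInferenceParameters` are used as in the Example section of this page.

```csharp
using System;
using Aspose.LLM.Abstractions.Parameters;

// Hypothetical helper: returns null (trust the GGUF metadata) unless a
// valid positive context length is supplied for a diagnostic run.
static ModelKeyValueOverride[]? DiagnosticOverrides(string? rawContextLength)
{
    if (!int.TryParse(rawContextLength, out var contextLength) || contextLength <= 0)
        return null;

    return new[]
    {
        new ModelKeyValueOverride
        {
            Key = "llama.context_length",
            Type = ModelKvOverrideType.Int,
            IntValue = contextLength,
        },
    };
}

var preset = new Qwen25Preset();

// LLM_DIAG_CONTEXT_LENGTH is a made-up variable name for this sketch.
preset.BaseModelInferenceParameters.KvOverrides =
    DiagnosticOverrides(Environment.GetEnvironmentVariable("LLM_DIAG_CONTEXT_LENGTH"));
```

When the variable is unset or not a positive integer, the preset behaves exactly as it would with no overrides, which makes it harder to leave a test value in place by accident.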

Example

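using Aspose.LLM.Abstractions.Parameters;

var preset = new Qwen25Preset();

// Key prefixes follow the model's general.architecture value. `llama.` is
// shown here; check your GGUF, since Qwen2.5 files typically use `qwen2.`.
preset.BaseModelInferenceParameters.KvOverrides = new[]
{
    // Switch RoPE scaling to YaRN. Only StringValue is read for Type = String.
    new ModelKeyValueOverride
    {
        Key = "llama.rope.scaling.type",
        Type = ModelKvOverrideType.String,
        StringValue = "yarn",
    },
    // Declare a 131072-token context length. Only IntValue is read for Type = Int.
    new ModelKeyValueOverride
    {
        Key = "llama.context_length",
        Type = ModelKvOverrideType.Int,
        IntValue = 131072,
    },
};

// Overrides are applied when the model is loaded.
using var api = AsposeLLMApi.Create(preset);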

Common override keys

| Key | Type | Notes |
| --- | --- | --- |
| `llama.context_length` | `Int` | Declared training context length |
| `llama.embedding_length` | `Int` | Hidden size |
| `llama.rope.freq_base` | `Float` | RoPE theta |
| `llama.rope.scaling.type` | `String` | `"none"`, `"linear"`, `"yarn"`, `"longrope"` |
| `llama.rope.scaling.factor` | `Float` | Scaling multiplier |
| `general.architecture` | `String` | Model family name |

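Exact key names vary by architecture: the prefix before the first dot normally matches the model’s `general.architecture` value (`llama.` in the table above). Inspect the model’s metadata with a tool like `gguf-dump` from llama.cpp before overriding.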
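If running `gguf-dump` is not convenient, the key names can also be listed directly from the file header. The following is a rough sketch, not part of the library: `GgufKeys` is a hypothetical helper that assumes the GGUF v2/v3 layout (little-endian magic, version, tensor count, key-value count, then the key-value pairs) and returns only the key names, skipping the values.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for this page; assumes GGUF version 2 or 3.
static class GgufKeys
{
    // A GGUF string is a uint64 byte length followed by UTF-8 bytes.
    static string ReadString(BinaryReader r)
    {
        ulong length = r.ReadUInt64();
        return Encoding.UTF8.GetString(r.ReadBytes(checked((int)length)));
    }

    // Advances past one value of the given GGUF value type.
    static void SkipValue(BinaryReader r, uint type)
    {
        switch (type)
        {
            case 0: case 1: case 7: r.ReadBytes(1); break;     // uint8, int8, bool
            case 2: case 3: r.ReadBytes(2); break;             // uint16, int16
            case 4: case 5: case 6: r.ReadBytes(4); break;     // uint32, int32, float32
            case 10: case 11: case 12: r.ReadBytes(8); break;  // uint64, int64, float64
            case 8: ReadString(r); break;                      // string
            case 9:                                            // array: element type, count, elements
                uint elementType = r.ReadUInt32();
                ulong count = r.ReadUInt64();
                for (ulong i = 0; i < count; i++) SkipValue(r, elementType);
                break;
            default:
                throw new InvalidDataException($"Unknown GGUF value type {type}.");
        }
    }

    // Returns the metadata key names in file order.
    public static List<string> ReadKeys(Stream stream)
    {
        using var r = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        if (r.ReadUInt32() != 0x46554747) // the bytes "GGUF" read as a little-endian uint32
            throw new InvalidDataException("Not a GGUF file.");
        uint version = r.ReadUInt32();
        if (version < 2)
            throw new NotSupportedException($"GGUF version {version} is not handled by this sketch.");
        r.ReadUInt64();                   // tensor count, not needed here
        ulong pairCount = r.ReadUInt64(); // number of metadata key-value pairs

        var keys = new List<string>();
        for (ulong i = 0; i < pairCount; i++)
        {
            keys.Add(ReadString(r));
            SkipValue(r, r.ReadUInt32());
        }
        return keys;
    }
}
```

A typical call would be `GgufKeys.ReadKeys(File.OpenRead(path))`, after which you can check whether a key such as `llama.context_length` is present before adding an override for it.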

Interactions

`ContextParameters.RopeScalingType`: overriding `llama.rope.scaling.type` via KvOverrides has a similar effect.
`ContextParameters.ContextSize`: at load time, a KvOverrides entry for `llama.context_length` defines what the runtime treats as the trained window.

What’s next

RopeScalingType — alternative way to control scaling.
Long context tuning — when KvOverrides helps.
Bring your own GGUF — custom-model workflows.
