TypeV
Contents
[
Hide
]
TypeV is the data type for the V (values) tensor of the KV cache. V tolerates more aggressive quantization than K — Q8_0 is often a safe default when memory is tight.
Quick reference
| Type | GgmlType? enum (39 values) |
| Default | null (use native default, usually F16) |
| Common values | F32, F16, BF16, Q8_0, Q5_1, Q4_0 |
| Category | KV cache |
| Field on | ContextParameters.TypeV |
What it does
Identical mechanics to TypeK — stored shape, memory scaling, and options are the same. The difference is sensitivity: attention output depends on V through a weighted sum, which averages over many tokens, so quantization error averages out. Attention scores depend on K through a dot product where individual errors survive more.
Rule of thumb: Quantize V one step more aggressively than K.
| Configuration | K | V |
|---|---|---|
| Default balanced | F16 |
F16 |
| Save memory with minimal quality loss | F16 |
Q8_0 |
| Tight memory | Q8_0 |
Q8_0 |
| Very tight | Q8_0 |
Q5_1 |
| Extreme | Q8_0 |
Q4_0 |
When to change it
| Scenario | Value |
|---|---|
| Default | null (F16) |
| Save memory with minimal quality loss | Q8_0 |
| Tight memory | Q5_1 |
| Extreme memory pressure | Q4_0 |
V quantization at Q8_0 is often indistinguishable from F16 on benchmark tasks. Start there when memory is tight.
Example
using Aspose.LLM.Abstractions.Models;
var preset = new Qwen25Preset();
preset.ContextParameters.TypeV = GgmlType.Q8_0;
// V quantized; K left at default F16.
using var api = AsposeLLMApi.Create(preset);
Interactions
TypeK— companion K-cache dtype.ContextSize— memory savings scale with context.FlashAttentionMode— compatible with quantized V.
What’s next
- TypeK — companion.
- Low memory tuning — recipes for tight-memory deployments.
- Context parameters hub — all context knobs.