VocabOnly
VocabOnly loads just the model’s vocabulary and tokenizer without loading the weights. The resulting model cannot generate output — it is a tokenizer-only configuration.
Quick reference
| Property | Value |
|---|---|
| Type | bool? |
| Default | null (treated as false: load the full model) |
| Category | Model loading |
| Field on | ModelInferenceParameters.VocabOnly |
What it does
- null or false: load the full model (vocabulary + weights). Required for inference.
- true: load only vocabulary data. The model is loaded with no weights; chat methods are not meaningful.
Use VocabOnly = true only for tokenizer-level operations — for example, probing token IDs to populate LogitBias without paying the cost of loading weights.
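Because the field is a nullable bool, null and false resolve to the same behaviour, and only an explicit true skips the weights. The sketch below illustrates that resolution with a stand-in class. InferenceParametersSketch and LoadsWeights are hypothetical names invented for this example; they are not part of the library, and the library's own loading logic may be structured differently.

```csharp
using System;

// Stand-in for illustration only; this is NOT the library's
// ModelInferenceParameters type.
public class InferenceParametersSketch
{
    // null means "not set", which the documentation treats like false.
    public bool? VocabOnly { get; set; }
}

public static class Program
{
    // Hypothetical helper: weights are loaded unless VocabOnly is explicitly true.
    // For a bool?, the comparison (null != true) evaluates to true.
    public static bool LoadsWeights(InferenceParametersSketch p) => p.VocabOnly != true;

    public static void Main()
    {
        var unset = new InferenceParametersSketch();                        // VocabOnly = null
        var full = new InferenceParametersSketch { VocabOnly = false };
        var tokenizerOnly = new InferenceParametersSketch { VocabOnly = true };

        Console.WriteLine(LoadsWeights(unset));          // True
        Console.WriteLine(LoadsWeights(full));           // True
        Console.WriteLine(LoadsWeights(tokenizerOnly));  // False
    }
}
```

Comparing against true, rather than reading .Value, avoids an exception when the field is left unset.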
When to change it
| Scenario | Value |
|---|---|
| Normal chat inference | null or false |
| Tokenizer-only probing | true |
Setting VocabOnly to true is rare in practice; most applications load the full model.
Example
var preset = new Qwen25Preset();
preset.BaseModelInferenceParameters.VocabOnly = true;
// Tokenizer-only mode. Chat methods will not function; use only for
// token-ID discovery.
Interactions
- Other ModelInferenceParameters fields (GpuLayers, TensorSplit, etc.) are largely irrelevant in VocabOnly mode.
What’s next
- LogitBias — use case for token-ID probing.
- Model inference hub — all inference knobs.