VocabOnly

VocabOnly loads just the model’s vocabulary and tokenizer without loading the weights. The resulting model cannot generate output — it is a tokenizer-only configuration.

Quick reference

Type bool?
Default null (false — load full model)
Category Model loading
Field on ModelInferenceParameters.VocabOnly

What it does

  • null or false — load the full model (vocabulary + weights). Required for inference.
  • true — load only vocabulary data. The model is loaded with no weights; chat methods are not meaningful.

Use VocabOnly = true only for tokenizer-level operations — for example, probing token IDs to populate LogitBias without paying the cost of loading weights.

When to change it

Scenario Value
Normal chat inference null or false
Tokenizer-only probing true

Rare in practice. Most applications load the full model.

Example

var preset = new Qwen25Preset();
preset.BaseModelInferenceParameters.VocabOnly = true;
// Tokenizer-only mode. Chat methods will not function; use only for
// token-ID discovery.

Interactions

  • Other ModelInferenceParameters fields (GpuLayers, TensorSplit, etc.) are largely irrelevant in VocabOnly mode.

What’s next