MaxTokens

MaxTokens is the upper bound on the number of tokens the engine generates for a single assistant response. The default 2048 fits most general-purpose tasks; raise it for reasoning models (Qwen3, DeepSeek-R1) that emit hidden <think> blocks before the answer.

Quick reference

Type int
Default 2048
Range > 0
Category Chat session
Field on ChatParameters.MaxTokens

What it does

Once the assistant turn begins generation, the engine counts produced tokens. When the counter reaches MaxTokens, generation stops — even mid-sentence. The rest of the response is never produced.

  • 256 — short answers; classifications; yes/no responses.
  • 512–1024 — conversational replies, brief explanations.
  • 2048 (default) — general-purpose.
  • 2048–4096 — reasoning-model output (Qwen3, DeepSeek-R1). Leaves room for the <think> block plus the final answer.
  • 4096+ — long-form writing, essays, code generation.

MaxTokens is a cap, not an allocation. Raising it costs no memory or compute upfront: generation ends as soon as the model emits its end-of-sequence token, even if that happens well below the limit.
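Because an unused cap is free, it is reasonable to set MaxTokens per scenario rather than once globally. A minimal sketch using the Qwen25Preset and AsposeLLMApi types shown in the Example section; the specific values follow the guidance above:

var classifier = new Qwen25Preset();
classifier.ChatParameters.MaxTokens = 256;   // short answers, classifications, yes/no

var writer = new Qwen25Preset();
writer.ChatParameters.MaxTokens = 4096;      // long-form writing, code generation

// Create the engine with whichever preset matches the workload.
using var api = AsposeLLMApi.Create(classifier);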

When to change it

Scenario Value
Classifications, yes/no 128–256
Conversational chat 512–1024
General-purpose (default) 2048
Reasoning models 1024–4096
Essays, code, long-form 4096–8192

Example

var preset = new Qwen25Preset();
preset.ChatParameters.MaxTokens = 1024;
// Generous cap for conversational output.

using var api = AsposeLLMApi.Create(preset);

For DeepSeek-R1:

var preset = new DeepseekR1Qwen3Preset();
preset.ChatParameters.MaxTokens = 2048; // room for <think> + answer

Interactions

  • ContextSize — input plus output plus history must fit; a high MaxTokens leaves less room for input.
  • CacheCleanupStrategy — trims history as output approaches the context cap.
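The ContextSize interaction can be made concrete: the output budget is reserved out of the same window that the prompt and history share, so input + history + MaxTokens must stay within ContextSize. A sketch of the arithmetic, assuming an 8192-token context window for illustration:

var preset = new DeepseekR1Qwen3Preset();
preset.ChatParameters.MaxTokens = 2048;  // reserve room for <think> + answer

// Budget relationship: inputTokens + historyTokens + MaxTokens <= ContextSize.
// With an 8192-token window, 8192 - 2048 = 6144 tokens remain for the prompt
// and prior turns before CacheCleanupStrategy starts trimming history.

The window size and the exact property exposing it depend on the preset; the point is the arithmetic, not the specific numbers.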

What’s next