MaxTokens

MaxTokens is the upper bound on tokens the engine generates for a single assistant response. The default 2048 fits most general-purpose tasks; raise it for reasoning models (Qwen3, DeepSeek-R1) that emit hidden <think> blocks before the answer.

Quick reference


Type	`int`
Default	`2048`
Range	`> 0`
Category	Chat session
Field on	`ChatParameters.MaxTokens`

What it does

Once the assistant turn begins generation, the engine counts produced tokens. When the counter reaches MaxTokens, generation stops — even mid-sentence. The rest of the response is never produced.

256 — short answers; classifications; yes/no responses.
512 – 1024 — conversational replies, brief explanations.
2048 (default) — general-purpose.
2048 – 4096 — reasoning-model output (Qwen3, DeepSeek-R1). Leaves room for <think> block plus the final answer.
4096+ — long-form writing, essays, code generation.

Reasoning model budget. Qwen3, DeepSeek-R1, and similar chain-of-thought models emit hidden reasoning tokens (<think>…</think>) that consume 300-500 tokens before the actual answer. Set MaxTokens to at least 1024 — ideally 2048-4096 — when using these models, or the response truncates mid-reasoning and produces no visible answer.

MaxTokens is a cap, not an allocation. Raising it does not cost memory or compute upfront; the engine generates only as many tokens as the model actually produces up to the limit.

When to change it

Scenario	Value
Classifications, yes/no	`128` – `256`
Conversational chat	`512` – `1024`
General-purpose (default)	`2048`
Reasoning models	`1024` – `4096`
Essays, code, long-form	`4096` – `8192`

Example

var preset = new Qwen25Preset();
preset.ChatParameters.MaxTokens = 1024;
// Generous cap for conversational output.

using var api = AsposeLLMApi.Create(preset);

For DeepSeek-R1:

var preset = new DeepseekR1Qwen3Preset();
preset.ChatParameters.MaxTokens = 2048; // room for <think> + answer

Interactions

ContextSize — input plus output plus history must fit; a high MaxTokens leaves less room for input.
CacheCleanupStrategy — trims history as output approaches the context cap.

What’s next

Chat parameters hub — all chat knobs.
Garbled output troubleshooting — truncation symptoms.
Tune for speed vs quality — response length trade-offs.

History CacheCleanupStrategy