NThreads

NThreads is the number of CPU threads the engine uses during generation — when producing each output token sequentially. Generation is bandwidth-bound and often does not benefit from all available cores.

Quick reference


Type	`int?`
Default	`null` (falls back to `EngineParameters.DefaultThreads`)
Range	`1` and above
Category	Threading
Field on	`ContextParameters.NThreads`

What it does

During the generation phase (token-by-token decode), the engine distributes matrix multiplications across NThreads CPU threads. When null, it uses EngineParameters.DefaultThreads, which defaults to ProcessorCount - 1.

NThreads = 4 — decent for 4-core machines; use most cores.
NThreads = 8 — common sweet spot on mainstream desktop CPUs.
NThreads = 16+ — diminishing returns; sometimes slower due to cache contention and memory-bandwidth saturation.

Unlike prompt processing (which scales well with more threads), generation often peaks at 8-12 threads and degrades with more. Benchmark on your hardware.

When to change it

Scenario	Value
Default	`null` (use `DefaultThreads`)
Laptop / 4-8 core	`4` – `6`
Mainstream desktop	`8` – `10`
High-core server (but avoid over-allocation)	`10` – `16`
Competing with other CPU workloads	Cap explicitly to half `ProcessorCount`

Set NThreads and NThreadsBatch separately — generation and prompt processing have different optima.

Example

var preset = new Qwen25Preset();
preset.ContextParameters.NThreads = 8;
preset.ContextParameters.NThreadsBatch = 16;  // more threads for prompt processing

using var api = AsposeLLMApi.Create(preset);

Interactions

EngineParameters.DefaultThreads — fallback when NThreads is null.
NThreadsBatch — prompt-processing threads.
CPU acceleration — NThreads has no effect when GPU offload is active for every layer.

What’s next

NThreadsBatch — prompt-processing variant.
CPU acceleration — how threading interacts with AVX variants.
Performance issues — thread-related throughput issues.

NSeqMax NThreadsBatch