NThreads

NThreads is the number of CPU threads the engine uses during generation, that is, while it produces output tokens one at a time. Generation is bound by memory bandwidth, so it often does not benefit from using every available core.

Quick reference

Type        int?
Default     null (falls back to EngineParameters.DefaultThreads)
Range       1 and above
Category    Threading
Field on    ContextParameters.NThreads

What it does

During the generation phase (token-by-token decode), the engine distributes matrix multiplications across NThreads CPU threads. When null, it uses EngineParameters.DefaultThreads, which defaults to ProcessorCount - 1.
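The fallback described above can be sketched in a few lines. This is an illustration only, not the engine's code: ResolveThreads is a hypothetical helper, ProcessorCount is assumed to correspond to Environment.ProcessorCount (logical processors), and the lower bound of 1 is an added assumption so that a single-core machine still gets one thread.

```csharp
using System;

// Hypothetical helper (not part of the library): mirrors the documented fallback,
// where a null NThreads resolves to DefaultThreads = ProcessorCount - 1.
// The Math.Max guard is an assumption; the source does not say how 1 core is handled.
static int ResolveThreads(int? nThreads, int processorCount)
{
    int defaultThreads = Math.Max(1, processorCount - 1);
    return nThreads ?? defaultThreads;
}

// null falls back to the default; an explicit value is used as given.
Console.WriteLine(ResolveThreads(null, Environment.ProcessorCount));
Console.WriteLine(ResolveThreads(6, Environment.ProcessorCount));
```

Printing the resolved value at startup is a cheap way to confirm how many threads a null setting actually translates to on a given machine.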

  • NThreads = 4: a reasonable choice for 4-core machines, where using most of the cores is fine.
  • NThreads = 8: a common sweet spot on mainstream desktop CPUs.
  • NThreads = 16+: diminishing returns, and sometimes slower because of cache contention and memory-bandwidth saturation.

Unlike prompt processing (which scales well with more threads), generation often peaks at 8-12 threads and degrades with more. Benchmark on your hardware.
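One way to benchmark is to sweep a few candidate values and keep the one with the highest throughput. The sketch below assumes you supply the measurement yourself: PickBestThreadCount and the measureTokensPerSecond callback are hypothetical names, and the stand-in curve is synthetic (it is not measured data). It exists only so the example runs without a model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sweep helper: returns the candidate with the highest measured rate.
// measureTokensPerSecond should run a fixed generation with NThreads set to the
// given value and return the observed tokens per second.
static int PickBestThreadCount(IEnumerable<int> candidates, Func<int, double> measureTokensPerSecond)
{
    int best = -1;
    double bestRate = double.NegativeInfinity;
    foreach (int threads in candidates)
    {
        double rate = measureTokensPerSecond(threads);
        Console.WriteLine($"NThreads = {threads}: {rate:F1} tokens/s");
        if (rate > bestRate)
        {
            best = threads;
            bestRate = rate;
        }
    }
    if (best < 0) throw new ArgumentException("No candidates supplied.", nameof(candidates));
    return best;
}

// Synthetic stand-in, NOT real measurements: rises up to 8 threads, then declines.
// Replace it with a timed generation run on your own hardware.
Func<int, double> syntheticRate = t => t <= 8 ? t * 2.0 : 16.0 - (t - 8) * 0.5;

int chosen = PickBestThreadCount(new[] { 4, 6, 8, 12, 16 }, syntheticRate);
Console.WriteLine($"Best candidate: {chosen}");
```

When measuring for real, use the same prompt and output length for every candidate and repeat each run a few times, since a single run can be skewed by other load on the machine.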

When to change it

Scenario                                        Value
Default                                         null (use DefaultThreads)
Laptop / 4-8 core                               4-6
Mainstream desktop                              8-10
High-core server (but avoid over-allocation)    10-16
Competing with other CPU workloads              Cap explicitly to half of ProcessorCount
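The last row, capping at half the processor count, can be computed rather than hard-coded. This is a minimal sketch: CapForSharedHost is a hypothetical helper, and ProcessorCount is assumed to mean Environment.ProcessorCount, which reports logical processors.

```csharp
using System;

// Hypothetical helper: limits a requested thread count to half of the logical
// processors, and never returns less than 1 (the documented minimum for NThreads).
static int CapForSharedHost(int requested, int processorCount)
{
    int cap = Math.Max(1, processorCount / 2);
    return Math.Clamp(requested, 1, cap);
}

int cappedThreads = CapForSharedHost(8, Environment.ProcessorCount);
Console.WriteLine($"NThreads to use on this shared host: {cappedThreads}");
```

The result can then be assigned to ContextParameters.NThreads in place of a fixed number.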

Set NThreads and NThreadsBatch separately — generation and prompt processing have different optima.

Example

var preset = new Qwen25Preset();
preset.ContextParameters.NThreads = 8;        // threads for token-by-token generation
preset.ContextParameters.NThreadsBatch = 16;  // more threads for prompt processing

using var api = AsposeLLMApi.Create(preset);

Interactions

What’s next