NThreadsBatch

NThreadsBatch is the number of CPU threads the engine uses during prompt processing (the initial prefill phase). Prompt processing is embarrassingly parallel and benefits from using most or all available cores.

Quick reference

Type: int?
Default: null (falls back to EngineParameters.DefaultThreads)
Range: 1 and above
Category: Threading
Field: ContextParameters.NThreadsBatch
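Because the default is null, the effective thread count is only resolved when the engine starts. Conceptually the fallback looks like the sketch below; the actual resolution happens inside the engine, this is only an illustration of the null-coalescing behavior:

```csharp
// Illustration only: a null NThreadsBatch falls back to the
// engine-wide DefaultThreads, an explicit value always wins.
static int ResolveBatchThreads(int? nThreadsBatch, int defaultThreads) =>
    nThreadsBatch ?? defaultThreads;

// ResolveBatchThreads(null, 8) == 8   (falls back to DefaultThreads)
// ResolveBatchThreads(16, 8)   == 16  (explicit value wins)
```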

What it does

During prompt processing, the engine runs matrix multiplications over many tokens at once. This workload parallelizes well: more threads directly translate to higher throughput, up to memory-bandwidth limits.

  • NThreadsBatch = ProcessorCount — typical. Use all cores for fast prompt ingestion.
  • NThreadsBatch = half ProcessorCount — leave room for other CPU workloads.
  • NThreadsBatch < NThreads — unusual; prefill parallelizes better than generation, so giving it fewer threads is almost always a misconfiguration on modern CPUs.

Prompt processing happens once per incoming message (on user input). Generation (NThreads) happens per output token. For chat, prompt-processing time dominates when the prompt is long, generation dominates when the output is long.
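The trade-off above is simple arithmetic: prefill cost scales with prompt length, generation cost with output length. The sketch below uses made-up throughput numbers purely to illustrate which phase dominates; they are not benchmarks of any particular hardware:

```csharp
// Rough cost model. Prefill is parallel and fast per token;
// generation is sequential and much slower per token.
// The token/s figures here are illustrative, not measured.
static double PrefillSeconds(int promptTokens, double prefillTokPerSec) =>
    promptTokens / prefillTokPerSec;

static double GenerateSeconds(int outputTokens, double genTokPerSec) =>
    outputTokens / genTokPerSec;

// Long prompt, short answer: prefill dominates.
// 4000 tokens / 400 tok/s = 10 s prefill vs 100 tokens / 20 tok/s = 5 s generation.
```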

When to change it

  • Default — null (use DefaultThreads)
  • Dedicated inference machine — ProcessorCount (all cores)
  • Shared machine — ProcessorCount / 2
  • Very long prompts on memory-bound hardware — benchmark; once memory bandwidth saturates (often somewhere around 16 threads), adding more yields little.
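A minimal way to run that benchmark is to time a fixed prompt at several thread counts. The runPrefill delegate below is a placeholder for whatever your engine's prompt-processing call is (e.g. rebuilding the context with a different NThreadsBatch and feeding the same prompt); it is not an API of this library:

```csharp
using System;
using System.Diagnostics;

// Time one prefill run at each candidate thread count and print the results.
// runPrefill is a placeholder: plug in your engine's actual prompt-processing call.
static void BenchmarkThreadCounts(Action<int> runPrefill)
{
    foreach (var threads in new[] { 4, 8, 16, Environment.ProcessorCount })
    {
        var sw = Stopwatch.StartNew();
        runPrefill(threads);
        sw.Stop();
        Console.WriteLine($"{threads} threads: {sw.ElapsedMilliseconds} ms");
    }
}
```

Pick the smallest thread count whose timing is within a few percent of the best; beyond that you are only burning cores.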

Example

var preset = new Qwen25Preset();
preset.ContextParameters.NThreads = 8;                             // generation
preset.ContextParameters.NThreadsBatch = Environment.ProcessorCount; // prompt prefill

Interactions

What’s next