NSeqMax

NSeqMax is the maximum number of distinct sequences the engine handles in parallel, each with its own state. It matters for recurrent or state-tracking models. Standard transformer chat presets do not require tuning it.

Quick reference

Type       uint?
Default    null (native default, typically 1 for transformer models)
Range      1 and above; power-of-two values recommended for advanced scenarios
Category   Context size and batching
Field on   ContextParameters.NSeqMax

What it does

For recurrent or state-space models (Mamba, RWKV, hybrid architectures), each independent sequence carries its own recurrent state. NSeqMax caps how many such states the engine maintains simultaneously.

  • NSeqMax = 1 (default for standard transformers): a single sequence, so no parallel state tracking is needed.
  • NSeqMax greater than 1 (for example 4): the engine keeps a separate recurrent state for each of up to that many parallel sequences.

Transformer models (Qwen, Llama, Gemma, Phi, etc.) do not maintain per-sequence hidden state in this sense. NSeqMax = 1 is correct for them.

When to change it

Scenario                            Value
Standard transformer chat presets   Leave null or 1
Recurrent / state-space model       Set to the number of parallel sequences you serve

If you are not building against a recurrent-model-specific preset, leave NSeqMax at the default.
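
The decision in the table above can be sketched as a small helper. This is a minimal illustration, not part of the library: NSeqMaxHelper, Choose, RoundUpToPowerOfTwo, and their parameters are hypothetical names introduced here. The optional rounding reflects the power-of-two recommendation from the quick reference and is assumed to matter only in advanced scenarios.

```csharp
using System;

// Hypothetical helper, not part of the library: picks a value for
// ContextParameters.NSeqMax following the "When to change it" table.
static class NSeqMaxHelper
{
    // Returns null for transformer presets (keep the native default),
    // otherwise the number of parallel sequences, clamped to at least 1.
    public static uint? Choose(bool isRecurrentModel, uint parallelSequences,
                               bool roundToPowerOfTwo = false)
    {
        if (!isRecurrentModel)
            return null;

        // The documented range starts at 1, so a request of 0 becomes 1.
        uint value = Math.Max(1u, parallelSequences);
        return roundToPowerOfTwo ? RoundUpToPowerOfTwo(value) : value;
    }

    // Smallest power of two that is greater than or equal to value.
    public static uint RoundUpToPowerOfTwo(uint value)
    {
        // Above 2^31 there is no power of two that fits in a uint.
        if (value > 0x80000000u)
            throw new ArgumentOutOfRangeException(nameof(value));

        uint result = 1;
        while (result < value)
            result <<= 1;
        return result;
    }
}
```

The result can be assigned directly, for example preset.ContextParameters.NSeqMax = NSeqMaxHelper.Choose(isRecurrentModel: true, parallelSequences: 4);. Returning null rather than 1 for transformer presets leaves the choice to the native default.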

Example

// Standard transformer use case — no change needed.
var preset = new Qwen25Preset();
// preset.ContextParameters.NSeqMax = null; // (default)

using var api = AsposeLLMApi.Create(preset);
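
For contrast, a recurrent or state-space setup might look like the sketch below. It reuses only the members shown on this page (ContextParameters.NSeqMax and AsposeLLMApi.Create). RecurrentModelPreset is a hypothetical placeholder name, not a class this page documents, so substitute the preset that matches your model.

```csharp
// Sketch only: RecurrentModelPreset is a hypothetical placeholder name.
// Replace it with the preset for your recurrent or state-space model.
var preset = new RecurrentModelPreset();

// Serve up to four independent sequences, each with its own recurrent state.
preset.ContextParameters.NSeqMax = 4;

using var api = AsposeLLMApi.Create(preset);
```

Set the value before creating the API instance, as in the transformer example, so the context is built with the intended sequence count.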

Interactions

  • NBatch, NUbatch: in multi-sequence scenarios the parallel sequences share the same batch capacity, so review both values when you raise NSeqMax.

What’s next