PoolingType

PoolingType selects the strategy the engine uses to reduce per-token embeddings to a single vector for the full input. Relevant only when Embeddings is true.

Quick reference


Type	`PoolingType?` enum
Default	`null` (use model default)
Values	`Unspecified`, `None`, `Mean`, `Cls`, `Last`, `Rank`
Category	Embeddings
Field on	`ContextParameters.PoolingType`

What it does

Value	Behavior
`Unspecified` (`-1`)	Use model default.
`None` (`0`)	Return per-token embeddings without reduction.
`Mean` (`1`)	Average all token embeddings. Good default for sentence-level semantic similarity.
`Cls` (`2`)	Use the first (CLS) token’s embedding. Common for BERT-family models.
`Last` (`3`)	Use the last token’s embedding. Common for causal-LM embeddings.
`Rank` (`4`)	Rank-based pooling (experimental).

Pick the pooling strategy the model was trained with. A mismatched strategy still produces vectors, but their quality is degraded.
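
To make the table concrete, the following standalone C# sketch shows the arithmetic behind `Mean`, `Cls`, and `Last`. It does not call the Aspose.LLM API: the `tokens` array and the `MeanPool`, `ClsPool`, and `LastPool` helpers are hypothetical names introduced for illustration, and the engine's internal implementation may differ.

```csharp
using System;
using System.Linq;

// Illustrative data only: three tokens, each with a two-dimensional embedding.
// Plain arrays stand in for whatever per-token output the engine produces.
float[][] tokens =
{
    new[] { 1f, 2f }, // first token (the CLS position in BERT-style models)
    new[] { 3f, 4f },
    new[] { 5f, 9f }, // last token
};

// Mean: average each dimension across all tokens.
static float[] MeanPool(float[][] t) =>
    Enumerable.Range(0, t[0].Length)
              .Select(d => t.Average(row => row[d]))
              .ToArray();

// Cls: keep the first token's vector.
static float[] ClsPool(float[][] t) => t[0];

// Last: keep the final token's vector.
static float[] LastPool(float[][] t) => t[^1];

Console.WriteLine("Mean: " + string.Join(", ", MeanPool(tokens))); // Mean: 3, 5
Console.WriteLine("Cls:  " + string.Join(", ", ClsPool(tokens)));  // Cls:  1, 2
Console.WriteLine("Last: " + string.Join(", ", LastPool(tokens))); // Last: 5, 9
```

The three results differ for the same input, which is why a pooling type that does not match the model's training tends to hurt similarity scores.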

When to change it

Scenario	Value
Chat or text generation (pooling is not used)	`null`
Causal-LM embeddings	`Last`
BERT-style embedder	`Cls`
Sentence-transformer-style	`Mean`

Example

using Aspose.LLM.Abstractions.Models;

var preset = new Qwen25Preset();
// Enable embedding output; PoolingType has no effect otherwise.
preset.ContextParameters.Embeddings = true;
// Average all token embeddings into one vector per input.
preset.ContextParameters.PoolingType = PoolingType.Mean;
preset.ContextParameters.AttentionType = AttentionType.NonCausal;

// Create the API with this embedding-only configuration.
using var api = AsposeLLMApi.Create(preset);

Interactions

Embeddings — must be true for PoolingType to take effect.
AttentionType — usually NonCausal with embedding-specific pooling.

What’s next

Embeddings — flag that enables this pipeline.
AttentionType — companion choice.
Context parameters hub — all context knobs.
