Embeddings

Embeddings is a nullable boolean flag (`bool?`). When true, the engine extracts embedding vectors alongside (or instead of) logits. Pair it with a PoolingType that matches the model’s training regime.

Quick reference


Type	`bool?`
Default	`null` (disabled)
Category	Embeddings
Field on	`ContextParameters.Embeddings`

What it does

null or false — standard generation mode. Only logits are produced; no embedding extraction.
true — the engine configures the pipeline to output embeddings per input.

Embeddings are typically used for semantic search, clustering, classification, or as retrieval keys in RAG systems. The SDK’s current chat API (SendMessageAsync) focuses on text generation; embedding workflows require reaching into the Engine and ChatSession APIs directly.
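Once embedding vectors are available, semantic search usually comes down to comparing vectors by cosine similarity. The following is a minimal sketch of that comparison, assuming the vectors are already in hand as plain `float[]` arrays. `CosineSimilarity` is a hypothetical helper written for illustration, not part of the SDK, and the vectors are made-up values rather than real model output.

```csharp
using System;

// Hypothetical helper (not an SDK API): cosine similarity of two equal-length vectors.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // A zero vector has no direction; treat its similarity as 0.
    if (normA == 0 || normB == 0)
        return 0;

    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Made-up vectors standing in for real embeddings.
float[] query = { 1f, 0f, 1f };
float[] docA = { 2f, 0f, 2f }; // same direction as the query
float[] docB = { 0f, 1f, 0f }; // orthogonal to the query

Console.WriteLine(CosineSimilarity(query, docA));
Console.WriteLine(CosineSimilarity(query, docB));
```

Ranking documents by this score against a query vector gives a basic retrieval step for a RAG pipeline; production systems typically normalize vectors once and use an index instead of a linear scan.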

When to change it

Scenario	Value
Default chat	`null`
Extract embeddings	`true`, paired with `PoolingType` and often `AttentionType = NonCausal`

A dedicated use case for embeddings is on the documentation roadmap but not covered in this version.

Example

using Aspose.LLM.Abstractions.Models;

// Start from a model preset and switch the context into embedding mode.
var preset = new Qwen25Preset();
preset.ContextParameters.Embeddings = true;                       // extract embedding vectors
preset.ContextParameters.PoolingType = PoolingType.Mean;          // reduce token embeddings by averaging
preset.ContextParameters.AttentionType = AttentionType.NonCausal; // attend in both directions

using var api = AsposeLLMApi.Create(preset);
// Direct chat methods do not surface embeddings; use Engine/ChatSession internals.

Interactions

PoolingType — reducer for token-level embeddings.
AttentionType — usually NonCausal for embedding-only models.
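
To make the role of pooling concrete, here is a minimal sketch of what a mean reducer does conceptually: it collapses one vector per token into a single vector per input. This assumes `PoolingType.Mean` corresponds to a plain arithmetic mean over tokens; the engine's actual implementation is not shown in this page. `MeanPool` is a hypothetical helper, and the token vectors are made-up values.

```csharp
using System;

// Hypothetical helper (not an SDK API): average token-level vectors into one vector.
static float[] MeanPool(float[][] tokenEmbeddings)
{
    if (tokenEmbeddings.Length == 0)
        throw new ArgumentException("At least one token embedding is required.");

    int dim = tokenEmbeddings[0].Length;
    var pooled = new float[dim];

    // Sum each dimension across all tokens.
    foreach (var token in tokenEmbeddings)
    {
        if (token.Length != dim)
            throw new ArgumentException("All token embeddings must have the same length.");
        for (int d = 0; d < dim; d++)
            pooled[d] += token[d];
    }

    // Divide by the token count to get the mean.
    for (int d = 0; d < dim; d++)
        pooled[d] /= tokenEmbeddings.Length;

    return pooled;
}

// Three made-up token vectors of dimension 2.
float[][] tokens =
{
    new[] { 1f, 2f },
    new[] { 3f, 4f },
    new[] { 5f, 6f },
};

float[] sentence = MeanPool(tokens);
Console.WriteLine(string.Join(", ", sentence)); // 3, 4
```

Other pooling strategies swap the averaging step for a different reducer, which is why the chosen value should match how the model was trained.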

What’s next

PoolingType — pooling strategy.
AttentionType — attention direction.
Context parameters hub — all context knobs.
