Documentation – Sampler parameters

Net: Temperature

Thu, 23 Apr 2026 00:00:00 +0000

Temperature scales the model’s logits (raw, unnormalized scores that softmax turns into probabilities) before sampling. Lower temperature sharpens the distribution toward the most likely token; higher temperature flattens it, making rare tokens more competitive.

Quick reference


Type	`float`
Default	`0.7`
Range	`0.0` and above (typical `0.0` – `1.5`)
Category	Core sampling
Field on	`SamplerParameters.Temperature`

What it does

Each generation step produces a vector of logits — one value per vocabulary token. The engine divides every logit by Temperature before applying softmax:

At Temperature = 1.0, the softmax is unchanged; the model samples from its native distribution.
Below 1.0, differences between logits are magnified. The top tokens become more likely; rare tokens are suppressed. At the limit Temperature = 0.0, the engine picks the single highest-logit token every step (greedy decoding).
Above 1.0, the distribution flattens. The top token’s advantage shrinks; tail tokens gain relative probability. At high temperatures, output becomes increasingly random.
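The three regimes above can be checked with a small standalone sketch. It does not use the SDK: `SoftmaxWithTemperature` and the sample logits are hypothetical names and values chosen for illustration, and the function assumes a temperature above zero (a temperature of 0 is handled as greedy selection instead).

```csharp
using System;
using System.Linq;

// Softmax over logits divided by a temperature; only valid for temperature > 0.
static double[] SoftmaxWithTemperature(double[] logits, double temperature)
{
    if (temperature <= 0)
        throw new ArgumentOutOfRangeException(nameof(temperature), "Temperature 0 means greedy selection.");

    double[] scaled = logits.Select(l => l / temperature).ToArray();
    double max = scaled.Max(); // subtract the maximum for numerical stability
    double[] exps = scaled.Select(s => Math.Exp(s - max)).ToArray();
    double sum = exps.Sum();
    return exps.Select(e => e / sum).ToArray();
}

double[] demoLogits = { 2.0, 1.0, 0.0 };
foreach (double t in new[] { 0.5, 1.0, 2.0 })
{
    double[] probs = SoftmaxWithTemperature(demoLogits, t);
    Console.WriteLine($"T={t}: " + string.Join(", ", probs.Select(p => p.ToString("F3"))));
}
```

The first token's probability is highest at T=0.5 and lowest at T=2.0, while every row still sums to 1.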

Temperature runs before all downstream filters (TopP, TopK, MinP). A combination of low temperature and tight TopP produces near-deterministic output; high temperature with wide filters produces more varied, less predictable output.

When to change it

Scenario	Value
Fully deterministic, reproducible output	`0.0` (greedy; `Seed` becomes irrelevant)
Precise tasks — code, structured data, classification	`0.1` – `0.3`
General-purpose chat (default balance)	`0.7`
Creative writing, brainstorming	`0.8` – `1.0`
Maximum variety, risk of incoherence	`> 1.0`

Temperature trades accuracy for variety. On factual questions, higher temperature makes incorrect or fabricated answers more likely. On creative tasks, low temperature tends to produce dull, repetitive output.

Example

using Aspose.LLM;
using Aspose.LLM.Abstractions.Parameters.Presets;

var preset = new Qwen25Preset();
preset.SamplerParameters.Temperature = 0.2f; // precise, low-variety output

using var api = AsposeLLMApi.Create(preset);
string reply = await api.SendMessageAsync("List three primes under 20.");
Console.WriteLine(reply);
// Example output (exact wording depends on the model): 2, 3, 5.

For a fully reproducible run:

preset.SamplerParameters.Temperature = 0.0f;
preset.SamplerParameters.Seed = 42; // seed is ignored when Temperature == 0

Interactions

TopP — applies after temperature to trim the tail.
TopK — hard cap on candidate count after temperature.
MinP — minimum relative probability after temperature.
Seed — irrelevant when Temperature = 0.
Mirostat — when active, adaptively adjusts entropy and bypasses Temperature tuning.
DynatempRange — dynamically varies Temperature per step based on entropy.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
Tune for speed vs quality — where Temperature sits in the trade-off.
Garbled output troubleshooting — when high temperature causes incoherence.

Net: TopP

Thu, 23 Apr 2026 00:00:00 +0000

TopP implements nucleus sampling. At each generation step, tokens are sorted by probability and the engine keeps only the smallest set whose cumulative probability is at least TopP. Tokens outside that nucleus are discarded before sampling.

Quick reference


Type	`float`
Default	`0.9`
Range	`0.0` – `1.0` (values ≥ `1.0` disable the filter)
Category	Core sampling
Field on	`SamplerParameters.TopP`

What it does

After Temperature scales the distribution, sort tokens by probability descending. Walk the sorted list accumulating probability. Stop when the running sum reaches TopP. Every token past that cutoff is removed; the remaining tokens are renormalized and sampled from.

At TopP = 1.0, no tokens are removed — the filter is effectively off.
At TopP = 0.9 (default), the engine keeps the smallest set of tokens covering at least 90 % of the probability mass. On a peaked distribution this is 2-5 tokens; on a flat distribution it can be 50+.
At TopP = 0.5, only the dominant half of the mass survives — tighter, more deterministic output.

TopP is probability-aware: on a confident step (one token at 0.95 probability) it keeps only that one token; on an uncertain step (many comparable tokens) it keeps a larger set.
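The sort, accumulate, cut, and renormalize steps can be sketched on a plain probability array. This is an illustration only, not the engine's code: `TopPFilter` and both sample distributions are hypothetical.

```csharp
using System;
using System.Linq;

// Returns (token index, renormalized probability) pairs for the nucleus.
static (int Index, double Prob)[] TopPFilter(double[] probs, double topP)
{
    var sorted = probs
        .Select((p, i) => (Index: i, Prob: p))
        .OrderByDescending(t => t.Prob)
        .ToArray();

    double cumulative = 0.0;
    int keep = 0;
    while (keep < sorted.Length)
    {
        cumulative += sorted[keep].Prob;
        keep++;
        if (cumulative >= topP) break; // smallest prefix whose mass reaches topP
    }

    double keptMass = sorted.Take(keep).Sum(t => t.Prob);
    return sorted.Take(keep)
        .Select(t => (Index: t.Index, Prob: t.Prob / keptMass))
        .ToArray();
}

double[] flatStep = { 0.5, 0.3, 0.1, 0.05, 0.05 };
double[] confidentStep = { 0.95, 0.03, 0.02 };
Console.WriteLine(TopPFilter(flatStep, 0.85).Length);      // 3 tokens survive
Console.WriteLine(TopPFilter(confidentStep, 0.9).Length);  // 1 token survives
```

The same function keeps three candidates on the flatter step and one on the confident step, which is the adaptive behavior described above.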

When to change it

Scenario	Value
Disabled (rely on `TopK` + `MinP`)	`1.0`
Creative output with some variety	`0.95`
Balanced general-purpose chat (default)	`0.9`
Conservative, precise output	`0.7` – `0.8`
Very tight, near-greedy	`0.5` – `0.6`

Lower TopP excludes more of the tail — output is more predictable but less varied. The default 0.9 is a broadly accepted balance.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.TopP = 0.95f; // keep a bit more of the tail for creativity

using var api = AsposeLLMApi.Create(preset);
string reply = await api.SendMessageAsync("Suggest a name for a coffee shop.");
Console.WriteLine(reply);

For precision work:

preset.SamplerParameters.Temperature = 0.2f;
preset.SamplerParameters.TopP = 0.8f;
// Narrow both dimensions for precise, low-variance output.

Interactions

Temperature — applied before TopP. Very low Temperature + loose TopP still produces near-greedy output.
TopK — stacks with TopP. The final candidate set is the intersection.
MinP — complements TopP on the tail; both can be active.
TypicalP — alternative to TopP based on local typicality.
MinKeep — floor on candidate count; TopP never cuts below MinKeep.
Mirostat — bypasses TopP when active.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
Temperature — the partner knob that runs before TopP.
MinP — another probability-relative cutoff.

Net: TopK

Thu, 23 Apr 2026 00:00:00 +0000

TopK caps the number of candidate tokens the sampler considers at each step. After Temperature, the engine sorts tokens by probability and keeps only the top TopK entries.

Quick reference


Type	`int` (documented as `float` in the SDK class, but used as integer count)
Default	`40`
Range	`0` disables; `1` = greedy; typical `10` – `100`
Category	Core sampling
Field on	`SamplerParameters.TopK`

What it does

Sort the distribution by probability descending. Take the first TopK tokens. Discard everything else. The remaining tokens are renormalized and passed downstream.

TopK = 0 — filter is disabled (or interpreted as unlimited by llama.cpp).
TopK = 1 — greedy. Only the single highest-probability token survives regardless of distribution shape.
TopK = 40 (default) — a cap that leaves room for diversity while excluding the long tail.

Unlike TopP, TopK is probability-blind: it keeps the TopK highest-ranked tokens even if the distribution is tightly peaked on one token and the 39 runners-up are all nearly zero.
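A minimal sketch of the count-based cut follows. `TopKIndices` and the sample distribution are hypothetical, and treating `topK <= 0` as "disabled" is an assumption taken from the quick reference, not from the engine's source.

```csharp
using System;
using System.Linq;

// Indices of the topK most probable tokens; topK <= 0 is treated as "disabled".
static int[] TopKIndices(double[] probs, int topK)
{
    var order = Enumerable.Range(0, probs.Length).OrderByDescending(i => probs[i]);
    return (topK <= 0 ? order : order.Take(topK)).ToArray();
}

double[] peakedStep = { 0.97, 0.01, 0.01, 0.005, 0.005 };
Console.WriteLine(string.Join(", ", TopKIndices(peakedStep, 3))); // 0, 1, 2
Console.WriteLine(TopKIndices(peakedStep, 0).Length);             // 5
```

With `topK = 3` the two near-zero runners-up survive alongside the dominant token, because the filter looks only at rank.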

When to change it

Scenario	Value
Disabled (rely on `TopP` + `MinP`)	`0` (or set very large, e.g., `100000`)
Greedy-like, single candidate	`1`
Very conservative	`10` – `20`
Balanced default	`40`
Creative, wide pool	`80` – `100`

Most users leave TopK = 40 and tune TopP instead. TopK is a safety net that prevents the sampler from ever considering extremely rare tokens.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.TopK = 20; // conservative candidate pool

using var api = AsposeLLMApi.Create(preset);
string reply = await api.SendMessageAsync("Write one short haiku about rain.");
Console.WriteLine(reply);

Combining with TopP for fine control:

preset.SamplerParameters.TopK = 40;
preset.SamplerParameters.TopP = 0.9f;
// Final candidate set is the intersection — whichever is stricter per step.

Interactions

Temperature — applied before TopK.
TopP — both active filters stack; sampling happens over the intersection.
MinP — can further thin the candidate pool.
MinKeep — floor; TopK never cuts below MinKeep.
Mirostat — bypasses TopK when active.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
TopP — the probability-aware counterpart.
MinP — another truncation strategy.

Net: MinP

Thu, 23 Apr 2026 00:00:00 +0000

MinP defines a minimum probability threshold relative to the top token. Any candidate whose probability is below MinP × p(top) is discarded.

Quick reference


Type	`float`
Default	`0.05`
Range	`0.0` – `1.0`
Category	Core sampling
Field on	`SamplerParameters.MinP`

What it does

After the other filters run, compute the top candidate’s probability p_max. For every remaining token, check if its probability is at least MinP × p_max. If not, discard.

MinP = 0.0 — filter disabled.
MinP = 0.05 (default) — keep tokens at 5 % of the top token’s probability or higher.
MinP = 0.1 — keep tokens at 10 % or higher (stricter).

Unlike TopP (which is cumulative-mass aware) and TopK (which is count aware), MinP is relative-to-top aware. It adapts to distribution shape differently: on a peaked distribution it keeps fewer tokens; on a flat distribution it keeps more.
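The relative threshold is easy to try on plain arrays. The sketch below is illustrative only; `MinPIndices` and both distributions are hypothetical.

```csharp
using System;
using System.Linq;

// Keeps every token whose probability is at least minP times the top probability.
static int[] MinPIndices(double[] probs, double minP)
{
    double threshold = minP * probs.Max();
    return Enumerable.Range(0, probs.Length).Where(i => probs[i] >= threshold).ToArray();
}

double[] peakedDist = { 0.90, 0.05, 0.03, 0.02 };
double[] flatDist = { 0.30, 0.25, 0.25, 0.20 };
Console.WriteLine(MinPIndices(peakedDist, 0.05).Length); // 2
Console.WriteLine(MinPIndices(flatDist, 0.05).Length);   // 4
```

On the peaked distribution the cutoff is 0.045, so only two tokens pass; on the flat one the cutoff is 0.015 and all four pass.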

When to change it

Scenario	Value
Disabled	`0.0`
Loose tail retention	`0.02`
Default balance	`0.05`
Conservative, drops more tail	`0.1` – `0.15`

Some users recommend MinP as a replacement for TopP — simpler to reason about, less sensitive to vocabulary size. The default 0.05 is a conservative, broadly safe value.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.MinP = 0.1f; // stricter tail cutoff

using var api = AsposeLLMApi.Create(preset);
string reply = await api.SendMessageAsync("Describe today's weather in three words.");
Console.WriteLine(reply);

Creative writing with wider tail:

preset.SamplerParameters.MinP = 0.02f;
preset.SamplerParameters.Temperature = 0.9f;
// More diverse sampling, still filtered against very-low-probability tokens.

Interactions

Temperature — applied before MinP.
TopP — can coexist; final candidate set respects both.
TopK — count-based cap; stacks with MinP.
MinKeep — floor; MinP never cuts below MinKeep.
Mirostat — bypasses MinP when active.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
TopP — cumulative-mass cousin of MinP.
TopK — count-based cap.

Net: Seed

Thu, 23 Apr 2026 00:00:00 +0000

Seed controls the random number generator used for stochastic sampling. A fixed integer makes generation reproducible — the same prompt with the same model and the same parameters produces the same output. The default sentinel maps to a time-based seed, giving non-deterministic output on every run.

Quick reference


Type	`int`
Default	`0xFFFFFFFF` (sentinel — time-based seed inside llama.cpp)
Range	Any `int` value; the default sentinel is `0xFFFFFFFF` reinterpreted as a signed `int`, that is, `-1`
Category	Core sampling
Field on	`SamplerParameters.Seed`

What it does

Seed initializes the sampler’s pseudo-random number generator. The RNG drives every stochastic choice — which token to pick from the candidate pool when multiple tokens survive filtering.

A specific integer (42, 12345, etc.) — deterministic. Two runs with the same seed and identical parameters produce identical token sequences.
The default sentinel 0xFFFFFFFF — llama.cpp substitutes a time-based seed. Output varies per run.

Seed only matters when sampling is actually stochastic. At Temperature = 0 the sampler is greedy (always picks the top token), and Seed has no effect.
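The effect of a fixed seed can be illustrated without the SDK. In this sketch `System.Random` stands in for the engine's generator, which is an assumption made only for illustration; `SampleIndex` and `Draw` are hypothetical helpers.

```csharp
using System;
using System.Linq;

// Draws one index from a categorical distribution using the supplied RNG.
static int SampleIndex(double[] probs, Random rng)
{
    double r = rng.NextDouble();
    double cumulative = 0.0;
    for (int i = 0; i < probs.Length; i++)
    {
        cumulative += probs[i];
        if (r < cumulative) return i;
    }
    return probs.Length - 1; // guards against rounding when the sum is slightly below 1
}

// Draws `count` indices from a freshly seeded RNG.
static int[] Draw(double[] probs, int seed, int count)
{
    var rng = new Random(seed);
    var result = new int[count];
    for (int i = 0; i < count; i++) result[i] = SampleIndex(probs, rng);
    return result;
}

double[] candidates = { 0.5, 0.3, 0.2 };
int[] first = Draw(candidates, 42, 10);
int[] second = Draw(candidates, 42, 10);
Console.WriteLine(first.SequenceEqual(second));   // True
Console.WriteLine(unchecked((int)0xFFFFFFFF));    // -1
```

Two draws from generators seeded with the same value pick the same indices in the same order. The last line shows the `int` value that the default sentinel evaluates to in C#.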

When to change it

Scenario	Value
Default non-deterministic behavior	`unchecked((int)0xFFFFFFFF)` (default)
Reproducible tests and benchmarks	A fixed `int`, e.g., `42`
Per-request seeding (rotate value per request)	Pass a computed `int` from your caller

Set Seed when writing tests that assert on specific output, or when building regression suites that catch sampling drift across SDK versions.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.Temperature = 0.7f;
preset.SamplerParameters.Seed = 42;

// Each run uses a fresh API instance so both requests start from the same empty history.
string reply1, reply2;
using (var api1 = AsposeLLMApi.Create(preset))
{
    reply1 = await api1.SendMessageAsync("Write one short sentence about rain.");
}
using (var api2 = AsposeLLMApi.Create(preset))
{
    reply2 = await api2.SendMessageAsync("Write one short sentence about rain.");
}
// reply1 and reply2 should be identical: same Seed, same parameters, same history.

For deterministic single-request output:

preset.SamplerParameters.Temperature = 0.0f;
// Temperature 0 = greedy; Seed becomes irrelevant.

Interactions

Temperature — at Temperature = 0, Seed has no effect.
Mirostat — uses its own adaptive process; Seed still affects the underlying RNG but the entropy target dominates.
Session history — a fixed Seed only produces identical output when the entire preceding history is identical. A fresh session and a loaded session with the same history will both honor the seed.

Notes

Output is reproducible only within the same SDK version and BinaryManagerParameters.ReleaseTag. Upgrading the SDK or the llama.cpp release tag may shift token sequences slightly even with the same Seed.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
Temperature — the primary randomness knob.
Session persistence portability — interaction between seeding and restored sessions.

Net: MinKeep

Thu, 23 Apr 2026 00:00:00 +0000

MinKeep sets a floor on the number of candidate tokens that survive after every filter (TopK, TopP, MinP, TypicalP, etc.) runs. If filters would leave fewer than MinKeep tokens, the engine widens them until at least MinKeep tokens remain.

Quick reference


Type	`int`
Default	`1`
Range	`1` and above
Category	Safety / filter floor
Field on	`SamplerParameters.MinKeep`

What it does

After all filters run, count the surviving candidates. If the count is below MinKeep, the engine relaxes the filters to bring it back up to MinKeep — typically by reverting the last applied filter or by keeping the top-MinKeep candidates regardless of probability.

MinKeep = 1 (default) — guarantees at least one token survives. Safe default; prevents the pathological case where every filter combines to produce zero candidates.
MinKeep = 5 — always keep at least five candidates. Adds variety floor at the cost of filter tightness.

MinKeep is a backstop rather than a primary tuning knob. Most users leave it at the default.
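One way to picture the floor is a nucleus cut that refuses to stop before MinKeep candidates have been collected. This sketch assumes the "keep the top-MinKeep candidates" strategy mentioned above; how the engine actually relaxes its filters may differ, and `TopPWithMinKeep` is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Nucleus cut that never returns fewer than minKeep candidates.
static int[] TopPWithMinKeep(double[] probs, double topP, int minKeep)
{
    int[] order = Enumerable.Range(0, probs.Length)
        .OrderByDescending(i => probs[i])
        .ToArray();

    double cumulative = 0.0;
    int keep = 0;
    while (keep < order.Length)
    {
        cumulative += probs[order[keep]];
        keep++;
        // Stop only when the mass target is met AND the floor is satisfied.
        if (cumulative >= topP && keep >= minKeep) break;
    }
    return order.Take(keep).ToArray();
}

double[] tightStep = { 0.6, 0.2, 0.1, 0.1 };
Console.WriteLine(TopPWithMinKeep(tightStep, 0.3, 1).Length); // 1
Console.WriteLine(TopPWithMinKeep(tightStep, 0.3, 3).Length); // 3
```

With a tight `topP` of 0.3 the nucleus alone would keep one token; raising the floor to 3 widens the result to three.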

When to change it

Scenario	Value
Default safe floor	`1`
Guarantee variety even under strict filters	`3` – `5`
Very aggressive tightness with unusual filter combos	Keep at `1`

Raising MinKeep effectively weakens the other filters in corner cases where they combine too tightly. Lowering it below 1 is not useful, because 1 is already the minimum meaningful value.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.TopK = 5;       // tight
preset.SamplerParameters.TopP = 0.3f;    // tight
preset.SamplerParameters.MinP = 0.2f;    // tight
preset.SamplerParameters.MinKeep = 3;    // but at least 3 candidates always survive

using var api = AsposeLLMApi.Create(preset);

Interactions

TopK — MinKeep floors the candidate count regardless.
TopP — same — nucleus can’t cut below MinKeep.
MinP — same.
TypicalP — same.
Mirostat — uses its own candidate-management algorithm; MinKeep is not relevant when Mirostat is active.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
TopK — count-based filter that respects MinKeep.
TopP — cumulative-mass filter that respects MinKeep.

Net: TypicalP

Thu, 23 Apr 2026 00:00:00 +0000

TypicalP enables locally-typical sampling. Instead of keeping the most likely tokens (TopP) or the top count (TopK), this filter keeps tokens whose information content (negative log-probability) is close to the entropy of the distribution at that step.

Quick reference


Type	`float`
Default	`-1.0` (disabled)
Range	`0.0` – `1.0` enables; `≤ 0` disables
Category	Advanced filter
Field on	`SamplerParameters.TypicalP`

What it does

Locally-typical sampling is an alternative tail-trimming strategy based on information theory. For each token, the engine measures how far its information content (negative log-probability) lies from the distribution’s entropy, which is the expected information content. Tokens are ranked from the smallest deviation to the largest, and the engine keeps the closest ones until their cumulative probability reaches TypicalP. Tokens that deviate too far in either direction, including very likely ones, are discarded.

TypicalP = -1 (default) — disabled.
TypicalP = 0.95 — keep tokens covering 95 % of the typical mass.
TypicalP = 0.5 — keep only very-typical tokens; tighter than nucleus sampling at the same threshold.

Locally-typical sampling tends to produce output that “sounds like the training distribution” — less repetitive than greedy, less random than high-temperature sampling.
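The ranking by distance from the entropy can be sketched as follows. This follows the published locally-typical sampling idea as I understand it and is not taken from the SDK; `TypicalIndices` and the sample distribution are hypothetical.

```csharp
using System;
using System.Linq;

// Keeps the tokens whose information content is closest to the entropy,
// until their cumulative probability reaches typicalP.
static int[] TypicalIndices(double[] probs, double typicalP)
{
    // Entropy = expected information content of the distribution (natural log).
    double entropy = probs.Where(p => p > 0).Sum(p => -p * Math.Log(p));

    // Rank tokens by how far their own information content is from the entropy.
    int[] order = Enumerable.Range(0, probs.Length)
        .Where(i => probs[i] > 0)
        .OrderBy(i => Math.Abs(-Math.Log(probs[i]) - entropy))
        .ToArray();

    double cumulative = 0.0;
    int keep = 0;
    while (keep < order.Length)
    {
        cumulative += probs[order[keep]];
        keep++;
        if (cumulative >= typicalP) break;
    }
    return order.Take(keep).ToArray();
}

double[] stepProbs = { 0.5, 0.2, 0.15, 0.1, 0.05 };
Console.WriteLine(string.Join(", ", TypicalIndices(stepProbs, 0.3))); // 1, 2
```

In this example the most likely token (index 0) is not kept at a threshold of 0.3, because tokens 1 and 2 sit closer to the entropy. Nucleus sampling at the same threshold would have kept only index 0.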

When to change it

Scenario	Value
Default (disabled)	`-1.0`
Alternative to `TopP` for naturalness	`0.95`
Experiment with information-theoretic sampling	`0.5` – `0.9`

Most production workloads use TopP + TopK + MinP. TypicalP is worth trying when standard filters produce repetitive output or when research / experimentation calls for it.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.TopP = 1.0f;        // disable nucleus
preset.SamplerParameters.TypicalP = 0.95f;   // use locally-typical instead

using var api = AsposeLLMApi.Create(preset);
string reply = await api.SendMessageAsync("Tell a short story.");
Console.WriteLine(reply);

Interactions

Temperature — applied before TypicalP.
TopP — both can be active; most users pick one or the other.
TopK — compatible as a count cap.
MinP — compatible.
MinKeep — floor applies.
Mirostat — bypasses all standard filters including TypicalP.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
TopP — the common alternative.
TopNSigma — another experimental filter.

Net: TopNSigma

Thu, 23 Apr 2026 00:00:00 +0000

TopNSigma is an experimental filter from recent llama.cpp versions. It keeps only tokens whose logit is within N standard deviations of the top logit, discarding the rest.

Quick reference


Type	`float`
Default	`-1.0` (disabled)
Range	`> 0` enables (typical `1.0` – `3.0`); `≤ 0` disables
Category	Advanced / experimental filter
Field on	`SamplerParameters.TopNSigma`

What it does

Compute the standard deviation of the logit distribution at a generation step. Take the maximum logit (logit_max). Keep only tokens whose logit is at least logit_max - N × stddev. Discard the rest.

This filter adapts automatically to distribution shape: on peaked distributions it keeps few tokens (most logits fall more than N standard deviations below the top one); on flat distributions it keeps many (most logits fit within N standard deviations of the top one).

TopNSigma = -1 (default) — disabled.
TopNSigma = 1.0 — tight; keeps only tokens very close to the top.
TopNSigma = 2.0 — moderate; keeps tokens within two standard deviations.
TopNSigma = 3.0 — wide; keeps every token within three standard deviations of the top logit.

This is a newer filter and interactions with other knobs are less well-studied than classic TopP / TopK. Reserve for experimentation.
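The threshold rule from this section can be tried on a handful of logits. The sketch assumes a population standard deviation (dividing by the number of logits); `TopNSigmaIndices` and the sample logits are hypothetical.

```csharp
using System;
using System.Linq;

// Keeps tokens whose logit is at least max - n * stddev.
static int[] TopNSigmaIndices(double[] logits, double n)
{
    double mean = logits.Average();
    double std = Math.Sqrt(logits.Sum(l => (l - mean) * (l - mean)) / logits.Length);
    double threshold = logits.Max() - n * std;
    return Enumerable.Range(0, logits.Length).Where(i => logits[i] >= threshold).ToArray();
}

double[] stepLogits = { 10, 9, 2, 1, 0, 0, -1, -1 };
Console.WriteLine(TopNSigmaIndices(stepLogits, 1.0).Length); // 2
Console.WriteLine(TopNSigmaIndices(stepLogits, 2.0).Length); // 3
Console.WriteLine(TopNSigmaIndices(stepLogits, 3.0).Length); // 8
```

Here the standard deviation is about 4.15, so N = 1 keeps only the two leading logits while N = 3 lets the whole set through.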

When to change it

Scenario	Value
Default (disabled)	`-1.0`
Experimental usage	`1.5` – `2.5`

Stick with TopP + TopK + MinP for production unless you have a specific reason.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.TopP = 1.0f;          // disable nucleus
preset.SamplerParameters.TopK = 0;             // disable top-K
preset.SamplerParameters.TopNSigma = 2.0f;     // use sigma-based filter instead

using var api = AsposeLLMApi.Create(preset);

Interactions

Temperature — applied before TopNSigma.
TopP — can coexist; experimental combinations are not well-studied.
TopK — can coexist.
MinP — can coexist.
MinKeep — floor applies.
Mirostat — bypasses TopNSigma when active.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
TypicalP — another experimental filter.
TopP — the standard alternative.

Net: PenaltyContextSize

Thu, 23 Apr 2026 00:00:00 +0000

PenaltyContextSize sets the number of recent tokens considered when applying repetition, presence, and frequency penalties. Only the last PenaltyContextSize tokens influence penalty calculations.

Quick reference


Type	`int`
Default	`-1` (use the model’s full context size)
Range	`-1` or positive integer
Category	Penalty window
Field on	`SamplerParameters.PenaltyContextSize`

What it does

Before each sampling step, the three penalty knobs (RepetitionPenalty, PresencePenalty, FrequencyPenalty) need to know which prior tokens to examine. PenaltyContextSize defines that window.

PenaltyContextSize = -1 — use the full context (equivalent to ContextParameters.ContextSize). Maximum recall; penalties apply across the entire conversation.
PenaltyContextSize = 256 — only the last 256 tokens contribute. Penalties are local; the model can freely reuse words that appeared earlier than that.
PenaltyContextSize = 64 — very local window; penalties essentially prevent immediate repetition only.

Narrow windows make penalties local (avoid recent verbatim repeats); wide windows make them global (avoid any mention of a token anywhere in history).
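The window can be pictured as a slice over the tail of the token history. In this sketch `-1` is treated as "the entire history passed in", a simplification of the full-context behavior described above; `CountsInWindow` and the token ids are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Counts how often each token id occurs in the last penaltyContextSize tokens.
static Dictionary<int, int> CountsInWindow(IReadOnlyList<int> history, int penaltyContextSize)
{
    int window = penaltyContextSize < 0
        ? history.Count
        : Math.Min(penaltyContextSize, history.Count);

    var counts = new Dictionary<int, int>();
    for (int i = history.Count - window; i < history.Count; i++)
    {
        counts.TryGetValue(history[i], out int seen);
        counts[history[i]] = seen + 1;
    }
    return counts;
}

int[] tokenHistory = { 7, 7, 3, 5, 7, 3 };
Console.WriteLine(CountsInWindow(tokenHistory, -1)[7]);             // 3
Console.WriteLine(CountsInWindow(tokenHistory, 2).ContainsKey(5));  // False
```

With a window of 2, token 5 is outside the window and would not be penalized, even though it appears earlier in the history.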

When to change it

Scenario	Value
Default — penalize repetition across full context	`-1`
Fresh-style writing that can revisit topics	`256` – `512`
Strict anti-repetition for short outputs	`128`
Very local penalty (only consecutive repeats)	`64`

Longer conversations may benefit from a smaller penalty window so the model isn’t punished for reusing common words across a long dialogue. For short-form answers, the default -1 is usually fine.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.PenaltyContextSize = 256;
preset.SamplerParameters.RepetitionPenalty = 1.15f;
// Discourage repetition within the last 256 tokens; older history doesn't trigger the penalty.

using var api = AsposeLLMApi.Create(preset);

Interactions

RepetitionPenalty — applied within this window.
PresencePenalty — same window.
FrequencyPenalty — same window.
ContextParameters.ContextSize — upper bound; at -1, PenaltyContextSize equals this.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
RepetitionPenalty — the main penalty that uses this window.
Garbled output — repetition loops — when penalty tuning helps.

Net: RepetitionPenalty

Thu, 23 Apr 2026 00:00:00 +0000

RepetitionPenalty divides the logit of any token that appeared in the recent window (see PenaltyContextSize) by the penalty value. Values above 1.0 make recently-seen tokens less likely to be picked again.

Quick reference


Type	`float`
Default	`1.1`
Range	`1.0` = disabled, above `1.0` = active
Category	Repetition penalty
Field on	`SamplerParameters.RepetitionPenalty`

What it does

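For each token that appeared at least once in the penalty window, divide its logit by RepetitionPenalty before sampling. In llama.cpp, a negative logit is multiplied by the penalty instead, so a penalized token always becomes less likely. The higher the penalty, the stronger the suppression.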

1.0 — no penalty; behaves as if disabled.
1.05 – 1.15 — gentle. Breaks common loop patterns without starving the sampler of basic words.
1.2 – 1.3 — aggressive. Useful when the model loops persistently, but risks under-generating common words like “the” or “and”.
1.5+ — very aggressive. The model will work hard to use different words; output quality usually drops.
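The penalty itself is a one-line adjustment per seen token. The sketch below follows the sign handling I understand llama.cpp to use (divide positive logits, multiply negative ones); whether the SDK matches it exactly is an assumption, and `ApplyRepetitionPenalty` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns a copy of the logits with the penalty applied to every seen token.
static double[] ApplyRepetitionPenalty(double[] logits, IEnumerable<int> seenTokens, double penalty)
{
    double[] result = (double[])logits.Clone();
    foreach (int token in seenTokens.Distinct())
    {
        // Dividing a negative logit would raise it, so negative logits are multiplied.
        result[token] = result[token] > 0
            ? result[token] / penalty
            : result[token] * penalty;
    }
    return result;
}

double[] rawLogits = { 4.0, 2.0, -1.0 };
double[] penalized = ApplyRepetitionPenalty(rawLogits, new[] { 0, 2, 0 }, 1.25);
// Token 0 drops from 4.0 to 3.2, token 2 from -1.0 to -1.25, token 1 is untouched.
Console.WriteLine(string.Join(" | ", penalized));
```

Token 0 appears twice in the window but is penalized once, which is what makes this penalty insensitive to how often a token repeats.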

When to change it

Scenario	Value
Disabled	`1.0`
Default (most chat)	`1.1`
Model loops occasionally	`1.15`
Model loops persistently	`1.2`
Anything stronger	Do not go beyond `1.3` without a specific reason

If raising RepetitionPenalty past 1.2 still does not solve looping, switch to DryMultiplier (DRY) — it catches phrase-level repeats that token-level penalty misses.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.RepetitionPenalty = 1.15f;
preset.SamplerParameters.PenaltyContextSize = 256;

using var api = AsposeLLMApi.Create(preset);
string reply = await api.SendMessageAsync("Describe spring in three sentences.");
Console.WriteLine(reply);

Interactions

PenaltyContextSize — window over which this penalty applies.
PresencePenalty — alternative additive penalty; can combine.
FrequencyPenalty — scales with occurrence count; can combine.
DryMultiplier — phrase-level anti-repetition; complementary to token-level.
Mirostat — managing entropy adaptively may reduce the need for penalties.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
PresencePenalty and FrequencyPenalty — additive alternatives.
Garbled output troubleshooting — diagnosing repetition issues.

Net: PresencePenalty

Thu, 23 Apr 2026 00:00:00 +0000

PresencePenalty is an additive penalty applied once to any token that appeared at least once in the penalty window. Unlike RepetitionPenalty (multiplicative) or FrequencyPenalty (scales with count), PresencePenalty fires uniformly on first appearance.

Quick reference


Type	`float`
Default	`0.0` (disabled)
Range	`0.0` = disabled, typical `0.0` – `1.0`
Category	Repetition penalty
Field on	`SamplerParameters.PresencePenalty`

What it does

For each token in the penalty window, subtract PresencePenalty from its logit before sampling. The penalty is applied once per token regardless of how often the token appeared.

0.0 (default) — disabled.
0.3 – 0.6 — moderate push toward fresh tokens.
0.8 – 1.0 — strong push; output leans heavily on new vocabulary.

PresencePenalty encourages broader vocabulary without caring about repetition frequency. It pairs well with topical content that should stay within a subject but use varied terms.

When to change it

Scenario	Value
Disabled	`0.0` (default)
Slightly broaden vocabulary	`0.3`
Strongly encourage new words	`0.6`
Push model to avoid any previously-seen vocabulary	`1.0`

Unlike FrequencyPenalty, PresencePenalty ignores how many times a token has been used; it only checks whether the token has appeared. For count-sensitive control, combine it with FrequencyPenalty.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.PresencePenalty = 0.5f;
preset.SamplerParameters.FrequencyPenalty = 0.3f;
// Broader vocabulary; slight extra push against over-used tokens.

using var api = AsposeLLMApi.Create(preset);

Interactions

PenaltyContextSize — window over which this penalty applies.
RepetitionPenalty — multiplicative alternative.
FrequencyPenalty — scales with frequency; stacks with PresencePenalty.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
FrequencyPenalty — companion count-based penalty.
RepetitionPenalty — the multiplicative variant.

Net: FrequencyPenalty

Thu, 23 Apr 2026 00:00:00 +0000

FrequencyPenalty is an additive penalty proportional to how many times each token has appeared in the penalty window. The more often a token appears, the larger the penalty.

Quick reference


Type	`float`
Default	`0.0` (disabled)
Range	`0.0` = disabled, typical `0.0` – `1.0`
Category	Repetition penalty
Field on	`SamplerParameters.FrequencyPenalty`

What it does

For each token in the penalty window, compute count × FrequencyPenalty and subtract that from the token’s logit. Tokens that appeared ten times get penalized ten times as hard as tokens that appeared once.

0.0 (default) — disabled.
0.1 – 0.3 — moderate; common words stay usable but over-used ones get suppressed.
0.5+ — aggressive; breaks repetition but risks under-generating common function words (articles, prepositions).

FrequencyPenalty is the finest-grained of the three penalties: RepetitionPenalty applies one fixed factor to any token in the window, PresencePenalty subtracts a fixed amount once per token, and FrequencyPenalty scales with count.
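The two additive penalties can be sketched together, since both subtract from the logit of a token seen in the window. This is an illustration under the descriptions given for PresencePenalty and FrequencyPenalty, not the engine's code; `ApplyAdditivePenalties` and the sample counts are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Subtracts presencePenalty once, plus count * frequencyPenalty, for each seen token.
static double[] ApplyAdditivePenalties(
    double[] logits,
    IReadOnlyDictionary<int, int> counts,
    double presencePenalty,
    double frequencyPenalty)
{
    double[] result = (double[])logits.Clone();
    foreach (KeyValuePair<int, int> entry in counts)
    {
        if (entry.Value > 0)
            result[entry.Key] -= presencePenalty + entry.Value * frequencyPenalty;
    }
    return result;
}

double[] baseLogits = { 5.0, 5.0, 5.0 };
var windowCounts = new Dictionary<int, int> { [0] = 1, [1] = 10 };
double[] adjusted = ApplyAdditivePenalties(baseLogits, windowCounts, 0.5, 0.2);
// Token 0 (seen once) ends near 4.3, token 1 (seen ten times) near 2.5, token 2 stays at 5.0.
Console.WriteLine(string.Join(" | ", adjusted));
```

The presence part contributes the same 0.5 to both seen tokens; the frequency part is what separates a token seen once from one seen ten times.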

When to change it

Scenario	Value
Disabled	`0.0` (default)
Discourage over-used words	`0.2`
Strong anti-repetition when other penalties do not suffice	`0.5`
Combined with other penalties	Stacks; tune one penalty at a time

When experimenting, enable one penalty at a time and raise until looping stops. Enabling all three at once with aggressive values produces awkward output.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.FrequencyPenalty = 0.3f;
// Over-used words get progressively suppressed.

using var api = AsposeLLMApi.Create(preset);

Interactions

PenaltyContextSize — window over which the count is measured.
RepetitionPenalty — multiplicative companion.
PresencePenalty — uniform additive companion.

What’s next

Sampler parameters hub — all sampler knobs at a glance.
PresencePenalty — companion uniform penalty.
RepetitionPenalty — multiplicative penalty.

Net: DynatempRange

Thu, 23 Apr 2026 00:00:00 +0000

DynatempRange enables dynamic temperature (dynatemp). Instead of a fixed Temperature, the engine varies temperature per step based on how confident the model is at that step. Low-entropy (confident) steps drop the temperature; high-entropy (uncertain) steps raise it.

Quick reference


Type	`float`
Default	`0.0` (disabled)
Range	`0.0` = disabled, typical active values `0.1` – `0.5`
Category	Adaptive temperature
Field on	`SamplerParameters.DynatempRange`

What it does

At each step, llama.cpp computes the Shannon entropy of the current token distribution. It then adjusts Temperature within [Temperature - DynatempRange/2, Temperature + DynatempRange/2]:

When entropy is low (model is confident about the next token), temperature drops toward the lower bound — preserves the confident choice.
When entropy is high (many plausible tokens), temperature rises toward the upper bound — encourages variety where the model has no strong preference.

The exact shape of the entropy-to-temperature curve is controlled by DynatempExponent.

DynatempRange = 0.0 (default) — dynatemp disabled; Temperature is used as-is.
DynatempRange = 0.3 — moderate adaptive variation.
DynatempRange = 0.5 — wide swing; near-greedy on confident steps, high variety on uncertain steps.
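The signal that drives dynatemp is the entropy of the step's distribution. The sketch below computes it and scales it to the range 0 to 1 by dividing by the maximum possible entropy for that number of candidates; the normalization and the name `NormalizedEntropy` are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Shannon entropy divided by its maximum (log of the candidate count).
static double NormalizedEntropy(double[] probs)
{
    if (probs.Length < 2) return 0.0;
    double entropy = probs.Where(p => p > 0).Sum(p => -p * Math.Log(p));
    return entropy / Math.Log(probs.Length);
}

double[] confident = { 1.0, 0.0, 0.0, 0.0 };
double[] uncertain = { 0.25, 0.25, 0.25, 0.25 };
Console.WriteLine(NormalizedEntropy(confident).ToString("F2")); // low: fully confident step
Console.WriteLine(NormalizedEntropy(uncertain).ToString("F2")); // high: uniform step
```

A fully confident step scores 0 and a uniform step scores 1, so the value can be used directly to place the step's temperature between the lower and upper bound.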

When to change it

Scenario	Value
Default (disabled)	`0.0`
Slight adaptation	`0.2`
Classical dynatemp recipe	`0.3` – `0.4`
Aggressive adaptation	`0.5`

Dynatemp shines on mixed-content generation (structured plus creative) where fixed Temperature struggles. For pure factual tasks, leave disabled. For pure creative writing, fixed high Temperature is simpler.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.Temperature = 0.8f;          // mid-point
preset.SamplerParameters.DynatempRange = 0.3f;        // effective range [0.65, 0.95]
preset.SamplerParameters.DynatempExponent = 1.0f;     // linear entropy-to-temperature curve

using var api = AsposeLLMApi.Create(preset);

Interactions

Temperature — the base / mid-point around which dynatemp varies.
DynatempExponent — shapes the entropy-to-temperature curve.
Mirostat — alternative entropy-aware sampler; do not combine.

What’s next

DynatempExponent — the curve-shape knob.
Temperature — the fixed baseline.
Mirostat — alternative adaptive sampler.

Net: DynatempExponent

Thu, 23 Apr 2026 00:00:00 +0000

DynatempExponent controls the shape of the entropy-to-temperature mapping in dynamic temperature sampling. It only matters when DynatempRange is enabled (non-zero).

Quick reference


Type	`float`
Default	`1.0` (linear mapping)
Range	`> 0`
Category	Adaptive temperature
Field on	`SamplerParameters.DynatempExponent`

What it does

Dynatemp maps the normalized entropy e ∈ [0, 1] at each step to an offset within DynatempRange. DynatempExponent reshapes the mapping:

DynatempExponent = 1.0 — linear mapping. Temperature scales proportionally with entropy.
DynatempExponent > 1.0 — convex. Only very high entropy triggers significant temperature increases; medium entropy stays near the base.
DynatempExponent < 1.0 — concave. Small entropy changes trigger larger temperature swings.

The exact formula from llama.cpp: the step’s temperature is the base Temperature adjusted by an offset proportional to entropy^DynatempExponent over DynatempRange.
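The effect of the exponent can be sketched as an interpolation between a lower and an upper temperature bound. The interpolation form is my reading of the llama.cpp behavior and should be treated as an assumption; the bounds are passed in explicitly, so the sketch does not depend on how the SDK derives them from Temperature and DynatempRange. `DynamicTemperature` and the values 0.55 and 0.85 are illustrative.

```csharp
using System;

// Interpolates between minTemp and maxTemp using entropy raised to the exponent.
static double DynamicTemperature(double minTemp, double maxTemp, double normalizedEntropy, double exponent)
{
    double e = Math.Max(0.0, Math.Min(1.0, normalizedEntropy)); // keep the input in [0, 1]
    return minTemp + (maxTemp - minTemp) * Math.Pow(e, exponent);
}

foreach (double exponent in new[] { 0.7, 1.0, 1.5 })
{
    double atMedium = DynamicTemperature(0.55, 0.85, 0.5, exponent);
    double atHigh = DynamicTemperature(0.55, 0.85, 0.9, exponent);
    Console.WriteLine($"exponent={exponent}: medium={atMedium:F3}, high={atHigh:F3}");
}
```

At the same medium entropy, a larger exponent yields a lower temperature and a smaller exponent a higher one, while both ends of the range stay fixed.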

When to change it

Scenario	Value
Default linear mapping	`1.0`
Only react to very uncertain steps	`1.5` – `2.0`
React to mild uncertainty	`0.7` – `0.9`

Most users leave DynatempExponent = 1.0. Change only after experimenting with DynatempRange alone and finding the linear mapping unsuitable for your workload.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.Temperature = 0.7f;
preset.SamplerParameters.DynatempRange = 0.3f;
preset.SamplerParameters.DynatempExponent = 1.5f;
// Temperature rises significantly only on high-entropy steps; low-to-medium
// entropy stays close to 0.7.

using var api = AsposeLLMApi.Create(preset);

Interactions

DynatempRange — must be non-zero for DynatempExponent to have any effect.
Temperature — the base temperature dynatemp varies around.

What’s next

DynatempRange — enables dynatemp.
Temperature — the base.
Sampler parameters hub — all sampler knobs.

Net: XtcProbability

Thu, 23 Apr 2026 00:00:00 +0000

XtcProbability enables the XTC (eXclude Top Choices) sampler. With probability XtcProbability at each step, XTC removes the top candidate tokens, forcing the engine to sample from the tail. Useful for injecting diversity without raising temperature.

Quick reference


Type	`float`
Default	`-1.0` (disabled)
Range	`0.0` – `1.0`, or `-1.0` to disable
Category	Advanced / diversity
Field on	`SamplerParameters.XtcProbability`

What it does

At each generation step, with probability XtcProbability, XTC fires: the engine excludes all tokens whose probability is above XtcThreshold, and samples from whatever remains. This nudges the model into less-obvious paths without changing temperature.

-1.0 (default) — disabled.
0.1 – 0.3 — XTC fires on 10–30 % of steps. Mild diversity boost.
0.5 – 0.8 — XTC fires often. Strong variety; risk of incoherence.
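
The sketch below walks through one XTC step. It rests on two assumptions taken from the original XTC design rather than from this page: the least likely of the above-threshold tokens is kept so that a viable candidate always survives, and a step with fewer than two above-threshold tokens is left untouched. Whether the engine behaves exactly this way is not confirmed here, and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the SDK's implementation.
public static class XtcSketch
{
    // Returns the indices of tokens that remain candidates after one XTC step.
    // probs: normalized token probabilities; rng: source of the per-step coin flip.
    public static List<int> Apply(double[] probs, double xtcProbability, double xtcThreshold, Random rng)
    {
        var all = Enumerable.Range(0, probs.Length).ToList();

        // Disabled (for example -1.0), or the coin flip says "skip this step".
        if (xtcProbability <= 0 || rng.NextDouble() >= xtcProbability)
            return all;

        // Tokens above the threshold, most likely first.
        var top = all.Where(i => probs[i] > xtcThreshold)
                     .OrderByDescending(i => probs[i])
                     .ToList();

        // Assumption: with fewer than two such tokens there is nothing to exclude.
        if (top.Count < 2)
            return all;

        // Assumption: keep the least likely "top choice" and drop the others.
        var excluded = new HashSet<int>(top.Take(top.Count - 1));
        return all.Where(i => !excluded.Contains(i)).ToList();
    }
}
```

With probabilities 0.5, 0.3, 0.15, 0.05 and XtcThreshold = 0.1, a firing step under these assumptions leaves only the 0.15 and 0.05 tokens as candidates.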

When to change it

Scenario	Value
Default (disabled)	`-1.0`
Light creativity boost	`0.1` – `0.2`
Alternative to high temperature	`0.3` – `0.5`
Experimental variety push	`0.5+`

XTC is a recent addition. Prefer conventional Temperature + TopP tuning first. Reach for XTC when you want variety specifically on steps where the model is confident: standard sampling tightens those steps, whereas XTC breaks them up.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.XtcProbability = 0.2f;
preset.SamplerParameters.XtcThreshold = 0.1f;
// On ~20 % of steps, tokens above 0.1 probability are excluded.

using var api = AsposeLLMApi.Create(preset);
string reply = await api.SendMessageAsync("Suggest three fictional startup names.");
Console.WriteLine(reply);

Interactions

XtcThreshold — minimum probability a token must have to be considered for exclusion.
Temperature — orthogonal. XTC works at any temperature.
TopP — still applied; XTC runs within what TopP keeps.

What’s next

XtcThreshold — the probability cutoff companion.
Temperature — conventional randomness knob.
Sampler parameters hub — all sampler knobs.

Net: XtcThreshold

Thu, 23 Apr 2026 00:00:00 +0000

XtcThreshold is the probability cutoff for XTC (eXclude Top Choices). Only tokens whose probability is above XtcThreshold are candidates for XTC exclusion. This field has no effect unless XtcProbability is enabled.

Quick reference


Type	`float`
Default	`0.0`
Range	`0.0` – `1.0`
Category	Advanced / diversity
Field on	`SamplerParameters.XtcThreshold`

What it does

When XTC fires at a step, the engine excludes every token whose probability exceeds XtcThreshold:

XtcThreshold = 0.0 — any token with non-zero probability can be excluded. Very aggressive.
XtcThreshold = 0.1 — only tokens above 10 % probability are exclusion candidates. Mild.
XtcThreshold = 0.3 — only dominant tokens (>30 % probability) get excluded. Very selective.
XtcThreshold = 1.0 — nothing can be excluded; XTC effectively does nothing.

Raise XtcThreshold when XTC is producing too much noise (excluding too many tokens); lower it for stronger diversity injection.

When to change it

Scenario	Value
Default	`0.0`
Target only dominant tokens	`0.1` – `0.2`
Target only near-certain tokens	`0.3` – `0.5`
Effectively disable (keep `XtcProbability` for logging)	`1.0`

XtcThreshold tunes “which tokens XTC may touch”. XtcProbability tunes “how often XTC runs at all”.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.XtcProbability = 0.3f;
preset.SamplerParameters.XtcThreshold = 0.2f;
// On ~30 % of steps, exclude tokens with probability > 20 %.

using var api = AsposeLLMApi.Create(preset);

Interactions

XtcProbability — gate; XtcThreshold only matters when XTC fires.
Temperature, TopP, TopK — apply before XTC.

What’s next

XtcProbability — enables XTC.
Sampler parameters hub — all sampler knobs.

Net: DryMultiplier

Thu, 23 Apr 2026 00:00:00 +0000

DryMultiplier enables and scales the DRY (Don’t Repeat Yourself) sampler. DRY detects when the model is about to produce a verbatim copy of a phrase it said earlier and applies an exponentially growing penalty to discourage the repeat.

Quick reference


Type	`float`
Default	`-1.0` (disabled)
Range	`> 0` enables; typical `0.5` – `1.5`
Category	Phrase-level repetition penalty
Field on	`SamplerParameters.DryMultiplier`

What it does

DRY scans the generation window and detects consecutive-token sequences that match what the model generated earlier. When a match extends beyond DryAllowedLength, it applies a penalty that grows as DryMultiplier × DryBase^(match_length - DryAllowedLength).

DryMultiplier = -1 (default) — disabled.
DryMultiplier = 0.8 — classic strength recommended by the original DRY proposal.
DryMultiplier = 1.5+ — aggressive; risks distorting natural common phrases.

DRY is phrase-level, complementary to token-level RepetitionPenalty. Use it when output has entire paragraphs or sentences echoing earlier text — token-level penalties cannot catch that.

When to change it

Scenario	Value
Default (disabled)	`-1.0`
Creative writing, stop phrase repeats	`0.8`
Persistent phrase loops	`1.2`
Code generation	Leave disabled (code naturally repeats syntax)

DRY is often the right tool when RepetitionPenalty tuning up to 1.2 doesn’t fix phrase-level loops. Enable DRY and tune DryMultiplier + DryAllowedLength.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.DryMultiplier = 0.8f;
preset.SamplerParameters.DryBase = 1.75f;
preset.SamplerParameters.DryAllowedLength = 3;

using var api = AsposeLLMApi.Create(preset);
string reply = await api.SendMessageAsync("Write an essay about patience.");
Console.WriteLine(reply);

Interactions

DryBase — exponent base for growing penalty.
DryAllowedLength — longest match length that passes without penalty.
DryPenaltyLastN — how far back to look.
DrySequenceBreakers — tokens that reset the match detector.
RepetitionPenalty — token-level companion; both can be active.

What’s next

DryBase, DryAllowedLength — the other DRY knobs.
RepetitionPenalty — token-level alternative.
Garbled output troubleshooting — when to reach for DRY.

Net: DryBase

Thu, 23 Apr 2026 00:00:00 +0000

DryBase is the exponent base of the DRY penalty growth formula. It determines how quickly the penalty ramps up as a repeated phrase extends. DryBase only has effect when DryMultiplier is enabled (positive).

Quick reference


Type	`float`
Default	`1.75`
Range	`> 1.0`; typical `1.5` – `2.5`
Category	Phrase-level repetition penalty
Field on	`SamplerParameters.DryBase`

What it does

The DRY penalty applied to a token that would extend a repeated match is:

penalty = DryMultiplier × DryBase^(match_length - DryAllowedLength)

Larger DryBase means the penalty grows faster. For a match that runs 5 tokens past DryAllowedLength, DryBase = 1.75 gives 1.75^5 ≈ 16.4, multiplied by DryMultiplier; at DryBase = 2.5 the same match gives 2.5^5 ≈ 97.7.

DryBase = 1.5 — gentle growth; long repeats punished but not extremely.
DryBase = 1.75 (default) — balanced; the value recommended by the original DRY proposal.
DryBase = 2.0+ — rapid growth; even medium-length repeats are heavily penalized.
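
The effect of DryBase can be checked by evaluating the formula directly. The helper below is a hypothetical sketch, not an SDK method; it assumes that match_length counts the whole repeated sequence and that matches no longer than DryAllowedLength receive no penalty.

```csharp
using System;

// Hypothetical helper, not part of Aspose.LLM: evaluates the DRY penalty formula.
public static class DryPenalty
{
    // Returns 0 when DRY is disabled or the match is within the allowed length.
    public static double Compute(double dryMultiplier, double dryBase, int matchLength, int dryAllowedLength)
    {
        // A non-positive multiplier (such as the default -1.0) means DRY is off.
        if (dryMultiplier <= 0)
            return 0.0;

        // Assumption: only matches strictly longer than the allowed length are penalized.
        int excess = matchLength - dryAllowedLength;
        if (excess <= 0)
            return 0.0;

        return dryMultiplier * Math.Pow(dryBase, excess);
    }
}
```

With DryMultiplier = 0.8 and DryAllowedLength = 2, a 7-token match yields about 13.1 at DryBase = 1.75 and about 78.1 at DryBase = 2.5.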

When to change it

Scenario	Value
Default	`1.75`
Gentle phrase anti-repetition	`1.5`
Aggressive (model keeps finding ways to repeat)	`2.0` – `2.5`

Most workloads leave DryBase alone and tune DryMultiplier and DryAllowedLength. Change DryBase as a last resort if the standard combination does not stop repeats.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.DryMultiplier = 0.8f;
preset.SamplerParameters.DryBase = 2.0f;    // faster growth
preset.SamplerParameters.DryAllowedLength = 2;

using var api = AsposeLLMApi.Create(preset);

Interactions

DryMultiplier — must be positive for DryBase to matter.
DryAllowedLength — sets the exponent’s zero-point.
DryPenaltyLastN — window of history DRY scans.
DrySequenceBreakers — tokens that reset match detection.

What’s next

DryMultiplier — enables DRY.
DryAllowedLength — match-length threshold.
Sampler parameters hub — all sampler knobs.

Net: DryAllowedLength

Thu, 23 Apr 2026 00:00:00 +0000

DryAllowedLength is the longest consecutive-token match that DRY lets pass without penalty. Matches longer than this value get the exponentially growing DRY penalty.

Quick reference


Type	`int`
Default	`2`
Range	`1` and above
Category	Phrase-level repetition penalty
Field on	`SamplerParameters.DryAllowedLength`

What it does

DRY scans recent generation for consecutive-token sequences that match earlier content. Matches strictly longer than DryAllowedLength trigger the penalty, which is multiplied by DryBase for each further token beyond the threshold:

penalty = DryMultiplier × DryBase^(match_length - DryAllowedLength)

DryAllowedLength = 2 (default) — any 3+ token repeat gets penalized. Catches most phrase repeats while allowing natural short patterns.
DryAllowedLength = 3 — allows short 3-token patterns to repeat unpenalized. Safer for code / formatted output.
DryAllowedLength = 5 — only long repeats are penalized. Use when the default causes awkward word choices.
DryAllowedLength = 1 — very aggressive; any 2-token match penalized.

When to change it

Scenario	Value
Default	`2`
Code or formatted output with natural repeats	`4` – `6`
Model loops even on short phrases	`1` – `2`
Balanced creative writing	`3`

DryAllowedLength is the best knob to tune when DRY starts producing strange word choices — raise it so DRY leaves short natural patterns alone.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.DryMultiplier = 0.8f;
preset.SamplerParameters.DryAllowedLength = 4;  // allow short repeats

using var api = AsposeLLMApi.Create(preset);

Interactions

DryMultiplier — must be positive for DRY to be active.
DryBase — sets how fast penalty grows beyond this length.
DrySequenceBreakers — tokens that reset the match counter.

What’s next

DryMultiplier — enables DRY.
DryBase — growth rate past this length.
Sampler parameters hub — all sampler knobs.

Net: DryPenaltyLastN

Thu, 23 Apr 2026 00:00:00 +0000

DryPenaltyLastN controls how far back DRY looks when scanning for phrase repetition: it sets how many of the most recent tokens are checked for matches.

Quick reference


Type	`int`
Default	`0` (scan all available context)
Range	`0` = unlimited; `> 0` = window size
Category	Phrase-level repetition penalty
Field on	`SamplerParameters.DryPenaltyLastN`

What it does

DRY scans the last DryPenaltyLastN tokens of generation for consecutive-token sequences matching whatever the model is about to emit.

DryPenaltyLastN = 0 (default) — scan all available tokens; no distance limit.
DryPenaltyLastN = 512 — only the last 512 tokens are scanned for matches. Older content is ignored.
DryPenaltyLastN = 128 — very local; catches only immediate phrase repeats.
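
The window selection described above can be sketched as follows. The helper is hypothetical and not part of the SDK; treating negative values like 0 is an assumption made only to keep the sketch total.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the SDK's implementation.
public static class DryWindowSketch
{
    // 0 means "scan everything"; a positive value keeps only the most recent tokens.
    public static IReadOnlyList<int> ScanWindow(IReadOnlyList<int> tokens, int dryPenaltyLastN)
    {
        // Assumption: non-positive values and oversized windows both scan all tokens.
        if (dryPenaltyLastN <= 0 || dryPenaltyLastN >= tokens.Count)
            return tokens;

        return tokens.Skip(tokens.Count - dryPenaltyLastN).ToList();
    }
}
```

For a history of five token IDs, a window of 2 returns only the last two IDs, while 0 returns all five.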

Narrowing the window speeds up DRY scanning slightly and relaxes the constraint for long conversations where older repeats are benign.

When to change it

Scenario	Value
Default (scan all)	`0`
Long conversations, focus DRY on recent output	`512`
Very local enforcement	`128`

Most deployments leave DryPenaltyLastN = 0. Set a positive window only if DRY is catching repeats from very old conversation history that are acceptable in your domain.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.DryMultiplier = 0.8f;
preset.SamplerParameters.DryPenaltyLastN = 512;  // only scan last 512 tokens

using var api = AsposeLLMApi.Create(preset);

Interactions

DryMultiplier — gate; DryPenaltyLastN only matters when DRY is active.
Other DRY knobs operate within this window.

What’s next

DryMultiplier — enables DRY.
DrySequenceBreakers — reset tokens.
Sampler parameters hub — all sampler knobs.

Net: DrySequenceBreakers

Thu, 23 Apr 2026 00:00:00 +0000

DrySequenceBreakers is the list of string tokens that reset DRY’s phrase-match detector. When DRY encounters one of these tokens, any partial match in progress is discarded.

Quick reference


Type	`List<string>`
Default	`["\n", ":", "\"", "*"]`
Category	Phrase-level repetition penalty
Field on	`SamplerParameters.DrySequenceBreakers`

What it does

DRY detects long matches by comparing the token stream against earlier tokens. When the detector encounters a “breaker” string — typically punctuation or structural markup — it treats that as the end of one phrase and the start of the next. This prevents DRY from penalizing tokens that naturally repeat across paragraph boundaries.
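
A minimal sketch of how a breaker cuts a match short is shown below. It is an illustration only: tokens are modelled as plain strings, the names are hypothetical, and the engine's real detector works on token IDs and may differ in detail.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the SDK's implementation. Tokens are modelled as strings.
public static class DryMatchSketch
{
    // Length of the longest earlier sequence that was followed by `candidate`
    // and equals the current end of `history`. Counting stops at a breaker.
    public static int MatchLengthBefore(IReadOnlyList<string> history, string candidate, ISet<string> breakers)
    {
        int n = history.Count;
        int best = 0;

        for (int i = 0; i < n; i++)
        {
            // Only earlier occurrences of the candidate can start a repeat.
            if (history[i] != candidate) continue;

            int len = 0;
            // Walk backwards from the earlier occurrence and from the end of history.
            while (i - 1 - len >= 0)
            {
                string earlier = history[i - 1 - len];
                string recent = history[n - 1 - len];
                if (earlier != recent || breakers.Contains(recent)) break;
                len++;
            }
            best = Math.Max(best, len);
        }
        return best;
    }
}
```

For the history "a", ":", "b", "a", ":" and the candidate "b", the sketch reports a match length of 2 with no breakers and 0 when ":" is a breaker. The result would then be compared with DryAllowedLength to decide whether the candidate is penalized.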

Default breakers:

"\n" — newline: paragraph boundary.
":" — colon: list-item or header boundary.
"\"" — double quote: start/end of quoted text.
"*" — asterisk: markdown emphasis or list markers.

When to change it

Scenario	Value
Default list	`["\n", ":", "\"", "*"]`
Markdown-heavy output	Add `"#"` for headings, `"-"` for bullets
Code generation	Add `"{"`, `"}"`, `";"`, `"("`, `")"`
Minimal breakers — let DRY fire more aggressively across paragraphs	`["\n"]` or empty list

Breakers shape what DRY considers a “phrase”. Narrowing the list makes DRY more global; widening it localizes enforcement.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.DryMultiplier = 0.8f;
preset.SamplerParameters.DrySequenceBreakers = new List<string>
{
    "\n", ":", "\"", "*", "#", "-", // markdown-heavy content
};

using var api = AsposeLLMApi.Create(preset);

Interactions

DryMultiplier — DRY gate.
DryAllowedLength — match length threshold within a phrase boundary.

What’s next

DryMultiplier — enables DRY.
DryAllowedLength — length threshold.
Sampler parameters hub — all sampler knobs.

Net: Mirostat

Thu, 23 Apr 2026 00:00:00 +0000

Mirostat is an adaptive sampler that targets a specific output entropy (roughly perplexity) instead of relying on fixed temperature and nucleus filters. When enabled, it bypasses Temperature, TopP, TopK, and MinP.

Quick reference


Type	`int`
Default	`0` (disabled)
Range	`0` = disabled, `1` = Mirostat 1.0, `2` = Mirostat 2.0
Category	Adaptive sampler
Field on	`SamplerParameters.Mirostat`

What it does

Mirostat monitors the entropy of the output distribution and adjusts the sampling process to match a target entropy (MirostatTau). Higher measured entropy → Mirostat tightens. Lower → Mirostat relaxes.

Mirostat = 0 (default) — disabled. Standard Temperature + TopP + TopK + MinP pipeline is used.
Mirostat = 1 — Mirostat 1.0. Original algorithm from the paper.
Mirostat = 2 — Mirostat 2.0. Simplified and usually preferred; faster convergence.

When Mirostat is enabled, the standard filters (TopP, TopK, MinP, TypicalP, TopNSigma) are effectively bypassed. Temperature tuning is ignored — Mirostat manages its own temperature-like adjustments internally.

When to change it

Scenario	Value
Default — use standard filters	`0`
Adaptive perplexity-targeting (preferred)	`2`
Older Mirostat 1.0 (rarely preferred)	`1`

Consider Mirostat when:

You want consistent perplexity across very long outputs.
Fixed temperature produces output that is either too tight or too loose in different parts of the same generation.
You are researching or experimenting with entropy-aware sampling.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.Mirostat = 2;           // Mirostat 2.0
preset.SamplerParameters.MirostatTau = 5.0f;     // target entropy
preset.SamplerParameters.MirostatEta = 0.1f;     // learning rate

using var api = AsposeLLMApi.Create(preset);

Interactions

MirostatTau — target entropy.
MirostatEta — learning rate.
Temperature, TopP, TopK, MinP, TypicalP, TopNSigma — all bypassed when Mirostat is active.
DynatempRange — alternative entropy-aware sampler; do not combine.
Seed — still affects the RNG; reproducibility works with Mirostat.

What’s next

MirostatTau — entropy target.
MirostatEta — learning rate.
DynatempRange — alternative adaptive approach.

Net: MirostatTau

Thu, 23 Apr 2026 00:00:00 +0000

MirostatTau is the target entropy Mirostat tries to maintain across the output. Mirostat adjusts its sampling parameters on the fly to keep the actual output entropy close to this value.

Quick reference


Type	`float`
Default	`5.0`
Range	`> 0`; typical `3.0` – `8.0`
Category	Adaptive sampler
Field on	`SamplerParameters.MirostatTau`

What it does

MirostatTau is expressed in bits (base-2 log units) of surprise per sampled token, the unit used by llama.cpp’s Mirostat implementation. Lower target entropy → tighter, more deterministic output. Higher target → more variety.

MirostatTau = 3.0 — tight. Output is focused, often close to greedy.
MirostatTau = 5.0 (default) — balanced. Good match for general chat.
MirostatTau = 7.0+ — looser. Output has more variety, higher perplexity.

MirostatTau only matters when Mirostat is enabled (1 or 2). Otherwise the field is ignored.

When to change it

Scenario	Value
Default	`5.0`
Precise / code / factual output	`3.0` – `4.0`
Creative writing	`6.0` – `8.0`

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.Mirostat = 2;
preset.SamplerParameters.MirostatTau = 4.0f;  // tighter than default
preset.SamplerParameters.MirostatEta = 0.1f;

using var api = AsposeLLMApi.Create(preset);

Interactions

Mirostat — must be 1 or 2 for MirostatTau to take effect.
MirostatEta — how fast Mirostat adapts toward Tau.

What’s next

Mirostat — main mode selector.
MirostatEta — learning rate.
Sampler parameters hub — all sampler knobs.

Net: MirostatEta

Thu, 23 Apr 2026 00:00:00 +0000

MirostatEta is the learning rate of Mirostat’s adaptive loop. It controls how quickly Mirostat reacts to divergence between observed entropy and the target entropy MirostatTau.

Quick reference


Type	`float`
Default	`0.1`
Range	`> 0`; typical `0.05` – `0.3`
Category	Adaptive sampler
Field on	`SamplerParameters.MirostatEta`

What it does

After each token, Mirostat computes the difference between the observed entropy and MirostatTau. It scales that difference by MirostatEta and adjusts its internal threshold accordingly.

MirostatEta = 0.05 — slow adaptation. Smoother behavior; takes longer to settle.
MirostatEta = 0.1 (default) — balanced.
MirostatEta = 0.3 — fast adaptation. Tighter tracking of Tau but noisier.

MirostatEta only matters when Mirostat is enabled.
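
The feedback step described above can be sketched as a single update. This is a hypothetical illustration that assumes the standard Mirostat rule as found in llama.cpp, where the observed quantity is the surprise of the token just emitted, measured in bits; the names are not SDK members.

```csharp
using System;

// Hypothetical sketch of the Mirostat feedback step; not the SDK's code.
public static class MirostatSketch
{
    // Returns the updated internal threshold (often called mu).
    // sampledProbability: probability of the token that was just emitted.
    public static double UpdateThreshold(double mu, double sampledProbability, double tau, double eta)
    {
        // Assumption: surprise is measured with a base-2 logarithm (bits).
        double observedSurprise = -Math.Log2(sampledProbability);

        // Positive error means the output was more surprising than the target.
        double error = observedSurprise - tau;

        // MirostatEta scales how far the threshold moves in response.
        return mu - eta * error;
    }
}
```

With MirostatTau = 5.0 and MirostatEta = 0.1, emitting a token of probability 0.125 (3 bits of surprise) moves a threshold of 10.0 up to 10.2; the same step with MirostatEta = 0.3 moves it to 10.6.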

When to change it

Scenario	Value
Default	`0.1`
Smoother, slower adaptation	`0.05`
Fast tracking, aggressive correction	`0.2` – `0.3`

Most users leave MirostatEta = 0.1. Adjust only when Mirostat’s entropy wanders too far from Tau or oscillates too much.

Example

var preset = new Qwen25Preset();
preset.SamplerParameters.Mirostat = 2;
preset.SamplerParameters.MirostatTau = 5.0f;
preset.SamplerParameters.MirostatEta = 0.05f;  // slow, smooth adaptation

using var api = AsposeLLMApi.Create(preset);

Interactions

Mirostat — must be 1 or 2.
MirostatTau — the target MirostatEta adapts toward.

What’s next

Mirostat — mode selector.
MirostatTau — entropy target.
Sampler parameters hub — all sampler knobs.

Net: LogitBias

Thu, 23 Apr 2026 00:00:00 +0000

LogitBias lets you add a per-token offset to the model’s logits before sampling. Positive values favor a token; large negative values effectively ban it.

Quick reference


Type	`Dictionary<int, float>` (token ID → bias)
Default	Empty dictionary (no biases)
Range	Typical `-100.0` to `+10.0`
Category	Fine-grained control
Field on	`SamplerParameters.LogitBias`

What it does

Before any filter runs, for each token ID present in LogitBias, the engine adds the associated float to that token’s logit.

LogitBias[1234] = +5.0 — the token with ID 1234 gets a +5 boost to its logit. Strongly favors it.
LogitBias[5678] = -100.0 — the token with ID 5678 gets a massive negative bias. Effectively banned.
LogitBias[42] = -2.0 — token 42 is discouraged but still available.

Bias is applied once per step regardless of token occurrence history. It does not interact with the penalty window.
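
To see why -100.0 acts as a ban while +5.0 only favors a token, the sketch below adds the biases to a logit vector and applies softmax. It is a hypothetical illustration, not the SDK's implementation, and it skips token IDs outside the vocabulary as a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the SDK's implementation: add per-token biases, then softmax.
public static class LogitBiasSketch
{
    public static double[] BiasedProbabilities(double[] logits, IReadOnlyDictionary<int, float> logitBias)
    {
        var biased = (double[])logits.Clone();
        foreach (var pair in logitBias)
        {
            // Assumption: IDs outside the vocabulary are ignored in this sketch.
            if (pair.Key >= 0 && pair.Key < biased.Length)
                biased[pair.Key] += pair.Value;
        }

        // Numerically stable softmax: subtract the maximum before exponentiating.
        double max = biased.Max();
        double[] exps = biased.Select(x => Math.Exp(x - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(x => x / sum).ToArray();
    }
}
```

Starting from three equal logits, a bias of -100.0 on one token drives its probability to effectively zero, while a bias of +5.0 raises a token's share from one third to about 0.987 without excluding the others.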

When to change it

Scenario	Example
Ban specific tokens (for example, profanity)	Set to `-100.0`
Strongly favor a specific token (end-of-message, for instance)	Set to `+5.0`
Soft discouragement without ban	Set to `-2.0`

Obtaining token IDs requires the model’s tokenizer — not exposed in this parameter bag in the current SDK version. If you need logit-bias control, collect token IDs by tokenizing your target strings externally (for example, with the llama.cpp tokenize tool against the same model).

Example

var preset = new Qwen25Preset();

// Softly discourage token ID 1234; effectively ban token ID 5678.
preset.SamplerParameters.LogitBias[1234] = -2.0f;
preset.SamplerParameters.LogitBias[5678] = -100.0f;

using var api = AsposeLLMApi.Create(preset);

Interactions

Temperature, other filters — run after LogitBias is applied.
Mirostat — LogitBias still applies; Mirostat adapts around the biased distribution.

What’s next

Sampler parameters hub — all sampler knobs.
EnableInfill — specialized sampling mode.

Net: EnableInfill

Thu, 23 Apr 2026 00:00:00 +0000

EnableInfill switches the sampler to the INFILL variant used for fill-in-the-middle code completion. Only useful for models trained for FIM (code-completion-oriented) workflows.

Quick reference


Type	`bool`
Default	`false`
Category	Specialized sampling
Field on	`SamplerParameters.EnableInfill`

What it does

When true, the engine uses a specialized sampler variant tuned for completing a gap between a prefix and a suffix. This is the FIM (Fill-in-the-Middle) pattern used by code models trained with FIM tokens (for example, some DeepSeek-Coder or StarCoder derivatives).

false (default) — standard chat sampler. Correct for all text presets and vision presets.
true — INFILL sampler. Enable only when the model supports FIM.

When to change it

Scenario	Value
Standard chat, reasoning, or generation	`false`
Code completion with FIM-trained model	`true`

Most users will leave this false. Enable only when you explicitly use a FIM-trained code model and understand the prefix/suffix prompt format that model expects.

Example

var preset = new DeepSeekCoder2Preset();
preset.SamplerParameters.EnableInfill = true;

using var api = AsposeLLMApi.Create(preset);
// Use a prompt that follows the FIM template for your specific model.

Interactions

Independent of other sampler knobs. EnableInfill toggles which sampler implementation runs; all filter knobs still apply within the chosen sampler.

What’s next

Sampler parameters hub — all sampler knobs.
Custom preset — patterns for building presets around specialized models.