<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Documentation – Parameters</title>
    <link>https://docs.aspose.com/llm/net/developer-reference/parameters/</link>
    <description>Recent content in Parameters on Documentation</description>
    <generator>Hugo -- gohugo.io</generator>
    <lastBuildDate>Thu, 23 Apr 2026 00:00:00 +0000</lastBuildDate>
    
	  <atom:link href="https://docs.aspose.com/llm/net/developer-reference/parameters/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Net: Model source parameters</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/parameters/model-source/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/parameters/model-source/</guid>
      <description>
        
        
        &lt;p&gt;&lt;code&gt;ModelSourceParameters&lt;/code&gt; tells the engine where to find the model file. The same class type is used in two places on every preset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;BaseModelSourceParameters&lt;/code&gt;&lt;/strong&gt; — the main language model (text or vision).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;MmprojSourceParameters&lt;/code&gt;&lt;/strong&gt; — the vision projector (&lt;code&gt;mmproj&lt;/code&gt;) for vision presets. Ignored for text-only presets.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;class-reference&#34;&gt;Class reference&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;namespace&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Parameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;ModelSourceParameters&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ModelFilePath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;          &lt;span class=&#34;c1&#34;&gt;// priority 1 — local path
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeModelId&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;          &lt;span class=&#34;c1&#34;&gt;// priority 2 — Aspose catalog
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;HuggingFaceRepoId&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;      &lt;span class=&#34;c1&#34;&gt;// priority 3 — HF repo ID
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;HuggingFaceFileName&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;    &lt;span class=&#34;c1&#34;&gt;//             HF file within the repo
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;All four properties are nullable. Leave a property &lt;code&gt;null&lt;/code&gt; to defer to the next priority source.&lt;/p&gt;
&lt;h2 id=&#34;detailed-field-reference&#34;&gt;Detailed field reference&lt;/h2&gt;
&lt;p&gt;Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-source/model-file-path/&#34;&gt;ModelFilePath&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-source/aspose-model-id/&#34;&gt;AsposeModelId&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-source/hugging-face-repo-id/&#34;&gt;HuggingFaceRepoId&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-source/hugging-face-file-name/&#34;&gt;HuggingFaceFileName&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;resolution-order&#34;&gt;Resolution order&lt;/h2&gt;
&lt;p&gt;At &lt;code&gt;AsposeLLMApi.Create&lt;/code&gt;, the engine walks the sources in priority order and uses the first one that resolves.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;Field(s)&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ModelFilePath&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;If set to a non-null non-empty path, load directly from disk. No download.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AsposeModelId&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;If set, download the named model from Aspose&amp;rsquo;s internal catalog into the cache.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;HuggingFaceRepoId&lt;/code&gt; + &lt;code&gt;HuggingFaceFileName&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;If both set, download the named file from the Hugging Face repo into the cache.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The engine throws an exception if none of the three resolve.&lt;/p&gt;
&lt;h2 id=&#34;where-models-are-cached&#34;&gt;Where models are cached&lt;/h2&gt;
&lt;p&gt;Downloaded models live in &lt;code&gt;EngineParameters.ModelCachePath&lt;/code&gt;. The default on Windows is &lt;code&gt;%LOCALAPPDATA%\Aspose.LLM\models&lt;/code&gt;; on Linux and macOS it is the equivalent &lt;code&gt;LocalApplicationData&lt;/code&gt; folder. Change the cache location per preset on &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/engine/&#34;&gt;&lt;code&gt;EngineParameters.ModelCachePath&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When a model has already been downloaded, subsequent runs load from the cache. Delete the cache to force a re-download.&lt;/p&gt;
&lt;h2 id=&#34;typical-recipes&#34;&gt;Typical recipes&lt;/h2&gt;
&lt;h3 id=&#34;use-a-built-in-presets-model-source&#34;&gt;Use a built-in preset&amp;rsquo;s model source&lt;/h3&gt;
&lt;p&gt;Built-in presets set &lt;code&gt;HuggingFaceRepoId&lt;/code&gt; and &lt;code&gt;HuggingFaceFileName&lt;/code&gt; on &lt;code&gt;BaseModelSourceParameters&lt;/code&gt; in their constructor — no changes needed:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// BaseModelSourceParameters.HuggingFaceRepoId = &amp;#34;bartowski/Qwen2.5-7B-Instruct-GGUF&amp;#34;
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// BaseModelSourceParameters.HuggingFaceFileName = &amp;#34;Qwen2.5-7B-Instruct-Q4_K_M.gguf&amp;#34;
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeLLMApi&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;See &lt;a href=&#34;https://docs.aspose.com/llm/net/product-overview/supported-presets/&#34;&gt;Supported presets&lt;/a&gt; for the exact values per preset.&lt;/p&gt;
&lt;h3 id=&#34;load-a-model-from-a-local-file&#34;&gt;Load a model from a local file&lt;/h3&gt;
&lt;p&gt;Override &lt;code&gt;ModelFilePath&lt;/code&gt; to skip the download entirely. Useful for air-gapped environments or when you already have the GGUF on disk.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelSourceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ModelFilePath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;@&amp;#34;C:\models\Qwen2.5-7B-Instruct-Q4_K_M.gguf&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// ModelFilePath wins over the preset&amp;#39;s Hugging Face values.
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeLLMApi&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;switch-to-a-different-hugging-face-quantization&#34;&gt;Switch to a different Hugging Face quantization&lt;/h3&gt;
&lt;p&gt;Keep the preset but change the file name to a different quantization available in the same repo:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelSourceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;HuggingFaceFileName&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;Qwen2.5-7B-Instruct-Q8_0.gguf&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeLLMApi&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Q8_0 is larger than Q4_K_M (about 2× the file size) but retains more of the original model&amp;rsquo;s quality.&lt;/p&gt;
&lt;h3 id=&#34;bring-a-custom-model-from-hugging-face&#34;&gt;Bring a custom model from Hugging Face&lt;/h3&gt;
&lt;p&gt;For a model that does not have a built-in preset, create a custom preset and set &lt;code&gt;BaseModelSourceParameters&lt;/code&gt; from scratch:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;MyCustomPreset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;PresetCoreBase&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MyCustomPreset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;BaseModelSourceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;HuggingFaceRepoId&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;your-org/your-model-GGUF&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;BaseModelSourceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;HuggingFaceFileName&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;your-model.Q4_K_M.gguf&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;See &lt;a href=&#34;https://docs.aspose.com/llm/net/use-cases/custom-preset/&#34;&gt;Custom preset&lt;/a&gt; for the full pattern.&lt;/p&gt;
&lt;h3 id=&#34;configure-a-vision-projector&#34;&gt;Configure a vision projector&lt;/h3&gt;
&lt;p&gt;Vision presets also set &lt;code&gt;MmprojSourceParameters&lt;/code&gt; — the projector follows the same resolution rules as the base model:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen3VL2BPreset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// BaseModelSourceParameters.HuggingFaceRepoId     = &amp;#34;Qwen/Qwen3-VL-2B-Instruct-GGUF&amp;#34;
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// BaseModelSourceParameters.HuggingFaceFileName   = &amp;#34;Qwen3VL-2B-Instruct-Q4_K_M.gguf&amp;#34;
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// MmprojSourceParameters.HuggingFaceRepoId        = &amp;#34;Qwen/Qwen3-VL-2B-Instruct-GGUF&amp;#34;
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// MmprojSourceParameters.HuggingFaceFileName      = &amp;#34;mmproj-Qwen3VL-2B-Instruct-Q8_0.gguf&amp;#34;
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// Override the projector to a local file:
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MmprojSourceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ModelFilePath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;@&amp;#34;C:\models\custom-mmproj.gguf&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MmprojSourceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;HuggingFaceRepoId&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;null&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MmprojSourceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;HuggingFaceFileName&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;null&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeLLMApi&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Leaving the Hugging Face fields on an override is harmless — &lt;code&gt;ModelFilePath&lt;/code&gt; wins — but clearing them keeps the intent obvious.&lt;/p&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/engine/&#34;&gt;Engine parameters&lt;/a&gt; — change the cache location.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/&#34;&gt;Binary manager parameters&lt;/a&gt; — pre-populate native binaries for offline scenarios.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/use-cases/custom-preset/&#34;&gt;Custom preset&lt;/a&gt; — full preset-customization patterns.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Net: Model inference parameters</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/</guid>
      <description>
        
        
        &lt;p&gt;&lt;code&gt;ModelInferenceParameters&lt;/code&gt; controls how the engine loads a model into memory: how many layers to offload to GPU, how to split across multiple GPUs, whether to use memory mapping, and how to override GGUF metadata at runtime.&lt;/p&gt;
&lt;p&gt;Most fields are nullable — a &lt;code&gt;null&lt;/code&gt; value means &amp;ldquo;use the native default&amp;rdquo;. Set an explicit value only when you need to override.&lt;/p&gt;
&lt;h2 id=&#34;class-reference&#34;&gt;Class reference&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;namespace&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Parameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;ModelInferenceParameters&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;GpuLayers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;UseMemoryMapping&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;UseMemoryLocking&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MainGpu&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LlamaSplitMode&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;SplitMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;VocabOnly&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;CheckTensors&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;UseExtraBuffers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TensorSplit&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ModelKeyValueOverride&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;KvOverrides&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;detailed-field-reference&#34;&gt;Detailed field reference&lt;/h2&gt;
&lt;p&gt;Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions. The rest of this page is an inline overview of the same content; follow the links for the deeper treatment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Load knobs&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/gpu-layers/&#34;&gt;GpuLayers&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/use-memory-mapping/&#34;&gt;UseMemoryMapping&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/use-memory-locking/&#34;&gt;UseMemoryLocking&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/main-gpu/&#34;&gt;MainGpu&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/split-mode/&#34;&gt;SplitMode&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Other knobs&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/vocab-only/&#34;&gt;VocabOnly&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/check-tensors/&#34;&gt;CheckTensors&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/use-extra-buffers/&#34;&gt;UseExtraBuffers&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/tensor-split/&#34;&gt;TensorSplit&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/kv-overrides/&#34;&gt;KvOverrides&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;fields&#34;&gt;Fields&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;GpuLayers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;int?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default&lt;/td&gt;
&lt;td&gt;Number of model layers to offload to GPU. &lt;code&gt;0&lt;/code&gt; = CPU only, &lt;code&gt;999&lt;/code&gt; = full offload.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;UseMemoryMapping&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bool?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default (&lt;code&gt;true&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Map the GGUF file instead of reading it in. Reduces startup time and memory copying.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;UseMemoryLocking&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bool?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default (&lt;code&gt;false&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Lock model memory to prevent OS paging. Needs &lt;code&gt;mlock&lt;/code&gt; / &lt;code&gt;VirtualLock&lt;/code&gt; privileges.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MainGpu&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;int?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Index of the GPU used when &lt;code&gt;SplitMode&lt;/code&gt; is &lt;code&gt;None&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SplitMode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;LlamaSplitMode?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default&lt;/td&gt;
&lt;td&gt;How to split the model across multiple GPUs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;VocabOnly&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bool?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default (&lt;code&gt;false&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Load only the vocabulary without weights. Used for tokenizer-only scenarios.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CheckTensors&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bool?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default (&lt;code&gt;false&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Validate tensor data on load. Adds startup time; helpful for diagnosing corrupted GGUF files.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;UseExtraBuffers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bool?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default&lt;/td&gt;
&lt;td&gt;Use extra buffer types for weight repacking. Advanced.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TensorSplit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;float[]?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;equal split&lt;/td&gt;
&lt;td&gt;Proportions per GPU when &lt;code&gt;SplitMode&lt;/code&gt; splits across devices.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;KvOverrides&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ModelKeyValueOverride[]?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;Runtime overrides for GGUF metadata keys.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;gpulayers&#34;&gt;&lt;code&gt;GpuLayers&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Controls GPU offload. Each transformer layer lives either in system RAM (CPU inference) or GPU VRAM (GPU inference). Partial offload is supported: you can put the first N layers on the GPU and keep the rest on the CPU.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CPU only. No GPU memory allocated for model weights.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A layer count&lt;/td&gt;
&lt;td&gt;Offload the first N layers to GPU, keep the remaining on CPU.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;999&lt;/code&gt; (or any value ≥ the model&amp;rsquo;s layer count)&lt;/td&gt;
&lt;td&gt;Full GPU offload. Idiomatic way to request &amp;ldquo;put everything on the GPU&amp;rdquo;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;null&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Use the native default.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Pair with a GPU-capable &lt;code&gt;BinaryManagerParameters.PreferredAcceleration&lt;/code&gt; — setting &lt;code&gt;GpuLayers = 999&lt;/code&gt; on a CPU-only binary silently keeps the model on the CPU.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;GpuLayers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;999&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryManagerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;PreferredAcceleration&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AccelerationType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CUDA&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;usememorymapping&#34;&gt;&lt;code&gt;UseMemoryMapping&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;true&lt;/code&gt; (default on most platforms), the engine memory-maps the GGUF file so the OS streams it in on demand. This reduces startup time and avoids copying the full model into RAM before inference.&lt;/p&gt;
&lt;p&gt;Set to &lt;code&gt;false&lt;/code&gt; only for rare scenarios — network file systems that do not support &lt;code&gt;mmap&lt;/code&gt;, or environments where you want the file fully loaded before the first token. Disabling &lt;code&gt;mmap&lt;/code&gt; doubles peak memory during load (the read buffer plus the mapped copy).&lt;/p&gt;
&lt;h3 id=&#34;usememorylocking&#34;&gt;&lt;code&gt;UseMemoryLocking&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;true&lt;/code&gt;, the engine calls &lt;code&gt;mlock&lt;/code&gt; (Linux/macOS) or &lt;code&gt;VirtualLock&lt;/code&gt; (Windows) on the model&amp;rsquo;s memory so the OS does not page it out. Requires elevated privileges or raised ulimits.&lt;/p&gt;
&lt;p&gt;Leave &lt;code&gt;null&lt;/code&gt; or &lt;code&gt;false&lt;/code&gt; in most deployments. Enable only when the model is swapped out under memory pressure and you have the system tuning to support locking.&lt;/p&gt;
&lt;h3 id=&#34;maingpu-and-splitmode&#34;&gt;&lt;code&gt;MainGpu&lt;/code&gt; and &lt;code&gt;SplitMode&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;On multi-GPU hosts, &lt;code&gt;SplitMode&lt;/code&gt; decides how to place the model and &lt;code&gt;MainGpu&lt;/code&gt; selects the primary device when the mode is not splitting.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;SplitMode&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LLAMA_SPLIT_MODE_NONE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Single GPU. Whole model on device &lt;code&gt;MainGpu&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LLAMA_SPLIT_MODE_LAYER&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Split layers across GPUs. KV cache follows layers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LLAMA_SPLIT_MODE_ROW&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Split layers and rows across GPUs. Uses tensor parallelism where supported.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Parameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;c1&#34;&gt;// Single GPU, use device 1:
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SplitMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LlamaSplitMode&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LLAMA_SPLIT_MODE_NONE&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MainGpu&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;c1&#34;&gt;// Split across all GPUs, layer mode:
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SplitMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LlamaSplitMode&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LLAMA_SPLIT_MODE_LAYER&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TensorSplit&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;null&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// equal distribution
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;tensorsplit&#34;&gt;&lt;code&gt;TensorSplit&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Proportion of the model placed on each GPU. The array length should match the number of available GPUs.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;null&lt;/code&gt; — equal distribution.&lt;/li&gt;
&lt;li&gt;Explicit array — values are normalized to sum to 1. For example, &lt;code&gt;[2, 1]&lt;/code&gt; on two GPUs places 67 % on GPU 0 and 33 % on GPU 1.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Useful when GPUs have different memory sizes (for example, a 24 GB card paired with a 12 GB card):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SplitMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LlamaSplitMode&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LLAMA_SPLIT_MODE_LAYER&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TensorSplit&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;2.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1.0f&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;vocabonly&#34;&gt;&lt;code&gt;VocabOnly&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;true&lt;/code&gt;, the engine loads only the model&amp;rsquo;s vocabulary, skipping the weights. The model is not usable for generation in this state — it is a tokenizer-only configuration for rare tooling scenarios.&lt;/p&gt;
&lt;p&gt;Leave &lt;code&gt;null&lt;/code&gt; (or &lt;code&gt;false&lt;/code&gt;) for any normal inference use.&lt;/p&gt;
&lt;h3 id=&#34;checktensors&#34;&gt;&lt;code&gt;CheckTensors&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;true&lt;/code&gt;, the engine validates every tensor&amp;rsquo;s data during load. Adds significant startup time but catches corrupted GGUF files early. Useful when testing a new download or a model from an untrusted source; leave &lt;code&gt;null&lt;/code&gt; in production.&lt;/p&gt;
&lt;h3 id=&#34;useextrabuffers&#34;&gt;&lt;code&gt;UseExtraBuffers&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Enables additional buffer types used by weight repacking paths in &lt;code&gt;llama.cpp&lt;/code&gt;. Advanced; most users should leave it &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;kvoverrides&#34;&gt;&lt;code&gt;KvOverrides&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Overrides specific keys in the GGUF metadata at load time. Each override targets a single metadata key and provides the new typed value.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;KvOverrides&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ModelKeyValueOverride&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;Key&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;llama.rope.scaling.type&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;Type&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ModelKvOverrideType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;String&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;StringValue&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;yarn&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;},&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ModelKeyValueOverride&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;Key&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;llama.context_length&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;Type&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ModelKvOverrideType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Int&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;IntValue&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;131072&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;},&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;ModelKeyValueOverride&lt;/code&gt; class carries one of four typed values depending on &lt;code&gt;Type&lt;/code&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;Type&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Value field&lt;/th&gt;
&lt;th&gt;Example keys&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Int&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;IntValue&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;llama.context_length&lt;/code&gt;, &lt;code&gt;llama.embedding_length&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Float&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;FloatValue&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;llama.rope.freq_base&lt;/code&gt;, &lt;code&gt;llama.rope.scaling.factor&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Bool&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;BoolValue&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model-specific boolean flags&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;String&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;StringValue&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;general.architecture&lt;/code&gt;, &lt;code&gt;llama.rope.scaling.type&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Use overrides with care — wrong metadata makes the model load incorrectly or silently produce garbage.&lt;/p&gt;
&lt;h2 id=&#34;typical-recipes&#34;&gt;Typical recipes&lt;/h2&gt;
&lt;h3 id=&#34;cpu-only-inference&#34;&gt;CPU-only inference&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;GpuLayers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;full-gpu-offload&#34;&gt;Full GPU offload&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;GpuLayers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;999&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryManagerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;PreferredAcceleration&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AccelerationType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CUDA&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;partial-offload-on-a-memory-constrained-gpu&#34;&gt;Partial offload on a memory-constrained GPU&lt;/h3&gt;
&lt;p&gt;A 12 GB GPU cannot fit a full 8B Q4_K_M model plus KV cache. Offload the first 28 layers and keep the rest on the CPU:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen3Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// 8B, 32 layers
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;GpuLayers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;28&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;UseMemoryMapping&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Benchmark to find the right split for your hardware — &amp;ldquo;offload until VRAM is ~1-2 GB short of full&amp;rdquo;.&lt;/p&gt;
&lt;h3 id=&#34;two-unequal-gpus&#34;&gt;Two unequal GPUs&lt;/h3&gt;
&lt;p&gt;A 24 GB GPU paired with a 12 GB GPU:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SplitMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LlamaSplitMode&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LLAMA_SPLIT_MODE_LAYER&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TensorSplit&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;2.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1.0f&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;GpuLayers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;999&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;validate-a-suspect-gguf&#34;&gt;Validate a suspect GGUF&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CheckTensors&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// After a clean load, switch back to the default for production.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/&#34;&gt;Binary manager parameters&lt;/a&gt; — pair with &lt;code&gt;PreferredAcceleration&lt;/code&gt; to select the right native binary.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/&#34;&gt;Context parameters&lt;/a&gt; — KV cache configuration and batch sizes that interact with GPU offload.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/system-requirements/&#34;&gt;System requirements&lt;/a&gt; — GPU backends and their driver / runtime requirements.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Net: Context parameters</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/parameters/context/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/parameters/context/</guid>
      <description>
        
        
        &lt;p&gt;&lt;code&gt;ContextParameters&lt;/code&gt; mirrors &lt;code&gt;llama_context_params&lt;/code&gt; in &lt;code&gt;llama.cpp&lt;/code&gt;. It controls the shape of the runtime context — how many tokens the model can attend to, how batching is sized, how threads are split, how RoPE scaling stretches the context window, how flash attention is used, and how the KV cache is stored.&lt;/p&gt;
&lt;p&gt;This is the largest parameter bag. Most fields are nullable; &lt;code&gt;null&lt;/code&gt; means &amp;ldquo;use the native default&amp;rdquo; or &amp;ldquo;derive from the model&amp;rdquo;. Touch these values only when you know what you are changing.&lt;/p&gt;
&lt;h2 id=&#34;class-reference&#34;&gt;Class reference&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;namespace&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Models&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;partial&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;ContextParameters&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;// Context size and batching
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ContextSize&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;uint?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;NBatch&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;uint?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;NUbatch&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;uint?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;NSeqMax&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// Threading
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;NThreads&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;NThreadsBatch&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// RoPE scaling
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;RopeScalingType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;RopeScalingType&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;RopeFreqBase&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;RopeFreqScale&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// YaRN
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;YarnExtFactor&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;YarnAttnFactor&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;YarnBetaFast&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;YarnBetaSlow&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;uint?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;YarnOrigCtx&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// Attention
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AttentionType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AttentionType&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FlashAttentionType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FlashAttentionMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FlashAttention&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// legacy
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;// Pooling and embeddings
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;PoolingType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;PoolingType&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Embeddings&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// KV cache
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;GgmlType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TypeK&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;GgmlType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TypeV&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;OffloadKqv&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DefragThreshold&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;SwaFull&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;KvUnified&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// Other
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;OpOffload&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;NoPerf&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;detailed-field-reference&#34;&gt;Detailed field reference&lt;/h2&gt;
&lt;p&gt;Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions. The rest of this page is an inline overview of the same content; follow the links for the deeper treatment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context size and batching&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/context-size/&#34;&gt;ContextSize&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/n-batch/&#34;&gt;NBatch&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/n-ubatch/&#34;&gt;NUbatch&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/n-seq-max/&#34;&gt;NSeqMax&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Threading&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/n-threads/&#34;&gt;NThreads&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/n-threads-batch/&#34;&gt;NThreadsBatch&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RoPE and YaRN&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/rope-scaling-type/&#34;&gt;RopeScalingType&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/rope-freq-base/&#34;&gt;RopeFreqBase&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/rope-freq-scale/&#34;&gt;RopeFreqScale&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/yarn-ext-factor/&#34;&gt;YarnExtFactor&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/yarn-attn-factor/&#34;&gt;YarnAttnFactor&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/yarn-beta-fast/&#34;&gt;YarnBetaFast&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/yarn-beta-slow/&#34;&gt;YarnBetaSlow&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/yarn-orig-ctx/&#34;&gt;YarnOrigCtx&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Attention&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/attention-type/&#34;&gt;AttentionType&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/flash-attention-mode/&#34;&gt;FlashAttentionMode&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/flash-attention/&#34;&gt;FlashAttention&lt;/a&gt; (legacy).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pooling and embeddings&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/pooling-type/&#34;&gt;PoolingType&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/embeddings/&#34;&gt;Embeddings&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KV cache&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/type-k/&#34;&gt;TypeK&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/type-v/&#34;&gt;TypeV&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/offload-kqv/&#34;&gt;OffloadKqv&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/defrag-threshold/&#34;&gt;DefragThreshold&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/swa-full/&#34;&gt;SwaFull&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/kv-unified/&#34;&gt;KvUnified&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Other&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/op-offload/&#34;&gt;OpOffload&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/no-perf/&#34;&gt;NoPerf&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;context-size-and-batching&#34;&gt;Context size and batching&lt;/h2&gt;
&lt;h3 id=&#34;contextsize&#34;&gt;&lt;code&gt;ContextSize&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Length of the context window in tokens — the maximum number of tokens the model sees at once. Set to &lt;code&gt;null&lt;/code&gt; (or &lt;code&gt;0&lt;/code&gt;) to use the model&amp;rsquo;s maximum from its GGUF metadata.&lt;/p&gt;
&lt;p&gt;Built-in presets pre-set this: &lt;code&gt;Qwen25Preset&lt;/code&gt; uses 32 768, &lt;code&gt;Llama32Preset&lt;/code&gt; uses 131 072, &lt;code&gt;Oss20Preset&lt;/code&gt; uses 131 072. See &lt;a href=&#34;https://docs.aspose.com/llm/net/product-overview/supported-presets/&#34;&gt;Supported presets&lt;/a&gt; for each default.&lt;/p&gt;
&lt;p&gt;Trade-off: larger context allows longer conversations and documents, but the KV cache size scales with &lt;code&gt;ContextSize × model-depth&lt;/code&gt;. Going from 32K to 131K quadruples KV memory.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextSize&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;8192&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// save memory for short conversations
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;nbatch&#34;&gt;&lt;code&gt;NBatch&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Logical maximum batch size — the largest number of tokens submitted in one &lt;code&gt;llama_decode&lt;/code&gt; call. Affects prompt-processing throughput.&lt;/p&gt;
&lt;p&gt;Typical values: 512 - 4096. Larger &lt;code&gt;NBatch&lt;/code&gt; speeds up prompt processing but needs more temporary memory.&lt;/p&gt;
&lt;p&gt;Built-in presets use &lt;code&gt;NBatch&lt;/code&gt; between 2 048 and 4 096 depending on model and context size.&lt;/p&gt;
&lt;h3 id=&#34;nubatch&#34;&gt;&lt;code&gt;NUbatch&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Physical maximum batch size — the largest chunk actually processed per kernel call. Normally &lt;code&gt;NUbatch ≤ NBatch&lt;/code&gt;. &lt;code&gt;NUbatch&lt;/code&gt; ≈ &lt;code&gt;NBatch&lt;/code&gt; for simplicity on most deployments; different values apply only to specific multi-sequence scenarios.&lt;/p&gt;
&lt;h3 id=&#34;nseqmax&#34;&gt;&lt;code&gt;NSeqMax&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Maximum number of distinct sequences (for recurrent models) handled in parallel. For standard transformer models, leave at &lt;code&gt;null&lt;/code&gt; or &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;threading&#34;&gt;Threading&lt;/h2&gt;
&lt;h3 id=&#34;nthreads&#34;&gt;&lt;code&gt;NThreads&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;CPU threads used during generation (token-by-token decode). When &lt;code&gt;null&lt;/code&gt;, the engine falls back to &lt;code&gt;EngineParameters.DefaultThreads&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Override when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Running multiple concurrent inferences per process and you want to cap each.&lt;/li&gt;
&lt;li&gt;Benchmarking to find the sweet spot for your model and CPU.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NThreads&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;8&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;nthreadsbatch&#34;&gt;&lt;code&gt;NThreadsBatch&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Threads used for batch (prompt) processing. Often different from &lt;code&gt;NThreads&lt;/code&gt; because prompt processing parallelizes better. Typical production setting: &lt;code&gt;NThreadsBatch&lt;/code&gt; ≈ &lt;code&gt;ProcessorCount&lt;/code&gt; and &lt;code&gt;NThreads&lt;/code&gt; ≈ half.&lt;/p&gt;
&lt;h2 id=&#34;rope-scaling&#34;&gt;RoPE scaling&lt;/h2&gt;
&lt;p&gt;RoPE (rotary position embedding) is how transformer models encode token positions. Scaling lets you use a larger context than the model was trained on, at some accuracy cost.&lt;/p&gt;
&lt;h3 id=&#34;ropescalingtype&#34;&gt;&lt;code&gt;RopeScalingType&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Scaling algorithm.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Unspecified&lt;/code&gt; (&lt;code&gt;-1&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Use the model default (usually what you want).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;None&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Disable RoPE scaling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Linear&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Linear interpolation scaling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Yarn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;YaRN scaling — better quality at long contexts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LongRope&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LongRoPE scaling for very extended contexts.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;ropefreqbase-and-ropefreqscale&#34;&gt;&lt;code&gt;RopeFreqBase&lt;/code&gt; and &lt;code&gt;RopeFreqScale&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Override the RoPE frequency base and scaling factor. &lt;code&gt;null&lt;/code&gt; or &lt;code&gt;0&lt;/code&gt; means &amp;ldquo;from model metadata&amp;rdquo;. Most users leave these alone; advanced users override when extending context beyond the trained window.&lt;/p&gt;
&lt;h2 id=&#34;yarn&#34;&gt;YaRN&lt;/h2&gt;
&lt;p&gt;YaRN is a specific RoPE-scaling recipe. All five fields activate only when &lt;code&gt;RopeScalingType = Yarn&lt;/code&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Default behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;YarnExtFactor&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Extrapolation mix factor&lt;/td&gt;
&lt;td&gt;Negative or null = from model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;YarnAttnFactor&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Magnitude scaling factor&lt;/td&gt;
&lt;td&gt;Null = from model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;YarnBetaFast&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Low correction dim&lt;/td&gt;
&lt;td&gt;Null = from model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;YarnBetaSlow&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;High correction dim&lt;/td&gt;
&lt;td&gt;Null = from model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;YarnOrigCtx&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Original training context length&lt;/td&gt;
&lt;td&gt;Null = from model&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Typical usage: the preset author (or you, for a custom preset) picks &lt;code&gt;Yarn&lt;/code&gt; and sets &lt;code&gt;YarnOrigCtx&lt;/code&gt; to the model&amp;rsquo;s training context; other YaRN fields default from the model&amp;rsquo;s metadata.&lt;/p&gt;
&lt;h2 id=&#34;attention&#34;&gt;Attention&lt;/h2&gt;
&lt;h3 id=&#34;attentiontype&#34;&gt;&lt;code&gt;AttentionType&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Causal (autoregressive) versus non-causal (bidirectional) attention.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Unspecified&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model default.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Causal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Standard chat/inference attention.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NonCausal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bidirectional; used for embedding models.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Leave &lt;code&gt;Unspecified&lt;/code&gt; unless you know why you are changing it.&lt;/p&gt;
&lt;h3 id=&#34;flashattentionmode&#34;&gt;&lt;code&gt;FlashAttentionMode&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Flash attention is a fused-kernel optimization that reduces memory and speeds up long-context inference.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Auto&lt;/code&gt; (&lt;code&gt;-1&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Runtime picks based on support. Recommended.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Disabled&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Never use flash attention.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Enabled&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Force flash attention (fails on backends that do not support it).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;FlashAttentionMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FlashAttentionType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Enabled&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Flash attention is especially effective on long contexts (32K+). For short contexts, the speed-up is small.&lt;/p&gt;
&lt;h3 id=&#34;flashattention&#34;&gt;&lt;code&gt;FlashAttention&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Legacy boolean flag. Prefer &lt;code&gt;FlashAttentionMode&lt;/code&gt;. Kept for backwards compatibility with earlier SDK code.&lt;/p&gt;
&lt;h2 id=&#34;pooling-and-embeddings&#34;&gt;Pooling and embeddings&lt;/h2&gt;
&lt;h3 id=&#34;poolingtype&#34;&gt;&lt;code&gt;PoolingType&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Only relevant when generating embeddings (not normal chat).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Unspecified&lt;/code&gt; / &lt;code&gt;None&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No pooling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Mean&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Average the token embeddings.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cls&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Use the CLS token&amp;rsquo;s embedding.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Last&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Use the last token&amp;rsquo;s embedding.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Rank&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rank-based pooling (experimental).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;embeddings&#34;&gt;&lt;code&gt;Embeddings&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;true&lt;/code&gt;, the engine extracts embeddings alongside logits. Combined with a suitable &lt;code&gt;PoolingType&lt;/code&gt;, this turns the model into an embedding generator.&lt;/p&gt;
&lt;p&gt;Embedding workflows are not covered by the chat API (&lt;code&gt;SendMessageAsync&lt;/code&gt;). A dedicated use case will be added in a future release.&lt;/p&gt;
&lt;h2 id=&#34;kv-cache&#34;&gt;KV cache&lt;/h2&gt;
&lt;p&gt;The KV cache stores the keys and values of every token the model has seen. Its size grows with context, and its precision affects both accuracy and memory.&lt;/p&gt;
&lt;h3 id=&#34;typek-and-typev&#34;&gt;&lt;code&gt;TypeK&lt;/code&gt; and &lt;code&gt;TypeV&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Data type for the K and V tensors in the cache. Lower precision saves memory but can degrade output quality.&lt;/p&gt;
&lt;p&gt;Common values (full enum has 30+ entries; see the &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/api-reference/&#34;&gt;API reference&lt;/a&gt; for the complete list):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th style=&#34;text-align:right&#34;&gt;Bits per value&lt;/th&gt;
&lt;th style=&#34;text-align:right&#34;&gt;Relative size&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;F32&lt;/code&gt;&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;32&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;1.0×&lt;/td&gt;
&lt;td&gt;Full precision. Rarely worth the memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;F16&lt;/code&gt;&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;16&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;0.5×&lt;/td&gt;
&lt;td&gt;Default on many builds. Good balance.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;BF16&lt;/code&gt;&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;16&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;0.5×&lt;/td&gt;
&lt;td&gt;Alternative to F16; slightly different range.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Q8_0&lt;/code&gt;&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;8&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;0.25×&lt;/td&gt;
&lt;td&gt;Very minor quality loss for substantial memory savings.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Q5_1&lt;/code&gt;&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;5&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;~0.16×&lt;/td&gt;
&lt;td&gt;More compact; quality drop noticeable on long contexts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Q4_0&lt;/code&gt;&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;4&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;~0.125×&lt;/td&gt;
&lt;td&gt;Aggressive quantization; only for memory-tight deployments.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Rule of thumb: &lt;strong&gt;quantize V more aggressively than K&lt;/strong&gt; — V is less sensitive to precision.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TypeK&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;GgmlType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;F16&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TypeV&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;GgmlType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Q8_0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;offloadkqv&#34;&gt;&lt;code&gt;OffloadKqv&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;true&lt;/code&gt;, the KQV computation (attention) and the KV cache itself live on the GPU. When &lt;code&gt;false&lt;/code&gt;, they stay on CPU even if layers are offloaded. Default varies by build; usually &lt;code&gt;true&lt;/code&gt; on GPU-enabled builds.&lt;/p&gt;
&lt;p&gt;Disable only when you are fighting for a sliver of VRAM and willing to trade throughput.&lt;/p&gt;
&lt;h3 id=&#34;defragthreshold&#34;&gt;&lt;code&gt;DefragThreshold&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Threshold (fraction of holes) above which the engine defragments the KV cache. Negative = disabled (default).&lt;/p&gt;
&lt;p&gt;Set to &lt;code&gt;0.1&lt;/code&gt; - &lt;code&gt;0.5&lt;/code&gt; for long-running services that churn sessions — keeps KV memory compact over time.&lt;/p&gt;
&lt;h3 id=&#34;swafull&#34;&gt;&lt;code&gt;SwaFull&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Relevant for models using sliding-window attention (SWA). When &lt;code&gt;true&lt;/code&gt;, stores the full SWA cache instead of the compressed form. Costs more memory, can be faster for certain workloads.&lt;/p&gt;
&lt;h3 id=&#34;kvunified&#34;&gt;&lt;code&gt;KvUnified&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;true&lt;/code&gt;, uses a unified buffer across input sequences for attention. Implementation detail; leave at the default.&lt;/p&gt;
&lt;h2 id=&#34;other&#34;&gt;Other&lt;/h2&gt;
&lt;h3 id=&#34;opoffload&#34;&gt;&lt;code&gt;OpOffload&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Offload host tensor operations to the device. Supplementary to &lt;code&gt;GpuLayers&lt;/code&gt; in &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/&#34;&gt;&lt;code&gt;ModelInferenceParameters&lt;/code&gt;&lt;/a&gt;. Leave &lt;code&gt;null&lt;/code&gt; unless you know why.&lt;/p&gt;
&lt;h3 id=&#34;noperf&#34;&gt;&lt;code&gt;NoPerf&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;true&lt;/code&gt;, the engine stops collecting performance timings. A micro-optimization for high-throughput production — shaves a small amount of overhead.&lt;/p&gt;
&lt;h2 id=&#34;typical-recipes&#34;&gt;Typical recipes&lt;/h2&gt;
&lt;h3 id=&#34;minimize-memory-on-a-tight-gpu&#34;&gt;Minimize memory on a tight GPU&lt;/h3&gt;
&lt;p&gt;Short context and aggressive KV quantization:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextSize&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;4096&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TypeK&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;GgmlType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Q8_0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TypeV&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;GgmlType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Q4_0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;FlashAttentionMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FlashAttentionType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Enabled&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;extended-context-with-yarn-scaling&#34;&gt;Extended context with YaRN scaling&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextSize&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;131072&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;RopeScalingType&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;RopeScalingType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Yarn&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;YarnOrigCtx&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;32768&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;   &lt;span class=&#34;c1&#34;&gt;// the model&amp;#39;s original training context
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;FlashAttentionMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FlashAttentionType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Enabled&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;high-throughput-production&#34;&gt;High-throughput production&lt;/h3&gt;
&lt;p&gt;Lean on flash attention and optimized KV:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;FlashAttentionMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FlashAttentionType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Enabled&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TypeK&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;GgmlType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;F16&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TypeV&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;GgmlType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Q8_0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DefragThreshold&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.3f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NoPerf&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NThreads&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Environment&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ProcessorCount&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NThreadsBatch&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Environment&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ProcessorCount&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;benchmark-focused-reproducible&#34;&gt;Benchmark-focused (reproducible)&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextSize&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;8192&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NBatch&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;2048&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NUbatch&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;2048&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NThreads&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;8&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NThreadsBatch&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;16&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;FlashAttentionMode&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FlashAttentionType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Enabled&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NoPerf&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;false&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// collect timings
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;embedding-extraction&#34;&gt;Embedding extraction&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Embeddings&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;PoolingType&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;PoolingType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Mean&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;AttentionType&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AttentionType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NonCausal&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Chat APIs are not the right surface for embedding workflows — this recipe is preparation for a dedicated embeddings API that will be covered in a future release.&lt;/p&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/&#34;&gt;Model inference parameters&lt;/a&gt; — GPU layers and tensor split that complement context KV settings.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/chat/&#34;&gt;Chat parameters&lt;/a&gt; — per-session max tokens and cache cleanup strategy.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/&#34;&gt;Sampler parameters&lt;/a&gt; — how the engine picks tokens within the context.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Net: Chat parameters</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/parameters/chat/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/parameters/chat/</guid>
      <description>
        
        
        &lt;p&gt;&lt;code&gt;ChatParameters&lt;/code&gt; holds the settings that apply to every chat session created from a preset: the system prompt, the initial conversation history, the per-response token budget, and how the KV cache is pruned when it overflows.&lt;/p&gt;
&lt;h2 id=&#34;class-reference&#34;&gt;Class reference&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;namespace&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Models&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;ChatParameters&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;SystemPrompt&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;List&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;gt;?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;History&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MaxTokens&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;2048&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;CacheCleanupStrategy&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;CacheCleanupStrategy&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
        &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;CacheCleanupStrategy&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;RemoveOldestMessages&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;enum&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;CacheCleanupStrategy&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;KeepSystemPromptOnly&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;KeepSystemPromptAndHalf&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;KeepSystemPromptAndFirstUserMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;KeepSystemPromptAndLastUserMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;RemoveOldestMessages&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;detailed-field-reference&#34;&gt;Detailed field reference&lt;/h2&gt;
&lt;p&gt;Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/chat/system-prompt/&#34;&gt;SystemPrompt&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/chat/history/&#34;&gt;History&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/chat/max-tokens/&#34;&gt;MaxTokens&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/chat/cache-cleanup-strategy/&#34;&gt;CacheCleanupStrategy&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;fields&#34;&gt;Fields&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SystemPrompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;quot;&amp;quot;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default system prompt applied to every new session created from this preset.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;History&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;List&amp;lt;ChatMessage&amp;gt;?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;null&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Initial chat history injected into newly created sessions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MaxTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;int&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2048&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hard cap on tokens generated per single assistant response.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CacheCleanupStrategy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;enum&lt;/td&gt;
&lt;td&gt;&lt;code&gt;RemoveOldestMessages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Policy for trimming the KV cache when context fills up.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;systemprompt&#34;&gt;&lt;code&gt;SystemPrompt&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The default system prompt for sessions created from this preset. Applied at session-creation time — both when you call &lt;code&gt;StartNewChatAsync&lt;/code&gt; and when &lt;code&gt;SendMessageAsync&lt;/code&gt; creates the current session implicitly.&lt;/p&gt;
&lt;p&gt;Empty string is a valid choice for presets whose chat template does not want a system turn (some Gemma variants, for example). To disable the system turn entirely, keep this empty:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SystemPrompt&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For consistent behavior across sessions, set it in your preset subclass or before calling &lt;code&gt;AsposeLLMApi.Create&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SystemPrompt&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;
    &lt;span class=&#34;s&#34;&gt;&amp;#34;You are a precise technical assistant. Cite sources and say when you do not know.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;history&#34;&gt;&lt;code&gt;History&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Optional pre-filled conversation history. When set, new sessions start with these turns already in the KV cache — useful for few-shot priming or for restoring a context that you assembled in your application.&lt;/p&gt;
&lt;p&gt;Leave &lt;code&gt;null&lt;/code&gt; for a blank session:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;History&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;null&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Or inject turns:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Models&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;History&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;List&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CreateUserMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;What is the capital of France?&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CreateAssistantMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Paris.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The history is applied to every new session created from this preset. If you only want to prime one session, construct it and send the priming turns manually instead of setting &lt;code&gt;History&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;maxtokens&#34;&gt;&lt;code&gt;MaxTokens&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Upper bound on tokens the engine generates for a single assistant response. The default &lt;code&gt;2048&lt;/code&gt; fits most general-purpose models and prompts.&lt;/p&gt;


&lt;div class=&#34;alert alert-primary&#34; role=&#34;alert&#34;&gt;

&lt;strong&gt;Reasoning-model budget.&lt;/strong&gt; Qwen3, DeepSeek-R1, and other chain-of-thought models emit a hidden &lt;code&gt;&amp;lt;think&amp;gt;…&amp;lt;/think&amp;gt;&lt;/code&gt; block before the actual answer. That block alone routinely uses 300-500 tokens for even trivial questions. Set &lt;code&gt;MaxTokens&lt;/code&gt; to &lt;strong&gt;at least 512&lt;/strong&gt; — ideally 1024-2048 — when using such models, or the response is truncated mid-reasoning and you get no visible answer. The &lt;code&gt;/no_think&lt;/code&gt; directive (Qwen3) is unreliable across versions; raise the token budget instead.
&lt;/div&gt;

&lt;p&gt;Pick based on the task:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th style=&#34;text-align:right&#34;&gt;Recommended &lt;code&gt;MaxTokens&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Short answers, classifications&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;128 - 256&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversational replies&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;512 - 1024&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Essays, long summaries&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;2048 - 4096&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning-model output (Qwen3, DeepSeek-R1)&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;1024 - 4096&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code generation&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;1024 - 4096&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Raising &lt;code&gt;MaxTokens&lt;/code&gt; does not cost anything upfront — it is a cap, not an allocation. The engine generates as many tokens as needed up to this limit.&lt;/p&gt;
&lt;h3 id=&#34;cachecleanupstrategy&#34;&gt;&lt;code&gt;CacheCleanupStrategy&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Policy the engine applies when the current session&amp;rsquo;s KV cache would overflow &lt;code&gt;ContextParameters.ContextSize&lt;/code&gt;. The engine trims automatically as generation proceeds; you can also trigger explicit trimming with &lt;code&gt;AsposeLLMApi.ForceCacheCleanup(strategy)&lt;/code&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Keeps&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RemoveOldestMessages&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;System prompt + the most recent turns&lt;/td&gt;
&lt;td&gt;General-purpose. Preserves recency, drops the earliest history.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;KeepSystemPromptOnly&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;System prompt only&lt;/td&gt;
&lt;td&gt;Hard reset of session context. Useful between topics in a long-running session.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;KeepSystemPromptAndHalf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;System prompt + the second half of history&lt;/td&gt;
&lt;td&gt;Balanced recall and room for new turns.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;KeepSystemPromptAndFirstUserMessage&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;System prompt + the first user turn&lt;/td&gt;
&lt;td&gt;Recall-heavy tasks where the original ask matters (long analyses, debugging sessions).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;KeepSystemPromptAndLastUserMessage&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;System prompt + the most recent user turn&lt;/td&gt;
&lt;td&gt;Focus on the current question, drop the middle.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;See &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/chat-sessions/#manage-the-kv-cache&#34;&gt;Chat sessions — Manage the KV cache&lt;/a&gt; for the call-site details and runtime semantics.&lt;/p&gt;
&lt;h2 id=&#34;typical-recipes&#34;&gt;Typical recipes&lt;/h2&gt;
&lt;h3 id=&#34;concise-assistant-with-moderate-output&#34;&gt;Concise assistant with moderate output&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SystemPrompt&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;
    &lt;span class=&#34;s&#34;&gt;&amp;#34;You are a concise assistant. Each answer fits in two short sentences.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MaxTokens&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;256&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;long-form-technical-writer&#34;&gt;Long-form technical writer&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SystemPrompt&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;
    &lt;span class=&#34;s&#34;&gt;&amp;#34;You are a technical writer. Produce detailed, well-structured paragraphs.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MaxTokens&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;4096&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CacheCleanupStrategy&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;CacheCleanupStrategy&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;KeepSystemPromptAndFirstUserMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;reasoning-model-with-enough-budget&#34;&gt;Reasoning model with enough budget&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DeepseekR1Qwen3Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SystemPrompt&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;You are a careful reasoning assistant.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MaxTokens&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;2048&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// leaves room for &amp;lt;think&amp;gt; plus final answer
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;few-shot-priming-via-history&#34;&gt;Few-shot priming via history&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Models&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;History&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;List&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CreateUserMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Translate to French: The cat sits on the mat.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CreateAssistantMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Le chat est assis sur le tapis.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CreateUserMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Translate to French: The dog runs in the park.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CreateAssistantMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Le chien court dans le parc.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// Now every new session starts with these examples.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/chat-sessions/&#34;&gt;Chat sessions&lt;/a&gt; — how the engine uses these parameters at runtime.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/&#34;&gt;Context parameters&lt;/a&gt; — the &lt;code&gt;ContextSize&lt;/code&gt; field that drives cache cleanup timing.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/use-cases/multi-turn-chat/&#34;&gt;Multi-turn chat&lt;/a&gt; — runnable example showing cache management in practice.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Net: Sampler parameters</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/</guid>
      <description>
        
        
        &lt;p&gt;&lt;code&gt;SamplerParameters&lt;/code&gt; controls how the engine picks each next token during generation. It exposes the full sampler surface of &lt;code&gt;llama.cpp&lt;/code&gt; common parameters: the classic temperature and nucleus sampling knobs, repetition penalties, specialized filters (DRY, XTC, dynatemp, Mirostat), reproducibility controls, and fine-grained logit bias.&lt;/p&gt;
&lt;p&gt;Every built-in preset sets sensible defaults for its model family. Override specific fields before &lt;code&gt;AsposeLLMApi.Create&lt;/code&gt; to tune behavior.&lt;/p&gt;
&lt;h2 id=&#34;class-reference&#34;&gt;Class reference&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;namespace&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Parameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;SamplerParameters&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;// Core sampling
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Temperature&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.7f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TopP&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.9f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TopK&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;40&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MinP&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.05f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// Reproducibility
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Seed&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;unchecked&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;((&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;xFFFFFFFF&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// time-based
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MinKeep&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// Repetition controls
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;PenaltyContextSize&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// -1 = full context
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;RepetitionPenalty&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1.1f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;PresencePenalty&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FrequencyPenalty&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// Advanced filters
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TypicalP&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;   &lt;span class=&#34;c1&#34;&gt;// disabled if &amp;lt;= 0
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TopNSigma&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;  &lt;span class=&#34;c1&#34;&gt;// disabled if &amp;lt;= 0
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;// Dynamic temperature
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DynatempRange&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// 0 disables
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DynatempExponent&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// XTC (Exclude Top Choices)
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;XtcProbability&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// disabled if &amp;lt;= 0
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;XtcThreshold&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// DRY (Don&amp;#39;t Repeat Yourself)
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DryMultiplier&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// disabled if &amp;lt;= 0
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DryBase&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1.75f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DryAllowedLength&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DryPenaltyLastN&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;List&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DrySequenceBreakers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;\n&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;:&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;\&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;*&amp;#34;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// Mirostat
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Mirostat&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// 0=off, 1=Mirostat 1.0, 2=Mirostat 2.0
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MirostatTau&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;5.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MirostatEta&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.1f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;c1&#34;&gt;// Fine-grained controls
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Dictionary&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LogitBias&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;EnableInfill&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;false&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;detailed-field-reference&#34;&gt;Detailed field reference&lt;/h2&gt;
&lt;p&gt;Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions. The rest of this page is an inline overview of the same content; follow the links for the deeper treatment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core sampling&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/temperature/&#34;&gt;Temperature&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/top-p/&#34;&gt;TopP&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/top-k/&#34;&gt;TopK&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/min-p/&#34;&gt;MinP&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/seed/&#34;&gt;Seed&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/min-keep/&#34;&gt;MinKeep&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Repetition controls&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/penalty-context-size/&#34;&gt;PenaltyContextSize&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/repetition-penalty/&#34;&gt;RepetitionPenalty&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/presence-penalty/&#34;&gt;PresencePenalty&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/frequency-penalty/&#34;&gt;FrequencyPenalty&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Advanced filters&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/typical-p/&#34;&gt;TypicalP&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/top-n-sigma/&#34;&gt;TopNSigma&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dynamic temperature&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/dynatemp-range/&#34;&gt;DynatempRange&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/dynatemp-exponent/&#34;&gt;DynatempExponent&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;XTC (Exclude Top Choices)&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/xtc-probability/&#34;&gt;XtcProbability&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/xtc-threshold/&#34;&gt;XtcThreshold&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;DRY (Don&amp;rsquo;t Repeat Yourself)&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/dry-multiplier/&#34;&gt;DryMultiplier&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/dry-base/&#34;&gt;DryBase&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/dry-allowed-length/&#34;&gt;DryAllowedLength&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/dry-penalty-last-n/&#34;&gt;DryPenaltyLastN&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/dry-sequence-breakers/&#34;&gt;DrySequenceBreakers&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mirostat&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/mirostat/&#34;&gt;Mirostat&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/mirostat-tau/&#34;&gt;MirostatTau&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/mirostat-eta/&#34;&gt;MirostatEta&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fine-grained controls&lt;/strong&gt;: &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/logit-bias/&#34;&gt;LogitBias&lt;/a&gt;, &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/sampler/enable-infill/&#34;&gt;EnableInfill&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;core-sampling&#34;&gt;Core sampling&lt;/h2&gt;
&lt;p&gt;These four fields govern the baseline sampling pipeline. Start with these; rarely do you need the advanced filters.&lt;/p&gt;
&lt;h3 id=&#34;temperature&#34;&gt;&lt;code&gt;Temperature&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;0.7&lt;/code&gt;. Scales the logits before sampling. Higher values flatten the probability distribution (more random output); lower values sharpen it (more deterministic).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Greedy sampling — always pick the most likely token. Fully deterministic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0.1 - 0.3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Precise, low-creativity output. Good for code, structured data, classification.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0.7&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;General-purpose balance of accuracy and variety.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0.8 - 1.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;More creative, more varied output.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;gt; 1.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;High randomness. Useful for brainstorming; risks incoherent output.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;topp&#34;&gt;&lt;code&gt;TopP&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;0.9&lt;/code&gt;. Nucleus sampling threshold. Only the smallest set of tokens whose cumulative probability exceeds &lt;code&gt;TopP&lt;/code&gt; is considered for sampling. Lower &lt;code&gt;TopP&lt;/code&gt; is more conservative.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;0.9&lt;/code&gt; (default) — balanced; a small tail of unlikely tokens is kept.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;0.7 - 0.8&lt;/code&gt; — more conservative; drops more of the tail.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1.0&lt;/code&gt; — disabled; all tokens are candidates.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;topk&#34;&gt;&lt;code&gt;TopK&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;40&lt;/code&gt;. Consider only the top &lt;code&gt;K&lt;/code&gt; most likely tokens per step. Works alongside &lt;code&gt;TopP&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;40&lt;/code&gt; (default) — reasonable upper bound for most models.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;20 - 30&lt;/code&gt; — more conservative.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;0&lt;/code&gt; or a very large number — disabled; &lt;code&gt;TopP&lt;/code&gt; alone filters.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;minp&#34;&gt;&lt;code&gt;MinP&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;0.05&lt;/code&gt;. Minimum probability relative to the top token. A token is kept only if its probability is at least &lt;code&gt;MinP × p(top)&lt;/code&gt;. Useful when the tail distribution has very low-probability tokens you never want to sample.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;0.05&lt;/code&gt; (default) — reasonable.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;0.1&lt;/code&gt; — stricter; drops more tail.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;0.0&lt;/code&gt; — disabled.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;reproducibility&#34;&gt;Reproducibility&lt;/h2&gt;
&lt;h3 id=&#34;seed&#34;&gt;&lt;code&gt;Seed&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;0xFFFFFFFF&lt;/code&gt; — a sentinel that &lt;code&gt;llama.cpp&lt;/code&gt; maps to a time-based (non-deterministic) seed. For reproducible output, set a specific integer.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Seed&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;42&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Two runs with the same seed, the same prompt, and identical parameters produce the same output.&lt;/p&gt;
&lt;h3 id=&#34;minkeep&#34;&gt;&lt;code&gt;MinKeep&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;1&lt;/code&gt;. Minimum number of candidate tokens kept after the filters (&lt;code&gt;TopK&lt;/code&gt;, &lt;code&gt;TopP&lt;/code&gt;, &lt;code&gt;MinP&lt;/code&gt;, &lt;code&gt;TypicalP&lt;/code&gt;) are applied. Guarantees the sampler always has at least &lt;code&gt;MinKeep&lt;/code&gt; tokens to choose from, preventing empty-candidate edge cases.&lt;/p&gt;
&lt;p&gt;Leave at &lt;code&gt;1&lt;/code&gt; unless you have a specific reason.&lt;/p&gt;
&lt;h2 id=&#34;repetition-controls&#34;&gt;Repetition controls&lt;/h2&gt;
&lt;p&gt;The engine penalizes tokens that appeared recently to avoid loops and verbatim repetition.&lt;/p&gt;
&lt;h3 id=&#34;penaltycontextsize&#34;&gt;&lt;code&gt;PenaltyContextSize&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;-1&lt;/code&gt; (= full context). Number of recent tokens considered for repetition penalties.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;-1&lt;/code&gt; — use the model&amp;rsquo;s full context size.&lt;/li&gt;
&lt;li&gt;A positive integer — only the last N tokens contribute.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Smaller windows make penalties more local; larger windows spread them across the whole conversation.&lt;/p&gt;
&lt;h3 id=&#34;repetitionpenalty&#34;&gt;&lt;code&gt;RepetitionPenalty&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;1.1&lt;/code&gt;. Multiplicative penalty applied to recently-seen tokens. Values &lt;code&gt;&amp;gt; 1&lt;/code&gt; make repeats less likely; &lt;code&gt;1.0&lt;/code&gt; disables repetition penalty.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;1.0&lt;/code&gt; — no penalty.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1.05 - 1.15&lt;/code&gt; (default range) — gentle anti-repetition.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1.2 - 1.3&lt;/code&gt; — aggressive; risks under-generating common words like &amp;ldquo;the&amp;rdquo; or &amp;ldquo;and&amp;rdquo;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;presencepenalty&#34;&gt;&lt;code&gt;PresencePenalty&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;0&lt;/code&gt; (disabled). Additive penalty for tokens that have appeared at least once in the penalty window. Biases the model toward new tokens not yet used.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;PresencePenalty&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.6f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;frequencypenalty&#34;&gt;&lt;code&gt;FrequencyPenalty&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;0&lt;/code&gt; (disabled). Additive penalty proportional to how many times a token has appeared. Stronger version of presence penalty.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;FrequencyPenalty&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.3f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Typical ranges for both penalties: &lt;code&gt;0.0 - 1.0&lt;/code&gt;. Combine with &lt;code&gt;RepetitionPenalty&lt;/code&gt; to shape repetition behavior; tune one at a time.&lt;/p&gt;
&lt;h2 id=&#34;advanced-filters&#34;&gt;Advanced filters&lt;/h2&gt;
&lt;h3 id=&#34;typicalp&#34;&gt;&lt;code&gt;TypicalP&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;-1&lt;/code&gt; (disabled). Locally-typical sampling — keeps tokens whose log-probability is close to the expected entropy. Alternative to nucleus sampling; rarely needed when &lt;code&gt;TopP&lt;/code&gt; is set.&lt;/p&gt;
&lt;p&gt;Enable with a value in &lt;code&gt;(0, 1]&lt;/code&gt;, e.g. &lt;code&gt;0.95&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;topnsigma&#34;&gt;&lt;code&gt;TopNSigma&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;-1&lt;/code&gt; (disabled). Keeps tokens within &lt;code&gt;N&lt;/code&gt; standard deviations of the top logit. Experimental filter from recent &lt;code&gt;llama.cpp&lt;/code&gt; versions.&lt;/p&gt;
&lt;h3 id=&#34;dynamic-temperature-dynatemprange-dynatempexponent&#34;&gt;Dynamic temperature (&lt;code&gt;DynatempRange&lt;/code&gt;, &lt;code&gt;DynatempExponent&lt;/code&gt;)&lt;/h3&gt;
&lt;p&gt;Dynamically adjusts temperature per step based on token entropy. When entropy is low (the model is confident), temperature drops; when entropy is high (the model is uncertain), temperature rises.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;DynatempRange&lt;/code&gt; default &lt;code&gt;0&lt;/code&gt; — dynatemp disabled.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;DynatempRange &amp;gt; 0&lt;/code&gt; to enable. Typical values &lt;code&gt;0.2 - 0.5&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DynatempExponent&lt;/code&gt; default &lt;code&gt;1.0&lt;/code&gt; controls the shape of the entropy-to-temperature curve.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Temperature&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.8f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DynatempRange&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.3f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// Effective temperature varies in [0.5, 1.1] based on per-step entropy.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;xtc--exclude-top-choices&#34;&gt;XTC — Exclude Top Choices&lt;/h3&gt;
&lt;p&gt;XTC randomly excludes the top tokens during sampling at a configurable probability. Useful for injecting diversity without raising overall temperature.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;XtcProbability&lt;/code&gt; default &lt;code&gt;-1&lt;/code&gt; (disabled). Probability of applying XTC at each step.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;XtcThreshold&lt;/code&gt; — minimum probability below which tokens are not excluded.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;dry--dont-repeat-yourself&#34;&gt;DRY — Don&amp;rsquo;t Repeat Yourself&lt;/h3&gt;
&lt;p&gt;DRY detects and penalizes verbatim string repetition (word-for-word copies from earlier in the context). Useful for creative writing; often too aggressive for code.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DryMultiplier&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-1&lt;/code&gt; (disabled)&lt;/td&gt;
&lt;td&gt;Strength of the penalty. Enable with a value like &lt;code&gt;0.8&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DryBase&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;1.75&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Base of the exponential penalty for consecutive repeated tokens.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DryAllowedLength&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Minimum repeat length before DRY engages.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DryPenaltyLastN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Number of trailing tokens checked (0 = all).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DrySequenceBreakers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;[&amp;quot;\n&amp;quot;, &amp;quot;:&amp;quot;, &amp;quot;\&amp;quot;&amp;quot;, &amp;quot;*&amp;quot;]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokens that reset the repeat detector.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DryMultiplier&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.8f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DryBase&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1.75f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DryAllowedLength&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;mirostat&#34;&gt;Mirostat&lt;/h3&gt;
&lt;p&gt;Adaptive sampler that targets a specific output entropy (perplexity). Alternative to temperature + nucleus sampling.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Mirostat = 0&lt;/code&gt; (default) — disabled.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Mirostat = 1&lt;/code&gt; — Mirostat 1.0.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Mirostat = 2&lt;/code&gt; — Mirostat 2.0 (usually preferred).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MirostatTau = 5.0&lt;/code&gt; — target entropy. Lower = more deterministic.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MirostatEta = 0.1&lt;/code&gt; — learning rate.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When Mirostat is on, &lt;code&gt;TopP&lt;/code&gt;, &lt;code&gt;TopK&lt;/code&gt;, and &lt;code&gt;MinP&lt;/code&gt; are effectively bypassed — Mirostat manages the full distribution itself.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Mirostat&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MirostatTau&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;4.5f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;fine-grained-controls&#34;&gt;Fine-grained controls&lt;/h2&gt;
&lt;h3 id=&#34;logitbias&#34;&gt;&lt;code&gt;LogitBias&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Per-token logit adjustment. Keys are token IDs, values are additive biases applied before sampling.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;c1&#34;&gt;// Strongly bias against token 1234:
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LogitBias&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1234&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;100f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// Slightly favor token 5678:
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LogitBias&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;5678&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;2f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A bias of &lt;code&gt;-100&lt;/code&gt; (or lower) effectively bans a token. A bias of &lt;code&gt;+2&lt;/code&gt; to &lt;code&gt;+5&lt;/code&gt; noticeably favors it. Obtaining token IDs requires the model&amp;rsquo;s tokenizer — not exposed in this parameter bag but available via the API reference.&lt;/p&gt;
&lt;h3 id=&#34;enableinfill&#34;&gt;&lt;code&gt;EnableInfill&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Default &lt;code&gt;false&lt;/code&gt;. Enables the INFILL sampler variant used for fill-in-the-middle code completion (models that support it). Leave off for normal chat.&lt;/p&gt;
&lt;h2 id=&#34;typical-recipes&#34;&gt;Typical recipes&lt;/h2&gt;
&lt;h3 id=&#34;deterministic-output-for-tests-and-reproducible-demos&#34;&gt;Deterministic output (for tests and reproducible demos)&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Temperature&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// greedy
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Seed&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;42&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;balanced-chat-assistant&#34;&gt;Balanced chat assistant&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// Defaults are already tuned:
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// Temperature 0.7, TopP 0.9, TopK 40, MinP 0.05, RepetitionPenalty 1.1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;creative-writing&#34;&gt;Creative writing&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Temperature&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.9f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TopP&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.95f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MinP&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.02f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DryMultiplier&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.8f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// discourage verbatim repetition
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;tight-technical-output-code-structured-data&#34;&gt;Tight technical output (code, structured data)&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Temperature&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.2f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TopP&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.95f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;RepetitionPenalty&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1.05f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// gentler for code patterns
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;precision-via-mirostat&#34;&gt;Precision via Mirostat&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Mirostat&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MirostatTau&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;5.0f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SamplerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MirostatEta&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0.1f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// Temperature, TopP, TopK are now ignored.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/chat/&#34;&gt;Chat parameters&lt;/a&gt; — max tokens and system prompt shape what the sampler generates into.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/&#34;&gt;Context parameters&lt;/a&gt; — the context window the sampler reads penalties from.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/use-cases/custom-preset/&#34;&gt;Custom preset&lt;/a&gt; — full customization patterns.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Net: Engine parameters</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/parameters/engine/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/parameters/engine/</guid>
      <description>
        
        
        &lt;p&gt;&lt;code&gt;EngineParameters&lt;/code&gt; holds engine-wide settings that apply across the entire &lt;code&gt;AsposeLLMApi&lt;/code&gt; lifecycle. Unlike the per-session parameter bags, these values are read once during engine construction and not re-evaluated per request.&lt;/p&gt;
&lt;h2 id=&#34;class-reference&#34;&gt;Class reference&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;namespace&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Core.DependencyInjection&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;EngineParameters&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ModelCachePath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;    &lt;span class=&#34;c1&#34;&gt;// default: &amp;lt;LocalAppData&amp;gt;/Aspose.LLM/models
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;EnableDebugLogging&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;   &lt;span class=&#34;c1&#34;&gt;// default: false
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LogDirectoryPath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;   &lt;span class=&#34;c1&#34;&gt;// default: &amp;#34;logs/log.txt&amp;#34;
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DefaultThreads&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;        &lt;span class=&#34;c1&#34;&gt;// default: Environment.ProcessorCount - 1
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;All four properties have working defaults. Override only when the default does not match your deployment.&lt;/p&gt;
&lt;h2 id=&#34;detailed-field-reference&#34;&gt;Detailed field reference&lt;/h2&gt;
&lt;p&gt;Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/engine/model-cache-path/&#34;&gt;ModelCachePath&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/engine/enable-debug-logging/&#34;&gt;EnableDebugLogging&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/engine/log-directory-path/&#34;&gt;LogDirectoryPath&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/engine/default-threads/&#34;&gt;DefaultThreads&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;fields&#34;&gt;Fields&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ModelCachePath&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;LocalAppData&amp;gt;/Aspose.LLM/models&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Folder where downloaded model files are stored.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;EnableDebugLogging&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bool&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enables native-level debug logs from &lt;code&gt;llama.cpp&lt;/code&gt;. Verbose — use for diagnosis, not in production.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LogDirectoryPath&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;quot;logs/log.txt&amp;quot;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;File path for native log output. Despite the name, this is a full file path, not a directory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DefaultThreads&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;int&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ProcessorCount - 1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default thread count used when a parameter bag does not specify its own threading.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;modelcachepath&#34;&gt;&lt;code&gt;ModelCachePath&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Points to the folder where downloaded GGUF files are cached. On first run, the engine:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Checks whether the model file already exists under this folder.&lt;/li&gt;
&lt;li&gt;If not, downloads it from the source defined in &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-source/&#34;&gt;&lt;code&gt;ModelSourceParameters&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Loads the cached file on subsequent runs.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Typical reasons to override:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Shared model cache&lt;/strong&gt; across multiple applications or users.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faster disk&lt;/strong&gt; — point to an SSD when your &lt;code&gt;LocalApplicationData&lt;/code&gt; is on an HDD.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Constrained disk layout&lt;/strong&gt; — put models on a data drive, code on the system drive.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docker / container&lt;/strong&gt; scenarios — mount the cache as a volume so models survive restarts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example — use a shared cache under &lt;code&gt;D:\models&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ModelCachePath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;@&amp;#34;D:\models&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeLLMApi&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;enabledebuglogging&#34;&gt;&lt;code&gt;EnableDebugLogging&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;true&lt;/code&gt;, the native &lt;code&gt;llama.cpp&lt;/code&gt; layer emits verbose logs — useful when diagnosing inference errors, template mismatches, or KV cache issues. Combine with an &lt;code&gt;ILogger&lt;/code&gt; passed to &lt;code&gt;AsposeLLMApi.Create(preset, logger)&lt;/code&gt; to capture the output:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Microsoft.Extensions.Logging&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;loggerFactory&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LoggerFactory&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;builder&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;builder&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;AddConsole&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;().&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SetMinimumLevel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LogLevel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Debug&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;));&lt;/span&gt;
&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;logger&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;loggerFactory&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CreateLogger&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Program&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;gt;();&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EnableDebugLogging&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeLLMApi&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;logger&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Leave this &lt;code&gt;false&lt;/code&gt; in production; debug logs materially affect throughput.&lt;/p&gt;
&lt;h3 id=&#34;logdirectorypath&#34;&gt;&lt;code&gt;LogDirectoryPath&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The path used by the native log writer. Defaults to &lt;code&gt;logs/log.txt&lt;/code&gt; relative to the working directory. The name suggests a directory, but the value is a full file path.&lt;/p&gt;
&lt;p&gt;Change this when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your app has a specific logging convention (for example, centralized log folder).&lt;/li&gt;
&lt;li&gt;You need per-environment log files (&lt;code&gt;logs/dev.log&lt;/code&gt;, &lt;code&gt;logs/prod.log&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The working directory is not writable in your deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LogDirectoryPath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;@&amp;#34;C:\logs\aspose-llm\app.log&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;defaultthreads&#34;&gt;&lt;code&gt;DefaultThreads&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Thread count used when &lt;code&gt;ContextParameters.NThreads&lt;/code&gt; is not set explicitly. The default is &lt;code&gt;Environment.ProcessorCount - 1&lt;/code&gt; — one fewer than the total logical cores — to leave one core for the rest of your application.&lt;/p&gt;
&lt;p&gt;Override in two situations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Dedicated inference machine&lt;/strong&gt; — use &lt;code&gt;Environment.ProcessorCount&lt;/code&gt; for maximum throughput.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tight envelope (containers, laaS)&lt;/strong&gt; — use a fixed smaller number to stay inside a CPU quota.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DefaultThreads&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;4&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For finer control over threading during generation, set &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/&#34;&gt;&lt;code&gt;ContextParameters.NThreads&lt;/code&gt;&lt;/a&gt; and &lt;code&gt;NThreadsBatch&lt;/code&gt; directly — those override &lt;code&gt;DefaultThreads&lt;/code&gt; when set.&lt;/p&gt;
&lt;h2 id=&#34;typical-recipes&#34;&gt;Typical recipes&lt;/h2&gt;
&lt;h3 id=&#34;development-with-debug-logs&#34;&gt;Development with debug logs&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EnableDebugLogging&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LogDirectoryPath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;logs/debug.log&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;production-on-a-dedicated-server&#34;&gt;Production on a dedicated server&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ModelCachePath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;@&amp;#34;/var/lib/aspose-llm/models&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EnableDebugLogging&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;false&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LogDirectoryPath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;@&amp;#34;/var/log/aspose-llm/app.log&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DefaultThreads&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Environment&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ProcessorCount&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;container-with-mounted-model-volume&#34;&gt;Container with mounted model volume&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ModelCachePath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;/models&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;           &lt;span class=&#34;c1&#34;&gt;// volume mount
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LogDirectoryPath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;/var/log/app.log&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// volume mount
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DefaultThreads&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;4&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;                   &lt;span class=&#34;c1&#34;&gt;// CPU quota
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/&#34;&gt;Context parameters&lt;/a&gt; — threads per inference call, batch sizes.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/&#34;&gt;Binary manager parameters&lt;/a&gt; — where native &lt;code&gt;llama.cpp&lt;/code&gt; binaries are cached.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/&#34;&gt;Logging and diagnostics&lt;/a&gt; — &lt;code&gt;ILogger&lt;/code&gt; integration and native log tags (planned reference page in a future release).&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Net: Binary manager parameters</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/</guid>
      <description>
        
        
        &lt;p&gt;&lt;code&gt;BinaryManagerParameters&lt;/code&gt; controls how the SDK obtains the native &lt;code&gt;llama.cpp&lt;/code&gt; binaries (&lt;code&gt;libllama&lt;/code&gt;, &lt;code&gt;libmtmd&lt;/code&gt;, &lt;code&gt;libggml-*&lt;/code&gt;). On first use, the engine downloads the matching build from GitHub for your platform and acceleration backend, caches it, and reuses the cache on subsequent runs.&lt;/p&gt;
&lt;h2 id=&#34;class-reference&#34;&gt;Class reference&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;namespace&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Parameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;BinaryManagerParameters&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Owner&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;ggml-org&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Repo&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;llama.cpp&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ReleaseTag&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;b8816&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;BinaryPath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;              &lt;span class=&#34;c1&#34;&gt;// &amp;lt;LocalAppData&amp;gt;/Aspose.LLM/runtimes
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;SystemSpec&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;SystemSpecification&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AccelerationType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;PreferredAcceleration&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;detailed-field-reference&#34;&gt;Detailed field reference&lt;/h2&gt;
&lt;p&gt;Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/owner/&#34;&gt;Owner&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/repo/&#34;&gt;Repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/release-tag/&#34;&gt;ReleaseTag&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/binary-path/&#34;&gt;BinaryPath&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/system-specification/&#34;&gt;SystemSpecification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/binary-manager/preferred-acceleration/&#34;&gt;PreferredAcceleration&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;fields&#34;&gt;Fields&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Owner&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;quot;ggml-org&amp;quot;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GitHub repository owner for &lt;code&gt;llama.cpp&lt;/code&gt; releases.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Repo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;quot;llama.cpp&amp;quot;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GitHub repository name.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ReleaseTag&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;quot;b8816&amp;quot;&lt;/code&gt; (as of SDK v26.5.0)&lt;/td&gt;
&lt;td&gt;Specific &lt;code&gt;llama.cpp&lt;/code&gt; release to pin.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;BinaryPath&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;LocalAppData&amp;gt;/Aspose.LLM/runtimes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local cache for downloaded native binaries.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SystemSpecification&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SystemSpec?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;null&lt;/code&gt; (auto-detect)&lt;/td&gt;
&lt;td&gt;Override the detected OS / architecture / acceleration capabilities.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PreferredAcceleration&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AccelerationType?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;null&lt;/code&gt; (auto-select)&lt;/td&gt;
&lt;td&gt;Force a specific acceleration backend (CUDA, HIP, Metal, Vulkan, CPU variants).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;owner-and-repo&#34;&gt;&lt;code&gt;Owner&lt;/code&gt; and &lt;code&gt;Repo&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Together they form &lt;code&gt;github.com/&amp;lt;Owner&amp;gt;/&amp;lt;Repo&amp;gt;/releases/...&lt;/code&gt;. The defaults target the upstream &lt;code&gt;llama.cpp&lt;/code&gt; repository. Change them only if you mirror releases to a fork that stays byte-compatible with upstream — for example, in an air-gapped enterprise setup that syncs selected releases into a private GitHub Enterprise instance.&lt;/p&gt;
&lt;h3 id=&#34;releasetag&#34;&gt;&lt;code&gt;ReleaseTag&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Pins a specific upstream release. As of SDK v26.5.0, the default is &lt;code&gt;b8816&lt;/code&gt;. Each release tag corresponds to native binaries with matching P/Invoke signatures; changing the tag without a matching SDK version is unsupported.&lt;/p&gt;
&lt;p&gt;Override only when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You are testing a newer upstream release against the current SDK (development only).&lt;/li&gt;
&lt;li&gt;You are locked to an older release because of a validated deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryManagerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ReleaseTag&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;b8816&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class=&#34;alert alert-primary&#34; role=&#34;alert&#34;&gt;

The SDK&amp;rsquo;s P/Invoke layer is validated against the default &lt;code&gt;ReleaseTag&lt;/code&gt;. Pinning a different tag can produce runtime errors if upstream changed a struct layout or function signature. Do not ship custom tags to production without a migration pass — see the &lt;code&gt;llama-cpp-migration&lt;/code&gt; workflow used by the Aspose team.
&lt;/div&gt;

&lt;h3 id=&#34;binarypath&#34;&gt;&lt;code&gt;BinaryPath&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Folder where downloaded binaries live. The default is &lt;code&gt;&amp;lt;LocalAppData&amp;gt;/Aspose.LLM/runtimes&lt;/code&gt; — &lt;code&gt;%LOCALAPPDATA%\Aspose.LLM\runtimes&lt;/code&gt; on Windows and the equivalent &lt;code&gt;LocalApplicationData&lt;/code&gt; folder elsewhere.&lt;/p&gt;
&lt;p&gt;Override when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Shared cache&lt;/strong&gt; across multiple applications or services on the same host.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Read-only root filesystem&lt;/strong&gt; — point the cache at a writable volume.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pre-populated deployment&lt;/strong&gt; — bundle the binaries with your application and point &lt;code&gt;BinaryPath&lt;/code&gt; at them to skip the download on first run.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryManagerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryPath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;@&amp;#34;/var/lib/aspose-llm/runtimes&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;systemspecification&#34;&gt;&lt;code&gt;SystemSpecification&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;null&lt;/code&gt; (the default), the SDK detects the host&amp;rsquo;s OS, architecture, and available accelerations at engine construction. Override with an explicit &lt;code&gt;SystemSpec&lt;/code&gt; only for diagnostics or cross-platform binary preparation — leaving this &lt;code&gt;null&lt;/code&gt; is correct for normal deployments.&lt;/p&gt;
&lt;h3 id=&#34;preferredacceleration&#34;&gt;&lt;code&gt;PreferredAcceleration&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;null&lt;/code&gt;, the SDK picks the best available acceleration for the host in this order: CUDA → HIP → Metal → Vulkan → CPU (AVX level best to worst). Set an explicit value to override the selection.&lt;/p&gt;
&lt;p&gt;Supported values (see &lt;code&gt;Aspose.LLM.Abstractions.Acceleration.AccelerationType&lt;/code&gt;):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CUDA&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Windows, Linux&lt;/td&gt;
&lt;td&gt;NVIDIA GPUs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HIP&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Linux&lt;/td&gt;
&lt;td&gt;AMD GPUs via ROCm.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Metal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;macOS (Apple Silicon)&lt;/td&gt;
&lt;td&gt;M-series chips.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Vulkan&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Windows, Linux&lt;/td&gt;
&lt;td&gt;Cross-platform GPU.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AVX512&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Any x64 with AVX-512&lt;/td&gt;
&lt;td&gt;Fastest CPU path.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AVX2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Any x64 with AVX2&lt;/td&gt;
&lt;td&gt;Default CPU fallback.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AVX&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Older x64&lt;/td&gt;
&lt;td&gt;Slower.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NoAVX&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Very old CPUs&lt;/td&gt;
&lt;td&gt;Last-resort compatibility.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Kompute&lt;/code&gt;, &lt;code&gt;OpenCL&lt;/code&gt;, &lt;code&gt;SYCL&lt;/code&gt;, &lt;code&gt;OpenBLAS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Platform-dependent&lt;/td&gt;
&lt;td&gt;Less common; verify availability for your target.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The enum has additional values (&lt;code&gt;None&lt;/code&gt;) used internally — avoid setting them explicitly.&lt;/p&gt;
&lt;h2 id=&#34;typical-recipes&#34;&gt;Typical recipes&lt;/h2&gt;
&lt;h3 id=&#34;force-cpu-only-execution&#34;&gt;Force CPU-only execution&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Acceleration&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryManagerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;PreferredAcceleration&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AccelerationType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;AVX2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;GpuLayers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// complement on the inference side
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;force-cuda-on-a-multi-gpu-box&#34;&gt;Force CUDA on a multi-GPU box&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryManagerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;PreferredAcceleration&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AccelerationType&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CUDA&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;GpuLayers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;999&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// Pick a specific GPU via BaseModelInferenceParameters.MainGpu = N;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;pre-populated-offline-deployment&#34;&gt;Pre-populated offline deployment&lt;/h3&gt;
&lt;p&gt;Download the binaries on a connected machine, copy the cache into your deployment, and point &lt;code&gt;BinaryPath&lt;/code&gt; at it on the offline host:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen25Preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryManagerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryPath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;@&amp;#34;/opt/aspose-llm/runtimes&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// Also point EngineParameters.ModelCachePath at a pre-populated model cache.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;shared-cache-across-services&#34;&gt;Shared cache across services&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryManagerParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BinaryPath&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;@&amp;#34;/srv/shared/aspose-llm/runtimes&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Make sure every service using this cache runs the same SDK version — version mismatches produce binary incompatibilities.&lt;/p&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/system-requirements/&#34;&gt;System requirements&lt;/a&gt; — what runtimes and hardware the binaries support.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-inference/&#34;&gt;Model inference parameters&lt;/a&gt; — complement &lt;code&gt;PreferredAcceleration&lt;/code&gt; with &lt;code&gt;GpuLayers&lt;/code&gt; and split settings.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/product-overview/architecture/&#34;&gt;Architecture&lt;/a&gt; — what happens during first-run binary deployment.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Net: Multimodal context parameters</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/</guid>
      <description>
        
        
        &lt;p&gt;&lt;code&gt;MultimodalContextParameters&lt;/code&gt; — exposed on the preset as &lt;code&gt;MtmdContextParameters&lt;/code&gt; — configures the &lt;code&gt;mtmd&lt;/code&gt; context used by vision presets to evaluate image tokens. The base text model is configured by &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/context/&#34;&gt;&lt;code&gt;ContextParameters&lt;/code&gt;&lt;/a&gt;; this bag covers only the multimodal layer.&lt;/p&gt;
&lt;p&gt;Only vision presets use these settings. On text-only presets the bag is instantiated but has no effect.&lt;/p&gt;
&lt;h2 id=&#34;class-reference&#34;&gt;Class reference&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;namespace&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Parameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;MultimodalContextParameters&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;UseGpu&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;bool?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;PrintTimings&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ThreadCount&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Verbosity&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MediaMarker&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Every field is nullable. A &lt;code&gt;null&lt;/code&gt; value means &amp;ldquo;use the native &lt;code&gt;mtmd&lt;/code&gt; default&amp;rdquo; — override only when you have a specific reason.&lt;/p&gt;
&lt;h2 id=&#34;detailed-field-reference&#34;&gt;Detailed field reference&lt;/h2&gt;
&lt;p&gt;Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/use-gpu/&#34;&gt;UseGpu&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/print-timings/&#34;&gt;PrintTimings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/thread-count/&#34;&gt;ThreadCount&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/verbosity/&#34;&gt;Verbosity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/media-marker/&#34;&gt;MediaMarker&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;fields&#34;&gt;Fields&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;UseGpu&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bool?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default&lt;/td&gt;
&lt;td&gt;Whether to offload the vision projector to GPU.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PrintTimings&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bool?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default&lt;/td&gt;
&lt;td&gt;Emit per-step timing diagnostics from the &lt;code&gt;mtmd&lt;/code&gt; layer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ThreadCount&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;int?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default&lt;/td&gt;
&lt;td&gt;Threads used by &lt;code&gt;mtmd&lt;/code&gt; processing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Verbosity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;int?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default&lt;/td&gt;
&lt;td&gt;Log level for the &lt;code&gt;mtmd&lt;/code&gt; layer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MediaMarker&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;native default&lt;/td&gt;
&lt;td&gt;Placeholder token text that marks image positions in the prompt.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;usegpu&#34;&gt;&lt;code&gt;UseGpu&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Controls whether the vision projector (&lt;code&gt;mmproj&lt;/code&gt;) runs on the GPU alongside the base model. The &lt;code&gt;mmproj&lt;/code&gt; is typically small (200 MB - 2 GB), so GPU offload is fast even on modest hardware.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;null&lt;/code&gt; — delegate to &lt;code&gt;mtmd&lt;/code&gt;&amp;rsquo;s auto-detection (currently: GPU if available).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;true&lt;/code&gt; — force GPU.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;false&lt;/code&gt; — force CPU. Use when you have limited GPU memory and want to spend it entirely on the base model.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MtmdContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;UseGpu&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;false&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// keep GPU memory for the base model
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;printtimings&#34;&gt;&lt;code&gt;PrintTimings&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Enables &lt;code&gt;mtmd&lt;/code&gt;&amp;rsquo;s built-in per-step timing logs — the time spent tokenizing images, running the projector, and evaluating chunks. Useful for diagnosing slow first-response latency on vision queries.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MtmdContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;PrintTimings&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Leave this &lt;code&gt;null&lt;/code&gt; (off) in production. Timing logs add overhead and flood the output.&lt;/p&gt;
&lt;h3 id=&#34;threadcount&#34;&gt;&lt;code&gt;ThreadCount&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Threads used for CPU-side &lt;code&gt;mtmd&lt;/code&gt; work (image preprocessing, CPU portions of the projector). When &lt;code&gt;null&lt;/code&gt;, &lt;code&gt;mtmd&lt;/code&gt; follows its own heuristic — usually half the logical cores.&lt;/p&gt;
&lt;p&gt;Override when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The rest of your application needs more cores and &lt;code&gt;mtmd&lt;/code&gt; is single-shot work.&lt;/li&gt;
&lt;li&gt;You run multiple vision requests concurrently and want to cap each one&amp;rsquo;s CPU footprint.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MtmdContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ThreadCount&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;verbosity&#34;&gt;&lt;code&gt;Verbosity&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Log verbosity for the &lt;code&gt;mtmd&lt;/code&gt; layer. The native layer accepts an integer; the typical mapping is:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Error&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Warn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Debug&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MtmdContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Verbosity&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// debug — useful when images are tokenized unexpectedly
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Keep verbosity low in production (&lt;code&gt;0&lt;/code&gt; or &lt;code&gt;1&lt;/code&gt;). Higher levels emit tagged lines that need post-processing to be useful — see the &lt;code&gt;parse_mm_logs.zsh&lt;/code&gt; helper script in the Aspose.LLM SDK repository.&lt;/p&gt;
&lt;h3 id=&#34;mediamarker&#34;&gt;&lt;code&gt;MediaMarker&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Placeholder text used in the chat template to mark where images are inserted. The default is the chat-template-specific marker (different per model family — LLaVA, Qwen-VL, Gemma-Vision, and others have different tokens). Override only if you understand the model&amp;rsquo;s prompt format and need a non-standard marker.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MtmdContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MediaMarker&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;&amp;lt;|image|&amp;gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In nearly all cases, leave this &lt;code&gt;null&lt;/code&gt;. The correct marker is selected automatically from the model&amp;rsquo;s metadata.&lt;/p&gt;
&lt;h2 id=&#34;typical-recipes&#34;&gt;Typical recipes&lt;/h2&gt;
&lt;h3 id=&#34;default-vision-configuration&#34;&gt;Default vision configuration&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen3VL2BPreset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// MtmdContextParameters stays at defaults — all fields null.
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeLLMApi&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;debug-slow-image-processing&#34;&gt;Debug slow image processing&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen3VL2BPreset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MtmdContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;PrintTimings&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MtmdContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Verbosity&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeLLMApi&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;logger&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// Inspect logs for per-stage mtmd timings.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;cpu-only-projector-to-save-gpu-memory&#34;&gt;CPU-only projector to save GPU memory&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen3VL2BPreset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MtmdContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;UseGpu&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;false&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;                  &lt;span class=&#34;c1&#34;&gt;// projector on CPU
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;BaseModelInferenceParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;GpuLayers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;999&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;          &lt;span class=&#34;c1&#34;&gt;// base model fully on GPU
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;On a GPU tight for memory, keeping the projector on CPU trades some first-token latency for more headroom for the base model and KV cache.&lt;/p&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/product-overview/supported-presets/#vision-presets&#34;&gt;Supported presets — vision&lt;/a&gt; — built-in vision presets and their &lt;code&gt;mmproj&lt;/code&gt; sources.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-source/&#34;&gt;Model source parameters&lt;/a&gt; — configure the vision projector&amp;rsquo;s download source.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/use-cases/&#34;&gt;Attaching images&lt;/a&gt; — vision use case (planned in a future release).&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
  </channel>
</rss>
