<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Documentation – Multimodal</title>
    <link>https://docs.aspose.com/llm/net/developer-reference/multimodal/</link>
    <description>Recent content in Multimodal on Documentation</description>
    <generator>Hugo -- gohugo.io</generator>
    <lastBuildDate>Thu, 23 Apr 2026 00:00:00 +0000</lastBuildDate>
    
	  <atom:link href="https://docs.aspose.com/llm/net/developer-reference/multimodal/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Net: Vision presets</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/multimodal/vision-presets/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/multimodal/vision-presets/</guid>
      <description>
        
        
        &lt;p&gt;The SDK ships four built-in vision presets. Each preset configures both the base language model and its multimodal projector (&lt;code&gt;mmproj&lt;/code&gt;); the two files load together on the first &lt;code&gt;Create&lt;/code&gt; call.&lt;/p&gt;
&lt;h2 id=&#34;available-presets&#34;&gt;Available presets&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Preset&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th style=&#34;text-align:right&#34;&gt;Base context&lt;/th&gt;
&lt;th&gt;Quantization&lt;/th&gt;
&lt;th&gt;mmproj file&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Qwen25VL3BPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Qwen 2.5 VL 3B Instruct&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;128 000&lt;/td&gt;
&lt;td&gt;UD-IQ2_XXS&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mmproj-F16.gguf&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Qwen3VL2BPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Qwen 3 VL 2B Instruct&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;262 144&lt;/td&gt;
&lt;td&gt;Q4_K_M&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mmproj-Qwen3VL-2B-Instruct-Q8_0.gguf&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Gemma3VisionPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemma 3 Vision (LaTeX fine-tune)&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;8 096&lt;/td&gt;
&lt;td&gt;Q4_K_M&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Gemma-3-Vision-Latex.mmproj-f16.gguf&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ministral3VisionPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ministral 3 8B Instruct (Mistral AI, 2512 release)&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;262 144&lt;/td&gt;
&lt;td&gt;Q4_K_M&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Ministral-3-8B-Instruct-2512-BF16-mmproj.gguf&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;See &lt;a href=&#34;https://docs.aspose.com/llm/net/product-overview/supported-presets/#vision-presets&#34;&gt;Supported presets&lt;/a&gt; for the Hugging Face source repositories.&lt;/p&gt;
&lt;h2 id=&#34;picker&#34;&gt;Picker&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Need&lt;/th&gt;
&lt;th&gt;Try&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Smallest footprint, long context&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Qwen3VL2BPreset&lt;/code&gt; (2B parameters, 262K context)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;General-purpose vision Q&amp;amp;A&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Qwen25VL3BPreset&lt;/code&gt; (3B, 128K)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text-heavy images (documents, LaTeX)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Gemma3VisionPreset&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strongest reasoning on complex images&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Ministral3VisionPreset&lt;/code&gt; (8B, 262K)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;All four handle image description and simple spatial reasoning reasonably well. For OCR-style tasks on dense text, lean toward Gemma 3 Vision or Ministral 3, whose larger projectors handle small text better.&lt;/p&gt;
&lt;h2 id=&#34;memory&#34;&gt;Memory&lt;/h2&gt;
&lt;p&gt;Vision presets load two files: the base model and the projector. Add the projector memory footprint on top of the base model — typically 200 MB to 2 GB depending on precision.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Preset&lt;/th&gt;
&lt;th&gt;Base (VRAM/RAM)&lt;/th&gt;
&lt;th&gt;Projector&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Qwen25VL3BPreset&lt;/code&gt; (UD-IQ2_XXS)&lt;/td&gt;
&lt;td&gt;~2 GB&lt;/td&gt;
&lt;td&gt;~0.8 GB (F16)&lt;/td&gt;
&lt;td&gt;~3 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Qwen3VL2BPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~2 GB&lt;/td&gt;
&lt;td&gt;~0.5 GB (Q8_0)&lt;/td&gt;
&lt;td&gt;~2.5 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Gemma3VisionPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~3 GB&lt;/td&gt;
&lt;td&gt;~1 GB (F16)&lt;/td&gt;
&lt;td&gt;~4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ministral3VisionPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~6 GB&lt;/td&gt;
&lt;td&gt;~2 GB (BF16)&lt;/td&gt;
&lt;td&gt;~8 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The KV cache comes on top of these figures and scales with &lt;code&gt;ContextParameters.ContextSize&lt;/code&gt;. For long contexts, set &lt;code&gt;TypeV&lt;/code&gt; to &lt;code&gt;Q8_0&lt;/code&gt; to reduce its memory use.&lt;/p&gt;
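&lt;p&gt;As a rough illustration of how the totals above can feed a sizing decision, the sketch below picks the largest preset that fits a memory budget. It does not use the SDK: the preset names and totals are copied from the table, while &lt;code&gt;PickLargestThatFits&lt;/code&gt; and the &lt;code&gt;kvCacheGb&lt;/code&gt; headroom are hypothetical names chosen for this example, and the KV cache size is whatever estimate you supply.&lt;/p&gt;

```csharp
#nullable enable
using System;

// Approximate totals (base model + projector) copied from the memory table, in GB.
var presets = new (string Name, double TotalGb)[]
{
    ("Qwen3VL2BPreset", 2.5),
    ("Qwen25VL3BPreset", 3.0),
    ("Gemma3VisionPreset", 4.0),
    ("Ministral3VisionPreset", 8.0),
};

// Hypothetical helper: returns the largest preset whose footprint plus the
// caller-supplied KV cache estimate fits the budget, or null when none fits.
string? PickLargestThatFits(double budgetGb, double kvCacheGb)
{
    string? best = null;
    double bestTotal = 0;
    foreach (var p in presets)
    {
        double needed = p.TotalGb + kvCacheGb;
        if (budgetGb >= needed)
        {
            if (p.TotalGb > bestTotal)
            {
                best = p.Name;
                bestTotal = p.TotalGb;
            }
        }
    }
    return best;
}

// 6 GB available, 1 GB reserved for the KV cache.
Console.WriteLine(PickLargestThatFits(6.0, 1.0) ?? "none"); // Gemma3VisionPreset
```

&lt;p&gt;Treat the result as a starting point only; the figures in the table are approximate, and actual usage depends on context size and offload settings.&lt;/p&gt;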
&lt;h2 id=&#34;using-a-vision-preset&#34;&gt;Using a vision preset&lt;/h2&gt;
&lt;p&gt;Vision presets follow the same pattern as any other preset. Pass images via the &lt;code&gt;media&lt;/code&gt; parameter of &lt;code&gt;SendMessageAsync&lt;/code&gt; or &lt;code&gt;SendMessageToSessionAsync&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Parameters.Presets&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen3VL2BPreset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeLLMApi&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;byte&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;imageBytes&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;File&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ReadAllBytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;document.png&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;reply&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SendMessageAsync&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
    &lt;span class=&#34;s&#34;&gt;&amp;#34;Transcribe the text in this image verbatim.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;media&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;imageBytes&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;});&lt;/span&gt;

&lt;span class=&#34;n&#34;&gt;Console&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;WriteLine&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;reply&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;See &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/attaching-images/&#34;&gt;Attaching images&lt;/a&gt; for format and size rules.&lt;/p&gt;
&lt;h2 id=&#34;customizing-a-vision-preset&#34;&gt;Customizing a vision preset&lt;/h2&gt;
&lt;p&gt;The same override patterns as text presets apply. See &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/presets/&#34;&gt;Presets&lt;/a&gt; for the three approaches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Override before &lt;code&gt;Create&lt;/code&gt;&lt;/strong&gt; — tweak fields on the preset instance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Subclass&lt;/strong&gt; — inherit from a built-in vision preset and set defaults in the constructor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;From scratch&lt;/strong&gt; — extend &lt;code&gt;PresetCoreBase&lt;/code&gt; and populate both &lt;code&gt;BaseModelSourceParameters&lt;/code&gt; and &lt;code&gt;MmprojSourceParameters&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additional vision-only knobs live on &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/&#34;&gt;&lt;code&gt;MtmdContextParameters&lt;/code&gt;&lt;/a&gt; — control projector GPU offload, threading, and verbosity.&lt;/p&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/attaching-images/&#34;&gt;Attaching images&lt;/a&gt; — how to pass image bytes.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/chat-templates/&#34;&gt;Chat templates&lt;/a&gt; — how the SDK selects the right template per model.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/&#34;&gt;Multimodal context parameters&lt;/a&gt; — vision-side tuning knobs.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/model-source/&#34;&gt;Model source parameters&lt;/a&gt; — configure the &lt;code&gt;mmproj&lt;/code&gt; download source.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Net: Attaching images</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/multimodal/attaching-images/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/multimodal/attaching-images/</guid>
      <description>
        
        
        &lt;p&gt;The &lt;code&gt;media&lt;/code&gt; parameter of &lt;code&gt;SendMessageAsync&lt;/code&gt; and &lt;code&gt;SendMessageToSessionAsync&lt;/code&gt; accepts image byte arrays. The engine detects each image&amp;rsquo;s format from its magic bytes, validates size, and passes it through the vision projector alongside the text prompt.&lt;/p&gt;
&lt;h2 id=&#34;quick-example&#34;&gt;Quick example&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;byte&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;imageBytes&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;File&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ReadAllBytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;cat.jpg&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;reply&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SendMessageAsync&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
    &lt;span class=&#34;s&#34;&gt;&amp;#34;Describe the animal in this image.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;media&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;imageBytes&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The parameter is &lt;code&gt;IEnumerable&amp;lt;byte[]&amp;gt;?&lt;/code&gt;. Pass &lt;code&gt;null&lt;/code&gt; or omit it entirely for a text-only message.&lt;/p&gt;
&lt;h2 id=&#34;supported-formats&#34;&gt;Supported formats&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Magic bytes (hex)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JPEG&lt;/td&gt;
&lt;td&gt;&lt;code&gt;FF D8&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PNG&lt;/td&gt;
&lt;td&gt;&lt;code&gt;89 50 4E 47&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BMP&lt;/td&gt;
&lt;td&gt;&lt;code&gt;42 4D&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GIF&lt;/td&gt;
&lt;td&gt;&lt;code&gt;47 49 46&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WebP&lt;/td&gt;
&lt;td&gt;&lt;code&gt;52 49 46 46 .. .. .. .. 57 45 42 50&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Formats outside this list are rejected. If your source is TIFF, HEIC, SVG, or another format, convert to one of the supported formats before passing to the API.&lt;/p&gt;
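&lt;p&gt;To check a file in your own code before it reaches the API, the signature table can be applied directly. The following is a minimal sketch, not the SDK&amp;rsquo;s implementation: &lt;code&gt;HasBytesAt&lt;/code&gt; and &lt;code&gt;DetectImageFormat&lt;/code&gt; are hypothetical helpers, and the format is returned as a plain string rather than an SDK type.&lt;/p&gt;

```csharp
using System;

// Returns true when data contains the given signature starting at offset.
static bool HasBytesAt(byte[] data, int offset, params byte[] signature)
{
    if (offset + signature.Length > data.Length) return false;
    for (int i = 0; i != signature.Length; i++)
    {
        if (data[offset + i] != signature[i]) return false;
    }
    return true;
}

// Hypothetical helper: maps leading bytes to a format name using the table above.
static string DetectImageFormat(byte[] data)
{
    if (HasBytesAt(data, 0, 0xFF, 0xD8)) return "JPEG";
    if (HasBytesAt(data, 0, 0x89, 0x50, 0x4E, 0x47)) return "PNG";
    if (HasBytesAt(data, 0, 0x42, 0x4D)) return "BMP";
    if (HasBytesAt(data, 0, 0x47, 0x49, 0x46)) return "GIF";
    // WebP: "RIFF", four size bytes (ignored), then "WEBP" at offset 8.
    if (HasBytesAt(data, 0, 0x52, 0x49, 0x46, 0x46))
    {
        if (HasBytesAt(data, 8, 0x57, 0x45, 0x42, 0x50)) return "WebP";
    }
    return "Unknown";
}

byte[] pngHeader = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
byte[] tiffHeader = { 0x49, 0x49, 0x2A, 0x00 };
Console.WriteLine(DetectImageFormat(pngHeader));  // PNG
Console.WriteLine(DetectImageFormat(tiffHeader)); // Unknown
```

&lt;p&gt;A check like this only inspects the header; a file with a valid signature can still be truncated or corrupt.&lt;/p&gt;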
&lt;h2 id=&#34;50-mb-per-attachment-limit&#34;&gt;50 MB per-attachment limit&lt;/h2&gt;
&lt;p&gt;Each image must be 50 MB or smaller. Larger images throw:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;System.InvalidOperationException: Image size exceeds maximum allowed (50MB).
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In practice, downsize images to roughly the projector&amp;rsquo;s native resolution (typically 336 or 448 pixels on the short side) before attaching. Larger images are resized internally, which adds processing time.&lt;/p&gt;
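&lt;p&gt;The target dimensions for such a downscale are simple arithmetic. The sketch below computes them while preserving aspect ratio; &lt;code&gt;FitShortSide&lt;/code&gt; is a hypothetical helper, it assumes positive input dimensions, and the actual resize is left to whichever imaging library you already use.&lt;/p&gt;

```csharp
using System;

// Computes dimensions whose short side equals targetShortSide, preserving
// aspect ratio. Images already at or below the target are returned unchanged
// (no upscaling). Assumes width and height are positive.
static (int Width, int Height) FitShortSide(int width, int height, int targetShortSide)
{
    int shortSide = Math.Min(width, height);
    if (targetShortSide >= shortSide) return (width, height);

    double scale = (double)targetShortSide / shortSide;
    int newWidth = Math.Max(1, (int)Math.Round(width * scale));
    int newHeight = Math.Max(1, (int)Math.Round(height * scale));
    return (newWidth, newHeight);
}

// A 12-megapixel photo scaled so its short side is 448 pixels.
var (w, h) = FitShortSide(4032, 3024, 448);
Console.WriteLine($"{w}x{h}"); // 597x448
```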
&lt;h2 id=&#34;multiple-images-per-message&#34;&gt;Multiple images per message&lt;/h2&gt;
&lt;p&gt;Pass several byte arrays in one call:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;kt&#34;&gt;byte&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;img1&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;File&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ReadAllBytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;before.png&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;span class=&#34;kt&#34;&gt;byte&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;img2&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;File&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ReadAllBytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;after.png&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;reply&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SendMessageAsync&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
    &lt;span class=&#34;s&#34;&gt;&amp;#34;Compare these two screenshots. What changed?&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;media&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;img1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;img2&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The engine processes images in the order they appear in the array. The chat template places them at marker positions in the prompt.&lt;/p&gt;
&lt;p&gt;Not every vision model handles arbitrary numbers of images equally well — most are tuned for one or two. Refer to the model&amp;rsquo;s own documentation for recommended limits.&lt;/p&gt;
&lt;h2 id=&#34;mediaattachment-class&#34;&gt;&lt;code&gt;MediaAttachment&lt;/code&gt; class&lt;/h2&gt;
&lt;p&gt;When &lt;code&gt;SendMessageAsync&lt;/code&gt; receives &lt;code&gt;byte[]&lt;/code&gt;, it wraps each array in a &lt;code&gt;MediaAttachment&lt;/code&gt; internally. The class is also public if you need to handle attachments explicitly — for example, when adding them to &lt;code&gt;ChatMessage&lt;/code&gt; instances manually.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;namespace&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Models&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;MediaAttachment&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;byte&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Data&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MediaFormat&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Format&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MimeType&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Width&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Height&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Description&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;kt&#34;&gt;string?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FileName&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;set&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MediaAttachment&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MediaAttachment&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;byte&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MediaFormat&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;format&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;

    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;static&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MediaAttachment&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FromBytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
        &lt;span class=&#34;kt&#34;&gt;byte&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
        &lt;span class=&#34;kt&#34;&gt;string?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;fileName&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;null&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
        &lt;span class=&#34;kt&#34;&gt;string?&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;description&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;null&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;

    &lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;void&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Validate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;frombytes--auto-detecting-format&#34;&gt;&lt;code&gt;FromBytes&lt;/code&gt; — auto-detecting format&lt;/h3&gt;
&lt;p&gt;The idiomatic way to create an attachment is the static &lt;code&gt;FromBytes&lt;/code&gt; factory:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Models&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;attachment&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MediaAttachment&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;FromBytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;File&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ReadAllBytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;scan.png&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;fileName&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;scan.png&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;description&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;&amp;#34;Invoice scanned at 300 DPI&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The factory:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Reads the first 12 bytes.&lt;/li&gt;
&lt;li&gt;Matches against the magic-byte table above.&lt;/li&gt;
&lt;li&gt;Sets &lt;code&gt;Format&lt;/code&gt;, &lt;code&gt;MimeType&lt;/code&gt;, &lt;code&gt;FileName&lt;/code&gt;, and &lt;code&gt;Description&lt;/code&gt; accordingly.&lt;/li&gt;
&lt;li&gt;Returns the attachment.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If the magic bytes do not match, &lt;code&gt;Format&lt;/code&gt; is &lt;code&gt;MediaFormat.Unknown&lt;/code&gt; and &lt;code&gt;Validate&lt;/code&gt; throws when called.&lt;/p&gt;
&lt;h3 id=&#34;validate&#34;&gt;&lt;code&gt;Validate&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Enforces three checks:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;Failure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data is non-null and non-empty&lt;/td&gt;
&lt;td&gt;&lt;code&gt;InvalidOperationException: Image data cannot be null or empty.&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data size ≤ 50 MB&lt;/td&gt;
&lt;td&gt;&lt;code&gt;InvalidOperationException: Image size exceeds maximum allowed (50MB).&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Format != Unknown&lt;/td&gt;
&lt;td&gt;&lt;code&gt;InvalidOperationException: Unknown or unsupported image format.&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The SDK calls &lt;code&gt;Validate&lt;/code&gt; automatically when you send a message. You can call it earlier to fail fast in application code.&lt;/p&gt;
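&lt;p&gt;If you want to reject bad uploads before constructing any SDK objects, the three checks can be mirrored in a few lines. This is a sketch under stated assumptions rather than the SDK&amp;rsquo;s code: &lt;code&gt;CheckImage&lt;/code&gt; is a hypothetical helper, the &lt;code&gt;formatRecognized&lt;/code&gt; flag stands in for your own signature check, and 50 MB is interpreted here as 50 &amp;times; 1024 &amp;times; 1024 bytes, which the documentation does not confirm.&lt;/p&gt;

```csharp
#nullable enable
using System;

// Hypothetical pre-flight check. Returns the matching error text from the
// table above, or null when the payload passes all three checks.
static string? CheckImage(byte[]? data, bool formatRecognized)
{
    // Assumption: "50 MB" means 50 MiB; the SDK's exact threshold may differ.
    const long maxBytes = 50L * 1024 * 1024;

    if (data is null || data.Length == 0) return "Image data cannot be null or empty.";
    if (data.Length > maxBytes) return "Image size exceeds maximum allowed (50MB).";
    if (!formatRecognized) return "Unknown or unsupported image format.";
    return null;
}

byte[] tiny = { 0xFF, 0xD8, 0xFF };
Console.WriteLine(CheckImage(tiny, true) ?? "ok");        // ok
Console.WriteLine(CheckImage(new byte[0], true) ?? "ok"); // Image data cannot be null or empty.
```

&lt;p&gt;Returning a message instead of throwing lets a caller collect all failing files in a batch before reporting them.&lt;/p&gt;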
&lt;h3 id=&#34;mediaformat-enum&#34;&gt;&lt;code&gt;MediaFormat&lt;/code&gt; enum&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;enum&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MediaFormat&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;Unknown&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;JPEG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;PNG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;BMP&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;GIF&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;WebP&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;chatmessage-with-media&#34;&gt;&lt;code&gt;ChatMessage&lt;/code&gt; with media&lt;/h2&gt;
&lt;p&gt;For scenarios that construct the chat history manually (for example, seeding &lt;code&gt;ChatParameters.History&lt;/code&gt; with a multimodal turn), build &lt;code&gt;ChatMessage&lt;/code&gt; objects directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Aspose.LLM.Abstractions.Models&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;attachment&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MediaAttachment&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;FromBytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;File&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ReadAllBytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;label.png&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;));&lt;/span&gt;
&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;msg&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CreateUserMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Read this label.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;attachment&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;

&lt;span class=&#34;c1&#34;&gt;// Attach to preset history:
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;History&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;List&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ChatMessage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;msg&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;ChatMessage.HasMedia&lt;/code&gt; and &lt;code&gt;ChatMessage.TotalMediaSize&lt;/code&gt; are convenience properties for inspection.&lt;/p&gt;
&lt;h2 id=&#34;common-errors&#34;&gt;Common errors&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Unknown or unsupported image format&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;File is TIFF, HEIC, SVG, or corrupt.&lt;/td&gt;
&lt;td&gt;Convert to JPEG/PNG/BMP/GIF/WebP.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Image size exceeds maximum allowed (50MB)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Very high-resolution image.&lt;/td&gt;
&lt;td&gt;Downscale or recompress before attaching.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Garbled reply, no image detail&lt;/td&gt;
&lt;td&gt;Chat template mismatch or wrong preset.&lt;/td&gt;
&lt;td&gt;Verify you are using a vision preset; see &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/chat-templates/&#34;&gt;Chat templates&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reply ignores the image&lt;/td&gt;
&lt;td&gt;Prompt is too generic.&lt;/td&gt;
&lt;td&gt;Reference &amp;ldquo;the image&amp;rdquo;, &amp;ldquo;this diagram&amp;rdquo;, etc. explicitly.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/vision-presets/&#34;&gt;Vision presets&lt;/a&gt; — pick the right preset for your images.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/chat-templates/&#34;&gt;Chat templates&lt;/a&gt; — how the engine inserts image markers.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/debugging-vision/&#34;&gt;Debugging vision&lt;/a&gt; — diagnose misalignments and garbled output.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Net: Chat templates</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/multimodal/chat-templates/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/multimodal/chat-templates/</guid>
      <description>
        
        
        &lt;p&gt;Every vision model family has its own prompt format — a specific way to mark where images are inserted and how text turns are wrapped. Aspose.LLM for .NET ships eight templates and selects the right one automatically from the model&amp;rsquo;s GGUF metadata at load time.&lt;/p&gt;
&lt;p&gt;You do &lt;strong&gt;not&lt;/strong&gt; configure templates yourself in the current release. The selection is automatic and internal. This page exists so you can identify which templates are supported and recognize the model families behind them.&lt;/p&gt;
&lt;h2 id=&#34;supported-templates&#34;&gt;Supported templates&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Template&lt;/th&gt;
&lt;th&gt;Model family&lt;/th&gt;
&lt;th&gt;Typical models&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLaVA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLaVA-style vision fine-tunes&lt;/td&gt;
&lt;td&gt;LLaVA-1.5, LLaVA-1.6, various community LLaVAs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen2VL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qwen 2 / 2.5 VL&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Qwen2-VL-*&lt;/code&gt;, &lt;code&gt;Qwen2.5-VL-*&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen3VL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qwen 3 VL&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Qwen3-VL-*&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pixtral&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mistral Pixtral&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Pixtral-12B&lt;/code&gt; and derivatives&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;InternVL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;InternVL series&lt;/td&gt;
&lt;td&gt;&lt;code&gt;InternVL2-*&lt;/code&gt;, &lt;code&gt;InternVL3-*&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemma3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemma 3 Vision&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemma-3-*-vision&lt;/code&gt; variants&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Llama4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Llama 4 Vision&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Llama-4-*-Vision&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MiniCPMV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MiniCPM-V&lt;/td&gt;
&lt;td&gt;&lt;code&gt;MiniCPM-V-*&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;how-selection-works&#34;&gt;How selection works&lt;/h2&gt;
&lt;p&gt;When the engine loads a vision model, it:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Reads the model&amp;rsquo;s metadata (architecture name, base model name, and vision-specific keys) from the GGUF file.&lt;/li&gt;
&lt;li&gt;Matches those values against the template dispatch table in &lt;code&gt;Aspose.LLM.Interop.Multimodal.VisualModelChatTemplates&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Picks the template whose marker tokens and turn format match the detected model family.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If no template matches, the engine falls back to the default text template. Vision turns then emit a generic marker that the model may not recognize, so output quality degrades.&lt;/p&gt;
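&lt;p&gt;The steps above amount to a lookup with a fallback. The sketch below illustrates that shape only and is not the SDK&amp;rsquo;s dispatch table: &lt;code&gt;TemplateSketch&lt;/code&gt;, the metadata substrings, and the first-match rule are all assumptions made for illustration.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;using System;

// Illustration only: not the SDK&amp;#39;s internal logic.
static class TemplateSketch
{
    // Hypothetical hints; the real metadata keys and values are not documented on this page.
    static readonly (string Hint, string Template)[] Table =
    {
        (&amp;#34;qwen3&amp;#34;, &amp;#34;Qwen3VL&amp;#34;),
        (&amp;#34;qwen2&amp;#34;, &amp;#34;Qwen2VL&amp;#34;),
        (&amp;#34;gemma3&amp;#34;, &amp;#34;Gemma3&amp;#34;),
        (&amp;#34;pixtral&amp;#34;, &amp;#34;Pixtral&amp;#34;),
    };

    // Returns the first template whose hint occurs in the architecture name,
    // or &amp;#34;fallback&amp;#34; when nothing matches.
    public static string Select(string architecture)
    {
        foreach (var (hint, template) in Table)
            if (architecture.Contains(hint, StringComparison.OrdinalIgnoreCase))
                return template;
        return &amp;#34;fallback&amp;#34;;
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this sketch, &lt;code&gt;TemplateSketch.Select(&amp;#34;qwen3vl&amp;#34;)&lt;/code&gt; returns &lt;code&gt;Qwen3VL&lt;/code&gt;, and an unrecognized name returns &lt;code&gt;fallback&lt;/code&gt;, which corresponds to the degraded path described above.&lt;/p&gt;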
&lt;h2 id=&#34;built-in-preset--template-mapping&#34;&gt;Built-in preset → template mapping&lt;/h2&gt;
&lt;p&gt;Each built-in vision preset is already tested against its matching template:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Preset&lt;/th&gt;
&lt;th&gt;Template auto-selected&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Qwen25VL3BPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Qwen2VL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Qwen3VL2BPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Qwen3VL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Gemma3VisionPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemma3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ministral3VisionPreset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pixtral (Mistral vision models share the Pixtral template)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you extend one of these presets or use a custom GGUF from the same family, template detection still works.&lt;/p&gt;
&lt;h2 id=&#34;custom-vision-models&#34;&gt;Custom vision models&lt;/h2&gt;
&lt;p&gt;If you build a custom preset pointing at a non-Aspose GGUF from one of the eight supported families, template auto-selection should work out of the box — the engine relies on metadata keys that most upstream GGUF conversions preserve.&lt;/p&gt;
&lt;p&gt;If selection fails (the response is garbled or includes literal marker tokens such as &lt;code&gt;&amp;lt;image&amp;gt;&lt;/code&gt;), the model&amp;rsquo;s metadata is most likely missing the expected keys. Options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pick a different GGUF export of the same model with richer metadata.&lt;/li&gt;
&lt;li&gt;File a support request via the &lt;a href=&#34;https://forum.aspose.com/&#34;&gt;Aspose Support Forum&lt;/a&gt; with the model&amp;rsquo;s Hugging Face URL so the team can add detection for that specific export.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The current SDK version does not expose a public override for forcing a specific template. A future release may add that surface.&lt;/p&gt;
&lt;h2 id=&#34;checking-which-template-was-picked&#34;&gt;Checking which template was picked&lt;/h2&gt;
&lt;p&gt;Enable debug logging and look for the &lt;code&gt;[MM]&lt;/code&gt; tags:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EnableDebugLogging&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Lines like &lt;code&gt;[MM] selected template: Qwen3VL&lt;/code&gt; appear shortly after model load. See &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/debugging-vision/&#34;&gt;Debugging vision&lt;/a&gt; for the full log taxonomy.&lt;/p&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/vision-presets/&#34;&gt;Vision presets&lt;/a&gt; — built-in presets with their matching templates.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/debugging-vision/&#34;&gt;Debugging vision&lt;/a&gt; — inspect the selected template in logs.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/attaching-images/&#34;&gt;Attaching images&lt;/a&gt; — the sending side.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Net: Debugging vision</title>
      <link>https://docs.aspose.com/llm/net/developer-reference/multimodal/debugging-vision/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://docs.aspose.com/llm/net/developer-reference/multimodal/debugging-vision/</guid>
      <description>
        
        
        &lt;p&gt;Vision flows involve several extra stages compared to plain text: image preprocessing, projector evaluation, marker tokenization, KV alignment, and generation. When something goes wrong, the failure is rarely a clean exception — instead you see garbled, repetitive, or off-topic output. This page shows how to diagnose those failures.&lt;/p&gt;
&lt;h2 id=&#34;turn-on-debug-logging&#34;&gt;Turn on debug logging&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;Microsoft.Extensions.Logging&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;loggerFactory&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LoggerFactory&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;builder&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;builder&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;AddConsole&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;().&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SetMinimumLevel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LogLevel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Debug&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;));&lt;/span&gt;
&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;logger&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;loggerFactory&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CreateLogger&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Program&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;&amp;gt;();&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;new&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Qwen3VL2BPreset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EngineParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;EnableDebugLogging&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MtmdContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Verbosity&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// mtmd debug level
&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MtmdContextParameters&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;PrintTimings&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;true&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;var&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;api&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AsposeLLMApi&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Create&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;preset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;logger&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With these settings, the native layers emit tagged lines during every chat operation.&lt;/p&gt;
&lt;h2 id=&#34;log-tag-taxonomy&#34;&gt;Log tag taxonomy&lt;/h2&gt;
&lt;p&gt;Tags appear at the start of each log line. Grep by tag to isolate one concern at a time.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tag&lt;/th&gt;
&lt;th&gt;Origin&lt;/th&gt;
&lt;th&gt;What it shows&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[MM]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multimodal layer&lt;/td&gt;
&lt;td&gt;Projector load, template selection, chunk counts, alignment.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[CTX]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Context manager&lt;/td&gt;
&lt;td&gt;Context state, batch dispatch, KV reservations.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[KV]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;KV cache&lt;/td&gt;
&lt;td&gt;Evictions, defragmentation, cleanup strategy application.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DBG tok:&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokenizer&lt;/td&gt;
&lt;td&gt;Raw tokens emitted for each prompt and generation step.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DBG mtmd-tokenize&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mtmd_tokenize&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-chunk tokenization details when images are inserted.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
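&lt;p&gt;When grep is not at hand, the same split can be done in a few lines of C#. This is a sketch under two assumptions: the tags are the five listed in the table, and each tag sits at the start of the line after optional whitespace. &lt;code&gt;LogTags&lt;/code&gt; is a hypothetical helper, not an SDK type.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not an SDK type.
static class LogTags
{
    static readonly string[] Tags = { &amp;#34;[MM]&amp;#34;, &amp;#34;[CTX]&amp;#34;, &amp;#34;[KV]&amp;#34;, &amp;#34;DBG tok:&amp;#34;, &amp;#34;DBG mtmd-tokenize&amp;#34; };

    // Groups log lines by the tag they start with; untagged lines are dropped.
    public static Dictionary&amp;lt;string, List&amp;lt;string&amp;gt;&amp;gt; Group(IEnumerable&amp;lt;string&amp;gt; lines)
    {
        var groups = Tags.ToDictionary(t =&amp;gt; t, _ =&amp;gt; new List&amp;lt;string&amp;gt;());
        foreach (var line in lines)
        {
            var trimmed = line.TrimStart();
            var tag = Tags.FirstOrDefault(t =&amp;gt; trimmed.StartsWith(t, StringComparison.Ordinal));
            if (tag != null) groups[tag].Add(trimmed);
        }
        return groups;
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With &lt;code&gt;System.IO&lt;/code&gt; imported, &lt;code&gt;LogTags.Group(File.ReadLines(&amp;#34;raw-run.log&amp;#34;))[&amp;#34;[MM]&amp;#34;]&lt;/code&gt; yields only the multimodal-layer lines.&lt;/p&gt;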
&lt;p&gt;Typical line shapes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[MM] selected template: Qwen3VL
[MM] projector loaded: mmproj-Qwen3VL-2B-Instruct-Q8_0.gguf (524 MB)
[MM] tokenize: text chunks = 3, image chunks = 1
[CTX] batch dispatched: tokens=1536, seq_id=0
[KV] cleanup: strategy=RemoveOldestMessages, freed=4096 tokens
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;the-parse_mm_logszsh-helper&#34;&gt;The &lt;code&gt;parse_mm_logs.zsh&lt;/code&gt; helper&lt;/h2&gt;
&lt;p&gt;The Aspose.LLM SDK repository includes a helper script, &lt;code&gt;parse_mm_logs.zsh&lt;/code&gt;, that splits a raw log file into sections grouped by concern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pairing&lt;/strong&gt; — base model + projector match.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Projector load&lt;/strong&gt; — file path and size.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Template choice&lt;/strong&gt; — which of the eight templates was selected.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Marker tokenization&lt;/strong&gt; — how the image marker tokens were emitted in the prompt.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chunks&lt;/strong&gt; — text/image chunk counts and their positions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Alignment&lt;/strong&gt; — whether the image chunks line up with the prompt markers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Eval&lt;/strong&gt; — projector evaluation timings.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context adoption&lt;/strong&gt; — which KV positions hold image embeddings.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conditioning&lt;/strong&gt; — tokens produced in the first decode step.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;KV state&lt;/strong&gt; — reservation and eviction.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Token stream&lt;/strong&gt; — every generated token.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Final answer&lt;/strong&gt; — the trimmed response text.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Usage:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;./parse_mm_logs.zsh &amp;lt; raw-run.log &amp;gt; sectioned-run.txt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The sectioned output makes misalignments obvious — the &amp;ldquo;Alignment&amp;rdquo; section is usually where the problem sits when output is garbled.&lt;/p&gt;
&lt;h2 id=&#34;common-vision-failures&#34;&gt;Common vision failures&lt;/h2&gt;
&lt;h3 id=&#34;garbled-output-or-literal-marker-tokens-in-reply&#34;&gt;Garbled output or literal marker tokens in reply&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: the reply contains &lt;code&gt;&amp;lt;image&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt;, or other bracketed tokens verbatim. Or the model describes something unrelated to the image.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Root cause&lt;/strong&gt;: chat template mismatch. The engine picked the wrong template (or the fallback) because the GGUF&amp;rsquo;s metadata does not expose the model family expected by the dispatch table.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Verify that the model belongs to a supported template family (LLaVA, Qwen2VL, Qwen3VL, Pixtral, InternVL, Gemma3, Llama4, MiniCPMV). Ministral vision models use the Pixtral template. See &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/chat-templates/&#34;&gt;Chat templates&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If using a custom GGUF, try a different export from Hugging Face with richer metadata.&lt;/li&gt;
&lt;li&gt;Inspect &lt;code&gt;[MM] selected template: ...&lt;/code&gt; in logs — if it says &amp;ldquo;fallback&amp;rdquo;, detection failed.&lt;/li&gt;
&lt;/ul&gt;
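&lt;p&gt;The log check can be scripted. The sketch below assumes the line shape &lt;code&gt;[MM] selected template: ...&lt;/code&gt; and that a failed detection contains the word &amp;ldquo;fallback&amp;rdquo;; the exact wording of that line is not confirmed here. &lt;code&gt;TemplateCheck&lt;/code&gt; is a hypothetical helper, not an SDK type.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;using System;
using System.Collections.Generic;

// Hypothetical helper, not an SDK type.
static class TemplateCheck
{
    const string Marker = &amp;#34;[MM] selected template:&amp;#34;;

    // Returns the template name from the first matching line, or null if none is found.
    public static string? FindSelectedTemplate(IEnumerable&amp;lt;string&amp;gt; lines)
    {
        foreach (var line in lines)
        {
            int at = line.IndexOf(Marker, StringComparison.Ordinal);
            if (at &amp;gt;= 0) return line.Substring(at + Marker.Length).Trim();
        }
        return null;
    }

    // Assumption: a failed detection is reported with the word &amp;#34;fallback&amp;#34;.
    public static bool IsFallback(string? template) =&amp;gt;
        template != null &amp;amp;&amp;amp; template.Contains(&amp;#34;fallback&amp;#34;, StringComparison.OrdinalIgnoreCase);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A &lt;code&gt;null&lt;/code&gt; result means no selection line was logged at all, which usually points to debug logging being off rather than to a template problem.&lt;/p&gt;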
&lt;h3 id=&#34;repeated-first-token-or-truncated-reply&#34;&gt;Repeated first token or truncated reply&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: the response loops on the first few tokens or cuts off mid-sentence after one word.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Root cause&lt;/strong&gt;: &lt;code&gt;MaxTokens&lt;/code&gt; too low for a reasoning model, or KV cache cleanup too aggressive.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Raise &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/chat/&#34;&gt;&lt;code&gt;ChatParameters.MaxTokens&lt;/code&gt;&lt;/a&gt; to 1024 or 2048.&lt;/li&gt;
&lt;li&gt;Change &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/cache-management/&#34;&gt;&lt;code&gt;CacheCleanupStrategy&lt;/code&gt;&lt;/a&gt; to &lt;code&gt;KeepSystemPromptAndFirstUserMessage&lt;/code&gt; if the model loses the image reference during generation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;slow-first-response&#34;&gt;Slow first response&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: 30+ seconds before the first token on a well-specced GPU.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Root cause&lt;/strong&gt;: projector loaded on CPU while base model is on GPU, or the image is very large.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/&#34;&gt;&lt;code&gt;MtmdContextParameters.UseGpu = true&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Downscale large images before attaching.&lt;/li&gt;
&lt;li&gt;Enable &lt;code&gt;MtmdContextParameters.PrintTimings&lt;/code&gt; to see per-stage time.&lt;/li&gt;
&lt;/ul&gt;
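&lt;p&gt;Applied to a preset, the first and third fixes look like the fragment below. The property names come from this page; the fragment assumes the SDK namespaces are already imported, since this page does not show them.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-csharp&#34; data-lang=&#34;csharp&#34;&gt;var preset = new Qwen3VL2BPreset();
preset.MtmdContextParameters.UseGpu = true;        // run the projector on the GPU
preset.MtmdContextParameters.PrintTimings = true;  // log per-stage timings
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Compare the timings before and after the change to confirm that the projector stage, and not image decoding, was the bottleneck.&lt;/p&gt;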
&lt;h3 id=&#34;image-format-error&#34;&gt;Image format error&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: &lt;code&gt;InvalidOperationException: Unknown or unsupported image format.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt;: convert to JPEG, PNG, BMP, GIF, or WebP. See &lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/attaching-images/#supported-formats&#34;&gt;Attaching images&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;image-too-large&#34;&gt;Image too large&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: &lt;code&gt;InvalidOperationException: Image size exceeds maximum allowed (50MB).&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt;: downscale or recompress. The projector processes images at 336-448 pixels regardless of source resolution, so an oversized source is wasted work.&lt;/p&gt;
&lt;h2 id=&#34;collecting-a-bug-report&#34;&gt;Collecting a bug report&lt;/h2&gt;
&lt;p&gt;When escalating to support:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Run with &lt;code&gt;EngineParameters.EnableDebugLogging = true&lt;/code&gt;, &lt;code&gt;MtmdContextParameters.Verbosity = 3&lt;/code&gt;, and &lt;code&gt;MtmdContextParameters.PrintTimings = true&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Capture the full log file.&lt;/li&gt;
&lt;li&gt;Pipe through &lt;code&gt;parse_mm_logs.zsh&lt;/code&gt; for a sectioned version.&lt;/li&gt;
&lt;li&gt;Include:
&lt;ul&gt;
&lt;li&gt;Preset class name.&lt;/li&gt;
&lt;li&gt;Model and &lt;code&gt;mmproj&lt;/code&gt; Hugging Face paths (or custom GGUF details).&lt;/li&gt;
&lt;li&gt;A sample image that reproduces the issue (if legally shareable).&lt;/li&gt;
&lt;li&gt;The exact prompt that triggered the failure.&lt;/li&gt;
&lt;li&gt;The sectioned log.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Submit via the &lt;a href=&#34;https://forum.aspose.com/&#34;&gt;Aspose Support Forum&lt;/a&gt; or a paid support ticket.&lt;/p&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/chat-templates/&#34;&gt;Chat templates&lt;/a&gt; — the eight supported template families.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/parameters/multimodal-context/&#34;&gt;Multimodal context parameters&lt;/a&gt; — GPU offload and verbosity for the projector.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aspose.com/llm/net/developer-reference/multimodal/attaching-images/&#34;&gt;Attaching images&lt;/a&gt; — correct image input.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
  </channel>
</rss>
