Multimodal context parameters

MultimodalContextParameters — exposed on the preset as MtmdContextParameters — configures the mtmd context used by vision presets to evaluate image tokens. The base text model is configured by ContextParameters; this bag covers only the multimodal layer.

Only vision presets use these settings. On text-only presets the bag is instantiated but has no effect.

Class reference

namespace Aspose.LLM.Abstractions.Parameters;

public class MultimodalContextParameters
{
    public bool? UseGpu { get; set; }
    public bool? PrintTimings { get; set; }
    public int? ThreadCount { get; set; }
    public int? Verbosity { get; set; }
    public string? MediaMarker { get; set; }
}

Every field is nullable. A null value means “use the native mtmd default” — override only when you have a specific reason.
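Because null means "use the native default", you can also restore a default at runtime by assigning null back to a field — a minimal sketch using the bag as documented above:

preset.MtmdContextParameters.ThreadCount = 4;    // explicit override
preset.MtmdContextParameters.ThreadCount = null; // back to the native mtmd heuristic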

Detailed field reference

Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions.

Fields

Field         Type     Default         Purpose
UseGpu        bool?    native default  Whether to offload the vision projector to GPU.
PrintTimings  bool?    native default  Emit per-step timing diagnostics from the mtmd layer.
ThreadCount   int?     native default  Threads used by mtmd processing.
Verbosity     int?     native default  Log level for the mtmd layer.
MediaMarker   string?  native default  Placeholder token text that marks image positions in the prompt.

UseGpu

Controls whether the vision projector (mmproj) runs on the GPU alongside the base model. The mmproj is typically small (200 MB–2 GB), so offloading it to the GPU is affordable even on modest hardware.

  • null — delegate to mtmd’s auto-detection (currently: GPU if available).
  • true — force GPU.
  • false — force CPU. Use when you have limited GPU memory and want to spend it entirely on the base model.
preset.MtmdContextParameters.UseGpu = false; // keep GPU memory for the base model

PrintTimings

Enables mtmd’s built-in per-step timing logs — the time spent tokenizing images, running the projector, and evaluating chunks. Useful for diagnosing slow first-response latency on vision queries.

preset.MtmdContextParameters.PrintTimings = true;

Leave this null (off) in production. Timing logs add overhead and flood the output.
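One way to keep timing logs out of production builds is to gate the override on build configuration — a sketch, assuming the standard C# DEBUG compilation symbol:

#if DEBUG
preset.MtmdContextParameters.PrintTimings = true; // timing diagnostics only in debug builds
#endif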

ThreadCount

Threads used for CPU-side mtmd work (image preprocessing, CPU portions of the projector). When null, mtmd follows its own heuristic — usually half the logical cores.

Override when:

  • The rest of your application needs more cores and mtmd is single-shot work.
  • You run multiple vision requests concurrently and want to cap each one’s CPU footprint.
preset.MtmdContextParameters.ThreadCount = 2;
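If you want the cap to scale with the machine rather than hard-coding it, you can derive it from the logical core count — a sketch (the divisor of 4 is an illustrative choice, not a recommendation from the mtmd layer):

// Leave most cores to the rest of the application; never go below 1.
preset.MtmdContextParameters.ThreadCount = Math.Max(1, Environment.ProcessorCount / 4);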

Verbosity

Log verbosity for the mtmd layer. The native layer accepts an integer; the typical mapping is:

Value  Level
0      Error
1      Warn
2      Info
3      Debug
preset.MtmdContextParameters.Verbosity = 3; // debug — useful when images are tokenized unexpectedly

Keep verbosity low in production (0 or 1). Higher levels emit tagged lines that need post-processing to be useful — see the parse_mm_logs.zsh helper script in the Aspose.LLM SDK repository.
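To keep production quiet while still allowing ad-hoc debugging, you can drive the level from the environment — a sketch; the MTMD_VERBOSITY variable name is an example for this snippet, not something the SDK reads itself:

// Default to Warn (1); raise via an environment variable when debugging.
var raw = Environment.GetEnvironmentVariable("MTMD_VERBOSITY");
preset.MtmdContextParameters.Verbosity = int.TryParse(raw, out var level) ? level : 1;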

MediaMarker

Placeholder text used in the chat template to mark where images are inserted. The default is the chat-template-specific marker (different per model family — LLaVA, Qwen-VL, Gemma-Vision, and others have different tokens). Override only if you understand the model’s prompt format and need a non-standard marker.

preset.MtmdContextParameters.MediaMarker = "<|image|>";

In nearly all cases, leave this null. The correct marker is selected automatically from the model’s metadata.

Typical recipes

Default vision configuration

var preset = new Qwen3VL2BPreset();
// MtmdContextParameters stays at defaults — all fields null.

using var api = AsposeLLMApi.Create(preset);

Debug slow image processing

var preset = new Qwen3VL2BPreset();
preset.MtmdContextParameters.PrintTimings = true;
preset.MtmdContextParameters.Verbosity = 3;

using var api = AsposeLLMApi.Create(preset, logger);
// Inspect logs for per-stage mtmd timings.

CPU-only projector to save GPU memory

var preset = new Qwen3VL2BPreset();
preset.MtmdContextParameters.UseGpu = false;                  // projector on CPU
preset.BaseModelInferenceParameters.GpuLayers = 999;          // base model fully on GPU

On a GPU that is tight on memory, keeping the projector on CPU trades some first-token latency for extra headroom for the base model and KV cache.

What’s next