Multimodal context parameters

MultimodalContextParameters — exposed on the preset as MtmdContextParameters — configures the mtmd context used by vision presets to evaluate image tokens. The base text model is configured by ContextParameters; this bag covers only the multimodal layer.

Only vision presets use these settings. On text-only presets the bag is instantiated but has no effect.

Class reference

namespace Aspose.LLM.Abstractions.Parameters;

public class MultimodalContextParameters
{
    public bool? UseGpu { get; set; }
    public bool? PrintTimings { get; set; }
    public int? ThreadCount { get; set; }
    public int? Verbosity { get; set; }
    public string? MediaMarker { get; set; }
}

Every field is nullable. A null value means “use the native mtmd default” — override only when you have a specific reason.
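Because null means "use the native default", you can also restore a default at runtime by assigning null back to a field — a minimal sketch using the bag as documented above:

preset.MtmdContextParameters.ThreadCount = 4;    // explicit override
preset.MtmdContextParameters.ThreadCount = null; // back to the native mtmd heuristic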

Detailed field reference

Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions.

Fields

Field         Type     Default         Purpose
UseGpu        bool?    native default  Whether to offload the vision projector to GPU.
PrintTimings  bool?    native default  Emit per-step timing diagnostics from the mtmd layer.
ThreadCount   int?     native default  Threads used by mtmd processing.
Verbosity     int?     native default  Log level for the mtmd layer.
MediaMarker   string?  native default  Placeholder token text that marks image positions in the prompt.

UseGpu

Controls whether the vision projector (mmproj) runs on the GPU alongside the base model. The mmproj is typically small (200 MB–2 GB), so offloading it to the GPU is affordable even on modest hardware.

  • null — delegate to mtmd’s auto-detection (currently: GPU if available).
  • true — force GPU.
  • false — force CPU. Use when you have limited GPU memory and want to spend it entirely on the base model.
preset.MtmdContextParameters.UseGpu = false; // keep GPU memory for the base model

PrintTimings

Enables mtmd’s built-in per-step timing logs — the time spent tokenizing images, running the projector, and evaluating chunks. Useful for diagnosing slow first-response latency on vision queries.

preset.MtmdContextParameters.PrintTimings = true;

Leave this null (off) in production. Timing logs add overhead and flood the output.
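One way to keep timing logs out of production builds is to gate the override on build configuration — a sketch, assuming the standard C# DEBUG compilation symbol:

#if DEBUG
preset.MtmdContextParameters.PrintTimings = true; // timing diagnostics only in debug builds
#endif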

ThreadCount

Threads used for CPU-side mtmd work (image preprocessing, CPU portions of the projector). When null, mtmd follows its own heuristic — usually half the logical cores.

Override when:

  • The rest of your application needs more cores and mtmd is single-shot work.
  • You run multiple vision requests concurrently and want to cap each one’s CPU footprint.
preset.MtmdContextParameters.ThreadCount = 2;
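If you want the cap to scale with the machine rather than hard-coding it, you can derive it from the logical core count — a sketch (the divisor of 4 is an illustrative choice, not a recommendation from the mtmd layer):

// Leave most cores to the rest of the application; never go below 1.
preset.MtmdContextParameters.ThreadCount = Math.Max(1, Environment.ProcessorCount / 4);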

Verbosity

Log verbosity for the mtmd layer. The native layer accepts an integer; the typical mapping is:

Value  Level
0      Error
1      Warn
2      Info
3      Debug
preset.MtmdContextParameters.Verbosity = 3; // debug — useful when images are tokenized unexpectedly

Keep verbosity low in production (0 or 1). Higher levels emit tagged lines that need post-processing to be useful — see the parse_mm_logs.zsh helper script in the Aspose.LLM SDK repository.
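To keep production quiet while still allowing ad-hoc debugging, you can drive the level from the environment — a sketch; the MTMD_VERBOSITY variable name is an example for this snippet, not something the SDK reads itself:

// Default to Warn (1); raise via an environment variable when debugging.
var raw = Environment.GetEnvironmentVariable("MTMD_VERBOSITY");
preset.MtmdContextParameters.Verbosity = int.TryParse(raw, out var level) ? level : 1;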

MediaMarker

Placeholder text used in the chat template to mark where images are inserted. The default is the chat-template-specific marker (different per model family — LLaVA, Qwen-VL, Gemma-Vision, and others have different tokens). Override only if you understand the model’s prompt format and need a non-standard marker.

preset.MtmdContextParameters.MediaMarker = "<|image|>";

In nearly all cases, leave this null. The correct marker is selected automatically from the model’s metadata.

Typical recipes

Default vision configuration

var preset = new Qwen3VL2BPreset();
// MtmdContextParameters stays at defaults — all fields null.

using var api = AsposeLLMApi.Create(preset);

Debug slow image processing

var preset = new Qwen3VL2BPreset();
preset.MtmdContextParameters.PrintTimings = true;
preset.MtmdContextParameters.Verbosity = 3;

using var api = AsposeLLMApi.Create(preset, logger);
// Inspect logs for per-stage mtmd timings.

CPU-only projector to save GPU memory

var preset = new Qwen3VL2BPreset();
preset.MtmdContextParameters.UseGpu = false;                  // projector on CPU
preset.BaseModelInferenceParameters.GpuLayers = 999;          // base model fully on GPU

On a GPU that is tight on memory, keeping the projector on CPU trades some first-token latency for extra headroom for the base model and KV cache.

What’s next