UseMemoryMapping
UseMemoryMapping controls whether the engine memory-maps the GGUF file instead of reading it into RAM. Memory mapping lets the OS stream the model on demand, which cuts startup time and peak memory.
Quick reference

| Property | Value |
|---|---|
| Type | `bool?` |
| Default | `null` (native default — usually `true`) |
| Category | Model loading |
| Field on | `ModelInferenceParameters.UseMemoryMapping` |
What it does
- `true` (the usual native default) — the OS maps the GGUF file into address space, and pages are brought into memory on first access. Startup is fast; peak memory is bounded by the working set.
- `false` — the engine reads the full file into RAM before model init. Startup is slower, and peak memory roughly doubles during load (read buffer + allocation).
- `null` — defer to the native default, which behaves as `true` on most platforms.
Memory mapping is preferred unless your filesystem does not support mmap (e.g., some network filesystems or container volume drivers).
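If you are unsure whether a given volume supports mmap, one option is to probe the model file with .NET's built-in `System.IO.MemoryMappedFiles` API before loading, and fall back to `UseMemoryMapping = false` when the probe fails. This is a sketch, not part of the library's API:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static bool CanMemoryMap(string path)
{
    try
    {
        // Try to map the file read-only and touch one byte; failure
        // suggests the filesystem (e.g., some NFS mounts or container
        // volume drivers) does not support mmap for this path.
        using var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, mapName: null, capacity: 0,
            MemoryMappedFileAccess.Read);
        using var view = mmf.CreateViewAccessor(
            0, 1, MemoryMappedFileAccess.Read);
        view.ReadByte(0);
        return true;
    }
    catch (IOException) { return false; }
    catch (UnauthorizedAccessException) { return false; }
}
```

A caller could then set `preset.BaseModelInferenceParameters.UseMemoryMapping = CanMemoryMap(modelPath);` before loading.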
When to change it
| Scenario | Value |
|---|---|
| Default (recommended) | null |
| Network filesystem without mmap support | false |
| Need full model loaded into RAM upfront | false |
| Debugging mmap-specific issues | false |
Example
```csharp
var preset = new Qwen25Preset();
preset.BaseModelInferenceParameters.UseMemoryMapping = true; // default, shown explicitly
```
For an NFS-mounted model:
```csharp
preset.BaseModelInferenceParameters.UseMemoryMapping = false;
// Loading takes longer, but works on filesystems where mmap fails.
```
Interactions
- `UseMemoryLocking` — lock the working set to prevent paging.
- `GpuLayers` — offloaded layers are copied from the mapped file to GPU memory.
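A common pairing is mapping plus locking, so mapped pages stay resident under memory pressure. A minimal sketch using the same preset API as the examples above (platform support for locking varies):

```csharp
var preset = new Qwen25Preset();
// Map the GGUF file on demand...
preset.BaseModelInferenceParameters.UseMemoryMapping = true;
// ...and lock resident pages so the OS does not page them back out.
preset.BaseModelInferenceParameters.UseMemoryLocking = true;
```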
What’s next
- UseMemoryLocking — prevent paging.
- GpuLayers — GPU offload.
- Model inference hub — all inference knobs.