UseMemoryLocking

UseMemoryLocking asks the OS to lock the model's memory pages in RAM so they cannot be swapped out. It requires elevated privileges or a raised locked-memory limit.

Quick reference

Type      bool?
Default   null (native default, usually false)
Category  Model loading
Field on  ModelInferenceParameters.UseMemoryLocking

What it does

  • true — the engine calls mlock (Linux/macOS) or VirtualLock (Windows) on the model memory. The OS will not page it out.
  • false or null — no locking. OS may page model memory under pressure.

Paging out model memory is catastrophic for inference performance: generation can stall for seconds while the kernel reads the weights back from disk. Setting UseMemoryLocking = true prevents that.

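Cost: locking requires appropriate privileges, and locked pages stay unavailable to every other process on the host. On Linux, the process's RLIMIT_MEMLOCK must be large enough to cover the model (raise it via ulimit -l or /etc/security/limits.conf). On Windows, the process needs the “Lock Pages in Memory” permission, and VirtualLock is additionally capped by the process's minimum working-set size.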
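
Before enabling the setting on Linux, it can help to check whether the current limit is large enough for the model. The sketch below is one way to do that in bash. The function name memlock_fits and the model.gguf path are hypothetical, and the file size is only a lower bound on what the engine may try to lock.

```shell
# memlock_fits LIMIT_KIB MODEL_BYTES
#   LIMIT_KIB   - output of `ulimit -l` (KiB in bash, or the word "unlimited")
#   MODEL_BYTES - size of the model file in bytes
# Returns 0 if the model fits under the limit, 1 otherwise.
memlock_fits() {
  local limit_kib model_bytes model_kib
  limit_kib="$1"
  model_bytes="$2"
  if [ "$limit_kib" = "unlimited" ]; then
    return 0
  fi
  # Round the model size up to whole KiB before comparing.
  model_kib=$(( (model_bytes + 1023) / 1024 ))
  [ "$model_kib" -le "$limit_kib" ]
}

# Usage on GNU/Linux (model.gguf is a hypothetical path):
#   memlock_fits "$(ulimit -l)" "$(stat -c %s model.gguf)" \
#     || echo "RLIMIT_MEMLOCK is too low for this model"
```

If the check fails, either raise the limit or leave UseMemoryLocking at null.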

When to change it

Scenario                                     Value
Default                                      null (disabled)
Shared host under memory pressure            true (requires privilege)
Container without memlock capability         null (do not attempt)
Dedicated inference machine with ample RAM   null (unnecessary)

Example

var preset = new Qwen25Preset();
preset.BaseModelInferenceParameters.UseMemoryLocking = true;
// Takes effect only if the process holds the OS-level privilege or limit described above.

Raising the limit on Linux (in the same shell, before launching the application):

ulimit -l unlimited
dotnet run
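
ulimit -l only affects the current shell and the processes started from it, and an unprivileged user cannot raise the soft limit above the hard limit. For a persistent setting, an entry in /etc/security/limits.conf is one option. The fragment below is a sketch that assumes the application runs under a hypothetical account named inference. The entry is applied by pam_limits at the next login session, and services started by systemd generally take their limit from the unit's LimitMEMLOCK= setting instead.

```text
# /etc/security/limits.conf
# <domain>   <type>  <item>    <value>
inference    soft    memlock   unlimited
inference    hard    memlock   unlimited
```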

Interactions

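  • UseMemoryMapping: with mmap on, mlock pins the mapped pages in RAM, so weights that would otherwise be loaded lazily and remain evictable stay resident.
  • System-level configuration: whether locking succeeds depends on OS limits (RLIMIT_MEMLOCK on Linux and macOS, working-set size on Windows).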
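
On Linux, one way to confirm that locking actually took effect is to read the VmLck field that the kernel reports in /proc/<pid>/status. The helper below is a small sketch. The function name locked_kib is hypothetical, and <pid> stands for the ID of the running inference process.

```shell
# locked_kib STATUS_FILE
# Prints the VmLck value (locked memory, in KiB) from a /proc/<pid>/status file.
locked_kib() {
  awk '/^VmLck:/ { print $2 }' "$1"
}

# Usage: locked_kib /proc/<pid>/status
```

A value close to the model size suggests the lock succeeded. A value of 0 suggests the request was refused or the setting was left disabled.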

What’s next