Binary manager parameters

BinaryManagerParameters controls how the SDK obtains the native llama.cpp binaries (libllama, libmtmd, libggml-*). On first use, the engine downloads the matching build from GitHub for your platform and acceleration backend, caches it, and reuses the cache on subsequent runs.

Class reference

namespace Aspose.LLM.Abstractions.Parameters;

public class BinaryManagerParameters
{
    public string Owner { get; set; } = "ggml-org";
    public string Repo { get; set; } = "llama.cpp";
    public string ReleaseTag { get; set; } = "b8816";
    public string BinaryPath { get; set; }              // <LocalAppData>/Aspose.LLM/runtimes
    public SystemSpec? SystemSpecification { get; set; }
    public AccelerationType? PreferredAcceleration { get; set; }
}

Detailed field reference

Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions.

Fields

Field Type Default Purpose
Owner string "ggml-org" GitHub repository owner for llama.cpp releases.
Repo string "llama.cpp" GitHub repository name.
ReleaseTag string "b8816" (as of SDK v26.5.0) Specific llama.cpp release to pin.
BinaryPath string <LocalAppData>/Aspose.LLM/runtimes Local cache for downloaded native binaries.
SystemSpecification SystemSpec? null (auto-detect) Override the detected OS / architecture / acceleration capabilities.
PreferredAcceleration AccelerationType? null (auto-select) Force a specific acceleration backend (CUDA, HIP, Metal, Vulkan, CPU variants).

Owner and Repo

Together they form github.com/<Owner>/<Repo>/releases/.... The defaults target the upstream llama.cpp repository. Change them only if you mirror releases to a fork that stays byte-compatible with upstream — for example, in an air-gapped enterprise setup that syncs selected releases into a private GitHub Enterprise instance.

ReleaseTag

Pins a specific upstream release. As of SDK v26.5.0, the default is b8816. Each release tag corresponds to native binaries with matching P/Invoke signatures; changing the tag without a matching SDK version is unsupported.

Override only when:

  • You are testing a newer upstream release against the current SDK (development only).
  • You are locked to an older release because of a validated deployment.
preset.BinaryManagerParameters.ReleaseTag = "b8816";

BinaryPath

Folder where downloaded binaries live. The default is <LocalAppData>/Aspose.LLM/runtimes%LOCALAPPDATA%\Aspose.LLM\runtimes on Windows and the equivalent LocalApplicationData folder elsewhere.

Override when:

  • Shared cache across multiple applications or services on the same host.
  • Read-only root filesystem — point the cache at a writable volume.
  • Pre-populated deployment — bundle the binaries with your application and point BinaryPath at them to skip the download on first run.
preset.BinaryManagerParameters.BinaryPath = @"/var/lib/aspose-llm/runtimes";

SystemSpecification

When null (the default), the SDK detects the host’s OS, architecture, and available accelerations at engine construction. Override with an explicit SystemSpec only for diagnostics or cross-platform binary preparation — leaving this null is correct for normal deployments.

PreferredAcceleration

When null, the SDK picks the best available acceleration for the host in this order: CUDA → HIP → Metal → Vulkan → CPU (AVX level best to worst). Set an explicit value to override the selection.

Supported values (see Aspose.LLM.Abstractions.Acceleration.AccelerationType):

Value Platform Notes
CUDA Windows, Linux NVIDIA GPUs.
HIP Linux AMD GPUs via ROCm.
Metal macOS (Apple Silicon) M-series chips.
Vulkan Windows, Linux Cross-platform GPU.
AVX512 Any x64 with AVX-512 Fastest CPU path.
AVX2 Any x64 with AVX2 Default CPU fallback.
AVX Older x64 Slower.
NoAVX Very old CPUs Last-resort compatibility.
Kompute, OpenCL, SYCL, OpenBLAS Platform-dependent Less common; verify availability for your target.

The enum has additional values (None) used internally — avoid setting them explicitly.

Typical recipes

Force CPU-only execution

using Aspose.LLM.Abstractions.Acceleration;

var preset = new Qwen25Preset();
preset.BinaryManagerParameters.PreferredAcceleration = AccelerationType.AVX2;
preset.BaseModelInferenceParameters.GpuLayers = 0; // complement on the inference side

Force CUDA on a multi-GPU box

var preset = new Qwen25Preset();
preset.BinaryManagerParameters.PreferredAcceleration = AccelerationType.CUDA;
preset.BaseModelInferenceParameters.GpuLayers = 999;
// Pick a specific GPU via BaseModelInferenceParameters.MainGpu = N;

Pre-populated offline deployment

Download the binaries on a connected machine, copy the cache into your deployment, and point BinaryPath at it on the offline host:

var preset = new Qwen25Preset();
preset.BinaryManagerParameters.BinaryPath = @"/opt/aspose-llm/runtimes";
// Also point EngineParameters.ModelCachePath at a pre-populated model cache.

Shared cache across services

preset.BinaryManagerParameters.BinaryPath = @"/srv/shared/aspose-llm/runtimes";

Make sure every service using this cache runs the same SDK version — version mismatches produce binary incompatibilities.

What’s next