Binary manager parameters
BinaryManagerParameters controls how the SDK obtains the native llama.cpp binaries (libllama, libmtmd, libggml-*). On first use, the engine downloads the matching build from GitHub for your platform and acceleration backend, caches it, and reuses the cache on subsequent runs.
Class reference
```csharp
namespace Aspose.LLM.Abstractions.Parameters;

public class BinaryManagerParameters
{
    public string Owner { get; set; } = "ggml-org";
    public string Repo { get; set; } = "llama.cpp";
    public string ReleaseTag { get; set; } = "b8816";
    public string BinaryPath { get; set; } // default: <LocalAppData>/Aspose.LLM/runtimes
    public SystemSpec? SystemSpecification { get; set; }
    public AccelerationType? PreferredAcceleration { get; set; }
}
```
Detailed field reference
Each field has a dedicated page with full defaults, scenario tables, code examples, and interactions.
Fields
| Field | Type | Default | Purpose |
|---|---|---|---|
| Owner | string | "ggml-org" | GitHub repository owner for llama.cpp releases. |
| Repo | string | "llama.cpp" | GitHub repository name. |
| ReleaseTag | string | "b8816" (as of SDK v26.5.0) | Specific llama.cpp release to pin. |
| BinaryPath | string | `<LocalAppData>/Aspose.LLM/runtimes` | Local cache for downloaded native binaries. |
| SystemSpecification | SystemSpec? | null (auto-detect) | Override the detected OS / architecture / acceleration capabilities. |
| PreferredAcceleration | AccelerationType? | null (auto-select) | Force a specific acceleration backend (CUDA, HIP, Metal, Vulkan, CPU variants). |
Owner and Repo
Together they form github.com/<Owner>/<Repo>/releases/.... The defaults target the upstream llama.cpp repository. Change them only if you mirror releases to a fork that stays byte-compatible with upstream — for example, in an air-gapped enterprise setup that syncs selected releases into a private GitHub Enterprise instance.
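The following C# sketch shows how Owner and Repo combine into a GitHub releases URL. ReleasesUrl is a hypothetical helper and is not part of Aspose.LLM.

It builds only the public releases page and the page for a single tag (GitHub serves those at `/releases/tag/<tag>`). The SDK's actual download asset paths are elided above and are not reproduced. A GitHub Enterprise mirror would also use its own host instead of github.com, which this sketch does not model.

```csharp
using System;

// Hypothetical helper (not part of Aspose.LLM): mirrors the documented
// github.com/<Owner>/<Repo>/releases/... shape. The SDK's exact asset URLs are not shown.
static string ReleasesUrl(string owner, string repo, string? tag = null)
{
    if (string.IsNullOrWhiteSpace(owner)) throw new ArgumentException("Owner is required.", nameof(owner));
    if (string.IsNullOrWhiteSpace(repo)) throw new ArgumentException("Repo is required.", nameof(repo));

    string releases = $"https://github.com/{Uri.EscapeDataString(owner)}/{Uri.EscapeDataString(repo)}/releases";
    // GitHub serves the page for one tagged release at /releases/tag/<tag>.
    return tag is null ? releases : $"{releases}/tag/{Uri.EscapeDataString(tag)}";
}

// The BinaryManagerParameters defaults:
Console.WriteLine(ReleasesUrl("ggml-org", "llama.cpp"));          // https://github.com/ggml-org/llama.cpp/releases
Console.WriteLine(ReleasesUrl("ggml-org", "llama.cpp", "b8816")); // https://github.com/ggml-org/llama.cpp/releases/tag/b8816
```

Opening the tagged page in a browser is a quick way to confirm that a fork or mirror actually publishes the tag you intend to pin.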
ReleaseTag
Pins a specific upstream release. As of SDK v26.5.0, the default is b8816. Each SDK version's P/Invoke signatures match the native binaries of one specific release tag, so using a tag other than the one your SDK version targets is unsupported.
Override only when:
- You are testing a newer upstream release against the current SDK (development only).
- You are locked to an older release because of a validated deployment.
```csharp
// Pin the tag explicitly ("b8816" is the default as of SDK v26.5.0).
preset.BinaryManagerParameters.ReleaseTag = "b8816";
```
Caution: pinning a ReleaseTag other than the SDK's default can produce runtime errors if upstream changed a struct layout or function signature. Do not ship custom tags to production without a migration pass; see the llama-cpp-migration workflow used by the Aspose team.
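The two override cases above can be enforced in code. This C# sketch shows one possible policy. ResolveReleaseTag and the idea of a "validated tags" set are hypothetical and are not part of Aspose.LLM.

The policy works as follows:
- Any tag is allowed in development builds.
- In production, only tags that your own validation or migration work has approved are allowed.
- Every other tag is rejected before the engine is built.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical policy helper (not part of Aspose.LLM): decides which tag to pin.
static string ResolveReleaseTag(
    string sdkDefaultTag, string? requestedTag, bool isDevelopment, ISet<string> validatedTags)
{
    // No override requested: keep the tag this SDK version was built against.
    if (string.IsNullOrWhiteSpace(requestedTag) || requestedTag == sdkDefaultTag)
        return sdkDefaultTag;

    // Development builds may try newer tags; production only accepts tags that
    // already passed your own validation/migration work.
    if (isDevelopment || validatedTags.Contains(requestedTag))
        return requestedTag;

    throw new InvalidOperationException(
        $"ReleaseTag '{requestedTag}' is not validated for production (SDK default: '{sdkDefaultTag}').");
}

// "b8816" is the documented default as of SDK v26.5.0; "b8000" is a made-up example tag.
var validated = new HashSet<string> { "b8000" };
Console.WriteLine(ResolveReleaseTag("b8816", null, isDevelopment: false, validated));    // b8816
Console.WriteLine(ResolveReleaseTag("b8816", "b8000", isDevelopment: false, validated)); // b8000
```

The result would then be assigned to preset.BinaryManagerParameters.ReleaseTag. Failing at startup is cheaper than discovering a signature mismatch at the first native call.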
BinaryPath
Folder where downloaded binaries live. The default is <LocalAppData>/Aspose.LLM/runtimes — %LOCALAPPDATA%\Aspose.LLM\runtimes on Windows and the equivalent LocalApplicationData folder elsewhere.
Override when:
- Several applications or services on the same host should share one cache.
- The root filesystem is read-only: point the cache at a writable volume.
- You ship a pre-populated deployment: bundle the binaries with your application and point BinaryPath at them to skip the download on first run.
```csharp
preset.BinaryManagerParameters.BinaryPath = @"/var/lib/aspose-llm/runtimes";
```
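Before overriding the cache folder, it can help to see where the default resolves on a given host and to confirm that a candidate volume is writable. This C# sketch uses only .NET's `Environment.GetFolderPath`, `Path`, `Directory` and `File`.

DefaultRuntimesPath reproduces the documented `<LocalAppData>/Aspose.LLM/runtimes` layout. It is not taken from the SDK's source. IsWritableDirectory is a hypothetical helper.

```csharp
using System;
using System.IO;

// Mirrors the documented default <LocalAppData>/Aspose.LLM/runtimes using .NET's
// own mapping of LocalApplicationData on each OS.
static string DefaultRuntimesPath() =>
    Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
        "Aspose.LLM", "runtimes");

// Hypothetical helper (not part of Aspose.LLM): creates the folder if needed and
// proves it is writable by writing and deleting a probe file.
static bool IsWritableDirectory(string path)
{
    try
    {
        Directory.CreateDirectory(path);
        string probe = Path.Combine(path, $".write-probe-{Guid.NewGuid():N}");
        File.WriteAllText(probe, string.Empty);
        File.Delete(probe);
        return true;
    }
    catch (Exception e) when (e is UnauthorizedAccessException or IOException)
    {
        return false;
    }
}

Console.WriteLine($"Default runtimes cache: {DefaultRuntimesPath()}");
```

On a read-only-root host, you would call IsWritableDirectory on the mounted volume and assign that path to preset.BinaryManagerParameters.BinaryPath only if the call returns true.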
SystemSpecification
When null (the default), the SDK detects the host’s OS, architecture, and available accelerations at engine construction. Override with an explicit SystemSpec only for diagnostics or cross-platform binary preparation — leaving this null is correct for normal deployments.
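For diagnostics, it can be useful to print the host facts that auto-detection works from. This C# sketch is an illustration only and is not the SDK's SystemSpec logic.

It reads the OS and architecture from `RuntimeInformation` and probes the `Avx2` and `Avx` hardware-intrinsic classes. AVX-512 is left out here; on .NET 8 and later, `Avx512F.IsSupported` can be checked the same way.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.X86;

// Illustration only (not the SDK's SystemSpec detection): host facts .NET exposes.
static string DescribeHost()
{
    string os =
        RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "Windows" :
        RuntimeInformation.IsOSPlatform(OSPlatform.Linux) ? "Linux" :
        RuntimeInformation.IsOSPlatform(OSPlatform.OSX) ? "macOS" : "other";

    // The x86 intrinsics classes only report support inside x86/x64 processes.
    bool isX86 = RuntimeInformation.ProcessArchitecture is Architecture.X64 or Architecture.X86;
    string cpuTier = !isX86 ? "n/a (not x86)" : Avx2.IsSupported ? "AVX2" : Avx.IsSupported ? "AVX" : "NoAVX";

    return $"os={os} arch={RuntimeInformation.OSArchitecture} cpu={cpuTier}";
}

Console.WriteLine(DescribeHost());
```

Comparing this output with the backend the engine actually loads is a quick way to tell whether an explicit SystemSpec override is needed at all.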
PreferredAcceleration
When null, the SDK picks the best available acceleration for the host in this order: CUDA → HIP → Metal → Vulkan → CPU (AVX level best to worst). Set an explicit value to override the selection.
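The selection order above amounts to "first available backend in a fixed priority list". This C# sketch encodes that order over plain strings. The strings are named after AccelerationType members, but PickAcceleration is hypothetical and is not the SDK's selector.

The CPU tiers are assumed to run from AVX512 down to NoAVX, following the table below. Kompute, OpenCL, SYCL and OpenBLAS are omitted because the documented order does not place them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch (not the SDK's selector) of the documented auto-selection order:
// GPU backends first, then CPU tiers from best to worst.
static string PickAcceleration(IEnumerable<string> available)
{
    string[] priority = { "CUDA", "HIP", "Metal", "Vulkan", "AVX512", "AVX2", "AVX", "NoAVX" };
    var availableSet = new HashSet<string>(available);
    return priority.FirstOrDefault(name => availableSet.Contains(name))
        ?? throw new InvalidOperationException("None of the known acceleration backends is available.");
}

Console.WriteLine(PickAcceleration(new[] { "AVX2", "Vulkan" })); // Vulkan
Console.WriteLine(PickAcceleration(new[] { "AVX", "AVX2" }));    // AVX2
```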
Supported values (see Aspose.LLM.Abstractions.Acceleration.AccelerationType):
| Value | Platform | Notes |
|---|---|---|
| CUDA | Windows, Linux | NVIDIA GPUs. |
| HIP | Linux | AMD GPUs via ROCm. |
| Metal | macOS (Apple Silicon) | M-series chips. |
| Vulkan | Windows, Linux | Cross-platform GPU. |
| AVX512 | Any x64 with AVX-512 | Fastest CPU path. |
| AVX2 | Any x64 with AVX2 | Default CPU fallback. |
| AVX | Older x64 | Slower. |
| NoAVX | Very old CPUs | Last-resort compatibility. |
| Kompute, OpenCL, SYCL, OpenBLAS | Platform-dependent | Less common; verify availability for your target. |
The enum has additional values (None) used internally — avoid setting them explicitly.
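If PreferredAcceleration comes from configuration, a cheap pre-flight check can catch an obvious OS mismatch, such as HIP on Windows or Metal on Linux. This C# sketch derives its rules from the Platform column of the table above. PlausibleOnOs is a hypothetical helper and is not part of Aspose.LLM.

The check only reasons about the OS. It cannot tell whether a suitable GPU or driver is installed.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical pre-flight check (not part of Aspose.LLM) built from the table's Platform column.
// It only catches OS mismatches; it cannot tell whether a GPU or its drivers are present.
static bool PlausibleOnOs(string acceleration, OSPlatform os) => acceleration switch
{
    "CUDA" or "Vulkan" => os == OSPlatform.Windows || os == OSPlatform.Linux,
    "HIP" => os == OSPlatform.Linux,
    "Metal" => os == OSPlatform.OSX,
    "AVX512" or "AVX2" or "AVX" or "NoAVX" => true, // CPU tiers depend on CPU features, not the OS
    _ => false, // Kompute, OpenCL, SYCL, OpenBLAS, None: verify availability yourself
};

OSPlatform current =
    RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? OSPlatform.Windows :
    RuntimeInformation.IsOSPlatform(OSPlatform.OSX) ? OSPlatform.OSX : OSPlatform.Linux;
Console.WriteLine($"Metal plausible on this host: {PlausibleOnOs("Metal", current)}");
```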
Typical recipes
Force CPU-only execution
```csharp
using Aspose.LLM.Abstractions.Acceleration;

var preset = new Qwen25Preset();
// Load the AVX2 CPU build of the native binaries...
preset.BinaryManagerParameters.PreferredAcceleration = AccelerationType.AVX2;
// ...and, on the inference side, offload no layers to a GPU.
preset.BaseModelInferenceParameters.GpuLayers = 0;
```
Force CUDA on a multi-GPU box
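```csharp
using Aspose.LLM.Abstractions.Acceleration;

var preset = new Qwen25Preset();
preset.BinaryManagerParameters.PreferredAcceleration = AccelerationType.CUDA;
// Offload as many layers as possible (999 exceeds the layer count of typical models).
preset.BaseModelInferenceParameters.GpuLayers = 999;
// Pick a specific GPU via BaseModelInferenceParameters.MainGpu = N;
```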
Pre-populated offline deployment
Download the binaries on a connected machine, copy the cache into your deployment, and point BinaryPath at it on the offline host:
```csharp
var preset = new Qwen25Preset();
preset.BinaryManagerParameters.BinaryPath = @"/opt/aspose-llm/runtimes";
// Also point EngineParameters.ModelCachePath at a pre-populated model cache.
```
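On an offline host, a missing or empty cache would otherwise surface later as a failed download attempt. A fail-fast check at startup makes the problem obvious. This C# sketch is a hypothetical helper, RequirePrepopulated, and is not part of Aspose.LLM.

It verifies only that the folder exists and contains at least one file. It does not check that the binaries match your SDK version, platform, or acceleration backend.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical fail-fast check (not part of Aspose.LLM) for an offline host. It confirms only
// that the folder exists and is non-empty; it does not check versions or platforms.
static void RequirePrepopulated(string path, string description)
{
    bool hasFiles = Directory.Exists(path) &&
        Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories).Any();
    if (!hasFiles)
        throw new DirectoryNotFoundException($"Pre-populated {description} is missing or empty: {path}");
}

try
{
    RequirePrepopulated("/opt/aspose-llm/runtimes", "runtimes cache");
    Console.WriteLine("Runtimes cache found; safe to set BinaryPath and construct the engine.");
}
catch (DirectoryNotFoundException ex)
{
    // In a real service, let this stop startup instead of just logging it.
    Console.Error.WriteLine(ex.Message);
}
```

The same check can be applied to the folder you assign to EngineParameters.ModelCachePath.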
Shared cache across services
```csharp
preset.BinaryManagerParameters.BinaryPath = @"/srv/shared/aspose-llm/runtimes";
```
Make sure every service using this cache runs the same SDK version — version mismatches produce binary incompatibilities.
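If you cannot keep every service on the same SDK version, one possible mitigation is to give each version its own subfolder of the shared cache. This C# sketch shows such a layout. The folder naming is hypothetical and is not an SDK convention; it also assumes the SDK reads binaries directly from whatever BinaryPath you assign.

```csharp
using System;
using System.IO;

// Hypothetical layout (not an SDK convention): one subfolder per SDK version and release tag,
// so services pinned to different versions never share native binaries.
static string VersionScopedCache(string sharedRoot, string sdkVersion, string releaseTag)
{
    foreach (string part in new[] { sdkVersion, releaseTag })
    {
        if (string.IsNullOrWhiteSpace(part) || part.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0)
            throw new ArgumentException($"'{part}' is not usable as a folder name.");
    }
    return Path.Combine(sharedRoot, $"{sdkVersion}_{releaseTag}");
}

// SDK v26.5.0 with its default tag b8816, per this page.
Console.WriteLine(VersionScopedCache("/srv/shared/aspose-llm/runtimes", "26.5.0", "b8816"));
```

The trade-off is disk space: each version keeps its own copy of the native binaries, in exchange for never loading a mismatched build.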
What’s next
- System requirements: what runtimes and hardware the binaries support.
- Model inference parameters: complement PreferredAcceleration with GpuLayers and split settings.
- Architecture: what happens during first-run binary deployment.