System requirements
Plan the .NET runtime, operating system, and hardware for your Aspose.LLM for .NET deployment. Requirements depend on the preset you pick: small models run on modest hardware, while 20B-parameter models need substantial RAM and preferably GPU acceleration.
Supported .NET runtimes
Aspose.LLM targets .NET Standard 2.0. It runs on any .NET runtime that supports .NET Standard 2.0, including:
- .NET 10 (current release)
- .NET 8 LTS (recommended for new projects)
- .NET 6 and later
- .NET Framework 4.7.2 and later
New projects should use .NET 8 LTS or .NET 10. All examples in this documentation assume dotnet CLI tooling and modern .NET runtime features.
.NET Framework 4.7.2 is the minimum Framework version the SDK is validated against. Earlier Framework versions may work because the SDK builds against .NET Standard 2.0, but they are not tested and not supported.
Supported operating systems
| OS | Versions | Notes |
|---|---|---|
| Windows | 10 version 1809 and later, Windows Server 2019+ | x64 and ARM64 |
| Linux | Debian 10+, Ubuntu 20.04+, RHEL/Alma/Rocky 8+ (glibc 2.28+) | x64 and ARM64; other distributions with compatible glibc work |
| macOS | 11 (Big Sur) and later | Intel x64 and Apple Silicon (arm64); Metal acceleration on Apple Silicon |
CPU
A modern x64 CPU with AVX2 support runs any preset in CPU-only mode. BinaryManager selects an AVX-level-specific native binary at runtime:
- AVX512 — fastest; requires a recent Intel / AMD CPU.
- AVX2 — the default fallback on most modern CPUs.
- No-AVX — available for older CPUs and compatibility scenarios; inference is slow.
ARM64 CPUs are supported on Linux and macOS via native ARM binaries.
GPU (optional)
GPU acceleration is optional but strongly recommended for models larger than 3B parameters. Supported backends:
| Backend | Platforms | Requires |
|---|---|---|
| CUDA | Windows, Linux | NVIDIA driver 525+, CUDA-capable GPU (compute capability 5.0+) |
| HIP / ROCm | Linux | AMD ROCm 6+, supported AMD GPU |
| Metal | macOS (Apple Silicon) | Apple Silicon M1 or later |
| Vulkan | Windows, Linux | Vulkan 1.2+ driver, GPU with Vulkan support |
Select a backend via BinaryManagerParameters.PreferredAcceleration on the preset. When unset, the SDK picks the best available backend for your system. Control the amount of GPU offload with BaseModelInferenceParameters.GpuLayers.
Memory
RAM requirements scale with model size, quantization, and context size. The table below lists typical working-set sizes for the preset’s default quantization and context:
| Preset example | Model size | Default context | RAM (CPU) / VRAM (GPU) |
|---|---|---|---|
Llama32Preset |
3B Q4_K_M | 131 072 | 6-8 GB |
Phi4Preset |
Mini Q4_K_M | 16 384 | 4-6 GB |
Qwen25Preset |
7B Q4_K_M | 32 768 | 8-12 GB |
Qwen3Preset |
8B Q4_K_M | 32 768 | 8-12 GB |
DeepseekR1Qwen3Preset |
8B Q4_K_M | 131 072 | 12-16 GB |
DeepSeekCoder2Preset |
MoE IQ3_M | 163 840 | 12-20 GB |
Oss20Preset |
20B Q4_K_M | 131 072 | 16-24 GB |
Qwen25VL3BPreset |
VL 3B UD-IQ2_XXS | 128 000 | 4-6 GB (plus mmproj) |
Ministral3VisionPreset |
8B Q4_K_M | 262 144 | 12-20 GB (plus mmproj) |
Ranges assume default sampler and batch settings. Actual usage varies with the number of active sessions, the CacheCleanupStrategy, and KV cache quantization (ContextParameters.TypeK / TypeV).
For running the full default context on very long sessions, add 1-4 GB for the KV cache on top of the model weight size.
Disk
Two distinct caches:
| Cache | Typical size | Location |
|---|---|---|
| Model files (downloaded from Hugging Face) | 2-15 GB per model | EngineParameters.ModelCachePath (default: per-user cache folder) |
Native llama.cpp binaries |
100-500 MB per acceleration variant | BinaryManagerParameters.BinaryPath (default: per-user cache folder) |
Plan for at least 20-30 GB of free disk space when experimenting with multiple presets.
Network
Internet access is required on first use:
BinaryManagerfetches native binaries fromgithub.com/ggml-org/llama.cpp/releases.ModelManagerdownloads model files (andmmprojfor vision presets) fromhuggingface.co.
Subsequent runs use the local caches. For offline or firewalled environments, pre-download both caches and point the preset’s BinaryPath and ModelCachePath at them — details will be covered in the offline deployment use case.
Development
Any IDE or editor with .NET support works:
- Visual Studio 2022 (v17.8+) — full experience on Windows.
- JetBrains Rider — full experience cross-platform.
- VS Code with the C# Dev Kit — light-weight cross-platform option.
dotnetCLI alone — works for build, run, publish.
What’s next
- Installation — add the NuGet package to your project.
- Licensing — apply a license before running inference.
- Hello, world! — first runnable example.