HIP / ROCm
HIP (and its underlying ROCm stack) is the AMD-specific GPU backend for Aspose.LLM for .NET. It is Linux-only and targeted at AMD Instinct data-center GPUs and recent Radeon consumer cards.
Requirements
- GPU: ROCm-supported AMD GPU.
- Instinct: MI100, MI210, MI250, MI300 series.
- Radeon: RDNA 3 (RX 7900 series) and newer are officially supported. Some RDNA 2 cards (RX 6800/6900) work with
HSA_OVERRIDE_GFX_VERSIONworkarounds.
- OS: Linux with ROCm 6.x. Ubuntu 22.04 LTS and RHEL 9 are the commonly tested hosts.
- Driver: install the ROCm stack via the official AMD packages. Verify with
rocminfo. - No Windows support: AMD has Windows ROCm in preview, but Aspose.LLM’s HIP binaries currently target Linux only. On Windows with AMD GPUs, use Vulkan instead.
Verify ROCm:
rocminfo | grep "Name:"
The output should list your GPU by name (e.g., gfx1100 for RX 7900 XTX).
Select HIP
using Aspose.LLM.Abstractions.Acceleration;
var preset = new Qwen25Preset();
preset.BinaryManagerParameters.PreferredAcceleration = AccelerationType.HIP;
preset.BaseModelInferenceParameters.GpuLayers = 999;
using var api = AsposeLLMApi.Create(preset);
The SDK downloads the HIP variant (typically 300-500 MB) on first run.
Multi-GPU
HIP supports multi-GPU across AMD cards of the same ROCm generation.
preset.BaseModelInferenceParameters.SplitMode = LlamaSplitMode.LLAMA_SPLIT_MODE_LAYER;
preset.BaseModelInferenceParameters.GpuLayers = 999;
// Optionally tune TensorSplit per-GPU VRAM.
Use ROCR_VISIBLE_DEVICES to control which GPUs the process sees.
RDNA 2 workaround
Some RDNA 2 cards (e.g., RX 6800 XT, RX 6900 XT) are not officially supported by ROCm, but work with a GFX version override:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
dotnet run
Apply this before starting your process. The override tricks ROCm into treating an RX 6800/6900 as a supported variant. Quality is fine; performance is slightly lower than on supported cards.
Performance tips
- Flash Attention — supported on recent ROCm releases; enable via
ContextParameters.FlashAttentionMode = FlashAttentionType.Enabled. - Partial offload — for consumer Radeon cards with limited VRAM (16-24 GB), tune
GpuLayersto slightly below full to leave room for KV cache. - KV quantization — aggressive
TypeV = GgmlType.Q8_0or evenQ4_0claws back meaningful VRAM on long contexts.
Windows AMD users: use Vulkan
Aspose.LLM does not ship Windows HIP binaries. If your AMD GPU is on Windows, use Vulkan — AMD’s Vulkan driver is excellent and gets you GPU acceleration without ROCm.
Common issues
| Symptom | Likely cause | Fix |
|---|---|---|
rocblas_status_internal_error on load |
Incompatible ROCm version. | Match ROCm 6.x; upgrade if older. |
| Unsupported GPU at startup | Card not on ROCm’s support list. | Use Vulkan as fallback, or try HSA_OVERRIDE_GFX_VERSION. |
| Inference on CPU despite HIP binary | GpuLayers = 0 or ROCm runtime missing. |
Set GpuLayers = 999; verify with rocminfo. |
| Multi-GPU instability | Mixing different RDNA generations. | Stick to same-generation GPUs; try LLAMA_SPLIT_MODE_LAYER. |
What’s next
- Vulkan — Windows alternative for AMD.
- Binary manager parameters —
PreferredAccelerationselection. - Model inference parameters — GPU offload and multi-GPU settings.