HIP / ROCm

HIP (and its underlying ROCm stack) is the AMD-specific GPU backend for Aspose.LLM for .NET. It is Linux-only and targeted at AMD Instinct data-center GPUs and recent Radeon consumer cards.

Requirements

  • GPU: ROCm-supported AMD GPU.
    • Instinct: MI100, MI210, MI250, MI300 series.
    • Radeon: RDNA 3 (RX 7900 series) and newer are officially supported. Some RDNA 2 cards (RX 6800/6900) work with HSA_OVERRIDE_GFX_VERSION workarounds.
  • OS: Linux with ROCm 6.x. Ubuntu 22.04 LTS and RHEL 9 are the commonly tested hosts.
  • Driver: install the ROCm stack via the official AMD packages. Verify with rocminfo.
  • No Windows support: AMD has Windows ROCm in preview, but Aspose.LLM’s HIP binaries currently target Linux only. On Windows with AMD GPUs, use Vulkan instead.

Verify ROCm:

rocminfo | grep "Name:"

The output should include your GPU's architecture target alongside its marketing name (e.g., gfx1100 for RX 7900 XTX).
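For a narrower check, the sketch below pulls only the architecture targets out of the rocminfo output. The helper name list_gfx_targets is illustrative and not part of ROCm or the SDK; it assumes targets are printed in the gfxNNNN form shown in the example above.

```shell
# Hypothetical helper: reads rocminfo output on stdin and prints each
# distinct gfx architecture target once.
list_gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Usage (requires a working ROCm install):
#   rocminfo | list_gfx_targets
```

If this prints nothing, the ROCm runtime most likely does not see a GPU.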

Select HIP

using Aspose.LLM.Abstractions.Acceleration;

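// Start from the Qwen 2.5 preset and request the HIP (ROCm) binaries.
var preset = new Qwen25Preset();
preset.BinaryManagerParameters.PreferredAcceleration = AccelerationType.HIP;
// A value above the model's layer count offloads every layer to the GPU.
preset.BaseModelInferenceParameters.GpuLayers = 999;

using var api = AsposeLLMApi.Create(preset);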

The SDK downloads the HIP variant (typically 300-500 MB) on first run.

Multi-GPU

HIP supports multi-GPU inference across AMD cards of the same architecture generation.

// Split the model across the visible GPUs layer by layer.
preset.BaseModelInferenceParameters.SplitMode = LlamaSplitMode.LLAMA_SPLIT_MODE_LAYER;
preset.BaseModelInferenceParameters.GpuLayers = 999;
// Optionally set TensorSplit to weight the split by each GPU's VRAM.

Use ROCR_VISIBLE_DEVICES to control which GPUs the process sees.
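As a minimal sketch, the variable can be scoped to a single command rather than exported for the whole shell session. The wrapper name run_on_gpus is hypothetical; ROCR_VISIBLE_DEVICES takes a comma-separated list of device indices.

```shell
# Hypothetical wrapper: run a command with only the listed GPUs visible.
# $1 is a comma-separated list of device indices; the rest is the command.
run_on_gpus() {
  gpu_list="$1"
  shift
  env ROCR_VISIBLE_DEVICES="$gpu_list" "$@"
}

# Usage: expose only the first two GPUs to the application.
#   run_on_gpus 0,1 dotnet run
```

Scoping the variable this way keeps other processes started from the same shell unaffected.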

RDNA 2 workaround

Some RDNA 2 cards (e.g., RX 6800 XT, RX 6900 XT) are not officially supported by ROCm, but work with a GFX version override:

export HSA_OVERRIDE_GFX_VERSION=10.3.0
dotnet run

Set the variable before starting your process. The override makes the ROCm runtime report the GPU as gfx1030 (version 10.3.0), which its libraries treat as a supported target. Output quality is unaffected; performance is slightly lower than on officially supported cards.
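To avoid setting the override on machines that do not need it, a launch script can apply it conditionally. The sketch below is one possible approach, not an SDK feature: the helper name override_for_target is hypothetical, and it assumes RDNA 2 cards report a gfx103x target.

```shell
# Hypothetical helper: print the override value for an RDNA 2 target
# (gfx103x), or nothing for any other target.
override_for_target() {
  case "$1" in
    gfx103*) printf '%s\n' "10.3.0" ;;
    *)       : ;;
  esac
}

# Usage (assumes the first gfx target rocminfo reports is the GPU to use):
#   target=$(rocminfo | grep -oE 'gfx[0-9a-f]+' | head -n 1)
#   version=$(override_for_target "$target")
#   if [ -n "$version" ]; then export HSA_OVERRIDE_GFX_VERSION="$version"; fi
#   dotnet run
```

On a system with several GPUs, or an integrated GPU, check which target is listed first before relying on this.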

Performance tips

  • Flash Attention: supported on recent ROCm releases; enable via ContextParameters.FlashAttentionMode = FlashAttentionType.Enabled.
  • Partial offload: on consumer Radeon cards with limited VRAM (16-24 GB), set GpuLayers slightly below the model's layer count to leave room for the KV cache.
  • KV quantization: setting TypeV = GgmlType.Q8_0, or more aggressively Q4_0, shrinks the KV cache and frees a meaningful amount of VRAM on long contexts.
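To get a rough sense of what KV quantization saves, the size of one cache tensor set (K or V) scales as layers × KV heads × head dimension × context length. The sketch below is a back-of-the-envelope estimate only: the helper cache_mib and the model shape are illustrative assumptions, not values read from any preset. It relies on the ggml block layouts, which pack 32 values into 34 bytes for Q8_0 and 18 bytes for Q4_0, against 64 bytes for F16.

```shell
# Hypothetical helper: size in MiB of one cache tensor set (K or V).
# Args: layers kv_heads head_dim context_tokens bytes_per_32_elements
cache_mib() {
  elements=$(( $1 * $2 * $3 * $4 ))
  echo $(( elements / 32 * $5 / 1048576 ))
}

# Illustrative model shape (an assumption, not taken from any preset):
# 28 layers, 4 KV heads, head dimension 128, 32768-token context.
cache_mib 28 4 128 32768 64   # F16:  64 bytes per 32 values -> prints 896
cache_mib 28 4 128 32768 34   # Q8_0: 34 bytes per 32 values -> prints 476
cache_mib 28 4 128 32768 18   # Q4_0: 18 bytes per 32 values -> prints 252
```

Under these assumptions, quantizing the V cache to Q8_0 roughly halves its footprint, and that memory can go toward additional GPU layers or a longer context.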

Windows AMD users: use Vulkan

Aspose.LLM does not ship Windows HIP binaries. If your AMD GPU is on Windows, use Vulkan: AMD’s Vulkan driver provides GPU acceleration without requiring ROCm.

Common issues

| Symptom | Likely cause | Fix |
|---|---|---|
| rocblas_status_internal_error on load | Incompatible ROCm version. | Match ROCm 6.x; upgrade if older. |
| Unsupported GPU at startup | Card not on ROCm’s support list. | Use Vulkan as fallback, or try HSA_OVERRIDE_GFX_VERSION. |
| Inference on CPU despite HIP binary | GpuLayers = 0 or ROCm runtime missing. | Set GpuLayers = 999; verify with rocminfo. |
| Multi-GPU instability | Mixing different RDNA generations. | Stick to same-generation GPUs; try LLAMA_SPLIT_MODE_LAYER. |

What’s next