Vulkan
Vulkan is a cross-vendor GPU backend that works on most modern discrete and integrated GPUs. It does not need CUDA or ROCm installed — only a recent graphics driver with Vulkan support. Use it when CUDA or HIP are unavailable, or when you want one codepath that works across NVIDIA, AMD, and Intel hardware.
Requirements
- GPU: any GPU with Vulkan 1.2 or later support. This covers most discrete GPUs from 2018+ and integrated GPUs from Intel (Xe, UHD), AMD (RDNA, Vega), and NVIDIA (Maxwell+).
- Driver:
- NVIDIA: 470 or later.
- AMD: modern Adrenalin or Mesa RADV on Linux.
- Intel: current Arc / Xe driver.
- OS: Windows 10+ or Linux (glibc 2.28+). Not supported on macOS — Metal is the macOS equivalent.
Verify Vulkan support:
- Windows: install Vulkan SDK’s
vulkaninfoSDK.exeor run a Vulkan demo app. - Linux:
vulkaninfofrom thevulkan-toolspackage.
Select Vulkan
using Aspose.LLM.Abstractions.Acceleration;
var preset = new Qwen25Preset();
preset.BinaryManagerParameters.PreferredAcceleration = AccelerationType.Vulkan;
preset.BaseModelInferenceParameters.GpuLayers = 999;
using var api = AsposeLLMApi.Create(preset);
The SDK downloads the Vulkan variant (typically 200-400 MB) on first run.
When to prefer Vulkan
- Cross-vendor deployments — the same binary works on NVIDIA, AMD, and Intel GPUs without swapping backends.
- Integrated GPUs — Intel Xe, AMD integrated Radeon, and NVIDIA integrated GPUs work via Vulkan, no vendor-specific runtime.
- Containers without CUDA/ROCm — Vulkan runs without NVIDIA Container Toolkit or ROCm Docker setup.
- Older hardware — many GPUs that no longer get CUDA updates still have Vulkan drivers.
When to prefer CUDA or HIP instead
For dedicated NVIDIA or AMD workloads, vendor-specific backends are usually faster:
- CUDA is typically 20-40 % faster than Vulkan on NVIDIA.
- HIP is faster than Vulkan on AMD for most workloads.
Use Vulkan when portability trumps raw speed.
Multi-GPU
Vulkan supports multi-GPU setups via SplitMode and TensorSplit like CUDA, but driver support for multi-device Vulkan is less mature. Test on your specific hardware before committing — single-GPU Vulkan is well-trodden; multi-GPU Vulkan is hit-or-miss.
preset.BaseModelInferenceParameters.SplitMode = LlamaSplitMode.LLAMA_SPLIT_MODE_LAYER;
preset.BaseModelInferenceParameters.GpuLayers = 999;
// Optional TensorSplit for unequal GPU sizes.
Performance tips
- Flash Attention — supported on most Vulkan drivers; enable via
ContextParameters.FlashAttentionMode = FlashAttentionType.Enabled. - Integrated GPU memory — iGPUs use system RAM for GPU memory. Short contexts and heavy KV quantization help fit within the shared pool.
- Driver version matters — recent Vulkan drivers have significant perf improvements for LLM inference. Keep drivers current.
Common issues
| Symptom | Likely cause | Fix |
|---|---|---|
| Vulkan binary downloaded but runs on CPU | GpuLayers = 0, or Vulkan driver missing. |
Set GpuLayers = 999; verify Vulkan with vulkaninfo. |
| Crash at model load | Very old driver or unsupported GPU. | Update driver; fall back to CPU if hardware is pre-Vulkan-1.2. |
| Significantly slower than CUDA on NVIDIA | Expected — use CUDA instead for best performance. | Switch PreferredAcceleration to CUDA. |
What’s next
- CUDA — faster on NVIDIA.
- HIP / ROCm — faster on AMD.
- CPU — last-resort fallback.
- Model inference parameters — GPU offload configuration.