TensorSplit
TensorSplit is an array of floats, one entry per GPU, that controls the proportion of the model placed on each GPU when the model is split across multiple GPUs. Values are normalized — [2.0, 1.0] places 2/3 of the model on GPU 0 and 1/3 on GPU 1.
Quick reference
| Type | float[]? |
| Default | null (equal distribution across GPUs) |
| Range | Array length = GPU count; values positive |
| Category | Multi-GPU configuration |
| Field on | ModelInferenceParameters.TensorSplit |
What it does
When SplitMode is LAYER or ROW, the engine distributes layers (or row blocks) across GPUs according to TensorSplit. Each GPU gets a share proportional to its entry in the array.
- null — equal distribution. Splits evenly regardless of VRAM.
- [2.0, 1.0] — 2:1 split. The first GPU gets twice the share.
- [1.0, 1.0, 1.0] — explicit equal split across 3 GPUs.
- [3.0, 2.0, 1.0] — 3:2:1 split across 3 GPUs (50 %, 33 %, 17 %).
The array length should match the number of GPUs visible to the process (after any CUDA_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES filtering).
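The proportional distribution can be sketched in a few lines of Python. This is an illustrative helper, not part of the library, and the engine's exact rounding of fractional layers may differ; it shows only how a normalized split maps to per-GPU layer counts:

```python
def distribute_layers(total_layers, tensor_split):
    """Illustrative: distribute total_layers across GPUs in proportion
    to tensor_split, using largest-remainder rounding for whole layers."""
    total = sum(tensor_split)
    raw = [total_layers * s / total for s in tensor_split]
    counts = [int(r) for r in raw]
    # Hand out the layers lost to truncation, largest remainder first.
    by_remainder = sorted(range(len(raw)),
                          key=lambda i: raw[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:total_layers - sum(counts)]:
        counts[i] += 1
    return counts

print(distribute_layers(32, [2.0, 1.0]))       # → [21, 11]
print(distribute_layers(32, [3.0, 2.0, 1.0]))  # → [16, 11, 5]
```

Note that [2.0, 1.0], [4.0, 2.0], and [24.0, 12.0] all produce the same distribution, since only the ratios matter after normalization.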
When to change it
| Scenario | Value |
|---|---|
| Single GPU | Not applicable |
| Multi-GPU, equal VRAM | null (equal default is correct) |
| Multi-GPU, unequal VRAM (24 GB + 12 GB) | [2.0, 1.0] |
| Multi-GPU with one GPU reserved for other work | Smaller share for that GPU |
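For unequal VRAM, a convenient starting point is to use each GPU's usable VRAM directly as its split value; because TensorSplit is normalized, the absolute scale is irrelevant. The helpers below are hypothetical illustrations, not library APIs:

```python
def split_from_vram(vram_gb):
    """Illustrative: VRAM sizes work directly as TensorSplit values.
    [24.0, 12.0] behaves identically to [2.0, 1.0]."""
    return [float(v) for v in vram_gb]

def split_fractions(tensor_split):
    """Illustrative: the normalized share each GPU receives."""
    total = sum(tensor_split)
    return [s / total for s in tensor_split]

print(split_from_vram([24, 12]))            # → [24.0, 12.0]
print(split_fractions([24.0, 12.0]))        # ≈ [0.667, 0.333], a 2:1 split
```

To reserve headroom on a GPU doing other work, shrink that GPU's entry below its VRAM share.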
Example
using Aspose.LLM.Abstractions.Parameters;
var preset = new Qwen25Preset();
preset.BaseModelInferenceParameters.SplitMode = LlamaSplitMode.LLAMA_SPLIT_MODE_LAYER;
preset.BaseModelInferenceParameters.GpuLayers = 999; // large value: offload all layers to GPUs
preset.BaseModelInferenceParameters.TensorSplit = new float[] { 2.0f, 1.0f };
// 2/3 of layers on GPU 0 (larger VRAM), 1/3 on GPU 1.
using var api = AsposeLLMApi.Create(preset);
Interactions
- SplitMode — TensorSplit applies only when the mode is LAYER or ROW.
- GpuLayers — the total layers offloaded to GPUs are distributed per TensorSplit.
- MainGpu — ignored when TensorSplit is active.
What’s next
- SplitMode — split strategy selector.
- CUDA multi-GPU — NVIDIA multi-GPU setup.
- GPU deployment use case — runnable example.