TensorSplit

TensorSplit is an array of floats, one per GPU, that controls the proportion of the model placed on each GPU in a multi-GPU split. Values are normalized: [2.0, 1.0] places 2/3 of the model on GPU 0 and 1/3 on GPU 1.
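
A minimal sketch of that normalization rule in plain C# (illustrative only, not the engine's code): each share is the entry divided by the sum of all entries.

using System;
using System.Linq;

float[] tensorSplit = { 2.0f, 1.0f };
float sum = tensorSplit.Sum();                               // 3.0
float[] shares = tensorSplit.Select(v => v / sum).ToArray(); // { 0.667, 0.333 }
Console.WriteLine(string.Join(", ", shares));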

Quick reference

Type       float[]?
Default    null (equal distribution across GPUs)
Range      array length = GPU count; all values positive
Category   Multi-GPU configuration
Field      ModelInferenceParameters.TensorSplit

What it does

When SplitMode is LAYER or ROW, the engine distributes layers (or row blocks) across GPUs according to TensorSplit. Each GPU gets a share proportional to its entry in the array.

  • null — equal distribution. Splits evenly regardless of VRAM.
  • [2.0, 1.0] — 2:1 split. First GPU gets twice the share.
  • [1.0, 1.0, 1.0] — explicit equal across 3 GPUs.
  • [3.0, 2.0, 1.0] — 3:2:1 split across 3 GPUs (50%, 33%, 17%).

The array length should match the number of GPUs visible to the process (after any CUDA_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES filtering).
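
In LAYER mode this proportional rule translates into per-GPU layer counts. The sketch below estimates them for a hypothetical 48-layer model and a [3.0, 2.0, 1.0] split; the engine's own rounding at the boundaries may differ.

using System;
using System.Linq;

int offloadedLayers = 48;               // hypothetical model depth
float[] split = { 3.0f, 2.0f, 1.0f };
float sum = split.Sum();

// Floor each proportional share, then give the rounding remainder to GPU 0.
int[] perGpu = split.Select(v => (int)(offloadedLayers * v / sum)).ToArray();
perGpu[0] += offloadedLayers - perGpu.Sum();

Console.WriteLine(string.Join(", ", perGpu)); // 24, 16, 8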

When to change it

Scenario                                          Value
Single GPU                                        not applicable
Multi-GPU, equal VRAM                             null (equal default is correct)
Multi-GPU, unequal VRAM (24 GB + 12 GB)           [2.0, 1.0]
Multi-GPU, one GPU reserved for other work        a smaller share for that GPU
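
For the unequal-VRAM row, normalization means raw VRAM sizes can serve as the split values directly. A minimal sketch reusing the preset pattern from the example below (the 24 GB / 12 GB figures are illustrative):

using Aspose.LLM.Abstractions.Parameters;

var preset = new Qwen25Preset();
// { 24f, 12f } normalizes to the same 2:1 placement as { 2f, 1f }.
preset.BaseModelInferenceParameters.TensorSplit = new float[] { 24.0f, 12.0f };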

Example

using Aspose.LLM.Abstractions.Parameters;

var preset = new Qwen25Preset();
preset.BaseModelInferenceParameters.SplitMode = LlamaSplitMode.LLAMA_SPLIT_MODE_LAYER;
preset.BaseModelInferenceParameters.GpuLayers = 999;   // offload all layers (999 exceeds any real layer count)
preset.BaseModelInferenceParameters.TensorSplit = new float[] { 2.0f, 1.0f };
// 2/3 of layers on GPU 0 (larger VRAM), 1/3 on GPU 1.

using var api = AsposeLLMApi.Create(preset);
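
A variant for the last row of the table above, where GPU 1 is shared with other work; the 3:1 ratio is an illustrative choice, not a library recommendation:

preset.BaseModelInferenceParameters.TensorSplit = new float[] { 3.0f, 1.0f };
// 3/4 of layers on GPU 0; GPU 1 keeps VRAM free for its other workload.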

Interactions

  • SplitMode — TensorSplit applies only when the mode is LAYER or ROW.
  • GpuLayers — the total number of layers offloaded to GPUs is distributed among them per TensorSplit.
  • MainGpu — ignored when TensorSplit is active.
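
Because TensorSplit is silently inert outside LAYER and ROW modes, a small guard can fail fast on a mismatched configuration. A sketch assuming the preset from the example above; LLAMA_SPLIT_MODE_ROW is inferred by analogy with the LAYER value and should be verified against your library version.

using System;

var p = preset.BaseModelInferenceParameters;

bool splitActive =
    p.SplitMode == LlamaSplitMode.LLAMA_SPLIT_MODE_LAYER ||
    p.SplitMode == LlamaSplitMode.LLAMA_SPLIT_MODE_ROW;   // assumed enum value

if (p.TensorSplit != null && !splitActive)
    throw new InvalidOperationException(
        "TensorSplit is set but SplitMode is not LAYER or ROW; it would be ignored.");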

What’s next