Troubleshooting
When the SDK misbehaves, the failure is usually in one of seven well-known buckets. This section covers each one with the same structure on every page: Symptom (what you see), Cause (what is happening), Resolution (how to fix), and optional Prevention.
Pick the page that matches your symptom from the topics list. If none matches, use the diagnostic flow below to narrow the problem, then check the closest page, then ask for help.
Pre-flight checks
Before diving into a specific page, confirm the basics. The majority of tickets sent to support turn out to be one of these:
- License applied: `Aspose.LLM.License.IsLicensed` returns `true` before any chat method is called. The SDK does not run inference in evaluation mode; see License errors.
- Debug logging on: set `EngineParameters.EnableDebugLogging = true` and pass an `ILogger` to `AsposeLLMApi.Create(preset, logger)`. Native tagged lines (`[MM]`, `[CTX]`, `[KV]`) reveal where a failure happens. See Logging and diagnostics.
- Known good preset: reproduce with a built-in preset like `Qwen25Preset` before suspecting the SDK. Custom presets or manual overrides are the most common source of garbled output.
- Minimal repro: strip down to the smallest possible snippet that fails. If the minimal snippet passes, the problem is in your integration, not the SDK.
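Once debug logging is on, it can help to split the captured output by tag before reading it. The helper below is a minimal sketch, not part of the SDK: `TaggedLogFilter` and `GroupByTag` are hypothetical names, and the code assumes only that a tagged line contains the literal text `[MM]`, `[CTX]`, or `[KV]` somewhere in it. The real log line layout is described in Logging and diagnostics and may call for a stricter match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Aspose.LLM.
public static class TaggedLogFilter
{
    // Tags named in the pre-flight checklist.
    private static readonly string[] Tags = { "[MM]", "[CTX]", "[KV]" };

    // Files each line under the first tag (checked in the order MM, CTX, KV)
    // that it contains. Lines without any known tag are skipped.
    public static Dictionary<string, List<string>> GroupByTag(IEnumerable<string> lines)
    {
        var groups = Tags.ToDictionary(t => t, _ => new List<string>());
        foreach (var line in lines)
        {
            var tag = Tags.FirstOrDefault(
                t => line.IndexOf(t, StringComparison.Ordinal) >= 0);
            if (tag != null)
            {
                groups[tag].Add(line);
            }
        }
        return groups;
    }
}
```

Feeding it the lines of a saved log, for example `TaggedLogFilter.GroupByTag(File.ReadLines(path))`, returns one list per tag, so an empty group hints at a stage that never ran.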
Diagnostic flow
Walk this decision tree when the symptom is not obvious.
1. Does `Create` return at all?
   - No → binary download or model load failed. See Binary download fails or Model not loading.
   - Yes → step 2.
2. Does the first chat call throw?
   - `Not licensed for this method` → see License errors.
   - Out-of-memory → see Out of memory.
   - Other → capture the full stack trace and open a support ticket.
3. Does chat return, but slowly?
   - Much slower than expected → Performance issues or GPU not detected.
   - Slow only on first request → see first-token latency guidance in Reduce first-token latency.
4. Does chat return, but output is wrong?
   - Nonsense / literal marker tokens / repetition loops → Garbled output.
   - Truncated mid-sentence → raise `ChatParameters.MaxTokens`; see Chat parameters.
5. Does memory grow across long sessions?
   - Yes → tune cache cleanup; see Cache management and Out of memory.
Symptom → page shortcut
| Symptom | Start here |
|---|---|
| `HttpRequestException` during `Create` | Binary download fails |
| `InvalidOperationException` during model load | Model not loading |
| `cudaErrorOutOfMemory`, `OutOfMemoryException` | Out of memory |
| Inference runs at CPU speed despite a GPU present | GPU not detected |
| Replies contain `<image>`, `<\|im_start\|>`, etc. verbatim | Garbled output |
| Output loops or repeats | Garbled output |
| `Not licensed for this method` | License errors |
| Unexpectedly high first-token latency | Performance issues |
| Throughput well below hardware expectation | Performance issues |
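
The exception rows of the table can be turned into a small lookup for a top-level `catch` block. This is a sketch under stated assumptions, not SDK behavior: `SymptomRouter` and `SuggestPage` are hypothetical names, and the code matches only on the exception types and message fragments listed in the table. It does not know whether the exception came from `Create` or from model load, so an `InvalidOperationException` raised elsewhere would be routed to the wrong page.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper, not part of Aspose.LLM.
public static class SymptomRouter
{
    // Returns the title of the troubleshooting page suggested by the table.
    public static string SuggestPage(Exception ex)
    {
        // Walk the inner-exception chain: the informative exception may be wrapped.
        for (var e = ex; e != null; e = e.InnerException)
        {
            // Message checks come first because the exception type that
            // carries these messages is not stated in this section.
            if (e.Message.Contains("Not licensed for this method"))
                return "License errors";
            if (e is OutOfMemoryException || e.Message.Contains("cudaErrorOutOfMemory"))
                return "Out of memory";
            if (e is HttpRequestException)
                return "Binary download fails";
            if (e is InvalidOperationException)
                return "Model not loading";
        }
        // Nothing matched: fall back to the support route.
        return "Asking for help";
    }
}
```

Logging `SymptomRouter.SuggestPage(ex)` next to the stack trace gives whoever reads the log a starting page. The non-exception symptoms (slow inference, garbled replies) still need the diagnostic flow.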
Topics
- Binary download fails: `BinaryManager` cannot reach GitHub, TLS interception, disk space.
- Out of memory: GPU VRAM, system RAM, KV cache growth.
- GPU not detected: driver, CUDA version, `PreferredAcceleration`, container flags.
- Model not loading: corrupt GGUF, unsupported architecture, wrong file name.
- Garbled output: template mismatch, repetition loops, truncation, vision misalignment.
- License errors: missing `SetLicense`, expired temporary license, embedded resource mis-naming.
- Performance issues: low throughput, latency spikes, thread contention, thermal throttling.
Asking for help
When none of the above match, or the fix does not stick, open a thread on the Aspose Support Forum with:
- SDK version (`Aspose.LLM` NuGet version).
- Host OS and architecture.
- GPU model and driver version (if applicable).
- Preset class name and any overrides you applied.
- Full log from a reproducing run with `EnableDebugLogging = true`.
- A minimal code sample that reproduces the issue.
- The expected versus actual output.
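
Part of this checklist can be gathered programmatically. The sketch below uses only standard .NET APIs. `SupportReport` and `Build` are hypothetical names, and the code assumes the SDK assembly is named `Aspose.LLM` like the NuGet package, which this section does not confirm. The assembly version it reports can differ from the NuGet package version, so check it against your project file. The GPU model, driver version, and debug log are not collected here.

```csharp
using System;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper, not part of Aspose.LLM.
public static class SupportReport
{
    public static string Build(string presetName, string overrides)
    {
        // ASSUMPTION: the loaded SDK assembly is named "Aspose.LLM".
        var sdk = AppDomain.CurrentDomain.GetAssemblies()
            .FirstOrDefault(a => a.GetName().Name == "Aspose.LLM");
        var version = sdk?.GetName().Version?.ToString()
            ?? "unknown (assembly not loaded)";

        var sb = new StringBuilder();
        sb.AppendLine("SDK version: " + version);
        sb.AppendLine("OS: " + RuntimeInformation.OSDescription);
        sb.AppendLine("Architecture: " + RuntimeInformation.OSArchitecture);
        sb.AppendLine("Preset: " + presetName);
        sb.AppendLine("Overrides: " + overrides);
        return sb.ToString();
    }
}
```

Calling `SupportReport.Build("Qwen25Preset", "none")` after the SDK has been used yields a text block that can be pasted at the top of a forum post.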
For paid support, use the Aspose Helpdesk.
What’s next
- Logging and diagnostics: tagged log taxonomy and `parse_mm_logs.zsh` helper.
- Debugging vision: multimodal-specific diagnosis.
- How-to recipes: related task-focused guides.