Product overview

Aspose.LLM for .NET is a library for integrating large language models into your .NET applications. You run models on your own infrastructure — on-premise or in a controlled cloud environment — and interact with them through a managed API: create an instance from a preset, start chat sessions, send messages, and optionally save or load conversation state.

This section covers what the SDK does, how it is built, and what it supports.

Sections

Architecture — four-layer design (facade, engine, P/Invoke, native), runtime flow on first Create, memory footprint, and lifecycle.
Features — capabilities in detail, plus explicit scope limits (no streaming, no function calling, no fine-tuning, no audio).
Supported presets — built-in text and vision presets with their Hugging Face model sources and default parameters.
Supported acceleration — CUDA, HIP, Metal, Vulkan, and CPU backends, with a platform × backend matrix and first-run download sizes.

At a glance

Preset-based setup — built-in presets for Qwen 2.5 / Qwen 3, Gemma 3, Llama 3.2, Phi 4, DeepSeek, and gpt-oss-20b. Extend PresetCoreBase to bring your own GGUF model.
Chat sessions — create sessions with StartNewChatAsync, send messages with SendMessageAsync or SendMessageToSessionAsync, and maintain multi-turn conversations per session.
Session persistence — SaveChatSession and LoadChatSession serialize a session to disk and restore it later.
Optional multimodal input — pass images (JPEG, PNG, BMP, GIF, WebP; up to 50 MB each) alongside prompts when using a vision preset.
Hardware acceleration — CUDA, HIP, Metal, Vulkan, or CPU with AVX2 / AVX512. Native binaries download automatically on first use.
Single instance per process — one AsposeLLMApi instance at a time. Create it once and reuse it for all sessions.
Licensing — apply a commercial license via License.SetLicense; check status with License.IsLicensed. A free temporary license is available for evaluation and proof-of-concept work. Inference requires an applied license: the SDK does not run chat APIs in unlicensed evaluation mode.
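
The "single instance per process" point above can be enforced in application code with a lazily created shared holder. The sketch below is an illustration only and uses no SDK calls: FakeLlmApi and LlmHost are hypothetical names standing in for the real AsposeLLMApi object, whose creation call is not shown on this page. It relies solely on the standard .NET Lazy<T> type.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the SDK's API object. The real AsposeLLMApi
// creation call is not shown in this overview, so none is assumed here.
public sealed class FakeLlmApi
{
    // Counts constructor calls so the example can show that only one happens.
    public static int InstancesCreated;

    public FakeLlmApi() => Interlocked.Increment(ref InstancesCreated);
}

// Hypothetical holder that hands out one shared instance for the whole process.
public static class LlmHost
{
    // ExecutionAndPublication makes Lazy<T> run the factory at most once,
    // even if several threads request the instance at the same time.
    private static readonly Lazy<FakeLlmApi> Shared =
        new Lazy<FakeLlmApi>(
            () => new FakeLlmApi(),
            LazyThreadSafetyMode.ExecutionAndPublication);

    public static FakeLlmApi Api => Shared.Value;
}

public static class Program
{
    public static void Main()
    {
        var first = LlmHost.Api;   // factory runs here, on first access
        var second = LlmHost.Api;  // same object is returned again

        Console.WriteLine(ReferenceEquals(first, second)); // prints "True"
        Console.WriteLine(FakeLlmApi.InstancesCreated);    // prints "1"
    }
}
```

In a real application the factory lambda would presumably wrap whatever call creates the instance from a preset, and every chat session would go through the shared object instead of constructing a new one.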

What’s next

Architecture — layered design and runtime flow.
Features — full capability list and scope limits.
Supported presets — pick a preset for your model and hardware.
Supported acceleration — platform / backend matrix.
Getting started — install, license, and run the first example.
