Supported presets

Aspose.LLM for .NET bundles ready-to-use presets for several popular open-weight model families. Each preset specifies the model source (Hugging Face repository and file name), context size, chat template, and sampler defaults. Pass a preset to AsposeLLMApi.Create(preset) and the engine downloads the model and any vision projector on first use.

All presets derive from PresetCoreBase (namespace Aspose.LLM.Abstractions.Parameters.Presets). You can use a preset as-is, override any parameter before calling Create, or extend PresetCoreBase for a fully custom model.

Text presets

The text catalog is grouped by size and specialty. All presets ship Q4_K_M quantization unless noted otherwise. Every preset can be used out of the box — pass it to AsposeLLMApi.Create(preset) and the engine downloads the GGUF from the listed Hugging Face source on first run.

Large general-purpose (7-8B)

The default tier for production chat. Balanced quality and speed; expect 6-10 GB RAM/VRAM at 32K context.

Preset	Model	Hugging Face source	Default context
`Qwen25Preset`	Qwen 2.5 7B Instruct	`bartowski/Qwen2.5-7B-Instruct-GGUF`	32 768
`Qwen3Preset`	Qwen 3 8B	`bartowski/Qwen_Qwen3-8B-GGUF`	32 768
`Llama31_8BPreset`	Meta Llama 3.1 8B Instruct	`bartowski/Meta-Llama-3.1-8B-Instruct-GGUF`	32 768
`Mistral7Preset`	Mistral 7B Instruct v0.3	`bartowski/Mistral-7B-Instruct-v0.3-GGUF`	32 768
`Granite3_8BPreset`	IBM Granite 3.1 8B Instruct	`bartowski/granite-3.1-8b-instruct-GGUF`	32 768
`AyaExpanse8BPreset`	Cohere Aya Expanse 8B (multilingual, 23 languages)	`bartowski/aya-expanse-8b-GGUF`	8 192
`OpenChat3_5Preset`	OpenChat 3.5 (Mistral-7B base)	`TheBloke/openchat-3.5-0106-GGUF`	8 192
`Gemma3Preset`	Google Gemma 3	`mradermacher/gemma-3-GGUF`	8 192

Mid-size (3-6B)

Sweet spot for laptops with a discrete GPU or 16 GB-class systems.

Preset	Model	Hugging Face source	Default context
`Yi_6BPreset`	01.AI Yi 1.5 6B Chat	`bartowski/Yi-1.5-6B-Chat-GGUF`	8 192
`Phi35MiniPreset`	Microsoft Phi 3.5 Mini Instruct (~3.8B)	`bartowski/Phi-3.5-mini-instruct-GGUF`	32 768
`Phi4Preset`	Microsoft Phi 4 Mini Instruct	`unsloth/Phi-4-mini-instruct-GGUF`	16 384
`MiniCPM3_4BPreset`	OpenBMB MiniCPM3 4B	`openbmb/MiniCPM3-4B-GGUF`	32 768
`Llama32Preset`	Meta Llama 3.2 3B Instruct	`bartowski/Llama-3.2-3B-Instruct-GGUF`	131 072
`Qwen25_3BPreset`	Qwen 2.5 3B Instruct	`Qwen/Qwen2.5-3B-Instruct-GGUF`	32 768

Small and edge (≤2B)

CPU-only deployments, tutorials, smoke tests, and constrained-memory hosts.

Preset	Model	Hugging Face source	Default context
`SmolLM2_1_7BPreset`	HuggingFaceTB SmolLM2 1.7B Instruct	`HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF`	8 192
`Llama32_1BPreset`	Meta Llama 3.2 1B Instruct (edge sibling of `Llama32Preset`)	`bartowski/Llama-3.2-1B-Instruct-GGUF`	16 384
`TinyLlamaPreset`	TinyLlama 1.1B Chat v1.0 (smoke-test baseline)	`TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF`	2 048
`SmallModelPreset`	Qwen 2 0.5B Instruct (CPU-first; ~400 MB on disk)	`QuantFactory/Qwen2-0.5B-Instruct-GGUF`	4 096

SmallModelPreset defaults to GpuLayers = 0, OffloadKqv = false, FlashAttention = false so it runs on any laptop without a GPU. Switch to GpuLayers = -1 to offload everything to GPU when one is available.

Coding-focused

Trained or fine-tuned on code; pick these over the general models if you need accurate completions and refactoring suggestions.

Preset	Model	Hugging Face source	Default context	Quantization
`DeepSeekCoder2Preset`	DeepSeek-Coder-V2-Lite Instruct	`lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF`	163 840	IQ3_M
`Qwen25Coder7BPreset`	Qwen 2.5 Coder 7B Instruct	`Qwen/Qwen2.5-Coder-7B-Instruct-GGUF`	32 768	Q4_K_M
`StableCode3BPreset`	Stability AI Stable Code 3B	`TheBloke/stable-code-3b-GGUF`	16 384	Q4_K_M

Reasoning / chain-of-thought

These models emit explicit step-by-step reasoning. Budget MaxTokens = 1024-2048 and expect noticeably higher latency than a general 7B preset.

Preset	Model	Hugging Face source	Default context
`DeepseekR1Qwen3Preset`	DeepSeek-R1 distilled from Qwen 3 8B	`lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-GGUF`	131 072
`Oss20Preset`	gpt-oss 20B (multilingual reasoner fine-tune)	`mradermacher/gpt-oss-20b-multilingual-reasoner-i1-GGUF`	131 072

Baseline template

Preset	Model	Hugging Face source	Default context	Quantization
`UnifiedDefaultLlmParameters`	Baseline template (no model source)	—	4 096	—

UnifiedDefaultLlmParameters is a conservative CPU-safe template. It sets only context, threads, and sampler defaults; you must set a model source yourself before calling Create.

Notes:

Memory requirements scale with context size and quantization. A 7B Q4_K_M model at 32K context needs roughly 6-8 GB of RAM (or GPU memory) for the weights plus KV cache. Longer contexts and larger models need more — see Features for memory guidance.
All presets ship with preset-specific sampler tuning (temperature, repetition penalty, penalty context size, etc.). Override any field on preset.SamplerParameters before calling Create to change behavior.
Hugging Face sources are verified publicly accessible at release time (no gated repos, no removed mirrors). If a source becomes unavailable, set preset.BaseModelSourceParameters to a different mirror or local file before calling Create.

Vision presets

Vision presets configure both the base language model and its multimodal projector (mmproj). Pass image bytes via the media parameter of SendMessageAsync or SendMessageToSessionAsync.

Preset	Model	Hugging Face source	`mmproj` file	Default context	Quantization
`Qwen25VL3BPreset`	Qwen 2.5 VL 3B Instruct	`unsloth/Qwen2.5-VL-3B-Instruct-GGUF`	`mmproj-F16.gguf`	128 000	UD-IQ2_XXS
`Qwen3VL2BPreset`	Qwen 3 VL 2B Instruct	`Qwen/Qwen3-VL-2B-Instruct-GGUF`	`mmproj-Qwen3VL-2B-Instruct-Q8_0.gguf`	262 144	Q4_K_M
`Gemma3VisionPreset`	Gemma 3 Vision (Latex fine-tune)	`mradermacher/Gemma-3-Vision-Latex-GGUF`	`Gemma-3-Vision-Latex.mmproj-f16.gguf`	8 096	Q4_K_M
`Ministral3VisionPreset`	Ministral 3 8B Instruct (Mistral AI, 2512 release)	`mistralai/Ministral-3-8B-Instruct-2512-GGUF`	`Ministral-3-8B-Instruct-2512-BF16-mmproj.gguf`	262 144	Q4_K_M

Supported image formats across all vision presets: JPEG, PNG, BMP, GIF, WebP. Maximum per-attachment size: 50 MB.

Default preset

AsposeLLMApi.GetDefaultPreset() returns a fresh Qwen25Preset instance — useful as a sensible starting point when you do not know which preset to pick. For raw parameter values without a full preset, call await api.GetDefaultParametersAsync().

Picking a preset

If you want…	Try
A balanced general-purpose model	`Qwen25Preset`, `Qwen3Preset`, `Llama31_8BPreset`, or `Mistral7Preset`
A small, fast model	`Llama32Preset` (3B), `Qwen25_3BPreset` (3B), or `Phi4Preset` (mini)
The smallest possible footprint	`SmallModelPreset` (0.5B CPU-first), `TinyLlamaPreset` (1.1B), or `Llama32_1BPreset` (1B)
A long-context model	`Llama32Preset` or `Oss20Preset` (131K), `DeepSeekCoder2Preset` (163K)
A coding-focused model	`DeepSeekCoder2Preset`, `Qwen25Coder7BPreset`, or `StableCode3BPreset`
A reasoning-tuned model	`DeepseekR1Qwen3Preset` or `Oss20Preset` (multilingual-reasoner)
Strong multilingual coverage	`AyaExpanse8BPreset` (23 languages) or `Oss20Preset`
An enterprise-tuned model	`Granite3_8BPreset` (IBM Granite 3.1)
Image input	`Qwen3VL2BPreset` (small, very long context) or `Qwen25VL3BPreset` (3B)
A Mistral family vision model	`Ministral3VisionPreset`

What’s next

Presets — preset base class, parameter bags, and override patterns.
Custom preset — extend or replace a built-in preset.
Features — full list of capabilities and limits.
Hello, world! — a minimal runnable example using Qwen25Preset.

Architecture Supported acceleration