Supported presets
Aspose.LLM for .NET bundles ready-to-use presets for several popular open-weight model families. Each preset specifies the model source (Hugging Face repository and file name), context size, chat template, and sampler defaults. Pass a preset to AsposeLLMApi.Create(preset) and the engine downloads the model and any vision projector on first use.
All presets derive from PresetCoreBase (namespace Aspose.LLM.Abstractions.Parameters.Presets). You can use a preset as-is, override any parameter before calling Create, or extend PresetCoreBase for a fully custom model.
Text presets
The text catalog is grouped by size and specialty. All presets ship Q4_K_M quantization unless noted otherwise. Every preset can be used out of the box — pass it to AsposeLLMApi.Create(preset) and the engine downloads the GGUF from the listed Hugging Face source on first run.
Large general-purpose (7-8B)
The default tier for production chat. Balanced quality and speed; expect 6-10 GB RAM/VRAM at 32K context.
| Preset | Model | Hugging Face source | Default context |
|---|---|---|---|
Qwen25Preset |
Qwen 2.5 7B Instruct | bartowski/Qwen2.5-7B-Instruct-GGUF |
32 768 |
Qwen3Preset |
Qwen 3 8B | bartowski/Qwen_Qwen3-8B-GGUF |
32 768 |
Llama31_8BPreset |
Meta Llama 3.1 8B Instruct | bartowski/Meta-Llama-3.1-8B-Instruct-GGUF |
32 768 |
Mistral7Preset |
Mistral 7B Instruct v0.3 | bartowski/Mistral-7B-Instruct-v0.3-GGUF |
32 768 |
Granite3_8BPreset |
IBM Granite 3.1 8B Instruct | bartowski/granite-3.1-8b-instruct-GGUF |
32 768 |
AyaExpanse8BPreset |
Cohere Aya Expanse 8B (multilingual, 23 languages) | bartowski/aya-expanse-8b-GGUF |
8 192 |
OpenChat3_5Preset |
OpenChat 3.5 (Mistral-7B base) | TheBloke/openchat-3.5-0106-GGUF |
8 192 |
Gemma3Preset |
Google Gemma 3 | mradermacher/gemma-3-GGUF |
8 192 |
Mid-size (3-6B)
Sweet spot for laptops with a discrete GPU or 16 GB-class systems.
| Preset | Model | Hugging Face source | Default context |
|---|---|---|---|
Yi_6BPreset |
01.AI Yi 1.5 6B Chat | bartowski/Yi-1.5-6B-Chat-GGUF |
8 192 |
Phi35MiniPreset |
Microsoft Phi 3.5 Mini Instruct (~3.8B) | bartowski/Phi-3.5-mini-instruct-GGUF |
32 768 |
Phi4Preset |
Microsoft Phi 4 Mini Instruct | unsloth/Phi-4-mini-instruct-GGUF |
16 384 |
MiniCPM3_4BPreset |
OpenBMB MiniCPM3 4B | openbmb/MiniCPM3-4B-GGUF |
32 768 |
Llama32Preset |
Meta Llama 3.2 3B Instruct | bartowski/Llama-3.2-3B-Instruct-GGUF |
131 072 |
Qwen25_3BPreset |
Qwen 2.5 3B Instruct | Qwen/Qwen2.5-3B-Instruct-GGUF |
32 768 |
Small and edge (≤2B)
CPU-only deployments, tutorials, smoke tests, and constrained-memory hosts.
| Preset | Model | Hugging Face source | Default context |
|---|---|---|---|
SmolLM2_1_7BPreset |
HuggingFaceTB SmolLM2 1.7B Instruct | HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF |
8 192 |
Llama32_1BPreset |
Meta Llama 3.2 1B Instruct (edge sibling of Llama32Preset) |
bartowski/Llama-3.2-1B-Instruct-GGUF |
16 384 |
TinyLlamaPreset |
TinyLlama 1.1B Chat v1.0 (smoke-test baseline) | TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF |
2 048 |
SmallModelPreset |
Qwen 2 0.5B Instruct (CPU-first; ~400 MB on disk) | QuantFactory/Qwen2-0.5B-Instruct-GGUF |
4 096 |
SmallModelPreset defaults to GpuLayers = 0, OffloadKqv = false, FlashAttention = false so it runs on any laptop without a GPU. Switch to GpuLayers = -1 to offload everything to GPU when one is available.
Coding-focused
Trained or fine-tuned on code; pick these over the general models if you need accurate completions and refactoring suggestions.
| Preset | Model | Hugging Face source | Default context | Quantization |
|---|---|---|---|---|
DeepSeekCoder2Preset |
DeepSeek-Coder-V2-Lite Instruct | lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF |
163 840 | IQ3_M |
Qwen25Coder7BPreset |
Qwen 2.5 Coder 7B Instruct | Qwen/Qwen2.5-Coder-7B-Instruct-GGUF |
32 768 | Q4_K_M |
StableCode3BPreset |
Stability AI Stable Code 3B | TheBloke/stable-code-3b-GGUF |
16 384 | Q4_K_M |
Reasoning / chain-of-thought
These models emit explicit step-by-step reasoning. Budget MaxTokens = 1024-2048 and expect noticeably higher latency than a general 7B preset.
| Preset | Model | Hugging Face source | Default context |
|---|---|---|---|
DeepseekR1Qwen3Preset |
DeepSeek-R1 distilled from Qwen 3 8B | lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-GGUF |
131 072 |
Oss20Preset |
gpt-oss 20B (multilingual reasoner fine-tune) | mradermacher/gpt-oss-20b-multilingual-reasoner-i1-GGUF |
131 072 |
Baseline template
| Preset | Model | Hugging Face source | Default context | Quantization |
|---|---|---|---|---|
UnifiedDefaultLlmParameters |
Baseline template (no model source) | — | 4 096 | — |
UnifiedDefaultLlmParameters is a conservative CPU-safe template. It sets only context, threads, and sampler defaults; you must set a model source yourself before calling Create.
Notes:
- Memory requirements scale with context size and quantization. A 7B Q4_K_M model at 32K context needs roughly 6-8 GB of RAM (or GPU memory) for the weights plus KV cache. Longer contexts and larger models need more — see Features for memory guidance.
- All presets ship with preset-specific sampler tuning (temperature, repetition penalty, penalty context size, etc.). Override any field on
preset.SamplerParametersbefore callingCreateto change behavior. - Hugging Face sources are verified publicly accessible at release time (no gated repos, no removed mirrors). If a source becomes unavailable, set
preset.BaseModelSourceParametersto a different mirror or local file before callingCreate.
Vision presets
Vision presets configure both the base language model and its multimodal projector (mmproj). Pass image bytes via the media parameter of SendMessageAsync or SendMessageToSessionAsync.
| Preset | Model | Hugging Face source | mmproj file |
Default context | Quantization |
|---|---|---|---|---|---|
Qwen25VL3BPreset |
Qwen 2.5 VL 3B Instruct | unsloth/Qwen2.5-VL-3B-Instruct-GGUF |
mmproj-F16.gguf |
128 000 | UD-IQ2_XXS |
Qwen3VL2BPreset |
Qwen 3 VL 2B Instruct | Qwen/Qwen3-VL-2B-Instruct-GGUF |
mmproj-Qwen3VL-2B-Instruct-Q8_0.gguf |
262 144 | Q4_K_M |
Gemma3VisionPreset |
Gemma 3 Vision (Latex fine-tune) | mradermacher/Gemma-3-Vision-Latex-GGUF |
Gemma-3-Vision-Latex.mmproj-f16.gguf |
8 096 | Q4_K_M |
Ministral3VisionPreset |
Ministral 3 8B Instruct (Mistral AI, 2512 release) | mistralai/Ministral-3-8B-Instruct-2512-GGUF |
Ministral-3-8B-Instruct-2512-BF16-mmproj.gguf |
262 144 | Q4_K_M |
Supported image formats across all vision presets: JPEG, PNG, BMP, GIF, WebP. Maximum per-attachment size: 50 MB.
Default preset
AsposeLLMApi.GetDefaultPreset() returns a fresh Qwen25Preset instance — useful as a sensible starting point when you do not know which preset to pick. For raw parameter values without a full preset, call await api.GetDefaultParametersAsync().
Picking a preset
| If you want… | Try |
|---|---|
| A balanced general-purpose model | Qwen25Preset, Qwen3Preset, Llama31_8BPreset, or Mistral7Preset |
| A small, fast model | Llama32Preset (3B), Qwen25_3BPreset (3B), or Phi4Preset (mini) |
| The smallest possible footprint | SmallModelPreset (0.5B CPU-first), TinyLlamaPreset (1.1B), or Llama32_1BPreset (1B) |
| A long-context model | Llama32Preset or Oss20Preset (131K), DeepSeekCoder2Preset (163K) |
| A coding-focused model | DeepSeekCoder2Preset, Qwen25Coder7BPreset, or StableCode3BPreset |
| A reasoning-tuned model | DeepseekR1Qwen3Preset or Oss20Preset (multilingual-reasoner) |
| Strong multilingual coverage | AyaExpanse8BPreset (23 languages) or Oss20Preset |
| An enterprise-tuned model | Granite3_8BPreset (IBM Granite 3.1) |
| Image input | Qwen3VL2BPreset (small, very long context) or Qwen25VL3BPreset (3B) |
| A Mistral family vision model | Ministral3VisionPreset |
What’s next
- Presets — preset base class, parameter bags, and override patterns.
- Custom preset — extend or replace a built-in preset.
- Features — full list of capabilities and limits.
- Hello, world! — a minimal runnable example using
Qwen25Preset.