Custom model loader
IModelLoader is the contract the engine uses to turn ModelSourceParameters into an ILlamaModel. Substituting this interface gives you full control over the model-loading pipeline — file resolution, native llama.cpp model creation, inference parameter application, and any diagnostics you want to inject around it.
This is the broadest extensibility point. Prefer IModelFileProvider if you only need to change where the model file comes from; IModelLoader is for changing how it is loaded.
Interface reference
namespace Aspose.LLM.Abstractions.Interfaces;

public interface IModelLoader
{
    Task<ILlamaModel> LoadModelAsync(
        ModelSourceParameters modelParameters,
        ModelInferenceParameters inferenceParameters,
        IProgress<double>? progress = null,
        CancellationToken cancellationToken = default);
}
The method:
- Takes `ModelSourceParameters` (where to find the model) and `ModelInferenceParameters` (how to load it — `GpuLayers`, `SplitMode`, etc.).
- Reports progress via `IProgress<double>` (0.0 to 1.0) during long operations.
- Returns an `ILlamaModel` — the loaded model ready for inference.
- Throws `ArgumentNullException` on null `modelParameters`.
- Throws `InvalidOperationException` when the model cannot be loaded.
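Resolved from dependency injection, a call to the loader might look like this. This is a sketch: the property names `ModelFilePath` and `GpuLayers` are taken from elsewhere on this page, and `loader` stands in for whatever `IModelLoader` instance your container provides.

```csharp
// Sketch only — parameter property names follow this page's examples and
// may differ in your SDK version.
var sourceParams = new ModelSourceParameters { ModelFilePath = "models/qwen2.5.gguf" };
var inferenceParams = new ModelInferenceParameters { GpuLayers = 32 };

// Surface load progress (0.0 to 1.0) to the console.
var progress = new Progress<double>(p => Console.WriteLine($"Loading: {p:P0}"));

// Bound the load with a cancellation timeout.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
ILlamaModel model = await loader.LoadModelAsync(
    sourceParams, inferenceParams, progress, cts.Token);
```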
Implementation skeleton
using Aspose.LLM.Abstractions.Interfaces;
using Aspose.LLM.Abstractions.Parameters;

public class MyModelLoader : IModelLoader
{
    public async Task<ILlamaModel> LoadModelAsync(
        ModelSourceParameters modelParameters,
        ModelInferenceParameters inferenceParameters,
        IProgress<double>? progress = null,
        CancellationToken cancellationToken = default)
    {
        if (modelParameters is null)
            throw new ArgumentNullException(nameof(modelParameters));

        // 1. Resolve the model file path (use IModelFileProvider or your own logic).
        progress?.Report(0.1);
        string modelFilePath = await ResolveModelFileAsync(modelParameters, cancellationToken);

        // 2. Instrument or validate as needed.
        progress?.Report(0.5);
        ValidateModelFile(modelFilePath);

        // 3. Delegate to the SDK's native loading path, or call llama.cpp yourself.
        progress?.Report(0.9);
        ILlamaModel model = await LoadIntoNativeMemoryAsync(
            modelFilePath,
            inferenceParameters,
            cancellationToken);

        progress?.Report(1.0);
        return model;
    }

    // ... helper methods
}
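The helper methods are yours to define. As one example, `ValidateModelFile` could verify the file exists and carries the GGUF magic bytes before the expensive native load. This is a hedged sketch (the helper name comes from the skeleton above; the check assumes GGUF-format model files, which llama.cpp uses):

```csharp
using System;
using System.IO;

// Hypothetical helper for the skeleton above: fail fast on a missing or
// non-GGUF file instead of inside the native loader.
private static void ValidateModelFile(string modelFilePath)
{
    if (!File.Exists(modelFilePath))
        throw new InvalidOperationException($"Model file not found: {modelFilePath}");

    // GGUF files begin with the ASCII magic bytes "GGUF".
    Span<byte> magic = stackalloc byte[4];
    using FileStream fs = File.OpenRead(modelFilePath);
    if (fs.Read(magic) != 4 || !magic.SequenceEqual("GGUF"u8))
        throw new InvalidOperationException($"Not a GGUF model file: {modelFilePath}");
}
```

Throwing `InvalidOperationException` here matches the contract described earlier: it is the documented exception for a model that cannot be loaded.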
Implementing LoadIntoNativeMemoryAsync from scratch means wrapping the SDK’s P/Invoke layer (Aspose.LLM.Interop). That is advanced work — you are essentially recreating ModelManager with custom behavior. In most cases, the simpler path is:
- Use the default model loading via `ModelManager`.
- Wrap it in a decorator that adds instrumentation, caching, or retries.
Decorator pattern
using Microsoft.Extensions.Logging;
using Aspose.LLM.Abstractions.Interfaces;
using Aspose.LLM.Abstractions.Parameters;

public class LoggingModelLoader : IModelLoader
{
    private readonly IModelLoader _inner;
    private readonly ILogger<LoggingModelLoader> _logger;

    public LoggingModelLoader(IModelLoader inner, ILogger<LoggingModelLoader> logger)
    {
        _inner = inner;
        _logger = logger;
    }

    public async Task<ILlamaModel> LoadModelAsync(
        ModelSourceParameters modelParameters,
        ModelInferenceParameters inferenceParameters,
        IProgress<double>? progress = null,
        CancellationToken cancellationToken = default)
    {
        _logger.LogInformation("Loading model from {Source}",
            modelParameters.ModelFilePath ?? modelParameters.HuggingFaceRepoId);
        var sw = System.Diagnostics.Stopwatch.StartNew();
        try
        {
            var model = await _inner.LoadModelAsync(
                modelParameters, inferenceParameters, progress, cancellationToken);
            _logger.LogInformation("Model loaded in {Elapsed}", sw.Elapsed);
            return model;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Model load failed after {Elapsed}", sw.Elapsed);
            throw;
        }
    }
}
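The same pattern covers caching. A sketch of a caching decorator, assuming the `ModelFilePath` / `HuggingFaceRepoId` properties used in the logging example above (adjust the cache key to whatever uniquely identifies a model source in your setup):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Aspose.LLM.Abstractions.Interfaces;
using Aspose.LLM.Abstractions.Parameters;

// Sketch: reuse an already-loaded model for identical source parameters.
public class CachingModelLoader : IModelLoader
{
    private readonly IModelLoader _inner;
    private readonly ConcurrentDictionary<string, Task<ILlamaModel>> _cache = new();

    public CachingModelLoader(IModelLoader inner) => _inner = inner;

    public Task<ILlamaModel> LoadModelAsync(
        ModelSourceParameters modelParameters,
        ModelInferenceParameters inferenceParameters,
        IProgress<double>? progress = null,
        CancellationToken cancellationToken = default)
    {
        // Key assumption: file path or repo id uniquely identifies the model.
        string key = modelParameters.ModelFilePath
            ?? modelParameters.HuggingFaceRepoId
            ?? throw new ArgumentException("No model source specified.");

        // GetOrAdd lets concurrent callers share a single in-flight load per key.
        return _cache.GetOrAdd(key, _ => _inner.LoadModelAsync(
            modelParameters, inferenceParameters, progress, cancellationToken));
    }
}
```

Note that caching the `Task` rather than the model means a failed load stays cached too; evict faulted entries if you need retry behavior.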
Registration
Register your implementation after AddLlamaServices:
services.AddLlamaServices(new Qwen25Preset());
services.AddSingleton<IModelLoader, MyModelLoader>();
If you want to decorate the default loader, keep the default registration and wrap it — but the default ModelManager is registered as a concrete type (ModelManager), not as IModelLoader. You may need to adapt your decorator accordingly, or contact Aspose support for guidance on the current wiring.
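If you register your own base loader, composing a decorator over it is straightforward with a factory registration. A sketch using the types from this page (this wraps your `MyModelLoader`, not the built-in `ModelManager`):

```csharp
services.AddLlamaServices(new Qwen25Preset());

// Register the base loader as a concrete type, then expose the decorated
// chain as IModelLoader via a factory.
services.AddSingleton<MyModelLoader>();
services.AddSingleton<IModelLoader>(sp => new LoggingModelLoader(
    sp.GetRequiredService<MyModelLoader>(),
    sp.GetRequiredService<ILogger<LoggingModelLoader>>()));
```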
What’s next
- Custom file provider — narrower surface, often what you actually need.
- Extensibility overview — when to use which interface.
- Dependency injection — how `AddLlamaServices` wires services.