AI Plugin Integration
This document summarizes three AI-powered document processing tools—MarkItDown, Marker, and Docling—highlighting their common AI features and their integration with Aspose.Cells for Python via .NET through plugins.
1. Common AI Features
1.1 Multi-format Document Parsing and Structured Representation
All three projects support parsing multiple document formats, including PDF, DOCX, PPTX, XLSX, HTML, etc., and converting them into structured formats (Markdown, JSON, or HTML) suitable for AI processing.
- MarkItDown: Converts documents into Markdown format, easily integrated with LLMs and text analysis pipelines.
- Marker: Supports Markdown, JSON, and HTML output while preserving tables, formulas, and other content.
- Docling: Provides a unified DoclingDocument representation, supporting multi-format document parsing and structured export.
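To make "structured formats suitable for AI processing" concrete, the sketch below renders the same small table as Markdown and as JSON. It is an illustration only: the function names and sample rows are hypothetical, and it does not reflect how MarkItDown, Marker, or Docling implement their exporters.

```python
import json


def rows_to_markdown(rows):
    """Render rows (first row is the header) as a Markdown pipe table."""
    header, *body = rows
    lines = [
        "| " + " | ".join(str(cell) for cell in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in body:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


def rows_to_json(rows):
    """Render rows (first row is the header) as a JSON array of records."""
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body], indent=2)


# Hypothetical worksheet content, standing in for data read from an XLSX file.
sample_rows = [["Region", "Sales"], ["North", 120], ["South", 95]]
print(rows_to_markdown(sample_rows))
print(rows_to_json(sample_rows))
```

Markdown keeps the table readable inside an LLM prompt, while the JSON form is easier to load back into code.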
1.2 Integration with Generative AI Frameworks
All three tools support integration with generative AI frameworks to enhance document processing capabilities:
- MarkItDown: Integrates with Azure OpenAI to improve document and image processing.
- Marker: Supports leveraging large language model (LLM) technology to improve the accuracy of document processing.
- Docling: Compatible with frameworks like LangChain, LlamaIndex, and Haystack for agentic AI applications.
2. Aspose.Cells for Python via .NET Plugin Integration
To combine Excel data with these AI document processing tools, we developed dedicated plugins for each tool:
| Plugin | Repository | Functionality |
|---|---|---|
| MarkItDown Plugin | markitdown-aspose-cells-plugin | Converts Excel files to Markdown format. |
| Marker Plugin | marker plugin | Converts Excel files into Marker-supported formats (Markdown, JSON, or HTML), leveraging Marker’s LLM mode for improved table handling. |
| Docling Plugin | docling plugin | Converts Excel files into DoclingDocument objects, then exports Markdown, JSON, or HTML for multi-modal analysis. |
2.1 Plugin Advantages
- Fast Excel Conversion: Quickly converts Excel content into Markdown, JSON, or HTML for AI processing.
- Preserve Table and Format Information: Largely preserves the original table data and structure, which helps maintain data integrity.
- Compatible with AI Tools: Can be directly used as input for MarkItDown, Marker, or Docling, leveraging advanced parsing features.
- Easy Installation: Requires only the additional installation of Aspose.Cells for Python via .NET (aspose-cells-python) and following the README instructions in the plugin directories.
3. Installation and Usage
3.1 MarkItDown Plugin
Install the plugin from the current directory:
pip install -e .
Verify installation:
markitdown --list-plugins
Convert an XLSX file using the plugin:
markitdown --use-plugins test.xlsx
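When the conversion needs to run from a script rather than a terminal, the command above can be wrapped with `subprocess`. This is a minimal sketch, assuming the `markitdown` CLI is on PATH and writes the Markdown to standard output; the helper names are hypothetical.

```python
import shutil
import subprocess


def build_markitdown_command(xlsx_path, use_plugins=True):
    """Build the argument list for the markitdown CLI call shown above."""
    cmd = ["markitdown"]
    if use_plugins:
        cmd.append("--use-plugins")
    cmd.append(str(xlsx_path))
    return cmd


def convert_to_markdown(xlsx_path):
    """Return the Markdown printed by markitdown, or None if the CLI is missing."""
    if shutil.which("markitdown") is None:
        return None
    result = subprocess.run(
        build_markitdown_command(xlsx_path),
        capture_output=True,  # collect stdout instead of printing it
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling `convert_to_markdown("test.xlsx")` would return the converted text as a string, ready to be written to a file or passed to a downstream pipeline.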
3.2 Marker Plugin
You’ll need Python 3.10+ and PyTorch.
pip install marker-pdf
For non-PDF documents, install full dependencies:
pip install marker-pdf[full]
Convert a single file:
marker_single /path/to/test.xlsx
marker_single /path/to/test.xlsx --output_format html
3.3 Docling Plugin
Install Docling from the current directory:
pip install -e .
Convert Excel files to different formats:
docling /path/test.xlsx --to html
docling /path/test.xlsx --to md
docling /path/test.xlsx --to json
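The three commands above differ only in the value passed to `--to`, so they can be generated in a loop. The sketch below assumes the `docling` CLI is on PATH; the helper names are hypothetical, and only the `--to` values shown above are used.

```python
import shutil
import subprocess

# Output formats taken from the commands above.
DOCLING_FORMATS = ("html", "md", "json")


def build_docling_commands(xlsx_path, formats=DOCLING_FORMATS):
    """Build one docling CLI argument list per requested output format."""
    return [["docling", str(xlsx_path), "--to", fmt] for fmt in formats]


def export_all(xlsx_path):
    """Run docling once per format, stopping at the first failure."""
    if shutil.which("docling") is None:
        raise RuntimeError("docling CLI not found on PATH")
    for cmd in build_docling_commands(xlsx_path):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to inspect or log the commands before anything runs.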
More detailed installation instructions are available in the docs.
3.4 Set Aspose License
Before using Aspose.Cells in any plugin, set the license:
Windows (PowerShell):
$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
Windows (CMD):
set ASPOSE_LICENSE_PATH=C:\path\to\license
Unix-based systems:
export ASPOSE_LICENSE_PATH="/path/to/license"
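A quick pre-flight check can confirm that the variable is set and points at an existing file before any plugin runs. The sketch below is an assumption-laden illustration: the helper names are hypothetical, how each plugin actually reads `ASPOSE_LICENSE_PATH` is not shown here, and the `License().set_license(...)` call follows the usual Aspose.Cells for Python via .NET pattern rather than anything stated in this document.

```python
import os
from pathlib import Path


def resolve_license_path(env=None):
    """Return the license file path from ASPOSE_LICENSE_PATH, or None if unusable."""
    env = os.environ if env is None else env
    raw = env.get("ASPOSE_LICENSE_PATH", "").strip()
    if not raw:
        return None
    path = Path(raw).expanduser()
    return path if path.is_file() else None


def apply_license():
    """Apply the license if one is configured; return True when it was applied."""
    path = resolve_license_path()
    if path is None:
        return False
    # Imported lazily so the path check works without Aspose.Cells installed.
    from aspose.cells import License  # assumption: standard Aspose.Cells API
    License().set_license(str(path))
    return True
```

If `resolve_license_path()` returns `None`, the variable is either unset or points to a missing file.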
4. Summary
AI Features:
The three tools share common advantages in AI document parsing, structured output, multi-modal support, and integration with generative AI frameworks.
Aspose.Cells Plugins:
Enable seamless conversion of Excel data into Markdown, JSON, or HTML, preserving tables and formulas, and integrate directly with MarkItDown, Marker, or Docling.
Use Cases:
Ideal for intelligent document processing, knowledge base construction, RAG systems, report parsing, academic document conversion, and other AI-driven workflows.