Analyzing your prompt, please hold on...
An error occurred while retrieving the results. Please refresh the page and try again.
Most organizations maintain large digital archives of scanned documents as part of their information management and preservation strategies. These digital archives often contain a wide range of business-critical materials, including historical records, legal documents, research papers and more. Effectively organizing and managing these archives is critical for efficient retrieval, compliance, and long-term retention. Here are some key considerations and best practices for maintaining image archives:
The choice of how to process text associated with scans or photographs depends on your business needs: store them alongside image files, put them into a database, or analyze the data in real time.
Digital image archives can take up a significant amount of disk space. The size of the image archive depends on several factors, including the resolution and format of the images, the number of images, and the level of compression applied.
Aspose.OCR for .NET offers a wide range of features that make it easy to process any number of images. You can extract texts from images, archives and entire catalogues. You can also find and read only specific areas of an image rather than all the text, which is very useful for organizing standardized documents.
In addition to reading images, you can reduce the size of image files by converting them to black and white, downsizing, or converting them to other file formats.
The following example demonstrates how to organize a large number of uniform documents based on their content. The task can be achieved using only 78 lines of code, including comments.
Download sample invoice images (1.2 MB):
using Aspose.Drawing;
using Aspose.Drawing.Imaging;
using Aspose.OCR;
using Aspose.OCR.Models.PreprocessingFilters;
private class Invoice
{
public string Number;
public string Date;
public string BillTo;
public string FileName => $"Invoice_{this.Number}_{this.Date}_{this.BillTo}.jpg";
public Invoice(List<string> recognitionAreas)
{
this.Number = recognitionAreas[1].Replace("\n", "");
this.Date = recognitionAreas[0].Replace("\n", "");
this.BillTo = recognitionAreas[2].Replace("\n", "");
}
}
License license = new License();
license.SetLicense("Aspose.OCR.lic");
OcrInput invoices = new OcrInput(InputType.SingleImage) {
"invoices/01.jpg",
"invoices/02.jpg",
"invoices/03.jpg"
};
PreprocessingFilter preprocess = new PreprocessingFilter() {
PreprocessingFilter.Invert(new Rectangle(149, 566, 1314, 247))
};
invoices.ReplaceFilters(preprocess);
RecognitionSettings settings = new RecognitionSettings();
settings.RecognitionAreas = new List<Aspose.Drawing.Rectangle> {
new Rectangle(1200, 620, 200, 35), // invoice date
new Rectangle(1200, 660, 200, 35), // invoice number
new Rectangle(150, 955, 1310, 32) // bill to name
};
List<Invoice> invoiceData = new List<Invoice>();
AsposeOcr api = new AsposeOcr();
List<RecognitionResult> results = api.Recognize(invoices, settings);
foreach(var result in results)
{
invoiceData.Add(new Invoice(result.RecognitionAreasText));
}
PreprocessingFilter saveSpace = new PreprocessingFilter() {
PreprocessingFilter.Binarize(),
PreprocessingFilter.Scale(0.5F)
};
invoices.ReplaceFilters(saveSpace);
string targetDirectoryName = "archive";
Directory.CreateDirectory(targetDirectoryName);
var convertedInvoices = ImageProcessing.Render(invoices);
for(int i = 0; i<convertedInvoices.Count(); i++)
{
using(Image image = Image.FromStream(convertedInvoices[i].Stream))
{
image.Save($"{targetDirectoryName}/{invoiceData[i].FileName}", ImageFormat.Jpeg);
}
}
using Aspose.Drawing;
using Aspose.Drawing.Imaging;
using Aspose.OCR;
using Aspose.OCR.Models.PreprocessingFilters;
namespace ArchivalOCR
{
internal class Program
{
private class Invoice
{
public string Number;
public string Date;
public string BillTo;
public string FileName => $"Invoice_{this.Number}_{this.Date}_{this.BillTo}.jpg";
public Invoice(List<string> recognitionAreas)
{
this.Number = recognitionAreas[1].Replace("\n", "");
this.Date = recognitionAreas[0].Replace("\n", "");
this.BillTo = recognitionAreas[2].Replace("\n", "");
}
}
static void Main(string[] args)
{
// Apply license
Aspose.OCR.License license = new Aspose.OCR.License();
license.SetLicense("Aspose.OCR.lic");
// Add scanned invoices to the batch
OcrInput invoices = new OcrInput(InputType.SingleImage) {
"invoices/01.jpg",
"invoices/02.jpg",
"invoices/03.jpg"
};
// Invert header colors
PreprocessingFilter preprocess = new PreprocessingFilter() {
PreprocessingFilter.Invert(new Rectangle(149, 566, 1314, 247))
};
invoices.ReplaceFilters(preprocess);
// Specify recognition areas
RecognitionSettings settings = new RecognitionSettings();
settings.RecognitionAreas = new List<Aspose.Drawing.Rectangle> {
new Rectangle(1200, 620, 200, 35), // invoice date
new Rectangle(1200, 660, 200, 35), // invoice number
new Rectangle(150, 955, 1310, 32) // bill to name
};
// Fetch data from invoices
List<Invoice> invoiceData = new List<Invoice>();
AsposeOcr api = new AsposeOcr();
List<RecognitionResult> results = api.Recognize(invoices, settings);
foreach(var result in results)
{
invoiceData.Add(new Invoice(result.RecognitionAreasText));
}
// Adjust image processing to save disk space
PreprocessingFilter saveSpace = new PreprocessingFilter() {
PreprocessingFilter.Binarize(),
PreprocessingFilter.Scale(0.5F)
};
invoices.ReplaceFilters(saveSpace);
// Save archived copies of the invoices
string targetDirectoryName = "archive";
Directory.CreateDirectory(targetDirectoryName);
var convertedInvoices = ImageProcessing.Render(invoices);
for(int i = 0; i<convertedInvoices.Count(); i++)
{
using(Image image = Image.FromStream(convertedInvoices[i].Stream))
{
image.Save($"{targetDirectoryName}/{invoiceData[i].FileName}", ImageFormat.Jpeg);
}
}
// Report progress
Console.WriteLine($@"The invoices are organized into ""{Path.GetFullPath(targetDirectoryName)}"" folder.");
}
}
}
Run the program directly from the Visual Studio or build it and execute the file from the command line. Wait a few seconds, depending on your system performance, until “The invoices are organized into “{project path}\archive” folder.” message is displayed in the console.
Open archive folder in the project directory. As you can see, the invoice images have been converted to black and white to reduce file size and renamed to match the invoice details:
Original invoice image | Original invoice image size | New invoice file name | New invoice image size |
---|---|---|---|
01.jpg | 404 KB | Invoice_1234_28-SEP-2023_John Smith.jpg | 63.4 KB |
02.jpg | 411 KB | Invoice_2345_03-SEP-2023_Road Runner.jpg | 64.7 KB |
03.jpg | 414 KB | Invoice_12_25-DEC-2000_Wile E. Covote.jpg | 64.3 KB |
Analyzing your prompt, please hold on...
An error occurred while retrieving the results. Please refresh the page and try again.