Multi-page document recognition settings

Aspose.OCR for .NET lets you fine-tune recognition accuracy, performance, and other settings by configuring the properties of the DocumentRecognitionSettings object.

These settings apply when extracting text from multi-page images in TIFF and DjVu formats, as well as from scanned PDF documents.

| Setting | Type | Default value | Description |
|---|---|---|---|
| AllowedCharacters | Aspose.OCR.CharactersAllowedType | Aspose.OCR.CharactersAllowedType.ALL | The predefined whitelist of characters the Aspose.OCR engine will look for. |
| AutoContrast | boolean | false | Automatically increase the contrast of pages before recognition. |
| AutoDenoising | boolean | false | Automatically remove noise from pages before recognition. |
| AutoSkew | boolean | true | Automatically correct page tilt (deskew) before recognition. |
| DetectAreasMode | Aspose.OCR.DetectAreasMode | auto | Manually override the default method for detecting document areas. |
| IgnoredCharacters | string | none | A blacklist of characters that are ignored during recognition. |
| Language | Aspose.OCR.Language | Aspose.OCR.Language.None | The recognition language. |
| LinesFiltration | boolean | false | Set to true to recognize text in tables. Set to false to improve performance by ignoring table structures and treating tables as plain text. |
| PagesNumber | integer | 1 | The number of pages to recognize in a multi-page file. |
| PreprocessingFilters | Aspose.OCR.Models.PreprocessingFilters.PreprocessingFilter | none | Image processing filters that enhance pages before they are sent to the OCR engine. |
| SkewAngle | float | 0 | Manually rotate pages by the specified angle, in degrees. |
| StartPage | integer | 0 | The page from which to start recognition of a multi-page file. Pages are numbered from 0, so the first page is 0. |
| ThreadsCount | integer | auto | The number of CPU threads used for recognition. |
| ThresholdValue | integer | auto | Manually set the binarization threshold, overriding the automatic binarization settings. |

Example

The following code example recognizes 10 pages of a scanned PDF document in Spanish, starting from the fourth page, and saves the results to a JSON file:

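```csharp
using System.Collections.Generic;

// Create the recognition engine and the multi-page recognition settings
Aspose.OCR.AsposeOcr recognitionEngine = new Aspose.OCR.AsposeOcr();
Aspose.OCR.DocumentRecognitionSettings recognitionSettings = new Aspose.OCR.DocumentRecognitionSettings();
// Recognize 10 pages, starting from the fourth page (StartPage is zero-based)
recognitionSettings.StartPage = 3;
recognitionSettings.PagesNumber = 10;
recognitionSettings.Language = Aspose.OCR.Language.Spa;
// Recognize the PDF and save the results for all pages to a single JSON file
List<Aspose.OCR.RecognitionResult> results = recognitionEngine.RecognizePdf("source.pdf", recognitionSettings);
Aspose.OCR.AsposeOcr.SaveMultipageDocument("result.json", Aspose.OCR.SaveFormat.Json, results);
```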
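The image-correction and performance settings from the table can be combined in the same way. The following sketch reuses only the classes and methods from the example above and sets only properties listed in the table: it enables automatic contrast and noise correction for a low-quality scan, treats tables as plain text, caps recognition at 4 CPU threads, and limits recognition to the first five pages. The input file name scan.pdf and the chosen values are hypothetical; adjust them to your documents.

```csharp
using System.Collections.Generic;
using Aspose.OCR;

class Program
{
    static void Main()
    {
        AsposeOcr recognitionEngine = new AsposeOcr();
        DocumentRecognitionSettings recognitionSettings = new DocumentRecognitionSettings();

        // Improve low-quality scans before recognition (both are off by default)
        recognitionSettings.AutoContrast = true;
        recognitionSettings.AutoDenoising = true;

        // Ignore table structures and treat tables as plain text for faster processing
        recognitionSettings.LinesFiltration = false;

        // Use 4 CPU threads instead of the automatically chosen number
        recognitionSettings.ThreadsCount = 4;

        // Recognize the first five pages (StartPage is zero-based)
        recognitionSettings.StartPage = 0;
        recognitionSettings.PagesNumber = 5;

        // "scan.pdf" is a hypothetical input file
        List<RecognitionResult> results = recognitionEngine.RecognizePdf("scan.pdf", recognitionSettings);
        AsposeOcr.SaveMultipageDocument("result.json", SaveFormat.Json, results);
    }
}
```

Enabling preprocessing such as contrast and noise correction adds work per page, so it is usually worth turning on only for scans where recognition quality is poor with the defaults.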