Multi-page document recognition settings
Aspose.OCR for .NET lets you tune recognition accuracy, performance, and other behavior by configuring the properties of the DocumentRecognitionSettings object.
These settings apply when extracting text from multi-page images in TIFF and DjVu formats, as well as from scanned PDF documents.
Setting | Type | Default value | Description |
---|---|---|---|
AllowedCharacters | Aspose.OCR.CharactersAllowedType | Aspose.OCR.CharactersAllowedType.ALL | The predefined whitelist of characters the Aspose.OCR engine will look for. |
AutoContrast | boolean | false | Automatically increase the contrast of pages before proceeding to recognition. |
AutoDenoising | boolean | false | Automatically remove noise from pages before proceeding to recognition. |
AutoSkew | boolean | true | Automatically correct page tilt (deskew) before proceeding to recognition. |
DetectAreasMode | Aspose.OCR.DetectAreasMode | auto | Manually override the default document areas detection method. |
IgnoredCharacters | string | none | A blacklist of characters that are ignored during recognition. |
Language | Aspose.OCR.Language | Aspose.OCR.Language.None | Specify a language for recognition. |
LinesFiltration | boolean | false | Set to true to recognize text in tables. Set to false to improve performance by ignoring table structures and treating tables as plain text. |
PagesNumber | integer | 1 | The number of pages to be recognized in a multi-page file. |
PreprocessingFilters | Aspose.OCR.Models.PreprocessingFilters.PreprocessingFilter | none | Apply image processing filters that enhance pages before they are sent to the OCR engine. |
SkewAngle | float | 0 | Manually rotate the image by the specified angle, in degrees. |
StartPage | integer | 0 | The page number from which to start recognition of the multi-page file. The first page number is 0. |
ThreadsCount | integer | auto | The number of CPU threads used for recognition. |
ThresholdValue | integer | auto | Override the automatic binarization settings. |
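As an illustration of how several of the settings in the table above might be combined, the following sketch builds a settings object for a noisy, low-contrast scan. The property names come from the table; the class name SettingsSketch, the method name BuildForNoisyScan, and all values are hypothetical and chosen only for demonstration. It assumes a library version that exposes these properties on DocumentRecognitionSettings.

```csharp
using Aspose.OCR;

// Hypothetical helper class, not part of Aspose.OCR.
public static class SettingsSketch
{
    // Builds a settings object for a noisy, low-contrast scan.
    // Property names are taken from the settings table; values are illustrative.
    public static DocumentRecognitionSettings BuildForNoisyScan()
    {
        DocumentRecognitionSettings settings = new DocumentRecognitionSettings();
        settings.AutoContrast = true;       // default is false
        settings.AutoDenoising = true;      // default is false
        settings.AutoSkew = true;           // already the default; stated for clarity
        settings.IgnoredCharacters = "|_";  // characters to ignore (illustrative choice)
        settings.ThreadsCount = 2;          // cap CPU threads instead of automatic selection
        return settings;
    }
}
```

Settings that are not assigned keep the defaults listed in the table, so only the properties that differ from the defaults need to be set.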
Applicable to
- Extracting text from PDF document
- Extracting text from multi-page TIFF
- Extracting text from multi-page DjVu
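The same settings object can, in principle, be reused for the other multi-page formats. The sketch below applies it to a multi-page TIFF. It assumes that the installed version provides a RecognizeTiff method that accepts a file path and a DocumentRecognitionSettings object and returns a list of RecognitionResult objects with a RecognitionText property; check the API reference of your version before relying on it. The file name source.tiff and the class name TiffSketch are placeholders.

```csharp
using System;
using System.Collections.Generic;
using Aspose.OCR;

public static class TiffSketch
{
    public static void Main()
    {
        AsposeOcr recognitionEngine = new AsposeOcr();

        // Recognize the first two pages of the file (page numbering starts at 0).
        DocumentRecognitionSettings recognitionSettings = new DocumentRecognitionSettings();
        recognitionSettings.StartPage = 0;
        recognitionSettings.PagesNumber = 2;

        // Assumption: RecognizeTiff(string, DocumentRecognitionSettings) is available
        // in the installed version and returns one result per recognized page.
        List<RecognitionResult> results =
            recognitionEngine.RecognizeTiff("source.tiff", recognitionSettings);

        // Print the text recognized on each page.
        foreach (RecognitionResult page in results)
        {
            Console.WriteLine(page.RecognitionText);
        }
    }
}
```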
Example
The following code example shows how to fine-tune recognition:
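using System.Collections.Generic;
// Create the recognition engine and a settings object for multi-page documents
Aspose.OCR.AsposeOcr recognitionEngine = new Aspose.OCR.AsposeOcr();
Aspose.OCR.DocumentRecognitionSettings recognitionSettings = new Aspose.OCR.DocumentRecognitionSettings();
// Recognize 10 pages, starting from the fourth page (page numbering starts at 0)
recognitionSettings.StartPage = 3;
recognitionSettings.PagesNumber = 10;
// Recognize the text as Spanish
recognitionSettings.Language = Aspose.OCR.Language.Spa;
// Recognize the PDF document; the list holds one result per recognized page
List<Aspose.OCR.RecognitionResult> results = recognitionEngine.RecognizePdf("source.pdf", recognitionSettings);
// Save the recognition results of all pages to a single JSON file
Aspose.OCR.AsposeOcr.SaveMultipageDocument("result.json", Aspose.OCR.SaveFormat.Json, results);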
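Because StartPage is zero-based while PagesNumber is a count, converting a page range as shown in a document viewer (1-based, inclusive) takes a small amount of arithmetic. The helper below is a minimal sketch of that conversion; PageRange and FromOneBased are hypothetical names and are not part of Aspose.OCR.

```csharp
using System;

// Hypothetical helper, not part of Aspose.OCR: converts a 1-based, inclusive
// page range into the zero-based StartPage and the PagesNumber count used by
// DocumentRecognitionSettings.
public static class PageRange
{
    public static (int StartPage, int PagesNumber) FromOneBased(int firstPage, int lastPage)
    {
        if (firstPage < 1)
            throw new ArgumentOutOfRangeException(nameof(firstPage), "Pages are numbered from 1.");
        if (lastPage < firstPage)
            throw new ArgumentOutOfRangeException(nameof(lastPage), "lastPage must not precede firstPage.");

        // Shift the first page to a zero-based index; count both ends of the range.
        return (firstPage - 1, lastPage - firstPage + 1);
    }
}
```

For pages 4 through 13, FromOneBased(4, 13) yields a StartPage of 3 and a PagesNumber of 10, which are the values used in the example above.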