Image recognition settings

Aspose.OCR for Java lets you fine-tune recognition accuracy, performance, and other behavior by calling the methods of the `RecognitionSettings` object.

These settings are applicable when extracting text from single-page raster images in JPEG, PNG, TIFF, BMP, and GIF formats.

Method	Parameter	Default state	Description
`setAllowedCharacters`	Case-sensitive string of characters or one of the predefined character sets: `CharactersAllowedType.ALL` - try to recognize all characters; `CharactersAllowedType.LATIN_ALPHABET` - recognize only Latin / English letters in either case (`A` to `Z` and `a` to `z`), without accented characters; `CharactersAllowedType.DIGITS` - recognize only binary, octal, decimal, or hexadecimal numbers (`0`-`9` and `A` to `F`).	All characters from the selected recognition language.	The whitelist of characters the Aspose.OCR engine will look for.
`setDetectAreasMode`	`DetectAreasMode`	Automatic	Manually override the default document areas detection method.
`setIgnoredCharacters`	Case-sensitive string of characters	All characters are recognized	A blacklist of characters that are ignored during recognition.
`setLanguage`	Recognition language	Latin characters without diacritics	Specify a language for recognition.
`setLinesFiltration`	`true` - enable; `false` - disable	Enabled	Set to `true` to recognize text in tables. Set to `false` to improve performance by ignoring table structures and treating tables as plain text.
`setRecognitionAreas`	`ArrayList<Rectangle>`	Entire image	List of areas of the image from which to extract text.
`setRecognizeSingleLine`	`true` - enable; `false` - disable	Disabled	Recognize a single-line image. Disables automatic document region detection. Improves the recognition performance of simple images.
`setThreadsCount`	Number of threads, `int`	Automatic	The number of CPU threads used for recognition.
`setUpscaleSmallFont`	`true` - enable; `false` - disable	Disabled	Improve small font recognition and detection of dense lines.
`setAutomaticColorInversion`	boolean	`true`	Set to `true` to automatically detect white text on a dark or black background and use a special OCR algorithm to improve recognition accuracy. Set to `false` to explicitly disable inverted text detection and save resources. This setting only applies when using one of the following document area detection modes: `DetectAreasMode.PHOTO`, `DetectAreasMode.COMBINE`, `DetectAreasMode.TABLE`, `DetectAreasMode.CURVED_TEXT`.
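
The `setRecognitionAreas` row above expects an `ArrayList<Rectangle>`. The following is a minimal sketch of preparing such a list, assuming `Rectangle` is `java.awt.Rectangle`. The `RecognitionAreas` class and its `clampToImage` method are hypothetical helpers written for this illustration and are not part of Aspose.OCR. They clip each requested region to the image bounds and drop regions that lie entirely outside the image.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.OCR: prepares the list of regions
// that would be passed to RecognitionSettings.setRecognitionAreas(...).
final class RecognitionAreas {

    // Clips each requested region to the image bounds and drops regions
    // that do not overlap the image at all.
    static ArrayList<Rectangle> clampToImage(List<Rectangle> requested,
                                             int imageWidth, int imageHeight) {
        Rectangle bounds = new Rectangle(0, 0, imageWidth, imageHeight);
        ArrayList<Rectangle> areas = new ArrayList<>();
        for (Rectangle region : requested) {
            Rectangle clipped = region.intersection(bounds);
            if (!clipped.isEmpty()) {
                areas.add(clipped);
            }
        }
        return areas;
    }

    public static void main(String[] args) {
        List<Rectangle> requested = Arrays.asList(
            new Rectangle(10, 10, 200, 50),     // fully inside the image
            new Rectangle(700, 500, 300, 300),  // partly outside, gets clipped
            new Rectangle(900, 900, 50, 50));   // fully outside, gets dropped
        ArrayList<Rectangle> areas = clampToImage(requested, 800, 600);
        areas.forEach(System.out::println);
        // With Aspose.OCR on the classpath, the list would then be passed as:
        // recognitionSettings.setRecognitionAreas(areas);
    }
}
```

Clipping up front is a design choice for this sketch only; how the engine itself treats out-of-bounds regions is not stated here.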

Applicable to

Extracting text from images, scanned PDFs, DjVu files, and other content provided as an `OcrInput` object.

Example

The following code example shows how to fine-tune recognition:

// Assumes Aspose.OCR for Java is on the classpath
import com.aspose.ocr.*;
import java.util.ArrayList;

// Create instance of OCR API
AsposeOCR api = new AsposeOCR();
// Specify recognition settings
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setAllowedCharacters(CharactersAllowedType.LATIN_ALPHABET);
recognitionSettings.setDetectAreasMode(DetectAreasMode.DOCUMENT);
recognitionSettings.setUpscaleSmallFont(true);
// Prepare batch (no preprocessing filters are applied in this example)
OcrInput images = new OcrInput(InputType.SingleImage);
images.add("image1.png");
images.add("image2.png");
// Recognize images using the batch prepared above
ArrayList<RecognitionResult> results = api.Recognize(images, recognitionSettings);
// Print the text extracted from each image
results.forEach((result) -> {
	System.out.println(result.recognitionText);
});
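
The example above uses a predefined character set. Since `setAllowedCharacters` also accepts a case-sensitive string of characters, a custom whitelist can be composed in plain Java before it is passed to the settings object. The sketch below is illustrative only: the `AllowedCharacters` class and its `range` and `combine` methods are hypothetical helpers, not part of Aspose.OCR.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Aspose.OCR: builds a whitelist string
// that could be passed to RecognitionSettings.setAllowedCharacters(...).
final class AllowedCharacters {

    // Returns every character from 'from' to 'to', inclusive.
    // Returns an empty string when 'from' is greater than 'to'.
    static String range(char from, char to) {
        StringBuilder sb = new StringBuilder();
        for (int c = from; c <= to; c++) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Joins the parts, keeping the first occurrence of each character
    // and preserving the order in which characters first appear.
    static String combine(String... parts) {
        Set<Character> seen = new LinkedHashSet<>();
        for (String part : parts) {
            for (char c : part.toCharArray()) {
                seen.add(c);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (char c : seen) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Digits plus a few separators, e.g. for amounts on a price list.
        String whitelist = combine(range('0', '9'), ".,-");
        System.out.println(whitelist);
        // With Aspose.OCR on the classpath, the string would then be passed as:
        // recognitionSettings.setAllowedCharacters(whitelist);
    }
}
```

Because the whitelist is case-sensitive, uppercase and lowercase letters would need to be added as separate ranges, for example `range('A', 'Z')` and `range('a', 'z')`.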

Receipt recognition settings