Defining the whitelist of characters
Restricting recognition to a subset of characters instead of the full set can greatly improve recognition accuracy, increase speed, and reduce resource consumption. The list of characters can also be identified automatically from an image using the built-in Aspose.OCR mechanisms.
You can define the list of characters the Aspose.OCR engine will look for by passing them as a case-sensitive string to the setAllowedCharacters method of the RecognitionSettings object.
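Because the string is case-sensitive, it can help to assemble it from explicit character ranges rather than typing it by hand. The following is a minimal sketch in plain Java, assuming you want uppercase hexadecimal values with an `x` prefix; the class `AllowedCharactersBuilder` and its methods are hypothetical helpers for illustration and are not part of Aspose.OCR.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Aspose.OCR: builds a whitelist string
// from character ranges, keeping insertion order and dropping duplicates.
class AllowedCharactersBuilder {

    // Adds every character in the inclusive range [first, last] to the set.
    static void addRange(Set<Character> chars, char first, char last) {
        for (int c = first; c <= last; c++) {
            chars.add((char) c);
        }
    }

    // Builds a whitelist for uppercase hexadecimal values such as "0x1F".
    static String hexWhitelist() {
        Set<Character> chars = new LinkedHashSet<>();
        addRange(chars, '0', '9');
        addRange(chars, 'A', 'F');
        chars.add('x');

        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String allowed = hexWhitelist();
        System.out.println(allowed); // prints 0123456789ABCDEFx

        // Case matters: lowercase 'x' is listed, uppercase 'X' is not.
        System.out.println(allowed.indexOf('X') >= 0); // prints false
    }
}
```

The resulting string is the kind of value you would then pass to setAllowedCharacters. Any character missing from it, including the other case of a listed letter, falls outside the whitelist.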
Alternatively, you can use one of the following presets:
Preset | Subset of characters |
---|---|
CharactersAllowedType.ALL | All characters. |
CharactersAllowedType.LATIN_ALPHABET | Latin / English text (A to Z and a to z), without accented characters. |
CharactersAllowedType.DIGITS | Binary, octal, decimal, or hexadecimal numbers (0 to 9 and A to F). |
Characters that do not match the provided list are ignored.
import com.aspose.ocr.*;
import java.util.ArrayList;

AsposeOCR api = new AsposeOCR();
// Restrict recognition to the DIGITS preset (0 to 9 and A to F)
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setAllowedCharacters(CharactersAllowedType.DIGITS);
// Prepare batch
OcrInput images = new OcrInput(InputType.SingleImage);
images.add("image.png");
// Recognize images
ArrayList<RecognitionResult> results = api.Recognize(images, recognitionSettings);
// Print the text recognized in the first image
System.out.println(results.get(0).recognitionText);