Defining the whitelist of characters

Restricting recognition to a subset of characters instead of the full set can greatly improve recognition accuracy, increase speed, and reduce resource consumption. Aspose.OCR can also identify the list of characters in an image automatically, using its built-in mechanisms.

Predefined character sets

To select a predefined set of characters for the Aspose.OCR engine to look for, assign one of the following values to the AllowedCharacters property of the recognition settings:

Subset | Action
Aspose.OCR.CharactersAllowedType.ALL | Try to recognize all characters.
Aspose.OCR.CharactersAllowedType.LATIN_ALPHABET | Only recognize Latin / English text (A to Z and a to z), without accented characters.
Aspose.OCR.CharactersAllowedType.DIGITS | Recognize only binary, octal, decimal, or hexadecimal numbers (0-9 and A to F).

Characters that do not match the provided subset are ignored.

// Create the recognition engine
Aspose.OCR.AsposeOcr recognitionEngine = new Aspose.OCR.AsposeOcr();
// Restrict recognition to the predefined DIGITS subset
Aspose.OCR.RecognitionSettings recognitionSettings = new Aspose.OCR.RecognitionSettings();
recognitionSettings.AllowedCharacters = Aspose.OCR.CharactersAllowedType.DIGITS;
// Recognize the image with the restricted character set
Aspose.OCR.RecognitionResult result = recognitionEngine.RecognizeImage("source.png", recognitionSettings);

Custom characters list

You can specify your own list of characters to be recognized by passing it to the constructor of the Aspose.OCR.AsposeOcr class. The characters are provided as a single case-sensitive string.

Characters that do not match the provided list are ignored.

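// Pass the allowed characters to the constructor as a case-sensitive string
Aspose.OCR.AsposeOcr recognitionEngine = new Aspose.OCR.AsposeOcr("AÁBCDEÉFG12345");
// Recognize the image with default recognition settings
Aspose.OCR.RecognitionResult result = recognitionEngine.RecognizeImage("source.png", new Aspose.OCR.RecognitionSettings());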
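
When the expected content is known in advance (for example, a fixed set of part numbers), the character string can be derived from sample values rather than typed by hand. The following is a minimal sketch under that assumption. WhitelistBuilder and BuildWhitelist are hypothetical names introduced for illustration and are not part of Aspose.OCR; the sample values are made up. The helper works on individual char values, so characters outside the Basic Multilingual Plane are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WhitelistBuilder
{
    // Hypothetical helper (not part of Aspose.OCR): collects every distinct,
    // non-whitespace character from the samples, in order of first appearance.
    public static string BuildWhitelist(IEnumerable<string> samples)
    {
        var seen = new HashSet<char>();
        var whitelist = new StringBuilder();
        foreach (string sample in samples)
        {
            foreach (char c in sample)
            {
                // HashSet<char>.Add returns false for duplicates.
                // The comparison is case-sensitive, so 'a' and 'A' are kept separately.
                if (!char.IsWhiteSpace(c) && seen.Add(c))
                {
                    whitelist.Append(c);
                }
            }
        }
        return whitelist.ToString();
    }

    public static void Main()
    {
        // Made-up sample values standing in for the text expected on the images.
        string allowed = BuildWhitelist(new[] { "AB-12", "BC-34" });
        Console.WriteLine(allowed); // AB-12C34

        // The resulting string could then be passed to the constructor shown above:
        // var recognitionEngine = new Aspose.OCR.AsposeOcr(allowed);
    }
}
```

Because the list is case-sensitive, the samples should cover both upper and lower case wherever both can appear in the source images.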