Defining the whitelist of characters
Restricting recognition to a subset of characters instead of the full set can greatly improve recognition accuracy, increase speed, and reduce resource consumption. You can either select one of the character sets predefined in Aspose.OCR or provide your own list of characters.
Predefined character sets
To restrict the Aspose.OCR engine to a predefined character set, assign one of the following values to the AllowedCharacters property of the recognition settings:
| Subset | Description |
|---|---|
| Aspose.OCR.CharactersAllowedType.ALL | Try to recognize all characters. |
| Aspose.OCR.CharactersAllowedType.LATIN_ALPHABET | Only recognize Latin / English text (A to Z and a to z), without accented characters. |
| Aspose.OCR.CharactersAllowedType.DIGITS | Only recognize binary, octal, decimal, or hexadecimal numbers (0-9 and A to F). |
Characters that do not match the provided subset are ignored.
using System;
// Create the recognition engine.
Aspose.OCR.AsposeOcr recognitionEngine = new Aspose.OCR.AsposeOcr();
// Restrict recognition to the DIGITS subset (0-9 and A to F).
Aspose.OCR.RecognitionSettings recognitionSettings = new Aspose.OCR.RecognitionSettings();
recognitionSettings.AllowedCharacters = Aspose.OCR.CharactersAllowedType.DIGITS;
// Recognize the image with these settings and print the text.
Aspose.OCR.RecognitionResult result = recognitionEngine.RecognizeImage("source.png", recognitionSettings);
Console.WriteLine(result.RecognitionText);
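Because the DIGITS subset also includes the letters A to F, the recognized text can contain those letters. If an application only expects decimal numbers, one option is to filter the text after recognition. The following is a minimal sketch of such a post-processing step in plain C#, assuming whitespace should be preserved; DigitFilter and KeepDecimalDigits are hypothetical helpers written for this example, not part of the Aspose.OCR API.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Aspose.OCR): post-processes recognized text
// so that only ASCII decimal digits and whitespace remain.
public static class DigitFilter
{
    public static string KeepDecimalDigits(string recognizedText)
    {
        if (recognizedText == null)
        {
            return string.Empty;
        }

        // Keep 0-9 and whitespace (to preserve word and line breaks);
        // drop the letters A-F and any other character.
        return new string(recognizedText
            .Where(c => (c >= '0' && c <= '9') || char.IsWhiteSpace(c))
            .ToArray());
    }
}
```

For example, `DigitFilter.KeepDecimalDigits(result.RecognitionText)` applied to a recognized string such as "12A4 9F0" returns "124 90". Note that this only discards characters; it does not correct a digit that the engine misread as a letter.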
Custom characters list
You can specify your own list of characters to recognize by passing it to the constructor of the Aspose.OCR.AsposeOcr class. The characters are provided as a case-sensitive string.
Characters that do not match the provided list are ignored.
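using System;
// Only characters from this case-sensitive string will be recognized.
Aspose.OCR.AsposeOcr recognitionEngine = new Aspose.OCR.AsposeOcr("AÁBCDEÉFG12345");
// Recognize with default settings; the whitelist comes from the constructor.
Aspose.OCR.RecognitionResult result = recognitionEngine.RecognizeImage("source.png", new Aspose.OCR.RecognitionSettings());
Console.WriteLine(result.RecognitionText);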
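For longer lists, building the whitelist string from character ranges can be less error-prone than typing it by hand. The sketch below combines inclusive ranges and individual characters into one case-sensitive string, removing duplicates along the way; WhitelistBuilder and its Build method are hypothetical helpers written for this example, not part of Aspose.OCR.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of Aspose.OCR): builds a case-sensitive
// whitelist string from inclusive character ranges plus extra characters.
public static class WhitelistBuilder
{
    public static string Build(IEnumerable<(char From, char To)> ranges, string extraCharacters)
    {
        var seen = new HashSet<char>();
        var builder = new StringBuilder();

        // Expand each inclusive range, e.g. ('A', 'F') -> "ABCDEF".
        // An int loop variable avoids overflow when a range ends at char.MaxValue.
        foreach (var (from, to) in ranges)
        {
            if (from > to)
            {
                throw new ArgumentException($"Invalid range: {from}-{to}");
            }
            for (int code = from; code <= to; code++)
            {
                char c = (char)code;
                if (seen.Add(c))
                {
                    builder.Append(c);
                }
            }
        }

        // Append individual characters (such as accented letters) in the given order,
        // skipping any that a range already covered.
        foreach (char c in extraCharacters ?? string.Empty)
        {
            if (seen.Add(c))
            {
                builder.Append(c);
            }
        }

        return builder.ToString();
    }
}
```

For example, `new Aspose.OCR.AsposeOcr(WhitelistBuilder.Build(new[] { ('A', 'G'), ('1', '5') }, "ÁÉ"))` passes "ABCDEFG12345ÁÉ", the same set of characters as the example above in a different order; this sketch assumes the order of characters in the string does not matter.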