Common recognition settings
Aspose.OCR for Python via .NET lets you customize recognition accuracy, performance, and other aspects of recognition through the properties of the `RecognitionSettings` object.
These universal settings apply when extracting text from single-page and multi-page images, scanned PDFs, DjVu files, folders, archives, and other content.
| Setting | Type | Default value | Description |
|---|---|---|---|
| `allowed_symbols` | string | All characters of the selected language | A whitelist of characters that the Aspose.OCR engine will look for. |
| `detect_areas_mode` | `DetectAreasMode` | auto | Manually overrides the default document area detection method. |
| `ignored_symbols` | string | none | A blacklist of characters that are ignored during recognition. |
| `language` | `Language` | `Language.NONE` | The language used for recognition. |
| `lines_filtration` | boolean | `False` | Set to `True` to recognize text in tables. Set to `False` to improve performance by ignoring table structures and treating tables as plain text. |
| `recognize_single_line` | boolean | `False` | Recognizes a single-line image. Disables automatic document region detection, which improves recognition performance on simple images. |
| `upscale_small_font` | boolean | `False` | Improves recognition of small fonts and detection of dense lines. |
| `automatic_color_inversion` | boolean | `True` | Improves recognition accuracy of white text on a dark or black background. Leave it set to `True` unless you need to optimize every aspect of recognition (for example, for online applications or entry-level devices). Applies only with certain document area detection modes (see `detect_areas_mode`). |
| `threads_count` | integer | auto | The number of CPU threads used for recognition. |
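To make the difference between the `allowed_symbols` whitelist and the `ignored_symbols` blacklist concrete, here is a pure-Python sketch that applies the same filtering idea to a string after the fact. It assumes each setting is a plain string of characters, and the helper names `apply_whitelist` and `apply_blacklist` are hypothetical, not part of the Aspose.OCR API. Post-filtering is not equivalent to the real settings: the engine restricts the characters it looks for during recognition itself.

```python
def apply_whitelist(text: str, allowed_symbols: str) -> str:
    """Hypothetical helper: keep only characters listed in allowed_symbols."""
    allowed = set(allowed_symbols)
    return "".join(ch for ch in text if ch in allowed)


def apply_blacklist(text: str, ignored_symbols: str) -> str:
    """Hypothetical helper: drop every character listed in ignored_symbols."""
    ignored = set(ignored_symbols)
    return "".join(ch for ch in text if ch not in ignored)


raw = "Invoice #A-1042, total: $318.50"
# Whitelist of digits and the decimal point: everything else disappears
print(apply_whitelist(raw, "0123456789."))  # 1042318.50
# Blacklist of two symbols: everything else is kept
print(apply_blacklist(raw, "#$"))           # Invoice A-1042, total: 318.50
```

A whitelist suits inputs with a known, narrow alphabet (such as numeric fields), while a blacklist suits free text where only a few stray symbols need suppressing.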
Applicable to
- Extracting text from images, scanned PDFs, DjVu files, and other content provided as an `OcrInput` object.
Example
The following code example shows how to fine-tune recognition:
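```python
# Import the Aspose.OCR classes used in this example
from aspose.ocr import (AsposeOcr, DetectAreasMode, InputType,
                        Language, OcrInput, RecognitionSettings)

# Instantiate Aspose.OCR API
api = AsposeOcr()

# Add images to the recognition batch
# (named ocr_input so it does not shadow Python's built-in input())
ocr_input = OcrInput(InputType.SINGLE_IMAGE)
ocr_input.add("source1.png")
ocr_input.add("source2.png")

# Customize recognition settings
recognition_settings = RecognitionSettings()
recognition_settings.language = Language.UKR
recognition_settings.detect_areas_mode = DetectAreasMode.TABLE

# Recognize the images
result = api.recognize(ocr_input, recognition_settings)

# Print the recognition result for the first image
print(result[0].recognition_text)
input("Press Enter to continue...")
```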