Detecting image defects


Image defects can significantly impact the accuracy of OCR. They can be caused by the quality of the image acquisition process, environmental conditions, and the hardware used to capture the image. To improve recognition accuracy, it is essential to preprocess and enhance images to mitigate these defects whenever possible.

Aspose.OCR for C++ can automatically find potentially problematic areas of an image during recognition. To enable this functionality, specify the types of image defects to detect in the defect_type member of the recognition settings, or call the specialized asposeocr_detect_defects() function. The latter only returns information about defects and does not recognize the image.

The following types of defects can be found:

Salt-and-pepper noise
  Enumeration: AsposeOCRDefectType.ASPOSE_OCR_SALT_PEPPER_NOISE
  Description: Appears as random white and black pixels scattered across the area. Often occurs in digital photographs.
  Impact:
    • Some characters are misidentified
    • Unnecessary dots or commas appear in recognition results

Low contrast between text and background
  Enumeration: AsposeOCRDefectType.ASPOSE_OCR_DARK_IMAGES
  Description: Highlights and shadows typically appear on curved pages.
  Impact:
    • Low recognition accuracy
    • Text not recognized (ignored by OCR engine)

Curved text
  Enumeration: AsposeOCRDefectType.ASPOSE_OCR_CURVED_TEXT
  Description: Cylindrical curvature of the page that often appears when photographing pages of books and magazine articles.
  Impact:
    • Some characters are misidentified
    • Text not recognized

Blur
  Enumeration: AsposeOCRDefectType.ASPOSE_OCR_BLURED_IMAGE
  Description: The entire image or some of its areas are out of focus.
  Important: This detection algorithm can only identify the entire image as blurry. Specific areas cannot be detected.
  Impact:
    • Characters are not recognized correctly
    • Text not recognized (ignored by OCR engine)

Glare
  Enumeration: AsposeOCRDefectType.ASPOSE_OCR_GLARE
  Description: Highlight areas in an image caused by uneven lighting, such as spotlights or flash.
  Impact:
    • Low recognition accuracy
    • Text not recognized (ignored by OCR engine)

Thick text
  Enumeration: AsposeOCRDefectType.ASPOSE_OCR_EXTRA_BOLD_TEXT
  Description: Extra-bold text.
  Impact:
    • Some characters are misidentified
  How to mitigate:
    • At the moment, the Aspose.OCR engine does not have a preprocessing algorithm that can deal with such text.

The number of problem areas is returned in the defects_count property of the AsposeOCRRecognizedPage structure in the recognition results. The areas themselves are returned in the defect_areas property of the same structure. Each defect area contains the following members:

type
  Type: AsposeOCRDefectType
  Description: Identified defect type:
    • AsposeOCRDefectType.ASPOSE_OCR_SALT_PEPPER_NOISE - salt-and-pepper noise.
    • AsposeOCRDefectType.ASPOSE_OCR_DARK_IMAGES - low contrast between text and background.
    • AsposeOCRDefectType.ASPOSE_OCR_CURVED_TEXT - curved lines.
    • AsposeOCRDefectType.ASPOSE_OCR_BLURED_IMAGE - blur.
    • AsposeOCRDefectType.ASPOSE_OCR_GLARE - glare.
    • AsposeOCRDefectType.ASPOSE_OCR_EXTRA_BOLD_TEXT - extra-bold (thick) text.

area
  Type: rect
  Description: Coordinates of the image area with the defect (top-left corner, width, and height).
  Important: When using the AsposeOCRDefectType.ASPOSE_OCR_DETECT_BLURED_IMAGE detection algorithm, the entire image area is returned.
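
To make the shape of this data easier to picture, here is a minimal sketch of walking the defect areas of one page. It does not use the Aspose.OCR headers: DefectType, Rect, DefectArea and RecognizedPage are stand-in types invented for this illustration, and their field types, the enumerator names, and the helper functions defect_name() and describe_defects() are assumptions. Only the member names defects_count, defect_areas, type and area come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in types for illustration only; these are NOT the Aspose.OCR declarations.
// Member names follow the text above; everything else is an assumption.
enum class DefectType { SaltPepperNoise, DarkImages, CurvedText, BluredImage, Glare, ExtraBoldText };

struct Rect { int x; int y; int width; int height; };  // top-left corner, width, height

struct DefectArea {
    DefectType type;  // which defect was identified
    Rect area;        // where it was found
};

struct RecognizedPage {
    const DefectArea* defect_areas;  // array of detected problem areas
    std::size_t defects_count;       // number of entries in defect_areas
};

// Hypothetical helper: human-readable name for a defect type.
std::string defect_name(DefectType t) {
    switch (t) {
        case DefectType::SaltPepperNoise: return "salt-and-pepper noise";
        case DefectType::DarkImages:      return "low contrast";
        case DefectType::CurvedText:      return "curved text";
        case DefectType::BluredImage:     return "blur";
        case DefectType::Glare:           return "glare";
        case DefectType::ExtraBoldText:   return "extra-bold text";
    }
    return "unknown";
}

// Hypothetical helper: one descriptive line per defect area on the page.
std::vector<std::string> describe_defects(const RecognizedPage& page) {
    // A non-empty page must provide a valid array.
    assert(page.defects_count == 0 || page.defect_areas != nullptr);
    std::vector<std::string> lines;
    for (std::size_t i = 0; i < page.defects_count; ++i) {
        const DefectArea& d = page.defect_areas[i];
        lines.push_back(defect_name(d.type) + " at (" +
                        std::to_string(d.area.x) + ", " + std::to_string(d.area.y) + "), " +
                        std::to_string(d.area.width) + "x" + std::to_string(d.area.height));
    }
    return lines;
}
```

With the real library, the same loop would run over the page structures in the recognition results; check the actual declarations in the Aspose.OCR headers before relying on any field type shown here.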

You can highlight problem areas when previewing an image and even OCR them using alternative recognition settings to get a better result.
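
When re-processing a problem area, it can help to take a slightly larger region around it so that characters on the edge are not cut off. The sketch below shows one possible way to do that. It is not part of Aspose.OCR: the Rect type, the expand_and_clamp() function and the margin parameter are all assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in rectangle type (top-left corner, width, height); an assumption,
// not the Aspose.OCR declaration.
struct Rect { int x; int y; int width; int height; };

// Hypothetical helper: grow a defect rectangle by `margin` pixels on every
// side, then clip the result to the image bounds.
Rect expand_and_clamp(const Rect& area, int margin, int image_width, int image_height) {
    assert(margin >= 0 && image_width > 0 && image_height > 0);
    const int left   = std::max(0, area.x - margin);
    const int top    = std::max(0, area.y - margin);
    const int right  = std::min(image_width,  area.x + area.width  + margin);
    const int bottom = std::min(image_height, area.y + area.height + margin);
    // A rectangle lying entirely outside the image collapses to zero size.
    return Rect{left, top, std::max(0, right - left), std::max(0, bottom - top)};
}
```

The resulting rectangle could then be used to highlight the area in a preview or to select the region for a second recognition pass with different settings.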

The following code example shows how to detect problematic areas of an image: