Detecting image defects
Image defects can significantly impact the accuracy of OCR. They can be caused by the quality of the image acquisition process, environmental conditions, and the hardware used to capture the image. To improve recognition accuracy, it is essential to preprocess and enhance images to mitigate these defects whenever possible.
Aspose.OCR for C++ can automatically find potentially problematic areas of an image during recognition. To enable this functionality, specify the types of image defects to detect in the defect_type member of recognition settings, or use the specialized asposeocr_detect_defects() function. The latter approach only returns information about defects without recognizing the image.
The following types of defects can be found:
Defect | Enumeration | Description | Impact | How to mitigate |
---|---|---|---|---|
Salt-and-pepper noise | AsposeOCRDefectType.ASPOSE_OCR_SALT_PEPPER_NOISE | Appears as random white and black pixels scattered across the area. Often occurs in digital photographs. | | |
Low contrast between text and background | AsposeOCRDefectType.ASPOSE_OCR_DARK_IMAGES | Highlights and shadows typically appear on curved pages. | | |
Curved text | AsposeOCRDefectType.ASPOSE_OCR_CURVED_TEXT | Cylindrical curvature of the page that often appears when photographing pages of books and magazine articles. | | |
Blur | AsposeOCRDefectType.ASPOSE_OCR_BLURED_IMAGE | The entire image or some of its areas are out of focus. Important: This detection algorithm can only identify the entire image as blurry. Specific areas cannot be detected. | | |
Glare | AsposeOCRDefectType.ASPOSE_OCR_GLARE | Highlight areas in an image caused by uneven lighting, such as spotlights or flash. | | |
Thick text | AsposeOCRDefectType.ASPOSE_OCR_EXTRA_BOLD_TEXT | Extra-bold text. | | |
If the defect_type member in recognition settings is not specified, the image will not be analyzed for problems. This can speed up recognition and lower resource usage.
The number of areas with problems is returned in the defects_count property of the AsposeOCRRecognizedPage structure returned in recognition results. The areas of the image with defects are returned in the defect_areas property of the same structure. Each entry contains the following members:
Member | Type | Description |
---|---|---|
type | AsposeOCRDefectType | Identified defect type (one of the enumeration values listed above). |
area | rect | Coordinates of the image area with the defect (top-left corner, width, and height). Important: When using the AsposeOCRDefectType.ASPOSE_OCR_DETECT_BLURED_IMAGE detection algorithm, the entire image area is returned. |
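The members described above can be summarized per defect type once they have been copied out of the recognition results. The following is a minimal sketch, not the library's API: DefectKind, Rect, DefectArea, count_by_kind() and total_area() are hypothetical stand-ins that only mirror the type and area members from the table, and the real definitions in the Aspose.OCR for C++ header may differ.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration only. The real types come from the
// Aspose.OCR for C++ header and may be named and laid out differently.
enum class DefectKind { SaltPepperNoise, LowContrast, CurvedText, Blur, Glare, ExtraBoldText };

// Top-left corner plus width and height, as described for the area member.
struct Rect { int x; int y; int width; int height; };

// Mirrors one entry of defect_areas: a defect type and the rectangle it covers.
struct DefectArea { DefectKind type; Rect area; };

// Counts how many defect areas of each kind were reported for a page.
std::map<DefectKind, std::size_t> count_by_kind(const std::vector<DefectArea>& defects) {
    std::map<DefectKind, std::size_t> counts;
    for (const DefectArea& d : defects) {
        ++counts[d.type];
    }
    return counts;
}

// Sums the pixel area of all rectangles of one kind.
// Overlapping rectangles are counted more than once.
long long total_area(const std::vector<DefectArea>& defects, DefectKind kind) {
    long long sum = 0;
    for (const DefectArea& d : defects) {
        if (d.type == kind) {
            sum += static_cast<long long>(d.area.width) * d.area.height;
        }
    }
    return sum;
}
```

A summary like this can help decide which preprocessing step to try first, for example when most reported areas are of a single type.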
You can highlight problem areas when previewing an image and even OCR them using alternative recognition settings to get a better result.
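When re-recognizing a problem area with alternative settings, it may help to include a small margin around the reported rectangle so that characters at its edges are not cut off. The sketch below shows one way to compute such a region. It is an assumption about a possible workflow, not part of the library: Rect and padded_region() are hypothetical names, and the rectangle layout only follows the top-left corner, width and height description above.

```cpp
#include <algorithm>

// Hypothetical stand-in: top-left corner plus width and height.
struct Rect { int x; int y; int width; int height; };

// Expands a defect rectangle by `padding` pixels on every side and clamps the
// result to the image bounds, so the region can be cropped safely.
Rect padded_region(const Rect& defect, int padding, int image_width, int image_height) {
    const int left   = std::max(0, defect.x - padding);
    const int top    = std::max(0, defect.y - padding);
    const int right  = std::min(image_width,  defect.x + defect.width  + padding);
    const int bottom = std::min(image_height, defect.y + defect.height + padding);
    // Guard against rectangles that lie entirely outside the image.
    return Rect{left, top, std::max(0, right - left), std::max(0, bottom - top)};
}
```

The resulting rectangle can then be passed to whatever cropping or region-based recognition facility is in use.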
Example
The following code example shows how to detect problematic areas of an image: