Photos taken in low light conditions can have a lot of digital noise. Noise can also show up in highly compressed JPEG images in the form of compression artifacts. This noise can mislead OCR algorithms and prevent other preprocessing filters from working properly.
Aspose.OCR provides an alternative way to remove noise from an image: the median filter. It removes noise at the cost of some detail, making the image slightly blurry while preserving the edges of high-contrast objects such as letters. The results can be further improved with the auto-contrast or binarization preprocessing filters.
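A median filter replaces each pixel with the median value of its neighborhood. An isolated noise pixel is an outlier in its window, so it is discarded, while a straight boundary between dark and light areas stays where it is. The plain-Java sketch below (no Aspose.OCR dependency) illustrates the general technique on a small grayscale array. The 3x3 window, the clamping at image borders and the class name MedianFilterDemo are assumptions for illustration; this is not a description of how Aspose.OCR implements its median filter internally.

```java
import java.util.Arrays;

// Illustrative 3x3 median filter on a grayscale image stored as int[rows][cols].
// Generic sketch of the technique, not Aspose.OCR's implementation.
public class MedianFilterDemo {

    // Returns a new image where each pixel is the median of its 3x3 neighborhood.
    public static int[][] median3x3(int[][] src) {
        int rows = src.length;
        int cols = src[0].length;
        int[][] dst = new int[rows][cols];
        int[] window = new int[9];
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                int n = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        // Clamp coordinates so border pixels reuse their nearest neighbors
                        int yy = Math.min(Math.max(y + dy, 0), rows - 1);
                        int xx = Math.min(Math.max(x + dx, 0), cols - 1);
                        window[n++] = src[yy][xx];
                    }
                }
                Arrays.sort(window);
                dst[y][x] = window[4]; // middle of the 9 sorted values
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // A sharp dark/light edge plus one bright noise pixel at row 2, column 1
        int[][] image = {
            {0, 0, 0, 255, 255},
            {0, 0, 0, 255, 255},
            {0, 255, 0, 255, 255},
            {0, 0, 0, 255, 255},
            {0, 0, 0, 255, 255},
        };
        for (int[] row : median3x3(image)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running main prints five identical rows, [0, 0, 0, 255, 255]: the bright noise pixel is gone and the dark/light boundary stays between the third and fourth columns. An averaging (box) blur would instead spread the noise pixel into its neighbors and soften the edge.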
Applying the median filter
To smooth out noise in an image, run it through the Median preprocessing filter (PreprocessingFilter.Median()).
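```java
import com.aspose.ocr.AsposeOCR;
import com.aspose.ocr.PreprocessingFilter;
import com.aspose.ocr.RecognitionResult;
import com.aspose.ocr.RecognitionSettings;

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class MedianFilterExample {
    public static void main(String[] args) throws Exception {
        AsposeOCR api = new AsposeOCR();
        // Add the median filter to the preprocessing chain
        PreprocessingFilter filters = new PreprocessingFilter();
        filters.add(PreprocessingFilter.Median());
        // Save the preprocessed image to a file to inspect the result
        BufferedImage imageRes = api.PreprocessImage("source.png", filters);
        File outputSource = new File("result.png");
        ImageIO.write(imageRes, "png", outputSource);
        // Append preprocessing filters to recognition settings
        RecognitionSettings recognitionSettings = new RecognitionSettings();
        recognitionSettings.setPreprocessingFilters(filters);
        // Recognize the image with the median filter applied
        RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
        System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");
    }
}
```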
Image regions preprocessing
The median filter can be applied to specific areas of an image. For example, you can smooth an illustration in a newspaper article while leaving the rest of the content unchanged.
To apply the filter to an area, specify the area's top-left corner (x, y), width and height as a Rectangle object. If the region is omitted, the filter is applied to the entire image.
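```java
import com.aspose.ocr.PreprocessingFilter;
import java.awt.Rectangle;

// Region with its top-left corner at (5, 161), 340 pixels wide and 340 pixels high
Rectangle rectangle = new Rectangle(5, 161, 340, 340);
PreprocessingFilter filters = new PreprocessingFilter();
// Apply the median filter only inside the rectangle
filters.add(PreprocessingFilter.Median(rectangle));
```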
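To illustrate what restricting a filter to a region means, the sketch below applies a 3x3 median only to pixels inside a java.awt.Rectangle and copies every other pixel unchanged. It is a plain-Java illustration of the concept, not Aspose.OCR's implementation. The class and method names are hypothetical, and sampling neighbors that lie just outside the region (with clamping at the image borders) is an assumption of this sketch.

```java
import java.awt.Rectangle;
import java.util.Arrays;

// Hypothetical illustration of region-limited median filtering on a grayscale int[rows][cols] image.
public class RegionMedianDemo {

    // Median of the 3x3 neighborhood around (x, y), clamped at image borders
    static int median3x3At(int[][] src, int x, int y) {
        int rows = src.length, cols = src[0].length;
        int[] window = new int[9];
        int n = 0;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                int yy = Math.min(Math.max(y + dy, 0), rows - 1);
                int xx = Math.min(Math.max(x + dx, 0), cols - 1);
                window[n++] = src[yy][xx];
            }
        }
        Arrays.sort(window);
        return window[4];
    }

    // Applies the median only inside the region; pixels outside it are copied as-is
    public static int[][] medianInRegion(int[][] src, Rectangle region) {
        int rows = src.length, cols = src[0].length;
        int[][] dst = new int[rows][];
        for (int y = 0; y < rows; y++) {
            dst[y] = src[y].clone();
        }
        // Clip the region to the image bounds (Rectangle x = column, y = row)
        Rectangle clipped = region.intersection(new Rectangle(0, 0, cols, rows));
        for (int y = clipped.y; y < clipped.y + clipped.height; y++) {
            for (int x = clipped.x; x < clipped.x + clipped.width; x++) {
                dst[y][x] = median3x3At(src, x, y);
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[][] image = new int[6][6];
        image[1][1] = 255; // noise inside the region
        image[4][4] = 255; // noise outside the region
        // Region: top-left corner (0, 0), 3 pixels wide, 3 pixels high
        int[][] result = medianInRegion(image, new Rectangle(0, 0, 3, 3));
        System.out.println(result[1][1] + " " + result[4][4]); // prints "0 255"
    }
}
```

The noise pixel inside the rectangle is removed, while the one outside it is left untouched, which is the behavior described above for smoothing only an illustration on a page.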
The median filter is recommended for the following images:
- Photos taken in low light conditions.
- Poor-quality printouts.
- Highly compressed or low-quality JPEG images.