Noise removal

Dirt, spots, scratches, glare, unwanted gradients, and other noise are common when scanning low-quality sources such as newspapers or old books, or when taking photographs. These defects can significantly reduce OCR accuracy, and spots may be misinterpreted as characters.

Aspose.OCR provides automated processing algorithms that remove noise from images before proceeding to recognition.

Automatic noise removal

To automatically remove noise from an image before recognition, run the image through the AutoDenoising preprocessing filter.

AsposeOCR api = new AsposeOCR();
// Apply automatic noise removal
PreprocessingFilter filters = new PreprocessingFilter();
filters.add(PreprocessingFilter.AutoDenoising());
// Prepare batch
OcrInput images = new OcrInput(InputType.SingleImage, filters);
images.add("image.png");
// Save processed images to the folder
ImageProcessing.Save(images, "C:\\images");

[Figure: noisy image compared with the denoised image]
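
The algorithm behind the AutoDenoising filter is not described here. As a rough illustration of what noise removal does to an image, the following self-contained sketch applies a generic 3x3 median filter to a small grayscale grid. It does not use Aspose.OCR: the names MedianDenoiseSketch and medianFilter are hypothetical, and the median filter is a textbook stand-in, not the library's implementation.

```java
import java.util.Arrays;

// Illustrative only: a generic 3x3 median filter, NOT the Aspose.OCR algorithm.
class MedianDenoiseSketch {

    // Returns a denoised copy of a grayscale image (0 = black, 255 = white).
    static int[][] medianFilter(int[][] pixels) {
        int height = pixels.length;
        int width = pixels[0].length;
        int[][] result = new int[height][width];
        int[] window = new int[9];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int n = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        // Clamp coordinates so border pixels reuse their nearest neighbors.
                        int yy = Math.min(Math.max(y + dy, 0), height - 1);
                        int xx = Math.min(Math.max(x + dx, 0), width - 1);
                        window[n++] = pixels[yy][xx];
                    }
                }
                Arrays.sort(window);
                result[y][x] = window[4]; // median of the nine samples
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] page = new int[7][7];
        for (int[] row : page) {
            Arrays.fill(row, 255); // white paper
        }
        for (int y = 0; y < 7; y++) {
            page[y][1] = 0; // a vertical stroke, three pixels wide
            page[y][2] = 0;
            page[y][3] = 0;
        }
        page[3][5] = 0; // an isolated dark speck

        int[][] clean = medianFilter(page);
        System.out.println("speck after filtering:  " + clean[3][5]);
        System.out.println("stroke after filtering: " + clean[3][2]);
    }
}
```

In this sketch the isolated speck is replaced with white, while the three-pixel stroke stays black.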

Image regions preprocessing

You can also limit automatic noise removal to specific areas of an image. For example, you can remove compression artifacts from the body text of an article while leaving the headings unchanged.

To apply a filter to an area, specify its top-left corner along with its width and height as a Rectangle object. If the region is omitted, the filter is applied to the entire image.

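// Region: top-left corner at (5, 161), width 340, height 340
Rectangle rectangle = new Rectangle(5, 161, 340, 340);
// Apply automatic noise removal to that region only
PreprocessingFilter filters = new PreprocessingFilter();
filters.add(PreprocessingFilter.AutoDenoising(rectangle));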
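
To illustrate how a region limits the effect of a filter, the following sketch applies a simple whitening step only inside a java.awt.Rectangle, and falls back to the whole image when no region is given. It is a simplified illustration that does not call Aspose.OCR: RegionFilterSketch, whitenInRegion, and the threshold parameter are hypothetical, and the whitening step merely stands in for a real filter.

```java
import java.awt.Rectangle;
import java.util.Arrays;

// Illustrative only: region-limited filtering on a grayscale grid (0 = black, 255 = white).
class RegionFilterSketch {

    // Returns a copy in which faint pixels (lighter than threshold) are turned white,
    // but only inside the region. A null region means the whole image.
    static int[][] whitenInRegion(int[][] pixels, Rectangle region, int threshold) {
        int height = pixels.length;
        int width = pixels[0].length;
        Rectangle bounds = new Rectangle(0, 0, width, height);
        // Clip the region to the image so out-of-range coordinates are ignored.
        Rectangle area = (region == null) ? bounds : bounds.intersection(region);

        int[][] result = new int[height][];
        for (int y = 0; y < height; y++) {
            result[y] = pixels[y].clone();
        }
        for (int y = area.y; y < area.y + area.height; y++) {
            for (int x = area.x; x < area.x + area.width; x++) {
                if (result[y][x] > threshold) {
                    result[y][x] = 255;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] page = new int[4][4];
        for (int[] row : page) {
            Arrays.fill(row, 200); // light gray haze over the whole page
        }
        int[][] partial = whitenInRegion(page, new Rectangle(1, 1, 2, 2), 128);
        System.out.println("inside region:  " + partial[1][1]);
        System.out.println("outside region: " + partial[0][0]);
    }
}
```

Clipping the rectangle to the image bounds means a region that extends past the edge, or lies entirely outside the image, is handled without errors.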

Usage scenarios

Automatic noise removal is recommended for the following types of images:

  • Photos, especially those taken in low-light conditions.
  • Old books.
  • Newspapers.
  • Postcards.
  • Text with a photo or picture as a background.
  • Scanned papers with spots and dirt.

However, noise removal can reduce recognition accuracy when working with poor-quality prints, as it can lead to the loss of important details, such as light punctuation or heavily fragmented characters.
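
The trade-off described above can be seen in a simplified despeckling routine. The sketch below removes connected groups of dark pixels that are smaller than a minimum area, which is one generic way to clean up dirt. It is not the Aspose.OCR algorithm, and the names DespeckleSketch, removeSmallBlobs, and minArea are hypothetical. A filter like this has no way to tell a one-pixel speck of dirt from a one-pixel period.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: generic small-blob removal, NOT the Aspose.OCR algorithm.
class DespeckleSketch {

    // Returns a copy of the ink map (true = dark pixel) in which every
    // 4-connected group of dark pixels smaller than minArea is erased.
    static boolean[][] removeSmallBlobs(boolean[][] ink, int minArea) {
        int height = ink.length;
        int width = ink[0].length;
        boolean[][] result = new boolean[height][];
        for (int y = 0; y < height; y++) {
            result[y] = ink[y].clone();
        }
        boolean[][] seen = new boolean[height][width];
        int[][] steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};

        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (!ink[y][x] || seen[y][x]) {
                    continue;
                }
                // Collect one connected group with a breadth-first flood fill.
                List<int[]> blob = new ArrayList<>();
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] {y, x});
                seen[y][x] = true;
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    blob.add(p);
                    for (int[] s : steps) {
                        int ny = p[0] + s[0];
                        int nx = p[1] + s[1];
                        if (ny >= 0 && ny < height && nx >= 0 && nx < width
                                && ink[ny][nx] && !seen[ny][nx]) {
                            seen[ny][nx] = true;
                            queue.add(new int[] {ny, nx});
                        }
                    }
                }
                if (blob.size() < minArea) {
                    for (int[] p : blob) {
                        result[p[0]][p[1]] = false; // erase the small group
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[][] ink = new boolean[5][8];
        for (int y = 0; y < 5; y++) {
            ink[y][1] = true; // a letter stroke, five pixels tall
        }
        ink[4][3] = true; // a period next to the stroke
        ink[0][6] = true; // a speck of dirt

        boolean[][] cleaned = removeSmallBlobs(ink, 2);
        System.out.println("stroke kept: " + cleaned[2][1]);
        System.out.println("dirt kept:   " + cleaned[0][6]);
        System.out.println("period kept: " + cleaned[4][3]);
    }
}
```

With a minimum area of 2, the sketch erases the dirt and the period alike while keeping the stroke. When punctuation or fragmented characters matter, it is worth comparing recognition results with and without noise removal.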