Noise removal

Dirt, spots, scratches, glare, unwanted gradients, and other noise are a common problem when scanning low-quality sources such as newspapers or old books, or when taking photographs. These image defects can interfere with recognition, significantly reduce the accuracy of OCR, and may cause spots to be misrecognised as characters.

Aspose.OCR for Python via .NET provides automated processing algorithms that remove noise from images before proceeding to recognition.

Automatic noise removal

To remove noise from an image automatically before recognition, run it through the auto_denoising processing filter.

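# Assumes AsposeOcr, PreprocessingFilter, OcrInput, InputType and
# ImageProcessing have already been imported from the Aspose.OCR package
# Instantiate Aspose.OCR API
api = AsposeOcr()
# Initialize image processing and enable automatic noise removal
filters = PreprocessingFilter()
filters.add(PreprocessingFilter.auto_denoising())
# Add image to the recognition batch and apply processing filter
# (named ocr_input to avoid shadowing Python's built-in input())
ocr_input = OcrInput(InputType.SINGLE_IMAGE, filters)
ocr_input.add("source.png")
# Save processed image to the "result" folder
ImageProcessing.save(ocr_input, "result")
# Recognize the image
result = api.recognize(ocr_input)
# Print recognition result for the first (and only) image in the batch
print(result[0].recognition_text)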

Figure: noisy image and denoised image.

Usage scenarios

Automatic noise removal is recommended for the following images:

  • Photos, especially those taken in low light conditions.
  • Old books.
  • Newspapers.
  • Postcards.
  • Text with a photo or picture as a background.
  • Scanned papers with spots and dirt.

However, noise removal can reduce recognition accuracy when working with poor-quality prints, as it can lead to the loss of important details, such as light punctuation or heavily fragmented characters.
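
The trade-off described above can be illustrated with a small, self-contained sketch. It does not use Aspose.OCR and is not the algorithm behind auto_denoising, which is not documented here. It applies a plain 3x3 median filter, a common textbook denoising technique, to a tiny hand-made grayscale image. The median_filter function and the page data are hypothetical names introduced only for this illustration.

```python
from statistics import median_high

def median_filter(image, radius=1):
    """Return a copy of a 2D grayscale image smoothed with a square median filter.

    Pixels near the border use only the neighbours that fall inside the image.
    median_high keeps the output to values that exist in the input, so no
    in-between grey levels are produced for even-sized border windows.
    """
    height = len(image)
    width = len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(median_high(window))
        result.append(row)
    return result

# Illustrative data: 0 = ink, 255 = paper.
# A solid 3x3 "character" on the left and a single isolated dark pixel on the
# right. The isolated pixel could equally be a speck of dirt or a tiny full stop.
page = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,   0,   0,   0, 255, 255, 255],
    [255,   0,   0,   0, 255,   0, 255],
    [255,   0,   0,   0, 255, 255, 255],
    [255, 255, 255, 255, 255, 255, 255],
]

cleaned = median_filter(page)

# Count the dark pixels before and after filtering.
ink_before = sum(row.count(0) for row in page)
ink_after = sum(row.count(0) for row in cleaned)

for row in cleaned:
    print("".join("#" if pixel == 0 else "." for pixel in row))
```

In this sketch the isolated pixel disappears, which is the desired result if it was dirt and a loss if it was punctuation. The filter cannot tell the two apart. The corners of the solid block are also worn away, which hints at why thin or fragmented characters in poor-quality prints can suffer from noise removal.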