Multithreading support

Aspose.OCR for Python via .NET allows you to limit the number of threads used by the recognition engine. Decreasing the number of threads may slow down the recognition speed, but will allow you to reserve some resources for other processes such as the parallel image processing or background data analysis.

To set the number of threads, use threads_count parameter of the recognition settings:

Value Behavior
Not set The OCR engine will use all CPU cores/threads.
0 The OCR engine will use all CPU cores/threads.
1 OCR will be performed on the application’s main thread.
More than 1 The OCR engine will use the number of threads provided, but not more than the total number of CPU cores/threads. Read How it works for technical details.

How it works

Multithreading works differently depending on the number of images in the recognition batch.

Recognizing one image

Applies to:

  • recognition of a single image;
  • recognition of a single-page PDF.

Aspose.OCR for Python via .NET will use the provided number of threads to recognize blocks of text found on the image/page in parallel.

Recognizing multiple files/pages

Applies to:

  • recognition of several images;
  • recognition of multi-page PDF documents;
  • recognition of multi-page TIFF images;
  • recognition of DjVu files;
  • bulk recognition of all images in a folder;
  • bulk recognition of all images in a ZIP archive.

Each image from the batch is processed in one separate thread. If more than one thread is specified in threads_count parameter, images are recognized in parallel. The number of images processed simultaneously cannot exceed the value of threads_count recognition settings or the total number of CPU cores/threads (whichever is less). Image regions are always recognized one by one.

Example

# Instantiate Aspose.OCR API
api = AsposeOcr()
# Add images to the recognition batch
input = OcrInput(InputType.SINGLE_IMAGE)
input.add("source1.png")
input.add("source2.png")
input.add("source3.png")
input.add("source4.png")
# Limit resource usage to 2 threads
recognitionSettings = RecognitionSettings()
recognitionSettings.threads_count = 2
# Recognize the image
results = api.recognize(input, recognitionSettings)
# Print recognition results
for result in results:
	print(result.recognition_text)