Supported file formats

Recognized image formats

Aspose.OCR for Python via Java can recognize text in files produced by a scanner or camera, in the following formats:

Extension        Details
.PDF             Portable Document Format.
.JPG             JPEG, the most popular format for smartphone photos.
.PNG             Portable Network Graphics, 24-bit with transparency.
.TIFF or .TIF    Tag Image File Format, commonly used for high-quality scanning. Multi-page TIFF images are fully supported.
.GIF             Graphics Interchange Format, limited to 256 colors.
.BMP             Bitmap image file.
.DJVU            DjVu, primarily designed for scanned documents containing a combination of text, line drawings, indexed color images, and photographs.
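
The extension list above can be turned into a simple pre-check before files are handed to the recognizer. The following is a minimal sketch using only the Python standard library; the names SUPPORTED_INPUT_EXTENSIONS and is_supported_input are hypothetical helpers for illustration and are not part of the Aspose.OCR API. It assumes that matching on the file extension, ignoring case, is good enough for your workflow.

```python
from pathlib import Path

# Extensions copied from the table above, lowercased for comparison.
# Only extensions that the table lists are included (for example, ".jpeg"
# is not listed, so it is deliberately left out of this sketch).
SUPPORTED_INPUT_EXTENSIONS = frozenset(
    {".pdf", ".jpg", ".png", ".tiff", ".tif", ".gif", ".bmp", ".djvu"}
)


def is_supported_input(path):
    """Return True if the file name ends with one of the listed extensions.

    Hypothetical helper: it inspects the name only and never opens the file.
    """
    return Path(path).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS
```

For example, is_supported_input("scan.TIF") is true, while is_supported_input("notes.docx") is false.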

Additional recognition options

  • You can recognize files in any of the formats listed above directly from a folder. There is no limit on the number of files; however, subfolders are not processed.
  • You can recognize files in any of the formats listed above directly from a ZIP archive. There is no limit on the number of files, but the library does not process nested folders or nested archives.
  • Aspose.OCR for Python via Java can read an image from a public URL that points directly to the file. However, it cannot extract images from HTML pages and does not support authentication.
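
To see in advance which inputs fall within the limits described above, you can inspect a folder, archive, or URL yourself. The sketch below uses only the Python standard library and does not call Aspose.OCR; the function names are hypothetical, and the rules it applies (top-level entries only, http/https links without embedded credentials, a path ending in a supported extension) are assumptions derived from the list above rather than the library's actual checks.

```python
import zipfile
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Input extensions from the "Recognized image formats" table, lowercased.
SUPPORTED = {".pdf", ".jpg", ".png", ".tiff", ".tif", ".gif", ".bmp", ".djvu"}


def top_level_files(folder):
    """Supported files directly inside `folder`; subfolders are not entered."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


def top_level_zip_entries(archive):
    """Supported entries at the root of a ZIP archive.

    Entries inside nested folders contain "/" and are skipped; nested
    archives are skipped because ".zip" is not a supported extension.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name for name in zf.namelist()
            if "/" not in name
            and PurePosixPath(name).suffix.lower() in SUPPORTED
        )


def looks_like_direct_file_url(url):
    """Heuristic: an http(s) URL without credentials whose path ends in a
    supported extension. Assumption: this only approximates "points
    directly to the file"; the server's response is never checked.
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False
    if parts.username or parts.password:
        return False
    return PurePosixPath(parts.path).suffix.lower() in SUPPORTED
```

With these helpers, a link such as https://example.com/scans/page.png passes the URL check, while a link to an HTML gallery page or a link with embedded credentials does not.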

Recognition results

Recognition results are returned in the most popular document and data exchange formats:

Format    Details
.TXT      Plain text
.HTML     Web page
.RTF      A universal format for exchanging rich text documents between different word processing programs
.DOCX     Microsoft Word document
.XLSX     Microsoft Excel spreadsheet
.PDF      Portable Document Format
.EPUB     Popular e-book file format
JSON      A popular open-standard format, widely used in software development and data exchange
XML       Extensible Markup Language, a universal format for most systems
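
When saving results next to the source images, it can help to derive the output file name from the chosen result format. The following is a minimal sketch using only the Python standard library; RESULT_EXTENSIONS and result_path are hypothetical names for illustration, not part of the Aspose.OCR API, and the use of ".json" and ".xml" as file extensions for the JSON and XML results is an assumption.

```python
from pathlib import Path

# Result formats from the table above, written as lowercase extensions.
# Assumption: JSON and XML results are saved as ".json" and ".xml" files.
RESULT_EXTENSIONS = frozenset(
    {".txt", ".html", ".rtf", ".docx", ".xlsx", ".pdf", ".epub", ".json", ".xml"}
)


def result_path(source, extension):
    """Return the path of `source` with its suffix swapped for `extension`.

    Hypothetical helper: accepts "docx", ".docx", or ".DOCX" and raises
    ValueError for any format that the table does not list.
    """
    ext = extension.lower()
    if not ext.startswith("."):
        ext = "." + ext
    if ext not in RESULT_EXTENSIONS:
        raise ValueError(f"unsupported result format: {extension}")
    return Path(source).with_suffix(ext)
```

For example, result_path("scans/page1.png", "DOCX") yields the path scans/page1.docx, and an unlisted format such as ".odt" raises ValueError.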