Extract Images from PDF using Python

Contents
[ ]

Use Document to open the PDF, then access the page resources to retrieve an XImage object and save it as a separate file. This approach is useful when you need to reuse images, inspect extracted assets, or build image-processing workflows from PDF content.

  1. Open the PDF as a Document.
  2. Access the image resource from the target page.
  3. Retrieve the required XImage from the page image collection.
  4. Save the extracted image to an output file.

    import aspose.pdf as apdf
    from io import FileIO
    from os import path

    path_infile = path.join(self.dataDir, infile)
    path_outfile = path.join(self.dataDir, outfile)

    document = apdf.Document(path_infile)
    xImage = document.pages[1].resources.images[1]
    with FileIO(path_outfile, "w") as output_image:
        xImage.save(output_image)