Extract Images from PDF using Python
Use the `Document` class to open the PDF, then read the target page's resources to retrieve an `XImage` object and save it to a separate file. This is useful when you need to reuse embedded images, inspect extracted assets, or feed PDF images into an image-processing workflow.
- Open the PDF as a `Document`.
- Access the image resources of the target page.
- Retrieve the required `XImage` from the page's image collection.
- Save the extracted image to an output file.
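```python
import aspose.pdf as apdf
from io import FileIO
from os import path

def extract_image(data_dir, infile, outfile):
    # Build full paths for the source PDF and the extracted image
    path_infile = path.join(data_dir, infile)
    path_outfile = path.join(data_dir, outfile)
    # Open the PDF document
    document = apdf.Document(path_infile)
    # Page and image indices are 1-based: take the first image on page 1
    xImage = document.pages[1].resources.images[1]
    # FileIO opens the output file in binary write mode
    with FileIO(path_outfile, "w") as output_image:
        xImage.save(output_image)
```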
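To inspect an extracted asset, for example to decide which file extension it should carry, you can check the saved file's leading bytes. The sketch below is a plain standard-library helper, not part of Aspose.PDF; the function names `sniff_image_format` and `sniff_image_file` are hypothetical, and the signature table covers only a few common raster formats.

```python
from typing import Optional

# Leading-byte signatures of a few common raster formats (checked in order)
_SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"BM", "bmp"),
)


def sniff_image_format(header: bytes) -> Optional[str]:
    """Guess an image format from its first bytes; return None if unknown."""
    for magic, name in _SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_image_file(file_path: str) -> Optional[str]:
    """Read the first 8 bytes of a file and guess its image format."""
    with open(file_path, "rb") as f:
        return sniff_image_format(f.read(8))
```

For example, calling `sniff_image_file` on the output path after the extraction code has run reports whether the saved file is JPEG, PNG, and so on. A signature check is only a heuristic: it identifies the container format, not whether the image data itself is valid.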