Extract Images from PDF Using Python

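Open the PDF as a Document, then access the page resources to retrieve an XImage object and save it as a separate file. This approach is useful when you need to reuse images, inspect extracted resources, or build image-processing workflows from PDF content.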

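1. Open the PDF as a Document.
2. Access the image resources of the target page.
3. Retrieve the required XImage from the page's image collection.
4. Save the extracted image to an output file.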


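    import aspose.pdf as apdf
    from io import FileIO
    from os import path

    # self.dataDir, infile and outfile are supplied by the surrounding example class.
    path_infile = path.join(self.dataDir, infile)
    path_outfile = path.join(self.dataDir, outfile)

    # Open the PDF; the page and image collections are indexed from 1.
    document = apdf.Document(path_infile)
    x_image = document.pages[1].resources.images[1]

    # Write the extracted image to the output file.
    with FileIO(path_outfile, "w") as output_image:
        x_image.save(output_image)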
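
When inspecting an extracted resource, it can help to confirm that the saved file really is a recognizable image. The following is a minimal sketch that uses only the Python standard library and is not part of Aspose.PDF: the helper names `sniff_image_format` and `check_extracted_image` are hypothetical, and the signature table covers only a few common raster formats.

```python
# Hypothetical helpers (not part of Aspose.PDF): identify an image file
# by its leading "magic" bytes.
_SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),   # little-endian TIFF
    (b"MM\x00*", "tiff"),   # big-endian TIFF
)


def sniff_image_format(data):
    """Return a format name for the given leading bytes, or None if unknown."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None


def check_extracted_image(file_path):
    """Read the first bytes of a saved file and report its detected format."""
    with open(file_path, "rb") as handle:
        header = handle.read(16)
    return sniff_image_format(header)
```

For example, calling `check_extracted_image(path_outfile)` after the extraction step returns a format name such as `"jpeg"` or `"png"` when the file starts with a known signature, and `None` otherwise. A `None` result only means the format is not in this short table, not that the extraction failed.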
