在 Python 中从演示文稿形状提取图像
概述
演示文稿中的图像可以出现在多种形状类型中:普通图片框、应用于形状的图片填充、OLE 对象预览图像、视频或音频帧缩略图、缩放图像,或嵌套在表格、图表和 SmartArt 形状中的图像。Aspose.Slides 将这些图像存储在演示文稿的图像集合中,可通过 ImageCollection 和 PPImage 对象访问。
如果您只需要导出演示文稿中嵌入的每个图像资源,只需遍历 presentation.images。本文关注的是另一项任务:遍历形状以查找幻灯片中使用图像的位置,从而在保存的文件中保留有用的上下文信息,例如幻灯片编号、形状位置和来源类型(图片框、填充图像、媒体预览、OLE 预览或缩放图像)。
共享帮助方法
下面的帮助方法使示例保持简短。save_original_image 写入原始嵌入字节,根据 MIME 类型选择安全的扩展名,并通过 SHA-256 哈希跳过重复的图像二进制文件。
import hashlib
import re
from pathlib import Path
import aspose.slides as slides
import aspose.slides.charts as charts
import aspose.slides.smartart as smartart
def save_original_image(image, output_directory, file_name_base, saved_image_hashes):
image_data = bytes(image.binary_data)
image_hash = hashlib.sha256(image_data).hexdigest()
if image_hash in saved_image_hashes:
return False
saved_image_hashes.add(image_hash)
extension = get_extension_from_content_type(image.content_type)
file_name = f"{file_name_base}.{extension}"
output_path = Path(output_directory) / file_name
output_path.write_bytes(image_data)
return True
def save_image_as_png(image, output_directory, file_name_base):
file_name = f"{file_name_base}.png"
output_path = Path(output_directory) / file_name
image.image.save(str(output_path), slides.ImageFormat.PNG)
def get_picture_fill_image(fill_format):
if fill_format is None or fill_format.fill_type != slides.FillType.PICTURE:
return None
return fill_format.picture_fill_format.picture.image
def enumerate_shapes(shapes, prefix, include_grouped_shapes):
for shape_index, shape in enumerate(shapes, start=1):
shape_name_part = f"{prefix}_shape_{shape_index}"
yield shape, shape_name_part
if include_grouped_shapes and isinstance(shape, slides.GroupShape):
yield from enumerate_shapes(
shape.shapes,
shape_name_part,
include_grouped_shapes)
def get_extension_from_content_type(content_type):
if not content_type:
return "bin"
media_type = content_type.split(";")[0].strip().lower()
extensions = {
"image/jpeg": "jpg",
"image/png": "png",
"image/gif": "gif",
"image/bmp": "bmp",
"image/tiff": "tiff",
"image/x-emf": "emf",
"image/emf": "emf",
"image/x-wmf": "wmf",
"image/wmf": "wmf",
"image/svg+xml": "svg",
}
if media_type in extensions:
return extensions[media_type]
if media_type.startswith("image/"):
extension = media_type[len("image/"):]
return make_safe_file_name_part(extension)
return "bin"
def make_safe_file_name_part(value):
return re.sub(r'[<>:"/\\|?*]', "_", value)
从图片框提取图像
对于作为独立对象插入的图片,请使用此方法。PictureFrame 将其图片存储在 picture_format.picture.image 中,返回一个 PPImage 对象。
input_path = "sample.pptx"
output_directory = Path.cwd() / "extracted-images"
output_directory.mkdir(parents=True, exist_ok=True)
saved_image_hashes = set()
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=False):
if type(shape) is slides.PictureFrame:
image = shape.picture_format.picture.image
save_original_image(image, output_directory, name_part, saved_image_hashes)
从填充图片的形状提取图像
形状可以使用图片作为填充。首先检查形状的填充类型:如果不是 FillType.PICTURE,则该填充中没有可提取的图片。下面的示例处理 AutoShape 对象,并通过 PPImage 的 image 属性将每个图像保存为 PNG。
input_path = "sample.pptx"
output_directory = Path.cwd() / "shape-fill-images"
output_directory.mkdir(parents=True, exist_ok=True)
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=False):
if isinstance(shape, slides.AutoShape):
image = get_picture_fill_image(shape.fill_format)
if image is not None:
save_image_as_png(image, output_directory, name_part)
从 OLE 对象框提取预览图像
OleObjectFrame 可以拥有 PowerPoint 用作对象在幻灯片上预览的替代图片。该图像可通过 substitute_picture_format.picture.image 获取。提取此图片得到的是预览图像,而不是嵌入的 OLE 包内容。
input_path = "sample.pptx"
output_directory = Path.cwd() / "ole-preview-images"
output_directory.mkdir(parents=True, exist_ok=True)
saved_image_hashes = set()
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=False):
if isinstance(shape, slides.OleObjectFrame):
image = shape.substitute_picture_format.picture.image
if image is not None:
file_name_base = f"{name_part}_ole_preview"
save_original_image(image, output_directory, file_name_base, saved_image_hashes)
从视频帧提取预览图像
VideoFrame 也可以在 picture_format.picture.image 中存储预览图像。它是显示在幻灯片上的海报或缩略图,而不是从视频流解码的帧。
input_path = "sample.pptx"
output_directory = Path.cwd() / "video-preview-images"
output_directory.mkdir(parents=True, exist_ok=True)
saved_image_hashes = set()
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=False):
if isinstance(shape, slides.VideoFrame):
image = shape.picture_format.picture.image
if image is not None:
file_name_base = f"{name_part}_video_preview"
save_original_image(image, output_directory, file_name_base, saved_image_hashes)
从音频帧提取预览图像
AudioFrame 可以在 picture_format.picture.image 中存储缩略图。它是幻灯片上音频对象显示的图像。
input_path = "sample.pptx"
output_directory = Path.cwd() / "audio-preview-images"
output_directory.mkdir(parents=True, exist_ok=True)
saved_image_hashes = set()
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=False):
if isinstance(shape, slides.AudioFrame):
image = shape.picture_format.picture.image
if image is not None:
file_name_base = f"{name_part}_audio_preview"
save_original_image(image, output_directory, file_name_base, saved_image_hashes)
从缩放对象提取图像
ZoomFrame 和 SectionZoomFrame 形状可以使用自定义图像。请从缩放框中读取 zoom_image。
input_path = "sample.pptx"
output_directory = Path.cwd() / "zoom-images"
output_directory.mkdir(parents=True, exist_ok=True)
saved_image_hashes = set()
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=False):
if isinstance(shape, slides.ZoomFrame) and shape.zoom_image is not None:
file_name_base = f"{name_part}_zoom"
save_original_image(shape.zoom_image, output_directory, file_name_base, saved_image_hashes)
continue
if isinstance(shape, slides.SectionZoomFrame) and shape.zoom_image is not None:
file_name_base = f"{name_part}_section_zoom"
save_original_image(shape.zoom_image, output_directory, file_name_base, saved_image_hashes)
continue
从摘要缩放框提取图像
SummaryZoomFrame 也是一种形状。其章节项可以使用自定义图像,可通过每个摘要缩放章节的 zoom_image 属性获取。
input_path = "sample.pptx"
output_directory = Path.cwd() / "summary-zoom-images"
output_directory.mkdir(parents=True, exist_ok=True)
saved_image_hashes = set()
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=False):
if isinstance(shape, slides.SummaryZoomFrame):
section_count = len(shape.summary_zoom_collection)
for section_index in range(section_count):
section = shape.summary_zoom_collection[section_index]
if section.zoom_image is not None:
display_index = section_index + 1
file_name_base = f"{name_part}_summary_zoom_{display_index}"
save_original_image(section.zoom_image, output_directory, file_name_base, saved_image_hashes)
从表格形状提取图像
Table 是一种形状。表格中的图像通常以图片填充的形式存储在单元格中。
input_path = "sample.pptx"
output_directory = Path.cwd() / "table-images"
output_directory.mkdir(parents=True, exist_ok=True)
saved_image_hashes = set()
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=True):
if isinstance(shape, slides.Table):
row_count = len(shape.rows)
column_count = len(shape.columns)
for row_index in range(row_count):
for column_index in range(column_count):
cell = shape.rows[row_index][column_index]
image = get_picture_fill_image(cell.cell_format.fill_format)
if image is not None:
file_name_base = f"{name_part}_cell_{row_index + 1}_{column_index + 1}"
save_original_image(image, output_directory, file_name_base, saved_image_hashes)
从图表形状提取图像
Chart 是一种形状。下面的示例从图表区域的图片填充中提取图像。
input_path = "sample.pptx"
output_directory = Path.cwd() / "chart-images"
output_directory.mkdir(parents=True, exist_ok=True)
saved_image_hashes = set()
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=True):
if isinstance(shape, charts.Chart):
fill_format = shape.fill_format
image = get_picture_fill_image(fill_format)
if image is not None:
file_name_base = f"{name_part}_chart_area"
save_original_image(image, output_directory, file_name_base, saved_image_hashes)
从 SmartArt 形状提取图像
SmartArt 对象是一种形状。根据 SmartArt 布局,图像可能存储在节点项目符号填充中或节点形状的填充格式中。
input_path = "sample.pptx"
output_directory = Path.cwd() / "smartart-images"
output_directory.mkdir(parents=True, exist_ok=True)
saved_image_hashes = set()
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=True):
if isinstance(shape, smartart.SmartArt):
node_count = len(shape.all_nodes)
for node_index in range(node_count):
node = shape.all_nodes[node_index]
bullet_image = get_picture_fill_image(node.bullet_fill_format)
if bullet_image is not None:
file_name_base = f"{name_part}_smartart_node_{node_index + 1}_bullet"
save_original_image(bullet_image, output_directory, file_name_base, saved_image_hashes)
node_shape_count = len(node.shapes)
for node_shape_index in range(node_shape_count):
node_shape = node.shapes[node_shape_index]
image = get_picture_fill_image(node_shape.fill_format)
if image is not None:
file_name_base = f"{name_part}_smartart_node_{node_index + 1}_shape_{node_shape_index + 1}"
save_original_image(image, output_directory, file_name_base, saved_image_hashes)
包含分组形状内的图像
分组形状包含其自己的形状集合。共享的 enumerate_shapes 帮助方法具有 include_grouped_shapes 选项。当您想检查 GroupShape 对象内部的形状时,请将其设为 True。下面的示例从图片框、填充图片的形状、OLE 对象预览、视频帧缩略图和音频帧缩略图中提取图像。若还要包括表格、图表、SmartArt 和摘要缩放图像,请重复使用前面章节中的专用提取逻辑,同时保持相同的递归形状遍历。
input_path = "sample.pptx"
output_directory = Path.cwd() / "all-shape-images"
output_directory.mkdir(parents=True, exist_ok=True)
saved_image_hashes = set()
with slides.Presentation(input_path) as presentation:
for slide in presentation.slides:
slide_prefix = f"slide_{slide.slide_number}"
for shape, name_part in enumerate_shapes(
slide.shapes,
slide_prefix,
include_grouped_shapes=True):
if isinstance(shape, slides.OleObjectFrame):
image = shape.substitute_picture_format.picture.image
if image is not None:
file_name_base = f"{name_part}_ole_preview"
save_original_image(image, output_directory, file_name_base, saved_image_hashes)
continue
if isinstance(shape, slides.VideoFrame):
image = shape.picture_format.picture.image
if image is not None:
file_name_base = f"{name_part}_video_preview"
save_original_image(image, output_directory, file_name_base, saved_image_hashes)
continue
if isinstance(shape, slides.AudioFrame):
image = shape.picture_format.picture.image
if image is not None:
file_name_base = f"{name_part}_audio_preview"
save_original_image(image, output_directory, file_name_base, saved_image_hashes)
continue
if type(shape) is slides.PictureFrame:
image = shape.picture_format.picture.image
save_original_image(image, output_directory, name_part, saved_image_hashes)
continue
if isinstance(shape, slides.AutoShape):
image = get_picture_fill_image(shape.fill_format)
if image is not None:
save_original_image(image, output_directory, name_part, saved_image_hashes)
边缘情况和实用说明
- 重复图像: 多个形状可能引用相同的图像,或不同的图像具有相同的字节。在写入文件之前对 PPImage 的
binary_data属性进行哈希,如果您希望每个唯一图像只产出一个文件。 - 原始数据与转换后输出: 保存 PPImage 的
binary_data属性可保留嵌入的 JPEG、PNG、GIF、SVG、EMF 或 WMF 数据。通过save保存image属性在您希望得到统一输出格式时很有用。 - 不支持的填充类型: 实心、渐变、图案和无填充的形状不包含图片填充。在读取
picture_fill_format之前请检查 FillType。 - 分组形状: 幻灯片顶层形状集合不会展开分组。当分组内容重要时,请递归检查 GroupShape.shapes。
- OLE 对象预览: OleObjectFrame 可能通过
substitute_picture_format暴露预览图像,但该图像仅为幻灯片预览,并非 OLE 对象内部的嵌入文件。 - 视频帧缩略图: VideoFrame 可能通过
picture_format暴露预览图像,但该图像仅为幻灯片上显示的海报,并非从视频流中提取。 - 音频帧缩略图: AudioFrame 可能通过
picture_format暴露图标或缩略图;这并非嵌入的音频数据。 - 缩放图像: 幻灯片缩放、章节缩放和摘要缩放形状可能通过
image使用自定义的 PPImage 对象。 - 嵌套形状模型: 表格、图表和 SmartArt 对象实现了 Shape,但它们的图像通常存储在嵌套的表格单元格、图表元素或 SmartArt 节点格式对象中。
- 裁剪或变换的图片: 访问 PPImage 可获得存储的图像资源。它不会渲染形状所应用的裁剪、透明度、重新着色、旋转或其他视觉效果。
常见问题
我能提取未裁剪、未应用效果或形状变换的原始图像吗?
可以。访问 PPImage 对象并将其 binary_data 属性写入磁盘。这会保留演示文稿中存储的原始编码图像,而不是图像在幻灯片上的渲染方式。
我能将所有提取的图像导出为 PNG 吗?
可以。使用 PPImage 的 image 属性获取图像对象,然后使用 ImageFormat.PNG 调用 save。这会转换输出,但可能不保留原始文件类型或矢量数据。
我如何避免多次保存相同的图像?
对 PPImage 的 binary_data 属性进行哈希,并将哈希值保存在集合中。如果新图像的哈希已经存在,则跳过或记录对已有输出文件的另一个引用。
为什么某些形状不产生图像?
图片框、填充图片的形状、OLE 对象框、媒体框、缩放框、表格、图表和 SmartArt 对象可以引用图像。某些形状类型通过嵌套的格式对象暴露图像,因此仅检查 picture_format 或形状的 fill_format 并不总是足够。
我能提取视频帧显示的缩略图吗?
可以。使用 VideoFrame 并读取 picture_format.picture.image。这会提取随视频帧存储的海报图像,而不是从视频文件生成的帧。
我如何确定演示文稿图像集合中哪些形状使用了特定图像?
Aspose.Slides 不会存储从 PPImage 到形状的反向链接。在遍历过程中构建映射:每当找到图像引用时,记录幻灯片编号、形状路径以及图像哈希或集合项。
我能提取嵌入在 OLE 对象内部的图像,例如附加文档中的图像吗?
您可以从 OleObjectFrame 的 substitute_picture_format 属性提取 OLE 对象的幻灯片预览。但该预览并非嵌入的文档本身。若要提取嵌入文件内部的图像,需要提取 OLE 数据并使用该文件类型的工具进行检查。