Extract Vector Data from a PDF File Using Python
Access Vector Data from a PDF Document
Use GraphicsAbsorber to inspect vector graphic elements on a page of a Document. After visiting the target page, iterate through the extracted elements to examine properties such as rectangle bounds, positions, and drawing operators.
- Open the source PDF as a Document.
- Create a GraphicsAbsorber instance.
- Call gr_absorber.visit(page) on the target page.
- Read the extracted items from gr_absorber.elements.
- Iterate through the elements and write their properties to an output file.
import aspose.pdf as ap

def extract_graphics_elements(infile, outfile):
    """
    Extract vector graphic elements from a specified page of a PDF and log basic element properties.

    Args:
        infile (str): Path to input PDF file.
        outfile (str): Path to output text file for logging element info.
    """
    document = ap.Document(infile)
    try:
        gr_absorber = ap.vector.GraphicsAbsorber()
        # Visit the first page (the pages collection is 1-indexed, so document.pages[1] is page 1)
        gr_absorber.visit(document.pages[1])
        elements = gr_absorber.elements
        with open(outfile, "w", encoding="utf-8") as f:
            for idx, elem in enumerate(elements, start=1):
                # Basic properties
                rect = elem.rectangle
                pos = elem.position
                ops_count = len(elem.operators)
                f.write(
                    f"Element {idx}: Rectangle = {rect}, Position = {pos}, Operators = {ops_count}\n"
                )
    finally:
        document.close()
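Before logging every property, it can help to see which kinds of elements the absorber returned. The following is a minimal sketch of a hypothetical helper, summarize_element_types, which is not part of Aspose.PDF. It relies only on Python's built-in type() and assumes the collection is iterable, as the loop in the example above suggests. The class names it reports depend on how the Python binding exposes each element.

```python
from collections import Counter


def summarize_element_types(elements):
    """Count elements by class name (hypothetical helper, not an Aspose.PDF API)."""
    return Counter(type(elem).__name__ for elem in elements)


def format_summary(counts):
    """Render the counts as 'ClassName: n' lines, most frequent first."""
    return "\n".join(f"{name}: {n}" for name, n in counts.most_common())
```

Under these assumptions, passing gr_absorber.elements to summarize_element_types and printing format_summary of the result would give a quick overview of the page before deciding which elements to export.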
Save Vector Graphics from a Page to an SVG File
Export vector graphics from a PDF page to SVG to preserve scalable paths and shapes outside the original PDF. This method is useful for reusing vector artwork in web, design, or publishing workflows.
- Load the PDF document.
- Access the target page.
- Call page.try_save_vector_graphics() to export the page’s vector paths to SVG.
- Close the document.
import aspose.pdf as ap

def save_vector_graphics_to_svg(infile, svg_outfile):
    """
    Save vector graphics from a specified page of a PDF document into an SVG file.

    Args:
        infile (str): Path to input PDF file.
        svg_outfile (str): Path to output SVG file.
    """
    document = ap.Document(infile)
    try:
        page = document.pages[1]
        # Try to save vector graphics into SVG
        page.try_save_vector_graphics(svg_outfile)
    finally:
        document.close()
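This page does not state what try_save_vector_graphics() returns, or what happens when a page has no vector content. One cautious option is to check the output file after the call. The sketch below uses only the Python standard library. The helper name looks_like_svg is hypothetical, and the check is deliberately shallow: the file exists, parses as XML, and has an svg root element.

```python
import os
import xml.etree.ElementTree as ET


def looks_like_svg(path):
    """Return True if path exists, parses as XML, and has an <svg> root element."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False
    # The root tag is usually namespaced, e.g. '{http://www.w3.org/2000/svg}svg',
    # so compare only the local name after the closing brace.
    return root.tag.rsplit("}", 1)[-1] == "svg"
```

Calling looks_like_svg(svg_outfile) after save_vector_graphics_to_svg() gives a simple pass/fail signal without assuming anything about the library's return value.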
Extract Each Sub-path to a Separate SVG
When a page contains multiple independent vector paths, use SvgExtractionOptions with SvgExtractor to write each sub-path to a separate SVG file.
- Load the PDF.
- Create SvgExtractionOptions and set extract_every_subpath_to_svg to True.
- Access the first page of the document.
- Instantiate SvgExtractor with the options.
- Call extractor.extract() to write separate SVG files for each vector sub-path.
- Close the document.
import aspose.pdf as ap

def extract_subpaths_to_svgs(infile, output_dir):
    """
    Extract each vector sub-path on a PDF page into separate SVG files using extraction options.

    Args:
        infile (str): Input PDF file path.
        output_dir (str): Directory path where SVG files will be saved.
    """
    document = ap.Document(infile)
    try:
        options = ap.vector.SvgExtractionOptions()
        options.extract_every_subpath_to_svg = True
        page = document.pages[1]
        extractor = ap.vector.SvgExtractor(options)
        extractor.extract(page, output_dir)
    finally:
        document.close()
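Because this mode writes several files, it is worth preparing the output directory beforehand and checking what was produced afterwards. This page does not say whether SvgExtractor creates a missing directory or how it names the files, so the sketch below assumes neither. Both helpers are hypothetical and use only pathlib.

```python
from pathlib import Path


def prepare_output_dir(output_dir):
    """Create the directory (and any parents) if needed and return it as a Path."""
    path = Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path


def list_svg_files(output_dir):
    """Return the .svg files directly inside output_dir, sorted by path."""
    return sorted(
        p for p in Path(output_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".svg"
    )
```

A typical flow would call prepare_output_dir(output_dir) before extract_subpaths_to_svgs() and list_svg_files(output_dir) after it, to see how many sub-paths were written.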
Extract a List of Elements to a Single Image
Extract multiple vector elements from a PDF page and save them as a single combined SVG image. This is useful when you want to preserve the visual relationship between grouped shapes, diagrams, or drawing fragments.
- Open the PDF using Document.
- Select a page and prepare a list of vector elements.
- Use SvgExtractor to combine those elements into one SVG.
- Save the output file.
import aspose.pdf as ap

def extract_list_of_elements_to_single_image(infile, outfile):
    """
    Extracts multiple vector graphic elements from a PDF page and saves them as a single SVG image.

    Args:
        infile (str): Path to the input PDF file.
        outfile (str): Path to the output SVG file.
    """
    document = ap.Document(infile)
    try:
        page = document.pages[1]
        svg_extractor = ap.vector.SvgExtractor()
        elements = []  # Fill this list with specific graphic elements as needed
        svg_extractor.extract(elements, page, outfile)
    finally:
        document.close()
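The example above leaves the elements list empty. One way to fill it is to keep only the elements whose bounding box lies inside a region of interest. The sketch below is a hypothetical helper, not an Aspose.PDF API. It assumes each element has a rectangle property, as used earlier on this page, and that the rectangle exposes llx, lly, urx, and ury coordinates. That attribute naming is an assumption to verify against your installed version.

```python
def rect_inside(rect, llx, lly, urx, ury):
    """True if rect lies entirely within the region (llx, lly)-(urx, ury).

    Assumes rect has llx, lly, urx, ury attributes (lower-left and upper-right corners).
    """
    return rect.llx >= llx and rect.lly >= lly and rect.urx <= urx and rect.ury <= ury


def select_elements_in_region(elements, llx, lly, urx, ury):
    """Return the elements whose bounding rectangle fits inside the given region."""
    return [e for e in elements if rect_inside(e.rectangle, llx, lly, urx, ury)]
```

The resulting list could then be passed to svg_extractor.extract() in place of the empty list, for example after collecting candidates with GraphicsAbsorber on the same page.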
Extract a Single Element
Extract one specific vector element from a PDF and save it as an individual SVG file. This is useful for isolating logos, icons, or standalone shapes from more complex vector-based pages.
- Create a GraphicsAbsorber to capture vector data.
- Visit a specific page to collect its vector elements.
- Select a target element, such as an XFormPlacement.
- Save that single element to an SVG file.
import aspose.pdf as ap

def extract_single_vector_element(infile, outfile):
    """
    Extracts a specific vector graphic element (e.g., an XFormPlacement) from a PDF page and saves it as an SVG file.

    Args:
        infile (str): Path to the input PDF file.
        outfile (str): Path to the output SVG file.
    """
    document = ap.Document(infile)
    try:
        graphics_absorber = ap.vector.GraphicsAbsorber()
        page = document.pages[1]
        graphics_absorber.visit(page)
        # The hard-coded indices below depend on the sample PDF; adjust them for your document
        xform_placement = graphics_absorber.elements[1]
        if isinstance(xform_placement, ap.vector.XFormPlacement):
            xform_placement.elements[2].save_to_svg(outfile)
    finally:
        document.close()
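The example above uses fixed positions in the element collection, which only suits a document with a known layout. A more general approach is to search the collection for the first element of the wanted type. The sketch below is a hypothetical helper, not an Aspose.PDF API. It uses plain isinstance checks, like the example above, and assumes the collection can be iterated.

```python
def first_instance(elements, cls):
    """Return the first element that is an instance of cls, or None if there is none."""
    for elem in elements:
        if isinstance(elem, cls):
            return elem
    return None


def all_instances(elements, cls):
    """Return every element that is an instance of cls, in original order."""
    return [elem for elem in elements if isinstance(elem, cls)]
```

With these assumptions, first_instance(graphics_absorber.elements, ap.vector.XFormPlacement) would return a placement to export, or None when the page has none, so the caller can skip the save step instead of failing on a missing index.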