Count PDF Artifacts in Python
Counting Artifacts of a Particular Type
Inspect and count pagination artifacts in a PDF Document using Aspose.PDF for Python via .NET. Pagination artifacts include elements such as watermarks, backgrounds, headers, and footers that are applied to pages for layout and identification purposes. By filtering Artifact objects on a Page and grouping them by subtype (Artifact.ArtifactSubtype), developers can quickly analyze the document’s structure and verify the presence of specific elements.
To count the artifacts of a particular type (for example, the total number of watermarks), use the following code. The example filters the page’s Artifacts collection (an ArtifactCollection) by Artifact.ArtifactType, keeping only pagination artifacts, and then counts them by subtype (Artifact.ArtifactSubtype).
- Open the PDF document (see Document).
- Filter pagination artifacts using the page’s Artifacts collection.
- Count artifacts by subtype (Artifact.ArtifactSubtype).
- Print results.
from collections import Counter

import aspose.pdf as ap


def count_pdf_artifacts(infile):
    """Count and display pagination artifacts by subtype on the first page."""
    with ap.Document(infile) as document:
        # Pages are indexed from 1; keep only the pagination artifacts of page 1.
        pagination_artifacts = [
            artifact
            for artifact in document.pages[1].artifacts
            if artifact.type == ap.Artifact.ArtifactType.PAGINATION
        ]

        # Tally the filtered artifacts by subtype.
        subtypes = [artifact.subtype for artifact in pagination_artifacts]
        counts = Counter(subtypes)

        # Subtypes that never occur are reported as 0.
        print(f"Watermarks: {counts.get(ap.Artifact.ArtifactSubtype.WATERMARK, 0)}")
        print(f"Backgrounds: {counts.get(ap.Artifact.ArtifactSubtype.BACKGROUND, 0)}")
        print(f"Headers: {counts.get(ap.Artifact.ArtifactSubtype.HEADER, 0)}")
        print(f"Footers: {counts.get(ap.Artifact.ArtifactSubtype.FOOTER, 0)}")
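
The example above inspects only the first page. The same filter-then-count pattern could be extended to several pages, as in the minimal sketch below. The helper name `count_subtypes` is hypothetical, and the demo data are plain stand-in objects (not Aspose.PDF classes) with string labels in place of the real enum values, so the snippet runs without the library installed.

```python
from collections import Counter
from types import SimpleNamespace


def count_subtypes(pages, artifact_type):
    """Tally artifact subtypes of one artifact type across several pages.

    `pages` is any iterable of objects exposing an `artifacts` iterable whose
    items have `type` and `subtype` attributes (duck-typed on purpose).
    """
    counts = Counter()
    for page in pages:
        # Keep only artifacts of the requested type, then count their subtypes.
        counts.update(
            artifact.subtype
            for artifact in page.artifacts
            if artifact.type == artifact_type
        )
    return counts


# Stand-in objects for illustration only; these are NOT Aspose.PDF classes,
# and the string labels merely mimic the enum member names.
def _artifact(type_, subtype):
    return SimpleNamespace(type=type_, subtype=subtype)


demo_pages = [
    SimpleNamespace(artifacts=[
        _artifact("PAGINATION", "WATERMARK"),
        _artifact("PAGINATION", "HEADER"),
        _artifact("LAYOUT", "UNDEFINED"),  # filtered out: not a pagination artifact
    ]),
    SimpleNamespace(artifacts=[
        _artifact("PAGINATION", "WATERMARK"),
        _artifact("PAGINATION", "FOOTER"),
    ]),
]

demo_counts = count_subtypes(demo_pages, "PAGINATION")
print(f"Watermarks: {demo_counts.get('WATERMARK', 0)}")    # Watermarks: 2
print(f"Backgrounds: {demo_counts.get('BACKGROUND', 0)}")  # Backgrounds: 0
print(f"Headers: {demo_counts.get('HEADER', 0)}")          # Headers: 1
print(f"Footers: {demo_counts.get('FOOTER', 0)}")          # Footers: 1
```

With a real document, the helper would presumably be called as `count_subtypes(document.pages, ap.Artifact.ArtifactType.PAGINATION)` inside the `with ap.Document(...)` block. This assumes that `document.pages` can be iterated directly, which the example above does not confirm.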