Count PDF Artifacts in Python

Counting Artifacts of a Particular Type

Inspect and count pagination artifacts in a PDF Document using Aspose.PDF for Python via .NET. Pagination artifacts include elements such as watermarks, backgrounds, headers, and footers that are applied to pages for layout and identification purposes. By filtering Artifact objects on a Page and grouping them by subtype (Artifact.ArtifactSubtype), developers can quickly analyze the document’s structure and verify the presence of specific elements.

To count the artifacts of a particular type on a page (for example, the number of watermarks), use the following code. The example filters the first page’s Artifacts collection (an ArtifactCollection) by Artifact.ArtifactType, keeping only pagination artifacts, and then counts them by subtype (Artifact.ArtifactSubtype).

  1. Open the PDF document (see Document).
  2. Filter pagination artifacts using the page’s Artifacts collection.
  3. Count artifacts by subtype (Artifact.ArtifactSubtype).
  4. Print results.

from collections import Counter
import aspose.pdf as ap

def count_pdf_artifacts(infile):
    """Count and print pagination artifacts on the first page, grouped by subtype."""
    # Open the document; the with block closes it on exit.
    with ap.Document(infile) as document:
        # Keep only pagination artifacts from the first page (page indices start at 1).
        pagination_artifacts = [
            artifact
            for artifact in document.pages[1].artifacts
            if artifact.type == ap.Artifact.ArtifactType.PAGINATION
        ]

        # Tally the filtered artifacts by subtype.
        subtypes = [artifact.subtype for artifact in pagination_artifacts]
        counts = Counter(subtypes)

        # Print each count; get() falls back to 0 when a subtype is absent.
        print(f"Watermarks: {counts.get(ap.Artifact.ArtifactSubtype.WATERMARK, 0)}")
        print(f"Backgrounds: {counts.get(ap.Artifact.ArtifactSubtype.BACKGROUND, 0)}")
        print(f"Headers: {counts.get(ap.Artifact.ArtifactSubtype.HEADER, 0)}")
        print(f"Footers: {counts.get(ap.Artifact.ArtifactSubtype.FOOTER, 0)}")
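
The example above inspects only the first page. One possible way to extend it to every page is to move the filter-and-count step into a small helper. The sketch below is illustrative rather than part of the library: count_subtypes is a hypothetical name, and it relies only on the artifacts, type, and subtype attributes already used in the example. To keep it runnable without a PDF file, it is exercised here with plain stand-in objects instead of real Aspose.PDF pages.

```python
from collections import Counter
from types import SimpleNamespace


def count_subtypes(pages, wanted_type):
    """Count artifact subtypes across pages, keeping only artifacts of wanted_type.

    Hypothetical helper: `pages` is any iterable of objects that expose an
    `artifacts` iterable whose items have `type` and `subtype` attributes.
    """
    return Counter(
        artifact.subtype
        for page in pages
        for artifact in page.artifacts
        if artifact.type == wanted_type
    )


# Stand-in data (not Aspose.PDF objects) that mimics two pages of artifacts.
demo_pages = [
    SimpleNamespace(artifacts=[
        SimpleNamespace(type="pagination", subtype="watermark"),
        SimpleNamespace(type="pagination", subtype="header"),
        SimpleNamespace(type="layout", subtype="background"),
    ]),
    SimpleNamespace(artifacts=[
        SimpleNamespace(type="pagination", subtype="watermark"),
    ]),
]

demo_counts = count_subtypes(demo_pages, "pagination")
# Counter returns 0 for subtypes that never occurred, so no default is needed.
print(f"Watermarks: {demo_counts['watermark']}")
print(f"Footers: {demo_counts['footer']}")
```

With a real document, the same helper could presumably be called as count_subtypes(document.pages, ap.Artifact.ArtifactType.PAGINATION) inside the with block, assuming document.pages can be iterated page by page.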