Extract Text from Stamps Using Python

Extract Text from Stamp Annotations

Aspose.PDF for Python lets you extract text from stamp annotations. To extract the text of the stamp annotations in a PDF document, follow these steps:

  1. Load the PDF Document
  2. Access the First Page
  3. Iterate Through Annotations
  4. Check for Stamp Annotations
  5. Initialize a Text Absorber
  6. Extract Appearance Information
  7. Extract Text from the Appearance Stream
  8. Print the Extracted Text

    import aspose.pdf as apdf
    from os import path

    def extract_stamp_text(data_dir, infile):
        """Print the text drawn in the normal appearance of every stamp
        annotation on the first page of the PDF data_dir/infile."""
        path_infile = path.join(data_dir, infile)

        document = apdf.Document(path_infile)
        # Aspose.PDF page collections are 1-based: pages[1] is the first page
        page = document.pages[1]
        for annotation in page.annotations:
            # Only stamp annotations are processed
            if annotation.annotation_type == apdf.annotations.AnnotationType.STAMP:
                absorber = apdf.text.TextAbsorber()
                xforms = []
                # 'N' is the normal appearance stream; on success
                # try_get_value places the appearance XForm in xforms
                if annotation.appearance.try_get_value("N", xforms):
                    # Run the absorber over the appearance XForm to collect its text
                    absorber.visit(xforms[0])

                    # Print the extracted text
                    print(absorber.text)
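The key "N" used above selects the normal appearance. The PDF specification defines three entries in an annotation's appearance dictionary: N (normal), R (rollover) and D (down). R and D default to the value of N when absent, and an entry may also be a subdictionary of streams keyed by appearance state, selected by the annotation's /AS value. The stdlib-only sketch below models that resolution rule with plain Python dicts, which helps explain why the example reads "N". The function resolve_appearance and the dict layout are hypothetical illustrations, not part of the Aspose.PDF API.

```python
from typing import Any, Dict, Optional

# Hypothetical model of an annotation's /AP dictionary: each value is either
# an appearance stream (any object here, e.g. a str) or a sub-dictionary of
# streams keyed by appearance state name.
AppearanceDict = Dict[str, Any]


def resolve_appearance(ap: AppearanceDict, kind: str = "N",
                       state: Optional[str] = None) -> Optional[Any]:
    """Return the appearance stream for kind 'N', 'R' or 'D'.

    A missing /R or /D entry falls back to /N. If the entry is a
    sub-dictionary, the appearance state (/AS) selects the stream;
    without a state, None is returned.
    """
    if kind not in ("N", "R", "D"):
        raise ValueError(f"unknown appearance kind: {kind!r}")
    entry = ap.get(kind)
    if entry is None and kind != "N":
        entry = ap.get("N")  # /R and /D default to the normal appearance
    if isinstance(entry, dict):
        # Sub-dictionary: pick the stream for the given appearance state
        return entry.get(state) if state is not None else None
    return entry


if __name__ == "__main__":
    stamp_ap = {"N": "normal-stream"}
    print(resolve_appearance(stamp_ap))       # normal-stream
    print(resolve_appearance(stamp_ap, "R"))  # normal-stream (falls back to N)
    checkbox_ap = {"N": {"Yes": "checked", "Off": "unchecked"}}
    print(resolve_appearance(checkbox_ap, "N", "Yes"))  # checked
```

A stamp annotation usually carries a single normal appearance stream, so reading "N" directly, as the Aspose example does, is typically enough.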