Extract Text From Stamps using Python

Extract Text from Stamp Annotations

Aspose.PDF for Python lets you extract text from stamp annotations in a PDF. The text is read from each stamp's appearance stream, in the following steps:

1. Load the PDF Document
2. Access the First Page
3. Iterate Through Annotations
4. Check for Stamp Annotations
5. Initialize a Text Absorber
6. Extract Appearance Information
7. Extract Text from the Appearance Stream
8. Print the Extracted Text


    import aspose.pdf as apdf
    from os import path


    def extract_stamp_text(data_dir, infile):
        # data_dir and infile are supplied by the caller:
        # the folder that holds the PDF and the PDF file name
        path_infile = path.join(data_dir, infile)

        # Load the PDF document
        document = apdf.Document(path_infile)
        # Get the first page (the pages collection is 1-based)
        page = document.pages[1]
        # Check every annotation on the page
        for annotation in page.annotations:
            if annotation.annotation_type == apdf.annotations.AnnotationType.STAMP:
                absorber = apdf.text.TextAbsorber()
                xforms = []
                # Get the normal ('N') appearance of the annotation
                if annotation.appearance.try_get_value('N', xforms):
                    # Extract text from the appearance
                    absorber.visit(xforms[0])

                    # Print extracted text
                    print(absorber.text)
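
The sample above reads only the first page and prints as it goes. A minimal sketch of the same loop applied to every page, returning the text instead of printing it, might look like the following. The names `iter_stamp_texts` and `print_stamp_texts` are hypothetical helpers, not part of Aspose.PDF. The sketch assumes that `document.pages` can be iterated directly and that `try_get_value` fills the list passed to it, as in the sample above.

```python
def iter_stamp_texts(document, stamp_type, make_absorber):
    """Yield (page_number, text) for each stamp annotation in the document.

    stamp_type is the annotation type to match and make_absorber is a
    callable that returns a fresh text absorber; both are passed in so
    that this helper does not import the PDF library itself.
    """
    # Page numbers start at 1, matching document.pages[1] in the sample above
    for page_number, page in enumerate(document.pages, start=1):
        for annotation in page.annotations:
            if annotation.annotation_type != stamp_type:
                continue
            xforms = []
            # 'N' is the normal appearance of the annotation
            if annotation.appearance.try_get_value('N', xforms):
                absorber = make_absorber()
                absorber.visit(xforms[0])
                yield page_number, absorber.text


def print_stamp_texts(pdf_path):
    """Hypothetical usage: print the stamp text found on every page."""
    # Imported here so iter_stamp_texts stays independent of the library
    import aspose.pdf as apdf

    document = apdf.Document(pdf_path)
    stamp_type = apdf.annotations.AnnotationType.STAMP
    for page_number, text in iter_stamp_texts(
        document, stamp_type, apdf.text.TextAbsorber
    ):
        print(f"Page {page_number}: {text}")
```

A new absorber is created for each stamp so that text from one annotation does not accumulate into the next, which mirrors the per-annotation `TextAbsorber` in the sample above.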

Extract Data from AcroForm using Python