Extract Text From Stamps using Python
Extract Text from Stamp Annotations
Aspose.PDF for Python lets you extract text from stamp annotations in a PDF. To do so, follow these steps:
- Load the PDF Document
- Access the First Page
- Iterate Through Annotations
- Check for Stamp Annotations
- Initialize a Text Absorber
- Extract Appearance Information
- Extract Text from the Appearance Stream
- Print the Extracted Text
import aspose.pdf as apdf
from os import path

# self.dataDir and infile are expected to be defined by the surrounding
# example code: the data directory and the name of the input PDF
path_infile = path.join(self.dataDir, infile)
document = apdf.Document(path_infile)

# Get the first page (Aspose.PDF page numbering starts at 1)
page = document.pages[1]

for annotation in page.annotations:
    # Process only stamp annotations
    if annotation.annotation_type == apdf.annotations.AnnotationType.STAMP:
        absorber = apdf.text.TextAbsorber()
        xforms = []
        # Get the normal ('N') appearance of the annotation
        if annotation.appearance.try_get_value('N', xforms):
            # Extract text from the appearance
            absorber.visit(xforms[0])
            # Print extracted text
            print(absorber.text)
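The snippet above handles only the first page and prints each result. As a minimal sketch, the same loop can be applied to every page and the text collected in a list. The helper name `collect_stamp_text` and its parameters `stamp_type` and `make_absorber` are hypothetical and not part of Aspose.PDF; they only mirror the calls used above, and passing them in keeps the helper free of a hard dependency on the library. The commented usage assumes that `document.pages` can be iterated directly and uses a placeholder file name.

```python
from typing import Any, Callable, Iterable, List


def collect_stamp_text(
    pages: Iterable[Any],
    stamp_type: Any,
    make_absorber: Callable[[], Any],
) -> List[str]:
    """Return the text of every stamp annotation found on the given pages.

    pages         -- iterable of page objects exposing `.annotations`
    stamp_type    -- value compared with `annotation.annotation_type`
    make_absorber -- zero-argument factory returning a fresh text absorber
    """
    texts: List[str] = []
    for page in pages:
        for annotation in page.annotations:
            # Skip everything that is not a stamp annotation
            if annotation.annotation_type != stamp_type:
                continue
            xforms: list = []
            # 'N' is the normal appearance; skip stamps that have none
            found = annotation.appearance.try_get_value("N", xforms)
            if not found or not xforms:
                continue
            # Use a fresh absorber per stamp so texts do not mix
            absorber = make_absorber()
            absorber.visit(xforms[0])
            texts.append(absorber.text)
    return texts


# Usage with Aspose.PDF (assumption: same API as in the snippet above):
#
# import aspose.pdf as apdf
# document = apdf.Document("stamped.pdf")  # placeholder file name
# texts = collect_stamp_text(
#     document.pages,
#     apdf.annotations.AnnotationType.STAMP,
#     apdf.text.TextAbsorber,
# )
```

Creating a new absorber for each stamp follows the original snippet and keeps the text of one stamp separate from the next.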