Annotations and Special Text using Java
Extract highlighted text
Iterate through the annotations on a page and read the marked text from each HighlightAnnotation.
- Open the source PDF Document.
- Iterate through the Annotation objects on the target Page.
- Check whether each annotation is a HighlightAnnotation.
- Read and print the marked text from each highlight annotation.
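import com.aspose.pdf.Annotation;
import com.aspose.pdf.Document;
import com.aspose.pdf.HighlightAnnotation;
import java.nio.file.Path;

public static void extractHighlightedText(Path inputFile) {
    // try-with-resources closes the document when the block exits.
    try (Document document = new Document(inputFile.toString())) {
        // Page numbering starts at 1, so get_Item(1) is the first page.
        for (Annotation annotation : document.getPages().get_Item(1).getAnnotations()) {
            // Skip every annotation that is not a highlight.
            if (annotation instanceof HighlightAnnotation) {
                HighlightAnnotation highlightAnnotation = (HighlightAnnotation) annotation;
                System.out.println(highlightAnnotation.getMarkedText());
            }
        }
    }
}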
Extract text from stamp annotations
Read the normal appearance stream from a stamp annotation and pass it to a TextAbsorber.
- Open the source PDF Document.
- Iterate through the Annotation objects on the target Page.
- Check whether each annotation is a stamp annotation.
- Create a TextAbsorber and get the normal appearance stream from the stamp annotation.
- Visit the appearance XForm and print the extracted text.
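import com.aspose.pdf.Annotation;
import com.aspose.pdf.AnnotationType;
import com.aspose.pdf.Document;
import com.aspose.pdf.TextAbsorber;
import com.aspose.pdf.XForm;
import java.nio.file.Path;

public static void extractStampText(Path inputFile) {
    try (Document document = new Document(inputFile.toString())) {
        // Page numbering starts at 1, so get_Item(1) is the first page.
        for (Annotation annotation : document.getPages().get_Item(1).getAnnotations()) {
            if (annotation.getAnnotationType() == AnnotationType.Stamp) {
                TextAbsorber absorber = new TextAbsorber();
                // One-element array that receives the value stored under the "N" key.
                Object[] xforms = new Object[1];
                // "N" is the normal appearance; continue only if it is an XForm.
                if (annotation.getAppearance().tryGetValue("N", xforms) && xforms[0] instanceof XForm) {
                    absorber.visit((XForm) xforms[0]);
                    System.out.println(absorber.getText());
                }
            }
        }
    }
}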
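The stamp example passes a one-element Object[] to tryGetValue and then reads the result from index 0. The array acts as an out-parameter: the method reports success through its boolean return value and hands back the value through the array slot. The sketch below illustrates only that calling convention. It uses a plain Map in place of the appearance dictionary, and the class OutParameterDemo and both of its methods are hypothetical names for illustration, not part of Aspose.PDF.

```java
import java.util.Map;

final class OutParameterDemo {
    // Hypothetical helper, not Aspose.PDF code: copies the value stored under
    // `key` into out[0] and reports whether the key was present.
    static boolean tryGetValue(Map<String, Object> entries, String key, Object[] out) {
        if (!entries.containsKey(key)) {
            return false;
        }
        out[0] = entries.get(key);
        return true;
    }

    // Returns the entry stored under "N" if it is a String, otherwise null.
    // This mirrors the "lookup, then instanceof check, then cast" sequence
    // used for the XForm in the stamp example.
    static String normalEntryAsString(Map<String, Object> entries) {
        Object[] holder = new Object[1]; // single slot that receives the result
        if (tryGetValue(entries, "N", holder) && holder[0] instanceof String) {
            return (String) holder[0];
        }
        return null;
    }
}
```

Checking the type before casting matters because the slot is typed as Object, so a present key does not guarantee a value of the expected type.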
Extract superscript and subscript text details
Use TextFragmentAbsorber when you need both the extracted text and the superscript or subscript flags on each fragment.
- Open the source PDF Document.
- Create a TextFragmentAbsorber.
- Visit the target Page and collect the text fragments.
- Iterate through the TextFragment objects and read the text, superscript flag, and subscript flag.
- Write the extracted details to the output file.
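import com.aspose.pdf.Document;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextFragmentAbsorber;
import java.nio.file.Files;
import java.nio.file.Path;

public static void extractSuperSubDetails(Path inputFile, Path outputFile, int pageNumber) throws Exception {
    try (Document document = new Document(inputFile.toString())) {
        TextFragmentAbsorber absorber = new TextFragmentAbsorber();
        // Page numbering starts at 1; accept() runs the absorber over that page.
        document.getPages().get_Item(pageNumber).accept(absorber);
        StringBuilder details = new StringBuilder();
        // Build one report line per fragment: text, superscript flag, subscript flag.
        for (TextFragment fragment : absorber.getTextFragments()) {
            details.append("Text: '").append(fragment.getText())
                    .append("' | Superscript: ").append(fragment.getTextState().isSuperscript())
                    .append(" | Subscript: ").append(fragment.getTextState().isSubscript())
                    .append(System.lineSeparator());
        }
        // Files.writeString requires Java 11 or later.
        Files.writeString(outputFile, details.toString());
    }
}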
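Once the flags have been read, a common follow-up is to keep only the fragments that are actually raised or lowered. The sketch below shows one way to do that while producing the same line layout as the method above. It deliberately avoids the PDF library: FragmentInfo is a hypothetical stand-in for the three values read from each TextFragment, and SuperSubReport and its methods are illustrative names, not Aspose.PDF API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the values read from one TextFragment.
final class FragmentInfo {
    final String text;
    final boolean superscript;
    final boolean subscript;

    FragmentInfo(String text, boolean superscript, boolean subscript) {
        this.text = text;
        this.superscript = superscript;
        this.subscript = subscript;
    }
}

final class SuperSubReport {
    // Builds one line per fragment in the layout
    // "Text: '<text>' | Superscript: <flag> | Subscript: <flag>".
    static String format(List<FragmentInfo> fragments) {
        StringBuilder details = new StringBuilder();
        for (FragmentInfo f : fragments) {
            details.append("Text: '").append(f.text)
                    .append("' | Superscript: ").append(f.superscript)
                    .append(" | Subscript: ").append(f.subscript)
                    .append(System.lineSeparator());
        }
        return details.toString();
    }

    // Keeps only the fragments flagged as superscript or subscript,
    // preserving their original order.
    static List<FragmentInfo> onlyRaisedOrLowered(List<FragmentInfo> fragments) {
        List<FragmentInfo> result = new ArrayList<>();
        for (FragmentInfo f : fragments) {
            if (f.superscript || f.subscript) {
                result.add(f);
            }
        }
        return result;
    }
}
```

In a real extraction, each FragmentInfo would be filled from fragment.getText() and the two flags on fragment.getTextState() inside the loop shown earlier, and the filtered list would be passed to format before writing the file.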