Search and Get Text from Pages of a PDF Document

TextFragmentAbsorber lets you find text that matches a particular phrase on all pages of a PDF document.

To search text in the whole document, call the Pages collection’s accept() method and pass it a TextFragmentAbsorber object. After the call, the absorber holds the matches as a collection of TextFragment objects. Loop through the fragments to read their properties, for example Text, Position, XIndent, YIndent, FontName, FontSize, IsAccessible, IsEmbedded, IsSubset, and ForegroundColor.

The following code snippet shows how to search the entire document and display all matches in the console.

To search text on a particular page and get properties associated with it, provide the page index:

Search and Get Text Segments from Pages of PDF

To search text segments on all pages in a document, get a document’s TextFragment objects.

TextFragmentAbsorber lets you find text that matches a particular phrase on all pages of a PDF document. To search the whole document, call the Pages collection’s accept() method (PageCollection: http://www.aspose.com/api/java/pdf/com.aspose.pdf/classes/PageCollection) and pass it a TextFragmentAbsorber object. After the call, the absorber holds the matches as a collection of TextFragment objects.

The following code snippet shows how to search text segments on all pages.

To search for a specific text segment and get its associated properties, specify the index of the page you want to search:

Search and Get Text from Pages Using a Regular Expression

TextFragmentAbsorber helps you search and retrieve text from all pages in a document, based on a regular expression.

To search and get text from a document:

  1. Pass the search term as a regular expression to the TextFragmentAbsorber constructor.
  2. Set the TextFragmentAbsorber object’s TextSearchOptions property. This property takes a TextSearchOptions object; pass true to its constructor so that the search term is treated as a regular expression.
  3. To retrieve matching text from all pages, call the Pages collection’s accept() method (PageCollection: http://www.aspose.com/api/java/pdf/com.aspose.pdf/classes/PageCollection) and pass it the absorber. The TextFragmentAbsorber then provides a TextFragmentCollection containing all the fragments that match the regular expression.

The following code snippet shows how to search all pages in a document and get text based on a regular expression.

To search text on a particular page and get its properties, specify the page index:

To match a string regardless of whether it appears in uppercase or lowercase, consider using a regular expression.
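
As a minimal sketch of what a case-insensitive pattern looks like, the example below uses only the standard java.util.regex package on a plain string; it does not call Aspose.PDF. The class CaseInsensitiveSearch and its findAll method are hypothetical names introduced for illustration. Whether TextFragmentAbsorber accepts the inline (?i) flag unchanged is an assumption that should be checked against the Aspose.PDF documentation before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for trying out a case-insensitive pattern on plain strings. */
final class CaseInsensitiveSearch {

    /** Returns every substring of text that matches regex, in order of appearance. */
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sample = "Line one. LINE two. line three.";
        // The inline flag (?i) makes the rest of the pattern ignore letter case.
        System.out.println(findAll("(?i)line", sample)); // prints [Line, LINE, line]
        // Without the flag, only the exact-case occurrence matches.
        System.out.println(findAll("line", sample));     // prints [line]
    }
}
```

Trying the pattern on a known string first makes it easier to tell a pattern mistake apart from a problem with the PDF text itself.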