Extracting raw text from a PDF file
Extract Text From All the Pages of a PDF Document
Extracting text from a PDF document is a common requirement. This example shows how to extract text from all the pages of a PDF document with Aspose.PDF for PHP. To extract text from all the PDF pages:
- Create an object of the TextAbsorber class.
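- Open the PDF using the Document class and run the TextAbsorber over it. The snippet below passes the Document to the absorber's visit method; calling the Accept method of the Pages collection with the absorber is the other route.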
- The TextAbsorber object absorbs the text from the document and returns it from its getText() method.
The following code snippet shows how to extract text from all pages of a PDF document:
// Create a new Document object from the input PDF file.
$document = new Document($inputFile);
// Create a new TextAbsorber object to extract text from the document.
$textAbsorber = new TextAbsorber();
// Extract text from the document.
$textAbsorber->visit($document);
// Get the extracted text content.
$content = $textAbsorber->getText();
// Save the extracted text to the output file.
file_put_contents($outputFile, $content);
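The last step of the snippet ignores the return value of file_put_contents(), which is false when the write fails (for example, when the output directory does not exist). The following is a minimal sketch of a stricter save step. The helper name saveExtractedText is hypothetical and not part of Aspose.PDF, and the sketch assumes the extracted content is already available as a plain PHP string; a stand-in string is used here so the example runs without the library.

```php
<?php

/**
 * Hypothetical helper (not part of Aspose.PDF): writes extracted text to a file
 * and throws instead of silently ignoring a failed write.
 *
 * Returns the number of bytes written.
 */
function saveExtractedText(string $content, string $outputFile): int
{
    // file_put_contents() returns the byte count on success or false on failure.
    // The @ operator suppresses the PHP warning so the exception is the only signal.
    $bytes = @file_put_contents($outputFile, $content);

    if ($bytes === false) {
        throw new RuntimeException("Could not write extracted text to {$outputFile}");
    }

    return $bytes;
}

// Stand-in for the text returned by $textAbsorber->getText() in the snippet above.
$content = "Page one\nPage two\n";

// Assumption: a temporary file is used here only to keep the example self-contained.
$outputFile = tempnam(sys_get_temp_dir(), 'pdf_text_');
$written = saveExtractedText($content, $outputFile);
```

In the original snippet, this helper would replace the final file_put_contents() call, so that a missing directory or a permissions problem surfaces as an exception rather than an empty or absent output file.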