Extracting raw text from a PDF file

Extract Text From All the Pages of a PDF Document

Extracting text from a PDF document is a common requirement. This example shows how Aspose.PDF for PHP extracts text from all the pages of a PDF document. To extract text from all the pages:

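Create an object of the TextAbsorber class.
Open the PDF using the Document class and pass it to the TextAbsorber, either through the absorber's visit method (as in the snippet below) or through the Accept method of the Pages collection.
The TextAbsorber class absorbs the text from the document and returns it through the getText() method.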

The following code snippet shows how to extract text from all pages of a PDF document.


    // Create a new Document object from the input PDF file.
    $document = new Document($inputFile);

    // Create a new TextAbsorber object to extract text from the document.
    $textAbsorber = new TextAbsorber();

    // Extract text from the document.
    $textAbsorber->visit($document);

    // Get the extracted text content.
    $content = $textAbsorber->getText();

    // Save the extracted text to the output file.
    file_put_contents($outputFile, $content);
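
The snippet above ignores the return value of file_put_contents(), which is false when the write fails. As a minimal sketch, the save step could be wrapped in a small helper that reports the failure. The name saveExtractedText is hypothetical and not part of Aspose.PDF, and the sample string stands in for the value returned by $textAbsorber->getText().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Aspose.PDF): writes extracted text to a
 * file and throws instead of silently ignoring a failed write.
 */
function saveExtractedText(string $outputFile, string $content): int
{
    // file_put_contents() returns the number of bytes written, or false on failure.
    $bytes = @file_put_contents($outputFile, $content);
    if ($bytes === false) {
        throw new RuntimeException("Could not write extracted text to {$outputFile}");
    }
    return $bytes;
}

// Stand-in for the text returned by $textAbsorber->getText().
$content = "Sample extracted text\n";

// Write to a temporary file so the sketch runs without any input PDF.
$outputFile = tempnam(sys_get_temp_dir(), 'pdf_text_');
$written = saveExtractedText($outputFile, $content);
```

In the extraction snippet, the last line would then become saveExtractedText($outputFile, $textAbsorber->getText()).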

Extract Paragraph from PDF