Extract Text from PDF in Node.js

Extract Text from All Pages of a PDF Document

Extracting text from a PDF isn’t always easy, and only a few PDF readers can extract text from PDF images or scanned PDFs. Aspose.PDF for Node.js via C++ lets you extract text from all pages of a PDF file in the Node.js environment.

This code demonstrates how to use the asposepdfnodejs module to extract text from a specified PDF file and log either the extracted text or the error that was encountered.

Check the code snippets and follow the steps to extract text from your PDF:

CommonJS:

  1. Call require and import the asposepdfnodejs module as the AsposePdf variable.
  2. Specify the name of the PDF file from which the text will be extracted.
  3. Call AsposePdf as a Promise and perform the extraction once it resolves. If successful, you receive the module object.
  4. Call the AsposePdfExtractText function.
  5. The result is returned as a JSON object. If json.errorCode is 0, the extracted text (json.extractText) is displayed using console.log. If json.errorCode is not 0, an error occurred while processing your file, and the error information is contained in json.errorText.

  const AsposePdf = require('asposepdfnodejs');
  const pdf_file = 'Aspose.pdf';
  AsposePdf().then(AsposePdfModule => {
      /*Extract text from a PDF-file*/
      const json = AsposePdfModule.AsposePdfExtractText(pdf_file);
      console.log("AsposePdfExtractText => %O", json.errorCode == 0 ? json.extractText : json.errorText);
  });
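The errorCode check from step 5 can also be pulled into a small helper, so that the calling code receives either the text or a thrown Error. The following is only a minimal sketch: the helper name unwrapExtractResult is hypothetical and not part of asposepdfnodejs. It relies only on the errorCode, extractText and errorText fields described above, and the sample objects are stand-ins for real results.

```javascript
// Hypothetical helper (not part of asposepdfnodejs): converts the result object
// described above into either the extracted text or a thrown Error.
function unwrapExtractResult(json) {
  // Loose equality mirrors the check used in the snippet above.
  if (json.errorCode == 0) {
    return json.extractText;
  }
  throw new Error(`Text extraction failed (code ${json.errorCode}): ${json.errorText}`);
}

// Stand-in result objects with the same fields, used here instead of a real PDF.
const okResult = { errorCode: 0, extractText: 'Hello from a PDF', errorText: '' };
const failedResult = { errorCode: 1, extractText: '', errorText: 'File not found' };

console.log(unwrapExtractResult(okResult));
try {
  unwrapExtractResult(failedResult);
} catch (err) {
  console.error(err.message);
}
```

With the real module, the value returned by AsposePdfModule.AsposePdfExtractText(pdf_file) would be passed to the helper in place of the stand-in objects.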

ECMAScript/ES6:

  1. Import the asposepdfnodejs module.
  2. Specify the name of the PDF file from which the text will be extracted.
  3. Initialize the AsposePdf module with await (top-level await requires the file to be an ES module). If successful, you receive the module object.
  4. Call the AsposePdfExtractText function.
  5. The result is returned as a JSON object. If json.errorCode is 0, the extracted text (json.extractText) is displayed using console.log. If json.errorCode is not 0, an error occurred while processing your file, and the error information is contained in json.errorText.

  import AsposePdf from 'asposepdfnodejs';
  const AsposePdfModule = await AsposePdf();
  const pdf_file = 'Aspose.pdf';
  /*Extract text from a PDF-file*/
  const json = AsposePdfModule.AsposePdfExtractText(pdf_file);
  console.log("AsposePdfExtractText => %O", json.errorCode == 0 ? json.extractText : json.errorText);
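The snippets above do not handle the case where module initialization itself fails. One possible way to cover both that case and a non-zero errorCode is to wrap the flow in an async function. This is a sketch under assumptions: extractPdfText and the shape of its return value are hypothetical, and the factory is passed in as a parameter so that the example can run with a stub instead of the real package.

```javascript
// Hypothetical wrapper: `createModule` is the factory exported by asposepdfnodejs
// (passed in so the sketch also runs with a stub); `pdfFile` is the input file name.
async function extractPdfText(createModule, pdfFile) {
  try {
    const pdfModule = await createModule();
    const json = pdfModule.AsposePdfExtractText(pdfFile);
    return json.errorCode == 0
      ? { ok: true, text: json.extractText }
      : { ok: false, error: json.errorText };
  } catch (err) {
    // Initialization (or the call itself) failed before a result object existed.
    return { ok: false, error: String(err && err.message ? err.message : err) };
  }
}

// Stub factory standing in for the real module, so the sketch runs without a PDF.
const stubFactory = async () => ({
  AsposePdfExtractText: (file) => ({ errorCode: 0, extractText: `text of ${file}`, errorText: '' }),
});

extractPdfText(stubFactory, 'Aspose.pdf').then((result) => console.log(result));
```

With the real package, the imported AsposePdf would be passed as the first argument instead of stubFactory.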