Extract Text from PDF in Node.js
Extract Text From All the Pages of a PDF Document
Extracting text from a PDF isn't easy. Only a few PDF readers can extract text from PDF images or scanned PDFs. The Aspose.PDF for Node.js via C++ tool, however, lets you easily extract text from all pages of a PDF file in the Node.js environment.
The code below demonstrates how to use the Aspose.PDF for Node.js module to extract text from a specified PDF file and log either the extracted text or the error that was encountered.
Check the code snippets and follow the steps to extract text from your PDF:
CommonJS:
- Call require and import the asposepdfnodejs module as the AsposePdf variable.
- Specify the name of the PDF file from which the text will be extracted.
- Call AsposePdf as a Promise and perform the text extraction. Receive the object if successful.
- Call the function AsposePdfExtractText.
- The extracted text is stored in the JSON object. If json.errorCode is 0, the extracted text (json.extractText) is displayed using console.log. If json.errorCode is not 0, an error occurred while processing your file, and the error information is contained in json.errorText.
const AsposePdf = require('asposepdfnodejs');
const pdf_file = 'Aspose.pdf';
AsposePdf().then(AsposePdfModule => {
  /* Extract text from a PDF file */
  const json = AsposePdfModule.AsposePdfExtractText(pdf_file);
  console.log("AsposePdfExtractText => %O", json.errorCode == 0 ? json.extractText : json.errorText);
});
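The result check described in the steps above can also be pulled out into a small function, so that a failed extraction surfaces as a thrown error instead of a logged string. The following is a minimal sketch: unwrapExtractResult is a hypothetical helper name and not part of asposepdfnodejs; it relies only on the errorCode, extractText, and errorText fields used in the snippet above.

```javascript
// Hypothetical helper (not part of asposepdfnodejs): converts the result
// object returned by AsposePdfExtractText into either the extracted text
// or a thrown Error.
function unwrapExtractResult(json) {
  // Same loose comparison as the snippet above: errorCode 0 means success.
  if (json.errorCode == 0) {
    return json.extractText;
  }
  throw new Error(
    `AsposePdfExtractText failed (code ${json.errorCode}): ${json.errorText}`
  );
}
```

Inside the then callback, it could be called as unwrapExtractResult(AsposePdfModule.AsposePdfExtractText(pdf_file)), with a catch handler on the promise chain to report the error.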
ECMAScript/ES6:
- Import the asposepdfnodejs module.
- Specify the name of the PDF file from which the text will be extracted.
- Initialize the AsposePdf module. Receive the object if successful.
- Call the function AsposePdfExtractText.
- The extracted text is stored in the JSON object. If json.errorCode is 0, the extracted text (json.extractText) is displayed using console.log. If json.errorCode is not 0, an error occurred while processing your file, and the error information is contained in json.errorText.
import AsposePdf from 'asposepdfnodejs';
const AsposePdfModule = await AsposePdf();
const pdf_file = 'Aspose.pdf';
/*Extract text from a PDF-file*/
const json = AsposePdfModule.AsposePdfExtractText(pdf_file);
console.log("AsposePdfExtractText => %O", json.errorCode == 0 ? json.extractText : json.errorText);
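The snippet above does not handle the case where module initialization itself fails, in which case the awaited promise rejects. One possible way to cover both failure paths is sketched below. The function name extractTextSafely and the shape of its return value are assumptions made for illustration; loadModule stands in for the AsposePdf factory and is passed as a parameter so the function can be tried with a stub.

```javascript
// Sketch under stated assumptions: `loadModule` is the AsposePdf factory
// (or a stub with the same shape); `extractTextSafely` and its
// { ok, text | error } return shape are hypothetical, not library API.
async function extractTextSafely(loadModule, pdfFile) {
  try {
    const pdfModule = await loadModule();
    const json = pdfModule.AsposePdfExtractText(pdfFile);
    // errorCode 0 means the text was extracted successfully.
    return json.errorCode == 0
      ? { ok: true, text: json.extractText }
      : { ok: false, error: json.errorText };
  } catch (err) {
    // Reached when initialization rejects or the call itself throws.
    return { ok: false, error: String(err) };
  }
}
```

With the import shown above, a call might look like const result = await extractTextSafely(AsposePdf, 'Aspose.pdf'), followed by a check of result.ok before using result.text.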