Extract Tables from PDF in Node.js

Extract tables while converting PDF to CSV files

Convert PDF to CSV

If the PDF contains tables, they are saved to separate CSV files. To extract tables from a PDF document, use the AsposePdfTablesToCSV function. The following code snippets show how to convert a PDF file in a Node.js environment.
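
The snippets below pass a file name template such as "ResultPdfTablesToCSV{0:D2}.csv", where {0}, {0:D2}, {0:D3} and so on format the page number. As a rough illustration of how such placeholders expand, here is a minimal sketch. The helper expandTemplate is hypothetical and not part of asposepdfnodejs; it assumes {0:Dn} means zero-padding to at least n digits, as in .NET-style format strings. Which number the library starts counting from is not stated here, so the values below are only examples.

```javascript
// Hypothetical helper (not part of asposepdfnodejs): expands "{0}" and
// "{0:Dn}" placeholders in a file name template for a given number.
// Assumption: "{0:Dn}" zero-pads the number to at least n digits.
function expandTemplate(template, number) {
  return template.replace(/\{0(?::D(\d+))?\}/g, (match, width) =>
    String(number).padStart(width ? Number(width) : 0, '0'));
}

console.log(expandTemplate('ResultPdfTablesToCSV{0:D2}.csv', 3));  // ResultPdfTablesToCSV03.csv
console.log(expandTemplate('ResultPdfTablesToCSV{0}.csv', 12));    // ResultPdfTablesToCSV12.csv
```

A sketch like this can help predict which output files to look for after the conversion.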

CommonJS:

  1. Call require and import the asposepdfnodejs module as the AsposePdf variable.
  2. Specify the name of the PDF file that will be converted.
  3. Call AsposePdf as a Promise and perform the conversion inside its callback. Receive the module object if successful.
  4. Call the function AsposePdfTablesToCSV.
  5. Check the result. If json.errorCode is 0, the extracted tables are saved to CSV files named according to the template "ResultPdfTablesToCSV{0:D2}.csv", and the code prints json.filesNameResult. If json.errorCode is not 0, the error information is contained in json.errorText.

  const AsposePdf = require('asposepdfnodejs');
  const pdf_file = 'Aspose.pdf';
  AsposePdf().then(AsposePdfModule => {
      /*Convert a PDF-file to CSV (extract tables) with template "ResultPdfTablesToCSV{0:D2}.csv" ({0}, {0:D2}, {0:D3}, ... format page number), TAB as delimiter and save*/
      const json = AsposePdfModule.AsposePdfTablesToCSV(pdf_file, "ResultPdfTablesToCSV{0:D2}.csv", "\t");
      console.log("AsposePdfTablesToCSV => %O", json.errorCode == 0 ? json.filesNameResult : json.errorText);
  });
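
The check on json.errorCode can also be wrapped in a small helper, so that a failed conversion surfaces as an exception instead of a logged string. The following is a minimal sketch, not part of the library: the helper name unwrapResult is hypothetical, and it assumes that json.filesNameResult holds the names of the generated files. It is exercised here with mock objects rather than a real conversion.

```javascript
// Hypothetical helper: returns the list of result file names, or throws
// if the operation reported an error.
// Assumption: filesNameResult holds the names of the generated CSV files.
function unwrapResult(json) {
  if (json.errorCode !== 0) {
    throw new Error(`AsposePdfTablesToCSV failed (code ${json.errorCode}): ${json.errorText}`);
  }
  // Normalize to an array so callers can always iterate over the result.
  return Array.isArray(json.filesNameResult) ? json.filesNameResult : [json.filesNameResult];
}

// Mock result objects with the same fields as used in the snippet above.
const okResult = { errorCode: 0, errorText: '', filesNameResult: ['ResultPdfTablesToCSV01.csv'] };
const failedResult = { errorCode: 1, errorText: 'mock error', filesNameResult: [] };

console.log(unwrapResult(okResult));
try {
  unwrapResult(failedResult);
} catch (err) {
  console.error(err.message);
}
```

Inside the then callback, the same helper could be applied to the object returned by AsposePdfModule.AsposePdfTablesToCSV.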

ECMAScript/ES6:

  1. Import the asposepdfnodejs module.
  2. Specify the name of the PDF file that will be converted.
  3. Initialize the AsposePdf module. Receive the module object if successful.
  4. Call the function AsposePdfTablesToCSV.
  5. Check the result. If json.errorCode is 0, the extracted tables are saved to CSV files named according to the template "ResultPdfTablesToCSV{0:D2}.csv", and the code prints json.filesNameResult. If json.errorCode is not 0, the error information is contained in json.errorText.

  import AsposePdf from 'asposepdfnodejs';
  const AsposePdfModule = await AsposePdf();
  const pdf_file = 'Aspose.pdf';
  /*Convert a PDF-file to CSV (extract tables) with template "ResultPdfTablesToCSV{0:D2}.csv" ({0}, {0:D2}, {0:D3}, ... format page number), TAB as delimiter and save*/
  const json = AsposePdfModule.AsposePdfTablesToCSV(pdf_file, "ResultPdfTablesToCSV{0:D2}.csv", "\t");
  console.log("AsposePdfTablesToCSV => %O", json.errorCode == 0 ? json.filesNameResult : json.errorText);
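
Once the CSV files are written, their TAB-delimited content can be split into rows and cells for further processing. The following is a minimal sketch under stated assumptions: parseDelimited is a hypothetical helper, not a library function, and it treats every delimiter as a cell boundary. How the library quotes or escapes cell values is not described here, so a full CSV parser may be needed for real data. The sample string stands in for text read from one of the result files, for example with fs.readFileSync(fileName, 'utf8').

```javascript
// Hypothetical helper: splits delimited text into an array of rows,
// where each row is an array of cell strings.
// Limitation: quoted fields containing the delimiter or line breaks
// are not handled, and empty lines are skipped.
function parseDelimited(text, delimiter = '\t') {
  return text
    .split(/\r?\n/)
    .filter(line => line.length > 0)
    .map(line => line.split(delimiter));
}

// Sample text standing in for the content of one extracted table.
const sample = 'Name\tQty\r\nWidget\t4\nGadget\t10\n';
console.log(parseDelimited(sample));
```

The default delimiter matches the "\t" passed to AsposePdfTablesToCSV above; pass a different second argument if another delimiter was used for the conversion.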