Convert PDF to Excel in C++

Overview

This article explains how to convert PDF to Excel formats using C++. It covers the following topics.

Format: XLS

Format: XLSX

Format: Microsoft Excel XLS format

Format: Microsoft Excel XLSX format

Other topics covered by this article

C++ PDF to Excel Conversions

Aspose.PDF for C++ supports converting PDF files to Excel formats.

Aspose.PDF for C++ is a PDF manipulation component that can render a PDF file to an Excel workbook (XLS file). During this conversion, the individual pages of the PDF file are converted to Excel worksheets.

To convert PDF files to XLS format, Aspose.PDF provides the ExcelSaveOptions class. An ExcelSaveOptions object is passed as the second argument to the Document->Save(..) method. The basic example in this section passes SaveFormat::Excel instead; the later examples use ExcelSaveOptions.

The following code snippet shows how to convert a PDF file to XLS format with Aspose.PDF for C++.

Steps: Convert PDF to Excel XLS format in C++

  1. Create an instance of the Document object with the source PDF document.
  2. Save it to XLS format by calling the Document->Save() method.

void ConvertPDFtoExcel()
{
    std::clog << __func__ << ": Start" << std::endl;
    // String for path name
    String _dataDir("C:\\Samples\\Conversion\\");

    // String for file name
    String infilename("sample.pdf");
    String outfilename("PDFToExcel.xls");

    // Open document
    auto document = MakeObject<Document>(_dataDir + infilename);

    try {
        // Save the output in XLS format
        document->Save(_dataDir + outfilename, SaveFormat::Excel);
    }
    catch (Exception ex) {
        std::cerr << ex->get_Message();
    }
    std::clog << __func__ << ": Finish" << std::endl;
}

Convert PDF to XLS with Control Column

When a PDF is converted to XLS format, a blank column is added as the first column of the output file. The InsertBlankColumnAtFirst option of the ExcelSaveOptions class controls this column. Its default value is true.

void ConvertPDFtoExcel_Advanced_InsertBlankColumnAtFirst()
{
    std::clog << __func__ << ": Start" << std::endl;
    // String for path name
    String _dataDir("C:\\Samples\\Conversion\\");

    // String for file name
    String infilename("sample.pdf");
    String outfilename("PDFToExcel.xls");

    // Open document
    auto document = MakeObject<Document>(_dataDir + infilename);

    // Instantiate the ExcelSaveOptions object
    auto excelSave = MakeObject<ExcelSaveOptions>();

    // Disable the blank first column (InsertBlankColumnAtFirst defaults to true)
    excelSave->set_InsertBlankColumnAtFirst(false);

    // Save the output in XLS format, in the same directory as the input
    document->Save(_dataDir + outfilename, excelSave);
    std::clog << __func__ << ": Finish" << std::endl;
}
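
To make the effect of this option concrete, here is a small stand-alone sketch that does not call Aspose.PDF at all. It models one row of extracted cell values and shows how a leading blank column shifts the data one column to the right. The helper name layoutRow is hypothetical and exists only for illustration; the real layout is decided inside the library.

```cpp
#include <string>
#include <vector>

// Toy model only: this is NOT Aspose.PDF code. It mimics the effect that the
// InsertBlankColumnAtFirst option is described as having on one row of cells.
// layoutRow is a hypothetical helper introduced for illustration.
std::vector<std::string> layoutRow(const std::vector<std::string>& cells,
                                   bool insertBlankColumnAtFirst)
{
    std::vector<std::string> row;
    row.reserve(cells.size() + 1);

    if (insertBlankColumnAtFirst) {
        // Empty cell that occupies the first column of the sheet
        row.emplace_back();
    }

    // The extracted values follow, shifted right by one column if a blank was added
    row.insert(row.end(), cells.begin(), cells.end());
    return row;
}
```

In this model, a row holding "Name" and "Qty" comes back as three cells with an empty first cell when the flag is true, and as the original two cells when it is false.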

Convert PDF to Single Excel Worksheet

When a PDF file with many pages is exported to XLS, each page is exported to a separate sheet in the Excel file. This is because the MinimizeTheNumberOfWorksheets property is set to false by default. To export all pages to a single sheet in the output Excel file, set the MinimizeTheNumberOfWorksheets property to true.

void ConvertPDFtoExcel_Advanced_MinimizeTheNumberOfWorksheets()
{
    std::clog << __func__ << ": Start" << std::endl;
    // String for path name
    String _dataDir("C:\\Samples\\Conversion\\");

    // String for file name
    String infilename("sample.pdf");
    String outfilename("PDFToExcel.xls");

    // Open document
    auto document = MakeObject<Document>(_dataDir + infilename);

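    // Instantiate the ExcelSaveOptions object
    auto excelSave = MakeObject<ExcelSaveOptions>();

    // Export all pages to a single worksheet (the default is false)
    excelSave->set_MinimizeTheNumberOfWorksheets(true);

    // Save the output in XLS format, in the same directory as the input
    document->Save(_dataDir + outfilename, excelSave);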
    std::clog << __func__ << ": Finish" << std::endl;
}
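
The page-to-worksheet rule described in this section can be summarized in a few lines of plain C++. This is a sketch of the documented behavior only, not a call into Aspose.PDF; expectedWorksheetCount is a hypothetical helper, and returning 0 for a document without pages is an assumption made for the sake of the example.

```cpp
#include <cstddef>

// Hypothetical helper, not part of Aspose.PDF. It encodes the rule stated in
// the text: one worksheet per PDF page by default, or a single worksheet
// when MinimizeTheNumberOfWorksheets is set to true.
std::size_t expectedWorksheetCount(std::size_t pageCount,
                                   bool minimizeTheNumberOfWorksheets)
{
    if (pageCount == 0) {
        // Assumption for this sketch: no pages means no worksheets
        return 0;
    }
    return minimizeTheNumberOfWorksheets ? 1 : pageCount;
}
```

A check like this can be handy in a test that opens the converted workbook and compares the number of sheets against what the chosen option should produce.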

Convert to XLSX format

By default, Aspose.PDF uses XML Spreadsheet 2003 for storing data. To convert PDF files to XLSX format, set the Format property of the ExcelSaveOptions class to XLSX. The ExcelSaveOptions object is then passed as the second argument to the Save method.

The following code snippet shows how to convert a PDF file to XLSX format.

Steps: Convert PDF to Excel XLSX format in C++

  1. Create an instance of the Document object with the source PDF document.
  2. Create an instance of ExcelSaveOptions.
  3. Set the format to ExcelSaveOptions::ExcelFormat::XLSX.
  4. Save it to XLSX format by calling the Document->Save() method and passing it the ExcelSaveOptions instance.

void ConvertPDFtoExcel_Advanced_SaveXLSX()
{
    std::clog << __func__ << ": Start" << std::endl;
    // String for path name
    String _dataDir("C:\\Samples\\Conversion\\");

    // String for file name
    String infilename("sample.pdf");
    String outfilename("PDFToExcel.xlsx");

    // Open document
    auto document = MakeObject<Document>(_dataDir + infilename);

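    // Instantiate the ExcelSaveOptions object
    auto excelSave = MakeObject<ExcelSaveOptions>();

    // Select XLSX instead of the default XML Spreadsheet 2003 format
    excelSave->set_Format(ExcelSaveOptions::ExcelFormat::XLSX);

    // Save the output in XLSX format, in the same directory as the input
    document->Save(_dataDir + outfilename, excelSave);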
    std::clog << __func__ << ": Finish" << std::endl;
}
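
Because the output format is chosen through an option rather than through the file name, it is easy to end up with an XLSX workbook saved under an .xls name. One way to keep the two in step is to derive the output path from the chosen format. The sketch below uses only the C++17 standard library; the WorkbookFormat enum and both helper functions are hypothetical names introduced for illustration and are not Aspose.PDF types. The .xls extension for the default format follows the file names used in this article.

```cpp
#include <filesystem>

// Hypothetical enum for illustration; it is not the Aspose.PDF ExcelFormat type.
enum class WorkbookFormat { XmlSpreadsheet2003, Xlsx };

// Map each format to the file extension used in this article's examples.
const char* extensionFor(WorkbookFormat format)
{
    switch (format) {
        case WorkbookFormat::Xlsx:               return ".xlsx";
        case WorkbookFormat::XmlSpreadsheet2003: return ".xls";
    }
    return ".xls"; // fallback for values outside the enum
}

// Build the output path by swapping the input's extension for the one
// that matches the chosen format; the directory and base name are kept.
std::filesystem::path outputPathFor(const std::filesystem::path& inputPdf,
                                    WorkbookFormat format)
{
    std::filesystem::path out = inputPdf;
    out.replace_extension(extensionFor(format));
    return out;
}
```

The resulting path could then be converted to the string type expected by the Save call, so that changing the format in one place also changes the extension of the output file.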

See Also

This article also covers these topics. The code is the same as above.

Format: Microsoft Excel XLS format

Format: Microsoft Excel XLSX format