Convert PDF to Excel in .NET

Overview

This article explains how to convert PDF to Excel formats using C#; the code snippets also work with the Aspose.PDF.Drawing library. The article covers the following topics:

Format: XLS

Format: XLSX

Format: Excel

Format: Single Excel Worksheet

Format: XML Spreadsheet 2003 format

Format: CSV

Format: ODS

C# PDF to Excel Conversions

Aspose.PDF for .NET supports converting PDF files to Excel 2007 (XLSX), CSV, and SpreadsheetML formats.

Aspose.PDF for .NET is a PDF manipulation component that can render a PDF file as an Excel workbook (XLSX file). During this conversion, the individual pages of the PDF file are converted to Excel worksheets.

To convert PDF files to XLSX format, Aspose.PDF provides the ExcelSaveOptions class. An ExcelSaveOptions object is passed as the second argument to the Document.Save(..) method.

The following code snippet shows the process for converting a PDF file into XLS or XLSX format with Aspose.PDF for .NET.

Steps: Convert PDF to XLS in C#

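  1. Create an instance of the Document class with the source PDF document.
  2. Create an instance of ExcelSaveOptions and set its Format property to ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003 (XLSX is the default output format).
  3. Save it in XLS format with a .xls extension by calling the Document.Save() method and passing it the ExcelSaveOptions object.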

Steps: Convert PDF to XLSX in C#

  1. Create an instance of the Document class with the source PDF document.
  2. Create an instance of ExcelSaveOptions.
  3. Save it in XLSX format with a .xlsx extension by calling the Document.Save() method and passing it the ExcelSaveOptions object.
// For complete examples and data files, please go to https://github.com/aspose-pdf/Aspose.PDF-for-.NET
using Aspose.Pdf;

// The path to the documents directory (RunExamples is a helper class from the examples project linked above)
string dataDir = RunExamples.GetDataDir_AsposePdf_DocumentConversion();

// Load the source PDF document
Document pdfDocument = new Document(dataDir + "input.pdf");

// Instantiate an ExcelSaveOptions object; XLSX is the default output format
ExcelSaveOptions excelSave = new ExcelSaveOptions();

// Save the output in XLSX format
pdfDocument.Save("PDFToXLS_out.xlsx", excelSave);

Convert PDF to XLS with Control Column

When converting a PDF to XLS format, a blank column can be added to the output file as its first column. The InsertBlankColumnAtFirst option of the ExcelSaveOptions class controls this column. The default value is false, which means that no blank column is inserted.

public static void ConvertPDFtoExcelAdvanced_InsertBlankColumnAtFirst()
{
    // For complete examples and data files, please go to https://github.com/aspose-pdf/Aspose.PDF-for-.NET
    // Load PDF document (_dataDir is the documents directory path defined in the examples project)
    Document pdfDocument = new Document(_dataDir + "input.pdf");
    // Instantiate ExcelSaveOptions; set InsertBlankColumnAtFirst to true to add a blank first column
    ExcelSaveOptions excelSave = new ExcelSaveOptions { InsertBlankColumnAtFirst = false };
    // Save the output in XLSX format
    pdfDocument.Save("PDFToXLS_out.xlsx", excelSave);
}

Convert PDF to Single Excel Worksheet

When a PDF file with many pages is exported to XLS, each page is exported to a separate worksheet in the Excel file, because the MinimizeTheNumberOfWorksheets property is false by default. To export all pages to a single worksheet in the output Excel file, set the MinimizeTheNumberOfWorksheets property to true.

Steps: Convert PDF to XLS or XLSX Single Worksheet in C#

  1. Create an instance of the Document class with the source PDF document.
  2. Create an instance of ExcelSaveOptions with MinimizeTheNumberOfWorksheets = true.
  3. Save it in XLS or XLSX format with a single worksheet by calling the Document.Save() method and passing it the ExcelSaveOptions object.
public static void ConvertPDFtoExcelAdvanced_MinimizeTheNumberOfWorksheets()
{
    // For complete examples and data files, please go to https://github.com/aspose-pdf/Aspose.PDF-for-.NET
    // Load PDF document (_dataDir is the documents directory path defined in the examples project)
    Document pdfDocument = new Document(_dataDir + "input.pdf");

    // Instantiate ExcelSaveOptions so that all pages go to a single worksheet
    ExcelSaveOptions excelSave = new ExcelSaveOptions { MinimizeTheNumberOfWorksheets = true };
    // Save the output in XLSX format
    pdfDocument.Save("PDFToXLS_out.xlsx", excelSave);
}
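The steps above mention XLS as well as XLSX, but the example saves XLSX only. Below is a minimal sketch of the XLS variant. It assumes that MinimizeTheNumberOfWorksheets can be combined with the Format property described in the XML Spreadsheet 2003 section, that the Aspose.PDF for .NET package is referenced, and that input.pdf is in the working directory; the class name SingleSheetXlsExample is illustrative.

```csharp
using Aspose.Pdf;

// Illustrative console program (hypothetical class name); assumes the Aspose.PDF
// for .NET package is referenced and input.pdf is in the current working directory.
public static class SingleSheetXlsExample
{
    public static void Main()
    {
        // Document implements IDisposable, so release it when the conversion is done
        using (Document pdfDocument = new Document("input.pdf"))
        {
            ExcelSaveOptions excelSave = new ExcelSaveOptions
            {
                // Put all pages on one worksheet instead of one worksheet per page
                MinimizeTheNumberOfWorksheets = true,
                // Write XML Spreadsheet 2003 (.xls) instead of the default XLSX
                Format = ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003
            };

            pdfDocument.Save("PDFToXLS_single_out.xls", excelSave);
        }
    }
}
```

Setting both properties on one ExcelSaveOptions object keeps the worksheet layout and the file format independent: either can be changed without touching the other.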

Convert to other spreadsheet formats

Convert to XML Spreadsheet 2003 format

Since version 20.8, Aspose.PDF uses the Microsoft Excel Open XML Spreadsheet 2007 (XLSX) file format by default. To convert PDF files to XML Spreadsheet 2003 format, set the Format property of the ExcelSaveOptions class. An object of the ExcelSaveOptions class is passed as the second argument to the Document.Save(..) method.

The following code snippet shows the process for converting a PDF file into Excel 2003 XML (XLS) format.

Steps: Convert PDF to Excel 2003 XML Format in C#

  1. Create an instance of the Document class with the source PDF document.
  2. Create an instance of ExcelSaveOptions with Format = ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003.
  3. Save it in Excel 2003 XML (XLS) format by calling the Document.Save() method and passing it the ExcelSaveOptions object.
public static void ConvertPDFtoExcelAdvanced_SaveXLS2003()
{
    // For complete examples and data files, please go to https://github.com/aspose-pdf/Aspose.PDF-for-.NET

    // Load PDF document (_dataDir is the documents directory path defined in the examples project)
    Document pdfDocument = new Document(_dataDir + "input.pdf");

    // Instantiate ExcelSaveOptions and select the XML Spreadsheet 2003 format
    ExcelSaveOptions excelSave = new ExcelSaveOptions { Format = ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003 };

    // Save the output in XLS format
    pdfDocument.Save("PDFToXLS_out.xls", excelSave);
}

Convert to CSV

Conversion to CSV format works in the same way as above; all you need to do is set the appropriate format.

Steps: Convert PDF to CSV in C#

  1. Create an instance of the Document class with the source PDF document.
  2. Create an instance of ExcelSaveOptions with Format = ExcelSaveOptions.ExcelFormat.CSV.
  3. Save it in CSV format by calling the Document.Save() method and passing it the ExcelSaveOptions object.
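// Load the PDF document, select CSV output, and save (_dataDir as in the examples above)
Document pdfDocument = new Document(_dataDir + "input.pdf");
ExcelSaveOptions excelSave = new ExcelSaveOptions { Format = ExcelSaveOptions.ExcelFormat.CSV };
pdfDocument.Save("PDFToCSV_out.csv", excelSave);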

Convert to ODS

Conversion to ODS format works in the same way as for all other formats.

Steps: Convert PDF to ODS in C#

  1. Create an instance of the Document class with the source PDF document.
  2. Create an instance of ExcelSaveOptions with Format = ExcelSaveOptions.ExcelFormat.ODS.
  3. Save it in ODS format by calling the Document.Save() method and passing it the ExcelSaveOptions object.

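// Load the PDF document, select ODS output, and save (_dataDir as in the examples above)
Document pdfDocument = new Document(_dataDir + "input.pdf");
ExcelSaveOptions excelSave = new ExcelSaveOptions { Format = ExcelSaveOptions.ExcelFormat.ODS };
pdfDocument.Save("PDFToODS_out.ods", excelSave);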
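Because every format in this article is selected through the same ExcelSaveOptions.Format property, the per-format snippets can be folded into one routine. This is a sketch only: PdfSpreadsheetExport, FormatForExtension, and ConvertFile are hypothetical names, not part of Aspose.PDF, and the extension-to-format mapping (including treating .xls as XML Spreadsheet 2003, as this article does) is an assumption.

```csharp
using System;
using System.IO;
using Aspose.Pdf;

// Hypothetical helper, not part of Aspose.PDF: chooses ExcelSaveOptions.Format
// from the output file extension, then converts the PDF.
public static class PdfSpreadsheetExport
{
    // Maps an output file extension to the matching ExcelFormat value.
    public static ExcelSaveOptions.ExcelFormat FormatForExtension(string outputPath)
    {
        string extension = Path.GetExtension(outputPath).ToLowerInvariant();
        switch (extension)
        {
            case ".xlsx":
                return ExcelSaveOptions.ExcelFormat.XLSX;
            case ".xls":
                // Assumption: .xls means XML Spreadsheet 2003, as in this article
                return ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003;
            case ".csv":
                return ExcelSaveOptions.ExcelFormat.CSV;
            case ".ods":
                return ExcelSaveOptions.ExcelFormat.ODS;
            default:
                // Fail loudly instead of silently writing XLSX under another extension
                throw new ArgumentException($"Unsupported spreadsheet extension '{extension}'.", nameof(outputPath));
        }
    }

    // Picks the format first (so bad extensions fail before the PDF is loaded),
    // then loads the PDF and saves it with those options.
    public static void ConvertFile(string pdfPath, string outputPath)
    {
        ExcelSaveOptions options = new ExcelSaveOptions { Format = FormatForExtension(outputPath) };
        using (Document pdfDocument = new Document(pdfPath))
        {
            pdfDocument.Save(outputPath, options);
        }
    }
}
```

For example, `PdfSpreadsheetExport.ConvertFile("input.pdf", "output.ods");` would write ODS output. Throwing on unknown extensions is a deliberate choice: since XLSX is the default format, an unset Format combined with a .xls or .csv file name would otherwise produce a file whose content does not match its extension.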
