Convert PDF to Excel in Python

Overview

This article explains how to convert PDF to Excel formats using Python. It covers the following topics.

Format: XLS

Format: XLSX

Format: Excel

Format: CSV

Format: ODS

PDF to EXCEL conversion via Python

Aspose.PDF for Python via Java supports converting PDF files to Excel and CSV formats.

Aspose.PDF for Python via Java is a PDF manipulation component that can render a PDF file to an Excel workbook (XLSX file). During this conversion, each page of the PDF file is converted to a separate Excel worksheet.

Try to convert PDF to Excel online

Aspose.PDF offers a free online application, “PDF to XLSX”, where you can try out the functionality and check the quality of the conversion.

The following code snippet shows how to convert a PDF file into XLS or XLSX format with Aspose.PDF for Python via Java.

Steps: Convert PDF to XLS in Python

Create an instance of the Document object with the source PDF document.
Create an instance of ExcelSaveOptions.
Save it to XLS format, specifying the .xls extension, by calling the Document.save() method and passing it the ExcelSaveOptions.




from asposepdf import Api


# init license
documentName = "testdata/license/Aspose.PDF.PythonviaJava.lic"
licenseObject = Api.License()
licenseObject.setLicense(documentName)

# conversion from byte array
documentName = "testdata/source.pdf"
with open(documentName, "rb") as file:
    byte_array = file.read()
doc = Api.Document(byte_array)
documentOutName = "testout/result1.xls"
doc.save(documentOutName, Api.SaveFormat.Excel)

# conversion from file
documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result2.xls"
doc.save(documentOutName, Api.SaveFormat.Excel)


# conversion from byte array with explicit ExcelSaveOptions
documentName = "testdata/source.pdf"
with open(documentName, "rb") as file:
    byte_array = file.read()
doc = Api.Document(byte_array)
documentOutName = "testout/result3.xls"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003
# pass the configured options so the chosen format is applied
doc.save(documentOutName, save_option)

# conversion from file with explicit ExcelSaveOptions
documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result4.xls"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003
# pass the configured options so the chosen format is applied
doc.save(documentOutName, save_option)
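
The steps above can be folded into a small helper. The following is a minimal sketch, not part of the library: `convert_pdf` and its `api` and `format_name` parameters are hypothetical names, and the helper relies only on the calls already shown in this article (`Document`, `ExcelSaveOptions`, `_format`, `save`). The `Api` object is passed in as an argument, so the function itself does not import the library.

```python
def convert_pdf(api, source_path, output_path, format_name=None):
    """Convert a PDF using an object shaped like `asposepdf.Api`.

    `format_name` is the name of an ExcelSaveOptions.ExcelFormat member,
    for example "XMLSpreadSheet2003" or "CSV". When it is None, the
    options object is left at whatever format the library defaults to.
    """
    doc = api.Document(source_path)
    save_option = api.ExcelSaveOptions()
    if format_name is not None:
        # Look the format up by name on the nested ExcelFormat class
        save_option._format = getattr(
            api.ExcelSaveOptions.ExcelFormat, format_name
        )
    doc.save(output_path, save_option)
    return save_option
```

With the library installed, a call might look like `convert_pdf(Api, "testdata/source.pdf", "testout/result.xls", "XMLSpreadSheet2003")` after `from asposepdf import Api`.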

Steps: Convert PDF to XLSX in Python

Create an instance of the Document object with the source PDF document.
Create an instance of ExcelSaveOptions.
Save it to XLSX format, specifying the .xlsx extension, by calling the Document.save() method and passing it the ExcelSaveOptions.


from asposepdf import Api

documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
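documentOutName = "testout/result.xlsx"
# ExcelSaveOptions is assumed to write XLSX when no format is set explicitly
save_option = Api.ExcelSaveOptions()
doc.save(documentOutName, save_option)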
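
Since each PDF page is expected to become its own worksheet, it can be useful to check the result without opening Excel. The sketch below uses only the standard library and relies on the fact that an XLSX file is a ZIP package. `list_xlsx_worksheets` is a hypothetical helper name, and the code assumes the worksheets are stored under the conventional `xl/worksheets/` folder, which is typical but not guaranteed for every producer.

```python
import zipfile


def list_xlsx_worksheets(path):
    """Return the worksheet part names found inside an XLSX package."""
    with zipfile.ZipFile(path) as package:
        return sorted(
            name
            for name in package.namelist()
            # Assumption: worksheets live in the conventional folder
            if name.startswith("xl/worksheets/") and name.endswith(".xml")
        )
```

Calling `len(list_xlsx_worksheets("testout/result.xlsx"))` then gives a number that can be compared with the page count of the source PDF.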

Convert PDF to XLS with Blank Column Control

When converting a PDF to XLS format, a blank column is added to the output file as the first column. The InsertBlankColumnAtFirst option of the ExcelSaveOptions class controls this column. Its default value is true.


from asposepdf import Api

documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result.xls"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003
# Keep the blank first column; set to False to omit it
save_option._insertBlankColumnAtFirst = True
doc.save(documentOutName, save_option)

Convert PDF to Single Excel Worksheet

When exporting a PDF file with a lot of pages to XLS, each page is exported to a different sheet in the Excel file. This is because the MinimizeTheNumberOfWorksheets property is set to false by default. To ensure that all pages are exported to one single sheet in the output Excel file, set the MinimizeTheNumberOfWorksheets property to true.

Steps: Convert PDF to XLS or XLSX Single Worksheet in Python

Create an instance of the Document object with the source PDF document.
Create an instance of ExcelSaveOptions with MinimizeTheNumberOfWorksheets = True.
Save it to XLS or XLSX format with a single worksheet by calling the Document.save() method and passing it the ExcelSaveOptions.


from asposepdf import Api

documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result.xls"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003
save_option._minimizeTheNumberOfWorksheets = True
# Save the file into MS Excel format
doc.save(documentOutName, save_option)
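
To confirm that the pages really ended up on a single sheet, the output can be inspected with the standard library. This is a sketch under one assumption: that the file written with `XMLSpreadSheet2003` is an XML Spreadsheet 2003 document, whose worksheets are `Worksheet` elements in the `urn:schemas-microsoft-com:office:spreadsheet` namespace. `worksheet_names` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by XML Spreadsheet 2003 documents
SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def worksheet_names(path):
    """Return the names of all worksheets in an XML Spreadsheet 2003 file."""
    root = ET.parse(path).getroot()
    return [
        sheet.get(f"{{{SS_NS}}}Name")
        for sheet in root.iter(f"{{{SS_NS}}}Worksheet")
    ]
```

If the setting took effect, `len(worksheet_names("testout/result.xls"))` should be 1 no matter how many pages the PDF has.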

Convert to other spreadsheet formats

Convert to CSV

Conversion to CSV format works the same way as above. All you need to do is set the appropriate format.

Steps: Convert PDF to CSV in Python

Create an instance of the Document object with the source PDF document.
Create an instance of ExcelSaveOptions with Format = ExcelSaveOptions.ExcelFormat.CSV.
Save it to CSV format by calling the Document.save() method and passing it the ExcelSaveOptions.


from asposepdf import Api

documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result.csv"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.CSV
doc.save(documentOutName, save_option)
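
A quick way to look at the converted data is to read the first few rows back with Python's `csv` module. This is only a sketch: `preview_csv` is a hypothetical helper, and the comma delimiter and UTF-8 encoding are assumptions about the generated file that may need adjusting.

```python
import csv
from itertools import islice


def preview_csv(path, limit=5, encoding="utf-8"):
    """Return up to `limit` rows from the start of a CSV file."""
    # newline="" lets the csv module handle line endings itself
    with open(path, newline="", encoding=encoding) as handle:
        return list(islice(csv.reader(handle), limit))
```

For example, `preview_csv("testout/result.csv", limit=3)` returns the first three rows as lists of strings.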

Convert to ODS

Steps: Convert PDF to ODS in Python

Create an instance of the Document object with the source PDF document.
Create an instance of ExcelSaveOptions with Format = ExcelSaveOptions.ExcelFormat.ODS.
Save it to ODS format by calling the Document.save() method and passing it the ExcelSaveOptions.

Conversion to ODS format is performed in the same way as for the other formats.


from asposepdf import Api

documentName = "../../testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "../../testout/result1.ods"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.ODS
doc.save(documentOutName, save_option)
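
Because the examples differ only in the format constant, the choice can be derived from the output file name. The sketch below maps extensions to the `ExcelFormat` member names used in this article. `FORMAT_BY_EXTENSION` and `excel_format_name` are hypothetical names, and `.xlsx` is deliberately left out because the article does not name a constant for it.

```python
from pathlib import Path

# Extension -> name of the ExcelSaveOptions.ExcelFormat member used above
FORMAT_BY_EXTENSION = {
    ".xls": "XMLSpreadSheet2003",
    ".csv": "CSV",
    ".ods": "ODS",
}


def excel_format_name(output_path):
    """Return the ExcelFormat member name matching the output extension."""
    suffix = Path(output_path).suffix.lower()
    try:
        return FORMAT_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"no ExcelFormat mapping for {suffix!r}") from None
```

The returned name could then be resolved with `getattr(Api.ExcelSaveOptions.ExcelFormat, excel_format_name(documentOutName))` and assigned to `save_option._format`.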