Convert PDF to Excel in Python
Overview
This article explains how to convert PDF to Excel formats using Python. It covers the following topics.
Format: XLS
Format: XLSX
Format: Excel
Format: CSV
Format: ODS
PDF to EXCEL conversion via Python
Aspose.PDF for Python via Java supports converting PDF files to Excel and CSV formats.
Aspose.PDF for Python via Java is a PDF manipulation component that can render a PDF file to an Excel workbook (XLSX file). During this conversion, each page of the PDF file is converted to a separate Excel worksheet.
Try to convert PDF to Excel online
Aspose.PDF offers a free online application, “PDF to XLSX”, where you can try out the functionality and check the quality of the results.
The following code snippet shows the process for converting PDF file into XLS or XLSX format with Aspose.PDF for Python via Java.
Steps: Convert PDF to XLS in Python
- Create an instance of Document object with the source PDF document.
- Create an instance of ExcelSaveOptions.
- Save it to XLS format, specifying the .xls extension, by calling the Document.save() method and passing it the ExcelSaveOptions object.
from asposepdf import Api
# init license
documentName = "testdata/license/Aspose.PDF.PythonviaJava.lic"
licenseObject = Api.License()
licenseObject.setLicense(documentName)
# conversion from byte array, using the default Excel save format
documentName = "testdata/source.pdf"
with open(documentName, "rb") as file:
    byte_array = file.read()
doc = Api.Document(byte_array)
documentOutName = "testout/result1.xls"
doc.save(documentOutName, Api.SaveFormat.Excel)
# conversion from file, using the default Excel save format
documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result2.xls"
doc.save(documentOutName, Api.SaveFormat.Excel)
# conversion from byte array, with explicit ExcelSaveOptions
documentName = "testdata/source.pdf"
with open(documentName, "rb") as file:
    byte_array = file.read()
doc = Api.Document(byte_array)
documentOutName = "testout/result3.xls"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003
# pass the options object so the chosen format is actually applied
doc.save(documentOutName, save_option)
# conversion from file, with explicit ExcelSaveOptions
documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result4.xls"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003
doc.save(documentOutName, save_option)
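When the PDF is loaded from a byte array, it can help to confirm that the bytes really are a PDF before handing them to Api.Document. The following is a minimal sketch using only the standard library; the helper names looks_like_pdf and read_pdf_bytes are hypothetical and not part of Aspose.PDF. It applies a strict check that the data starts with the "%PDF-" header.

```python
from pathlib import Path

# Every PDF file is expected to begin with this header marker.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the byte string starts with the PDF header marker."""
    return data.startswith(PDF_MAGIC)


def read_pdf_bytes(path: str) -> bytes:
    """Read a file and raise ValueError if it does not start like a PDF."""
    data = Path(path).read_bytes()
    if not looks_like_pdf(data):
        raise ValueError(f"{path} does not look like a PDF file")
    return data
```

The returned bytes can then be passed to Api.Document(byte_array) exactly as in the byte array examples above.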
Steps: Convert PDF to XLSX in Python
- Create an instance of Document object with the source PDF document.
- Create an instance of ExcelSaveOptions.
- Save it to XLSX format, specifying the .xlsx extension, by calling the Document.save() method and passing it the ExcelSaveOptions object.
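from asposepdf import Api
documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result.xlsx"
# create the save options; the XLSX member name is assumed by analogy
# with the CSV and ODS members of ExcelFormat
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.XLSX
doc.save(documentOutName, save_option)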
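Since each PDF page becomes a worksheet, one way to sanity-check the result is to list the worksheet names in the generated XLSX file. The sketch below relies only on the standard library and on the fact that an XLSX file is a ZIP archive containing xl/workbook.xml. The function name xlsx_sheet_names is hypothetical, and the code assumes the workbook uses the common (transitional) SpreadsheetML namespace.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by xl/workbook.xml in transitional Office Open XML workbooks.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def xlsx_sheet_names(source) -> list:
    """Return worksheet names from an XLSX file (a path or a binary file object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # Each <sheet> element under <sheets> describes one worksheet.
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]
```

Calling xlsx_sheet_names("testout/result.xlsx") after the conversion should return one name per worksheet, which can be compared with the number of pages in the source PDF.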
Convert PDF to XLS with Column Control
When converting a PDF to XLS format, a blank column is added to the output file as the first column. The InsertBlankColumnAtFirst option of the ExcelSaveOptions class controls this column. Its default value is true.
from asposepdf import Api
documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result.xls"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003
save_option._insertBlankColumnAtFirst = True
doc.save(documentOutName, save_option)
Convert PDF to Single Excel Worksheet
When exporting a PDF file with a lot of pages to XLS, each page is exported to a different sheet in the Excel file. This is because the MinimizeTheNumberOfWorksheets property is set to false by default. To ensure that all pages are exported to one single sheet in the output Excel file, set the MinimizeTheNumberOfWorksheets property to true.
Steps: Convert PDF to XLS or XLSX Single Worksheet in Python
- Create an instance of Document object with the source PDF document.
- Create an instance of ExcelSaveOptions with MinimizeTheNumberOfWorksheets = True.
- Save it to XLS or XLSX format with a single worksheet by calling the Document.save() method and passing it the ExcelSaveOptions object.
from asposepdf import Api
documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result.xls"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.XMLSpreadSheet2003
save_option._minimizeTheNumberOfWorksheets = True
# Save the file into MS Excel format
doc.save(documentOutName, save_option)
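To confirm the effect of MinimizeTheNumberOfWorksheets, the worksheets in the output can be counted. The sketch below assumes that a file saved with ExcelFormat.XMLSpreadSheet2003 is an XML Spreadsheet 2003 document, in which each sheet is a Worksheet element in the urn:schemas-microsoft-com:office:spreadsheet namespace. The function name spreadsheetml_sheet_names is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Spreadsheet 2003 (SpreadsheetML) vocabulary.
SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def spreadsheetml_sheet_names(data: bytes) -> list:
    """Return the ss:Name of every Worksheet in an XML Spreadsheet 2003 document."""
    root = ET.fromstring(data)
    worksheets = root.findall(f"{{{SS_NS}}}Worksheet")
    return [ws.get(f"{{{SS_NS}}}Name") for ws in worksheets]
```

Reading the output file in binary mode and passing its contents to this function gives the list of sheets; with the option set to True, the list is expected to contain a single entry.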
Convert to other spreadsheet formats
Convert to CSV
Conversion to CSV works the same way as above. All you need to do is set the appropriate format.
Steps: Convert PDF to CSV in Python
- Create an instance of Document object with the source PDF document.
- Create an instance of ExcelSaveOptions with Format = ExcelSaveOptions.ExcelFormat.CSV.
- Save it to CSV format by calling the Document.save() method and passing it the ExcelSaveOptions object.
from asposepdf import Api
documentName = "testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/result.csv"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.CSV
doc.save(documentOutName, save_option)
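After the conversion, the CSV output can be inspected with Python's built-in csv module. The following sketch summarizes the row and column counts of CSV text and flags rows of differing width, which can happen when a PDF table does not map cleanly to a grid. The function name summarize_csv is hypothetical.

```python
import csv
import io


def summarize_csv(text: str) -> dict:
    """Return the row count, widest row, and whether row widths differ."""
    rows = list(csv.reader(io.StringIO(text)))
    widths = {len(row) for row in rows}
    return {
        "rows": len(rows),
        "max_columns": max(widths, default=0),
        "ragged": len(widths) > 1,
    }
```

To apply it to the converted file, read testout/result.csv as text and pass the contents to summarize_csv. The file encoding is not stated in this article, so pass the encoding that matches your output when opening the file.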
Convert to ODS
Steps: Convert PDF to ODS in Python
- Create an instance of Document object with the source PDF document.
- Create an instance of ExcelSaveOptions with Format = ExcelSaveOptions.ExcelFormat.ODS.
- Save it to ODS format by calling the Document.save() method and passing it the ExcelSaveOptions object.
Conversion to ODS works the same way as for all the other formats.
from asposepdf import Api
documentName = "../../testdata/source.pdf"
doc = Api.Document(documentName)
documentOutName = "../../testout/result1.ods"
save_option = Api.ExcelSaveOptions()
save_option._format = Api.ExcelSaveOptions.ExcelFormat.ODS
doc.save(documentOutName, save_option)
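The examples in this article differ only in the output extension and the ExcelFormat value, so the choice can be driven by the output file name. The sketch below is one possible way to do that, not an Aspose.PDF feature: the names FORMAT_BY_EXTENSION, excel_format_name and convert_pdf are hypothetical, and the mapping lists only the ExcelFormat members that appear in the examples above. The asposepdf import is deferred so that the mapping can be used without a Java runtime.

```python
from pathlib import Path

# Output extension -> member name of Api.ExcelSaveOptions.ExcelFormat,
# limited to the members used in this article's examples.
FORMAT_BY_EXTENSION = {
    ".xls": "XMLSpreadSheet2003",
    ".csv": "CSV",
    ".ods": "ODS",
}


def excel_format_name(out_path: str) -> str:
    """Return the ExcelFormat member name that matches the output extension."""
    ext = Path(out_path).suffix.lower()
    if ext not in FORMAT_BY_EXTENSION:
        raise ValueError(f"unsupported output extension: {ext!r}")
    return FORMAT_BY_EXTENSION[ext]


def convert_pdf(pdf_path: str, out_path: str) -> None:
    """Convert a PDF to the spreadsheet format implied by out_path."""
    # Imported here so the helpers above work without asposepdf installed.
    from asposepdf import Api

    save_option = Api.ExcelSaveOptions()
    format_enum = Api.ExcelSaveOptions.ExcelFormat
    save_option._format = getattr(format_enum, excel_format_name(out_path))
    Api.Document(pdf_path).save(out_path, save_option)
```

For example, convert_pdf("testdata/source.pdf", "testout/result.ods") would select the ODS format, and an unknown extension raises ValueError before any conversion starts.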
See Also
This article also covers the following topics. The code is the same as above.
Format: Excel
- Python PDF to Excel Code
- Python PDF to Excel API
- Python PDF to Excel Programmatically
- Python PDF to Excel Library
- Python Save PDF as Excel
- Python Generate Excel from PDF
- Python Create Excel from PDF
- Python PDF to Excel Converter
Format: XLS
- Python PDF to XLS Code
- Python PDF to XLS API
- Python PDF to XLS Programmatically
- Python PDF to XLS Library
- Python Save PDF as XLS
- Python Generate XLS from PDF
- Python Create XLS from PDF
- Python PDF to XLS Converter
Format: XLSX
- Python PDF to XLSX Code
- Python PDF to XLSX API
- Python PDF to XLSX Programmatically
- Python PDF to XLSX Library
- Python Save PDF as XLSX
- Python Generate XLSX from PDF
- Python Create XLSX from PDF
- Python PDF to XLSX Converter
Format: CSV
- Python PDF to CSV Code
- Python PDF to CSV API
- Python PDF to CSV Programmatically
- Python PDF to CSV Library
- Python Save PDF as CSV
- Python Generate CSV from PDF
- Python Create CSV from PDF
- Python PDF to CSV Converter
Format: ODS