Convert PDF to Excel in Python

PDF to Excel conversion via Python

Aspose.PDF for Python via .NET supports converting PDF files to Excel and CSV formats.

Aspose.PDF for Python via .NET is a PDF manipulation component that can render a PDF file to an Excel workbook (XLSX file). During this conversion, each page of the PDF file is converted to an Excel worksheet.

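The following code snippets show how to convert a PDF file into XLS or XLSX format with Aspose.PDF for Python via .NET. In the snippets on this page, `self.data_dir` is the folder that holds the sample files, and `infile` and `outfile` are file names supplied by the caller.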

Steps: Convert a PDF file to Excel (XML Spreadsheet 2003) format

  1. Load the PDF document.
  2. Set up Excel save options using ExcelSaveOptions.
  3. Save the converted file.

    from os import path
    import aspose.pdf as apdf

    path_infile = path.join(self.data_dir, infile)
    path_outfile = path.join(self.data_dir, "python", outfile)

    document = apdf.Document(path_infile)
    save_options = apdf.ExcelSaveOptions()
    save_options.format = apdf.ExcelSaveOptions.ExcelFormat.XML_SPREAD_SHEET2003
    document.save(path_outfile, save_options)

    print(infile + " converted into " + outfile)

Steps: Convert a PDF file to XLSX format (Excel 2007+)

  1. Load the PDF document.
  2. Set up Excel save options using ExcelSaveOptions.
  3. Save the converted file.

    from os import path
    import aspose.pdf as apdf

    path_infile = path.join(self.data_dir, infile)
    path_outfile = path.join(self.data_dir, "python", outfile)

    document = apdf.Document(path_infile)
    save_options = apdf.ExcelSaveOptions()
    save_options.format = apdf.ExcelSaveOptions.ExcelFormat.XLSX
    document.save(path_outfile, save_options)

    print(infile + " converted into " + outfile)
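
The snippets above rely on `self.data_dir`, `infile` and `outfile` being defined by the surrounding code. As a minimal sketch, the same XLSX conversion could be wrapped in a standalone function. The names `build_paths` and `convert_pdf_to_xlsx` are hypothetical and are not part of the library; only the `Document`, `ExcelSaveOptions` and `save` calls come from the examples above.

```python
from os import path


def build_paths(data_dir, infile, outfile):
    """Return (input_path, output_path) using the same layout as the snippets:
    the input sits in data_dir, the output goes to data_dir/python."""
    return path.join(data_dir, infile), path.join(data_dir, "python", outfile)


def convert_pdf_to_xlsx(data_dir, infile, outfile):
    """Convert data_dir/infile to an XLSX workbook at data_dir/python/outfile."""
    # Imported here so build_paths stays usable without the library installed.
    import aspose.pdf as apdf

    path_infile, path_outfile = build_paths(data_dir, infile, outfile)

    document = apdf.Document(path_infile)
    save_options = apdf.ExcelSaveOptions()
    save_options.format = apdf.ExcelSaveOptions.ExcelFormat.XLSX
    document.save(path_outfile, save_options)
    return path_outfile
```

Returning the output path instead of printing a message makes the function easier to chain with later processing steps.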

Convert PDF to XLS with Control of the First Column

When converting a PDF to XLS format, a blank column is added to the output file as the first column. The ‘insert_blank_column_at_first’ option of the ‘ExcelSaveOptions’ class controls this column. Its default value is true.


    from os import path
    import aspose.pdf as apdf

    path_infile = path.join(self.data_dir, infile)
    path_outfile = path.join(self.data_dir, "python", outfile)

    document = apdf.Document(path_infile)
    save_options = apdf.ExcelSaveOptions()
    save_options.format = apdf.ExcelSaveOptions.ExcelFormat.XLSX
    save_options.insert_blank_column_at_first = True
    document.save(path_outfile, save_options)

    print(infile + " converted into " + outfile)
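
The example above keeps the blank column. A minimal sketch of the opposite case, where the option is set to `False`, might look like the following. The helper names `configure_excel_options` and `convert_without_blank_column` are hypothetical; the sketch assumes that setting the option to `False` suppresses the leading column, as the description above implies.

```python
def configure_excel_options(save_options, excel_format, insert_blank_column_at_first=False):
    """Apply the format and the first-column flag to an ExcelSaveOptions-like object."""
    save_options.format = excel_format
    save_options.insert_blank_column_at_first = bool(insert_blank_column_at_first)
    return save_options


def convert_without_blank_column(path_infile, path_outfile):
    """Convert a PDF to XLSX with the blank first column switched off."""
    # Imported here so configure_excel_options can be used without the library installed.
    import aspose.pdf as apdf

    document = apdf.Document(path_infile)
    save_options = configure_excel_options(
        apdf.ExcelSaveOptions(),
        apdf.ExcelSaveOptions.ExcelFormat.XLSX,
        insert_blank_column_at_first=False,
    )
    document.save(path_outfile, save_options)
```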

Convert PDF to Single Excel Worksheet

The following example converts a PDF to an Excel (.xlsx) file with the ‘minimize_the_number_of_worksheets’ option enabled.

Steps: Convert PDF to a single XLS or XLSX worksheet in Python

  1. Load the PDF document.
  2. Set up Excel save options using ExcelSaveOptions.
  3. Enable the ‘minimize_the_number_of_worksheets’ option, which reduces the number of Excel sheets by combining PDF pages into fewer worksheets (e.g., one worksheet for the entire document if possible).
  4. Save the converted file.

    from os import path
    import aspose.pdf as apdf

    path_infile = path.join(self.data_dir, infile)
    path_outfile = path.join(self.data_dir, "python", outfile)

    document = apdf.Document(path_infile)
    save_options = apdf.ExcelSaveOptions()
    save_options.format = apdf.ExcelSaveOptions.ExcelFormat.XLSX
    save_options.minimize_the_number_of_worksheets = True
    document.save(path_outfile, save_options)

    print(infile + " converted into " + outfile)
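
To check the effect of ‘minimize_the_number_of_worksheets’, the number of worksheets in the resulting file can be counted with the standard library alone, because an XLSX file is a ZIP package. The sketch below is not part of Aspose.PDF. It assumes the workbook part sits at the conventional location `xl/workbook.xml` and uses the transitional SpreadsheetML namespace; `count_worksheets` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the transitional SpreadsheetML workbook part (assumed here).
MAIN_NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def count_worksheets(xlsx_file, workbook_part="xl/workbook.xml"):
    """Count the <sheet> entries listed in the workbook part of an XLSX package.

    xlsx_file may be a path or a binary file-like object.
    workbook_part is the conventional location; treat it as an assumption.
    """
    with zipfile.ZipFile(xlsx_file) as package:
        root = ET.fromstring(package.read(workbook_part))
    return len(root.findall("main:sheets/main:sheet", MAIN_NS))
```

Running it on two outputs of the same PDF, one converted with the option enabled and one without, shows how many worksheets the option saved.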

Convert PDF file into an Excel file in XLSM format

This Python example shows how to convert a PDF file into an Excel file in XLSM format (Excel Macro-Enabled Workbook).


    from os import path
    import aspose.pdf as apdf

    path_infile = path.join(self.data_dir, infile)
    path_outfile = path.join(self.data_dir, "python", outfile)

    document = apdf.Document(path_infile)
    save_options = apdf.ExcelSaveOptions()
    save_options.format = apdf.ExcelSaveOptions.ExcelFormat.XLSM
    document.save(path_outfile, save_options)

    print(infile + " converted into " + outfile)

Convert to other spreadsheet formats

Convert to CSV

The ‘convert_pdf_to_excel_2007_csv’ function performs the same conversion as the previous examples, but the target format is CSV (Comma-Separated Values) instead of XLSM.

Steps: Convert PDF to CSV in Python

  1. Create a Document object from the source PDF document.
  2. Create an instance of ExcelSaveOptions and set its format to ExcelSaveOptions.ExcelFormat.CSV.
  3. Save the document in CSV format by calling the save() method and passing it the ExcelSaveOptions instance.

    from os import path
    import aspose.pdf as apdf

    def convert_pdf_to_excel_2007_csv(infile, outfile):
        # data_dir is assumed to be defined at module level (folder with the sample files)
        path_infile = path.join(data_dir, infile)
        path_outfile = path.join(data_dir, "python", outfile)

        document = apdf.Document(path_infile)
        save_options = apdf.ExcelSaveOptions()
        save_options.format = apdf.ExcelSaveOptions.ExcelFormat.CSV
        document.save(path_outfile, save_options)

        print(infile + " converted into " + outfile)
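
Once the CSV file has been written, it can be read back with Python's built-in `csv` module for further processing. The following is a sketch only: it assumes the output is comma-delimited and UTF-8 encoded, which this page does not state, and the helper names `parse_csv_text` and `load_converted_csv` are hypothetical.

```python
import csv
import io


def parse_csv_text(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))


def load_converted_csv(csv_path, encoding="utf-8"):
    """Read a converted CSV file from disk; the encoding is an assumption."""
    with open(csv_path, newline="", encoding=encoding) as handle:
        return parse_csv_text(handle.read())
```

For example, `load_converted_csv(path_outfile)` would return the converted table as nested lists that can be filtered or passed on to other tools.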

Convert to ODS

Steps: Convert PDF to ODS in Python

  1. Create a Document object from the source PDF document.
  2. Create an instance of ExcelSaveOptions and set its format to ExcelSaveOptions.ExcelFormat.ODS.
  3. Save the document in ODS format by calling the save() method and passing it the ExcelSaveOptions instance.

Conversion to ODS format works in the same way as for all other formats.


    from os import path
    import aspose.pdf as apdf
    
    path_infile = path.join(self.data_dir, infile)
    path_outfile = path.join(self.data_dir, "python", outfile)

    document = apdf.Document(path_infile)
    save_options = apdf.ExcelSaveOptions()
    save_options.format = apdf.ExcelSaveOptions.ExcelFormat.ODS
    document.save(path_outfile, save_options)

    print(infile + " converted into " + outfile)
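
Since the examples on this page differ only in the value assigned to `save_options.format`, one possible way to avoid repeating them is to pick the format from the output file extension. This is a sketch under stated assumptions: `FORMAT_BY_EXTENSION`, `format_name_for` and `convert_pdf` are hypothetical names, and the mapping of `.xls` to `XML_SPREAD_SHEET2003` is an assumption based on the XLS wording used above. The enum member names themselves are taken from the snippets on this page.

```python
from os import path

# Output extension -> name of the ExcelSaveOptions.ExcelFormat member used on this page.
FORMAT_BY_EXTENSION = {
    ".xls": "XML_SPREAD_SHEET2003",  # assumption: XML Spreadsheet 2003 saved as .xls
    ".xlsx": "XLSX",
    ".xlsm": "XLSM",
    ".csv": "CSV",
    ".ods": "ODS",
}


def format_name_for(outfile):
    """Return the ExcelFormat member name that matches the output extension."""
    extension = path.splitext(outfile)[1].lower()
    try:
        return FORMAT_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"Unsupported output extension: {extension!r}") from None


def convert_pdf(path_infile, path_outfile):
    """Convert a PDF to the spreadsheet format implied by path_outfile."""
    # Imported here so format_name_for can be used without the library installed.
    import aspose.pdf as apdf

    excel_format = getattr(
        apdf.ExcelSaveOptions.ExcelFormat, format_name_for(path_outfile)
    )
    document = apdf.Document(path_infile)
    save_options = apdf.ExcelSaveOptions()
    save_options.format = excel_format
    document.save(path_outfile, save_options)
```

Raising `ValueError` for an unknown extension fails early, before the PDF is loaded.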