Manipulate Tables in existing PDF

Working with tables is one of the earliest features supported by Aspose.PDF for Python via .NET. The library lets you add tables to PDF files, whether they are generated from scratch or already exist. This release adds a feature for searching and parsing simple tables that already exist on a page of a PDF document. The TableAbsorber class provides this capability, and its usage is very similar to that of the existing TextFragmentAbsorber class. The following code snippet shows the steps to update the contents of a particular table cell.


    import aspose.pdf as ap

    # Load existing PDF file
    pdf_document = ap.Document(input_file)
    # Create TableAbsorber object to find tables
    absorber = ap.text.TableAbsorber()
    # Visit first page with absorber
    absorber.visit(pdf_document.pages[1])
    # Access the first table on the page, its first cell, and the first text fragment in that cell
    fragment = absorber.table_list[0].row_list[0].cell_list[0].text_fragments[1]
    # Change the text of that text fragment
    fragment.text = "hi world"
    # Save the updated document
    pdf_document.save(output_file)
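
The same attribute chain (table_list, row_list, cell_list, text_fragments) can be walked in a loop to read every cell rather than a single one. The following is a minimal sketch: collect_cell_texts is a hypothetical helper, not part of Aspose.PDF, and it assumes only the attributes used in the snippet above, plus that each collection can be iterated with a for loop.

```python
def collect_cell_texts(absorber):
    """Return a nested list [table][row][cell] with the text found in each cell.

    `absorber` is assumed to be a TableAbsorber that has already visited a page.
    Only the attributes used in the snippet above are accessed.
    """
    tables = []
    for table in absorber.table_list:
        rows = []
        for row in table.row_list:
            cells = []
            for cell in row.cell_list:
                # A cell may hold several text fragments; join them into one string
                cells.append("".join(fragment.text for fragment in cell.text_fragments))
            rows.append(cells)
        tables.append(rows)
    return tables
```

After absorber.visit(pdf_document.pages[1]), calling collect_cell_texts(absorber) would return one nested list per table found on the page.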

Replace old Table with a new one in PDF document

To find a particular table and replace it with a new one, use the replace() method of the TableAbsorber class. The following example demonstrates how to replace a table inside a PDF document:


    import aspose.pdf as ap

    # Load existing PDF document
    pdf_document = ap.Document(input_file)
    # Create TableAbsorber object to find tables
    absorber = ap.text.TableAbsorber()
    # Visit first page with absorber
    absorber.visit(pdf_document.pages[1])
    # Get first table on the page
    table = absorber.table_list[0]
    # Create new table
    new_table = ap.Table()
    new_table.column_widths = "100 100 100"
    new_table.default_cell_border = ap.BorderInfo(ap.BorderSide.ALL, 1)

    row = new_table.rows.add()
    row.cells.add("Col 1")
    row.cells.add("Col 2")
    row.cells.add("Col 3")

    # Replace the table with new one
    absorber.replace(pdf_document.pages[1], table, new_table)
    # Save document
    pdf_document.save(output_file)
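
When the replacement table has more than a few cells, the repeated rows.add() and cells.add() calls can be driven from plain Python data. The sketch below assumes only those two calls, as used in the example above. The name add_text_rows is a hypothetical helper, not an Aspose.PDF function.

```python
def add_text_rows(table, rows):
    """Append one table row per item in `rows`, with one text cell per value.

    `table` is assumed to expose `rows.add()` returning a row object,
    and each row is assumed to expose `cells.add(text)`.
    """
    for values in rows:
        row = table.rows.add()
        for value in values:
            # Convert to str so numbers can be passed directly
            row.cells.add(str(value))
    return table
```

With the example above, add_text_rows(new_table, [["Col 1", "Col 2", "Col 3"]]) would stand in for the three row.cells.add() calls.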

How to determine if a table will break on the current page

This code generates a PDF document containing a table, calculates the space available on the page, and checks if adding more rows to the table will lead to a page break based on space constraints. The result is saved to an output file.


    import aspose.pdf as ap

    # Instantiate a Document object
    pdf = ap.Document()
    # Add a page to the document's pages collection
    page = pdf.pages.add()
    # Instantiate a table object
    table1 = ap.Table()
    table1.margin.top = 300
    # Add the table to the paragraphs collection of the page
    page.paragraphs.add(table1)
    # Set the column widths of the table
    table1.column_widths = "100 100 100"
    # Set default cell border using BorderInfo object
    table1.default_cell_border = ap.BorderInfo(ap.BorderSide.ALL, 0.1)
    # Set table border using another customized BorderInfo object
    table1.border = ap.BorderInfo(ap.BorderSide.ALL, 1)
    # Create MarginInfo object and set its left, bottom, right and top margins
    margin = ap.MarginInfo()
    margin.top = 5
    margin.left = 5
    margin.right = 5
    margin.bottom = 5
    # Set the default cell padding to the MarginInfo object
    table1.default_cell_padding = margin
    # If you increase the counter to 17, the table will break
    # because it can no longer fit on this page
    for row_counter in range(0, 17):
        # Create rows in the table and then cells in the rows
        row1 = table1.rows.add()
        row1.cells.add("col " + str(row_counter) + ", 1")
        row1.cells.add("col " + str(row_counter) + ", 2")
        row1.cells.add("col " + str(row_counter) + ", 3")
    # Get the Page Height information
    page_height = pdf.page_info.height
    # Get the total height information of Page Top & Bottom margin,
    # Table Top margin and table height.
    total_objects_height = page.page_info.margin.top + page.page_info.margin.bottom + table1.margin.top + \
                           table1.get_height(None)
    # Display Page Height, Table Height, table Top margin and Page Top
    # And Bottom margin information
    print("PDF document Height = " + str(pdf.page_info.height) + "\nTop Margin Info = " + str(page.page_info.margin.top)
          + "\nBottom Margin Info = " + str(page.page_info.margin.bottom) + "\n\nTable-Top Margin Info = "
          + str(table1.margin.top) + "\nAverage Row Height = " + str(table1.rows[0].min_row_height) + " \nTable height "
          + str(table1.get_height(None)) + "\n ----------------------------------------" + "\nTotal Page Height ="
          + str(page_height) + "\nCumulative height including Table =" + str(total_objects_height))
    # Subtract the page top and bottom margins, the table top margin and the
    # table height from the page height. If 10 or less remains (an average
    # row can be taller than 10), another row will not fit.
    if (page_height - total_objects_height) <= 10:
        # Adding a new row would break the table. The exact threshold
        # depends on the row height.
        print("Page Height - Objects Height <= 10, so table will break")
    # Save the pdf document
    pdf.save(output_file)
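
The check itself is plain arithmetic and can be separated from the Aspose.PDF calls. The following sketch restates it as two small functions. The function names and the sample numbers are illustrative assumptions, not values returned by the library.

```python
def remaining_page_space(page_height, margin_top, margin_bottom,
                         table_top_margin, table_height):
    """Return the vertical space left on the page below the table."""
    used = margin_top + margin_bottom + table_top_margin + table_height
    return page_height - used


def table_will_break(remaining_space, threshold=10):
    """Return True when the space left is too small for another row.

    The default threshold of 10 mirrors the example above; a taller
    row would need a larger value.
    """
    return remaining_space <= threshold


# Made-up sample values for illustration only
space = remaining_page_space(page_height=842, margin_top=72, margin_bottom=72,
                             table_top_margin=300, table_height=390)
print(space, table_will_break(space))
```

In the example above, the arguments would come from pdf.page_info.height, page.page_info.margin, table1.margin.top and table1.get_height(None).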

Add Repeating Column in Table

In the Aspose.Pdf.Table class, you can set repeating_rows_count to repeat rows when a table is too long vertically and overflows to the next page. In some cases, however, a table is too wide to fit on a single page and needs to continue on the next page. For this purpose, the Aspose.Pdf.Table class provides the repeating_columns_count property. Setting this property causes the table to break to the next page column-wise and repeats the given number of columns at the start of the next page. The following code snippet shows the usage of the repeating_columns_count property:


    import aspose.pdf as ap

    # Create a new document
    doc = ap.Document()
    page = doc.pages.add()
    # Instantiate an outer table that takes up the entire page
    outer_table = ap.Table()
    outer_table.column_widths = "100%"
    outer_table.horizontal_alignment = ap.HorizontalAlignment.LEFT
    # Instantiate a table that will be nested inside outer_table and will break inside the same page
    my_table = ap.Table()
    my_table.broken = ap.TableBroken.VERTICAL_IN_SAME_PAGE
    my_table.column_adjustment = ap.ColumnAdjustment.AUTO_FIT_TO_CONTENT
    # Add outer_table to the page paragraphs,
    # then add my_table to a cell of outer_table
    page.paragraphs.add(outer_table)
    body_row = outer_table.rows.add()
    body_cell = body_row.cells.add()
    body_cell.paragraphs.add(my_table)
    my_table.repeating_columns_count = 5
    page.paragraphs.add(my_table)
    # Add header Row
    row = my_table.rows.add()
    row.cells.add("header 1")
    row.cells.add("header 2")
    row.cells.add("header 3")
    row.cells.add("header 4")
    row.cells.add("header 5")
    row.cells.add("header 6")
    row.cells.add("header 7")
    row.cells.add("header 11")
    row.cells.add("header 12")
    row.cells.add("header 13")
    row.cells.add("header 14")
    row.cells.add("header 15")
    row.cells.add("header 16")
    row.cells.add("header 17")
    for row_counter in range(0, 6):
        # Create rows in the table and then cells in the rows
        row1 = my_table.rows.add()
        row1.cells.add("col " + str(row_counter) + ", 1")
        row1.cells.add("col " + str(row_counter) + ", 2")
        row1.cells.add("col " + str(row_counter) + ", 3")
        row1.cells.add("col " + str(row_counter) + ", 4")
        row1.cells.add("col " + str(row_counter) + ", 5")
        row1.cells.add("col " + str(row_counter) + ", 6")
        row1.cells.add("col " + str(row_counter) + ", 7")
        row1.cells.add("col " + str(row_counter) + ", 11")
        row1.cells.add("col " + str(row_counter) + ", 12")
        row1.cells.add("col " + str(row_counter) + ", 13")
        row1.cells.add("col " + str(row_counter) + ", 14")
        row1.cells.add("col " + str(row_counter) + ", 15")
        row1.cells.add("col " + str(row_counter) + ", 16")
        row1.cells.add("col " + str(row_counter) + ", 17")
    doc.save(output_file)
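
To picture what repeating columns means, the sketch below models the idea in plain Python. It is a conceptual illustration only, not the layout algorithm used by Aspose.PDF: the function split_columns and the columns_per_page parameter are hypothetical, and the real break points depend on column widths and page size.

```python
def split_columns(columns, columns_per_page, repeating_count):
    """Group columns into pages, repeating the first columns on each new page.

    The first page holds the first `columns_per_page` columns. Every later
    page starts with the first `repeating_count` columns again, followed by
    the next columns that have not been shown yet.
    """
    if columns_per_page <= repeating_count:
        raise ValueError("columns_per_page must be greater than repeating_count")
    repeated = columns[:repeating_count]
    pages = [columns[:columns_per_page]]
    position = columns_per_page
    step = columns_per_page - repeating_count
    while position < len(columns):
        pages.append(repeated + columns[position:position + step])
        position += step
    return pages


# 14 columns as in the example above, assuming 8 columns fit on one page
for page_columns in split_columns(list(range(1, 15)), 8, 5):
    print(page_columns)
```

Under the assumed page capacity of 8 columns, every continuation page begins with columns 1 to 5 and then shows the next three columns.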
