Extract Tables from PDF in Java
Use TableAbsorber when you need to detect table structures in an existing PDF and read their content.
Extract text from detected tables
Use this example when you need to locate tables on each page and collect their cell text.
- Open the source PDF Document.
- Visit each page with TableAbsorber.
- Iterate through absorbed tables, rows, and cells, then output the extracted text.
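import com.aspose.pdf.AbsorbedCell;
import com.aspose.pdf.AbsorbedRow;
import com.aspose.pdf.AbsorbedTable;
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import com.aspose.pdf.TableAbsorber;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextSegment;
import java.nio.file.Path;

public static void extract(Path inputFile) {
    // Open the source PDF; try-with-resources closes the document when done.
    try (Document document = new Document(inputFile.toString())) {
        for (Page page : document.getPages()) {
            // Use a fresh absorber per page so each table list covers one page only.
            TableAbsorber absorber = new TableAbsorber();
            absorber.visit(page);
            for (AbsorbedTable table : absorber.getTableList()) {
                System.out.println("Table ----");
                for (AbsorbedRow row : table.getRowList()) {
                    System.out.println("Row:");
                    StringBuilder rowText = new StringBuilder();
                    for (AbsorbedCell cell : row.getCellList()) {
                        // A cell's text is split into fragments, and fragments into segments.
                        StringBuilder cellText = new StringBuilder();
                        for (TextFragment fragment : cell.getTextFragments()) {
                            for (TextSegment segment : fragment.getSegments()) {
                                cellText.append(segment.getText());
                            }
                        }
                        // Each cell is written as " | " followed by its text.
                        rowText.append(" | ").append(cellText);
                    }
                    System.out.println(rowText);
                }
            }
        }
    }
}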
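Once the cell text has been collected, you may want to format each row for display or export. The following is a minimal sketch that does not use the Aspose.PDF API at all: it assumes the text of each cell in a row has already been gathered into a List<String>, and the class and method names (RowFormatter, joinCells, toCsvLine) are hypothetical helpers introduced here for illustration only.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Aspose.PDF: formats already-extracted cell text.
final class RowFormatter {

    // Joins trimmed cell texts with " | " between cells and no leading separator.
    static String joinCells(List<String> cells) {
        return cells.stream()
                .map(c -> c == null ? "" : c.trim())
                .collect(Collectors.joining(" | "));
    }

    // Builds one CSV line from the cells of a row.
    static String toCsvLine(List<String> cells) {
        return cells.stream()
                .map(RowFormatter::escapeCsv)
                .collect(Collectors.joining(","));
    }

    // Quotes a field that contains a comma, a double quote, or a line break,
    // and doubles any embedded double quotes.
    private static String escapeCsv(String value) {
        String v = value == null ? "" : value.trim();
        if (v.contains(",") || v.contains("\"") || v.contains("\n") || v.contains("\r")) {
            return "\"" + v.replace("\"", "\"\"") + "\"";
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> cells = List.of("Item", "Qty, boxes", "Price");
        System.out.println(joinCells(cells));
        System.out.println(toCsvLine(cells));
    }
}
```

To use a helper like this with the extraction loop, collect each cell's text into a list for the current row instead of appending it to a single StringBuilder, then pass that list to the formatter.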