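Extracting tables from a PDF is not a trivial task, because tables can be created in many different ways.
Aspose.PDF for .NET provides a tool that makes retrieving tables easy. To extract table data, perform the following steps:
1. Open the PDF by creating a Document object.
2. Create a TableAbsorber object.
3. Call Visit on each page you want to analyze; the absorber's TableList then holds the tables found on that page.
4. TableList is a list of AbsorbedTable objects. To get the data, iterate over TableList and process each table's RowList and each row's CellList.
The following code snippet also works with the Aspose.PDF.Drawing library.
The following example shows how to extract tables from all pages: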
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
// Requires: using System; using System.Text;
private static void ExtractTable()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Tables();
    // Open PDF document
    using (var document = new Aspose.Pdf.Document(dataDir + "input.pdf"))
    {
        foreach (var page in document.Pages)
        {
            // Collect the tables found on this page
            var absorber = new Aspose.Pdf.Text.TableAbsorber();
            absorber.Visit(page);
            foreach (var table in absorber.TableList)
            {
                Console.WriteLine("Table");
                foreach (var row in table.RowList)
                {
                    foreach (var cell in row.CellList)
                    {
                        foreach (var fragment in cell.TextFragments)
                        {
                            // Join the fragment's segments into one string
                            var sb = new StringBuilder();
                            foreach (var seg in fragment.Segments)
                            {
                                sb.Append(seg.Text);
                            }
                            Console.Write($"{sb}|");
                        }
                    }
                    // End of row
                    Console.WriteLine();
                }
            }
        }
    }
}
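Each absorbed table has a Rectangle property that describes the table's position on the page.
To extract only the tables located in a particular region, compare that rectangle against the coordinates of the region.
The following code snippet also works with the Aspose.PDF.Drawing library.
The following example shows how to extract a table marked with a square annotation: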
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
// Requires: using System; using System.Linq; using System.Text;
private static void ExtractMarkedTable()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Tables();
    // Open PDF document
    using (var document = new Aspose.Pdf.Document(dataDir + "input.pdf"))
    {
        var page = document.Pages[1];
        // Find the first square annotation; it marks the region of interest
        var squareAnnotation =
            page.Annotations.FirstOrDefault(ann => ann.AnnotationType == Aspose.Pdf.Annotations.AnnotationType.Square)
            as Aspose.Pdf.Annotations.SquareAnnotation;
        if (squareAnnotation == null)
        {
            Console.WriteLine("No square annotation found on the first page.");
            return;
        }
        var absorber = new Aspose.Pdf.Text.TableAbsorber();
        absorber.Visit(page);
        foreach (var table in absorber.TableList)
        {
            // Keep only tables that lie strictly inside the annotation rectangle
            var isInRegion = (squareAnnotation.Rect.LLX < table.Rectangle.LLX) &&
                             (squareAnnotation.Rect.LLY < table.Rectangle.LLY) &&
                             (squareAnnotation.Rect.URX > table.Rectangle.URX) &&
                             (squareAnnotation.Rect.URY > table.Rectangle.URY);
            if (isInRegion)
            {
                foreach (var row in table.RowList)
                {
                    foreach (var cell in row.CellList)
                    {
                        foreach (var fragment in cell.TextFragments)
                        {
                            var sb = new StringBuilder();
                            foreach (var seg in fragment.Segments)
                            {
                                sb.Append(seg.Text);
                            }
                            var text = sb.ToString();
                            Console.Write($"{text}|");
                        }
                    }
                    Console.WriteLine();
                }
            }
        }
    }
}
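The region test in the example above can be isolated into a small helper, which makes the comparison easier to read and to unit-test. The following is a minimal sketch under stated assumptions: Box is a hypothetical stand-in type that only mirrors the LLX/LLY/URX/URY coordinate names, and StrictlyContains is an illustrative name, not part of the Aspose.PDF API.

```csharp
// Hypothetical stand-in for a page rectangle. It mirrors the LLX/LLY/URX/URY
// coordinate names used in the example above but is NOT an Aspose type.
public readonly struct Box
{
    public double LLX { get; }  // lower-left x
    public double LLY { get; }  // lower-left y
    public double URX { get; }  // upper-right x
    public double URY { get; }  // upper-right y

    public Box(double llx, double lly, double urx, double ury)
    {
        LLX = llx;
        LLY = lly;
        URX = urx;
        URY = ury;
    }

    // True when 'inner' lies strictly inside this box on all four sides,
    // which is the same comparison the example applies to each table.
    public bool StrictlyContains(Box inner) =>
        LLX < inner.LLX && LLY < inner.LLY &&
        URX > inner.URX && URY > inner.URY;
}
```

For instance, new Box(50, 400, 550, 700).StrictlyContains(new Box(72, 450, 500, 650)) is true, while the same call with new Box(72, 100, 500, 300) is false because that box starts below the outer one. The comparison is strict, so a table whose edge coincides exactly with the annotation border is rejected; switching to <= and >= would accept it.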
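The following example shows how to extract tables and store them as a CSV file. To see how to convert a PDF to an Excel spreadsheet, refer to the Convert PDF to Excel article.
The following code snippet also works with the Aspose.PDF.Drawing library.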
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractTableSaveExcel()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Tables();
    // Open PDF document
    using (var document = new Aspose.Pdf.Document(dataDir + "input.pdf"))
    {
        // Instantiate ExcelSaveOptions with CSV as the output format
        var excelSave = new Aspose.Pdf.ExcelSaveOptions
        {
            Format = Aspose.Pdf.ExcelSaveOptions.ExcelFormat.CSV
        };
        // Save the output in CSV format
        document.Save(dataDir + "ExtractTableSaveCSV_out.csv", excelSave);
    }
}
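The export above converts the whole document. If only selected absorbed tables should end up in a CSV file, one option is to build the lines yourself from the cell text collected as in the earlier examples. The sketch below shows just the escaping and joining step; CsvSketch, EscapeField and ToCsvLine are hypothetical names, the code makes no Aspose calls, and the quoting rules are assumed to follow the common RFC 4180 convention.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvSketch
{
    // Quote a field when it contains a comma, a double quote, or a line break,
    // doubling any embedded quotes (the convention described in RFC 4180).
    public static string EscapeField(string field)
    {
        if (field == null)
        {
            return string.Empty;
        }
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;
    }

    // Join the cell texts of one table row into a single CSV line.
    public static string ToCsvLine(IEnumerable<string> cells) =>
        string.Join(",", cells.Select(EscapeField));
}
```

A row whose cells are Name, Smith, J. and 5" pipe would become Name,"Smith, J.","5"" pipe". Each such line could then be written with System.IO.File.AppendAllLines or a StreamWriter.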