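This article explains in detail how to extract text from a PDF file. All of the extraction features live in the PdfExtractor class, and the examples here show how to use them in code.
The PdfExtractor class offers three kinds of extraction: text, images, and attachments. For each kind, it provides several methods that work together to produce the final output.
For example, text extraction uses four methods: ExtractText, GetText, HasNextPageText, and GetNextPageText. Call ExtractText first; it extracts the text from the PDF file and holds it in memory. GetText then saves the extracted text to a file at the specified location. HasNextPageText lets you loop over the pages and check whether the next page has text. If it does, GetNextPageText saves that single page's text to a file.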
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractText()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
    // true: save all text to one file; false: save one file per page
    bool wholeText = true;
    // Create an object of the PdfExtractor class
    using (var pdfExtractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        pdfExtractor.BindPdf(dataDir + "sample.pdf");
        // Extract the text and keep it in memory
        pdfExtractor.ExtractText();
        if (wholeText)
        {
            // Save the text of the whole document to a single file
            pdfExtractor.GetText(dataDir + "sample.txt");
        }
        else
        {
            // Extract the text into separate files, one per page
            int pageNumber = 1;
            while (pdfExtractor.HasNextPageText())
            {
                pdfExtractor.GetNextPageText($"{dataDir}sample{pageNumber:D3}.txt");
                pageNumber++;
            }
        }
    }
}
To set the text extraction mode, use the following code:
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractTextExtractonMode()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
    // true: save all text to one file; false: save one file per page
    bool wholeText = true;
    // Create an object of the PdfExtractor class
    using (var pdfExtractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        pdfExtractor.BindPdf(dataDir + "ExtractTextExtractonMode.pdf");
        // Choose the extraction mode before calling ExtractText
        // pdfExtractor.ExtractTextMode = 0; // pure mode
        pdfExtractor.ExtractTextMode = 1; // raw mode
        pdfExtractor.ExtractText();
        if (wholeText)
        {
            // Save the text of the whole document to a single file
            pdfExtractor.GetText(dataDir + "ExtractTextExtractonMode_out.txt");
        }
        else
        {
            // Extract the text into separate files, one per page
            int pageNumber = 1;
            while (pdfExtractor.HasNextPageText())
            {
                pdfExtractor.GetNextPageText($"{dataDir}sample{pageNumber:D3}.txt");
                pageNumber++;
            }
        }
    }
}
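If the extracted text is needed in memory rather than on disk, the same ExtractText and GetText sequence can write to a stream. The following is a minimal sketch, not taken from the official examples: the class and helper names (ExtractTextToStringSketch, ExtractTextToString) are hypothetical, and it assumes that GetText accepts a Stream and that the output is encoded as UTF-16, which should be checked against the API reference for your version.

```csharp
using System.IO;
using System.Text;

internal static class ExtractTextToStringSketch
{
    // Hypothetical helper (not part of Aspose.PDF): returns the text of a PDF as a string.
    public static string ExtractTextToString(string pdfPath)
    {
        using (var pdfExtractor = new Aspose.Pdf.Facades.PdfExtractor())
        using (var buffer = new MemoryStream())
        {
            // Bind PDF document
            pdfExtractor.BindPdf(pdfPath);
            // Extract the text and keep it in memory
            pdfExtractor.ExtractText();
            // Write the extracted text to the in-memory buffer instead of a file
            pdfExtractor.GetText(buffer);
            // Assumption: the extractor writes UTF-16; change the decoder if it is configured otherwise
            return Encoding.Unicode.GetString(buffer.ToArray());
        }
    }
}
```

Calling ExtractTextToStringSketch.ExtractTextToString(dataDir + "sample.pdf") would then return the document text for further processing, such as searching or indexing, without creating an intermediate file.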