Extract Text from PDF File
Extract Text from the Whole PDF File (facades)
The PdfExtractor class lets you extract text from a whole PDF file. Create a PdfExtractor object and bind the input PDF file to it with the bindPdf method. The extractText method then extracts all the text into memory; to retrieve that text, call getText, which in the snippet below writes it to a file. To save each page separately instead, loop with hasNextPageText and getNextPageText. The following code snippet shows how to extract text from the whole PDF file, either into a single file or page by page.
// _dataDir is a field defined elsewhere; it is assumed to end with a path separator
public static void ExtractText(boolean wholeText)
{
    // Create an object of the PdfExtractor class
    PdfExtractor pdfExtractor = new PdfExtractor();
    // Bind the input PDF
    pdfExtractor.bindPdf(_dataDir + "sample.pdf");
    // Extract all the text into memory
    pdfExtractor.extractText();
    if (wholeText)
    {
        // Save the text of the whole document to a single file
        pdfExtractor.getText(_dataDir + "sample.txt");
    }
    else
    {
        // Save the text of each page to a separate file
        int pageNumber = 1;
        while (pdfExtractor.hasNextPageText())
        {
            pdfExtractor.getNextPageText(_dataDir + "sample" + pageNumber + ".txt");
            pageNumber++;
        }
    }
}
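
The page-by-page branch above builds each output path by concatenating strings, which only works if _dataDir has the expected trailing separator. As a minimal sketch of one way to avoid that dependency, the per-page path can be built with java.nio.file instead. The class name PageFileNames and the helper pageTextPath are hypothetical names introduced here for illustration; they are not part of Aspose.PDF.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Aspose.PDF): builds the output path
// for one page's text file without relying on a trailing separator.
public class PageFileNames {

    // dataDir may or may not end with a separator; resolve() handles both.
    static Path pageTextPath(String dataDir, String baseName, int pageNumber) {
        return Paths.get(dataDir).resolve(baseName + pageNumber + ".txt");
    }

    public static void main(String[] args) {
        // Print the paths that a three-page document would be written to.
        for (int page = 1; page <= 3; page++) {
            System.out.println(pageTextPath("data", "sample", page));
        }
    }
}
```

Inside the loop, the resulting path could then be passed to getNextPageText as a string, for example with pageTextPath(_dataDir, "sample", pageNumber).toString().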