Extract Text from PDF File

Extract Text from the Whole PDF File (facades)

The PdfExtractor class lets you extract text from a whole PDF file. Create a PdfExtractor object and bind the input PDF file with the bindPdf method. The extractText method extracts all the text into memory; to retrieve it, call the getText method, which saves the text to a file. To save the text of each page to a separate file instead, loop over the pages with the hasNextPageText and getNextPageText methods. The following code snippet shows how to extract text from the whole PDF file.


    public static void extractText(boolean wholeText)
    {
        // Create an object of the PdfExtractor class
        PdfExtractor pdfExtractor = new PdfExtractor();

        // Bind the input PDF (_dataDir is expected to end with a path separator)
        pdfExtractor.bindPdf(_dataDir + "sample.pdf");

        // Extract all the text into memory
        pdfExtractor.extractText();

        if (wholeText)
        {
            // Save the text of the whole document into a single file
            pdfExtractor.getText(_dataDir + "sample.txt");
        }
        else
        {
            // Save the text of each page into a separate file
            int pageNumber = 1;
            while (pdfExtractor.hasNextPageText())
            {
                pdfExtractor.getNextPageText(_dataDir + "sample" + pageNumber + ".txt");
                pageNumber++;
            }
        }
    }
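
The per-page branch above builds output file names by string concatenation, which only works if the data directory string ends with a path separator. As a minimal sketch, the file names could instead be built with the standard java.nio.file API. The class PageOutputPaths and its method forPage are hypothetical helpers introduced here for illustration; they are not part of the PdfExtractor API.

```java
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of the PdfExtractor API.
final class PageOutputPaths {

    private PageOutputPaths() {
    }

    // Builds "<dataDir>/<baseName><pageNumber>.txt" using the platform's
    // path separator, whether or not dataDir ends with a separator.
    static String forPage(String dataDir, String baseName, int pageNumber) {
        if (pageNumber < 1) {
            throw new IllegalArgumentException("pageNumber must be 1 or greater");
        }
        return Paths.get(dataDir, baseName + pageNumber + ".txt").toString();
    }

    public static void main(String[] args) {
        // Print the output paths for the first three pages
        for (int page = 1; page <= 3; page++) {
            System.out.println(forPage("data", "sample", page));
        }
    }
}
```

Under this assumption, the loop body could pass PageOutputPaths.forPage(_dataDir, "sample", pageNumber) to getNextPageText instead of a concatenated string.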