Extract Images using PdfExtractor

Extract Images from the Whole PDF to Files (Facades)

The PdfExtractor class extracts images from a PDF file. Create a PdfExtractor object and bind the input PDF with the BindPdf method, then call ExtractImage to extract all the images into memory. Retrieve the extracted images one at a time in a while loop: HasNextImage reports whether any images remain, and GetNextImage retrieves the next one. To save an image to disk, call the GetNextImage overload that takes a file path. The following code snippet shows how to extract images from the whole PDF to files.

// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractImagesWholePDF()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();

    // Open PDF document
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "sample_cats_dogs.pdf");

        // Extract all the images
        extractor.ExtractImage();

        // Get all the extracted images
        while (extractor.HasNextImage())
        {
            extractor.GetNextImage(dataDir + DateTime.Now.Ticks.ToString() + "_out.jpg");
        }
    }
}
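
The snippet above names each file after DateTime.Now.Ticks. On systems where the clock advances in coarse steps, two loop iterations could in principle receive the same value and overwrite each other's output. A minimal sketch of one alternative, a running counter, follows. OutputFileNamer is a hypothetical helper written for this illustration; it is not part of Aspose.PDF.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Aspose.PDF): builds output paths from a
// running counter so that no two images share a file name.
public sealed class OutputFileNamer
{
    private readonly string _directory;
    private readonly string _prefix;
    private readonly string _extension;
    private int _counter;

    public OutputFileNamer(string directory, string prefix, string extension)
    {
        _directory = directory;
        _prefix = prefix;
        _extension = extension;
    }

    // Returns "<directory>/<prefix>_0001<extension>" on the first call,
    // "<prefix>_0002<extension>" on the second, and so on.
    public string Next()
    {
        _counter++;
        string fileName = string.Format("{0}_{1:D4}{2}", _prefix, _counter, _extension);
        return Path.Combine(_directory, fileName);
    }
}
```

With a namer created once before the loop, the call inside the loop would read extractor.GetNextImage(namer.Next()).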

Extract Images from the Whole PDF to Streams (Facades)

The PdfExtractor class can also extract images from a PDF file into streams. Create a PdfExtractor object and bind the input PDF with the BindPdf method, then call ExtractImage to extract all the images into memory. Retrieve the extracted images in a while loop using the HasNextImage and GetNextImage methods. To write an image to a stream, call the GetNextImage overload that takes a Stream. The following code snippet shows how to extract images from the whole PDF to streams.

// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractImagesWholePDFStreams()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();

    // Open PDF document
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "sample_cats_dogs.pdf");

        // Extract images
        extractor.ExtractImage();
        // Get all the extracted images
        while (extractor.HasNextImage())
        {
            // Read image into memory stream
            MemoryStream memoryStream = new MemoryStream();
            extractor.GetNextImage(memoryStream);

            // Write to disk, if you like, or use it otherwise
            using (FileStream fileStream = new FileStream(dataDir + DateTime.Now.Ticks.ToString() + "_out.jpg", FileMode.Create))
            {
                memoryStream.WriteTo(fileStream);
            }
        }
    }
}
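
The snippet above gives every output file a .jpg extension. If you would rather derive the extension from the data held in the MemoryStream, the leading signature bytes of common image formats can be checked. The sketch below does that for JPEG, PNG, GIF and BMP. ImageSniffer is a hypothetical helper, not part of Aspose.PDF, and the sketch makes no assumption about which format GetNextImage writes.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Aspose.PDF).
public static class ImageSniffer
{
    // Guesses a file extension from the leading signature bytes of the image data.
    // Falls back to ".bin" when the signature is not recognized.
    public static string GuessExtension(MemoryStream imageStream)
    {
        // ToArray copies the whole buffer regardless of the stream's Position.
        byte[] data = imageStream.ToArray();

        if (StartsWith(data, 0xFF, 0xD8, 0xFF)) return ".jpg";
        if (StartsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return ".png";
        if (StartsWith(data, 0x47, 0x49, 0x46, 0x38)) return ".gif";
        if (StartsWith(data, 0x42, 0x4D)) return ".bmp";
        return ".bin";
    }

    // Returns true when data begins with the given byte sequence.
    private static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

In the loop, the result of ImageSniffer.GuessExtension(memoryStream) could replace the hard-coded extension when the FileStream path is built.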

Extract Images from a Particular Page of a PDF (Facades)

You can extract images from a particular page of a PDF file by setting the StartPage and EndPage properties to that page's number. Create a PdfExtractor object and bind the input PDF with the BindPdf method, set StartPage and EndPage, then call ExtractImage to extract the images into memory. Retrieve the extracted images in a while loop using the HasNextImage and GetNextImage methods. You can save the images either to disk or to a stream by calling the appropriate GetNextImage overload. The following code snippet shows how to extract images from a particular page of a PDF to streams.

// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractImagesParticularPage()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();

    // Open PDF document
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "sample_cats_dogs.pdf");

        // Set StartPage and EndPage to the number of the page
        // you want to extract images from
        extractor.StartPage = 2;
        extractor.EndPage = 2;

        // Extract images
        extractor.ExtractImage();
        // Get extracted images
        while (extractor.HasNextImage())
        {
            // Read image into memory stream
            MemoryStream memoryStream = new MemoryStream();
            extractor.GetNextImage(memoryStream);

            // Write to disk, if you like, or use it otherwise
            using (FileStream fileStream = new FileStream(dataDir + DateTime.Now.Ticks.ToString() + "_out.jpg", FileMode.Create))
            {
                memoryStream.WriteTo(fileStream);
            }
        }
    }
}

Extract Images from a Range of Pages of a PDF (Facades)

You can extract images from a range of pages of a PDF file by setting the StartPage and EndPage properties to the first and last page of the range. Create a PdfExtractor object and bind the input PDF with the BindPdf method, set StartPage and EndPage, then call ExtractImage to extract the images into memory. Retrieve the extracted images in a while loop using the HasNextImage and GetNextImage methods. You can save the images either to disk or to a stream by calling the appropriate GetNextImage overload. The following code snippet shows how to extract images from a range of pages of a PDF to streams.

// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractImagesRangePages()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();

    // Open input PDF
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "sample_cats_dogs.pdf");

        // Set StartPage and EndPage to the first and last page of the range
        // you want to extract images from (here, pages 1 through 2)
        extractor.StartPage = 1;
        extractor.EndPage = 2;

        // Extract images
        extractor.ExtractImage();

        // Get extracted images
        while (extractor.HasNextImage())
        {
            // Read image into memory stream
            MemoryStream memoryStream = new MemoryStream();
            extractor.GetNextImage(memoryStream);

            // Write to disk, if you like, or use it otherwise
            using (FileStream fileStream = new FileStream(dataDir + DateTime.Now.Ticks.ToString() + "_out.jpg", FileMode.Create))
            {
                memoryStream.WriteTo(fileStream);
            }
        }
    }
}
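
Because StartPage and EndPage are set to integer page numbers, a small guard can catch an inverted or out-of-bounds range before extraction starts. The sketch below assumes 1-based page numbers and that the document's page count is already known from elsewhere. PageRange.Validate is a hypothetical helper, not an Aspose.PDF API.

```csharp
using System;

// Hypothetical helper (not part of Aspose.PDF).
public static class PageRange
{
    // Validates a 1-based, inclusive page range against a known page count.
    // Throws ArgumentOutOfRangeException when the range cannot be satisfied.
    public static void Validate(int startPage, int endPage, int pageCount)
    {
        if (startPage < 1 || startPage > pageCount)
        {
            throw new ArgumentOutOfRangeException(nameof(startPage));
        }
        if (endPage < startPage || endPage > pageCount)
        {
            throw new ArgumentOutOfRangeException(nameof(endPage));
        }
    }
}
```

Calling PageRange.Validate with the intended values before assigning StartPage and EndPage would fail fast on a mistyped range instead of silently extracting from the wrong pages.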

Extract Images using Image Extraction Mode (Facades)

The PdfExtractor class supports two image extraction modes: ActuallyUsed, which extracts only the images actually used in the PDF document, and DefinedInResources, which extracts the images defined in the resources of the PDF document (the default mode). Create a PdfExtractor object and bind the input PDF with the BindPdf method, then specify the mode through the PdfExtractor.ExtractImageMode property. Next, call ExtractImage to extract the images into memory according to the chosen mode. Retrieve the extracted images in a while loop using the HasNextImage and GetNextImage methods. To save the images to disk, call a GetNextImage overload that takes a file path.

The following code snippet shows how to extract images from a PDF file using the ExtractImageMode property.

// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractImagesImageExtractionMode()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();

    // Open PDF document
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "sample_cats_dogs.pdf");

        // Specify the image extraction mode. To extract only the images actually used, set
        // extractor.ExtractImageMode = Aspose.Pdf.ExtractImageMode.ActuallyUsed;
        extractor.ExtractImageMode = Aspose.Pdf.ExtractImageMode.DefinedInResources;

        // Extract Images based on Image Extraction Mode
        extractor.ExtractImage();

        // Get all the extracted images
        while (extractor.HasNextImage())
        {
            extractor.GetNextImage(dataDir + DateTime.Now.Ticks.ToString() + "_out.png", System.Drawing.Imaging.ImageFormat.Png);
        }
    }
}

To check whether a PDF contains text or images, use the following code snippet:

// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void CheckIfPdfContainsTextOrImages()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();

    // Create a MemoryStream to hold the text extracted from the document
    MemoryStream ms = new MemoryStream();
    // Instantiate PdfExtractor object
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "FilledForm.pdf");
        // Extract text from the input PDF document
        extractor.ExtractText();
        // Write the extracted text to the MemoryStream
        extractor.GetText(ms);

        // Treat the PDF as containing text if at least one byte was written
        bool containsText = ms.Length >= 1;

        // Extract images from the input PDF document
        extractor.ExtractImage();

        // HasNextImage returns true if at least one image was extracted
        bool containsImage = extractor.HasNextImage();

        // Report whether the PDF contains text, images, both, or neither
        if (containsText && !containsImage)
        {
            Console.WriteLine("PDF contains text only");
        }
        else if (!containsText && containsImage)
        {
            Console.WriteLine("PDF contains image only");
        }
        else if (containsText && containsImage)
        {
            Console.WriteLine("PDF contains both text and image");
        }
        else if (!containsText && !containsImage)
        {
            Console.WriteLine("PDF contains neither text nor image");
        }
    }
}