Extract Images using PdfExtractor
Extract Images from the Whole PDF to Files (Facades)
The PdfExtractor class lets you extract images from a PDF file. Create a PdfExtractor object and bind the input PDF with the BindPdf method, then call ExtractImage to extract all the images into memory. Retrieve the extracted images one at a time by looping with the HasNextImage and GetNextImage methods. To save each image to disk, call the GetNextImage overload that takes a file path as its argument. The following code snippet shows how to extract images from the whole PDF to files.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractImagesWholePDF()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();
    // Open PDF document
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "sample_cats_dogs.pdf");
        // Extract all the images
        extractor.ExtractImage();
        // Get all the extracted images
        while (extractor.HasNextImage())
        {
            extractor.GetNextImage(dataDir + DateTime.Now.Ticks.ToString() + "_out.jpg");
        }
    }
}
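File names built from DateTime.Now.Ticks can repeat when two images are written in quick succession, in which case a later image would silently overwrite an earlier one. One possible alternative is to number the files. The sketch below assumes a hypothetical helper named BuildImagePath; it is not part of Aspose.PDF and uses only System.IO.

```csharp
using System;
using System.IO;

public static class ImageFileNames
{
    // Hypothetical helper (not part of Aspose.PDF): builds paths such as
    // "<directory>/image_0007_out.jpg" from a 1-based image index.
    public static string BuildImagePath(string directory, int index, string extension = "jpg")
    {
        if (index < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(index), "The image index is 1-based.");
        }
        // D4 pads the index with leading zeros so the files sort in extraction order
        return Path.Combine(directory, $"image_{index:D4}_out.{extension}");
    }
}
```

Inside the while loop you would keep an int counter starting at 1 and pass ImageFileNames.BuildImagePath(dataDir, counter++) to GetNextImage instead of the timestamp-based name.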
Extract Images from the Whole PDF to Streams (Facades)
The PdfExtractor class can also extract images from a PDF file into streams. Create a PdfExtractor object and bind the input PDF with the BindPdf method, then call ExtractImage to extract all the images into memory. Retrieve the extracted images one at a time by looping with the HasNextImage and GetNextImage methods. To write each image to a stream, call the GetNextImage overload that takes a Stream as its argument. The following code snippet shows how to extract images from the whole PDF to streams.
private static void ExtractImagesWholePDFStreams()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();
    // Open PDF document
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "sample_cats_dogs.pdf");
        // Extract images
        extractor.ExtractImage();
        // Get all the extracted images
        while (extractor.HasNextImage())
        {
            // Read image into a memory stream that is disposed after each iteration
            using (MemoryStream memoryStream = new MemoryStream())
            {
                extractor.GetNextImage(memoryStream);
                // Write to disk, if you like, or use it otherwise
                using (FileStream fileStream = new FileStream(dataDir + DateTime.Now.Ticks.ToString() + "_out.jpg", FileMode.Create))
                {
                    memoryStream.WriteTo(fileStream);
                }
            }
        }
    }
}
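If you consume the MemoryStream directly instead of writing it to disk, keep in mind that a stream is positioned at its end right after it has been written to, so position-based readers such as Stream.CopyTo would see no data. The sketch below shows rewinding first. It assumes a hypothetical helper named CopyFromStart, which is not part of Aspose.PDF and uses only System.IO.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Hypothetical helper: rewinds a seekable stream and copies its whole
    // content to the destination stream.
    public static void CopyFromStart(Stream source, Stream destination)
    {
        if (!source.CanSeek)
        {
            throw new NotSupportedException("The source stream must be seekable.");
        }
        // After a write, the position sits at the end of the data; reset it
        source.Position = 0;
        source.CopyTo(destination);
    }
}
```

MemoryStream.WriteTo, used in the snippet above, always writes the full content regardless of the current position, so that code needs no rewind.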
Extract Images from a Particular Page of a PDF (Facades)
You can extract images from a particular page of a PDF file by setting both the StartPage and EndPage properties to that page number. Create a PdfExtractor object and bind the input PDF with the BindPdf method, set StartPage and EndPage, then call ExtractImage to extract the images into memory. Retrieve the extracted images by looping with the HasNextImage and GetNextImage methods. You can save the images either to disk or to a stream by calling the appropriate overload of GetNextImage. The following code snippet shows how to extract images from a particular page of a PDF to streams.
private static void ExtractImagesParticularPage()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();
    // Open PDF document
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "sample_cats_dogs.pdf");
        // Set both StartPage and EndPage to the number of the page
        // you want to extract images from
        extractor.StartPage = 2;
        extractor.EndPage = 2;
        // Extract images
        extractor.ExtractImage();
        // Get extracted images
        while (extractor.HasNextImage())
        {
            // Read image into a memory stream that is disposed after each iteration
            using (MemoryStream memoryStream = new MemoryStream())
            {
                extractor.GetNextImage(memoryStream);
                // Write to disk, if you like, or use it otherwise
                using (FileStream fileStream = new FileStream(dataDir + DateTime.Now.Ticks.ToString() + "_out.jpg", FileMode.Create))
                {
                    memoryStream.WriteTo(fileStream);
                }
            }
        }
    }
}
Extract Images from a Range of Pages of a PDF (Facades)
You can extract images from a range of pages of a PDF file by setting the StartPage and EndPage properties to the first and last page of the range. Create a PdfExtractor object and bind the input PDF with the BindPdf method, set StartPage and EndPage, then call ExtractImage to extract the images into memory. Retrieve the extracted images by looping with the HasNextImage and GetNextImage methods. You can save the images either to disk or to a stream by calling the appropriate overload of GetNextImage. The following code snippet shows how to extract images from a range of pages of a PDF to streams.
private static void ExtractImagesRangePages()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();
    // Open input PDF
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "sample_cats_dogs.pdf");
        // Set StartPage and EndPage to the first and last page of the range
        // you want to extract images from (here: pages 1 through 2)
        extractor.StartPage = 1;
        extractor.EndPage = 2;
        // Extract images
        extractor.ExtractImage();
        // Get extracted images
        while (extractor.HasNextImage())
        {
            // Read image into a memory stream that is disposed after each iteration
            using (MemoryStream memoryStream = new MemoryStream())
            {
                extractor.GetNextImage(memoryStream);
                // Write to disk, if you like, or use it otherwise
                using (FileStream fileStream = new FileStream(dataDir + DateTime.Now.Ticks.ToString() + "_out.jpg", FileMode.Create))
                {
                    memoryStream.WriteTo(fileStream);
                }
            }
        }
    }
}
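When the page range comes from user input, it can help to validate it before assigning StartPage and EndPage. The sketch below assumes 1-based page numbers (the convention in Aspose.PDF) and a page count obtained elsewhere. The helper name NormalizePageRange is hypothetical and not part of the PdfExtractor API.

```csharp
using System;

public static class PageRanges
{
    // Hypothetical helper: returns a 1-based, inclusive page range clamped to
    // the document, or throws if the range lies entirely outside it.
    public static (int Start, int End) NormalizePageRange(int start, int end, int pageCount)
    {
        if (pageCount < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(pageCount), "The document has no pages.");
        }
        // Accept reversed input such as (5, 2)
        if (start > end)
        {
            (start, end) = (end, start);
        }
        // Clamp both ends to the valid page numbers 1..pageCount
        start = Math.Max(1, start);
        end = Math.Min(pageCount, end);
        if (start > end)
        {
            throw new ArgumentException("The requested range lies outside the document.");
        }
        return (start, end);
    }
}
```

The two values of the returned tuple would then be assigned to extractor.StartPage and extractor.EndPage before calling ExtractImage.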
Extract Images using Image Extraction Mode (Facades)
The PdfExtractor class lets you choose how images are extracted from a PDF file. Aspose.PDF supports two extraction modes: ActuallyUsed, which extracts only the images actually used in the PDF document, and DefinedInResources, which extracts the images defined in the resources of the PDF document (the default extraction mode). Create a PdfExtractor object and bind the input PDF with the BindPdf method, then specify the mode through the PdfExtractor.ExtractImageMode property. Call ExtractImage to extract the images into memory according to the mode you specified, and retrieve them by looping with the HasNextImage and GetNextImage methods. To save the images to disk, call a GetNextImage overload that takes a file path; the example below also passes an image format so that the files are saved as PNG.
The following code snippet shows how to extract images from a PDF file using the ExtractImageMode option.
private static void ExtractImagesImageExtractionMode()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();
    // Open PDF document
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "sample_cats_dogs.pdf");
        // Specify the image extraction mode. To extract only the images
        // actually used in the document, use this line instead:
        // extractor.ExtractImageMode = Aspose.Pdf.ExtractImageMode.ActuallyUsed;
        extractor.ExtractImageMode = Aspose.Pdf.ExtractImageMode.DefinedInResources;
        // Extract images based on the image extraction mode
        extractor.ExtractImage();
        // Get all the extracted images and save them as PNG
        while (extractor.HasNextImage())
        {
            extractor.GetNextImage(dataDir + DateTime.Now.Ticks.ToString() + "_out.png", System.Drawing.Imaging.ImageFormat.Png);
        }
    }
}
To check whether a PDF contains text, images, or both, use the following code snippet:
private static void CheckIfPdfContainsTextOrImages()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_Images();
    // Create a MemoryStream to hold the text extracted from the document,
    // and a PdfExtractor to do the extraction
    using (MemoryStream ms = new MemoryStream())
    using (var extractor = new Aspose.Pdf.Facades.PdfExtractor())
    {
        // Bind PDF document
        extractor.BindPdf(dataDir + "FilledForm.pdf");
        // Extract text from the input PDF document
        extractor.ExtractText();
        // Write the extracted text to the memory stream
        extractor.GetText(ms);
        // The document contains text if at least one byte was written
        bool containsText = ms.Length >= 1;
        // Extract images from the input PDF document
        extractor.ExtractImage();
        // HasNextImage returns true if at least one image was extracted
        bool containsImage = extractor.HasNextImage();
        // Now find out whether this PDF is text only or image only
        if (containsText && !containsImage)
        {
            Console.WriteLine("PDF contains text only");
        }
        else if (!containsText && containsImage)
        {
            Console.WriteLine("PDF contains image only");
        }
        else if (containsText && containsImage)
        {
            Console.WriteLine("PDF contains both text and image");
        }
        else
        {
            Console.WriteLine("PDF contains neither text nor image");
        }
    }
}
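The four-way decision at the end of this snippet can also be factored into a small pure function, which keeps the extraction code short and makes the decision easy to unit test. This is only a sketch. The helper name DescribeContent is hypothetical, and the function takes the two flags computed above as plain parameters.

```csharp
using System;

public static class PdfContentReport
{
    // Hypothetical helper: maps the two flags to the messages used in the
    // snippet above.
    public static string DescribeContent(bool containsText, bool containsImage)
    {
        if (containsText && containsImage)
        {
            return "PDF contains both text and image";
        }
        if (containsText)
        {
            return "PDF contains text only";
        }
        if (containsImage)
        {
            return "PDF contains image only";
        }
        return "PDF contains neither text nor image";
    }
}
```

With this helper, the if/else chain reduces to Console.WriteLine(PdfContentReport.DescribeContent(containsText, containsImage)).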