Extract Images from PDF

Each page in a PDF document contains resources (images, forms, and fonts). We can access these resources by calling the getResources method. The Resources class contains an XImageCollection, and we can get the list of images by calling the getImages method.

Thus, to extract an image from a page, we first get a reference to the page, then to the page resources, and finally to the image collection. A particular image can then be extracted, for example, by its index.

Accessing the collection by an image's index returns an XImage object. This object provides a save method, which can be used to save the extracted image. The following code snippet shows how to extract an image from a PDF file.

public static void Extract_Images() {
    // The path to the documents directory.
    String _dataDir = "/home/admin1/pdf-examples/Samples/";
    String filePath = _dataDir + "ExtractImages.pdf";

    // Load the PDF document
    com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(filePath);

    // Get the first page (page numbering starts at 1) and its image collection
    com.aspose.pdf.Page page = pdfDocument.getPages().get_Item(1);
    com.aspose.pdf.XImageCollection xImageCollection = page.getResources().getImages();
    // Extract a particular image by its index (also starting at 1)
    com.aspose.pdf.XImage xImage = xImageCollection.get_Item(1);

    // try-with-resources closes the stream even if saving fails
    try (java.io.FileOutputStream outputImage = new java.io.FileOutputStream(_dataDir + "output.jpg")) {
        // Save the output image
        xImage.save(outputImage);
    } catch (java.io.IOException e) {
        // Also covers FileNotFoundException, which is a subclass of IOException
        e.printStackTrace();
    }
}
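
The snippet above saves a single image to a fixed file name. When several images are extracted, each one needs its own output file. The following is a minimal sketch of one way to organize that step, using only the Java standard library. The names StreamWriter, outputName, and saveImage are hypothetical helpers introduced for illustration and are not part of Aspose.PDF; the fake byte array stands in for real image data.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExtractAllImagesSketch {

    /**
     * Hypothetical stand-in for anything that can write itself to a stream.
     * Assumption: with Aspose.PDF, a method reference such as xImage::save
     * could be passed here, since save accepts an OutputStream.
     */
    @FunctionalInterface
    public interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Builds a unique file name from the 1-based page and image numbers. */
    public static String outputName(int pageNumber, int imageNumber) {
        return "page" + pageNumber + "_image" + imageNumber + ".jpg";
    }

    /** Writes one image into dir; the stream is closed even if writing fails. */
    public static Path saveImage(Path dir, int pageNumber, int imageNumber,
                                 StreamWriter image) throws IOException {
        Path target = dir.resolve(outputName(pageNumber, imageNumber));
        try (OutputStream out = Files.newOutputStream(target)) {
            image.writeTo(out);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("extracted-images");
        // Fake data for demonstration only; this is not a real image.
        byte[] fake = "not a real image".getBytes(StandardCharsets.UTF_8);
        Path saved = saveImage(dir, 1, 1, out -> out.write(fake));
        System.out.println("Saved " + saved.getFileName());
    }
}
```

In an actual extraction, a loop over the pages and over each page's image collection (both indexed from 1, as in get_Item(1) above) could call saveImage(dir, pageNumber, imageNumber, xImage::save) for every image, assuming the collections report their size.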