Extract Images from PDF

Each page in a PDF document contains resources (images, forms, and fonts). We can access these resources by calling the getResources method. The Resources class contains an XImageCollection, and we can get the list of images by calling the getImages method.

Thus, to extract an image from a page, we need a reference to the page, then to the page's resources, and finally to the image collection. A particular image can then be retrieved from the collection, for example by its index.

Retrieving an image by its index returns an XImage object. This object provides a save method, which writes the extracted image to an output stream. The following code snippet shows how to extract the first image from the first page of a PDF file.


   // Load the PDF document
   $document = new Document($inputFile);

   // Get the first page of the document
   $page = $document->getPages()->get_Item(1);

   // Get the collection of images on the page
   $xImageCollection = $page->getResources()->getImages();

   // Get the first image from the collection
   $xImage = $xImageCollection->get_Item(1);

   // Create a new FileOutputStream object to save the image
   $outputImage = new java("java.io.FileOutputStream", $outputFile);

   // Save the image to the output file
   $xImage->save($outputImage);

   // Close the output image file
   $outputImage->close();
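
The snippet above saves a single image. To save every image on a page, the same calls can be wrapped in a loop. The following is a minimal sketch, not part of the original example: the helper names `imageOutputPath` and `extractPageImages` are hypothetical, the `.jpg` extension is an assumption about the saved format, and the code assumes that the collection exposes a `size()` method and that the PHP/Java Bridge function `java_values()` is available to convert the returned Java value to a PHP integer.

```php
<?php
// Hypothetical helper: builds a numbered output path such as "out/page1_image2.jpg".
// The ".jpg" extension is an assumption about the format written by save().
function imageOutputPath(string $outputDir, int $pageNumber, int $imageIndex): string
{
    return rtrim($outputDir, '/\\') . '/' . "page{$pageNumber}_image{$imageIndex}.jpg";
}

// Hypothetical helper: saves every image found on one page and returns the saved paths.
// $document is a Document instance loaded as in the snippet above.
function extractPageImages($document, int $pageNumber, string $outputDir): array
{
    // Page -> resources -> image collection, as described in the text
    $images = $document->getPages()->get_Item($pageNumber)->getResources()->getImages();

    // Assumption: size() returns the image count; java_values() converts it to a PHP value
    $count = (int) java_values($images->size());

    $saved = [];
    // Indexes start at 1, matching get_Item(1) for the first image
    for ($i = 1; $i <= $count; $i++) {
        $path = imageOutputPath($outputDir, $pageNumber, $i);
        $stream = new java("java.io.FileOutputStream", $path);
        try {
            $images->get_Item($i)->save($stream);
        } finally {
            // Close the stream even if save() throws
            $stream->close();
        }
        $saved[] = $path;
    }
    return $saved;
}
```

With a loaded document, a call such as `extractPageImages($document, 1, 'out')` would write one file per image on the first page, provided the `out` directory already exists.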
