Optimize, Compress or Reduce PDF Size in Java
A PDF document can sometimes contain additional data. Reducing the size of a PDF file will help you optimize network transfer and storage. This is especially handy for publishing on web pages, sharing on social networks, sending by e-mail, or archiving in storage. We can use several techniques to optimize PDF:
- Optimize page content for online browsing
- Shrink or compress all images
- Enable reusing page content
- Merge duplicate streams
- Unembed fonts
- Remove unused objects
- Remove flattening form fields
- Remove or flatten annotations
Optimize PDF Document for the Web
Optimization or linearization refers to the process of making a PDF file suitable for online browsing using a web browser. Aspose.PDF supports this process.
To optimize a PDF for web display:
- Open the input document in a Document object.
- Use the optimize() method.
- Save the optimized document using the save(..) method.
The following code snippet shows how to optimize a PDF document for the web.
package com.aspose.pdf.examples;
import java.io.FileNotFoundException;
import java.time.Clock;
import java.time.Duration;
import com.aspose.pdf.*;
import com.aspose.pdf.optimization.ImageCompressionVersion;
import com.aspose.pdf.optimization.ImageEncoding;
public class ExampleOptimize {
private static String _dataDir = "/home/admin1/pdf-examples/Samples/";
public static void OptimizePDFDocumentForTheWeb() {
// Open document
Document pdfDocument = new Document(_dataDir + "sample.pdf");
// Optimize for web
pdfDocument.optimize();
// Save output document
pdfDocument.save(_dataDir + "OptimizeDocument_out.pdf");
}
Reduce PDF File Size
This topic explains the steps to optimize/reduce the PDF file size. Aspose.PDF API provides the OptimizationOptions class that gives the flexibility to optimize output PDF by removing unnecessary objects and compressing PDF files having images. Both these options are elaborated in the following sections.
Remove Unnecessary Objects
We can optimize the size of PDF documents by removing duplicate and unused objects. The following code snippet shows how.
public static void ReduceSizePDF() {
// Open document
Document pdfDocument = new Document(_dataDir + "sample.pdf");
// Optimize PDF document. Note, though, that this method cannot guarantee
// document shrinking
pdfDocument.optimizeResources();
// Save output document
pdfDocument.save(_dataDir + "OptimizeDocument_out.pdf");
}
Shrinking or Compressing All Images
If the source PDF file contains images, consider compressing the images and setting their quality. In order to enable image compression, pass true as an argument to the setCompressImages(..) method. All the images in a document will be re-compressed. The compression is defined by the setImageQuality(..) method, which is the value of the quality in percent. 100% is unchanged quality and image size. To decrease image size, pass an argument of less than 100 to the setImageQuality(..) method.
public static void ShrinkingCompressing() {
// Open document
Document pdfDocument = new Document(_dataDir + "Shrinkimage.pdf");
// Initialize OptimizationOptions
com.aspose.pdf.optimization.OptimizationOptions optimizationOptions = new com.aspose.pdf.optimization.OptimizationOptions();
// Set CompressImages option
optimizationOptions.getImageCompressionOptions().setCompressImages(true);
// Set ImageQuality option
optimizationOptions.getImageCompressionOptions().setImageQuality(50);
// Optimize PDF document using OptimizationOptions
pdfDocument.optimizeResources(optimizationOptions);
_dataDir = _dataDir + "Shrinkimage_out.pdf";
// Save updated document
pdfDocument.save(_dataDir);
}
Another way is to resize the images with a lower resolution. In this case, we should set ResizeImages to true and MaxResolution to the appropriate value.
public static void ShrinkingCompressing2() {
// Open document
Document pdfDocument = new Document(_dataDir + "ResizeImage.pdf");
// Initialize OptimizationOptions
com.aspose.pdf.optimization.OptimizationOptions optimizationOptions = new com.aspose.pdf.optimization.OptimizationOptions();
// Set CompressImages option
optimizationOptions.getImageCompressionOptions().setCompressImages(true);
// Set ImageQuality option
optimizationOptions.getImageCompressionOptions().setImageQuality(75);
// Set ResizeImage option
optimizationOptions.getImageCompressionOptions().setResizeImages(true);
// Set MaxResolution option
optimizationOptions.getImageCompressionOptions().setMaxResolution(300);
// Optimize PDF document using OptimizationOptions
pdfDocument.optimizeResources(optimizationOptions);
_dataDir = _dataDir + "ResizeImages_out.pdf";
// Save updated document
pdfDocument.save(_dataDir);
}
Another important issue is the execution time. But again, we can manage this setting too. Currently, we can use two algorithms - Standard and Fast. To control the execution time we should set a Version property. The following snippet demonstrates the Fast algorithm:
public static void ShrinkingCompressing3() {
Clock clock = Clock.systemUTC();
Duration tickDuration = Duration.ofNanos(250000);
Clock clock1 = Clock.tick(clock, tickDuration);
System.out.println("Start : " + clock.instant());
// Open document
Document pdfDocument = new Document(_dataDir + "ResizeImage.pdf");
// Initialize OptimizationOptions
com.aspose.pdf.optimization.OptimizationOptions optimizationOptions = new com.aspose.pdf.optimization.OptimizationOptions();
// Set CompressImages option
optimizationOptions.getImageCompressionOptions().setCompressImages(true);
// Set ImageQuality option
optimizationOptions.getImageCompressionOptions().setImageQuality(75);
// Set Image Compression Version to fast
optimizationOptions.getImageCompressionOptions().setVersion(ImageCompressionVersion.Fast);
// Optimize PDF document using OptimizationOptions
pdfDocument.optimizeResources(optimizationOptions);
_dataDir = _dataDir + "ResizeImages_out.pdf";
// Save updated document
pdfDocument.save(_dataDir);
System.out.println("Finish : " + clock1.instant());
}
Removing Unused Objects
A PDF document sometimes contains the PDF objects that are not referenced from any other object in the document. This may happen, for example, when a page is removed from the document page tree but the page object itself isn’t removed. Removing these objects doesn’t make the document invalid but rather shrinks it.
public static void RemovingUnusedObjects() {
// Open document
Document pdfDocument = new Document(_dataDir + "OptimizeDocument.pdf");
com.aspose.pdf.optimization.OptimizationOptions optimizationOptions = new com.aspose.pdf.optimization.OptimizationOptions();
optimizationOptions.setRemoveUnusedObjects(true);
pdfDocument.optimizeResources(optimizationOptions);
_dataDir = _dataDir + "emoveUnusedObjects_out.pdf";
// Save updated document
pdfDocument.save(_dataDir);
}
Removing Unused Streams
Sometimes a document contains unused resource streams. These streams are not “unused objects” because they are referenced from a page’s resource dictionary. This may happen in cases where an image has been removed from the page but not from the page resources. Also, this situation often occurs when pages are extracted from the document and document pages have “common” resources, that is, the same Resources object. Page contents are analyzed in order to determine if a resource stream is used or not. Unused streams are removed. Sometimes this decreases the document size.
public static void RemovingUnusedStream() {
// Open document
Document pdfDocument = new Document(_dataDir +
"OptimizeDocument.pdf");
com.aspose.pdf.optimization.OptimizationOptions optimizationOptions = new com.aspose.pdf.optimization.OptimizationOptions();
optimizationOptions.setRemoveUnusedStreams(true);
pdfDocument.optimizeResources(optimizationOptions);
_dataDir = _dataDir + "removeUnusedObjects_out.pdf";
// Save updated document
pdfDocument.save(_dataDir);
}
Linking Duplicate Streams
Sometimes a document contains several identical resource streams (for example images). This may happen for example when a document is concatenated with itself. The output document contains two independent copies of the same resource stream. We analyze all resource streams and compare them. If streams are duplicated they are merged, that is, only one copy is left, references are changed appropriately and copies of the object are removed. Sometimes this decreases the document size.
public static void LinkingDuplicateStream() {
// Open document
Document pdfDocument = new Document(_dataDir + "OptimizeDocument.pdf");
com.aspose.pdf.optimization.OptimizationOptions optimizationOptions = new com.aspose.pdf.optimization.OptimizationOptions();
optimizationOptions.setRemoveUnusedStreams(true);
// Optimize PDF document using OptimizationOptions
pdfDocument.optimizeResources(optimizationOptions);
_dataDir = _dataDir + "OptimizeDocument_out.pdf";
// Save updated document
pdfDocument.save(_dataDir);
}
Additionally, we can use AllowReusePageContent settings. If this property is set to true, the page content will be reused when optimizing the document for identical pages.
public static void AllowReusePageContent() {
// Open document
Document pdfDocument = new Document(_dataDir + "OptimizeDocument.pdf");
com.aspose.pdf.optimization.OptimizationOptions optimizationOptions = new com.aspose.pdf.optimization.OptimizationOptions();
optimizationOptions.setAllowReusePageContent(true);
// Optimize PDF document using OptimizationOptions
pdfDocument.optimizeResources(optimizationOptions);
_dataDir = _dataDir + "OptimizeDocument_out.pdf";
// Save updated document
pdfDocument.save(_dataDir);
}
Unembedding Fonts
If the document uses embedded fonts it means that all font data is placed in the document. The advantage is that the document is viewable regardless of whether the font is installed on the user’s machine or not. But embedding fonts makes the document larger. The unembed fonts method removes all embedded fonts. This decreases the document size but the document may become unreadable if the correct font is not installed.
public static void UnembedFonts() {
// Open document
Document pdfDocument = new Document(_dataDir + "OptimizeDocument.pdf");
com.aspose.pdf.optimization.OptimizationOptions optimizationOptions = new com.aspose.pdf.optimization.OptimizationOptions();
optimizationOptions.setUnembedFonts(true);
// Optimize PDF document using OptimizationOptions
pdfDocument.optimizeResources(optimizationOptions);
_dataDir = _dataDir + "OptimizeDocument_out.pdf";
// Save updated document
pdfDocument.save(_dataDir);
}
Removing or Flattening Annotations
Annotations can be deleted when they are unnecessary. When they are needed but do not require additional editing, they can be flattened. Both of these techniques will reduce the file size.
public static void FlatteningAnnotations() {
// Open document
Document pdfDocument = new Document(_dataDir + "OptimizeDocument.pdf");
for (Page page : pdfDocument.getPages()) {
for (Annotation annotation : page.getAnnotations())
annotation.flatten();
}
_dataDir = _dataDir + "OptimizeDocument_out.pdf";
// Save updated document
pdfDocument.save(_dataDir);
}
Removing Form Fields
If the PDF document contains AcroForms, we can try to reduce the file size by flattening form fields.
public static void RemovingFormFields() {
// Open document
Document pdfDocument = new Document(_dataDir + "OptimizeDocument.pdf");
// Flatten Forms
if (pdfDocument.getForm().getFields().length > 0) {
for (Field field : pdfDocument.getForm().getFields()) {
field.flatten();
}
}
_dataDir = _dataDir + "FlattenForms_out.pdf";
// save the updated document
pdfDocument.save(_dataDir);
}
Convert a PDF from RGB colorspace to grayscale
A PDF file is comprised of Text, Image, Attachment, Annotations, Graphs and other objects. You may come across a requirement to convert a PDF from RGB colorspace to Grayscale so that it would be faster while printing those PDF files. Also when the file is converted to Grayscale, the size of the document is also reduced but with this change, the quality of the document may drop. Currently, this feature is supported by the Pre-Flight feature of Adobe Acrobat, but when talking about Office automation, Aspose.PDF is an ultimate solution to provide such leverages for document manipulation.
In order to accomplish this requirement, the following code snippet can be used.
public static void ConvertRGBtoGrayscale() {
// Open document
Document pdfDocument = new Document(_dataDir + "OptimizeDocument.pdf");
com.aspose.pdf.RgbToDeviceGrayConversionStrategy strategy = new com.aspose.pdf.RgbToDeviceGrayConversionStrategy();
for (int idxPage = 1; idxPage <= pdfDocument.getPages().size(); idxPage++) {
Page page = pdfDocument.getPages().get_Item(idxPage);
strategy.convert(page);
}
pdfDocument.save("output.pdf");
}
FlateDecode Compression
Aspose.PDF for Java provides support of FlateDecode compression for PDF Optimisation functionality. The following code snippet below shows how to use the option in Optimization to store images with FlateDecode compression:
public static void FlateDecodeCompression() {
// Open document
Document pdfDocument = new Document(_dataDir + "OptimizeDocument.pdf");
com.aspose.pdf.optimization.OptimizationOptions optimizationOptions = new com.aspose.pdf.optimization.OptimizationOptions();
optimizationOptions.getImageCompressionOptions().setEncoding(ImageEncoding.Flate);
// Optimize PDF document using OptimizationOptions
pdfDocument.optimizeResources(optimizationOptions);
_dataDir = _dataDir + "OptimizeDocument_out.pdf";
// Save updated document
pdfDocument.save(_dataDir);
}
Store Image in XImageCollection
Aspose.PDF for Java provides the ability to store new images into XImageCollection with FlateDecode compression. To enable this option you can use ImageFilterType.Flate flag. The following code snippet shows how to use this functionality:
public static void StoreImageInXImageCollection() {
// Initialize Document
Document document = new Document();
document.getPages().add();
Page page = document.getPages().get_Item(1);
// Load image into stream
java.io.FileInputStream imageStream = null;
try {
imageStream = new java.io.FileInputStream(new java.io.File("input_image1.jpg"));
} catch (FileNotFoundException e) {
e.printStackTrace();
return;
}
page.getResources().getImages().add(imageStream, ImageFilterType.Flate);
XImage ximage = page.getResources().getImages().get_Item(page.getResources().getImages().size());
page.getContents().add(new com.aspose.pdf.operators.GSave());
// Set coordinates
int lowerLeftX = 0;
int lowerLeftY = 0;
int upperRightX = 600;
int upperRightY = 600;
Rectangle rectangle = new Rectangle(lowerLeftX, lowerLeftY, upperRightX, upperRightY);
Matrix matrix = new Matrix(new double[] { rectangle.getURX() - rectangle.getLLX(), 0, 0,
rectangle.getURY() - rectangle.getLLY(), rectangle.getLLX(), rectangle.getLLY() });
// Using ConcatenateMatrix (concatenate matrix) operator: defines how image must be placed
page.getContents().add(new com.aspose.pdf.operators.ConcatenateMatrix(matrix));
page.getContents().add(new com.aspose.pdf.operators.Do(ximage.getName()));
page.getContents().add(new com.aspose.pdf.operators.GRestore());
document.save(_dataDir + "FlateDecodeCompression.pdf");
}
}