Extract Attachments from PDF
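A PDF can carry attachments in two places: the document-level embedded files collection and file attachment annotations placed on a page. Aspose.PDF for Java covers both. The examples below extract a single attachment by name, export every embedded file, and save a file attached through an annotation.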
Extract a single attachment by name
Use this example when you need to save one specific embedded file from a PDF.
- Open the source PDF Document.
- Iterate through the embedded file collection until the required attachment name is found.
- Copy the attachment stream to the output file and stop after extraction.
import com.aspose.pdf.Document;
import com.aspose.pdf.FileSpecification;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public static void extractSingleAttachment(Path inputFile, String attachmentName, Path outputFile) throws Exception {
    try (Document document = new Document(inputFile.toString())) {
        System.out.println("Extracting attachment: " + attachmentName);
        boolean attachmentFound = false;
        for (FileSpecification fileSpecification : document.getEmbeddedFiles()) {
            // Match on the attachment name; stop at the first hit.
            if (attachmentName.equals(fileSpecification.getName())) {
                // Both streams are closed automatically, even if the copy fails.
                try (InputStream inputStream = fileSpecification.getContents();
                     OutputStream outputStream = Files.newOutputStream(outputFile)) {
                    inputStream.transferTo(outputStream);
                }
                System.out.println("Attachment extracted successfully");
                attachmentFound = true;
                break;
            }
        }
        // Fail loudly rather than return silently when the name is not present.
        if (!attachmentFound) {
            throw new IllegalArgumentException("Attachment '" + attachmentName + "' not found in PDF");
        }
    }
}
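The copy step in the listing uses `InputStream.transferTo`, which requires Java 9 or later. As a minimal sketch using only the JDK, the same step could also be written with `Files.copy`, which makes the overwrite behaviour explicit and reports the number of bytes written. `StreamCopy` and `saveStream` are hypothetical names introduced for illustration, and the `InputStream` parameter stands in for whatever stream the attachment exposes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Aspose.PDF.
final class StreamCopy {
    private StreamCopy() {}

    /**
     * Copies the whole stream to outputFile, replacing any existing file,
     * and returns the number of bytes written. The stream is closed afterwards.
     */
    static long saveStream(InputStream source, Path outputFile) throws IOException {
        // Create missing parent directories so the copy does not fail on a fresh output path.
        Path parent = outputFile.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try (InputStream in = source) {
            return Files.copy(in, outputFile, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Returning the byte count gives the caller something to log or to compare against the size recorded in the attachment metadata, when that value is available.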
Print embedded file parameters
This helper method prints the metadata stored in a FileParams object.
- Check whether the file parameters object exists.
- Read the available checksum, creation date, modification date, and size values.
- Print the values to the console.
import com.aspose.pdf.FileParams;

public static void printFileParams(FileParams params) {
    // Attachments without a parameters object have no metadata to print.
    if (params != null) {
        // The checksum may be unavailable; print "null" rather than abort the output.
        try {
            System.out.println("CheckSum: " + params.getCheckSum());
        } catch (Exception ex) {
            System.out.println("CheckSum: null");
        }
        System.out.println("Creation Date: " + params.getCreationDate());
        System.out.println("Modification Date: " + params.getModDate());
        System.out.println("Size: " + params.getSize());
    }
}
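The listing guards only the checksum getter with a try/catch. If other getters can also fail on incomplete metadata, the same pattern can be factored into a small wrapper. This is a sketch under that assumption, using only the JDK; `SafeValues`, `describe`, and `line` are hypothetical names. Unlike the listing, it catches only unchecked exceptions, because `Supplier` cannot throw checked ones.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Aspose.PDF.
final class SafeValues {
    private SafeValues() {}

    /** Returns the getter's value as text, or "null" if the value is null or the getter throws. */
    static String describe(Supplier<?> getter) {
        try {
            Object value = getter.get();
            return String.valueOf(value);
        } catch (RuntimeException ex) {
            return "null";
        }
    }

    /** Builds one "Label: value" line without letting a failing getter abort the output. */
    static String line(String label, Supplier<?> getter) {
        return label + ": " + describe(getter);
    }
}
```

With the `params` object from the listing, a call might look like `System.out.println(SafeValues.line("CheckSum", params::getCheckSum));`.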
Extract all embedded attachments
Use this example when every embedded file in the PDF should be written to an output directory.
- Open the source PDF Document.
- Iterate through the embedded file collection and determine a safe output file name for each item.
- Print the metadata, save each attachment stream, and continue until all files are exported.
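import com.aspose.pdf.Document;
import com.aspose.pdf.FileSpecification;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public static void extractAttachments(Path inputFile, Path outputDir) throws Exception {
    // Make sure the destination directory exists before writing any files.
    Files.createDirectories(outputDir);
    try (Document document = new Document(inputFile.toString())) {
        System.out.println("Total files: " + document.getEmbeddedFiles().size());
        int fileIndex = 1;
        for (FileSpecification fileSpecification : document.getEmbeddedFiles()) {
            // Prefer the plain name, then the Unicode name, then a generated fallback.
            String fileName = fileSpecification.getName();
            if (fileName == null || fileName.isBlank()) {
                fileName = fileSpecification.getUnicodeName();
            }
            if (fileName != null) {
                // Keep only the last path segment so a name such as "../x" cannot leave outputDir.
                int lastSeparator = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
                fileName = fileName.substring(lastSeparator + 1);
            }
            if (fileName == null || fileName.isBlank() || fileName.equals(".") || fileName.equals("..")) {
                fileName = "attachment_" + fileIndex + ".bin";
            }
            System.out.println("Name: " + fileName);
            System.out.println("Description: " + fileSpecification.getDescription());
            System.out.println("Mime Type: " + fileSpecification.getMIMEType());
            printFileParams(fileSpecification.getParams());
            Path outputPath = outputDir.resolve(fileName);
            try (InputStream inputStream = fileSpecification.getContents();
                 OutputStream outputStream = Files.newOutputStream(outputPath)) {
                inputStream.transferTo(outputStream);
            }
            fileIndex++;
        }
    }
}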
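Attachment names come from the PDF itself, so they can contain directory parts, `..` segments, or characters that a file system rejects. The following is a minimal sketch of one way to derive a safe output file name with plain JDK string handling. `AttachmentNames` and `safeFileName` are hypothetical names, and the set of replaced characters is an assumption chosen to cover common file systems rather than an exhaustive list.

```java
// Hypothetical helper, not part of Aspose.PDF.
final class AttachmentNames {
    private AttachmentNames() {}

    /**
     * Reduces an attachment name taken from a PDF to a plain file name.
     * Falls back to "attachment_<index>.bin" when nothing usable remains.
     */
    static String safeFileName(String rawName, int index) {
        String fallback = "attachment_" + index + ".bin";
        if (rawName == null) {
            return fallback;
        }
        // Drop any directory part, whichever separator style the name uses.
        int lastSeparator = Math.max(rawName.lastIndexOf('/'), rawName.lastIndexOf('\\'));
        String name = rawName.substring(lastSeparator + 1);
        // Replace control characters and characters reserved on common file systems.
        name = name.replaceAll("[\\p{Cntrl}<>:\"|?*]", "_").trim();
        // "." and ".." would resolve to a directory, not a file.
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return fallback;
        }
        return name;
    }
}
```

A caller would pass the result to `outputDir.resolve(...)`. Because the result never contains a separator or a bare `..`, the resolved path stays inside the output directory.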
Extract a file attachment annotation
Use this example when the file is attached to a page through a file attachment annotation rather than stored in the document-level embedded files collection.
- Open the source PDF Document.
- Locate the first FileAttachmentAnnotation on the first page.
- Read its file specification, export the contents, and print the destination path.
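import com.aspose.pdf.Annotation;
import com.aspose.pdf.AnnotationType;
import com.aspose.pdf.Document;
import com.aspose.pdf.FileAttachmentAnnotation;
import com.aspose.pdf.FileSpecification;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public static void extractFileAttachmentAnnotation(Path inputFile, Path outputDir) throws Exception {
    // Make sure the destination directory exists before writing the file.
    Files.createDirectories(outputDir);
    try (Document document = new Document(inputFile.toString())) {
        // Page numbering starts at 1; only the first page is searched here.
        FileAttachmentAnnotation fileAttachment = null;
        for (Annotation annotation : document.getPages().get_Item(1).getAnnotations()) {
            if (annotation.getAnnotationType() == AnnotationType.FileAttachment) {
                fileAttachment = (FileAttachmentAnnotation) annotation;
                break;
            }
        }
        if (fileAttachment == null) {
            System.out.println("File attachment annotation not found.");
            return;
        }
        // The annotation carries its own file specification with the attached content.
        FileSpecification fileSpecification = fileAttachment.getFile();
        System.out.println("File name: " + fileSpecification.getName());
        Path outputPath = outputDir.resolve("extracted-" + fileSpecification.getName());
        try (InputStream inputStream = fileSpecification.getContents();
             OutputStream outputStream = Files.newOutputStream(outputPath)) {
            inputStream.transferTo(outputStream);
        }
        System.out.println("Extracted to: " + outputPath);
    }
}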
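`Files.newOutputStream` truncates an existing file by default, so running the listing twice, or extracting two attachments that share a name, silently replaces earlier output. If that is not wanted, one option is to pick an unused path first. The sketch below uses only the JDK; `UniquePaths` and `uniquePath` are hypothetical names, and the " (1)" suffix style is an arbitrary choice.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Aspose.PDF.
final class UniquePaths {
    private UniquePaths() {}

    /**
     * Returns dir/fileName if that path is free; otherwise inserts " (1)", " (2)", ...
     * before the extension until an unused path is found.
     */
    static Path uniquePath(Path dir, String fileName) {
        Path candidate = dir.resolve(fileName);
        if (!Files.exists(candidate)) {
            return candidate;
        }
        int dot = fileName.lastIndexOf('.');
        // A leading dot (".profile") is part of the name, not an extension.
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String extension = dot > 0 ? fileName.substring(dot) : "";
        for (int counter = 1; ; counter++) {
            candidate = dir.resolve(base + " (" + counter + ")" + extension);
            if (!Files.exists(candidate)) {
                return candidate;
            }
        }
    }
}
```

The existence check and the later write are separate steps, so this is suitable for a single-threaded export but does not protect against another process creating the same file in between.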