Convert PDF file to HTML format

Aspose.PDF for PHP provides many features for converting various file formats to PDF documents and converting PDF files into various output formats. This article discusses how to convert a PDF file into HTML format and save the images from the PDF file in a particular folder.

When converting large PDF file with several pages to HTML format, the output appears as a single HTML page. It can end up being very long. To control page size, it is possible to split the output into several pages during PDF to HTML conversion.

Convert PDF pages to HTML

Aspose.PDF for PHP provides many features for converting various file formats to PDF documents and converting PDF files into various output formats. This article discusses how to convert a PDF file into HTML format and save the images from the PDF file in a particular folder.

The following code snippet shows you all the possible options that you can use when converting PDF to HTML.

// Create a new Document object and load the input PDF file
$document = new Document($inputFile);

// Create a new HtmlSaveOptions object for saving the document as HTML
$saveOption = new HtmlSaveOptions();

// Save the document as HTML using the specified save options
$document->save($outputFile, $saveOption);

Convert PDF to HTML - Splitting Output to Multi-page HTML

Aspose.PDF for PHP supports the feature to convert PDF documents to various output formats including HTML. However when converting large PDF files (comprised of multiple pages), you may have a requirement to save individual PDF page to separate HTML file.

When converting large PDF file with several pages to HTML format, the output appears as a single HTML page. It can end up being very long. To control page size, it is possible to split the output into several pages during PDF to HTML conversion. Please try using the following code snippet.

// Create a new Document object and load the input PDF file
$document = new Document($inputFile);

// Create a new HtmlSaveOptions object for saving the document as HTML
$saveOption = new HtmlSaveOptions();

// Specify to split the output into multiple pages
$saveOption->setSplitIntoPages(true);

// Save the document as HTML using the specified save options
$document->save($outputFile, $saveOption);

Convert PDF to HTML - Avoid Saving Images in SVG Format

The default output format for saving images when converting from PDF to HTML is SVG. During conversion, some images from PDF are transformed into SVG vector images. This could be slow. Instead, the images could be transformed into PNG. To allow this, Aspose.PDF has the option to use SVG for vectors or to create PNGs.

In order to completely remove the rendering of images as SVG format when converting PDF files to HTML format, please try using the following code snippet.

// Create a new Document object and load the input PDF file
$document = new Document($inputFile);

// Create a new HtmlSaveOptions object for saving the document as HTML
$saveOption = new HtmlSaveOptions();

// Specify the folder where SVG images are saved during PDF to HTML conversion
$saveOption->setSpecialFolderForSvgImages(DATA_DIR);

// Save the document as HTML using the specified save options
$document->save($outputFile, $saveOption);

Compressing SVG Images During Conversion

To compress SVG images during PDF to HTML conversion, please try using the following code:

// Create a new Document object and load the input PDF file
$document = new Document($inputFile);

// Create a new HtmlSaveOptions object for saving the document as HTML
$saveOptions = new HtmlSaveOptions();
$saveOptions = setCompressSvgGraphicsIfAny(true);

// Save the document as HTML using the specified save options
$document->save($outputFile, $saveOptions);

Convert PDF to HTML - Specify Images Folder

By default, when converting a PDF file to HTML, the images in the PDF are saved in a separate folder created in same directory that the output HTML is created. But sometimes, it is necessary to specify a different folder for saving images to when generating HTML files. To accomplish this, we introduced the SaveOptions. setSpecialFolderForAllImages method is used to specify the target folder for storing images.

// Create a new Document object and load the input PDF file
$document = new Document($inputFile);

// Create a new HtmlSaveOptions object for saving the document as HTML
$saveOptions = new HtmlSaveOptions();
$saveOptions->setSpecialFolderForAllImages(DATA_DIR);

// Save the document as HTML using the specified save options
$document->save($outputFile, $saveOptions);

Transparent Text rendering

In case the source/input PDF file contains transparent texts shadowed by foreground images, then there might be text rendering issues. So in order to cater such scenarios, SaveShadowedTextsAsTransparentTexts and SaveTransparentTexts properties can be used.

// Create a new Document object and load the input PDF file
$document = new Document($inputFile);

// Create a new HtmlSaveOptions object for saving the document as HTML
$saveOptions = new HtmlSaveOptions();
$saveOptions->setSaveShadowedTextsAsTransparentTexts(true);
$saveOptions->setTransparentTexts(true);

// Save the document as HTML using the specified save options
$document->save($outputFile, $saveOptions);

PDF document layers rendering

We can render PDF document layers in separate layer type element during PDF to HTML conversion:

// Create a new Document object and load the input PDF file
$document = new Document($inputFile);

// Create a new HtmlSaveOptions object for saving the document as HTML
$saveOptions = new HtmlSaveOptions();
$saveOptions->setConvertMarkedContentToLayers(true);

// Save the document as HTML using the specified save options
$document->save($outputFile, $saveOptions);

PDF to HTML conversion is one of Aspose.PDF’s most popular features because it makes it possible to view the content of PDF files on various platforms without using a PDF document viewer. The output HTML accords with to WWW standards and can easily be displayed in all web browsers. Using this feature, the PDF files can be viewed over hand held devices because you do not need to install any PDF viewing application but can use a simple web browser.