Convert PDF to HTML in Java

Aspose.PDF for Java supports HTML export with options for images, SVG, page splitting, transparency, and layer rendering.

Convert PDF to HTML

Use this example when a PDF should be exported to a standard HTML document.

  1. Open the source PDF document.
  2. Configure the default HTML save options.
  3. Save the generated HTML output.
public static void convertPdfToHtml(Path inputFile, Path outputFile) {
    saveDocument(inputFile, outputFile, new HtmlSaveOptions());
}
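
The method above takes both paths from the caller. As a minimal sketch, assuming the HTML file should sit next to the source PDF under the same base name, the output path could be derived as follows. `HtmlOutputPaths` and `htmlPathFor` are hypothetical names used for illustration only; they are not part of Aspose.PDF.

```java
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of Aspose.PDF. */
final class HtmlOutputPaths {

    private HtmlOutputPaths() {
    }

    /** Returns a sibling of inputFile whose last extension is replaced by ".html". */
    static Path htmlPathFor(Path inputFile) {
        String name = inputFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Keep the whole name when there is no extension or only a leading dot.
        String base = dot > 0 ? name.substring(0, dot) : name;
        // resolveSibling keeps the parent directory, if there is one.
        return inputFile.resolveSibling(base + ".html");
    }
}
```

A call such as `convertPdfToHtml(input, HtmlOutputPaths.htmlPathFor(input))` would then write the HTML next to the PDF.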

Convert PDF to HTML and store images separately

Use this example when extracted images should be written as separate files during HTML export.

  1. Open the source PDF document.
  2. Configure HTML save options for external image storage.
  3. Save the HTML output and generated image assets.
public static void convertPdfToHtmlStoringImages(Path inputFile, Path outputFile) {
    HtmlSaveOptions saveOptions = new HtmlSaveOptions();
    // resolveSibling also works when inputFile has no parent (getParent() would return null).
    saveOptions.setSpecialFolderForAllImages(inputFile.resolveSibling("images").toString());
    saveDocument(inputFile, outputFile, saveOptions);
}
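
Whether the library creates a missing image folder is not stated here, so a cautious caller may want to create it before converting. The sketch below does that with standard `java.nio.file` calls only. `AssetFolders` and `ensureSiblingFolder` are hypothetical names, not Aspose.PDF API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of Aspose.PDF. */
final class AssetFolders {

    private AssetFolders() {
    }

    /** Creates the folder next to the given file if it is missing and returns its path. */
    static Path ensureSiblingFolder(Path file, String folderName) throws IOException {
        // Make the path absolute first so a bare file name still has a parent directory.
        Path folder = file.toAbsolutePath().resolveSibling(folderName);
        // createDirectories does nothing if the directory already exists.
        return Files.createDirectories(folder);
    }
}
```

The returned path can be passed to `setSpecialFolderForAllImages` via `toString()`.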

Convert PDF to multi-page HTML

Use this example when each PDF page should be represented separately in HTML output.

  1. Open the source PDF document.
  2. Set the HTML save options for multi-page mode.
  3. Save the generated HTML files.
public static void convertPdfToHtmlMultiPage(Path inputFile, Path outputFile) {
    HtmlSaveOptions saveOptions = new HtmlSaveOptions();
    saveOptions.setSplitIntoPages(true);
    saveDocument(inputFile, outputFile, saveOptions);
}
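
Multi-page mode produces more than one HTML file. The naming scheme of the per-page files is not described here, so the following sketch makes no assumption about it and simply lists every `.html` file found in the output directory. `HtmlOutputListing` and `listHtmlFiles` are hypothetical names, not Aspose.PDF API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of Aspose.PDF. */
final class HtmlOutputListing {

    private HtmlOutputListing() {
    }

    /** Returns the regular files in outputDir whose names end in ".html", sorted by path. */
    static List<Path> listHtmlFiles(Path outputDir) throws IOException {
        // Files.list holds an open directory handle, so close the stream when done.
        try (Stream<Path> entries = Files.list(outputDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".html"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Calling `listHtmlFiles(outputFile.toAbsolutePath().getParent())` after the conversion would show which files were written.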

Convert PDF to HTML and store SVG separately

Use this example when vector content should be emitted as separate SVG resources.

  1. Open the source PDF document.
  2. Configure HTML save options to externalize SVG content.
  3. Save the HTML output and SVG assets.
public static void convertPdfToHtmlStoringSvg(Path inputFile, Path outputFile) {
    HtmlSaveOptions saveOptions = new HtmlSaveOptions();
    // resolveSibling also works when inputFile has no parent (getParent() would return null).
    saveOptions.setSpecialFolderForSvgImages(inputFile.resolveSibling("svg_images").toString());
    saveDocument(inputFile, outputFile, saveOptions);
}

Convert PDF to HTML with compressed SVG

Use this example when SVG output should be compressed during HTML export. As in the previous example, the SVG files are written to a separate folder.

  1. Open the source PDF document.
  2. Enable compressed SVG output in the HTML save options.
  3. Save the converted HTML files.
public static void convertPdfToHtmlCompressSvg(Path inputFile, Path outputFile) {
    HtmlSaveOptions saveOptions = new HtmlSaveOptions();
    // resolveSibling also works when inputFile has no parent (getParent() would return null).
    saveOptions.setSpecialFolderForSvgImages(inputFile.resolveSibling("svg_images").toString());
    saveOptions.setCompressSvgGraphicsIfAny(true);
    saveDocument(inputFile, outputFile, saveOptions);
}

Convert PDF to HTML with PNG page backgrounds

Use this example when page backgrounds should be rendered as PNG images in HTML output.

  1. Open the source PDF document.
  2. Configure HTML save options for PNG background rendering.
  3. Save the converted HTML output.
public static void convertPdfToHtmlPngBackground(Path inputFile, Path outputFile) {
    HtmlSaveOptions saveOptions = new HtmlSaveOptions();
    saveOptions.setRasterImagesSavingMode(HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground);
    saveDocument(inputFile, outputFile, saveOptions);
}

Convert PDF to HTML body content only

Use this example when only the body markup is needed instead of a full HTML document shell.

  1. Open the source PDF document.
  2. Configure the HTML save options to emit body content only and to split the output into pages.
  3. Save the HTML output.
public static void convertPdfToHtmlBodyContent(Path inputFile, Path outputFile) {
    HtmlSaveOptions saveOptions = new HtmlSaveOptions();
    saveOptions.setHtmlMarkupGenerationMode(HtmlSaveOptions.HtmlMarkupGenerationModes.WriteOnlyBodyContent);
    saveOptions.setSplitIntoPages(true);
    saveDocument(inputFile, outputFile, saveOptions);
}
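
Body-only output is typically embedded into a page that the application controls. As a minimal sketch, assuming the exported fragment contains no `<html>` or `<body>` wrapper of its own and is encoded as UTF-8, it could be placed into a simple page shell like this. `BodyFragmentWrapper`, `wrapInPage`, and `escapeHtml` are hypothetical names, not Aspose.PDF API.

```java
/** Hypothetical helper for illustration; not part of Aspose.PDF. */
final class BodyFragmentWrapper {

    private BodyFragmentWrapper() {
    }

    /** Wraps an already generated body fragment in a minimal HTML document shell. */
    static String wrapInPage(String title, String bodyFragment) {
        return "<!DOCTYPE html>\n"
                + "<html>\n"
                + "<head>\n"
                + "<meta charset=\"utf-8\">\n"
                + "<title>" + escapeHtml(title) + "</title>\n"
                + "</head>\n"
                // The fragment is inserted unchanged; it is assumed to be trusted markup.
                + "<body>\n" + bodyFragment + "\n</body>\n"
                + "</html>\n";
    }

    /** Escapes the characters that would break plain text inside an element. */
    static String escapeHtml(String text) {
        // Replace '&' first so the entities added below are not escaped again.
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Only the title is escaped here, because the fragment is expected to be markup already. Any stylesheets or images that the fragment references would still need to be reachable from the final page.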

Convert PDF to HTML with transparent text rendering

Use this example when transparent text should be preserved in the HTML export.

  1. Open the source PDF document.
  2. Enable saving of transparent text in the HTML save options, and save shadowed text as transparent text as well.
  3. Save the converted HTML output.
public static void convertPdfToHtmlTransparentTextRendering(Path inputFile, Path outputFile) {
    HtmlSaveOptions saveOptions = new HtmlSaveOptions();
    saveOptions.setSaveTransparentTexts(true);
    saveOptions.setSaveShadowedTextsAsTransparentTexts(true);
    saveDocument(inputFile, outputFile, saveOptions);
}

Convert PDF to HTML with document layer rendering

Use this example when PDF layers (marked content) should be carried over as layers in the HTML result.

  1. Open the source PDF document.
  2. Configure the HTML save options to convert marked content to layers.
  3. Save the exported HTML files.
public static void convertPdfToHtmlDocumentLayersRendering(Path inputFile, Path outputFile) {
    HtmlSaveOptions saveOptions = new HtmlSaveOptions();
    saveOptions.setConvertMarkedContentToLayers(true);
    saveDocument(inputFile, outputFile, saveOptions);
}

Reuse a shared HTML save helper

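All of the examples above save the document through this one helper. It takes the input path, the output path, and the prepared HtmlSaveOptions.

  1. Open the source PDF document in a try-with-resources block so that it is closed after saving.
  2. Save the document to the output path with the HtmlSaveOptions passed in by the caller.
  3. Print a confirmation message once the save has completed.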
private static void saveDocument(Path inputFile, Path outputFile, HtmlSaveOptions saveOptions) {
    try (Document document = new Document(inputFile.toString())) {
        document.save(outputFile.toString(), saveOptions);
    }
    System.out.println(inputFile + " converted into " + outputFile);
}