Convert PDF to EPUB, LaTeX, Text, XPS in Python

Convert PDF to EPUB

EPUB is a free and open e-book standard from the International Digital Publishing Forum (IDPF). Files have the extension .epub. EPUB is designed for reflowable content, meaning that an EPUB reader can optimize text for a particular display device. EPUB also supports fixed-layout content. The format is intended as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale. It supersedes the Open eBook standard.

Aspose.PDF for Python can also convert PDF documents to EPUB format. Pass an instance of the ‘EpubSaveOptions’ class as the second argument to the document.save() method to generate an EPUB file. The following code snippet shows how to do this in Python.


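    from os import makedirs, path

    import aspose.pdf as ap


    def convert_pdf_to_epub(data_dir, infile, outfile):
        """Convert data_dir/infile to data_dir/python/outfile in EPUB format."""
        path_infile = path.join(data_dir, infile)
        path_outfile = path.join(data_dir, "python", outfile)
        # Make sure the output folder exists before saving.
        makedirs(path.dirname(path_outfile), exist_ok=True)

        document = ap.Document(path_infile)
        save_options = ap.EpubSaveOptions()
        # Select the FLOW content recognition mode for the EPUB output.
        save_options.content_recognition_mode = (
            ap.EpubSaveOptions.RecognitionMode.FLOW
        )
        document.save(path_outfile, save_options)

        print(f"{infile} converted into {outfile}")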
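
A quick way to sanity-check the result is to inspect the container itself. Per the EPUB Open Container Format, an .epub file is a ZIP archive whose first entry is an uncompressed file named mimetype containing application/epub+zip, plus a META-INF/container.xml file. The sketch below checks only those structural rules with the Python standard library. The function name is hypothetical and not part of Aspose.PDF, and passing the check does not mean the book content is valid; a full validator such as EPUBCheck covers that.

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def is_epub_container(file_path):
    """Return True if file_path has the basic structure of an EPUB (OCF) container."""
    if not zipfile.is_zipfile(file_path):
        return False
    with zipfile.ZipFile(file_path) as archive:
        entries = archive.infolist()
        if not entries:
            return False
        first = entries[0]
        # OCF requires 'mimetype' to be the first entry, stored without compression.
        if first.filename != "mimetype" or first.compress_type != zipfile.ZIP_STORED:
            return False
        if archive.read(first) != EPUB_MIMETYPE:
            return False
        # The container file points reading systems at the package (OPF) document.
        return "META-INF/container.xml" in archive.namelist()
```

For example, calling is_epub_container(path_outfile) right after document.save() gives a cheap first check before opening the file in an e-reader.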

Convert PDF to LaTeX/TeX

Aspose.PDF for Python via .NET supports converting PDF to LaTeX/TeX. LaTeX is a plain-text file format with special markup, used by the TeX-based document preparation system for high-quality typesetting.

To convert PDF files to TeX, Aspose.PDF provides the LaTeXSaveOptions class, whose OutDirectoryPath property specifies where temporary images are saved during the conversion.

The following code snippet shows how to convert a PDF file to TeX format with Python.


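    from os import makedirs, path

    import aspose.pdf as ap


    def convert_pdf_to_latex(data_dir, infile, outfile):
        """Convert data_dir/infile to data_dir/python/outfile in LaTeX format."""
        path_infile = path.join(data_dir, infile)
        path_outfile = path.join(data_dir, "python", outfile)
        # Make sure the output folder exists before saving.
        makedirs(path.dirname(path_outfile), exist_ok=True)

        document = ap.Document(path_infile)
        save_options = ap.LaTeXSaveOptions()

        document.save(path_outfile, save_options)
        print(f"{infile} converted into {outfile}")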
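
Because the output is plain text, it can be inspected with ordinary string tools. The sketch below assumes the converter writes a standalone document (a \documentclass preamble followed by a document environment); it checks that structure and lists the files referenced by \includegraphics, which you can compare against the images written during conversion. Both helper names are hypothetical, and the regular expression only covers the common \includegraphics[options]{file} form.

```python
import re

# Matches \includegraphics{file} and \includegraphics[options]{file}.
INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]*)\}")


def is_standalone_latex(source):
    """True if \\documentclass, \\begin{document} and \\end{document} appear in order."""
    start = source.find("\\documentclass")
    begin = source.find("\\begin{document}")
    end = source.find("\\end{document}")
    return -1 < start < begin < end


def included_graphics(source):
    """Return the file names referenced by \\includegraphics commands, in order."""
    return INCLUDEGRAPHICS.findall(source)
```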

Convert PDF to Text

Aspose.PDF for Python supports converting a whole PDF document or a single page to a text file. You can convert a PDF document to a TXT file using the ‘TextDevice’ class. The following code snippet extracts the text from the first page of a document.


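    from os import makedirs, path

    import aspose.pdf as ap


    def convert_pdf_page_to_text(data_dir, infile, outfile):
        """Write the text of the first page of data_dir/infile to a TXT file."""
        path_infile = path.join(data_dir, infile)
        path_outfile = path.join(data_dir, "python", outfile)
        # Make sure the output folder exists before saving.
        makedirs(path.dirname(path_outfile), exist_ok=True)

        document = ap.Document(path_infile)
        device = ap.devices.TextDevice()
        # Page indexes start at 1, so pages[1] is the first page.
        device.process(document.pages[1], path_outfile)

        print(f"{infile} converted into {outfile}")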
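
The snippet above processes a single page. To write every page to its own file, you need one output path per page, numbered from 1 to match the page indexes. The helper below is a hypothetical, standard-library-only sketch of that naming step; the commented loop shows how it might be combined with TextDevice, assuming that len(document.pages) returns the page count.

```python
from os import path


def page_output_paths(outfile, page_count):
    """Return one path per page: report.txt -> report_page1.txt, report_page2.txt, ..."""
    stem, ext = path.splitext(outfile)
    # Numbering starts at 1 to match Aspose.PDF's 1-based page indexes.
    return [f"{stem}_page{number}{ext}" for number in range(1, page_count + 1)]


# Possible usage with the objects from the snippet above
# (assumption: len(document.pages) returns the number of pages):
#
# targets = page_output_paths(path_outfile, len(document.pages))
# for number, target in enumerate(targets, start=1):
#     device.process(document.pages[number], target)
```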

Convert PDF to XPS

Aspose.PDF for Python can convert PDF files to XPS format.

The XPS file type is primarily associated with the XML Paper Specification by Microsoft Corporation. The XML Paper Specification (XPS), formerly codenamed Metro and subsuming the Next Generation Print Path (NGPP) marketing concept, is Microsoft’s initiative to integrate document creation and viewing into the Windows operating system.

To convert PDF files to XPS, Aspose.PDF has the class XpsSaveOptions that is used as the second argument to the document.save() method to generate the XPS file.

The following code snippet shows how to convert a PDF file to XPS format.


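    from os import makedirs, path

    import aspose.pdf as ap


    def convert_pdf_to_xps(data_dir, infile, outfile):
        """Convert data_dir/infile to data_dir/python/outfile in XPS format."""
        path_infile = path.join(data_dir, infile)
        path_outfile = path.join(data_dir, "python", outfile)
        # Make sure the output folder exists before saving.
        makedirs(path.dirname(path_outfile), exist_ok=True)

        document = ap.Document(path_infile)
        save_options = ap.XpsSaveOptions()
        save_options.use_new_imaging_engine = True
        document.save(path_outfile, save_options)

        print(f"{infile} converted into {outfile}")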
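
An XPS document is a ZIP-based package that follows the Open Packaging Conventions: it contains a [Content_Types].xml part and a _rels/.rels file whose start-part relationship points at the FixedDocumentSequence. The standard-library sketch below (the function name is hypothetical, not an Aspose.PDF API) opens the package and returns that target, or None if the file does not look like an XPS package. It is a structural check only and does not render or validate pages.

```python
import xml.etree.ElementTree as ET
import zipfile

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def xps_start_part(file_path):
    """Return the FixedDocumentSequence target named in _rels/.rels, or None."""
    if not zipfile.is_zipfile(file_path):
        return None
    with zipfile.ZipFile(file_path) as package:
        names = set(package.namelist())
        # OPC packages declare content types and package-level relationships here.
        if "[Content_Types].xml" not in names or "_rels/.rels" not in names:
            return None
        root = ET.fromstring(package.read("_rels/.rels"))
    for rel in root.iter(REL_NS + "Relationship"):
        # The XPS start-part relationship type URI ends in 'fixedrepresentation'.
        if rel.get("Type", "").endswith("/fixedrepresentation"):
            return rel.get("Target")
    return None
```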

Convert PDF to MD

Aspose.PDF provides the ‘MarkdownSaveOptions’ class, which lets you convert a PDF document to Markdown (MD) format while preserving images and other resources.

  1. Load the source PDF using ‘ap.Document’.
  2. Create an instance of ‘MarkdownSaveOptions’.
  3. Set ‘resources_directory_name’ to ‘images’; extracted images will be stored in this folder.
  4. Set ‘use_image_html_tag’ to True so that images are referenced with HTML image tags.
  5. Save the converted Markdown document using the configured options.
  6. Print a confirmation message after conversion.

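    from os import makedirs, path

    import aspose.pdf as ap


    def convert_pdf_to_markdown(data_dir, infile, outfile):
        """Convert data_dir/infile to Markdown, storing images in an 'images' folder."""
        path_infile = path.join(data_dir, infile)
        path_outfile = path.join(data_dir, "python", outfile)
        # Make sure the output folder exists before saving.
        makedirs(path.dirname(path_outfile), exist_ok=True)

        document = ap.Document(path_infile)
        save_options = ap.MarkdownSaveOptions()
        # Optional, disabled here: save_options.extract_vector_graphics = True
        save_options.resources_directory_name = "images"
        save_options.use_image_html_tag = True
        document.save(path_outfile, save_options)

        print(f"{infile} converted into {outfile}")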

The result is a Markdown file containing the document text, with links to images stored in the specified images folder.
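
To confirm that every image the Markdown refers to was actually written, you can scan the file for image references and test each path. The sketch below assumes relative paths are resolved from the Markdown file's folder, and it handles both Markdown image syntax and HTML img tags (the latter being what use_image_html_tag is expected to produce). The helper name is hypothetical; remote URLs and data URIs are skipped.

```python
import re
from pathlib import Path

MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
HTML_IMAGE = re.compile(r"<img\b[^>]*\bsrc=[\"']([^\"']+)[\"']", re.IGNORECASE)


def missing_images(markdown_file):
    """Return local image references in markdown_file that do not exist on disk."""
    md_path = Path(markdown_file)
    text = md_path.read_text(encoding="utf-8")
    refs = MD_IMAGE.findall(text) + HTML_IMAGE.findall(text)
    missing = []
    for ref in refs:
        # Skip remote URLs and inline data URIs; only local files are checked.
        if "://" in ref or ref.startswith("data:"):
            continue
        # Relative references are resolved against the Markdown file's folder.
        if not (md_path.parent / ref).exists():
            missing.append(ref)
    return missing
```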

Convert PDF to MobiXML

This example converts a PDF document to the MOBI (MobiXML) format, which is commonly used for eBooks on Kindle devices.

  1. Load the source PDF document using ‘ap.Document’.
  2. Save the document with the format ‘ap.SaveFormat.MOBI_XML’.
  3. Print a confirmation message once the conversion is complete.

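    from os import makedirs, path

    import aspose.pdf as ap


    def convert_pdf_to_mobixml(data_dir, infile, outfile):
        """Convert data_dir/infile to data_dir/python/outfile in MobiXML format."""
        path_infile = path.join(data_dir, infile)
        path_outfile = path.join(data_dir, "python", outfile)
        # Make sure the output folder exists before saving.
        makedirs(path.dirname(path_outfile), exist_ok=True)

        document = ap.Document(path_infile)
        # No options object is needed here; the target SaveFormat is passed directly.
        document.save(path_outfile, ap.SaveFormat.MOBI_XML)

        print(f"{infile} converted into {outfile}")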