Convert various file formats to PDF using C#

Convert EPUB to PDF

Aspose.PDF for .NET allows you to convert EPUB files to PDF format.

EPUB (short for electronic publication) is a free and open e-book standard from the International Digital Publishing Forum (IDPF). Files have the extension .epub. EPUB is designed for reflowable content, meaning that an EPUB reader can optimize text for a particular display device.

EPUB also supports fixed-layout content. The format is intended as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale. It supersedes the Open eBook standard. EPUB 3 is also endorsed by the Book Industry Study Group (BISG), a leading book trade association for standardized best practices, research, information and events, for packaging of content.

Conversion steps:

  1. Create an instance of the EpubLoadOptions class.
  2. Create an instance of the Document class, passing the source file name and the load options.
  3. Save the document with the desired file name.

The following code snippet shows how to convert EPUB files to PDF format with C#.

public static void ConvertEPUBtoPDF()
{
    EpubLoadOptions option = new EpubLoadOptions();
    Document pdfDocument = new Document(_dataDir + "WebAssembly.epub", option);
    pdfDocument.Save(_dataDir + "epub_test.pdf");
}

You can also set the page size for the conversion. To define a new page size, create a SizeF object and pass it to the EpubLoadOptions constructor.

public static void ConvertEPUBtoPDFAdv()
{
    EpubLoadOptions option = new EpubLoadOptions(new SizeF(1190, 1684));
    Document pdfDocument = new Document(_dataDir + "WebAssembly.epub", option);
    pdfDocument.Save(_dataDir + "epub_test.pdf");
}
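
The sample above passes 1190 and 1684 to SizeF without naming a unit. Assuming the page size is expressed in PDF points (1/72 inch), which the text above does not state, these values are close to ISO A2 (420 x 594 mm). The sketch below shows the arithmetic; PageSizeHelper and its methods are hypothetical names introduced only for illustration and are not part of Aspose.PDF.

```csharp
using System;
using System.Drawing;

// Hypothetical helper (not part of Aspose.PDF): converts millimetres to
// points, assuming 1 point = 1/72 inch and 1 inch = 25.4 mm.
public static class PageSizeHelper
{
    public static float MmToPoints(float millimetres)
    {
        return millimetres / 25.4f * 72f;
    }

    // Builds a SizeF in points from a page size given in millimetres.
    public static SizeF FromMillimetres(float widthMm, float heightMm)
    {
        return new SizeF(MmToPoints(widthMm), MmToPoints(heightMm));
    }
}
```

Under the same assumption, PageSizeHelper.FromMillimetres(210, 297) would describe an A4 page that could be passed to the EpubLoadOptions constructor.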

Convert Markdown to PDF

This feature is supported by version 19.6 or greater.

Aspose.PDF for .NET can create a PDF document from an input Markdown file. To convert Markdown to PDF, initialize the Document using MdLoadOptions.

The following code snippet shows how to use this functionality with the Aspose.PDF library:

// The path to the documents directory.
string dataDir = RunExamples.GetDataDir_AsposePdf_DocumentConversion();
// Open Markdown document
Document pdfDocument = new Document(dataDir + "sample.md", new MdLoadOptions());
// Save document in PDF format
pdfDocument.Save(dataDir + "MarkdownToPDF.pdf");

Convert PCL to PDF

PCL (Printer Command Language) is a Hewlett-Packard printer language developed to access standard printer features. PCL levels 1 through 5e/5c are command-based languages using control sequences that are processed and interpreted in the order they are received. At a consumer level, PCL data streams are generated by a print driver. PCL output can also be easily generated by custom applications.

Currently, only PCL5 and older versions are supported.

The entries below give, for each set of commands, its support status (+), exceptions and a description.

Job control commands
  Support: +
  Exceptions: Duplex printing mode.
  Description: Control the print process: number of copies, output bin, simplex/duplex printing, left and top offsets, etc.

Page control commands
  Support: +
  Exceptions: Perforation Skip command.
  Description: Specify page size, margins, page orientation, inter-line and inter-character distances, etc.

Cursor positioning commands
  Support: +
  Exceptions: none listed.
  Description: Specify the cursor position and, hence, the origins of text, raster or vector images and details.

Font selection commands
  Support: +
  Exceptions:
    1. Transparent Print Data Command.
    2. Embedded soft fonts. In the current version, instead of creating a soft font, the library selects a suitable font from the existing "hard" TrueType fonts installed on the target machine. Suitability is defined by the width/height ratio. This feature works only for Bitmap and TrueType fonts and does not guarantee that text printed with a soft font will match the text in the source file, because the character codes in a soft font may not match the default ones.
    3. User Defined Symbol Sets.
  Description: Allow loading soft (embedded) fonts from a PCL file and managing them in memory.

Raster graphics commands
  Support: +
  Exceptions: Only black & white.
  Description: Allow loading raster images from a PCL file into memory and specifying raster parameters such as width, height, compression type, resolution, etc.

Color commands
  Support: +
  Exceptions: none listed.
  Description: Allow coloring for all printable objects.

Print Model commands
  Support: +
  Exceptions: none listed.
  Description: Allow filling text, raster images and rectangular areas with predefined and user-defined raster patterns, and specifying the transparency mode for patterns and the source raster image. Predefined patterns are hatching, cross-hatch and shading ones.

Rectangle area fill commands
  Support: +
  Exceptions: none listed.
  Description: Allow creating rectangular areas and filling them with patterns.

HP-GL/2 Vector Graphics commands
  Support: +
  Exceptions: Screened Vector Command (SV), Transparency Mode Command (TR), Transparent Data Command (TD), RO (Rotate Coordinate System), Scalable or Bitmap Fonts Command (SB), Character Slant Command (SL) and Extra Space (ES) are not implemented; DV (Define Variable Text Path) commands are implemented in a beta version.
  Description: Allow loading HP-GL/2 vector images from a PCL file into memory. A vector image has its origin at the lower left corner of the printable area and can be scaled, translated, rotated and clipped. A vector image can contain text (as labels) and geometric figures such as rectangle, circle, ellipse, line, arc, bezier curve and complex figures composed from the simple ones. Closed figures, including the letters of labels, can be filled with a solid fill or a vector pattern. A pattern can be hatching, cross-hatch, shading, raster user-defined, PCL hatching or cross-hatch, and PCL user-defined. PCL patterns are raster. Labels can be individually rotated, scaled, and directed in four directions: up, down, left and right. Left and Right directions involve one-after-another letter arrangement. Up and Down directions involve one-under-another letter arrangement.

Macros
  Support: not marked.
  Exceptions: none listed.
  Description: Allow loading a sequence of PCL commands into memory and using this sequence many times, for example, to print a page header or to set one formatting for a set of pages.

Unicode text
  Support: not marked.
  Exceptions: none listed.
  Description: Allow printing non-ASCII characters. Not implemented due to a lack of sample files with Unicode text.

PCL6 (PCL-XL)
  Support: not marked.
  Exceptions: Implemented only in a beta version because of a lack of test files. Embedded fonts are also not supported. The JetReady extension is not supported because the JetReady specification is not available.
  Description: Binary file format.

Converting a PCL file into PDF format

To convert PCL to PDF, Aspose.PDF provides the PclLoadOptions class, which is used to initialize the LoadOptions object. This object is then passed as an argument when the Document object is initialized, and it helps the PDF rendering engine determine the input format of the source document.

The following code snippet shows the process of converting a PCL file into PDF format.

public static void ConvertPCLtoPDF()
{
    PclLoadOptions options = new PclLoadOptions();
    Document pdfDocument = new Document(_dataDir + "demo.pcl", options);
    pdfDocument.Save(_dataDir + "pcl_test.pdf");
}

You can also track errors that occur during the conversion process. To do this, set the SupressErrors property of the PclLoadOptions object and then inspect its Exceptions collection.

public static void ConvertPCLtoPDFAdvanced()
{
    // Suppress conversion errors; they are then reported through options.Exceptions
    PclLoadOptions options = new PclLoadOptions { SupressErrors = true };
    Document pdfDocument = new Document(_dataDir + "demo.pcl", options);
    // Print the message of every error collected while loading the PCL file
    if (options.Exceptions != null)
    {
        foreach (var ex in options.Exceptions)
        {
            Console.WriteLine(ex.Message);
        }
    }
    pdfDocument.Save(_dataDir + "pcl_test.pdf");
}

Known Issues

  1. The origin of text strings and images can differ slightly from the ones in the source PCL file if the print direction is not 0°. The same applies to vector images if the coordinate system of the vector plot is rotated (preceded by an RO command).
  2. The origin of labels in vector images can differ from the ones in the source PCL file if the labels are influenced by a sequence of commands: Label Origin (LO), Define Variable Text Path (DV), Absolute Direction (DI) or Relative Direction (DR).
  3. Text can be read incorrectly if it must be rendered with a Bitmap or TrueType soft (embedded) font, because these fonts are currently only partially supported (see the exceptions in the “Supported features table”). In this situation, text is read correctly only if the character codes in the soft font correspond to the default ones. The style of the text can also differ from the one in the source PCL file, because a soft font header is not required to set a style.
  4. If the parsed PCL file contains Intellifont or Universal soft fonts, an exception is thrown, because Intellifont and Universal fonts are not supported at all.
  5. If the parsed PCL file contains macro commands, the result of parsing will differ strongly from the source file, because macro commands are not supported.

Convert Text to PDF

Aspose.PDF for .NET supports converting plain text and pre-formatted text files to PDF format.

Converting text to PDF means adding text fragments to a PDF page. Text files contain one of two types of text: pre-formatted (for example, 25 lines with 80 characters per line) and non-formatted (plain text). Depending on our needs, we can control this addition ourselves or entrust it to the library’s algorithms.

Convert plain text file to PDF

For a plain text file, we can use the following technique:

  1. Use a TextReader to read the whole text.
  2. Instantiate a Document object and add a new page to its Pages collection.
  3. Create a new TextFragment object and pass the text read by the TextReader to its constructor.
  4. Add the TextFragment object as a paragraph to the Paragraphs collection of the page. If the text does not fit on one page, the library automatically adds extra pages.
  5. Call the Save method of the Document class.

// For complete examples and data files, please go to https://github.com/aspose-pdf/Aspose.PDF-for-.NET
// The path to the documents directory.
string dataDir = RunExamples.GetDataDir_AsposePdf_DocumentConversion();
// Read the source text file
TextReader tr = new StreamReader(dataDir + "log.txt");

// Instantiate a Document object by calling its empty constructor
Document pdfDocument = new Document();

// Add a new page in Pages collection of Document
Page page = pdfDocument.Pages.Add();

// Create an instance of TextFragment and pass the text from the reader object to its constructor as argument
TextFragment text = new TextFragment(tr.ReadToEnd());

// Add a new text paragraph in paragraphs collection and pass the TextFragment object
page.Paragraphs.Add(text);

// Save resultant PDF file
pdfDocument.Save(dataDir + "TexttoPDF_out.pdf");

Convert pre-formatted text file to PDF

Converting pre-formatted text is similar to converting plain text, but it requires some additional steps, such as setting the margins, font type and font size. The font should be monospaced (for example, Courier New) so that the column alignment is preserved.

Follow these steps to convert pre-formatted text to PDF with C#:

  1. Read the whole text as an array of strings.
  2. Instantiate a Document object and add a new page to its Pages collection.
  3. Loop through the array of strings and add each string as a paragraph to the Paragraphs collection.

In this case, the library’s algorithm also adds extra pages, but we can control this process ourselves. The following example shows how to convert a pre-formatted text file (80x25) to a PDF document with A4 page size.

public static void ConvertPreFormattedTextToPdf()
{
    // Read the text file as array of string
    var lines = System.IO.File.ReadAllLines(_dataDir + "rfc822.txt");

    // Instantiate a Document object by calling its empty constructor
    Document pdfDocument = new Document();

    // Add a new page in Pages collection of Document
    Page page = pdfDocument.Pages.Add();

    // Set left and right margins for better presentation
    page.PageInfo.Margin.Left = 20;
    page.PageInfo.Margin.Right = 10;
    page.PageInfo.DefaultTextState.Font = FontRepository.FindFont("Courier New");
    page.PageInfo.DefaultTextState.FontSize = 12;

    foreach (var line in lines)
    {
        // Check whether the line starts with a "form feed" character
        // see https://en.wikipedia.org/wiki/Page_break
        if (line.StartsWith("\x0c"))
        {
            page = pdfDocument.Pages.Add();
            page.PageInfo.Margin.Left = 20;
            page.PageInfo.Margin.Right = 10;
            page.PageInfo.DefaultTextState.Font = FontRepository.FindFont("Courier New");
            page.PageInfo.DefaultTextState.FontSize = 12;
        }
        else
        {
            // Create an instance of TextFragment and
            // pass the line to its
            // constructor as argument
            TextFragment text = new TextFragment(line);

            // Add a new text paragraph in paragraphs collection and pass the TextFragment object
            page.Paragraphs.Add(text);
        }
    }

    // Save resultant PDF file
    pdfDocument.Save(_dataDir + "TexttoPDF_out.pdf");
}
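
The loop above starts a new page only when a line begins with a form feed. As one way to control pagination ourselves when the source file has no form feeds, the lines can first be grouped into pages of a fixed length, for example 25 lines for an 80x25 layout. The sketch below uses only the .NET base library; TextPager and SplitIntoPages are hypothetical names introduced for illustration and are not part of Aspose.PDF.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Aspose.PDF): groups pre-formatted lines
// into pages. A new page starts when a line begins with a form feed
// character or when the current page already holds maxLinesPerPage lines.
public static class TextPager
{
    // Always returns at least one page; that page is empty for empty input.
    public static List<List<string>> SplitIntoPages(IEnumerable<string> lines, int maxLinesPerPage)
    {
        if (lines == null) throw new ArgumentNullException(nameof(lines));
        if (maxLinesPerPage < 1) throw new ArgumentOutOfRangeException(nameof(maxLinesPerPage));

        var pages = new List<List<string>>();
        var current = new List<string>();
        pages.Add(current);

        foreach (var rawLine in lines)
        {
            var line = rawLine;
            bool formFeed = line.StartsWith("\f", StringComparison.Ordinal);
            if (formFeed)
            {
                // Drop the form feed but keep any text that follows it.
                line = line.Substring(1);
            }

            // Start a new page on a form feed or when the page is full,
            // unless the current page is still empty.
            if ((formFeed || current.Count >= maxLinesPerPage) && current.Count > 0)
            {
                current = new List<string>();
                pages.Add(current);
            }

            // A line that held only a form feed contributes no text.
            if (!formFeed || line.Length > 0)
            {
                current.Add(line);
            }
        }

        return pages;
    }
}
```

Each inner list could then be written to its own page, using Pages.Add() and one TextFragment per line as in the example above. Unlike that example, this sketch keeps any text that follows a form feed on the same line.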

Convert XPS to PDF

Aspose.PDF for .NET supports converting XPS files to PDF format.

The XPS file type is primarily associated with the XML Paper Specification by Microsoft Corporation. The XML Paper Specification (XPS), formerly codenamed Metro and subsuming the Next Generation Print Path (NGPP) marketing concept, is Microsoft’s initiative to integrate document creation and viewing into its Windows operating system.

To convert XPS to PDF, Aspose.PDF for .NET provides a class named XpsLoadOptions, which is used to initialize a LoadOptions object. This object is then passed as an argument when the Document object is initialized, and it helps the PDF rendering engine determine the source document’s input format.

The following code snippet shows the process of converting an XPS file into PDF format with C#.

// For complete examples and data files, please go to https://github.com/aspose-pdf/Aspose.PDF-for-.NET
// The path to the documents directory.
string dataDir = RunExamples.GetDataDir_AsposePdf_DocumentConversion();

// Instantiate LoadOption object using XPS load option
Aspose.Pdf.LoadOptions options = new XpsLoadOptions();

// Create document object
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(dataDir + "XPSToPDF.xps", options);

// Save the resultant PDF document
pdfDocument.Save(dataDir + "XPSToPDF_out.pdf");

Convert PostScript to PDF

Aspose.PDF for .NET supports converting PostScript files to PDF format. You can also specify a set of font folders to be used during the conversion.

To convert a PostScript file to PDF format, Aspose.PDF for .NET offers the PsLoadOptions class, which is used to initialize the LoadOptions object. This object can then be passed as an argument to the Document constructor, which helps the PDF rendering engine determine the format of the source document.

The following code snippet can be used to convert a PostScript file into PDF format with Aspose.PDF for .NET:

// For complete examples and data files, please go to https://github.com/aspose-pdf/Aspose.PDF-for-.NET
// The path to the documents directory.
string _dataDir = RunExamples.GetDataDir_AsposePdf_DocumentConversion();
// Create a new instance of PsLoadOptions
PsLoadOptions options = new PsLoadOptions();
// Open .ps document with created load options
Document pdfDocument = new Document(_dataDir + "input.ps", options);
// Save document
pdfDocument.Save(_dataDir + "PSToPDF.pdf");

Additionally, you can specify a set of font folders to be used during conversion:

public static void ConvertPostscriptToPDFAdvanced()
{
    PsLoadOptions options = new PsLoadOptions
    {
        FontsFolders = new[] { @"c:\tmp\fonts1", @"c:\tmp\fonts2" }
    };
    Document pdfDocument = new Document(_dataDir + "input.ps", options);
    pdfDocument.Save(_dataDir + "ps_test.pdf");
}

Convert XML to PDF

XML is a format used to store structured data. There are several ways to convert XML to PDF in Aspose.PDF:

  1. Transform any XML data to HTML using XSLT and convert HTML to PDF as described below
  2. Generate XML document using Aspose.PDF XSD Schema
  3. Use XML document based on XSL-FO standard
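
The first step of the first option, transforming XML to HTML with XSLT, can be sketched with the standard System.Xml.Xsl classes alone. This is a minimal illustration, not the library's own method: XmlToHtml and Transform are hypothetical names, and converting the resulting HTML to PDF is not shown here.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper (not part of Aspose.PDF): applies an XSLT stylesheet
// to an XML document and returns the transformation result as a string.
public static class XmlToHtml
{
    public static string Transform(string xml, string xslt)
    {
        var transform = new XslCompiledTransform();

        // Compile the stylesheet from its text.
        using (var xsltReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(xsltReader);
        }

        // Run the transformation and collect the output in memory.
        using (var xmlReader = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(xmlReader, null, output);
            return output.ToString();
        }
    }
}
```

The returned HTML string could be written to a file or a stream and then converted to PDF as a separate step.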

Convert XSL-FO to PDF

XSL-FO files can be converted to PDF using the usual Aspose.PDF technique: instantiate a Document object with XslFoLoadOptions. Sometimes, however, the input file has an incorrect structure. For this case, the XSL-FO converter allows setting an error handling strategy. You can choose ThrowExceptionImmediately, TryIgnore or InvokeCustomHandler.

public static void Convert_XSLFO_to_PDF()
{
    // Instantiate XslFoLoadOption object
    var options = new XslFoLoadOptions(".\\samples\\employees.xslt");
    // Set error handling strategy
    options.ParsingErrorsHandlingType = XslFoLoadOptions.ParsingErrorsHandlingTypes.ThrowExceptionImmediately;
    // Create Document object
    var pdfDocument = new Aspose.Pdf.Document(".\\samples\\employees.xml", options);
    pdfDocument.Save(_dataDir + "data_xml.pdf");
}

Convert LaTeX/TeX to PDF

A LaTeX file is a text file with markup in LaTeX, a format derived from the TeX system. LaTeX (pronounced lay-tek or lah-tek) is a document preparation system and document markup language. It is widely used for the communication and publication of scientific documents in many fields, including mathematics, physics, and computer science. It also has a prominent role in the preparation and publication of books and articles that contain complex multilingual materials, such as Sanskrit and Arabic, including critical editions. LaTeX uses the TeX typesetting program for formatting its output, and is itself written in the TeX macro language.

Aspose.PDF for .NET supports converting TeX files to PDF format. For this purpose, the Aspose.Pdf namespace has a class named TeXLoadOptions, which provides the capabilities to load LaTeX files and render the output in PDF format using the Document class. The following code snippet shows the process of converting a LaTeX file to PDF format with C#.

public static void ConvertTeXtoPDF()
{
    // Instantiate TeXLoadOptions object
    TeXLoadOptions options = new TeXLoadOptions();
    // Create Document object
    Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(_dataDir + "samplefile.tex", options);
    // Save the output in PDF file
    pdfDocument.Save(_dataDir + "TeXToPDF_out.pdf");
}