Convert HTML to DOCX in C#

A DOCX file is a Microsoft Word document that typically contains text but can also hold a wide range of data, including tables, raster and vector graphics, video, sound, and diagrams. The format is highly editable, easy to use, and manageable in size. It is popular because of the variety of options it gives users for writing any type of document, and it is one of the most widely used file formats, supported by numerous programs.

The ConvertHTML() methods of the Converter class are the most common way to convert HTML code into various formats. With Aspose.HTML for .NET, you can convert HTML to DOCX programmatically with full control over a wide range of conversion parameters. In this article, you will find information on how to convert HTML to DOCX using the ConvertHTML() methods of the Converter class, and how to apply the DocSaveOptions and ICreateStreamProvider parameters.

If you want to convert HTML to DOCX programmatically, please see the following C# code examples.

HTML to DOCX by a single line of code

The static methods of the Converter class are the easiest way to convert HTML code into various formats. You can convert HTML to DOCX in your C# application with literally a single line of code!

// Invoke the ConvertHTML() method to convert HTML to DOCX
Converter.ConvertHTML(@"<h1>Convert HTML to DOCX!</h1>", ".", new DocSaveOptions(), Path.Combine(OutputDir, "convert-with-single-line.docx"));

Convert HTML to DOCX

Converting a file to another format using the ConvertHTML() method involves a sequence of operations, including loading and saving the document:

  1. Load an HTML file using the HTMLDocument class.
  2. Create a new DocSaveOptions object.
  3. Use the ConvertHTML() method of the Converter class to save HTML as a DOCX file. You need to pass the HTMLDocument, DocSaveOptions, and output file path to the ConvertHTML() method to convert HTML to DOCX.

The following C# code snippet shows the process of converting HTML to DOCX using Aspose.HTML for .NET.

// Prepare a path to a source HTML file
string documentPath = Path.Combine(DataDir, "canvas.html");

// Prepare a path to save the converted file
string savePath = Path.Combine(OutputDir, "canvas-output.docx");

// Initialize an HTML document from the file
using var document = new HTMLDocument(documentPath);

// Initialize DocSaveOptions
var options = new DocSaveOptions();

// Convert HTML to DOCX
Converter.ConvertHTML(document, options, savePath);

Save Options

Aspose.HTML allows converting HTML to DOCX using default or custom save options. With DocSaveOptions, you can customize the rendering process by specifying the page size, margins, resolution, CSS, and so on.

| Property | Description |
|---|---|
| FontEmbeddingRule | Gets or sets the font embedding rule. Available values are Full and None. The default value is None. |
| Css | Gets a CssOptions object that is used to configure CSS properties processing. |
| DocumentFormat | Gets or sets the file format of the output document. The default value is DOCX. |
| PageSetup | Gets a page setup object that is used to configure the output page set. |
| HorizontalResolution | Sets the horizontal resolution for output images in pixels per inch. The default value is 300 dpi. |
| VerticalResolution | Sets the vertical resolution for output images in pixels per inch. The default value is 300 dpi. |

To learn more about DocSaveOptions, please read the Fine-Tuning Converters article.
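
As an illustration of the properties listed in the table above, here is a minimal sketch that embeds fonts and lowers the image resolution. The property names come from the table; the value types used here (the FontEmbeddingRule enum in Aspose.Html.Rendering.Doc and the Resolution.FromDotsPerInch() factory in Aspose.Html.Drawing) and the output file name are assumptions, so check them against the API reference for your library version.

```csharp
using Aspose.Html.Converters;
using Aspose.Html.Drawing;
using Aspose.Html.Rendering.Doc;
using Aspose.Html.Saving;

// Sketch only: the namespaces and value types below are assumptions
var options = new DocSaveOptions
{
    // Embed fonts into the output document (the default is None)
    FontEmbeddingRule = FontEmbeddingRule.Full,

    // Lower the resolution of output images from the default 300 dpi
    HorizontalResolution = Resolution.FromDotsPerInch(150),
    VerticalResolution = Resolution.FromDotsPerInch(150)
};

// Convert an inline HTML string using the customized options;
// "custom-options.docx" is a hypothetical output path
Converter.ConvertHTML(@"<h1>Custom save options</h1>", ".", options, "custom-options.docx");
```

Setting both resolution properties to the same value keeps images from being stretched along one axis.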

Convert HTML to DOCX using DocSaveOptions

To convert HTML to DOCX with custom DocSaveOptions, follow these steps:

  1. Load an HTML file using one of the HTMLDocument() constructors of the HTMLDocument class.
  2. Create a new DocSaveOptions object.
  3. Use the ConvertHTML() method of the Converter class to save HTML as a DOCX file. You need to pass the HTMLDocument, DocSaveOptions, and output file path to the ConvertHTML() method to convert HTML to DOCX.

The following example shows how to use DocSaveOptions and create a DOCX file with custom save options:

string documentPath = Path.Combine(OutputDir, "save-options.html");
string savePath = Path.Combine(OutputDir, "save-options-output.docx");

// Prepare HTML code and save it to a file
var code = "<h1>DocSaveOptions Class</h1>\r\n" +
           "<p>Using DocSaveOptions Class, you can programmatically apply a wide range of conversion parameters.</p>\r\n";

File.WriteAllText(documentPath, code);

// Initialize an HTML document from the HTML file
using var document = new HTMLDocument(documentPath);

// Initialize DocSaveOptions and set A5 (8.3 x 5.8 inches) as the page size
var options = new DocSaveOptions();
options.PageSetup.AnyPage = new Page(new Aspose.Html.Drawing.Size(Length.FromInches(8.3f), Length.FromInches(5.8f)));

// Convert HTML to DOCX
Converter.ConvertHTML(document, options, savePath);

The DocSaveOptions() constructor initializes an instance of the DocSaveOptions class that is passed to the ConvertHTML() method. The ConvertHTML() method takes the document, the options, and the output file path (savePath) and performs the conversion. The DocSaveOptions class provides numerous properties that give you full control over a wide range of parameters and improve the process of converting HTML to DOCX format. In the above example, we use the PageSetup property to specify the page size of the DOCX document.
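
The 8.3 and 5.8 inch values in the example above are rounded A5 dimensions. The following sketch shows where they come from, assuming the ISO 216 definition of A5 as 148 x 210 mm. It uses no Aspose.HTML calls; the remark about argument order assumes that Size takes the width first and the height second.

```csharp
using System;
using System.Globalization;

// ISO 216 defines A5 as 148 mm x 210 mm; one inch is exactly 25.4 mm
const double MillimetersPerInch = 25.4;

double shortSideInches = 148 / MillimetersPerInch;
double longSideInches = 210 / MillimetersPerInch;

// Assumption: Size(width, height). Passing the long side first, as in
// the example above, would then describe a landscape A5 page.
Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
    "A5 short side: {0:F2} in, long side: {1:F2} in", shortSideInches, longSideInches));
```

For a portrait page under that assumption, swap the two arguments so that the short side is passed first.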

Output Stream Providers

If you need to save files to remote storage (e.g., cloud, database, etc.), you can implement the ICreateStreamProvider interface to have manual control over the file creation process. This interface is designed as a callback object that creates a stream at the beginning of the document/page (depending on the output format) and releases the previously created stream after the document/page has been rendered.

Aspose.HTML for .NET provides various types of output formats for rendering operations. Some of these formats produce a single output file (for instance, PDF and XPS), while others create multiple files (image formats such as JPG and PNG).

The example below shows how to implement and use your own MemoryStreamProvider in the application:

using System.Collections.Generic;
using System.IO;

class MemoryStreamProvider : Aspose.Html.IO.ICreateStreamProvider
{
    // List of MemoryStream objects created during the document rendering
    public List<MemoryStream> Streams { get; } = new List<MemoryStream>();

    public Stream GetStream(string name, string extension)
    {
        // This method is called when only one output stream is required, for instance for XPS, PDF or TIFF formats
        MemoryStream result = new MemoryStream();
        Streams.Add(result);
        return result;
    }

    public Stream GetStream(string name, string extension, int page)
    {
        // This method is called when multiple output streams are required, for instance when rendering HTML to a list of image files (JPG, PNG, etc.)
        MemoryStream result = new MemoryStream();
        Streams.Add(result);
        return result;
    }

    public void ReleaseStream(Stream stream)
    {
        // Here you can release the stream filled with data and, for instance, flush it to the hard drive
    }

    public void Dispose()
    {
        // Release resources
        foreach (var stream in Streams)
            stream.Dispose();
    }
}

The following code snippet demonstrates how to convert an HTML file to a DOCX file using a memory stream.

// Create an instance of MemoryStreamProvider
using var streamProvider = new MemoryStreamProvider();

// Initialize an HTML document
using var document = new HTMLDocument(@"<h1>Convert HTML to DOCX File Format!</h1>", ".");

// Convert HTML to DOCX using the MemoryStreamProvider
Converter.ConvertHTML(document, new DocSaveOptions(), streamProvider);

// Get access to the memory stream that contains the result data
var memory = streamProvider.Streams.First();
memory.Seek(0, SeekOrigin.Begin);

// Flush the result data to the output file
using (FileStream fs = File.Create(Path.Combine(OutputDir, "stream-provider.docx")))
{
    memory.CopyTo(fs);
}
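
If the result does not need to pass through memory first, the same interface can write straight to disk. The sketch below is a hypothetical DirectoryStreamProvider, not a class shipped with Aspose.HTML. It implements the same four members as the MemoryStreamProvider above, and because it is not known here whether the extension argument arrives with a leading dot, the helper accepts both forms.

```csharp
using System;
using System.IO;
using Aspose.Html.IO;

// Hypothetical provider: every requested stream is a file in one directory
class DirectoryStreamProvider : ICreateStreamProvider
{
    private readonly string directory;

    public DirectoryStreamProvider(string directory)
    {
        this.directory = directory;
        Directory.CreateDirectory(directory); // does nothing if it already exists
    }

    // Called when a single output stream is required
    public Stream GetStream(string name, string extension)
        => File.Create(Path.Combine(directory, name + Normalize(extension)));

    // Called once per page when multiple output streams are required
    public Stream GetStream(string name, string extension, int page)
        => File.Create(Path.Combine(directory, $"{name}_{page}{Normalize(extension)}"));

    // Disposing the file stream flushes the rendered data to disk
    public void ReleaseStream(Stream stream) => stream.Dispose();

    // No streams are cached, so nothing remains to release
    public void Dispose() { }

    // Assumption: the extension may come with or without a leading dot
    private static string Normalize(string extension)
        => extension.StartsWith(".", StringComparison.Ordinal) ? extension : "." + extension;
}
```

An instance can be passed to ConvertHTML() in place of the MemoryStreamProvider, for example Converter.ConvertHTML(document, new DocSaveOptions(), new DirectoryStreamProvider(OutputDir)). The same shape applies to remote storage: open the upload stream in GetStream() and complete the upload in ReleaseStream().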

Download the Aspose.HTML for .NET library, which allows you to successfully, quickly, and easily convert your HTML, MHTML, EPUB, SVG, and Markdown documents to the most popular formats.

Aspose.HTML offers a free online HTML to DOCX Converter that converts HTML to DOCX quickly, easily, and with high quality. Just upload your files, convert them, and get the results in a few seconds!
