Convert HTML to DOCX | C#
A DOCX file is a Microsoft Word document that typically contains text but can also hold a wide range of data, including tables, raster and vector graphics, video, sounds, and diagrams. DOCX files are highly editable, easy to use, and manageable in size. The format is popular because of the variety of options it gives users for writing any type of document. It is one of the most widely used document formats and is supported by numerous programs.
Using the Converter.ConvertHTML() methods is the most common way to convert HTML code into various formats. With Aspose.HTML, you can convert HTML to DOCX programmatically with full control over a wide range of conversion parameters. In this article, you will find information on how to convert HTML to DOCX using the ConvertHTML() methods of the Converter class and how to apply the DocSaveOptions and ICreateStreamProvider parameters.
Online HTML Converter
You can check the Aspose.HTML API functionality and convert HTML in real time. Load HTML from your local file system, select the output format, and run the example. The save options are set to their defaults. You will immediately receive the result as a separate file.
If you want to convert HTML to DOCX programmatically, please see the following C# code examples.
HTML to DOCX with a single line of code
The static methods of the Converter class are the easiest way to convert HTML code into various formats. You can convert HTML to DOCX in your C# application with literally a single line of code!
```csharp
using System.IO;
using Aspose.Html.Converters;
using Aspose.Html.Saving;
// ...
// Invoke the ConvertHTML() method to convert HTML to DOCX
Converter.ConvertHTML(@"<h1>Convert HTML to DOCX!</h1>", ".", new DocSaveOptions(), Path.Combine(OutputDir, "convert-with-single-line.docx"));
```
Convert HTML to DOCX
Converting a file to another format with the ConvertHTML() method is a sequence of operations that includes loading and saving the document:
- Load an HTML file using the HTMLDocument class.
- Create a new DocSaveOptions object.
- Use the ConvertHTML() method of the Converter class to save HTML as a DOCX file. You need to pass the HTMLDocument, DocSaveOptions, and output file path to the ConvertHTML() method to convert HTML to DOCX.
Please take a look at the following C# code snippet, which shows how to convert HTML to DOCX using Aspose.HTML for .NET.
```csharp
using System.IO;
using Aspose.Html;
using Aspose.Html.Converters;
using Aspose.Html.Saving;
// ... (DataDir and OutputDir are folder paths defined in the omitted code)

// Prepare a path to a source HTML file
string documentPath = Path.Combine(DataDir, "canvas.html");

// Prepare a path for saving the converted file
string savePath = Path.Combine(OutputDir, "canvas-output.docx");

// Initialize an HTML document from the file
using var document = new HTMLDocument(documentPath);

// Initialize DocSaveOptions
var options = new DocSaveOptions();

// Convert HTML to DOCX
Converter.ConvertHTML(document, options, savePath);
```
You can download the complete examples and data files from GitHub.
Save Options
Aspose.HTML lets you convert HTML to DOCX with either default or custom save options. With DocSaveOptions, you can customize the rendering process and specify the page size, margins, resolution, CSS, and more.
Property | Description |
---|---|
FontEmbeddingRule | Gets or sets the font embedding rule. Available values are Full and None. The default value is None. |
Css | Gets a CssOptions object that is used to configure CSS property processing. |
DocumentFormat | Gets or sets the file format of the output document. The default value is DOCX. |
PageSetup | Gets a page setup object that is used to configure the output page set. |
HorizontalResolution | Gets or sets the horizontal resolution for output images, in pixels per inch. The default value is 300 dpi. |
VerticalResolution | Gets or sets the vertical resolution for output images, in pixels per inch. The default value is 300 dpi. |
To learn more about DocSaveOptions, please read the Fine-Tuning Converters article.
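Resolution is measured in pixels (dots) per inch. An image's pixel count is therefore its physical size in inches multiplied by the resolution. The minimal sketch below shows this relation at the 300 dpi default. ResolutionMath and InchesToPixels are hypothetical helpers, not part of Aspose.HTML. The rounding rule is an assumption made for illustration, not necessarily what the library does internally.

```csharp
using System;

// Hypothetical helper, not part of Aspose.HTML: relates a physical size to a pixel count.
public static class ResolutionMath
{
    // pixels = inches x pixels-per-inch, rounded to the nearest whole pixel (rounding rule assumed)
    public static int InchesToPixels(double inches, double dpi)
    {
        if (inches < 0) throw new ArgumentOutOfRangeException(nameof(inches));
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        return (int)Math.Round(inches * dpi, MidpointRounding.AwayFromZero);
    }
}
```

For example, InchesToPixels(2, 300) returns 600. The same 2-inch image at 96 dpi has only 192 pixels, which is why a higher resolution gives sharper but larger images.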
Convert HTML to DOCX using DocSaveOptions
To convert HTML to DOCX with custom DocSaveOptions, follow these steps:
- Load an HTML file using one of the HTMLDocument() constructors of the HTMLDocument class.
- Create a new DocSaveOptions object and set the properties you need, for example, PageSetup.
- Use the ConvertHTML() method of the Converter class to save HTML as a DOCX file. You need to pass the HTMLDocument, DocSaveOptions, and output file path to the ConvertHTML() method to convert HTML to DOCX.
The following example shows how to use DocSaveOptions and create a DOCX file with custom save options:
```csharp
using System.IO;
using Aspose.Html;
using Aspose.Html.Converters;
using Aspose.Html.Saving;
using Aspose.Html.Drawing;
// ...
string documentPath = Path.Combine(OutputDir, "save-options.html");
string savePath = Path.Combine(OutputDir, "save-options-output.docx");

// Prepare HTML code and save it to a file
var code = "<h1>DocSaveOptions Class</h1>\r\n" +
    "<p>Using DocSaveOptions Class, you can programmatically apply a wide range of conversion parameters.</p>\r\n";

File.WriteAllText(documentPath, code);

// Initialize an HTML document from the HTML file
using var document = new HTMLDocument(documentPath);

// Initialize DocSaveOptions and set the page size to 8.3 x 5.8 inches (A5 dimensions)
var options = new DocSaveOptions();
options.PageSetup.AnyPage = new Page(new Aspose.Html.Drawing.Size(Length.FromInches(8.3f), Length.FromInches(5.8f)));

// Convert HTML to DOCX
Converter.ConvertHTML(document, options, savePath);
```
The DocSaveOptions() constructor creates an instance of the DocSaveOptions class, which is then passed to the ConvertHTML() method. The ConvertHTML() method takes the document, the options, and the output file path savePath, and performs the conversion. The DocSaveOptions class provides numerous properties that give you full control over a wide range of conversion parameters. In the above example, we use the PageSetup property to specify the page size of the DOCX document.
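The PageSetup example passes 8.3 × 5.8 inches. ISO 216 defines A5 as 148 × 210 mm, and one inch is exactly 25.4 mm, so A5 is about 5.83 × 8.27 inches. The example rounds these values to one decimal. Assuming the first Size argument is the width, it also puts the long side first, which produces a landscape page. Swapping the two values would then give portrait A5. The sketch below performs the conversion. PaperSize and its methods are hypothetical helpers, not part of Aspose.HTML.

```csharp
using System;

// Hypothetical helper (not part of Aspose.HTML) for converting paper sizes to inches.
public static class PaperSize
{
    // By definition, one inch is exactly 25.4 mm
    private const double MillimetresPerInch = 25.4;

    public static double MmToInches(double millimetres) => millimetres / MillimetresPerInch;

    // ISO 216 A5 is 148 mm x 210 mm; returns (width, height) in inches
    public static (double Width, double Height) A5Inches(bool landscape)
    {
        double shortSide = MmToInches(148);
        double longSide = MmToInches(210);
        return landscape ? (longSide, shortSide) : (shortSide, longSide);
    }
}
```

For example, A5Inches(landscape: true) returns roughly (8.27, 5.83), which rounds to the 8.3 × 5.8 inches used above.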
Output Stream Providers
If you need to save files in remote storage (e.g., a cloud service or a database), you can implement the ICreateStreamProvider interface to get manual control over the file creation process. This interface is designed as a callback object. It creates a stream at the beginning of the document or page (depending on the output format) and releases that stream after the document or page has been rendered.
Aspose.HTML for .NET supports various output formats for rendering operations. Some of these formats produce a single output file (for instance, PDF or XPS), while others create multiple files (image formats such as JPG, PNG, etc.).
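The callback flow described above can be illustrated without the library. In the minimal sketch below, a provider "uploads" each output to a dictionary standing in for remote storage when ReleaseStream is called. All names here are hypothetical. IStreamProviderSketch only mirrors the method names of ICreateStreamProvider so that the code runs on its own. FakeRenderer imitates the call order, with one stream per document or one per page. The key format and the 0-based page index are assumptions, and the real renderer's details may differ.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in that mirrors the method names of Aspose.Html.IO.ICreateStreamProvider
public interface IStreamProviderSketch : IDisposable
{
    Stream GetStream(string name, string extension);
    Stream GetStream(string name, string extension, int page);
    void ReleaseStream(Stream stream);
}

// Keeps released outputs in a dictionary that stands in for remote storage (cloud, database, ...)
public sealed class InMemoryStorageProvider : IStreamProviderSketch
{
    // Streams handed out but not yet released, mapped to their storage keys
    private readonly Dictionary<Stream, string> _open = new Dictionary<Stream, string>();

    // Storage key -> bytes of every output that has been released ("uploaded")
    public Dictionary<string, byte[]> Storage { get; } = new Dictionary<string, byte[]>();

    // Single-output formats: one stream for the whole document
    public Stream GetStream(string name, string extension) => Open(name + extension);

    // Multi-output formats: one stream per page (the key format is an assumption)
    public Stream GetStream(string name, string extension, int page) => Open($"{name}_{page}{extension}");

    private Stream Open(string key)
    {
        var stream = new MemoryStream();
        _open[stream] = key;
        return stream;
    }

    public void ReleaseStream(Stream stream)
    {
        // Writing is finished: "upload" the bytes, then free the buffer
        if (_open.Remove(stream, out string key))
        {
            Storage[key] = ((MemoryStream)stream).ToArray();
        }
        stream.Dispose();
    }

    public void Dispose()
    {
        // Free any stream that was never released
        foreach (var stream in _open.Keys) stream.Dispose();
        _open.Clear();
    }
}

// Hypothetical renderer that imitates the call order described above
public static class FakeRenderer
{
    public static void Render(IStreamProviderSketch provider, string name, string extension,
                              int pageCount, bool singleOutput)
    {
        if (singleOutput)
        {
            // e.g. PDF or XPS: one stream requested at the start, released at the end
            Stream stream = provider.GetStream(name, extension);
            for (int page = 0; page < pageCount; page++) stream.WriteByte((byte)page);
            provider.ReleaseStream(stream);
        }
        else
        {
            // e.g. PNG or JPG: one stream per page, released after that page is rendered
            for (int page = 0; page < pageCount; page++)
            {
                Stream stream = provider.GetStream(name, extension, page);
                stream.WriteByte((byte)page);
                provider.ReleaseStream(stream);
            }
        }
    }
}
```

In a real provider, ReleaseStream is where you would send the data to your storage. Doing this per stream means each page can be uploaded as soon as it is finished instead of holding every output in memory until the end.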
The example below shows how to implement and use your own MemoryStreamProvider in the application:
```csharp
using System.IO;
using System.Collections.Generic;
// ...
class MemoryStreamProvider : Aspose.Html.IO.ICreateStreamProvider
{
    // List of MemoryStream objects created during the document rendering
    public List<MemoryStream> Streams { get; } = new List<MemoryStream>();

    public Stream GetStream(string name, string extension)
    {
        // This method is called when only one output stream is required, for instance, for XPS, PDF, or TIFF formats
        MemoryStream result = new MemoryStream();
        Streams.Add(result);
        return result;
    }

    public Stream GetStream(string name, string extension, int page)
    {
        // This method is called when multiple output streams are required, for instance, when rendering HTML to a set of image files (JPG, PNG, etc.)
        MemoryStream result = new MemoryStream();
        Streams.Add(result);
        return result;
    }

    public void ReleaseStream(Stream stream)
    {
        // Here you can release the stream filled with data and, for instance, flush it to the hard drive
    }

    public void Dispose()
    {
        // Release resources
        foreach (var stream in Streams)
            stream.Dispose();
    }
}
```
```csharp
using System.IO;
using System.Linq;
using Aspose.Html;
using Aspose.Html.Converters;
using Aspose.Html.Saving;
// ...
// Create an instance of MemoryStreamProvider
using var streamProvider = new MemoryStreamProvider();

// Initialize an HTML document
using var document = new HTMLDocument(@"<h1>Convert HTML to DOCX File Format!</h1>", ".");

// Convert HTML to DOCX using the MemoryStreamProvider
Converter.ConvertHTML(document, new DocSaveOptions(), streamProvider);

// Get access to the memory stream that contains the result data
var memory = streamProvider.Streams.First();
memory.Seek(0, SeekOrigin.Begin);

// Flush the result data to the output file
using (FileStream fs = File.Create(Path.Combine(OutputDir, "stream-provider.docx")))
{
    memory.CopyTo(fs);
}
```
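A DOCX file is an Office Open XML package: a ZIP archive whose parts include [Content_Types].xml and, conventionally, word/document.xml. Because the MemoryStreamProvider example keeps the result in memory, you can sanity-check those bytes before writing them to disk or uploading them. DocxCheck is a hypothetical helper built only on System.IO.Compression. It is a quick heuristic that looks for two part names, not a full validation of the document.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of Aspose.HTML): heuristic check that a stream holds a DOCX package
public static class DocxCheck
{
    public static bool LooksLikeDocx(Stream stream)
    {
        if (!stream.CanSeek) throw new ArgumentException("A seekable stream is required.", nameof(stream));
        stream.Seek(0, SeekOrigin.Begin);
        try
        {
            // leaveOpen: true keeps the caller's stream usable after the archive is disposed
            using var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true);
            return zip.GetEntry("[Content_Types].xml") != null
                && zip.GetEntry("word/document.xml") != null;
        }
        catch (InvalidDataException)
        {
            // The data is not a ZIP archive at all
            return false;
        }
        finally
        {
            // Rewind so the caller can copy the complete data afterwards
            stream.Seek(0, SeekOrigin.Begin);
        }
    }
}
```

For example, you can call DocxCheck.LooksLikeDocx(memory) before memory.CopyTo(fs). The helper rewinds the stream before it returns, so the copy still writes the complete file.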
Download our Aspose.HTML for .NET library, which lets you quickly and easily convert your HTML, MHTML, EPUB, SVG, and Markdown documents to the most popular formats.
Aspose.HTML offers a free online HTML to DOCX Converter that converts HTML to DOCX quickly, easily, and with high quality. Just upload your files, convert them, and get the results in a few seconds!