Document Features Supported on HTML Export
You can check the quality of HTML Export and view the results online at this link:
Note that not all Microsoft Word document features are available in HTML format and some features may be lost or converted to an image.
If you are looking for a way to store documents in a database easily, consider using the WordML or FlatOPC format. Both formats are fully XML-based, which makes them easy to store in a database, and both are native Word formats, which preserves full fidelity for Microsoft Word features such as WordArt and text boxes.
Aspose.Words saves any loaded document to HTML that conforms to the HTML 4.0 or XHTML 1.0 specification. EPUB documents are exported as EPUB 2.0. There are plans to support the HTML 5 and EPUB 3.0 specifications as well. There are also numerous save options available to control how a document is exported to HTML. Here are some examples of what you can do:
- Control the CSS style sheet type.
- Specify the directory or streams where images should be saved.
- Specify how the URL for an image is constructed.
- Split the internal HTML files when saving to HTML or EPUB to restrict the size of each HTML part to less than 300 KB. Some eReaders open EPUB files that contain HTML files larger than this slowly or not at all. It is therefore recommended to export EPUBs using this option so that all devices can read the file easily and correctly.
- Export images as embedded Base64.
- Export font size in relative units (em).
- Save fonts with the HTML output.
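The Base64, relative font size, and part-splitting options above all change what ends up in the generated markup. The following Python sketch is not Aspose.Words code and does not use its API; it only illustrates, using the standard library, the kind of output those options imply. The function names, the 12 pt base font size, and the greedy splitting strategy are assumptions made for this illustration.

```python
import base64


def image_to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Return a data: URI that embeds the image bytes directly in the HTML."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def points_to_em(size_pt: float, base_pt: float = 12.0) -> str:
    """Express a point size relative to an assumed base font size."""
    return f"{size_pt / base_pt:g}em"


def split_into_parts(fragments, limit_bytes=300 * 1024):
    """Greedily pack HTML fragments into parts no larger than limit_bytes.

    A single fragment that is itself larger than the limit is kept whole
    in its own part, so the limit is a target rather than a guarantee.
    """
    parts, current, size = [], [], 0
    for fragment in fragments:
        fragment_size = len(fragment.encode("utf-8"))
        if current and size + fragment_size > limit_bytes:
            parts.append("".join(current))
            current, size = [], 0
        current.append(fragment)
        size += fragment_size
    if current:
        parts.append("".join(current))
    return parts


# The 8-byte PNG signature stands in for real image data.
sample = b"\x89PNG\r\n\x1a\n"
uri = image_to_data_uri(sample)
font_size = points_to_em(18)
html = f'<p style="font-size:{font_size}"><img src="{uri}" alt=""></p>'

# Tiny limit so the split is visible with short strings.
parts = split_into_parts(["a" * 200, "b" * 200, "c" * 50], limit_bytes=300)
```

Embedding images this way keeps the output in a single file at the cost of larger HTML, which is one reason the size-based split matters for EPUB readers.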
Some features that are unsupported in HTML are exported as an image. The Aspose.Words rendering engine takes care of rendering the feature to the image. In such cases, the level of support for the rendered feature can be found under the “Save to Image Format” supported features section.
You can also choose to create your own HTML writer for your own custom needs by building off the Aspose.Words rich DOM. Using the DocumentVisitor you can visit each node and build the HTML node by node.
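The node-by-node approach can be pictured with a small visitor sketch. This is a self-contained Python illustration over a toy document model, not the Aspose.Words DOM or its DocumentVisitor class; every class and method name here (`Document`, `Paragraph`, `Run`, `HtmlWriter`, `visit_run`, and so on) is hypothetical and chosen only to show the pattern.

```python
from html import escape


class Run:
    """A span of text with uniform formatting (toy model)."""

    def __init__(self, text, bold=False):
        self.text = text
        self.bold = bold

    def accept(self, visitor):
        visitor.visit_run(self)


class Paragraph:
    """A paragraph made of runs (toy model)."""

    def __init__(self, runs):
        self.runs = runs

    def accept(self, visitor):
        visitor.visit_paragraph_start(self)
        for run in self.runs:
            run.accept(visitor)
        visitor.visit_paragraph_end(self)


class Document:
    """The root node; it walks its paragraphs in order (toy model)."""

    def __init__(self, paragraphs):
        self.paragraphs = paragraphs

    def accept(self, visitor):
        for paragraph in self.paragraphs:
            paragraph.accept(visitor)


class HtmlWriter:
    """A visitor that emits HTML as each node is visited."""

    def __init__(self):
        self._out = []

    def visit_paragraph_start(self, paragraph):
        self._out.append("<p>")

    def visit_paragraph_end(self, paragraph):
        self._out.append("</p>")

    def visit_run(self, run):
        text = escape(run.text)  # escape &, < and > in document text
        self._out.append(f"<b>{text}</b>" if run.bold else text)

    def html(self):
        return "".join(self._out)


doc = Document([Paragraph([Run("Hello, "), Run("world", bold=True), Run(" & more")])])
writer = HtmlWriter()
doc.accept(writer)
result = writer.html()
```

Because the traversal lives in the nodes and the output logic lives in the visitor, a custom writer can change the generated markup without touching the document model.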
Currently, most of the special Microsoft “Mso” attributes, which Microsoft Word normally adds to HTML output so that it can be round-tripped back to Word formats, are not written during export to HTML or MHTML. This makes the HTML produced by Aspose.Words much cleaner than the output produced by Microsoft Word, which is often bloated with these round-trip attributes.
In the future, we will add full support for these attributes on import and add an option to control whether they are written during export.
See the following links in the documentation for further information:
|Built-In Properties||Yes||Built-in properties such as word and character counts can be updated using Aspose.Words, but they are not updated automatically on save. Instead, you need to update these properties explicitly using the appropriate Document member. We will add automatic updating of these properties in a future version. There is a save option that controls whether document properties are exported. The Title, Keywords and Description properties are always exported as title and meta tags to HTML or MHTML, and as the appropriate Dublin Core tags when saving as EPUB. Additional built-in properties are exported as custom <o:> tags. In EPUB format, properties are also exported as Dublin Core tags.|
|Custom Properties||Yes||Custom properties are exported as custom <o:> tags to HTML.||- HtmlSaveOptions.ExportDocumentProperties|
|Custom Payload Part||N/A|
|Custom XML Data Storage||N/A|
|Embedded Package||N/A||Exported as a plain image.|
|Glossary Document/Quick Parts/Auto Text||N/A|
|Hyphenation||Planned||Paragraphs are exported as normal.|
|Key Map Customizations||N/A|
|Mail Merge Recipient Data||N/A|
|Office Math||Planned||It is planned to export Office Math as an image to formats that do not have native support for it.|
|Themes||Yes||Theme formatting is exported as direct formatting to HTML. Only some theme formatting, such as fonts, is supported.|
|VBA Project (Macro)||N/A||Macros are not exported to HTML based formats.|
|VBA Project Digital Signature||N/A|
|Background||Yes||Only a solid background is exported. It is exported as style="background:xxx" on the <body> tag. There are plans to export a background shape as style-background.|
|Thumbnail||Yes||You can include a cover image in output EPUB documents either by importing an existing image or by generating a thumbnail of one of the document's pages using Aspose.Words.||- BuiltInDocumentProperties.Thumbnail|
|Embedding Fonts||Yes||There is an option to subset and export font resources to EPUB, MHTML and HTML. Fonts that are embedded in the original DOCX can be optionally exported.|
|Embed Only Non-Standard Fonts||N/A|
|Bibliography||Yes||Bibliography text is saved to HTML formats as normal text.|
|Sources/Citations||Yes||Bibliography sources are not saved to HTML.|
|Allow Only Comments||N/A|
|Allow Only Form Fields||N/A|
|Allow Only Revisions||N/A|
|Limit Formatting to Selection of Styles||N/A|
|Protection Password (Legacy)||N/A|
|Protection Password (OOXML)||N/A|
Only some settings can be exported.
|Asian Typography Settings||N/A|
|Mail Merge Settings||N/A|