Translate Markdown to Document Object Model (DOM)
To programmatically read, manipulate, and modify the content and formatting of a document, you need to translate it to the Aspose.Words Document Object Model (DOM).
In contrast to Word documents, Markdown does not conform to the DOM described in the “Aspose.Words Document Object Model (DOM)” article. However, Aspose.Words provides its own mechanism for translating Markdown documents to the DOM and back, so that we can work with their elements such as text formatting, tables, headings, and others.
This article explains how the various Markdown features are translated into the Aspose.Words DOM and back to Markdown format.
Complexity of Translation Markdown – DOM – Markdown
The main difficulty of this mechanism is not only to translate Markdown to DOM, but also to do the reverse transformation – to save the document back to Markdown format with minimal loss. There are elements, such as multilevel quotes, for which the reverse transformation is not trivial.
Our translation engine allows users not only to work with complex elements in an existing Markdown document, but also to create their own document in Markdown format with the original structure from scratch. To create various elements, you need to use styles with specific names according to certain rules described later in this article. Such styles can be created programmatically.
Common Translation Principles
We use Font formatting for inline blocks. When there is no direct correspondence for a Markdown feature in the Aspose.Words DOM, we use a character style whose name starts with specific keywords.
For container blocks, we use style inheritance to denote nested Markdown features. Even when there are no nested features, we use paragraph styles whose names start with specific keywords.
Bulleted and ordered lists are container blocks in Markdown as well. Their nesting is represented in the DOM in the same way as for all other container blocks, using style inheritance. In addition, lists in the DOM have corresponding number formatting in either the list style or the paragraph formatting.
We use Font formatting when translating the Bold, Italic, or Strikethrough inline Markdown features.
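As a rough illustration of this mapping, the sketch below models a run of text with three font flags and renders it back to Markdown. The `Run` class and the `to_markdown` function are hypothetical stand-ins written for this example, not the Aspose.Words API, and the delimiters assume the common `**`, `*`, and `~~` syntax.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical stand-in for a DOM run that carries font formatting."""
    text: str
    bold: bool = False
    italic: bool = False
    strike_through: bool = False


def to_markdown(run: Run) -> str:
    """Wrap the run text in one Markdown delimiter pair per enabled font flag."""
    result = run.text
    if run.strike_through:
        result = f"~~{result}~~"
    if run.italic:
        result = f"*{result}*"
    if run.bold:
        result = f"**{result}**"
    return result


if __name__ == "__main__":
    print(to_markdown(Run("important", bold=True, italic=True)))
```

The point of the sketch is that no dedicated style is needed here: the three features map directly onto formatting flags that a run already has.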
We use a character style with a name that starts with the word InlineCode, followed by an optional dot (.) and a number of backticks (`), for the InlineCode feature. If the number of backticks is omitted, one backtick is used by default.
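The naming rule for InlineCode styles can be sketched in plain Python. This is only one reading of the rule (the word InlineCode, then an optional dot, then an optional count); the helper names `backtick_count` and `render_inline_code` are invented for this example and are not part of Aspose.Words.

```python
import re
from typing import Optional

# Assumed reading of the rule: "InlineCode", an optional dot, an optional count.
_INLINE_CODE = re.compile(r"^InlineCode\.?(\d*)")


def backtick_count(style_name: str) -> Optional[int]:
    """Return the backtick count encoded in a style name, or None if the
    name does not start with InlineCode."""
    match = _INLINE_CODE.match(style_name)
    if match is None:
        return None
    digits = match.group(1)
    # A missing count falls back to a single backtick.
    return int(digits) if digits else 1


def render_inline_code(text: str, style_name: str) -> str:
    """Wrap text in the number of backticks implied by the style name."""
    count = backtick_count(style_name)
    if count is None:
        return text
    fence = "`" * count
    return f"{fence}{text}{fence}"


if __name__ == "__main__":
    print(render_inline_code("x = 1", "InlineCode.2"))
```

Encoding the count in the style name is what lets a round trip keep the original number of backticks, which would otherwise be lost once the text is in the DOM.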
Autolinks and links are both represented by the FieldHyperlink class, and images are represented by the Shape class.
A document is a sequence of container blocks such as headings, paragraphs, lists, quotes, and others. Container blocks can be divided into 2 classes: Leaf blocks and Complex Containers. Leaf blocks can only contain inline content. Complex containers, in turn, can contain other container blocks, including Leaf blocks.
The table below shows examples of using Markdown Leaf blocks in Aspose.Words:
A horizontal rule is a simple paragraph with a corresponding HorizontalRule shape.
This is translated into a built-in style, and the style name must match the specified pattern exactly (no suffixes or prefixes are allowed).
Otherwise, it is treated as a regular paragraph with the corresponding style.
If (N >= 2), then
Any suffix is allowed, but the Aspose.Words importer uses the numbers “1” and “2”, respectively.
The table below shows examples of using Markdown Complex Containers in Aspose.Words:
The suffix in the style name is optional, but the Aspose.Words importer uses the sequential numbers 1, 2, 3, … for nested quotes.
The nesting is defined via the inherited styles.
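A minimal sketch of how nesting could be read from inherited styles follows. It assumes that quote styles have names starting with `Quote` and that each nested level is based on the style of the enclosing level; the `Style` class and both functions are hypothetical and only model the idea, they are not the Aspose.Words API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Style:
    """Hypothetical stand-in for a paragraph style with an optional base style."""
    name: str
    base: Optional["Style"] = None


def quote_depth(style: Optional[Style]) -> int:
    """Count the styles whose names start with "Quote" in the inheritance chain."""
    depth = 0
    while style is not None:
        if style.name.startswith("Quote"):
            depth += 1
        style = style.base
    return depth


def render_quote(text: str, style: Style) -> str:
    """Prefix the text with one '>' marker per quote nesting level."""
    return "> " * quote_depth(style) + text


if __name__ == "__main__":
    outer = Style("Quote", Style("Normal"))
    inner = Style("Quote1", outer)  # nested quote, based on the outer quote style
    print(render_quote("nested text", inner))
```

Because the depth comes from the inheritance chain rather than from the suffix, the numeric suffix can stay optional, as described above.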
Bulleted lists are represented using paragraph numbering:
There can be 3 types of bulleted lists. They differ only in the numbering format of the very first level. These are:
Ordered lists are represented using paragraph numbering:
There can be 2 number format markers: ‘.’ and ‘)’. The default marker is ‘.’.
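The two markers and the default can be illustrated with a small helper. The function `ordered_item` is hypothetical and only shows the resulting Markdown text; it does not reflect how Aspose.Words stores number formats internally.

```python
def ordered_item(number: int, text: str, marker: str = ".") -> str:
    """Format one ordered list item using one of the two supported markers."""
    if marker not in (".", ")"):
        raise ValueError(f"unsupported ordered list marker: {marker!r}")
    return f"{number}{marker} {text}"


if __name__ == "__main__":
    print(ordered_item(1, "first"))        # default '.' marker
    print(ordered_item(2, "second", ")"))  # explicit ')' marker
```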
Aspose.Words also translates tables into the DOM, where they are represented by the Table, Row, and Cell classes.