Translate Markdown to Document Object Model (DOM)

To programmatically read, manipulate, and modify the content and formatting of a document, you need to translate it to the Aspose.Words Document Object Model (DOM).

In contrast to Word documents, Markdown does not conform to the DOM described in the Aspose.Words Document Object Model (DOM) article. However, Aspose.Words provides its own mechanism for translating Markdown documents to DOM and back, so that we can successfully work with their elements such as text formatting, tables, headings, and others.

This article explains how the various Markdown features are translated into the Aspose.Words DOM and back to Markdown format.

Complexity of Markdown – DOM – Markdown Translation

The main difficulty of this mechanism is not only to translate Markdown to DOM, but also to do the reverse transformation – to save the document back to Markdown format with minimal loss. There are elements, such as multilevel quotes, for which the reverse transformation is not trivial.

Our translation engine allows users not only to work with complex elements in an existing Markdown document, but also to create their own document in Markdown format with the original structure from scratch. To create various elements, you need to use styles with specific names according to certain rules described later in this article. Such styles can be created programmatically.

Common Translation Principles

We use Font formatting for inline blocks. When a Markdown feature has no direct counterpart in the Aspose.Words DOM, we use a character style whose name starts with specific reserved words, such as InlineCode.

For container blocks, we use style inheritance to denote nested Markdown features. Even when there are no nested features, we still use paragraph styles whose names start with specific reserved words, such as Quote.

Bulleted and ordered lists are container blocks in Markdown as well. Their nesting is represented in the DOM in the same way as for all other container blocks, using style inheritance. In addition, lists in the DOM have corresponding number formatting in either the list style or the paragraph formatting.

Inline Blocks

We use Font formatting when translating the Bold, Italic, or Strikethrough inline Markdown features.

Bold
    Markdown: **bold text**
    Aspose.Words: get_Font()->set_Bold(true)

Italic
    Markdown: *italic text*
    Aspose.Words: get_Font()->set_Italic(true)

Strikethrough
    Markdown: ~Strikethrough text~
    Aspose.Words: get_Font()->set_StrikeThrough(true)

For the InlineCode feature, we use a character style whose name starts with the word InlineCode, followed by an optional dot (.) and the number of backticks (`). If the number of backticks is omitted, one backtick is used by default.
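The naming rule above can be sketched as a small standalone helper. This is only an illustration in plain C++: `MakeInlineCodeStyleName` and `BackticksFromStyleName` are hypothetical names that are not part of Aspose.Words, and the strict rejection of non-numeric suffixes is an assumption rather than documented importer behavior.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of Aspose.Words: builds a character style
// name of the form "InlineCode[.][N]" for the given number of backticks.
std::string MakeInlineCodeStyleName(int backticks)
{
    // One backtick is the default, so the suffix can be omitted.
    if (backticks <= 1)
        return "InlineCode";
    return "InlineCode." + std::to_string(backticks);
}

// Hypothetical helper: returns the number of backticks encoded in a style
// name, or 0 if the name does not follow the InlineCode pattern.
int BackticksFromStyleName(const std::string& name)
{
    const std::string prefix = "InlineCode";
    if (name.compare(0, prefix.size(), prefix) != 0)
        return 0;

    std::string::size_type pos = prefix.size();
    if (pos < name.size() && name[pos] == '.')  // the dot is optional
        ++pos;
    if (pos == name.size())
        return 1;  // no number given: default to one backtick

    int count = 0;
    for (; pos < name.size(); ++pos)
    {
        // Assumption: anything other than digits makes the name invalid.
        // The upper bound only protects this sketch from integer overflow.
        if (!std::isdigit(static_cast<unsigned char>(name[pos])) || count > 1000)
            return 0;
        count = count * 10 + (name[pos] - '0');
    }
    return count;
}
```

A name produced by the first function can be passed to set_StyleName, provided a character style with that name already exists in the document.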

InlineCode
    Markdown: `inline code`
    Aspose.Words: get_Font()->set_StyleName(u"InlineCode[.][N]")

Autolink
    Markdown:
        <scheme://domain.com>
        <email@domain.com>
    Aspose.Words: The FieldHyperlink class.

Link
    Markdown:
        [link text](url)
        [link text](<url> "title")
        [link text](url 'title')
        [link text](url (title))
    Aspose.Words: The FieldHyperlink class.

Image
    Markdown:
        ![](url)
        ![alt text](<url> "title")
        ![alt text](url 'title')
        ![alt text](url (title))
    Aspose.Words: The Shape class.

Container Blocks

A document is a sequence of container blocks such as headings, paragraphs, lists, quotes, and others. Container blocks can be divided into 2 classes: Leaf blocks and Complex Containers. Leaf blocks can only contain inline content. Complex containers, in turn, can contain other container blocks, including Leaf blocks.

Leaf Blocks

The entries below show how Markdown Leaf blocks are represented in Aspose.Words:

HorizontalRule
    Markdown: -----
    Aspose.Words: A simple paragraph with a corresponding HorizontalRule shape:
        DocumentBuilder::InsertHorizontalRule()

ATX Heading
    Markdown: # H1, ## H2, ### H3…
    Aspose.Words: get_ParagraphFormat()->set_StyleName(u"Heading N"), where 1 <= N <= 9.
        This is translated into a built-in style, and the name must match the pattern exactly (no prefixes or suffixes are allowed).
        Otherwise, the result is just a regular paragraph with the corresponding style.

Setext Heading
    Markdown: === (for heading level 1), --- (for heading level 2)
    Aspose.Words: get_ParagraphFormat()->set_StyleName(u"SetextHeading[some suffix]"), based on the ‘Heading N’ style.
        If N >= 2, ‘Heading 2’ is used; otherwise, ‘Heading 1’.
        Any suffix is allowed, but the Aspose.Words importer uses the numbers “1” and “2” respectively.

Indented Code
    Aspose.Words: get_ParagraphFormat()->set_StyleName(u"IndentedCode[some suffix]")

Fenced Code
    Markdown:
        ``` c#
        if ()
        then
        else
        ```
    Aspose.Words: get_ParagraphFormat()->set_StyleName(u"FencedCode[.][info string]")
        The [.] and [info string] are optional.
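The heading rules above can be illustrated with a standalone sketch in plain C++. The functions `AtxHeadingLevel` and `SetextUnderline` are hypothetical and not part of Aspose.Words; they only restate the "exact pattern" requirement for ATX headings and the choice between ‘Heading 1’ and ‘Heading 2’ for Setext headings.

```cpp
#include <string>

// Hypothetical helper, not part of Aspose.Words: returns the ATX heading
// level for a style named exactly "Heading N" (1 <= N <= 9), or 0 otherwise.
int AtxHeadingLevel(const std::string& styleName)
{
    const std::string prefix = "Heading ";
    // Exactly one character must follow the prefix; no other prefixes or
    // suffixes are accepted.
    if (styleName.size() != prefix.size() + 1)
        return 0;
    if (styleName.compare(0, prefix.size(), prefix) != 0)
        return 0;
    const char digit = styleName.back();
    return (digit >= '1' && digit <= '9') ? digit - '0' : 0;
}

// Hypothetical helper: picks the Setext underline character for a
// SetextHeading style based on 'Heading N'. Levels of 2 and above map to
// heading level 2 ('-'); anything lower maps to heading level 1 ('=').
char SetextUnderline(int baseHeadingLevel)
{
    return baseHeadingLevel >= 2 ? '-' : '=';
}
```

Under these assumptions, a style named "Heading 3" is an ATX heading of level 3, while "Heading 3a" or "My Heading 3" would be handled as a regular paragraph style.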

Complex Containers

The entries below show how Markdown Complex Containers are represented in Aspose.Words:

Quote
    Markdown:
        > quote,
        >> nested quote
    Aspose.Words: get_ParagraphFormat()->set_StyleName(u"Quote[some suffix]")
        The suffix in the style name is optional, but the Aspose.Words importer uses the ordered numbers 1, 2, 3, … for nested quotes.
        The nesting is defined via inherited styles.

BulletedList
    Markdown:
        - Item 1
        - Item 2
            - Item 2a
            - Item 2b
    Aspose.Words: Bulleted lists are represented using paragraph numbering:
        get_ListFormat()->ApplyBulletDefault()
        There can be 3 types of bulleted lists. They differ only in the numbering format of the very first level: ‘-’, ‘+’, or ‘*’ respectively.

OrderedList
    Markdown:
        1. Item 1
        2. Item 2
            1) Item 2a
            2) Item 2b
    Aspose.Words: Ordered lists are represented using paragraph numbering:
        get_ListFormat()->ApplyNumberDefault()
        There can be 2 number format markers: ‘.’ and ‘)’. The default marker is ‘.’.
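To make the idea of nesting through inherited styles more concrete, here is a minimal standalone model in plain C++. It does not use the Aspose.Words API: `StyleBases`, `QuoteDepth`, and `QuotePrefix` are hypothetical names, and the assumption that each Quote-prefixed style in the inheritance chain adds one level of nesting is our reading of the rule above, not a documented algorithm.

```cpp
#include <map>
#include <string>

// Hypothetical model of paragraph styles: maps a style name to the name of
// the style it is based on. A style without an entry has no base style.
using StyleBases = std::map<std::string, std::string>;

// Counts how many styles in the inheritance chain have a name starting with
// "Quote". Assumption: this count equals the quote nesting depth.
int QuoteDepth(const StyleBases& bases, std::string styleName)
{
    int depth = 0;
    // The step limit guards against accidental cycles in the sample data.
    for (int steps = 0; !styleName.empty() && steps < 64; ++steps)
    {
        if (styleName.compare(0, 5, "Quote") == 0)
            ++depth;
        const auto it = bases.find(styleName);
        styleName = (it == bases.end()) ? std::string() : it->second;
    }
    return depth;
}

// Builds the Markdown prefix for a paragraph at the given quote depth,
// for example ">> " for depth 2.
std::string QuotePrefix(int depth)
{
    return depth > 0 ? std::string(depth, '>') + " " : std::string();
}
```

For example, if "Quote1" is based on "Quote", a paragraph styled "Quote1" has depth 2 in this model and would be written with the ">> " prefix.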

Tables

Aspose.Words can also translate tables into the DOM, as shown below:

Table
    Markdown:
        a|b
        -|-
        c|d
    Aspose.Words: The Table, Row, and Cell classes.
