Translate Markdown to Document Object Model (DOM)
To programmatically read, manipulate, and modify the content and formatting of a document, you need to translate it to the Aspose.Words Document Object Model (DOM).
In contrast to Word documents, Markdown does not conform to the DOM described in the Aspose.Words Document Object Model (DOM) article. However, Aspose.Words provides its own mechanism for translating Markdown documents to DOM and back, so that we can successfully work with their elements such as text formatting, tables, headers, and others.
This article explains how the various markdown features can be translated into Aspose.Words DOM and back to Markdown format.
Complexity of Translation Markdown – DOM – Markdown
The main difficulty of this mechanism is not only to translate Markdown to DOM, but also to do the reverse transformation – to save the document back to Markdown format with minimal loss. There are elements, such as multilevel quotes, for which the reverse transformation is not trivial.
Our translation engine allows users not only to work with complex elements in an existing Markdown document, but also to create their own document in Markdown format with the original structure from scratch. To create various elements, you need to use styles with specific names according to certain rules described later in this article. Such styles can be created programmatically.
Common Translation Principles
We use Font formatting for inline blocks. When there is no direct correspondence for a Markdown feature in Aspose.Words DOM, we use a character style with a name that starts from some special words.
For container blocks, we use style inheritance to denote nested Markdown features. In this case, even when there are no nested features, we also use paragraph styles with a name that starts from some special words.
Bulleted and ordered lists are container blocks in Markdown as well. Their nesting is represented in DOM the same way as for all other container blocks using style inheritance. However, additionally, lists in DOM have corresponded number formatting in either list style or paragraph formatting.
Inline Blocks
We use Font formatting when translating Bold, Italic or Strikethrough inline markdown features.
Markdown feature | Aspose.Words | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bold**bold text** |
get_Font()->set_Bold(true) |
||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||
Italic*italic text* |
get_Font()->set_Italic(true) |
||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||
Strikethrough~Strikethrough text~ |
get_Font()->set_StrikeThrough(true) |
||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
We use a character style with a name that starts from the word InlineCode
, followed by an optional dot (.)
and a number of backticks (`)
for the InlineCode
feature. If a number of backticks is missed, then one backtick will be used by default.
Markdown feature | Aspose.Words | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
InlineCode**inline code** |
get_Font()->set_StyleName(u"InlineCode[.][N]") |
||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||||||||||||||
Autolink<scheme://domain.com> <email@domain.com> |
The FieldHyperlink class. | ||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||||||||||||||
Link[link text](url) [link text](<url> "title") [link text](url 'title') [link text](url (title)) |
The FieldHyperlink class. | ||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||||||||||||||
Image   ) |
The Shape class. | ||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
Container Blocks
A document is a sequence of container blocks such as headings, paragraphs, lists, quotes, and others. Container blocks can be divided into 2 classes: Leaf blocks and Complex Containers. Leaf blocks can only contain inline content. Complex containers, in turn, can contain other container blocks, including Leaf blocks.
Leaf Blocks
The table below shows examples of using Markdown Leaf blocks in Aspose.Words:
Markdown feature | Aspose.Words | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HorizontalRule----- |
This is a simple paragraph with a corresponding HorizontalRule shape:DocumentBuilder::InsertHorizontalRule() |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ATX Heading# H1, ## H2, ### H3… |
get_ParagraphFormat()->set_StyleName(u"Heading N") , where (1<= N <= 9).This is translated into a built-in style and should be exactly of the specified pattern (no suffixes or prefixes are allowed). Otherwise, it will be just a regular paragraph with a corresponding style. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Setext Heading=== (if Heading level 1),--- (if Heading level 2) |
get_ParagraphFormat->set_StyleName(u"SetextHeading[some suffix]") , based on ‘Heading N’ style.If (N >= 2), then ‘Heading 2’ will be used, otherwise ‘Heading 1’. Any suffix is allowed, but Aspose.Words importer uses numbers “1” and “2” respectively. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Indented Code | get_ParagraphFormat->set_StyleName(u"IndentedCode[some suffix]") |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Fenced Code
|
get_ParagraphFormat()->set_StyleName(u"FencedCode[.][info string]") The [.] and [info string] are optional. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
Complex Containers
The table below shows examples of using Markdown Complex Containers in Aspose.Words:
Markdown feature | Aspose.Words | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Quote> quote, >> nested quote |
get_ParagraphFormat()->set_StyleName(u"Quote[some suffix]") The suffix in style name is optional, but Aspose.Words importer uses the ordered numbers 1, 2, 3, …. in case of nested quotes. The nesting is defined via the inherited styles. |
||||||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||||||||||||||||||
BulletedList- Item 1 - Item 2 - Item 2a - Item 2b |
Bulleted lists are represented using paragraph numbering:get_ListFormat()->ApplyBulletDefault() There can be 3 types of bulleted lists. They are only diff in a numbering format of the very first level. These are: ‘-’ , ‘+’ or ‘*’ respectively. |
||||||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
|||||||||||||||||||||||||||||||
OrderedList1. Item 1 2. Item 2 1) Item 2a 2) Item 2b |
Ordered lists are represented using paragraph numbering:get_ListFormat()->ApplyNumberDefault() There can be 2 number format markers: ‘.’ and ‘)’. The default marker is ‘.’. |
||||||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
Tables
Aspose.Words also allows to translate tables into DOM, as shown below:
Markdown feature | Aspose.Words | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Table a|b -|- c|d |
Table, Row and Cell classes. | ||||||||||||||||||||||||||||||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|