将Markdown转换为文档对象模型(DOM)

要以编程方式读取、操作和修改文档的内容和格式,您需要将其转换为Aspose.Words文档对象模型(DOM)。

与Word文档相反,Markdown不符合 Aspose.Words文档对象模型(DOM) 文章。 但是,Aspose.Words提供了自己的机制,用于将Markdown文档转换为DOM并返回,以便我们可以成功地处理它们的元素,如文本格式,表格,标题等。

本文解释了如何将各种markdown特征转换为Aspose.WordsDOM并返回到Markdown格式。

翻译的复杂性Markdown – DOM – Markdown

这种机制的主要困难不仅是将Markdown转换为DOM,还要进行反向转换-以最小的损失将文档保存回Markdown格式。 有些元素,例如多级引号,对于这些元素,反向转换并不是微不足道的。

我们的翻译引擎不仅允许用户处理现有Markdown文档中的复杂元素,还允许用户从头开始创建具有原始结构的Markdown格式的文档。 要创建各种元素,您需要根据本文后面介绍的某些规则使用具有特定名称的样式。 可以通过编程方式创建此类样式。

常见翻译原则

我们对内联块使用Font格式。 当Aspose.WordsDOM中的Markdown特征没有直接对应时,我们使用一个字符样式,其名称从一些特殊单词开始。

对于容器块,我们使用样式继承来表示嵌套的Markdown特性。 在这种情况下,即使没有嵌套功能,我们也使用具有从一些特殊单词开始的名称的段落样式。

项目符号列表和有序列表也是Markdown中的容器块。 它们的嵌套在DOM中表示的方式与使用样式继承的所有其他容器块相同。 但是,此外,DOM中的列表在列表样式或段落格式中具有对应的数字格式。

内联块

我们在翻译BoldItalicStrikethrough内联markdown特征时使用Font格式。

Markdown特征 Aspose.Words
Bold
**bold text**
get_Font()->set_Bold(true)
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto builder = System::MakeObject<DocumentBuilder>();
// Make the text Bold.
builder->get_Font()->set_Bold(true);
builder->Writeln(u"This text will be Bold");
Italic
*italic text*
get_Font()->set_Italic(true)
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto builder = System::MakeObject<DocumentBuilder>();
// Make the text Italic.
builder->get_Font()->set_Italic(true);
builder->Writeln(u"This text will be Italic");
Strikethrough
~Strikethrough text~
get_Font()->set_StrikeThrough(true)
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto builder = System::MakeObject<DocumentBuilder>();
// Make the text Strikethrough.
builder->get_Font()->set_StrikeThrough(true);
builder->Writeln(u"This text will be Strikethrough");

我们使用一个字符样式,其名称从单词InlineCode开始,后跟一个可选的点(.)和一些反引号(`)作为InlineCode特性。 如果错过了许多反引号,则默认情况下将使用一个反引号。

Markdown特征 Aspose.Words
InlineCode
**inline code**
get_Font()->set_StyleName(u"InlineCode[.][N]")
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto builder = System::MakeObject<DocumentBuilder>();
// Number of backticks is missed, one backtick will be used by default.
auto inlineCode1BackTicks = builder->get_Document()->get_Styles->Add(StyleType::Character, u"InlineCode");
builder->get_Font()->set_Style(inlineCode1BackTicks);
builder->Writeln(u"Text with InlineCode style with 1 backtick");
// There will be 3 backticks.
auto inlineCode3BackTicks = builder->get_Document()->get_Styles->Add(StyleType::Character, u"InlineCode.3");
builder->get_Font()->set_Style(inlineCode3BackTicks);
builder->Writeln(u"Text with InlineCode style with 3 backtick");
Autolink
<scheme://domain.com>
<email@domain.com>
FieldHyperlink类。
Link
[link text](url)
[link text](<url>"title")
[link text](url 'title')
[link text](url (title))
FieldHyperlink类。
Image
![](url)
![alt text](<url>"title")
![alt text](url ‘title’)
![alt text](url (title))
Shape类。
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto doc = System::MakeObject<Document>();
auto builder = System::MakeObject<DocumentBuilder>(doc);
// Insert image.
auto shape = System::MakeObject<Shape>(doc, ShapeType::Image);
shape->set_WrapType(WrapType::Inline);
shape->get_ImageData()->set_SourceFullName(u"/attachment/1456/pic001.png");
shape->get_ImageData()->set_Title(u"title");
builder->InsertNode(shape);

货柜大厦

文档是一系列容器块,如标题、段落、列表、引号等。 容器块可分为2类:叶块和复杂容器。 叶块只能包含内联内容。 复杂的容器,反过来,可以包含其他容器块,包括叶块。

叶块

下表显示了在Aspose.Words中使用Markdown叶块的示例:

Markdown特征 Aspose.Words
HorizontalRule
-----
这是一个简单的段落,具有相应的HorizontalRule形状:
DocumentBuilder::InsertHorizontalRule()
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto builder = System::MakeObject<DocumentBuilder>();
// Insert horizontal rule.
builder->InsertHorizontalRule();
ATX Heading
# H1, ## H2, ### H3…
get_ParagraphFormat()->set_StyleName(u"Heading N"),其中(1<=N <= 9).
这被转换为内置样式,并且应该完全符合指定的模式(不允许使用后缀或前缀)。
否则,它将只是一个具有相应样式的常规段落。
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto builder = System::MakeObject<DocumentBuilder>();
// By default Heading styles in Word may have Bold and Italic formatting.
// If we do not want to be emphasized, set these properties explicitly to false.
builder->get_Font()->set_Bold(false);
builder->get_Font()->set_Italic(false);
builder->get_ParagraphFormat()->set_StyleName(u"Heading 1");
builder->Writeln(u"This is an H1 tag");
Setext Heading
=== (if Heading level 1),
--- (if Heading level 2)
get_ParagraphFormat->set_StyleName(u"SetextHeading[some suffix]"),基于"Heading N"样式。
如果(N>=2),则将使用"Heading 2",否则使用"Heading 1"
允许任何后缀,但Aspose.Words导入器分别使用数字"1"和"2"。
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto doc = System::MakeObject<Document>();
auto builder = System::MakeObject<DocumentBuilder>(doc);
builder->get_ParagraphFormat()->set_StyleName(u"Heading 1");
builder->Writeln(u"This is an H1 tag");
// Reset styles from the previous paragraph to not combine styles between paragraphs.
builder->get_Font()->set_Bold(false);
builder->get_Font()->set_Italic(false);
auto setexHeading1 = builder->get_Document()->get_Styles->Add(StyleType::Paragraph, u"SetextHeading1");
builder->get_ParagraphFormat()->set_Style(setexHeading1);
doc->get_Styles()->idx_get(u"SetextHeading1")->set_BaseStyleName(u"Heading 1");
builder->Writeln(u"Setext Heading level 1");
builder->get_ParagraphFormat()->set_Style(doc->get_Styles()->idx_get(u"Heading 3"));
builder->Writeln(u"This is an H3 tag");
// Reset styles from the previous paragraph to not combine styles between paragraphs.
builder->get_Font()->set_Bold(false);
builder->get_Font()->set_Italic(false);
auto setexHeading2 = builder->get_Document()->get_Styles->Add(StyleType::Paragraph, u"SetextHeading2");
builder->get_ParagraphFormat()->set_Style(setexHeading2);
doc->get_Styles()->idx_get(u"SetextHeading2")->set_BaseStyleName(u"Heading 3");
// Setex heading level will be reset to 2 if the base paragraph has a Heading level greater than 2.
builder->Writeln(u"Setext Heading level 2");
Indented Code get_ParagraphFormat->set_StyleName(u"IndentedCode[some suffix]")
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto doc = System::MakeObject<Document>();
auto builder = System::MakeObject<DocumentBuilder>(doc);
auto indentedCode = builder->get_Document()->get_Styles->Add(StyleType::Paragraph, u"IndentedCode");
builder->get_ParagraphFormat()->set_StyleName(indentedCode);
builder->Writeln(u"This is an indented code");
Fenced Code
``` c#
if ()
then
else
```
get_ParagraphFormat()->set_StyleName(u"FencedCode[.][info string]")
[.][info string]是可选的。
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto doc = System::MakeObject<Document>();
auto builder = System::MakeObject<DocumentBuilder>(doc);
auto fencedCode = builder->get_Document()->get_Styles->Add(StyleType::Paragraph, u"FencedCode");
builder->get_ParagraphFormat()->set_StyleName(fencedCode);
builder->Writeln(u"This is an fenced code");
auto fencedCodeWithInfo = builder->get_Document()->get_Styles->Add(StyleType::Paragraph, u"FencedCode.C#");
builder->get_ParagraphFormat()->set_StyleName(fencedCodeWithInfo);
builder->Writeln(u"This is a fenced code with info string");

复杂容器

下表显示了在Aspose.Words中使用Markdown复杂容器的示例:

Markdown特征 Aspose.Words
Quote
> quote,
>> nested quote
get_ParagraphFormat()->set_StyleName(u"Quote[some suffix]")
样式名称中的后缀是可选的,但Aspose.Words导入器使用有序数字1, 2, 3, …. 在嵌套引号的情况下。
嵌套是通过继承的样式定义的。
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto doc = System::MakeObject<Document>();
auto builder = System::MakeObject<DocumentBuilder>(doc);
// By default a document stores blockquote style for the first level.
builder->get_ParagraphFormat()->set_StyleName(u"Quote");
builder->Writeln(u"Blockquote");
// Create styles for nested levels through style inheritance.
auto quoteLevel2 = builder->get_Document()->get_Styles->Add(StyleType::Paragraph, u"Quote1");
builder->get_ParagraphFormat()->set_StyleName(quoteLevel2);
builder->get_Document()->get_Styles->idx_get(u"Quote1")->set_BaseStyleName(u"Quote");
builder->Writeln(u"1. Nested blockquote");
BulletedList
- Item 1
- Item 2
- Item 2a
- Item 2b
项目符号列表使用段落编号表示:
get_ListFormat()->ApplyBulletDefault()
可以有3类型的项目符号列表。 它们只是第一级编号格式的差异。 它们分别是:‘-’‘+’‘*’
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto builder = System::MakeObject<DocumentBuilder>();
builder->get_ListFormat()->ApplyBulletDefault();
builder->get_ListFormat()->get_List()->get_ListLevels()->idx_get(0)->set_NumberFormat(u"-");
builder->Writeln(u"Item 1");
builder->Writeln(u"Item 2");
builder->get_ListFormat()->ListIndent();
builder->Writeln(u"Item 2a");
builder->Writeln(u"Item 2b");
OrderedList
1. Item 1
2. Item 2
1) Item 2a
2) Item 2b
有序列表使用段落编号表示:
get_ListFormat()->ApplyNumberDefault()
可以有2数字格式标记:‘.’和‘)’。 默认标记为‘.’。
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto builder = System::MakeObject<DocumentBuilder>();
builder->get_ListFormat()->ApplyBulletDefault();
builder->get_ListFormat()->get_List()->get_ListLevels()->idx_get(0)->set_NumberFormat(System::String::Format(u"{0}.", (char16_t)0));
builder->get_ListFormat()->get_List()->get_ListLevels()->idx_get(1)->set_NumberFormat(System::String::Format(u"{0}.", (char16_t)1));
builder->Writeln(u"Item 1");
builder->Writeln(u"Item 2");
builder->get_ListFormat()->ListIndent();
builder->Writeln(u"Item 2a");
builder->Writeln(u"Item 2b");

表格

Aspose.Words还允许将表格转换为DOM,如下所示:

Markdown特征 Aspose.Words
Table
a|b
-|-
c|d
TableRowCell类。
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Use a document builder to add content to the document.
auto builder = System::MakeObject<DocumentBuilder>();
// Add the first row.
builder->InsertCell();
builder->Writeln(u"a");
builder->InsertCell();
builder->Writeln(u"b");
// Add the second row.
builder->InsertCell();
builder->Writeln(u"c");
builder->InsertCell();
builder->Writeln(u"d");

请参阅