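Translate Markdown to Document Object Model (DOM)

To programmatically read, manipulate, and modify the content and formatting of a document, you need to translate it to the Aspose.Words Document Object Model (DOM).

In contrast to Word documents, Markdown does not conform to the DOM described in the Aspose.Words Document Object Model (DOM) article. However, Aspose.Words provides its own mechanism for translating Markdown documents to the DOM and back, so that we can work with their elements, such as text formatting, tables, and headings.

This article explains how various Markdown features are translated into the Aspose.Words DOM and back to Markdown format.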

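Complexity of the Markdown to DOM to Markdown Translation

The main difficulty of this mechanism is not only translating Markdown to the DOM, but also performing the reverse transformation: saving the document back to Markdown format with minimal loss. For some elements, such as multilevel quotes, the reverse transformation is not trivial.

Our translation engine allows users not only to work with complex elements in an existing Markdown document, but also to create a Markdown document with the original structure from scratch. To create the various elements, you need to use styles with specific names, following the rules described later in this article. Such styles can be created programmatically.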

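Common Translation Principles

We use Font formatting for inline blocks. When a Markdown feature has no direct counterpart in the Aspose.Words DOM, we use a character style whose name starts with a special word.

For container blocks, we use style inheritance to denote nested Markdown features. In this case, even when there are no nested features, we still use paragraph styles whose names start with a special word.

Bulleted and ordered lists are container blocks in Markdown as well. Their nesting is represented in the DOM in the same way as for all other container blocks, using style inheritance. In addition, lists in the DOM have corresponding number formatting, either in the list style or in the paragraph formatting.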
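To make the style-inheritance idea more concrete, here is a minimal sketch in plain Java that does not use the Aspose.Words API. It models styles as a hypothetical map from a style name to its base style name and counts how many styles in the inheritance chain start with a given special word (for example Quote, which appears later in this article). The class `StyleNesting`, the map, and the method `nestingDepth` are illustrative assumptions, not part of the library.

```java
import java.util.HashMap;
import java.util.Map;

public class StyleNesting {
    // Hypothetical model: style name -> base style name (absent when there is no base style).
    // Counts the styles in the inheritance chain whose name starts with the given prefix.
    static int nestingDepth(Map<String, String> baseOf, String styleName, String prefix) {
        int depth = 0;
        int guard = 0; // stops the walk if the map accidentally contains a cycle
        String current = styleName;
        while (current != null && guard++ < 100) {
            if (current.startsWith(prefix)) {
                depth++;
            }
            current = baseOf.get(current);
        }
        return depth;
    }

    public static void main(String[] args) {
        Map<String, String> baseOf = new HashMap<>();
        baseOf.put("Quote1", "Quote"); // the nested quote style inherits from the first-level one
        baseOf.put("Quote", "Normal");
        System.out.println(nestingDepth(baseOf, "Quote", "Quote"));  // prints 1
        System.out.println(nestingDepth(baseOf, "Quote1", "Quote")); // prints 2
    }
}
```

The walk stops at the first style without a base style, so an unrelated style such as Normal contributes nothing to the depth.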

Inline Blocks

We use Font formatting when translating the Bold, Italic, and Strikethrough inline Markdown features.

Markdown feature   Aspose.Words
Bold
**bold text**
Font.Bold = true
// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Java
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
// Make the text Bold.
builder.getFont().setBold(true);
builder.writeln("This text will be Bold");
Italic
*italic text*
Font.Italic = true
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
// Make the text Italic.
builder.getFont().setItalic(true);
builder.writeln("This text will be Italic");
Strikethrough
~Strikethrough text~
Font.StrikeThrough = true
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
// Make the text Strikethrough.
builder.getFont().setStrikeThrough(true);
builder.writeln("This text will be Strikethrough");

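For the InlineCode feature, we use a character style whose name starts with the word InlineCode, followed by an optional dot (.) and a number of backticks (`). If the number of backticks is omitted, one backtick is used by default.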
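The naming rule can be illustrated with a small, self-contained sketch in plain Java. It only interprets a style name according to the rule as stated above; the class `InlineCodeStyleName` and its methods are hypothetical helpers written for this illustration, and the way Aspose.Words itself parses style names may differ in details.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineCodeStyleName {
    // "InlineCode", then an optional dot, then an optional positive backtick count.
    private static final Pattern NAME = Pattern.compile("InlineCode\\.?([1-9]\\d{0,2})?");

    // Returns the backtick count encoded in the style name, 1 when the count is omitted,
    // or -1 when the name does not follow the pattern.
    static int backtickCount(String styleName) {
        Matcher m = NAME.matcher(styleName);
        if (!m.matches()) {
            return -1;
        }
        return m.group(1) == null ? 1 : Integer.parseInt(m.group(1));
    }

    // Surrounds the text with the given number of backticks on each side.
    static String wrap(String text, int count) {
        StringBuilder ticks = new StringBuilder();
        for (int i = 0; i < count; i++) {
            ticks.append('`');
        }
        return ticks + text + ticks;
    }

    public static void main(String[] args) {
        System.out.println(backtickCount("InlineCode"));   // prints 1
        System.out.println(backtickCount("InlineCode.3")); // prints 3
        System.out.println(wrap("code", backtickCount("InlineCode")));
    }
}
```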

Markdown feature   Aspose.Words
InlineCode
`inline code`
Font.StyleName = "InlineCode[.][N]"
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
// Number of backticks is missed, one backtick will be used by default.
Style inlineCode1BackTicks = builder.getDocument().getStyles().add(StyleType.CHARACTER, "InlineCode");
builder.getFont().setStyle(inlineCode1BackTicks);
builder.writeln("Text with InlineCode style with 1 backtick");
// There will be 3 backticks.
Style inlineCode3BackTicks = builder.getDocument().getStyles().add(StyleType.CHARACTER, "InlineCode.3");
builder.getFont().setStyle(inlineCode3BackTicks);
builder.writeln("Text with InlineCode style with 3 backtick");
Autolink
<scheme://domain.com>
<email@domain.com>
FieldHyperlink class.
Link
[link text](url)
[link text](<url> "title")
[link text](url 'title')
[link text](url (title))
FieldHyperlink class.
Image
![](url)
![alt text](<url> "title")
![alt text](url 'title')
![alt text](url (title))
Shape class.
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
// Insert image.
Shape shape = new Shape(builder.getDocument(), ShapeType.IMAGE);
shape.setWrapType(WrapType.INLINE);
shape.getImageData().setSourceFullName("/attachment/1456/pic001.png");
shape.getImageData().setTitle("title");
builder.insertNode(shape);

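Container Blocks

A document is a sequence of container blocks such as headings, paragraphs, lists, quotes, and others. Container blocks fall into 2 classes: leaf blocks and complex containers. Leaf blocks can only contain inline content. Complex containers, in turn, can contain other container blocks, including leaf blocks.

Leaf Blocks

The table below shows examples of using Markdown leaf blocks in Aspose.Words:

Markdown feature   Aspose.Words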
HorizontalRule
-----
This is a simple paragraph with a corresponding HorizontalRule shape:
DocumentBuilder.InsertHorizontalRule()
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
// Insert horizontal rule.
builder.insertHorizontalRule();
ATX Heading
# H1, ## H2, ### H3…
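ParagraphFormat.StyleName = "Heading N", where 1 <= N <= 9.
This is translated into a built-in style, and the name must match the specified pattern exactly (no suffixes or prefixes are allowed).
Otherwise, it will be just a regular paragraph with the corresponding style.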
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
// By default, Heading styles in Word may have Bold and Italic formatting.
// If we do not want the text to be emphasized, set these properties explicitly to false.
builder.getFont().setBold(false);
builder.getFont().setItalic(false);
builder.getParagraphFormat().setStyleName("Heading 1");
builder.writeln("This is an H1 tag");
Setext Heading
=== (if Heading level 1),
--- (if Heading level 2)
ParagraphFormat.StyleName = "SetextHeading[some suffix]", based on a "Heading N" style.
If N >= 2, then 'Heading 2' is used, otherwise 'Heading 1'.
Any suffix is allowed, but the Aspose.Words importer uses the numbers "1" and "2" respectively.
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
builder.getParagraphFormat().setStyleName("Heading 1");
builder.writeln("This is an H1 tag");
// Reset styles from the previous paragraph to not combine styles between paragraphs.
builder.getFont().setBold(false);
builder.getFont().setItalic(false);
// The style name must start with "SetextHeading" to be treated as a Setext heading.
Style setextHeading1 = builder.getDocument().getStyles().add(StyleType.PARAGRAPH, "SetextHeading1");
builder.getParagraphFormat().setStyle(setextHeading1);
setextHeading1.setBaseStyleName("Heading 1");
builder.writeln("Setext Heading level 1");
builder.getParagraphFormat().setStyle(builder.getDocument().getStyles().get("Heading 3"));
builder.writeln("This is an H3 tag");
// Reset styles from the previous paragraph to not combine styles between paragraphs.
builder.getFont().setBold(false);
builder.getFont().setItalic(false);
Style setextHeading2 = builder.getDocument().getStyles().add(StyleType.PARAGRAPH, "SetextHeading2");
builder.getParagraphFormat().setStyle(setextHeading2);
setextHeading2.setBaseStyleName("Heading 3");
// Setext heading level will be reset to 2 if the base paragraph has a Heading level greater than 2.
builder.writeln("Setext Heading level 2");
Indented Code
ParagraphFormat.StyleName = "IndentedCode[some suffix]"
Fenced Code
```
if ()
then
else
```
ParagraphFormat.StyleName = "FencedCode[.][info string]"
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
Style fencedCode = builder.getDocument().getStyles().add(StyleType.PARAGRAPH, "FencedCode");
builder.getParagraphFormat().setStyle(fencedCode);
builder.writeln("This is a fenced code");
Style fencedCodeWithInfo = builder.getDocument().getStyles().add(StyleType.PARAGRAPH, "FencedCode.C#");
builder.getParagraphFormat().setStyle(fencedCodeWithInfo);
builder.writeln("This is a fenced code with info string");
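
The heading rules in the table above can be read as a small decision procedure. The following is a minimal sketch in plain Java with no Aspose.Words dependency: it checks whether a style name matches "Heading N" exactly and picks the Setext underline from the base heading level. The class `HeadingMarkup` and its methods are hypothetical names introduced only for illustration, and the actual exporter may behave differently in edge cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeadingMarkup {
    private static final Pattern HEADING = Pattern.compile("Heading ([1-9])");

    // Returns N for a style named exactly "Heading N" (1 <= N <= 9), or 0 otherwise.
    static int headingLevel(String styleName) {
        Matcher m = HEADING.matcher(styleName);
        return m.matches() ? Integer.parseInt(m.group(1)) : 0;
    }

    // ATX form: N '#' characters, a space, then the heading text.
    static String atx(int level, String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < level; i++) {
            sb.append('#');
        }
        return sb.append(' ').append(text).toString();
    }

    // Setext form: a base level of 2 or more maps to "---", otherwise "===".
    static String setext(int baseLevel, String text) {
        String underline = baseLevel >= 2 ? "---" : "===";
        return text + "\n" + underline;
    }

    public static void main(String[] args) {
        System.out.println(headingLevel("Heading 1"));        // prints 1
        System.out.println(headingLevel("Heading 1 Custom")); // prints 0
        System.out.println(atx(2, "Title"));                  // prints ## Title
        System.out.println(setext(3, "Title"));               // prints Title, then ---
    }
}
```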

Complex Containers

The table below shows examples of using Markdown complex containers in Aspose.Words:

Markdown feature   Aspose.Words
Quote
> quote,
>> nested quote
ParagraphFormat.StyleName = "Quote[some suffix]"
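The suffix in a style name is optional, but the Aspose.Words importer uses the ordered numbers 1, 2, 3, … for nested quotes.
The nesting is defined via the inherited styles.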
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
// By default a document stores blockquote style for the first level.
builder.getParagraphFormat().setStyleName("Quote");
builder.writeln("Blockquote");
// Create styles for nested levels through style inheritance.
Style quoteLevel2 = builder.getDocument().getStyles().add(StyleType.PARAGRAPH, "Quote1");
builder.getParagraphFormat().setStyle(quoteLevel2);
builder.getDocument().getStyles().get("Quote1").setBaseStyleName("Quote");
builder.writeln("1. Nested blockquote");
BulletedList
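- Item 1
- Item 2
  - Item 2a
  - Item 2b
Bulleted lists are represented using paragraph numbering:
ListFormat.ApplyBulletDefault()
There can be 3 types of bulleted lists. They differ only in the numbering format of the first level. The markers are '-', '+', and '*' respectively.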
// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder();
builder.getListFormat().applyBulletDefault();
builder.getListFormat().getList().getListLevels().get(0).setNumberFormat("-");
builder.writeln("Item 1");
builder.writeln("Item 2");
builder.getListFormat().listIndent();
builder.writeln("Item 2a");
builder.writeln("Item 2b");
OrderedList
1. Item 1
2. Item 2
   1) Item 2a
   2) Item 2b
Ordered lists are represented using paragraph numbering:
ListFormat.ApplyNumberDefault()
There can be 2 number format markers: '.' and ')'. The default marker is '.'.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Use the default numbered list for an ordered list.
builder.getListFormat().applyNumberDefault();
builder.getListFormat().getList().getListLevels().get(0).setNumberFormat(MessageFormat.format("{0}.", (char)0));
builder.getListFormat().getList().getListLevels().get(1).setNumberFormat(MessageFormat.format("{0}.", (char)1));
builder.writeln("Item 1");
builder.writeln("Item 2");
builder.getListFormat().listIndent();
builder.writeln("Item 2a");
builder.writeln("Item 2b");
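
As a rough illustration of how a list level and a marker choice map to Markdown text, here is a minimal sketch in plain Java that does not call Aspose.Words. The class `ListMarkup`, its methods, and the indent widths (two spaces per level for bullets, three for ordered items with single-digit numbers) are assumptions made for this example rather than the behavior of the library's exporter.

```java
public class ListMarkup {
    // Builds the leading whitespace for a nested item.
    private static String indent(int level, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < level * width; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Renders one bulleted item; only the three markers named in the article are accepted.
    static String bullet(int level, char marker, String text) {
        if (marker != '-' && marker != '+' && marker != '*') {
            throw new IllegalArgumentException("Unsupported bullet marker: " + marker);
        }
        return indent(level, 2) + marker + " " + text;
    }

    // Renders one ordered item; the delimiter is either '.' or ')'.
    static String ordered(int level, int number, char delimiter, String text) {
        if (delimiter != '.' && delimiter != ')') {
            throw new IllegalArgumentException("Unsupported delimiter: " + delimiter);
        }
        return indent(level, 3) + number + delimiter + " " + text;
    }

    public static void main(String[] args) {
        System.out.println(bullet(0, '-', "Item 1"));
        System.out.println(bullet(0, '-', "Item 2"));
        System.out.println(bullet(1, '-', "Item 2a"));
        System.out.println(ordered(0, 1, '.', "Item 1"));
        System.out.println(ordered(1, 1, ')', "Item 2a"));
    }
}
```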

Tables

Aspose.Words also allows translating tables into the DOM, as shown below:

Markdown feature   Aspose.Words
Table
a|b
-|-
c|d
Table, Row, and Cell classes.
// Use a document builder to add content to the document.
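DocumentBuilder builder = new DocumentBuilder();
// Start the table explicitly.
builder.startTable();
// Add the first row.
builder.insertCell();
builder.write("a");
builder.insertCell();
builder.write("b");
// End the first row so that the next cell starts a new row.
builder.endRow();
// Add the second row.
builder.insertCell();
builder.write("c");
builder.insertCell();
builder.write("d");
builder.endRow();
builder.endTable();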
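
For the reverse direction, the Markdown table syntax shown above treats the first row as a header and follows it with a separator row. A minimal sketch of producing that text from rows of cell strings, in plain Java and without Aspose.Words, might look like this. The class `TableMarkup` and the method `toMarkdown` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TableMarkup {
    // Joins cells with '|' and inserts a separator row of '-' after the first (header) row.
    static String toMarkdown(List<List<String>> rows) {
        if (rows.isEmpty()) {
            return "";
        }
        List<String> lines = new ArrayList<>();
        lines.add(String.join("|", rows.get(0)));
        List<String> separator = new ArrayList<>();
        for (int i = 0; i < rows.get(0).size(); i++) {
            separator.add("-");
        }
        lines.add(String.join("|", separator));
        for (List<String> row : rows.subList(1, rows.size())) {
            lines.add(String.join("|", row));
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c", "d"));
        // Prints three lines: a|b, -|-, c|d
        System.out.println(toMarkdown(rows));
    }
}
```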
