Working with Text Documents

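This article looks at some of the options Aspose.Words offers for working with text documents. It is not a complete list of the available options, only examples of how to use some of them.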

Adding Bidirectional Marks

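You can use the add_bidi_marks property to specify whether bidirectional marks are added before each BiDi run when exporting to plain text format. When it is enabled, Aspose.Words inserts the Unicode character "RIGHT-TO-LEFT MARK" (U+200F) before each bidirectional Run in the text. This option corresponds to the "Add bi-directional marks" option in the MS Word File Conversion dialog that is shown when exporting to plain text. Note that MS Word shows this option in the dialog only if an Arabic or Hebrew editing language has been added.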

The following code example shows how to use the add_bidi_marks property. The default value of this property is False.

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
import aspose.words as aw

doc = aw.Document()
builder = aw.DocumentBuilder(doc)
builder.writeln("Hello world!")
builder.paragraph_format.bidi = True
builder.writeln("שלום עולם!")
builder.writeln("مرحبا بالعالم!")
saveOptions = aw.saving.TxtSaveOptions()
saveOptions.add_bidi_marks = True
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.add_bidi_marks.txt", saveOptions)
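To make the effect of the marks concrete, here is a minimal pure-Python sketch that prefixes text with U+200F wherever strong right-to-left characters occur. It is only an illustration, not how Aspose.Words implements the option: it treats each line as a single run (an assumption), whereas the library works on Run nodes. The function names are hypothetical.

```python
import unicodedata

RLM = "\u200f"  # Unicode RIGHT-TO-LEFT MARK


def is_rtl_char(ch: str) -> bool:
    """Return True for characters with a strong right-to-left bidi class."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def add_rlm_before_rtl_lines(text: str) -> str:
    """Prefix every line that contains a strong RTL character with U+200F.

    Simplification: a whole line stands in for one bidirectional run.
    """
    marked = []
    for line in text.split("\n"):
        if any(is_rtl_char(ch) for ch in line):
            line = RLM + line
        marked.append(line)
    return "\n".join(marked)


sample = "Hello world!\nשלום עולם!\nمرحبا بالعالم!"
result = add_rlm_before_rtl_lines(sample)

# Show which lines received the mark.
for line in result.split("\n"):
    print(line.startswith(RLM), repr(line))
```

The mark is invisible in most viewers, so checking for "\u200f" in the saved TXT file is a practical way to confirm that the option took effect.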

Detecting List Items When Loading TXT

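Aspose.Words can import the list items of a text file into its document object model either as list numbering or as plain text. The detect_numbering_with_whitespaces property specifies how numbered list items are recognized when a document is imported from plain text format: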

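  • If this option is set to True, whitespace is also used as a list number delimiter: the list recognition algorithm for Arabic-style numbering (1., 1.1.2.) uses both whitespace and the dot (".") symbol.
  • If this option is set to False, the list recognition algorithm detects a list paragraph when the list number ends with a dot, a right bracket, or a bullet symbol (such as "•", "*", "-", or "o").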

The following code example shows how to use this property:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
import io

# Create a plaintext document in the form of a string with parts that may be interpreted as lists.
# Upon loading, the first three lists will always be detected by Aspose.Words,
# and List objects will be created for them after loading.
textDoc = """Full stop delimiters:
1. First list item 1
2. First list item 2
3. First list item 3

Right bracket delimiters:
1) Second list item 1
2) Second list item 2
3) Second list item 3

Bullet delimiters:
• Third list item 1
• Third list item 2
• Third list item 3

Whitespace delimiters:
1 Fourth list item 1
2 Fourth list item 2
3 Fourth list item 3"""
# The fourth list, with whitespace in between the list number and list item contents,
# will only be detected as a list if detect_numbering_with_whitespaces on the TxtLoadOptions object is set to True,
# to avoid paragraphs that start with numbers being mistakenly detected as lists.
loadOptions = aw.loading.TxtLoadOptions()
loadOptions.detect_numbering_with_whitespaces = True
# Load the document with the load options applied, then save it as DOCX.
doc = aw.Document(io.BytesIO(textDoc.encode("utf-8")), loadOptions)
doc.save(docs_base.artifacts_dir + "WorkingWithTxtLoadOptions.detect_numbering_with_whitespaces.docx")
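The two detection modes described above can be approximated with regular expressions. The sketch below is a rough illustration under stated assumptions, not the recognition algorithm Aspose.Words uses: the patterns and the function name are hypothetical and cover only the marker styles named in this section.

```python
import re

# Delimited markers: "1.", "1.1.2.", "1)", or a bullet such as "•", "*", "-", "o",
# followed by whitespace and some item text.
DELIMITED = re.compile(r"^(?:\d+(?:\.\d+)*[.)]|[•*\-o])\s+\S")

# Whitespace-only markers: a number followed directly by whitespace, e.g. "1 Item".
WHITESPACE = re.compile(r"^\d+(?:\.\d+)*\s+\S")


def looks_like_list_item(line: str, detect_with_whitespaces: bool) -> bool:
    """Approximate whether a plain-text line would be read as a list item."""
    if DELIMITED.match(line):
        return True
    return detect_with_whitespaces and bool(WHITESPACE.match(line))


for line in ["1. First list item 1", "1) Second list item 1",
             "• Third list item 1", "1 Fourth list item 1",
             "2023 was a busy year"]:
    print(line, looks_like_list_item(line, False), looks_like_list_item(line, True))
```

The last sample line shows why whitespace detection is opt-in: an ordinary sentence that starts with a number matches the whitespace pattern as well.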

Handling Leading and Trailing Spaces When Loading TXT

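You can control how leading and trailing spaces are handled when a TXT file is loaded. Leading spaces can be trimmed, preserved, or converted to indent; trailing spaces can be trimmed or preserved.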

The following code example shows how to trim leading and trailing spaces when importing a TXT file:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
import io

textDoc = " Line 1 \n" + " Line 2 \n" + " Line 3 "
loadOptions = aw.loading.TxtLoadOptions()
loadOptions.leading_spaces_options = aw.loading.TxtLeadingSpacesOptions.TRIM
loadOptions.trailing_spaces_options = aw.loading.TxtTrailingSpacesOptions.TRIM
f = io.BytesIO(textDoc.encode("utf-8"))
doc = aw.Document(f, loadOptions)
doc.save(docs_base.artifacts_dir + "WorkingWithTxtLoadOptions.handle_spaces_options.docx")
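As a plain-Python illustration of what the trim and preserve choices mean for each line, the sketch below applies them to the same sample text. The string mode names are hypothetical stand-ins for the enum values, and the convert-to-indent mode is left out because it changes paragraph formatting rather than the text itself.

```python
def apply_space_options(line: str, leading: str, trailing: str) -> str:
    """Apply 'trim' or 'preserve' to the leading and trailing spaces of one line."""
    if leading == "trim":
        line = line.lstrip(" ")
    if trailing == "trim":
        line = line.rstrip(" ")
    return line


text = " Line 1 \n" + " Line 2 \n" + " Line 3 "

# Both sides trimmed, as in the example above.
trimmed = [apply_space_options(line, "trim", "trim") for line in text.split("\n")]

# Leading spaces kept, trailing spaces removed.
mixed = [apply_space_options(line, "preserve", "trim") for line in text.split("\n")]

print(trimmed)
print(mixed)
```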

Detecting Document Text Direction

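Aspose.Words provides the document_direction property in the TxtLoadOptions class for handling the text direction (RTL / LTR) of a document. The property gets or sets the document text direction, using a value of the DocumentDirection enumeration. The default value is left-to-right.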

The following code example shows how to detect the text direction of a document when importing a TXT file:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
loadOptions = aw.loading.TxtLoadOptions()
loadOptions.document_direction = aw.loading.DocumentDirection.AUTO
doc = aw.Document(docs_base.my_dir + "Hebrew text.txt", loadOptions)
paragraph = doc.first_section.body.first_paragraph
print(paragraph.paragraph_format.bidi)
doc.save(docs_base.artifacts_dir + "WorkingWithTxtLoadOptions.document_text_direction.docx")
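This article does not describe how the AUTO setting decides on a direction. One common heuristic, used here purely as an illustration, is to take the direction of the first character with a strong bidirectional class. The sketch below implements that heuristic with the standard library; the function name is hypothetical, and the library's own detection may differ.

```python
import unicodedata


def guess_direction(text: str) -> str:
    """Return "RTL" or "LTR" based on the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew, Arabic and similar scripts
            return "RTL"
        if bidi_class == "L":  # Latin, Cyrillic, CJK and similar scripts
            return "LTR"
    # No strong character found (digits, punctuation only): fall back to LTR.
    return "LTR"


print(guess_direction("שלום עולם!"))
print(guess_direction("Hello world!"))
print(guess_direction("123 مرحبا"))
```

Digits and spaces carry no strong direction, so the third sample is classified by the Arabic word that follows them.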

Exporting Headers and Footers to Output TXT

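To export headers and footers to the output TXT document, use the export_headers_footers_mode property. This property specifies how headers and footers are exported to plain text format.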

The following code example shows how to export headers and footers to plain text format:

doc = aw.Document(docs_base.my_dir + "Document.docx")

options = aw.saving.TxtSaveOptions()
options.save_format = aw.SaveFormat.TEXT

# All headers and footers are placed at the very end of the output document.
options.export_headers_footers_mode = aw.saving.TxtExportHeadersFootersMode.ALL_AT_END
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.export_headers_footers_mode_A.txt", options)

# Only primary headers and footers are exported at the beginning and end of each section.
options.export_headers_footers_mode = aw.saving.TxtExportHeadersFootersMode.PRIMARY_ONLY
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.export_headers_footers_mode_B.txt", options)

# No headers and footers are exported.
options.export_headers_footers_mode = aw.saving.TxtExportHeadersFootersMode.NONE
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.export_headers_footers_mode_C.txt", options)

Exporting List Indentation to Output TXT

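Aspose.Words provides the TxtListIndentation class, which specifies how list levels are indented when exporting to plain text format. TxtSaveOptions exposes it through the list_indentation property, which sets the character used to indent list levels and the number of such characters used per level. The default value of the character property is "\0", which means no indentation. The default value of the count property is 0, which also means no indentation.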

Using the Tab Character

The following code example shows how to export list levels using the tab character:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
doc = aw.Document()
builder = aw.DocumentBuilder(doc)
# Create a list with three levels of indentation.
builder.list_format.apply_number_default()
builder.writeln("Item 1")
builder.list_format.list_indent()
builder.writeln("Item 2")
builder.list_format.list_indent()
builder.write("Item 3")
saveOptions = aw.saving.TxtSaveOptions()
saveOptions.list_indentation.count = 1
saveOptions.list_indentation.character = '\t'
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.use_tab_character_per_level_for_list_indentation.txt", saveOptions)

Using the Space Character

The following code example shows how to export list levels using the space character:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
doc = aw.Document()
builder = aw.DocumentBuilder(doc)
# Create a list with three levels of indentation.
builder.list_format.apply_number_default()
builder.writeln("Item 1")
builder.list_format.list_indent()
builder.writeln("Item 2")
builder.list_format.list_indent()
builder.write("Item 3")
saveOptions = aw.saving.TxtSaveOptions()
saveOptions.list_indentation.count = 3
saveOptions.list_indentation.character = ' '
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.use_space_character_per_level_for_list_indentation.txt", saveOptions)
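The relationship between the character and count settings can be sketched in plain Python: each list level is prefixed with the character repeated count times per level. This is an illustration under assumptions, not the library's code: it assumes the top level gets no indentation, and the labels "1.", "a." and "i." are assumed defaults for a three-level numbered list. The function name is hypothetical.

```python
def indent_list_item(label: str, text: str, level: int, character: str, count: int) -> str:
    """Prefix a list item with `count` copies of `character` per nesting level."""
    return character * (count * level) + f"{label} {text}"


# (label, text, zero-based level) for the three items built in the examples above.
items = [("1.", "Item 1", 0), ("a.", "Item 2", 1), ("i.", "Item 3", 2)]

# One tab per level.
tab_lines = [indent_list_item(label, text, level, "\t", 1) for label, text, level in items]

# Three spaces per level.
space_lines = [indent_list_item(label, text, level, " ", 3) for label, text, level in items]

print("\n".join(tab_lines))
print("\n".join(space_lines))
```

With a count of 0, or with the default "\0" character left in place, the prefix is empty and all levels start in the first column, which matches the defaults described above.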