Working with Text Document

In this article, we will learn what options can be useful for working with a text document via Aspose.Words. Please note that this is not a complete list of available options, but only an example of working with some of them.

Add Bi-Directional Marks

You can use add_bidi_marks property to specify whether to add bi-directional marks before each BiDi run when exporting in plain text format. Aspose.Words inserts Unicode Character ‘RIGHT-TO-LEFT MARK’ (U+200F) before each bi-directional Run in the text. This option corresponds to “Add bi-directional marks” option in MS Word File Conversion dialogue when you export to a Plain Text format. Note that it appears in dialogue only if any of Arabic or Hebrew editing languages are added in MS Word.

The following code example shows how to use add_bidi_marks property. The default value of this property is False:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
doc = aw.Document()
builder = aw.DocumentBuilder(doc)
builder.writeln("Hello world!")
builder.paragraph_format.bidi = True
builder.writeln("שלום עולם!")
builder.writeln("مرحبا بالعالم!")
saveOptions = aw.saving.TxtSaveOptions()
saveOptions.add_bidi_marks = True
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.add_bidi_marks.txt", saveOptions)

Recognize List Items During Loading TXT

Aspose.Words can import list item of a text file as list numbers or plain text in its document object model. The detect_numbering_with_whitespaces property allows specifying how numbered list items are recognized when a document is imported from plain text format:

  • If this option is set to True, whitespaces are also used as list number delimiters: list recognition algorithm for Arabic style numbering (1., 1.1.2.) uses both whitespaces and dot (".") symbols.
  • If this option is set to False, lists recognition algorithm detects list paragraphs, when list numbers end with either dot, right bracket or bullet symbols (such as “•”, “*”, “-” or “o”).

The following code example shows how to use this property:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
# Create a plaintext document in the form of a string with parts that may be interpreted as lists.
# Upon loading, the first three lists will always be detected by Aspose.words,
# and List objects will be created for them after loading.
textDoc = """Full stop delimiters:\n
1. First list item 1\n
2. First list item 2\n
3. First list item 3\n\n
Right bracket delimiters:\n
1) Second list item 1\n
2) Second list item 2\n
3) Second list item 3\n\n
Bullet delimiters:\n
• Third list item 1\n
• Third list item 2\n
• Third list item 3\n\n
Whitespace delimiters:\n
1 Fourth list item 1\n
2 Fourth list item 2\n
3 Fourth list item 3"""
# The fourth list, with whitespace inbetween the list number and list item contents,
# will only be detected as a list if "DetectNumberingWithWhitespaces" in a LoadOptions object is set to true,
# to avoid paragraphs that start with numbers being mistakenly detected as lists.
loadOptions = aw.loading.TxtLoadOptions()
loadOptions.detect_numbering_with_whitespaces = True
# Load the document while applying LoadOptions as a parameter and verify the result.
doc = aw.Document(io.BytesIO(textDoc.encode("utf-8")), loadOptions)
doc.save(docs_base.artifacts_dir + "WorkingWithTxtLoadOptions.detect_numbering_with_whitespaces.docx")

Handle Leading and Trailing spaces During Loading TXT

You can control the way of handling leading and trailing spaces during loading TXT file. The leading spaces could be trimmed, preserved or converted to indent and trailing spaces could be trimmed or preserved.

The following code example shows how to trim leading and trailing spaces while importing TXT file:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
textDoc = " Line 1 \n" + " Line 2 \n" + " Line 3 "
loadOptions = aw.loading.TxtLoadOptions()
loadOptions.leading_spaces_options = aw.loading.TxtLeadingSpacesOptions.TRIM
loadOptions.trailing_spaces_options = aw.loading.TxtTrailingSpacesOptions.TRIM
f = io.BytesIO(textDoc.encode("utf-8"))
doc = aw.Document(f, loadOptions)
doc.save(docs_base.artifacts_dir + "WorkingWithTxtLoadOptions.handle_spaces_options.docx")

Detect Document Text Direction

Aspose.Words provides document_direction property in TxtLoadOptions class to detect the text direction (RTL / LTR) in the document. This property sets or gets document text directions provided in DocumentDirection enumeration. The default value is left to right.

The following code example shows how to detect text direction of the document while importing TXT file:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
loadOptions = aw.loading.TxtLoadOptions()
loadOptions.document_direction = aw.loading.DocumentDirection.AUTO
doc = aw.Document(docs_base.my_dir + "Hebrew text.txt", loadOptions)
paragraph = doc.first_section.body.first_paragraph
print(paragraph.paragraph_format.bidi)
doc.save(docs_base.artifacts_dir + "WorkingWithTxtLoadOptions.document_text_direction.docx")

If you want to export header and footer in output TXT document, you can use export_headers_footers_mode property. This property specifies the way headers and footers are exported to the plain text format.

The following code example shows how to export headers and footers to plain text format:

doc = aw.Document(docs_base.my_dir + "Document.docx")

options = aw.saving.TxtSaveOptions()
options.save_format = aw.SaveFormat.TEXT

# All headers and footers are placed at the very end of the output document.
options.export_headers_footers_mode = aw.saving.TxtExportHeadersFootersMode.ALL_AT_END
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.export_headers_footers_mode_A.txt", options)

# Only primary headers and footers are exported at the beginning and end of each section.
options.export_headers_footers_mode = aw.saving.TxtExportHeadersFootersMode.PRIMARY_ONLY
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.export_headers_footers_mode_B.txt", options)

# No headers and footers are exported.
options.export_headers_footers_mode = aw.saving.TxtExportHeadersFootersMode.NONE
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.export_headers_footers_mode_C.txt", options)

Export List Indentation in Output TXT

Aspose.Words introduced TxtListIndentation class that allows specifying how list levels are indented while exporting to a plain text format. While working with TxtSaveOption, the list_indentation property is provided to specify the character to be used for indenting list levels and count specifying how many characters to use as indentation per one list level. The default value for character property is ‘\0’ indicating that there is no indentation. For count property, the default value is 0 which means no indentation.

Using Tab Character

The following code example shows how to export list levels using tab characters:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
doc = aw.Document()
builder = aw.DocumentBuilder(doc)
# Create a list with three levels of indentation.
builder.list_format.apply_number_default()
builder.writeln("Item 1")
builder.list_format.list_indent()
builder.writeln("Item 2")
builder.list_format.list_indent()
builder.write("Item 3")
saveOptions = aw.saving.TxtSaveOptions()
saveOptions.list_indentation.count = 1
#saveOptions.list_indentation.character = '\t'
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.use_tab_character_per_level_for_list_indentation.txt", saveOptions)

Using Space Character

The following code example shows how to export list levels using space characters:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
doc = aw.Document()
builder = aw.DocumentBuilder(doc)
# Create a list with three levels of indentation.
builder.list_format.apply_number_default()
builder.writeln("Item 1")
builder.list_format.list_indent()
builder.writeln("Item 2")
builder.list_format.list_indent()
builder.write("Item 3")
saveOptions = aw.saving.TxtSaveOptions()
saveOptions.list_indentation.count = 3
#saveOptions.list_indentation.character = ' '
doc.save(docs_base.artifacts_dir + "WorkingWithTxtSaveOptions.use_space_character_per_level_for_list_indentation.txt", saveOptions)