Working with Text Documents

This article describes several options that are useful when working with text documents in Aspose.Words. It is not a complete list of the available options, only a selection with examples. In the code examples, ArtifactsDir and MyDir stand for the output and input directory paths.

Add Bi-Directional Marks

You can use the AddBidiMarks property to specify whether to add bi-directional marks before each BiDi run when exporting to plain text format. When the property is enabled, Aspose.Words inserts the Unicode character ‘RIGHT-TO-LEFT MARK’ (U+200F) before each bi-directional run in the text. This option corresponds to the “Add bi-directional marks” option in the MS Word File Conversion dialog that appears when you export to plain text format. Note that the option appears in the dialog only if an Arabic or Hebrew editing language has been added in MS Word.

The default value of this property is false. The following code example shows how to use the AddBidiMarks property:

// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-.NET.git.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Writeln("Hello world!");
builder.ParagraphFormat.Bidi = true;
builder.Writeln("שלום עולם!");
builder.Writeln("مرحبا بالعالم!");
TxtSaveOptions saveOptions = new TxtSaveOptions { AddBidiMarks = true };
doc.Save(ArtifactsDir + "WorkingWithTxtSaveOptions.AddBidiMarks.txt", saveOptions);
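
One way to check the effect of this option is to count the RIGHT-TO-LEFT MARK characters in the exported text. The sketch below uses only the .NET base class library. The class and method names are illustrative and are not part of Aspose.Words.

```csharp
using System;
using System.Linq;

public static class BidiMarkInspector
{
    // U+200F RIGHT-TO-LEFT MARK, the character described above.
    public const char RightToLeftMark = '\u200F';

    // Returns how many RIGHT-TO-LEFT MARK characters the text contains.
    public static int CountRightToLeftMarks(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return text.Count(c => c == RightToLeftMark);
    }
}
```

For example, passing the result of File.ReadAllText for a file saved with AddBidiMarks = true to CountRightToLeftMarks should give a non-zero count if the document contains bi-directional runs.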

Recognize List Items During Loading TXT

Aspose.Words can import the list items of a text file into its document object model either as numbered list items or as plain text. The DetectNumberingWithWhitespaces property specifies how numbered list items are recognized when a document is imported from plain text format:

  • If this option is set to true, whitespace is also used as a list number delimiter: the list recognition algorithm for Arabic-style numbering (1., 1.1.2.) uses both whitespace and dot (".") symbols.

  • If this option is set to false, the list recognition algorithm detects list paragraphs when list numbers end with a dot, a right bracket, or a bullet symbol (such as “•”, “*”, “-” or “o”).

The following code example shows how to use this property:

// Create a plaintext document in the form of a string with parts that may be interpreted as lists.
// Upon loading, the first three lists will always be detected by Aspose.Words,
// and List objects will be created for them after loading.
const string textDoc = "Full stop delimiters:\n" +
"1. First list item 1\n" +
"2. First list item 2\n" +
"3. First list item 3\n\n" +
"Right bracket delimiters:\n" +
"1) Second list item 1\n" +
"2) Second list item 2\n" +
"3) Second list item 3\n\n" +
"Bullet delimiters:\n" +
"• Third list item 1\n" +
"• Third list item 2\n" +
"• Third list item 3\n\n" +
"Whitespace delimiters:\n" +
"1 Fourth list item 1\n" +
"2 Fourth list item 2\n" +
"3 Fourth list item 3";
// The fourth list, with whitespace in between the list number and the list item contents,
// will only be detected as a list if "DetectNumberingWithWhitespaces" in a TxtLoadOptions object is set to true.
// Requiring this setting keeps paragraphs that start with numbers from being mistakenly detected as lists.
TxtLoadOptions loadOptions = new TxtLoadOptions { DetectNumberingWithWhitespaces = true };
// Load the document with the load options applied and save the result
// (MemoryStream and Encoding require System.IO and System.Text).
Document doc = new Document(new MemoryStream(Encoding.UTF8.GetBytes(textDoc)), loadOptions);
doc.Save(ArtifactsDir + "WorkingWithTxtLoadOptions.DetectNumberingWithWhitespaces.docx");
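
To see the difference between the two settings, you can count the List objects created on load. The sketch below assumes the Aspose.Words NuGet package and that TxtLoadOptions lives in the Aspose.Words.Loading namespace, as in recent versions. The class and method names are illustrative.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Loading;

public static class ListDetectionCheck
{
    // Loads plain text with the given setting and returns the number of List objects created.
    public static int CountDetectedLists(string text, bool detectNumberingWithWhitespaces)
    {
        TxtLoadOptions loadOptions = new TxtLoadOptions
        {
            DetectNumberingWithWhitespaces = detectNumberingWithWhitespaces
        };

        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        {
            Document doc = new Document(stream, loadOptions);
            return doc.Lists.Count;
        }
    }
}
```

With the textDoc string from the example above, the comments there imply a count of 4 when the option is true and 3 when it is false.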

Handle Leading and Trailing Spaces During Loading TXT

You can control how leading and trailing spaces are handled when loading a TXT file. Leading spaces can be trimmed, preserved, or converted to an indent; trailing spaces can be trimmed or preserved.

The following code example shows how to trim leading and trailing spaces while importing a TXT file:

const string textDoc = " Line 1 \n" +
" Line 2 \n" +
" Line 3 ";
TxtLoadOptions loadOptions = new TxtLoadOptions
{
    LeadingSpacesOptions = TxtLeadingSpacesOptions.Trim,
    TrailingSpacesOptions = TxtTrailingSpacesOptions.Trim
};
Document doc = new Document(new MemoryStream(Encoding.UTF8.GetBytes(textDoc)), loadOptions);
doc.Save(ArtifactsDir + "WorkingWithTxtLoadOptions.HandleSpacesOptions.docx");
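
The other combinations follow the same pattern. The sketch below converts leading spaces to an indent and preserves trailing spaces. It assumes the Aspose.Words NuGet package and that TxtLoadOptions and the two enumerations live in the Aspose.Words.Loading namespace. The class and method names are illustrative.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Loading;

public static class SpacesOptionsSketch
{
    // Loads plain text, converting leading spaces to an indent and keeping trailing spaces.
    public static Document LoadWithIndent(string text)
    {
        TxtLoadOptions loadOptions = new TxtLoadOptions
        {
            LeadingSpacesOptions = TxtLeadingSpacesOptions.ConvertToIndent,
            TrailingSpacesOptions = TxtTrailingSpacesOptions.Preserve
        };

        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        {
            return new Document(stream, loadOptions);
        }
    }
}
```

The returned Document can then be saved or inspected in the same way as in the example above.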

Detect Document Text Direction

Aspose.Words provides the DocumentDirection property in the TxtLoadOptions class to specify or detect the text direction (RTL / LTR) of the document. The property gets or sets one of the values of the DocumentDirection enumeration. The default value is left-to-right.

The following code example shows how to detect the text direction of a document while importing a TXT file:

TxtLoadOptions loadOptions = new TxtLoadOptions { DocumentDirection = DocumentDirection.Auto };
Document doc = new Document(MyDir + "Hebrew text.txt", loadOptions);
Paragraph paragraph = doc.FirstSection.Body.FirstParagraph;
Console.WriteLine(paragraph.ParagraphFormat.Bidi);
doc.Save(ArtifactsDir + "WorkingWithTxtLoadOptions.DocumentTextDirection.docx");

Export Header and Footer in Output TXT

To export headers and footers to the output TXT document, use the ExportHeadersFootersMode property. This property specifies how headers and footers are exported to plain text format.

The following code example shows how to export headers and footers to plain text format:

Document doc = new Document();
// Insert even and primary headers/footers into the document.
// The primary headers/footers will override the even headers/footers.
doc.FirstSection.HeadersFooters.Add(new HeaderFooter(doc, HeaderFooterType.HeaderEven));
doc.FirstSection.HeadersFooters[HeaderFooterType.HeaderEven].AppendParagraph("Even header");
doc.FirstSection.HeadersFooters.Add(new HeaderFooter(doc, HeaderFooterType.FooterEven));
doc.FirstSection.HeadersFooters[HeaderFooterType.FooterEven].AppendParagraph("Even footer");
doc.FirstSection.HeadersFooters.Add(new HeaderFooter(doc, HeaderFooterType.HeaderPrimary));
doc.FirstSection.HeadersFooters[HeaderFooterType.HeaderPrimary].AppendParagraph("Primary header");
doc.FirstSection.HeadersFooters.Add(new HeaderFooter(doc, HeaderFooterType.FooterPrimary));
doc.FirstSection.HeadersFooters[HeaderFooterType.FooterPrimary].AppendParagraph("Primary footer");
// Insert pages to display these headers and footers.
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Writeln("Page 1");
builder.InsertBreak(BreakType.PageBreak);
builder.Writeln("Page 2");
builder.InsertBreak(BreakType.PageBreak);
builder.Write("Page 3");
TxtSaveOptions options = new TxtSaveOptions();
options.SaveFormat = SaveFormat.Text;
// All headers and footers are placed at the very end of the output document.
options.ExportHeadersFootersMode = TxtExportHeadersFootersMode.AllAtEnd;
doc.Save(ArtifactsDir + "WorkingWithTxtLoadOptions.HeadersFootersMode.AllAtEnd.txt", options);
// Only primary headers and footers are exported at the beginning and end of each section.
options.ExportHeadersFootersMode = TxtExportHeadersFootersMode.PrimaryOnly;
doc.Save(ArtifactsDir + "WorkingWithTxtLoadOptions.HeadersFootersMode.PrimaryOnly.txt", options);
// No headers and footers are exported.
options.ExportHeadersFootersMode = TxtExportHeadersFootersMode.None;
doc.Save(ArtifactsDir + "WorkingWithTxtLoadOptions.HeadersFootersMode.None.txt", options);

Export List Indentation in Output TXT

Aspose.Words provides the TxtListIndentation class, which specifies how list levels are indented when exporting to plain text format. TxtSaveOptions exposes it through the ListIndentation property, which has two settings: the character used to indent list levels, and the count of how many such characters to use per list level.

The default value of the Character property is ‘\0’, and the default value of the Count property is 0. Either default means that no indentation is applied.
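
To make the relationship between the two settings concrete, the sketch below builds an indentation prefix as Count repetitions of Character per list level. It is an illustration only, not Aspose.Words code: the helper IndentationPrefix is hypothetical, and treating the top level as level 0 with no indentation is an assumption.

```csharp
using System;

public static class ListIndentationSketch
{
    // Builds the prefix for a zero-based list level: "count" copies of "character" per level.
    // A count of 0 or a '\0' character yields an empty prefix, mirroring the "no indentation" defaults.
    public static string IndentationPrefix(char character, int count, int level)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        if (level < 0) throw new ArgumentOutOfRangeException(nameof(level));
        if (character == '\0' || count == 0) return string.Empty;
        return new string(character, count * level);
    }
}
```

In this sketch, IndentationPrefix('\t', 1, 2) returns two tab characters, and IndentationPrefix(' ', 3, 1) returns three spaces.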

Using Tab Character

The following code example shows how to export list levels using tab characters:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Create a list with three levels of indentation.
builder.ListFormat.ApplyNumberDefault();
builder.Writeln("Item 1");
builder.ListFormat.ListIndent();
builder.Writeln("Item 2");
builder.ListFormat.ListIndent();
builder.Write("Item 3");
TxtSaveOptions saveOptions = new TxtSaveOptions();
saveOptions.ListIndentation.Count = 1;
saveOptions.ListIndentation.Character = '\t';
doc.Save(ArtifactsDir + "WorkingWithTxtSaveOptions.UseTabForListIndentation.txt", saveOptions);

Using Space Character

The following code example shows how to export list levels using space characters:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Create a list with three levels of indentation.
builder.ListFormat.ApplyNumberDefault();
builder.Writeln("Item 1");
builder.ListFormat.ListIndent();
builder.Writeln("Item 2");
builder.ListFormat.ListIndent();
builder.Write("Item 3");
TxtSaveOptions saveOptions = new TxtSaveOptions();
saveOptions.ListIndentation.Count = 3;
saveOptions.ListIndentation.Character = ' ';
doc.Save(ArtifactsDir + "WorkingWithTxtSaveOptions.UseSpaceForListIndentation.txt", saveOptions);