Split a Document

Splitting a document is the process of breaking a large document into several smaller files. There are various reasons to split a file. For example, you may need only some pages from a document rather than the whole file, or, for privacy reasons, you may want to share only some parts of a document with others. With the splitting feature, you can extract just the required parts of the document and then work with them, for example, mark them up, save them, or send them.

Aspose.Words provides an efficient way to split one document into multiple documents by headings or sections. You can also split a document by pages or by page ranges. Both approaches are described in this article.

To split a document into smaller files using Aspose.Words, you need to follow these steps:

  1. Load the document in any supported format.
  2. Split the document.
  3. Save the output documents.

After you split a document, you can open each output document, which will start with the required page, text, and so on.

Split a Document Using Different Criteria

Aspose.Words allows you to split a document into chapters according to various criteria when saving it to EPUB or HTML. In the process, the style and layout of the source document are preserved in the output documents.

You can specify the criteria using the DocumentSplitCriteria enumeration. You can divide a document into chapters using one of the following criteria, or combine several criteria together:

  • heading paragraph,
  • section break,
  • column break,
  • page break.
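
As a minimal sketch of combining criteria, the snippet below assumes that DocumentSplitCriteria is a flags enumeration whose values can be joined with the bitwise OR operator. The class name, the input file name, and the output file name are placeholders chosen for illustration.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

public static class CombinedCriteriaExample
{
    public static void Run(string dataDir)
    {
        // Placeholder input file name; substitute your own document.
        Document doc = new Document(dataDir + "Input.docx");

        HtmlSaveOptions options = new HtmlSaveOptions();
        // Start a new part at every page break and at every section break.
        options.DocumentSplitCriteria =
            DocumentSplitCriteria.PageBreak | DocumentSplitCriteria.SectionBreak;

        // Placeholder output file name.
        doc.Save(dataDir + "CombinedCriteria_out.html", options);
    }
}
```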

When saving the output to HTML, Aspose.Words saves each individual chapter as a separate HTML file. As a result, the document will be split into multiple HTML files. When saving the output to EPUB, Aspose.Words saves the result in a single EPUB file regardless of the DocumentSplitCriteria value you used. So, using DocumentSplitCriteria for EPUB documents only affects the appearance of their content in reader applications: content will be divided into chapters and the document will no longer appear continuous.
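
The EPUB case can be sketched as follows. This is an illustration only: it assumes that EPUB output is configured through HtmlSaveOptions created with SaveFormat.Epub, and the class and file names are placeholders.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

public static class EpubChaptersExample
{
    public static void Run(string dataDir)
    {
        // Placeholder input file name; substitute your own document.
        Document doc = new Document(dataDir + "Input.docx");

        // Assumption: HtmlSaveOptions is also used for EPUB when given SaveFormat.Epub.
        HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Epub);
        options.DocumentSplitCriteria = DocumentSplitCriteria.HeadingParagraph;

        // Only one file is written; the criteria shapes the chapters inside it.
        doc.Save(dataDir + "Chapters_out.epub", options);
    }
}
```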

In this section, we consider only some of the possible split criteria.

Split a Document by Headings

To split a document into chapters by headings, use the HeadingParagraph value of the DocumentSplitCriteria property.

If you need to split a document only at certain heading levels, such as Heading 1, 2, and 3, also use the DocumentSplitHeadingLevel property. The output will be divided at paragraphs formatted with heading styles up to the specified level.

The following code example shows how to split a document into smaller parts by heading:

// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-.NET
// Open a Word document
Document doc = new Document(dataDir + "Test File (doc).docx");
HtmlSaveOptions options = new HtmlSaveOptions();
// Split a document into smaller parts, in this instance split by heading
options.DocumentSplitCriteria = DocumentSplitCriteria.HeadingParagraph;
// Save the output file
doc.Save(dataDir + "SplitDocumentByHeadings_out.html", options);
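
To limit the split to particular heading levels, the same options object can also carry a heading level. The sketch below assumes that DocumentSplitHeadingLevel is an integer property on HtmlSaveOptions and that a value of 3 splits at Heading 1, Heading 2, and Heading 3 paragraphs. The class and file names are placeholders.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

public static class HeadingLevelExample
{
    public static void Run(string dataDir)
    {
        // Placeholder input file name; substitute your own document.
        Document doc = new Document(dataDir + "Input.docx");

        HtmlSaveOptions options = new HtmlSaveOptions();
        options.DocumentSplitCriteria = DocumentSplitCriteria.HeadingParagraph;
        // Assumption: headings up to and including this level start a new part.
        options.DocumentSplitHeadingLevel = 3;

        doc.Save(dataDir + "SplitByHeadingLevel_out.html", options);
    }
}
```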

Please note that for this criterion, Aspose.Words supports splitting into separate files only when saving to HTML.

When saving to EPUB, the document is not split into several files, and there will be only one output file.

Split a Document by Sections

Aspose.Words also enables you to use section breaks to split documents and save them to HTML. For this purpose, use SectionBreak as the DocumentSplitCriteria:

// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-.NET
HtmlSaveOptions options = new HtmlSaveOptions();
// Split the document at section breaks
options.DocumentSplitCriteria = DocumentSplitCriteria.SectionBreak;

There is another way to split the source document into multiple output documents, and it lets you choose any output format supported by Aspose.Words.

The following code example shows how to split a document into smaller parts by section breaks (without using the DocumentSplitCriteria property):

// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-.NET
// Open a Word document
Document doc = new Document(dataDir + "TestFile (Split).docx");
for (int i = 0; i < doc.Sections.Count; i++)
{
    // Split a document into smaller parts, in this instance split by section
    Section section = doc.Sections[i].Clone();
    Document newDoc = new Document();
    newDoc.Sections.Clear();
    Section newSection = (Section) newDoc.ImportNode(section, true);
    newDoc.Sections.Add(newSection);
    // Save each section as a separate document
    newDoc.Save(dataDir + $"SplitDocumentBySectionsOut_{i}.docx");
}

Split by Pages

You can also split a document page by page, by page ranges, or starting with the specified page numbers. In such cases, the ExtractPages method can do the job.

This section describes several use cases of dividing documents by pages using the Document class and the ExtractPages method.

Split a Document Page by Page

Aspose.Words enables you to split a multi-page document page by page.

The following code example shows how to divide a document and save each page as a separate document:

// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-.NET
Document doc = new Document(MyDir + "Big document.docx");
int pageCount = doc.PageCount;
for (int page = 0; page < pageCount; page++)
{
    // Save each page as a separate document.
    Document extractedPage = doc.ExtractPages(page, 1);
    extractedPage.Save(ArtifactsDir + $"SplitDocument.PageByPage_{page + 1}.docx");
}

Split a Document by Page Ranges

Aspose.Words allows splitting a multi-page document by page ranges. You can split one file into multiple files with various page ranges or just select one range and save only this part of the source document. Note that the page range you choose must lie between the first and the last page of the document.

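The following code example shows how to extract part of a document by page range. Note that ExtractPages takes the zero-based index of the first page and the number of pages to extract, so the call below returns six pages starting from the fourth page: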

// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-.NET
Document doc = new Document(MyDir + "Big document.docx");
// Get part of the document.
Document extractedPages = doc.ExtractPages(3, 6);
extractedPages.Save(ArtifactsDir + "SplitDocument.ByPageRange.docx");
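
Splitting one file into several consecutive page ranges can be sketched as a loop over ExtractPages. The example assumes the method accepts a zero-based start index and a page count. The class name, the parameters, and the output naming scheme are hypothetical.

```csharp
using System;
using System.IO;
using Aspose.Words;

public static class PageChunkExample
{
    // Splits the document into consecutive files of at most pagesPerFile pages each.
    public static void Run(string inputPath, string outputDir, int pagesPerFile)
    {
        if (pagesPerFile < 1)
            throw new ArgumentOutOfRangeException(nameof(pagesPerFile));

        Document doc = new Document(inputPath);
        int pageCount = doc.PageCount;

        for (int start = 0; start < pageCount; start += pagesPerFile)
        {
            // Clamp the count so the final, possibly shorter, chunk stays in range.
            int count = Math.Min(pagesPerFile, pageCount - start);
            Document chunk = doc.ExtractPages(start, count);

            // File names use one-based page numbers, for example Pages_1-5.docx.
            string fileName = $"Pages_{start + 1}-{start + count}.docx";
            chunk.Save(Path.Combine(outputDir, fileName));
        }
    }
}
```

For a 12-page document and pagesPerFile set to 5, this loop requests pages 1-5, 6-10, and 11-12.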

Callback Option to Save a Document

You can use the DocumentPartSavingCallback property to control how Aspose.Words saves document parts when this document is exported into HTML format. This property allows you to rename output files or even to redirect them into custom streams.

Please note that this callback is not useful when saving to EPUB because all output parts must be saved into a single container – the .epub file. So, stream redirection is not supported, and the effect of renaming is not visible since files are renamed inside the container.
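
A renaming callback for HTML output might look like the sketch below. It assumes the callback is supplied as an implementation of the IDocumentPartSavingCallback interface and that DocumentPartSavingArgs exposes a settable DocumentPartFileName. The class names, the prefix, and the file names are placeholders.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

// Renames each HTML part as it is saved.
public class PartRenamingCallback : IDocumentPartSavingCallback
{
    private readonly string mPrefix;
    private int mCount;

    public PartRenamingCallback(string prefix)
    {
        mPrefix = prefix;
    }

    public void DocumentPartSaving(DocumentPartSavingArgs args)
    {
        // Keep the proposed extension and replace only the base name.
        string extension = Path.GetExtension(args.DocumentPartFileName);
        args.DocumentPartFileName = $"{mPrefix}_part{++mCount}{extension}";
    }
}

public static class CallbackExample
{
    public static void Run(string dataDir)
    {
        // Placeholder input file name; substitute your own document.
        Document doc = new Document(dataDir + "Input.docx");

        HtmlSaveOptions options = new HtmlSaveOptions();
        options.DocumentSplitCriteria = DocumentSplitCriteria.SectionBreak;
        options.DocumentPartSavingCallback = new PartRenamingCallback("Chapter");

        doc.Save(dataDir + "Callback_out.html", options);
    }
}
```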

Merge the Split Document with Another Document

Aspose.Words enables you to merge the output split document with another document to form a new document. This is called document merging.

The following code example shows how to merge a split document with another document:

// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-.NET
// Requires: using System.IO; using System.Linq; using Aspose.Words;
public static void MergeDocuments(string dataDir)
{
    // Find the document parts to merge, ordered by creation time
    FileSystemInfo[] documentPaths = new DirectoryInfo(dataDir)
        .GetFileSystemInfos("SplitDocumentPageByPageOut_*.docx").OrderBy(f => f.CreationTime).ToArray();
    // Create a new resulting document
    Document mergedDoc = new Document();
    DocumentBuilder mergedDocBuilder = new DocumentBuilder(mergedDoc);
    // Merge document parts one by one, so that every part, including the last, is inserted
    foreach (FileSystemInfo documentPath in documentPaths)
    {
        Document partDoc = new Document(documentPath.FullName);
        mergedDocBuilder.MoveToDocumentEnd();
        mergedDocBuilder.InsertDocument(partDoc, ImportFormatMode.KeepSourceFormatting);
    }
    // Save the output file
    mergedDoc.Save(dataDir + "MergeDocuments_out.docx");
}
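
An alternative sketch, assuming the Document.AppendDocument method is available, appends each part to the first one instead of using DocumentBuilder. The class name and parameters are hypothetical, and the caller is expected to pass the part paths already in the desired order.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

public static class AppendPartsExample
{
    public static void Merge(IReadOnlyList<string> partPaths, string outputPath)
    {
        if (partPaths.Count == 0)
            throw new ArgumentException("At least one part is required.", nameof(partPaths));

        // Use the first part as the destination, so no empty leading section is left over.
        Document merged = new Document(partPaths[0]);

        // Append the remaining parts in the order they were passed in.
        for (int i = 1; i < partPaths.Count; i++)
        {
            Document part = new Document(partPaths[i]);
            merged.AppendDocument(part, ImportFormatMode.KeepSourceFormatting);
        }

        merged.Save(outputPath);
    }
}
```

Passing an explicitly ordered list avoids relying on file creation times or on lexical sorting, where a name ending in _10 would sort before one ending in _2.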