Extract Content Between Document Nodes

When working with documents, it is important to be able to extract content from a specific range within a document. That content, however, may consist of complex elements such as paragraphs, tables, images, and so on.

Regardless of what content needs to be extracted, the way to extract it is always determined by which nodes are chosen as the boundaries of the range. These can be entire bodies of text or simple text runs.

There are many possible situations and therefore many different node types to consider. For example, you may want to extract content between:

Two specific paragraphs
Specific runs of text
Fields of various types, such as merge fields
The start and end ranges of a bookmark or a comment
Different bodies of text contained in separate sections

In some situations you may even need to combine different node types, for example extracting content between a paragraph and a node of another type, such as a field or a table.

This article provides the code implementation for extracting text between different nodes, together with examples of common scenarios.

These examples are only a small demonstration of the many possibilities. We plan to make text extraction functionality part of the public API in the future. In the meantime, feel free to post your requests regarding this functionality on the Aspose.Words forum.

Why Extract Content

Often the goal of extracting content is to duplicate it or to save it separately in a new document. For example, you can extract content and:

Copy it to a separate document
Convert a specific portion of a document to PDF or an image
Duplicate the content in the document many times
Work with the extracted content separately from the rest of the document

This can be achieved easily using Aspose.Words and the code implementation below.

Content Extraction Algorithm

The code in this section addresses all of the situations described above with one general-purpose method. The general outline of the technique involves:

Gathering the nodes that delimit the area of content to be extracted from your document. Retrieving these nodes is handled by the user in their own code, based on what they want extracted.
Passing these nodes to the ExtractContent method provided below. You must also pass a boolean parameter that states whether these nodes, which act as markers, should themselves be included in the extraction.
Retrieving the list of cloned content (copied nodes) that was selected for extraction. You can use this list of nodes in any applicable way, for example to create a new document.
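The three steps above can be pictured with a minimal sketch. It deliberately uses a toy `Block` type and a hypothetical `ExtractBetween` function over a flat list of blocks; neither is part of Aspose.Words, and the real ExtractContent method works on a node tree rather than on indices.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a block-level node (paragraph or table); not an Aspose.Words type.
struct Block {
    std::string text;
};

// Returns copies of the blocks from index `start` to index `end`.
// When isInclusive is false, the two marker blocks themselves are left out.
std::vector<Block> ExtractBetween(const std::vector<Block>& body,
                                  std::size_t start, std::size_t end,
                                  bool isInclusive) {
    if (start > end || end >= body.size()) {
        throw std::invalid_argument("markers must be in document order and inside the body");
    }
    std::vector<Block> copies;
    for (std::size_t i = start; i <= end; ++i) {
        const bool isMarker = (i == start || i == end);
        if (isMarker && !isInclusive) {
            continue;  // exclusive extraction drops the markers
        }
        copies.push_back(body[i]);  // a copy, so the source body is never modified
    }
    return copies;
}
```

With a body of four blocks A, B, C, D, calling `ExtractBetween(body, 1, 3, true)` yields copies of B, C and D, while passing `false` yields only C.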

The ExtractContent Method

To extract content from your document, call the ExtractContent method below and pass the appropriate parameters. The basis of this method is to find block-level nodes (paragraphs and tables) and clone them. If the marker nodes passed in are block-level, the method can simply copy the content at that level.

If the marker nodes are inline (children of a paragraph), the situation is more complex, because only part of the parent paragraph is wanted. The parent nodes are cloned, and any content in the clones that does not lie between the markers is removed. This process ensures that the inline nodes still retain the formatting of their parent paragraph. The method also runs checks on the nodes passed as parameters and throws an exception if either node is not valid. The parameters to pass to this method are:

StartNode and EndNode. The first two parameters are the nodes that define where the extraction of content begins and ends, respectively. These nodes can be either block-level (Paragraph, Table) or inline-level (for example Run, FieldStart, BookmarkStart):
1. To pass a field, pass the corresponding FieldStart object.
2. To pass a bookmark, pass the BookmarkStart and BookmarkEnd nodes.
3. To pass a comment, use the CommentRangeStart and CommentRangeEnd nodes.
IsInclusive. Defines whether the markers are included in the extraction or not. Its effect depends on the kind of marker:
1. If a FieldStart node is passed, this option defines whether the whole field is included or excluded.
2. If a BookmarkStart or BookmarkEnd node is passed, this option defines whether the bookmark is included or only the content within the bookmark range.
3. If a CommentRangeStart or CommentRangeEnd node is passed, this option defines whether the comment itself is included or only the content within the comment range.
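The clone-then-trim handling of inline markers described above can be sketched as follows. This is only an illustration under stated assumptions: `ToyParagraph` and `CloneAndTrim` are hypothetical names, runs are modelled as plain strings, and the markers are given as run indices rather than as nodes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy paragraph: a formatting label plus its inline runs. Not an Aspose.Words type.
struct ToyParagraph {
    std::string style;
    std::vector<std::string> runs;
};

// Clones the whole paragraph, then removes every run that is not between the markers.
ToyParagraph CloneAndTrim(const ToyParagraph& source, std::size_t startRun,
                          std::size_t endRun, bool isInclusive) {
    // Validate the markers first, mirroring the checks the article mentions.
    if (startRun > endRun || endRun >= source.runs.size()) {
        throw std::invalid_argument("markers must be in order and inside the paragraph");
    }
    // Half-open range [keepBegin, keepEnd) of runs that survive the trim.
    std::size_t keepBegin = isInclusive ? startRun : startRun + 1;
    const std::size_t keepEnd = isInclusive ? endRun + 1 : endRun;
    if (keepBegin > keepEnd) {
        keepBegin = keepEnd;  // same marker passed twice with isInclusive == false
    }

    ToyParagraph clone = source;  // the copy keeps the parent's formatting
    using Diff = std::vector<std::string>::difference_type;
    clone.runs.erase(clone.runs.begin() + static_cast<Diff>(keepEnd), clone.runs.end());
    clone.runs.erase(clone.runs.begin(), clone.runs.begin() + static_cast<Diff>(keepBegin));
    return clone;
}
```

Because the clone starts as a full copy of the parent, the surviving runs keep the paragraph-level formatting (here the `style` label) even though most of the paragraph content is discarded.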

The implementation of the ExtractContent method can be found here. This method is referred to in the scenarios throughout this article.

We will also define a custom method that makes it easy to generate a document from extracted nodes. It is used in many of the scenarios below and simply creates a new document and imports the extracted content into it.

The following code example shows how to take a list of nodes and insert them into a new document.

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	static SharedPtr<Document> GenerateDocument(SharedPtr<Document> srcDoc, SharedPtr<System::Collections::Generic::List<SharedPtr<Node>>> nodes)
	{
		auto dstDoc = MakeObject<Document>();
		// Remove the first paragraph from the empty document.
		dstDoc->get_FirstSection()->get_Body()->RemoveAllChildren();

		// Import each node from the list into the new document. Keep the original formatting of the node.
		auto importer = MakeObject<NodeImporter>(srcDoc, dstDoc, ImportFormatMode::KeepSourceFormatting);
		for (const auto& node : nodes)
		{
			SharedPtr<Node> importNode = importer->ImportNode(node, true);
			dstDoc->get_FirstSection()->get_Body()->AppendChild(importNode);
		}

		return dstDoc;
	}


Extract Content Between Paragraphs

This demonstrates how to use the method above to extract content between specific paragraphs. In this case, we want to extract the body of the letter found in the first half of the document. We can tell that it lies between the 7th and the 11th paragraph.

The code below accomplishes this task. The appropriate paragraphs are retrieved using the GetChild method on the document and passing the specified indices. The indices are zero-based, so the 7th and 11th paragraphs are requested as 6 and 10. We then pass these nodes to the ExtractContent method and state that they are to be included in the extraction. The method returns the copied content between these nodes, which is then inserted into a new document.

The following code example shows how to extract the content between specific paragraphs using the ExtractContent method above:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>(MyDir + u"Extract content.docx");

	auto startPara = System::ExplicitCast<Paragraph>(doc->get_FirstSection()->get_Body()->GetChild(NodeType::Paragraph, 6, true));
	auto endPara = System::ExplicitCast<Paragraph>(doc->get_FirstSection()->get_Body()->GetChild(NodeType::Paragraph, 10, true));
	// Extract the content between these nodes in the document. Include these markers in the extraction.
	SharedPtr<System::Collections::Generic::List<SharedPtr<Node>>> extractedNodes = ExtractContentHelper::ExtractContent(startPara, endPara, true);

	SharedPtr<Document> dstDoc = ExtractContentHelper::GenerateDocument(doc, extractedNodes);
	dstDoc->Save(ArtifactsDir + u"ExtractContent.ExtractContentBetweenParagraphs.docx");


Extract Content Between Different Types of Nodes

We can extract content between any combination of block-level or inline nodes. In the scenario below, we extract the content between the first paragraph and the table in the second section. We get the marker nodes by calling Body.FirstParagraph and the GetChild method on the second section of the document to retrieve the appropriate nodes. For a slight variation, we duplicate the content and insert it below the original instead.

The following code example shows how to extract the content between a paragraph and a table using the ExtractContent method:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>(MyDir + u"Extract content.docx");

	auto startPara = System::ExplicitCast<Paragraph>(doc->get_LastSection()->GetChild(NodeType::Paragraph, 2, true));
	auto endTable = System::ExplicitCast<Table>(doc->get_LastSection()->GetChild(NodeType::Table, 0, true));
	// Extract the content between these nodes in the document. Include these markers in the extraction.
	SharedPtr<System::Collections::Generic::List<SharedPtr<Node>>> extractedNodes = ExtractContentHelper::ExtractContent(startPara, endTable, true);

	// Reverse the list so that inserting each node right after the table restores the original order.
	extractedNodes->Reverse();
	for (SharedPtr<Node> extractedNode : extractedNodes)
	{
		// Insert the current node from the reversed list directly after the table.
		endTable->get_ParentNode()->InsertAfter(extractedNode, endTable);
	}

	doc->Save(ArtifactsDir + u"ExtractContent.ExtractContentBetweenBlockLevelNodes.docx");


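Extract Content Between Paragraphs Based on Style

You may need to extract the content between paragraphs of the same or of different styles, such as between paragraphs marked with heading styles. The code below shows how to achieve this. It is a simple example that extracts the content between the first instance of the "Heading 1" style and the first instance of the "Heading 3" style, without extracting the headings themselves. To do this, we set the last parameter to false, which specifies that the marker nodes should not be included.

In a proper implementation, this should be run in a loop to extract the content between all paragraphs of these styles in the document. The extracted content is copied into a new document.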
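The loop mentioned above could look roughly like the sketch below. It is a toy model, not Aspose.Words code: `StyledParagraph` and `ExtractAllBetweenStyles` are hypothetical names, and each paragraph is reduced to a style name and a text. For every paragraph with the start style it finds the next paragraph with the end style and collects what lies strictly between them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy paragraph carrying only a style name and its text; not an Aspose.Words type.
struct StyledParagraph {
    std::string style;
    std::string text;
};

// Collects, for each start-style paragraph, the paragraphs strictly between it
// and the next end-style paragraph (markers excluded, as with isInclusive = false).
std::vector<std::vector<StyledParagraph>> ExtractAllBetweenStyles(
    const std::vector<StyledParagraph>& paragraphs,
    const std::string& startStyle, const std::string& endStyle) {
    std::vector<std::vector<StyledParagraph>> ranges;
    std::size_t i = 0;
    while (i < paragraphs.size()) {
        if (paragraphs[i].style != startStyle) {
            ++i;
            continue;
        }
        // Look for the closing marker after the start marker.
        std::size_t j = i + 1;
        while (j < paragraphs.size() && paragraphs[j].style != endStyle) {
            ++j;
        }
        if (j == paragraphs.size()) {
            break;  // no closing marker left, nothing more to extract
        }
        const auto first = paragraphs.begin() + static_cast<std::ptrdiff_t>(i + 1);
        const auto last = paragraphs.begin() + static_cast<std::ptrdiff_t>(j);
        ranges.push_back(std::vector<StyledParagraph>(first, last));
        i = j + 1;  // continue searching after the closing marker
    }
    return ranges;
}
```

In real code each collected range would correspond to one ExtractContent call with a start paragraph and an end paragraph taken from the lists of styled paragraphs.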

The following code example shows how to extract content between paragraphs with specific styles using the ExtractContent method:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>(MyDir + u"Extract content.docx");

	// Gather a list of the paragraphs using the respective heading styles.
	SharedPtr<System::Collections::Generic::List<SharedPtr<Paragraph>>> parasStyleHeading1 = ParagraphsByStyleName(doc, u"Heading 1");
	SharedPtr<System::Collections::Generic::List<SharedPtr<Paragraph>>> parasStyleHeading3 = ParagraphsByStyleName(doc, u"Heading 3");

	// Use the first instance of the paragraphs with those styles.
	SharedPtr<Node> startPara1 = parasStyleHeading1->idx_get(0);
	SharedPtr<Node> endPara1 = parasStyleHeading3->idx_get(0);

	// Extract the content between these nodes in the document. Don't include these markers in the extraction.
	SharedPtr<System::Collections::Generic::List<SharedPtr<Node>>> extractedNodes = ExtractContentHelper::ExtractContent(startPara1, endPara1, false);

	SharedPtr<Document> dstDoc = ExtractContentHelper::GenerateDocument(doc, extractedNodes);
	dstDoc->Save(ArtifactsDir + u"ExtractContent.ExtractContentBetweenParagraphStyles.docx");

The ParagraphsByStyleName helper used above is defined as follows:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	static SharedPtr<System::Collections::Generic::List<SharedPtr<Paragraph>>> ParagraphsByStyleName(SharedPtr<Document> doc, System::String styleName)
	{
		// Create a list to collect paragraphs of the specified style.
		SharedPtr<System::Collections::Generic::List<SharedPtr<Paragraph>>> paragraphsWithStyle =
			MakeObject<System::Collections::Generic::List<SharedPtr<Paragraph>>>();

		SharedPtr<NodeCollection> paragraphs = doc->GetChildNodes(NodeType::Paragraph, true);

		// Look through all paragraphs to find those with the specified style.
		for (const auto& paragraph : System::IterateOver<Paragraph>(paragraphs))
		{
			if (paragraph->get_ParagraphFormat()->get_Style()->get_Name() == styleName)
			{
				paragraphsWithStyle->Add(paragraph);
			}
		}

		return paragraphsWithStyle;
	}


Extract Content Between Specific Runs

You can extract content between inline nodes such as a Run as well. Runs from different paragraphs can be passed as markers. The code below shows how to extract specific text between runs of the same Paragraph node.

The following code example shows how to extract content between specific runs of the same paragraph using the ExtractContent method:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>(MyDir + u"Extract content.docx");

	auto para = System::ExplicitCast<Paragraph>(doc->GetChild(NodeType::Paragraph, 7, true));
	SharedPtr<Run> startRun = para->get_Runs()->idx_get(1);
	SharedPtr<Run> endRun = para->get_Runs()->idx_get(4);

	// Extract the content between these nodes in the document. Include these markers in the extraction.
	SharedPtr<System::Collections::Generic::List<SharedPtr<Node>>> extractedNodes = ExtractContentHelper::ExtractContent(startRun, endRun, true);

	for (SharedPtr<Node> extractedNode : extractedNodes)
	{
		std::cout << extractedNode->ToString(SaveFormat::Text) << std::endl;
	}


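Extract Content Using a Field

To use a field as a marker, pass its FieldStart node. The last parameter of the ExtractContent method defines whether the entire field is to be included or not. Let's extract the content between the "FullName" merge field and a paragraph in the document. We use the MoveToMergeField method of the DocumentBuilder class. It moves the builder's cursor to the merge field with the given name; the FieldStart node is then read from the builder's current node.

In our case, we set the last parameter passed to the ExtractContent method to false in order to exclude the field. The code below saves the extracted content as a DOCX file; to render it to PDF instead, pass a file name with the .pdf extension to Save.

The following code example shows how to extract content between a specific field and a paragraph in the document using the ExtractContent method: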

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>(MyDir + u"Extract content.docx");
	auto builder = MakeObject<DocumentBuilder>(doc);

	// Pass false for both flags: the builder moves to just before the FieldStart of the field
	// and the field itself is not deleted.
	// We could also get the FieldStart of a field using the GetChild method as in the other examples.
	builder->MoveToMergeField(u"Fullname", false, false);

	// The builder cursor should be positioned at the start of the field.
	auto startField = System::ExplicitCast<FieldStart>(builder->get_CurrentNode());
	auto endPara = System::ExplicitCast<Paragraph>(doc->get_FirstSection()->GetChild(NodeType::Paragraph, 5, true));

	// Extract the content between these nodes in the document. Don't include these markers in the extraction.
	SharedPtr<System::Collections::Generic::List<SharedPtr<Node>>> extractedNodes = ExtractContentHelper::ExtractContent(startField, endPara, false);

	SharedPtr<Document> dstDoc = ExtractContentHelper::GenerateDocument(doc, extractedNodes);
	dstDoc->Save(ArtifactsDir + u"ExtractContent.ExtractContentUsingField.docx");


Extract Content From a Bookmark

In a document, the content that is defined within a bookmark is encapsulated by the BookmarkStart and BookmarkEnd nodes. The content found between these two nodes makes up the bookmark. You can pass either of these nodes as a marker, even nodes that belong to different bookmarks. We will extract this content into a new document using the code below. The IsInclusive parameter shows how to retain or discard the bookmark.

The following code example shows how to extract the content referenced by a bookmark using the ExtractContent method:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>(MyDir + u"Extract content.docx");

	SharedPtr<Bookmark> bookmark = doc->get_Range()->get_Bookmarks()->idx_get(u"Bookmark1");
	SharedPtr<BookmarkStart> bookmarkStart = bookmark->get_BookmarkStart();
	SharedPtr<BookmarkEnd> bookmarkEnd = bookmark->get_BookmarkEnd();

	// Firstly, extract the content between these nodes, including the bookmark.
	SharedPtr<System::Collections::Generic::List<SharedPtr<Node>>> extractedNodesInclusive =
		ExtractContentHelper::ExtractContent(bookmarkStart, bookmarkEnd, true);

	SharedPtr<Document> dstDoc = ExtractContentHelper::GenerateDocument(doc, extractedNodesInclusive);
	dstDoc->Save(ArtifactsDir + u"ExtractContent.ExtractContentBetweenBookmark.IncludingBookmark.docx");

	// Secondly, extract the content between these nodes this time without including the bookmark.
	SharedPtr<System::Collections::Generic::List<SharedPtr<Node>>> extractedNodesExclusive =
		ExtractContentHelper::ExtractContent(bookmarkStart, bookmarkEnd, false);

	dstDoc = ExtractContentHelper::GenerateDocument(doc, extractedNodesExclusive);
	dstDoc->Save(ArtifactsDir + u"ExtractContent.ExtractContentBetweenBookmark.WithoutBookmark.docx");


Extract Content From a Comment

A comment is made up of the CommentRangeStart, CommentRangeEnd, and Comment nodes. All of these nodes are inline. The first two nodes encapsulate the content in the document that is referenced by the comment, as seen in the screenshot below.

The Comment node itself is an InlineStory that can contain paragraphs and runs. It represents the message of the comment, shown as a comment bubble in the review pane. As this node is inline and a descendant of a body, you can also extract the content from inside this message.

In the sample document, the comment encapsulates the heading, the first paragraph, and the table in the second section. Let's extract this comment into a new document. The IsInclusive option dictates whether the comment itself is kept or discarded.

The following code example shows how to do this:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>(MyDir + u"Extract content.docx");

	auto commentStart = System::ExplicitCast<CommentRangeStart>(doc->GetChild(NodeType::CommentRangeStart, 0, true));
	auto commentEnd = System::ExplicitCast<CommentRangeEnd>(doc->GetChild(NodeType::CommentRangeEnd, 0, true));

	// Firstly, extract the content between these nodes including the comment as well.
	SharedPtr<System::Collections::Generic::List<SharedPtr<Node>>> extractedNodesInclusive =
		ExtractContentHelper::ExtractContent(commentStart, commentEnd, true);

	SharedPtr<Document> dstDoc = ExtractContentHelper::GenerateDocument(doc, extractedNodesInclusive);
	dstDoc->Save(ArtifactsDir + u"ExtractContent.ExtractContentBetweenCommentRange.IncludingComment.docx");

	// Secondly, extract the content between these nodes without the comment.
	SharedPtr<System::Collections::Generic::List<SharedPtr<Node>>> extractedNodesExclusive =
		ExtractContentHelper::ExtractContent(commentStart, commentEnd, false);

	dstDoc = ExtractContentHelper::GenerateDocument(doc, extractedNodesExclusive);
	dstDoc->Save(ArtifactsDir + u"ExtractContent.ExtractContentBetweenCommentRange.WithoutComment.docx");


How to Extract Content Using DocumentVisitor

Use the DocumentVisitor class to implement this usage scenario. This class corresponds to the well-known Visitor design pattern. With **DocumentVisitor**, you can define and execute custom operations that require enumeration over the document tree.


Each DocumentVisitor.VisitXXX method returns a VisitorAction value that controls the enumeration of nodes. You can request to continue the enumeration, to skip the current node (but continue the enumeration), or to stop the enumeration of nodes.

These are the steps you should follow to examine and extract various parts of a document:

Create a class derived from DocumentVisitor
Override and provide implementations for some or all of the DocumentVisitor.VisitXXX methods to perform the custom operations
Call Node.Accept on the node from which you want to start the enumeration. For example, if you want to enumerate the whole document, use Document.Accept
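How a returned action steers the enumeration can be shown with a small stand-alone model of the Visitor pattern. Everything in this sketch (`ToyAction`, `ToyNode`, `ToyVisitor`, `Accept`, `NameCollector`) is hypothetical and only mirrors the idea of continue, skip and stop; it is not the Aspose.Words implementation.

```cpp
#include <string>
#include <vector>

// Toy counterparts of a visitor action and a document node; not Aspose.Words types.
enum class ToyAction { Continue, SkipThisNode, Stop };

struct ToyNode {
    std::string name;
    std::vector<ToyNode> children;
};

class ToyVisitor {
public:
    virtual ~ToyVisitor() = default;
    virtual ToyAction VisitStart(const ToyNode& node) = 0;
};

// Depth-first enumeration. Returns false once the visitor asked to stop.
bool Accept(const ToyNode& node, ToyVisitor& visitor) {
    const ToyAction action = visitor.VisitStart(node);
    if (action == ToyAction::Stop) {
        return false;  // abandon the whole enumeration
    }
    if (action == ToyAction::SkipThisNode) {
        return true;  // ignore the children, carry on with the siblings
    }
    for (const ToyNode& child : node.children) {
        if (!Accept(child, visitor)) {
            return false;
        }
    }
    return true;
}

// Example visitor: records node names and skips any "HeaderFooter" subtree.
class NameCollector : public ToyVisitor {
public:
    std::vector<std::string> names;

    ToyAction VisitStart(const ToyNode& node) override {
        if (node.name == "HeaderFooter") {
            return ToyAction::SkipThisNode;
        }
        names.push_back(node.name);
        return ToyAction::Continue;
    }
};
```

Visiting a tree whose root has a "HeaderFooter" child and a "Body" child records the root, the body and the body's descendants, while the paragraph inside the header is never visited.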


This example shows how to use the Visitor pattern to add new operations to the Aspose.Words object model. In this case, we create a simple converter of a document to a text format:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>(MyDir + u"Extract content.docx");

	auto convertToPlainText = MakeObject<ExtractContent::ConvertDocToTxt>();
	// Note that every node in the object model has the accept method so the visiting
	// can be executed not only for the whole document, but for any node in the document.
	doc->Accept(convertToPlainText);

	// Once the visiting is complete, we can retrieve the result of the operation,
	// which in this example has accumulated in the visitor.
	std::cout << convertToPlainText->GetText() << std::endl;

The ConvertDocToTxt visitor used above is defined as follows:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	/// <summary>
	/// Simple implementation of saving a document in the plain text format. Implemented as a Visitor.
	/// </summary>
	class ConvertDocToTxt : public DocumentVisitor
	{
	public:
		ConvertDocToTxt() : mIsSkipText(false)
		{
			mBuilder = MakeObject<System::Text::StringBuilder>();
		}

		/// <summary>
		/// Gets the plain text of the document that was accumulated by the visitor.
		/// </summary>
		String GetText()
		{
			return mBuilder->ToString();
		}

		/// <summary>
		/// Called when a Run node is encountered in the document.
		/// </summary>
		VisitorAction VisitRun(SharedPtr<Run> run) override
		{
			AppendText(run->get_Text());
			// Let the visitor continue visiting other nodes.
			return VisitorAction::Continue;
		}

		/// <summary>
		/// Called when a FieldStart node is encountered in the document.
		/// </summary>
		VisitorAction VisitFieldStart(SharedPtr<FieldStart> fieldStart) override
		{
			ASPOSE_UNUSED(fieldStart);
			// In Microsoft Word, a field code (such as "MERGEFIELD FieldName") follows
			// after a field start character. We want to skip field codes and output the field
			// result only, therefore we use a flag to suspend the output while inside a field code.
			// Note this is a very simplistic implementation and will not work very well
			// if you have nested fields in a document.
			mIsSkipText = true;
			return VisitorAction::Continue;
		}

		/// <summary>
		/// Called when a FieldSeparator node is encountered in the document.
		/// </summary>
		VisitorAction VisitFieldSeparator(SharedPtr<FieldSeparator> fieldSeparator) override
		{
			ASPOSE_UNUSED(fieldSeparator);
			// Once we reach a field separator node, we enable the output because we are
			// now entering the field result nodes.
			mIsSkipText = false;
			return VisitorAction::Continue;
		}

		/// <summary>
		/// Called when a FieldEnd node is encountered in the document.
		/// </summary>
		VisitorAction VisitFieldEnd(SharedPtr<FieldEnd> fieldEnd) override
		{
			ASPOSE_UNUSED(fieldEnd);
			// Make sure we enable the output when we reach a field end because some fields
			// do not have a field separator and do not have a field result.
			mIsSkipText = false;
			return VisitorAction::Continue;
		}

		/// <summary>
		/// Called when visiting of a Paragraph node is ended in the document.
		/// </summary>
		VisitorAction VisitParagraphEnd(SharedPtr<Paragraph> paragraph) override
		{
			ASPOSE_UNUSED(paragraph);
			// When outputting to plain text we output Cr+Lf characters.
			AppendText(ControlChar::CrLf());
			return VisitorAction::Continue;
		}

		VisitorAction VisitBodyStart(SharedPtr<Body> body) override
		{
			ASPOSE_UNUSED(body);
			// We can detect the beginning and end of all composite nodes such as Section, Body,
			// Table, Paragraph etc. and provide custom handling for them.
			mBuilder->Append(u"* Body Started *\r\n");
			return VisitorAction::Continue;
		}

		VisitorAction VisitBodyEnd(SharedPtr<Body> body) override
		{
			ASPOSE_UNUSED(body);
			mBuilder->Append(u"* Body Ended *\r\n");
			return VisitorAction::Continue;
		}

		/// <summary>
		/// Called when a HeaderFooter node is encountered in the document.
		/// </summary>
		VisitorAction VisitHeaderFooterStart(SharedPtr<HeaderFooter> headerFooter) override
		{
			ASPOSE_UNUSED(headerFooter);
			// Returning this value from a visitor method causes visiting of this
			// node to stop and move on to visiting the next sibling node.
			// The net effect in this example is that the text of headers and footers
			// is not included in the resulting output.
			return VisitorAction::SkipThisNode;
		}

	private:
		SharedPtr<System::Text::StringBuilder> mBuilder;
		bool mIsSkipText;

		/// <summary>
		/// Adds text to the current output. Honors the enabled/disabled output flag.
		/// </summary>
		void AppendText(String text)
		{
			if (!mIsSkipText)
			{
				mBuilder->Append(text);
			}
		}
	};


You can download the sample file of this example from Aspose.Words GitHub.

How to Extract Text Only

The ways to retrieve text from a document are:

Use Document.Save with SaveFormat.Text to save as plain text into a file or stream
Use Node.ToString and pass the SaveFormat.Text parameter. Internally, this saves as text into a memory stream and returns the resulting string
Use Node.GetText to retrieve text with all Microsoft Word control characters, including field codes
Implement a custom DocumentVisitor to perform customized extraction

Using `Node.GetText` and `Node.ToString`

A Word document can contain control characters that designate special elements such as fields and end-of-cell or end-of-section marks. The full list of possible Word control characters is defined in the ControlChar class. The Node.GetText method returns text with all of the control characters present in the node.

Calling ToString returns only the plain-text representation of the document, without control characters.
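To make the difference concrete, the sketch below turns a string that still contains field control characters into plain text that keeps only the field result. It is a stand-alone illustration, not the library's conversion: `StripFieldCodes` is a hypothetical helper, the character values 0x13, 0x14 and 0x15 for field start, separator and end are assumptions based on Word's usual control characters, and nested fields are not handled.

```cpp
#include <string>

// Drops field codes and field control characters, keeping only field results.
// Assumed control characters: 0x13 field start, 0x14 field separator, 0x15 field end.
std::string StripFieldCodes(const std::string& raw) {
    const char kFieldStart = '\x13';
    const char kFieldSeparator = '\x14';
    const char kFieldEnd = '\x15';

    std::string plain;
    bool insideFieldCode = false;
    for (char ch : raw) {
        if (ch == kFieldStart) {
            insideFieldCode = true;  // the field code follows the start character
            continue;
        }
        if (ch == kFieldSeparator || ch == kFieldEnd) {
            insideFieldCode = false;  // the field result, or normal text, follows
            continue;
        }
        if (!insideFieldCode) {
            plain += ch;
        }
    }
    return plain;
}
```

For an input consisting of "Hello ", a field start character, the code " MERGEFIELD Field ", a separator, the result "Name", a field end character and "!", the function returns "Hello Name!".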

The following code example shows the difference between calling the GetText and ToString methods on a node:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>();
	auto builder = MakeObject<DocumentBuilder>(doc);

	builder->InsertField(u"MERGEFIELD Field");

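	// GetText retrieves the visible text together with field codes and special (control) characters.
	std::cout << (String(u"GetText() Result: ") + doc->GetText()) << std::endl;

	// When converted to text, the result will not contain field codes or special characters,
	// but will still contain some natural formatting characters such as paragraph markers etc.
	// This is the same as "viewing" the document as if it was opened in a text editor.
	std::cout << (String(u"ToString() Result: ") + doc->ToString(SaveFormat::Text)) << std::endl;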


Using `SaveFormat.Text`

This example saves the document as follows:

Filters out field characters and field codes, shapes, footnotes, endnotes, and comment references
Replaces end-of-paragraph ControlChar.Cr characters with ControlChar.CrLf combinations
Uses UTF-8 encoding
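The line-ending replacement in the second item can be pictured with a few lines of standard C++. `CrToCrLf` is a hypothetical helper written for this illustration; it only shows the kind of transformation described and is not how the library performs the save.

```cpp
#include <cstddef>
#include <string>

// Replaces every lone carriage return with a carriage return + line feed pair.
// A '\r' that is already followed by '\n' is left untouched.
std::string CrToCrLf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        out += text[i];
        const bool loneCr =
            text[i] == '\r' && (i + 1 == text.size() || text[i + 1] != '\n');
        if (loneCr) {
            out += '\n';
        }
    }
    return out;
}
```

Two paragraphs that each end with a single carriage return therefore come out with the CR+LF endings that plain-text tools on Windows expect.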

The following code example shows how to save a document in TXT format:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>(MyDir + u"Document.docx");
	doc->Save(ArtifactsDir + u"BaseConversions.DocxToTxt.txt");


Extract Images from Shapes

You may need to extract the images of a document in order to perform some task. Aspose.Words allows you to do this as well.

The following code example shows how to extract images from a document:

	// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C.git.
	auto doc = MakeObject<Document>(MyDir + u"Images.docx");

	SharedPtr<NodeCollection> shapes = doc->GetChildNodes(NodeType::Shape, true);
	int imageIndex = 0;

	for (const auto& shape : System::IterateOver<Shape>(shapes))
	{
		if (shape->get_HasImage())
		{
			String imageFileName =
				String::Format(u"Image.ExportImages.{0}_{1}", imageIndex, FileFormatUtil::ImageTypeToExtension(shape->get_ImageData()->get_ImageType()));

			// Note, if you have only an image (not a shape with a text and the image),
			// you can use shape->GetShapeRenderer()->Save(...) method to save the image.
			shape->get_ImageData()->Save(ArtifactsDir + imageFileName);
			imageIndex++;
		}
	}
