Replace Text in PDF using C++
Sometimes you need to correct or replace text in a PDF document. Doing this by hand is tedious, so this article shows how to automate it.
Editing a PDF file is not easy. Finding every occurrence of a word and replacing it with another can take a long time, and the output may suffer from problems such as broken formatting or missing fonts. To find and replace text in PDF files easily, we recommend the Aspose.PDF for C++ library, which can get the job done in minutes.
In this article, we will show you how to successfully find and replace text in your PDF files using Aspose.PDF for C++.
Replace Text in all pages of a PDF document
To replace text in all pages of a PDF document, first use TextFragmentAbsorber to find the phrase you want to replace. Then go through all the TextFragments to replace the text and change any other attributes. Finally, save the output PDF using the Save method of the Document object. The following code snippet shows you how to replace text in all pages of a PDF document.
using namespace System;
using namespace Aspose::Pdf;
using namespace Aspose::Pdf::Text;
void ReplaceTextOnAllPages() {
String _dataDir("C:\\Samples\\");
auto document = MakeObject<Document>(_dataDir + u"sample.pdf");
// Create a TextFragmentAbsorber object to find all instances of the search phrase
auto textFragmentAbsorber = MakeObject<TextFragmentAbsorber>(u"Web");
// Accept the absorber for all pages of the document
document->get_Pages()->Accept(textFragmentAbsorber);
// Get the extracted text fragments into collection
auto textFragmentCollection = textFragmentAbsorber->get_TextFragments();
// Loop through the fragments
for (auto textFragment : textFragmentCollection) {
// Update text and other properties
textFragment->set_Text(u"World Wide Web");
textFragment->get_TextState()->set_Font(FontRepository::FindFont(u"Verdana"));
textFragment->get_TextState()->set_FontSize(12);
textFragment->get_TextState()->set_ForegroundColor(Color::get_Blue());
textFragment->get_TextState()->set_BackgroundColor(Color::get_Gray());
}
// Save the updated PDF file
document->Save(_dataDir + u"Updated_Text.pdf");
}
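Conceptually, the absorber approach is a two-phase pass: first collect every match, then rewrite each one. The following standard C++ sketch models that pattern on plain strings, with one std::string per page standing in for the page text. It does not use the Aspose API, and the names Match, FindAll and ReplaceAll are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Conceptual model only (not the Aspose API): one std::string per page.
struct Match {
    std::size_t page;    // 0-based page index in this model
    std::size_t offset;  // position of the match within the page text
};

// Phase 1: collect every non-overlapping occurrence of 'phrase' on every page,
// similar in spirit to an absorber collecting TextFragments.
std::vector<Match> FindAll(const std::vector<std::string>& pages, const std::string& phrase) {
    assert(!phrase.empty());  // an empty phrase would match everywhere
    std::vector<Match> matches;
    for (std::size_t p = 0; p < pages.size(); ++p) {
        for (std::size_t pos = pages[p].find(phrase); pos != std::string::npos;
             pos = pages[p].find(phrase, pos + phrase.size())) {
            matches.push_back({p, pos});
        }
    }
    return matches;
}

// Phase 2: rewrite each collected match.
void ReplaceAll(std::vector<std::string>& pages, const std::string& phrase,
                const std::string& replacement) {
    std::vector<Match> matches = FindAll(pages, phrase);
    // Walk backwards so that earlier offsets on the same page stay valid.
    for (auto it = matches.rbegin(); it != matches.rend(); ++it) {
        pages[it->page].replace(it->offset, phrase.size(), replacement);
    }
}
```

Collecting all matches before editing means the replacement text ("World Wide Web" itself contains "Web") is never searched again, and replacing from the last match backwards keeps the earlier offsets correct.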
Replace Text in a particular page region
To replace text in a particular page region, first instantiate a TextFragmentAbsorber object, specify the page region using the TextSearchOptions.Rectangle property, and then iterate through all the TextFragments to replace the text. Once these operations are complete, save the output PDF using the Save method of the Document object. The following code snippet shows you how to replace text in a particular region of a PDF page.
void ReplaceTextInParticularRegion() {
String _dataDir("C:\\Samples\\");
// load PDF file
auto document = MakeObject<Document>(_dataDir + u"sample.pdf");
// Instantiate a TextFragmentAbsorber object
auto textFragmentAbsorber = MakeObject<TextFragmentAbsorber>(u"PDF");
// Search text within the page bounds
textFragmentAbsorber->get_TextSearchOptions()->set_LimitToPageBounds(true);
// Specify the page region (llx, lly, urx, ury) to search in
textFragmentAbsorber->get_TextSearchOptions()->set_Rectangle(MakeObject<Aspose::Pdf::Rectangle>(100, 700, 400, 770));
// search text from first page of PDF file
document->get_Pages()->idx_get(1)->Accept(textFragmentAbsorber);
// iterate through individual TextFragment
for (auto tf : textFragmentAbsorber->get_TextFragments()) {
// replace text with "---"
tf->set_Text(u"---");
}
// Save the updated PDF file
document->Save(_dataDir + u"Updated_Text.pdf");
}
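The search region is assumed here to be given as (llx, lly, urx, ury), the lower-left and upper-right corners in PDF user-space points measured from the bottom-left corner of the page, so Rectangle(100, 700, 400, 770) describes a 300 × 70 point band near the top of a Letter or A4 page. The sketch below uses a hypothetical Box type to show what "inside the region" can mean. Whether Aspose.PDF requires a fragment to be fully contained in the rectangle or merely to overlap it is not stated here, so both tests are shown.

```cpp
#include <cassert>

// Hypothetical axis-aligned box in PDF user space (points, origin at the
// bottom-left corner), given by its lower-left and upper-right corners.
struct Box {
    double llx, lly, urx, ury;
    double Width() const { return urx - llx; }
    double Height() const { return ury - lly; }
};

// True if 'inner' lies entirely within 'outer'.
bool Contains(const Box& outer, const Box& inner) {
    return inner.llx >= outer.llx && inner.urx <= outer.urx &&
           inner.lly >= outer.lly && inner.ury <= outer.ury;
}

// True if the two boxes overlap in a region of non-zero area.
bool Intersects(const Box& a, const Box& b) {
    return a.llx < b.urx && b.llx < a.urx && a.lly < b.ury && b.lly < a.ury;
}
```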
Replace Text Based on a Regular Expression
To replace phrases based on a regular expression, first find all the phrases matching that expression using TextFragmentAbsorber, passing the regular expression as a parameter to the TextFragmentAbsorber constructor. You also need to create a TextSearchOptions object that specifies whether the regular expression is used. Once you have the matching phrases in TextFragments, loop through all of them and update them as required. Finally, save the updated PDF using the Save method of the Document object. The following code snippet shows you how to replace text based on a regular expression.
void ReplaceTextWithRegularExpression() {
String _dataDir("C:\\Samples\\");
// load PDF file
auto document = MakeObject<Document>(_dataDir + u"sample.pdf");
// Create a TextFragmentAbsorber object to find all phrases matching the
// regular expression, for example 1999-2000
auto textFragmentAbsorber = MakeObject<TextFragmentAbsorber>(u"\\d{4}-\\d{4}");
// Set the text search option to enable regular expression usage
auto textSearchOptions = MakeObject<TextSearchOptions>(true);
textFragmentAbsorber->set_TextSearchOptions(textSearchOptions);
// Accept the absorber for all pages of the document
document->get_Pages()->Accept(textFragmentAbsorber);
// Get the extracted text fragments into collection
auto textFragmentCollection = textFragmentAbsorber->get_TextFragments();
// Loop through the fragments
for (auto textFragment : textFragmentCollection) {
// Update text and other properties
textFragment->set_Text(u"ABCD-EFGH");
textFragment->get_TextState()->set_Font(FontRepository::FindFont(u"Verdana"));
textFragment->get_TextState()->set_FontSize(12);
textFragment->get_TextState()->set_ForegroundColor(Color::get_Blue());
textFragment->get_TextState()->set_BackgroundColor(Color::get_Gray());
}
// Save the updated PDF file
document->Save(_dataDir + u"Updated_Text.pdf");
}
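Before running a replacement on a document, it can help to check what a pattern actually matches. The following sketch uses std::regex (ECMAScript grammar) from the standard library, not the regular-expression engine inside Aspose.PDF; for a simple pattern like this one the syntax is the same. The helper AllMatches is hypothetical, and the raw string literal R"(...)" avoids doubling the backslashes.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns every non-overlapping substring of 'text' that matches 'pattern'.
std::vector<std::string> AllMatches(const std::string& text, const std::string& pattern) {
    std::regex re(pattern);
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```

Because the pattern has no anchors, it also matches inside longer digit runs: in "12345-67890" it finds "2345-6789". Adding word boundaries, as in \b\d{4}-\d{4}\b, prevents this.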
Replace fonts in an existing PDF file
Aspose.PDF for C++ can replace text in a PDF document. Sometimes, however, you only need to replace the font used in the document while leaving the text itself unchanged. One of the overloads of the TextFragmentAbsorber constructor accepts a TextEditOptions object as an argument, and we can pass the RemoveUnusedFonts value of the TextEditOptions.FontReplace enumeration to accomplish this. The following code snippet shows how to replace the font inside a PDF document.
void ReplaceFonts() {
String _dataDir("C:\\Samples\\");
// Instantiate Document object
auto document = MakeObject<Document>(_dataDir + u"sample.pdf");
// Search text fragments and set edit option as remove unused fonts
auto textFragmentAbsorber = MakeObject<TextFragmentAbsorber>(
MakeObject<TextEditOptions>(TextEditOptions::FontReplace::RemoveUnusedFonts));
// Accept the absorber for all pages of document
document->get_Pages()->Accept(textFragmentAbsorber);
// traverse through all the TextFragments
auto textFragmentCollection = textFragmentAbsorber->get_TextFragments();
for (auto textFragment : textFragmentCollection) {
String fontName = textFragment->get_TextState()->get_Font()->get_FontName();
// if the font name is ArialMT, replace font name with Arial
if (fontName.Equals(u"ArialMT")) {
textFragment->get_TextState()->set_Font(FontRepository::FindFont(u"Arial"));
}
}
// Save the updated PDF file
document->Save(_dataDir + u"Updated_Text.pdf");
}
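If several fonts need replacing, a lookup table scales better than a chain of if statements. The sketch below is a hypothetical helper in standard C++; the font names in the table are examples only, and in the Aspose loop you would first convert the String returned by get_FontName() to std::string.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table: font name found in the document -> replacement font name.
const std::map<std::string, std::string>& FontReplacements() {
    static const std::map<std::string, std::string> table = {
        {"ArialMT", "Arial"},
        {"TimesNewRomanPSMT", "TimesNewRoman"},
    };
    return table;
}

// Returns the replacement font name, or an empty string if the font is kept.
std::string ReplacementFontFor(const std::string& fontName) {
    auto it = FontReplacements().find(fontName);
    return it == FontReplacements().end() ? std::string() : it->second;
}
```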
In the next code snippet, you will see how to use a non-English font when replacing text:
void UseNonEnglishFontWhenReplacingText() {
String _dataDir("C:\\Samples\\");
// Instantiate Document object
auto document = MakeObject<Document>(_dataDir + u"sample.pdf");
// Replace every occurrence of "PDF" with Japanese text ("ファイル", "file") using a
// font that supports Japanese characters. TakaoMincho is used here; MS Gothic or
// another CJK font installed in the OS can be used instead.
auto textFragmentAbsorber = MakeObject<TextFragmentAbsorber>(u"PDF");
// Create a TextSearchOptions instance (true enables regular-expression search)
auto searchOptions = MakeObject<TextSearchOptions>(true);
textFragmentAbsorber->set_TextSearchOptions(searchOptions);
// Accept the absorber for all pages of document
document->get_Pages()->Accept(textFragmentAbsorber);
// Get the extracted text fragments into collection
auto textFragmentCollection = textFragmentAbsorber->get_TextFragments();
// Loop through the fragments
for (auto textFragment : textFragmentCollection) {
// Update text and other properties
textFragment->set_Text(u"ファイル");
textFragment->get_TextState()->set_Font(FontRepository::FindFont(u"TakaoMincho"));
textFragment->get_TextState()->set_FontSize(12);
textFragment->get_TextState()->set_ForegroundColor(Color::get_Blue());
textFragment->get_TextState()->set_BackgroundColor(Color::get_Gray());
}
// Save the updated document
document->Save(_dataDir + u"Japanese_Text.pdf");
}
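Japanese replacement text renders correctly only with a font that contains the needed glyphs. A hypothetical pre-check like the one below can decide whether a CJK-capable font is required. It inspects only a few Basic Multilingual Plane ranges (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, CJK Unified Ideographs U+4E00 to U+9FFF), so treat it as a sketch rather than a complete script detector.

```cpp
#include <cassert>
#include <string>

// Returns true if the UTF-16 text contains Hiragana, Katakana or common
// CJK ideographs, which typical Latin fonts do not include.
bool NeedsJapaneseCapableFont(const std::u16string& text) {
    for (char16_t ch : text) {
        const bool hiragana = ch >= 0x3040 && ch <= 0x309F;
        const bool katakana = ch >= 0x30A0 && ch <= 0x30FF;
        const bool kanji = ch >= 0x4E00 && ch <= 0x9FFF;
        if (hiragana || katakana || kanji) {
            return true;
        }
    }
    return false;
}
```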
Text Replacement should automatically re-arrange Page Contents
Aspose.PDF for C++ supports finding and replacing text within a PDF file. However, some clients have run into problems when replacing text: if a TextFragment is replaced with shorter content, extra whitespace appears in the resulting PDF, and if it is replaced with a longer string, the words overlap the existing page content. A mechanism was therefore needed to rearrange the page content after text is replaced inside a PDF document.
To handle these scenarios, Aspose.PDF for C++ has been improved so that such issues do not occur when replacing text within a PDF file. The following code snippet demonstrates how to replace text within a PDF file so that the page content is rearranged automatically.
void RearrangeContent() {
String _dataDir("C:\\Samples\\");
// Instantiate Document object
auto document = MakeObject<Document>(_dataDir + u"sample.pdf");
// Create a TextFragmentAbsorber object with a regular expression that matches "PDF" or "Web"
auto textFragmentAbsorber = MakeObject<TextFragmentAbsorber>(u"PDF|Web");
auto textSearchOptions = MakeObject<TextSearchOptions>(true);
textFragmentAbsorber->set_TextSearchOptions(textSearchOptions);
// You also can specify the ReplaceAdjustment.WholeWordsHyphenation option to
// wrap text on the next or current line if the current line becomes too long or
// short after replacement:
//textFragmentAbsorber->get_TextReplaceOptions()->set_ReplaceAdjustmentAction(TextReplaceOptions::ReplaceAdjustment::WholeWordsHyphenation);
// Accept the absorber for all pages of document
document->get_Pages()->Accept(textFragmentAbsorber);
// Get the extracted text fragments into collection
auto textFragmentCollection = textFragmentAbsorber->get_TextFragments();
// Replace each TextFragment
for (auto textFragment : textFragmentCollection) {
// Set font of text fragment being replaced
textFragment->get_TextState()->set_Font(FontRepository::FindFont(u"Arial"));
// Set font size
textFragment->get_TextState()->set_FontSize(10);
textFragment->get_TextState()->set_ForegroundColor(Color::get_Blue());
textFragment->get_TextState()->set_BackgroundColor(Color::get_Gray());
// Replace the text with a string longer than the original text
textFragment->set_Text(u"This is a larger string for the testing of this feature");
}
// Save resultant PDF
document->Save(_dataDir + u"RearrangeContentsUsingTextReplacement_out.pdf");
}
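When TextSearchOptions is constructed with true, the search string is treated as a regular expression, so square brackets define a character class: a pattern such as [PDF,Web] matches any single one of the characters P, D, F, comma, W, e and b, whereas PDF|Web matches either whole word. The following sketch demonstrates the difference with std::regex from the standard library (not Aspose.PDF); CountMatches is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts the non-overlapping matches of 'pattern' in 'text'.
std::ptrdiff_t CountMatches(const std::string& text, const std::string& pattern) {
    std::regex re(pattern);
    return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                         std::sregex_iterator());
}
```

On the text "PDF and Web", the character class yields six one-character matches, while the alternation yields two whole-word matches.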
Rendering Replaceable Symbols during PDF creation
Replaceable symbols are special symbols in a text string that are replaced with the corresponding content at run time. The replaceable symbols currently supported by the new Document Object Model of the Aspose.PDF namespace are $P, $p, \n and \r. The $p and $P symbols handle page numbering at run time: $p is replaced with the number of the page on which the current Paragraph is placed, and $P is replaced with the total number of pages in the document. When a TextFragment is added to the paragraphs collection of a PDF document, line feeds inside its text are not supported. To add text with a line feed, use TextFragment together with TextParagraph:
- use “\r\n” or Environment.NewLine in TextFragment instead of single “\n”;
- create a TextParagraph object. It will add text with line splitting;
- add the TextFragment with TextParagraph.AppendLine;
- add the TextParagraph with TextBuilder.AppendParagraph.
void RenderingReplaceableSymbols() {
String _dataDir("C:\\Samples\\");
// load PDF file
auto document = MakeObject<Document>(_dataDir + u"sample.pdf");
auto page = document->get_Pages()->Add();
// Initialize new TextFragment with text containing required newline markers
auto textFragment = MakeObject<TextFragment>(u"Applicant Name: \r\n Joe Smoe");
// Set text fragment properties if necessary
textFragment->get_TextState()->set_FontSize(12);
textFragment->get_TextState()->set_Font(FontRepository::FindFont(u"TimesNewRoman"));
textFragment->get_TextState()->set_BackgroundColor(Color::get_LightGray());
textFragment->get_TextState()->set_ForegroundColor(Color::get_Red());
// Create TextParagraph object
auto par = MakeObject<TextParagraph>();
// Add new TextFragment to paragraph
par->AppendLine(textFragment);
// Set paragraph position
par->set_Position(MakeObject<Position>(100, 600));
// Create TextBuilder object
auto textBuilder = MakeObject<TextBuilder>(page);
// Add the TextParagraph using TextBuilder
textBuilder->AppendParagraph(par);
document->Save(_dataDir + u"RenderingReplaceableSymbols_out.pdf");
}
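The following standard C++ sketch models how text containing "\r\n" breaks into separate lines. It is a conceptual illustration with a hypothetical SplitOnCrLf helper, not the layout logic of TextParagraph, and it splits only on the two-character "\r\n" sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Conceptual model only: split UTF-16 text into lines at each "\r\n".
std::vector<std::u16string> SplitOnCrLf(const std::u16string& text) {
    const std::u16string crlf = u"\r\n";
    std::vector<std::u16string> lines;
    std::size_t start = 0;
    for (std::size_t pos = text.find(crlf); pos != std::u16string::npos;
         pos = text.find(crlf, start)) {
        lines.push_back(text.substr(start, pos - start));
        start = pos + crlf.size();  // continue after the line break
    }
    lines.push_back(text.substr(start));  // text after the last break
    return lines;
}
```

In this model the space after the break is kept, so the second line of the sample text is " Joe Smoe"; remove that space from the source string if the line should start flush.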
Replaceable symbols in the header/footer area
The replaceable symbol can also be placed inside the header/footer section of the PDF file. Review the following code snippet to see how to add a replaceable symbol to a footer section.
void ReplaceableSymbolsInHeaderFooterArea() {
auto document = MakeObject<Document>();
auto page = document->get_Pages()->Add();
auto marginInfo = MakeObject<MarginInfo>();
marginInfo->set_Top(90);
marginInfo->set_Bottom(50);
marginInfo->set_Left(50);
marginInfo->set_Right(50);
// Assign the marginInfo instance to Margin property of PageInfo
page->get_PageInfo()->set_Margin(marginInfo);
auto hfFirst = MakeObject<HeaderFooter>();
page->set_Header(hfFirst);
hfFirst->get_Margin()->set_Left(50);
hfFirst->get_Margin()->set_Right(50);
// Instantiate a TextFragment that holds the title shown in the header
auto t1 = MakeObject<TextFragment>(u"report title");
t1->get_TextState()->set_Font(FontRepository::FindFont(u"Arial"));
t1->get_TextState()->set_FontSize(16);
t1->get_TextState()->set_ForegroundColor(Color::get_Black());
t1->get_TextState()->set_FontStyle(FontStyles::Bold);
t1->get_TextState()->set_HorizontalAlignment(HorizontalAlignment::Center);
t1->get_TextState()->set_LineSpacing(5.0f);
hfFirst->get_Paragraphs()->Add(t1);
auto t2 = MakeObject<TextFragment>(u"Report_Name");
t2->get_TextState()->set_Font(FontRepository::FindFont(u"Arial"));
t2->get_TextState()->set_ForegroundColor(Color::get_Black());
t2->get_TextState()->set_HorizontalAlignment(HorizontalAlignment::Center);
t2->get_TextState()->set_LineSpacing(5.0f);
t2->get_TextState()->set_FontSize(12);
hfFirst->get_Paragraphs()->Add(t2);
// Create a HeaderFooter object for the section
auto hfFoot = MakeObject<HeaderFooter>();
// Set the HeaderFooter object as the page footer
page->set_Footer(hfFoot);
hfFoot->get_Margin()->set_Left(50);
hfFoot->get_Margin()->set_Right(50);
// Create the footer text fragments; "Page $p of $P" shows the current page
// number and the total number of pages
auto t3 = MakeObject<TextFragment>(u"Generated on test date");
auto t4 = MakeObject<TextFragment>(u"report name ");
auto t5 = MakeObject<TextFragment>(u"Page $p of $P");
// Instantiate a table object
auto tab2 = MakeObject<Table>();
// Add the table in paragraphs collection of the desired section
hfFoot->get_Paragraphs()->Add(tab2);
// Set the column widths of the table
tab2->set_ColumnWidths(u"165 172 165");
// Create rows in the table and then cells in the rows
auto row3 = tab2->get_Rows()->Add();
row3->get_Cells()->Add();
row3->get_Cells()->Add();
row3->get_Cells()->Add();
// Set the horizontal alignment of the three cells to left, center and right
row3->get_Cells()->idx_get(0)->set_Alignment(HorizontalAlignment::Left);
row3->get_Cells()->idx_get(1)->set_Alignment(HorizontalAlignment::Center);
row3->get_Cells()->idx_get(2)->set_Alignment(HorizontalAlignment::Right);
row3->get_Cells()->idx_get(0)->get_Paragraphs()->Add(t3);
row3->get_Cells()->idx_get(1)->get_Paragraphs()->Add(t4);
row3->get_Cells()->idx_get(2)->get_Paragraphs()->Add(t5);
auto table = MakeObject<Table>();
table->set_ColumnWidths(u"33% 33% 34%");
table->set_DefaultCellPadding(MakeObject<MarginInfo>());
table->get_DefaultCellPadding()->set_Top(10);
table->get_DefaultCellPadding()->set_Bottom(10);
// Add the table in paragraphs collection of the desired section
page->get_Paragraphs()->Add(table);
// Set default cell border using BorderInfo object
table->set_DefaultCellBorder(MakeObject<BorderInfo>(BorderSide::All, 0.1f));
// Set table border using another customized BorderInfo object
table->set_Border(MakeObject<BorderInfo>(BorderSide::All, 1.0f));
table->set_RepeatingRowsCount(1);
// Create rows in the table and then cells in the rows
auto row1 = table->get_Rows()->Add();
row1->get_Cells()->Add(u"col1");
row1->get_Cells()->Add(u"col2");
row1->get_Cells()->Add(u"col3");
String CRLF(u"\r\n");
for (int i = 0; i <= 10; i++) {
auto row = table->get_Rows()->Add();
row->set_IsRowBroken(true);
for (int c = 0; c <= 2; c++) {
SharedPtr<Cell> c1;
if (c == 2)
c1 = row->get_Cells()->Add(
String(u"Aspose.Total for C++ is a compilation of every C++ component offered by Aspose. It is compiled on a")
+ CRLF
+ u"daily basis to ensure it contains the most up to date versions of each of our C++ components. "
+ CRLF
+ u"Using Aspose.Total for C++ developers can create a wide range of applications.");
else
// Build the text with String::Format; u"item1" + c would be pointer arithmetic
c1 = row->get_Cells()->Add(String::Format(u"item1{0}", c));
c1->set_Margin(MakeObject<MarginInfo>());
c1->get_Margin()->set_Left(30);
c1->get_Margin()->set_Top(10);
c1->get_Margin()->set_Bottom(10);
}
}
// Save the document
String _dataDir("C:\\Samples\\");
document->Save(_dataDir + u"ReplaceableSymbolsInHeaderFooter_out.pdf");
}
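The footer text "Page $p of $P" is expanded separately for each page at run time. The following sketch models that substitution on a plain std::string; ExpandPageSymbols is a hypothetical helper and not the way Aspose.PDF implements replaceable symbols. A "$" followed by any other character is copied unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Conceptual model only: replace "$p" with the current page number and
// "$P" with the total number of pages.
std::string ExpandPageSymbols(const std::string& text, int page, int totalPages) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool isSymbol = text[i] == '$' && i + 1 < text.size() &&
                              (text[i + 1] == 'p' || text[i + 1] == 'P');
        if (isSymbol) {
            out += std::to_string(text[i + 1] == 'p' ? page : totalPages);
            ++i;  // skip the symbol letter
        } else {
            out += text[i];
        }
    }
    return out;
}
```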
Remove All Text from PDF Document
Remove All Text using Operators
Some text operations require removing all text from a PDF document. The usual approach is to find the text and set each found fragment to an empty string. However, changing the text of a set of text fragments triggers a number of operations that check and adjust the position of the text. These operations are necessary when editing text, but when the fragments are processed in a loop you cannot predict how many chunks of text will be deleted, so removing text this way can be slow.
Therefore, we recommend using a different approach for the scenario of removing all text from PDF pages.
The following code snippet shows how to do this quickly by deleting the text-showing operators from the contents of each page.
void RemoveAllTextUsingOperators() {
String _dataDir("C:\\Samples\\");
// Open document
auto document = MakeObject<Document>(_dataDir + u"sample.pdf");
// Loop through all pages of PDF Document
for (int i = 1; i <= document->get_Pages()->get_Count(); i++) {
auto page = document->get_Pages()->idx_get(i);
auto operatorSelector = MakeObject<OperatorSelector>(MakeObject<Aspose::Pdf::Operators::TextShowOperator>());
// Select all text on the page
page->get_Contents()->Accept(operatorSelector);
// Delete all text
page->get_Contents()->Delete(operatorSelector->get_Selected());
}
// Save the document
document->Save(_dataDir + u"RemoveAllText_out.pdf");
}
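In the PDF specification, glyphs are painted by the text-showing operators Tj, TJ, ' and ". The following conceptual sketch models a page content stream as a list of operator names and drops those operators in a single pass, roughly the idea behind selecting and deleting text-showing operators in the snippet above. It is not the Aspose API; RemoveTextShowing is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Conceptual model only: remove text-showing operators from a list of
// content-stream operator names in one pass, without touching the others.
std::vector<std::string> RemoveTextShowing(std::vector<std::string> ops) {
    static const std::set<std::string> textShowing = {"Tj", "TJ", "'", "\""};
    ops.erase(std::remove_if(ops.begin(), ops.end(),
                             [](const std::string& op) { return textShowing.count(op) > 0; }),
              ops.end());
    return ops;
}
```

Text-object and text-state operators such as BT, ET, Tf and Td remain in this model; only the operators that paint glyphs are removed, so no text positions need to be recalculated.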