Replace Text in PDF
The following code snippets also work with the Aspose.PDF.Drawing library.
Replace Text in all pages of PDF document
To replace text on all pages of a PDF document, first use TextFragmentAbsorber to find the phrase you want to replace. Then go through all the TextFragments to replace the text and change any other attributes. Finally, save the output PDF using the Save method of the Document object. The following code snippet shows how to replace text on all pages of a PDF document.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ReplaceTextInAllPages()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
// Open PDF document
using (var document = new Aspose.Pdf.Document(dataDir + "ReplaceTextAll.pdf"))
{
// Create TextFragmentAbsorber object to find all instances of the input search phrase
var absorber = new Aspose.Pdf.Text.TextFragmentAbsorber("text");
// Accept the absorber for all the pages
document.Pages.Accept(absorber);
// Get the extracted text fragments
var textFragmentCollection = absorber.TextFragments;
// Loop through the fragments
foreach (var textFragment in textFragmentCollection)
{
// Update text and other properties
textFragment.Text = "TEXT";
textFragment.TextState.Font = Aspose.Pdf.Text.FontRepository.FindFont("Verdana");
textFragment.TextState.FontSize = 22;
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Blue);
textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Green);
}
// Save PDF document
document.Save(dataDir + "ReplaceTextInAllPages_out.pdf");
}
}
Replace Text in particular page region
To replace text in a particular page region, instantiate a TextFragmentAbsorber object, specify the page region using the TextSearchOptions.Rectangle property, and then iterate through all the TextFragments to replace the text. Once these operations are completed, save the output PDF using the Save method of the Document object. The following code snippet shows how to replace text in a particular region of a page.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ReplaceTextInParticularPageRegion()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
// Open PDF document
using (var document = new Aspose.Pdf.Document(dataDir + "programaticallyproducedpdf.pdf"))
{
// Instantiate TextFragmentAbsorber object
var absorber = new Aspose.Pdf.Text.TextFragmentAbsorber();
// Limit the search to the page bounds
absorber.TextSearchOptions.LimitToPageBounds = true;
// Specify the page region (lower-left x, lower-left y, upper-right x, upper-right y)
absorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(100, 100, 200, 200);
// Search text on the first page of the PDF file
document.Pages[1].Accept(absorber);
// Iterate through the individual TextFragments
foreach (var textFragment in absorber.TextFragments)
{
// Replace the text with an empty string
textFragment.Text = "";
}
// Save PDF document
document.Save(dataDir + "ReplaceTextInParticularPageRegion_out.pdf");
}
}
Replace Text Based on a Regular Expression
To replace phrases based on a regular expression, first find all the phrases matching that regular expression using TextFragmentAbsorber. Pass the regular expression as a parameter to the TextFragmentAbsorber constructor, and create a TextSearchOptions object that enables regular expression matching. Once you have the matching phrases in TextFragments, loop through all of them and update them as required. Finally, save the updated PDF using the Save method of the Document object. The following code snippet shows how to replace text based on a regular expression.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ReplaceTextBasedOnARegularExpression()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
// Open PDF document
using (var document = new Aspose.Pdf.Document(dataDir + "SearchRegularExpressionPage.pdf"))
{
// Create TextFragmentAbsorber object to find all the phrases matching the regular expression
var absorber = new Aspose.Pdf.Text.TextFragmentAbsorber("\\d{4}-\\d{4}"); // Like 1999-2000
// Set text search option to specify regular expression usage
absorber.TextSearchOptions = new Aspose.Pdf.Text.TextSearchOptions(true);
// Accept the absorber for a single page
document.Pages[1].Accept(absorber);
// Get the extracted text fragments
var collection = absorber.TextFragments;
// Loop through the fragments
foreach (var textFragment in collection)
{
// Update text and other properties
textFragment.Text = "New Phrase";
// Set the font, size, and colors of the replaced text
textFragment.TextState.Font = Aspose.Pdf.Text.FontRepository.FindFont("Verdana");
textFragment.TextState.FontSize = 22;
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Blue);
textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Green);
}
// Save PDF document
document.Save(dataDir + "ReplaceTextonRegularExpression_out.pdf");
}
}
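Before running a pattern against a PDF, it can help to check which strings it matches. The following is a minimal sketch that uses the standard .NET Regex class rather than Aspose.PDF; it assumes that, for a simple pattern such as the one above, TextFragmentAbsorber matches the same strings as .NET regular expressions. The RegexPreview class and the sample sentence are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPreview
{
    // Returns every substring of the input that matches the pattern.
    // Hypothetical helper for illustration; not part of Aspose.PDF.
    public static string[] FindMatches(string input, string pattern)
    {
        var matches = Regex.Matches(input, pattern);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample text standing in for the text of a PDF page
        string sample = "Fiscal years 1999-2000 and 2010-2011, ref 12-3456.";
        // Same pattern as in the snippet above: four digits, a hyphen, four digits
        foreach (var match in FindMatches(sample, "\\d{4}-\\d{4}"))
        {
            Console.WriteLine(match);
        }
        // Prints:
        // 1999-2000
        // 2010-2011
    }
}
```

Note that "12-3456" is not matched, because the pattern requires exactly four digits before the hyphen.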
Replace fonts in existing PDF file
Aspose.PDF for .NET can replace text in a PDF document. Sometimes, however, you only need to replace the font used in the document while leaving the text unchanged. One of the overloads of the TextFragmentAbsorber constructor accepts a TextEditOptions object as an argument, and you can pass the RemoveUnusedFonts value of the TextEditOptions.FontReplace enumeration so that fonts left unused after the replacement are removed. The following code snippet shows how to replace a font inside a PDF document.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ReplaceFonts()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
// Open PDF document
using (var document = new Aspose.Pdf.Document(dataDir + "ReplaceTextPage.pdf"))
{
// Create text edit options
var options = new Aspose.Pdf.Text.TextEditOptions(Aspose.Pdf.Text.TextEditOptions.FontReplace.RemoveUnusedFonts);
// Search text fragments and set edit option as remove unused fonts
var absorber = new Aspose.Pdf.Text.TextFragmentAbsorber(options);
// Accept the absorber for all the pages
document.Pages.Accept(absorber);
// Traverse through all the TextFragments
foreach (var textFragment in absorber.TextFragments)
{
// If the font name is Arial,Bold, replace the font with Arial
if (textFragment.TextState.Font.FontName == "Arial,Bold")
{
textFragment.TextState.Font = Aspose.Pdf.Text.FontRepository.FindFont("Arial");
}
}
// Save PDF document
document.Save(dataDir + "ReplaceFonts_out.pdf");
}
}
Text Replacement should automatically re-arrange Page Contents
Aspose.PDF for .NET supports searching for and replacing text inside a PDF file. However, some customers encountered issues during text replacement: when a TextFragment was replaced with shorter content, extra spaces appeared in the resultant PDF, and when it was replaced with a longer string, the words overlapped existing page contents. The requirement was therefore to introduce a mechanism that re-arranges the page contents once the text inside a PDF document is replaced.
To cater for these scenarios, Aspose.PDF for .NET has been enhanced so that no such issues appear when replacing text inside a PDF file. The following code snippet shows how to replace text inside a PDF file so that the page contents are re-arranged automatically.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void AutomaticallyReArrangePageContents()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
// Open PDF document
using (var document = new Aspose.Pdf.Document(dataDir + "ExtractTextPage.pdf"))
{
// Create TextFragmentAbsorber object with the placeholder phrase to search for
var absorber = new Aspose.Pdf.Text.TextFragmentAbsorber("[TextFragmentAbsorber,companyname,Textbox,50]");
document.Pages.Accept(absorber);
// Replace each TextFragment
foreach (var textFragment in absorber.TextFragments)
{
// Set font of text fragment being replaced
textFragment.TextState.Font = Aspose.Pdf.Text.FontRepository.FindFont("Arial");
// Set font size
textFragment.TextState.FontSize = 12;
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.Navy;
// Replace the text with larger string than placeholder
textFragment.Text = "This is a Larger String for the Testing of this issue";
}
// Save PDF document
document.Save(dataDir + "AutomaticallyReArrangePageContents_out.pdf");
}
}
Rendering Replaceable Symbols during PDF creation
Replaceable symbols are special symbols in a text string that can be replaced with corresponding content at run time. The replaceable symbols currently supported by the new Document Object Model of the Aspose.PDF namespace are $P, $p, \n, and \r. The $p and $P symbols deal with page numbering at run time: $p is replaced with the number of the page that contains the current Paragraph, and $P is replaced with the total number of pages in the document. A TextFragment added to the paragraphs collection of a PDF document does not support line feeds inside its text. To add text with a line feed, use TextFragment with TextParagraph:
- Use "\r\n" or Environment.NewLine in the TextFragment instead of a single "\n".
- Create a TextParagraph object. It will add text with line splitting.
- Add the TextFragment with TextParagraph.AppendLine.
- Add the TextParagraph with TextBuilder.AppendParagraph.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void RenderingReplaceableSymbols()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
// Create PDF document
using (var document = new Aspose.Pdf.Document())
{
var page = document.Pages.Add();
// Initialize new TextFragment with text containing required newline markers
Aspose.Pdf.Text.TextFragment textFragment = new Aspose.Pdf.Text.TextFragment("Applicant Name: " + Environment.NewLine + " Joe Smoe");
// Set text fragment properties if necessary
textFragment.TextState.FontSize = 12;
textFragment.TextState.Font = Aspose.Pdf.Text.FontRepository.FindFont("TimesNewRoman");
textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.LightGray;
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.Red;
// Create TextParagraph object
var par = new Aspose.Pdf.Text.TextParagraph();
// Add new TextFragment to paragraph
par.AppendLine(textFragment);
// Set paragraph position
par.Position = new Aspose.Pdf.Text.Position(100, 600);
// Create TextBuilder object
var textBuilder = new Aspose.Pdf.Text.TextBuilder(page);
// Add the TextParagraph using TextBuilder
textBuilder.AppendParagraph(par);
// Save PDF document
document.Save(dataDir + "RenderingReplaceableSymbols_out.pdf");
}
}
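If the text comes from a source that uses single "\n" line feeds, it has to be converted before it is passed to TextFragment, as the first step in the list above requires. The following is a minimal sketch of one way to do that with the standard .NET Regex class; the LineBreakDemo class and its NormalizeLineBreaks helper are hypothetical names and are not part of Aspose.PDF.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // Converts every lone "\n" (one that is not preceded by "\r") into "\r\n".
    // Existing "\r\n" pairs are left unchanged. Hypothetical helper for illustration.
    public static string NormalizeLineBreaks(string text)
    {
        return Regex.Replace(text, "(?<!\r)\n", "\r\n");
    }

    public static void Main()
    {
        // Made-up input that mixes a lone "\n" with an existing "\r\n"
        string raw = "Applicant Name:\nJoe Smoe\r\nSecond line";
        string normalized = NormalizeLineBreaks(raw);
        // Make the line breaks visible in the console output
        Console.WriteLine(normalized.Replace("\r\n", "<CRLF>"));
        // Prints: Applicant Name:<CRLF>Joe Smoe<CRLF>Second line
    }
}
```

The normalized string can then be passed to the TextFragment constructor in place of the hand-built string in the snippet above. This sketch does not handle a lone "\r".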
Replaceable symbols in Header/Footer area
Replaceable symbols can also be placed inside the Header/Footer section of a PDF file. The following code snippet shows how to add replaceable symbols to the footer section.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ReplaceableSymbolsInHeaderOrFooterArea()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
// Create PDF document
using (var document = new Aspose.Pdf.Document())
{
var page = document.Pages.Add();
// Create margin info
var marginInfo = new Aspose.Pdf.MarginInfo();
marginInfo.Top = 90;
marginInfo.Bottom = 50;
marginInfo.Left = 50;
marginInfo.Right = 50;
// Assign the marginInfo instance to the Margin property of page.PageInfo
page.PageInfo.Margin = marginInfo;
var headerFooterFirst = new Aspose.Pdf.HeaderFooter();
page.Header = headerFooterFirst;
headerFooterFirst.Margin.Left = 50;
headerFooterFirst.Margin.Right = 50;
// Instantiate a Text paragraph that will store the content to show as header
var fragment1 = new Aspose.Pdf.Text.TextFragment("report title");
fragment1.TextState.Font = Aspose.Pdf.Text.FontRepository.FindFont("Arial");
fragment1.TextState.FontSize = 16;
fragment1.TextState.ForegroundColor = Aspose.Pdf.Color.Black;
fragment1.TextState.FontStyle = Aspose.Pdf.Text.FontStyles.Bold;
fragment1.TextState.HorizontalAlignment = Aspose.Pdf.HorizontalAlignment.Center;
fragment1.TextState.LineSpacing = 5f;
headerFooterFirst.Paragraphs.Add(fragment1);
var fragment2 = new Aspose.Pdf.Text.TextFragment("Report_Name");
fragment2.TextState.Font = Aspose.Pdf.Text.FontRepository.FindFont("Arial");
fragment2.TextState.ForegroundColor = Aspose.Pdf.Color.Black;
fragment2.TextState.HorizontalAlignment = Aspose.Pdf.HorizontalAlignment.Center;
fragment2.TextState.LineSpacing = 5f;
fragment2.TextState.FontSize = 12;
headerFooterFirst.Paragraphs.Add(fragment2);
// Create a HeaderFooter object for the section
var headerFooterFoot = new Aspose.Pdf.HeaderFooter();
// Assign the HeaderFooter object as the page footer
page.Footer = headerFooterFoot;
headerFooterFoot.Margin.Left = 50;
headerFooterFoot.Margin.Right = 50;
// Add a text paragraph containing current page number of total number of pages
var fragment3 = new Aspose.Pdf.Text.TextFragment("Generated on test date");
var fragment4 = new Aspose.Pdf.Text.TextFragment("report name ");
var fragment5 = new Aspose.Pdf.Text.TextFragment("Page $p of $P");
// Instantiate a table object
var table2 = new Aspose.Pdf.Table();
// Add the table in paragraphs collection of the desired section
headerFooterFoot.Paragraphs.Add(table2);
// Set with column widths of the table
table2.ColumnWidths = "165 172 165";
// Create rows in the table and then cells in the rows
var row3 = table2.Rows.Add();
row3.Cells.Add();
row3.Cells.Add();
row3.Cells.Add();
// Set the horizontal alignment of each cell: left, center, and right
row3.Cells[0].Alignment = Aspose.Pdf.HorizontalAlignment.Left;
row3.Cells[1].Alignment = Aspose.Pdf.HorizontalAlignment.Center;
row3.Cells[2].Alignment = Aspose.Pdf.HorizontalAlignment.Right;
row3.Cells[0].Paragraphs.Add(fragment3);
row3.Cells[1].Paragraphs.Add(fragment4);
row3.Cells[2].Paragraphs.Add(fragment5);
var table = new Aspose.Pdf.Table();
table.ColumnWidths = "33% 33% 34%";
table.DefaultCellPadding = new Aspose.Pdf.MarginInfo();
table.DefaultCellPadding.Top = 10;
table.DefaultCellPadding.Bottom = 10;
// Add the table in paragraphs collection of the desired section
page.Paragraphs.Add(table);
// Set default cell border using BorderInfo object
table.DefaultCellBorder = new Aspose.Pdf.BorderInfo(Aspose.Pdf.BorderSide.All, 0.1f);
// Set table border using another customized BorderInfo object
table.Border = new Aspose.Pdf.BorderInfo(Aspose.Pdf.BorderSide.All, 1f);
table.RepeatingRowsCount = 1;
// Create rows in the table and then cells in the rows
var row1 = table.Rows.Add();
row1.Cells.Add("col1");
row1.Cells.Add("col2");
row1.Cells.Add("col3");
const string CRLF = "\r\n";
for (int i = 0; i <= 10; i++)
{
var row = table.Rows.Add();
row.IsRowBroken = true;
for (int c = 0; c <= 2; c++)
{
Aspose.Pdf.Cell c1;
if (c == 2)
{
c1 = row.Cells.Add("Aspose.Total for Java is a compilation of every Java component offered by Aspose. It is compiled on a" + CRLF + "daily basis to ensure it contains the most up to date versions of each of our Java components. " + CRLF + "daily basis to ensure it contains the most up to date versions of each of our Java components. " + CRLF + "Using Aspose.Total for Java developers can create a wide range of applications.");
}
else
{
c1 = row.Cells.Add("item1" + c);
}
c1.Margin = new Aspose.Pdf.MarginInfo();
c1.Margin.Left = 30;
c1.Margin.Top = 10;
c1.Margin.Bottom = 10;
}
}
// Save PDF document
document.Save(dataDir + "ReplaceableSymbolsInHeaderFooter_out.pdf");
}
}
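As a conceptual illustration of what the "Page $p of $P" footer text used above turns into on each page, the following sketch performs the same kind of substitution with plain string replacement. It is not how Aspose.PDF implements replaceable symbols; the PageSymbolDemo class and its Expand helper are hypothetical, and the total of three pages is an assumed value.

```csharp
using System;

public static class PageSymbolDemo
{
    // Expands $p (current page number) and $P (total number of pages).
    // string.Replace is case-sensitive, so $p and $P are handled separately.
    // Hypothetical helper for illustration; not part of Aspose.PDF.
    public static string Expand(string template, int currentPage, int totalPages)
    {
        return template
            .Replace("$p", currentPage.ToString())
            .Replace("$P", totalPages.ToString());
    }

    public static void Main()
    {
        const int totalPages = 3; // assumed page count for the demonstration
        for (int page = 1; page <= totalPages; page++)
        {
            Console.WriteLine(Expand("Page $p of $P", page, totalPages));
        }
        // Prints:
        // Page 1 of 3
        // Page 2 of 3
        // Page 3 of 3
    }
}
```

In the real document the library performs this substitution when the pages are rendered, so the footer fragment only needs to contain the symbols themselves.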
Remove Unused Fonts from PDF File
Aspose.PDF for .NET supports the feature to embed fonts while creating a PDF document, as well as the capability to embed fonts in existing PDF files. From Aspose.PDF for .NET 7.3.0, it also lets you remove duplicate or unused fonts from PDF documents.
To replace fonts and remove the ones that become unused, use the following approach:
- Create a TextEditOptions object with the TextEditOptions.FontReplace.RemoveUnusedFonts value. (This removes fonts that have become unused during font replacement.)
- Create a TextFragmentAbsorber object and pass the TextEditOptions object to its constructor.
- Set the font individually for each text fragment.
The following code snippet replaces the font for all text fragments on all document pages and removes unused fonts.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void RemoveUnusedFonts()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
// Open PDF document
using (var document = new Aspose.Pdf.Document(dataDir + "ReplaceTextPage.pdf"))
{
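// Create text edit options that remove fonts left unused after replacement
var options = new Aspose.Pdf.Text.TextEditOptions(Aspose.Pdf.Text.TextEditOptions.FontReplace.RemoveUnusedFonts);
// Pass the options to the absorber so that they take effect
var absorber = new Aspose.Pdf.Text.TextFragmentAbsorber(options);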
document.Pages.Accept(absorber);
// Iterate through all the TextFragments
foreach (var textFragment in absorber.TextFragments)
{
textFragment.TextState.Font = Aspose.Pdf.Text.FontRepository.FindFont("Arial, Bold");
}
// Save PDF document
document.Save(dataDir + "RemoveUnusedFonts_out.pdf");
}
}
Remove All Text from PDF Document
Remove All Text using Operators
To remove all text from a PDF document, you would usually set the text of every found fragment to an empty string. However, changing the text of a multitude of text fragments invokes a number of checks and text position adjustments for each one. These are essential in text editing scenarios, but they slow down bulk removal, and when fragments are processed in a loop you cannot determine in advance how many of them will be removed.
Therefore, we recommend a different approach for removing all text from PDF pages: select the text-showing operators on each page and delete them. The following code snippet takes this approach and works very fast.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void RemoveAllTextFromDocument()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
// Open PDF document
using (var document = new Aspose.Pdf.Document(dataDir + "RemoveAllText.pdf"))
{
// Loop through all pages of PDF Document
for (int i = 1; i <= document.Pages.Count; i++)
{
var page = document.Pages[i];
var operatorSelector = new Aspose.Pdf.OperatorSelector(new Aspose.Pdf.Operators.TextShowOperator());
// Select all text on the page
page.Contents.Accept(operatorSelector);
// Delete all text
page.Contents.Delete(operatorSelector.Selected);
}
// Save PDF document
document.Save(dataDir + "RemoveAllText_out.pdf", Aspose.Pdf.SaveFormat.Pdf);
}
}