Aspose.Words for Java 21.7 Release Notes

Major Features

There are 94 improvements and fixes in this regular monthly release. The most notable are:

  • Implemented rendering to PDF/A-2 format.
  • Added the ability to work with framesets.
  • Introduced a new overload of DocumentBuilder.InsertHtml and a new enumeration HtmlInsertOptions.
  • Provided a new API for working with fill patterns.
  • Added support for several exact date-time parse formats when loading JSON for the LINQ Reporting Engine.

Full List of Issues Covering all Changes in this Release

Key | Summary | Category
WORDSNET-22279 | Implement Fill.Patterned method | New Feature
WORDSNET-17557 | Provide way to get URL from Target attribute inside webSettings.xml.rels file | New Feature
WORDSNET-21750 | Highlighted Content is not visible after exporting to PDF | New Feature
WORDSNET-15400 | Implement maskPen rasterOp mode in InkML rendering | New Feature
WORDSNET-22370 | Support several exact date-time parse formats while loading JSON | New Feature
WORDSNET-15984 | Ink annotations are not aligned in PDF output | New Feature
WORDSNET-21781 | Support using of arrays with data bands and LINQ extension methods for Java | New Feature
WORDSNET-22159 | Implement rendering of brushes with maskPen rasterOp mode | New Feature
WORDSNET-22009 | Make links clickable when converting CHM to HTML when the file pointing itself | New Feature
WORDSNET-5971 | Support of PdfCompliance PDF/A-2 | New Feature
WORDSNET-11488 | Add feature to support of PdfCompliance PDF/A-2a | New Feature
WORDSNET-15836 | Add feature to support array type identifiers in template syntax | New Feature
WORDSNET-20505 | Add support for PDF/A-2u, PDF/A-2a, PDF/A-3u, PDF/A-3a compliance | New Feature
WORDSNET-15129 | Support of DOCX to PDF_A_2U conversion | New Feature
WORDSNET-15126 | Support of DOCX to PDF_A_2B conversion | New Feature
WORDSNET-15125 | Support of DOCX to PDF_A_2A conversion | New Feature
WORDSNET-13778 | Add feature to support of PdfCompliance PDF/A-2b | New Feature
WORDSNET-17478 | API needed to set the Scope attribute in Table in rendered PDF | Enhancement
WORDSNET-21089 | Shapes are lost after RTF to PDF conversion | Enhancement
WORDSNET-22397 | How to use “Mark as decorative” IsDecorative property | Enhancement
WORDSNET-22255 | About Data Source Name in ReportingEngine.BuildReport Method | Enhancement
WORDSNET-17726 | Notify user when he tries to load MOBI document but the document is AZW3 | Enhancement
WORDSJAVA-2619 | Fix standard system colours in Java | Bug
WORDSNET-21259 | Unwanted spaces are added after conversion from DOCX>HTML>DOCX | Bug
WORDSNET-22312 | NullReferenceException when save attached document as PDF | Bug
WORDSNET-22263 | ReportingEngine.BuildReport throws System.InvalidCastException | Bug
WORDSNET-22416 | DOCX to PDF: Output file text not formatted properly | Bug
WORDSNET-21766 | Document.Compare generates incorrect revisions | Bug
WORDSNET-16528 | Incorrect convert DOCX to PDF when docx in compatible mode | Bug
WORDSNET-22404 | Incorrect page number for FieldPageRef | Bug
WORDSNET-19592 | Exception is thrown when converting DOCX to HTML | Bug
WORDSNET-21940 | HTML img’s base64 data is not getting converted to doc in Aspose.words java | Bug
WORDSNET-18852 | Empty paragraphs added when inserting unordered list using DocumentBuilder.InsertHtml | Bug
WORDSNET-22277 | OutOfMemoryException throws when extract pages and save it to PNG | Bug
WORDSNET-21669 | Mobi file - Huffman compression is not yet supported for specific file | Bug
WORDSNET-21963 | Value of attribute “lang” is invalid | Bug
WORDSNET-22388 | NullPointerException when calling mailMerge.execute() | Bug
WORDSNET-21988 | Incorrect page size during conversion of HTML with landscape orientation to DOCX | Bug
WORDSNET-22228 | Text is pushed down to next pages after DOCX to PDF conversion | Bug
WORDSNET-22357 | System.NullReferenceException occurs upon DOC to PDF conversion | Bug
WORDSNET-22311 | Duplicate list item created when inserting multiple paragraph breaks using Range.Replace() | Bug
WORDSNET-22321 | Replacing text containing a paragraph break is poorly represented with TrackRevisions in enabled | Bug
WORDSNET-21500 | Image displays as red cross in converted documents | Bug
WORDSNET-22387 | Content are lost after PDF to DOCX conversion | Bug
WORDSNET-17077 | Row height increases after DOCX-HTML-DOCX roundtrip | Bug
WORDSNET-19823 | Paragraph is pushed down to next page after DOCX>HTML>DOCX conversion | Bug
WORDSNET-22303 | DOCX does not open in MS Word after re-saving | Bug
WORDSNET-22389 | System.InvalidCastException occurs upon loading DOC | Bug
WORDSNET-20793 | DOCX to PDF conversion issue with PDF accessibility | Bug
WORDSNET-22330 | Series lines are rendered incorrectly after converting to PDF | Bug
WORDSNET-20708 | Wrong position of chart legend in output PNG file | Bug
WORDSNET-22356 | Object reference not set to an instance of an object | Bug
WORDSNET-22374 | File Corrupted Exception occurs upon loading a RTF Document | Bug
WORDSNET-20964 | DOCX to PDF conversion issue with chart rendering | Bug
WORDSNET-22358 | System.NullReferenceException occurs upon DOC to PDF conversion | Bug
WORDSNET-22386 | Document is corrupted exception thrown while loading DOC | Bug
WORDSNET-22293 | HTML to TXT conversion issue with table layout | Bug
WORDSNET-22364 | Document word find and replace issue | Bug
WORDSNET-21454 | FileCorruptedException is thrown during import HTML file | Bug
WORDSNET-21137 | Image is not loaded from HTML if the “src” attribute value has leading or trailing whitespace characters | Bug
WORDSNET-22333 | Paragraph border is rendered at different position when ExportDocumentStructure is used | Bug
WORDSNET-18137 | Document.UpdateFields does not update TEMPLATE field | Bug
WORDSNET-22360 | Field.Update does not update FieldHyperlink and show | Bug
WORDSNET-22332 | Bullet symbol is lost after DOCX to PDF conversion | Bug
WORDSNET-22355 | Incorrectly read SPRM:D62F. Expected 0, but read 11 bytes. | Bug
WORDSNET-20958 | Page number in the footer is wrong in output TIFF | Bug
WORDSNET-21944 | ODT to HTML: Frame’s bottom border is missing | Bug
WORDSNET-22285 | Exception is thrown when loading MOBI file | Bug
WORDSNET-22231 | Negative letter-spacing after conversion from PDF to HTML | Bug
WORDSNET-21789 | Aspose.Words.FileCorruptedException error when converting MHTML | Bug
WORDSNET-22233 | System.NullReferenceException is thrown when DOC is saved to PDF | Bug
WORDSNET-20402 | HTML export issues | Bug
WORDSNET-22152 | Tab stop in a list item gets considerably wider after conversion to HTML | Bug
WORDSNET-21796 | DOCX to PDF/A conversion and validation fails: Several cases with header cells that are not tagged | Bug
WORDSNET-21947 | DOCX to PDF/A conversion: accessibility validation fails: Bullet list items are broken into several tags | Bug
WORDSNET-14245 | Document.Compare generates incorrect format revisions | Bug
WORDSNET-22297 | Extra Text becomes Visible in PDF | Bug
WORDSNET-22191 | Problem with nested tables in RTF content | Bug
WORDSNET-21035 | Incorrect rendering of Clustered Column Type Chart in PDF | Bug
WORDSNET-18229 | The title of the horizontal axis overlaps the axis labels | Bug
WORDSNET-20135 | The Units are no longer aligned with the tick marks in Chart when rendered to PDF | Bug
WORDSNET-17543 | Document.UpdateFields leaves INCLUDETEXT field with “Error! Not a valid filename.” | Bug
WORDSNET-22262 | Track changes - Comments are shown in the outline | Bug
WORDSNET-22083 | Accessibility issues are appeared after DOCX to PDF conversion | Bug
WORDSNET-22242 | Accessibility tags not behaving properly in Aspose PDF compared to Acrobat/Word PDF | Bug
WORDSNET-22190 | Table header tags are not exported after DOCX to PDF/a conversion | Bug
WORDSNET-21161 | Table tag structure is incorrect after DOCX to PDF conversion | Bug
WORDSNET-22223 | Transparent PNG image became non-transparent after DOCX to PDF conversion | Bug
WORDSNET-21846 | DOCX to PDF (PdfA1a) conversion issue with transparent image rendering | Bug
WORDSNET-17086 | PDF version support // Text effects are lost in PDFA_1B output | Bug
WORDSNET-16968 | Saving to PDF with PdfCompliance.PdfA1a results in a large file | Bug
WORDSNET-21123 | SmartArt drawing corruption during open save a DOCX | Bug
WORDSNET-21127 | Images and text content are changed after re-saving DOCX at Windows Server 2019 | Bug
WORDSNET-17657 | Document.UpdateFields does not process FieldStyleRef.SuppressNonDelimiters | Bug

Public API and Backward Incompatible Changes

This section lists the public API changes introduced in Aspose.Words 21.7. It covers new and obsolete public methods as well as behind-the-scenes behavior changes in Aspose.Words that may affect existing code. Changes that modify existing behavior and could be perceived as regressions are especially important and are documented here.

Added new public property IsDecorative

Related issue: WORDSNET-22397

A new public property IsDecorative has been added to the ShapeBase class.

/// <summary>
/// Gets or sets the flag that specifies whether the shape is decorative.
/// </summary>
/// <remarks>
/// Note that a shape with a non-empty <see cref="ShapeBase.AlternativeText"/> cannot be decorative.
/// </remarks>
public bool IsDecorative { get; set; }

Use Case: How to use the IsDecorative property.

Document doc = new Document("input.docx");

Shape shape = doc.FirstSection.Body.Shapes[0];
shape.IsDecorative = true;

doc.Save("output.docx");

Implemented rendering to PDF/A-2 format

New values have been added to the PdfCompliance enum.

public enum PdfCompliance
{
        /// <summary>
        /// The output file will comply with the PDF/A-2a standard.
        /// This level includes all the requirements of PDF/A-2u and additionally requires
        /// that document structure be included (also known as being "tagged"),
        /// with the objective of ensuring that document content can be searched and repurposed.
        /// </summary>
        /// <remarks>
        /// Note that exporting the document structure significantly increases memory consumption, especially
        /// for large documents.
        /// </remarks>
        PdfA2a,
        /// <summary>
        /// The output file will comply with the PDF/A-2u standard.
        /// PDF/A-2u has the objective of preserving document static visual appearance over time, independent of the tools
        /// and systems used for creating, storing or rendering the files. Additionally any text contained in the document
        /// can be reliably extracted as a series of Unicode codepoints.
        /// </summary>
        PdfA2u
}

PDF/A-2 is based on the PDF 1.7 format and removes significant limitations of PDF/A-1, such as the prohibition of transparency and of object compression.

Only the PDF/A-2a and PDF/A-2u conformance levels are provided. There is no PDF/A-2b option because regular Aspose.Words output already conforms to the PDF/A-2u level, which is stricter than PDF/A-2b.

Please note that the PDF/A-2 standard adds two requirements on document content in addition to the requirements of PDF/A-1:

  1. Both the PDF/A-2a and PDF/A-2u standards prohibit the use of the .notdef glyph.
  2. PDF/A-2a prohibits the use of Unicode PUA (Private Use Area) code points. These are commonly used in symbolic fonts such as “Symbol” and “Wingdings”.
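The second requirement can be checked before saving. The following is a minimal sketch in plain Java, assuming you already have the document text as a String; PuaScanner and findPrivateUseOffsets are hypothetical names, not part of Aspose.Words. It relies only on the JDK's Character.getType to detect private-use code points.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight helper; it is not part of the Aspose.Words API. */
final class PuaScanner {

    /** Returns the UTF-16 offsets of all private-use code points found in the text. */
    static List<Integer> findPrivateUseOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            // Character.PRIVATE_USE is the Unicode general category "Co".
            if (Character.getType(codePoint) == Character.PRIVATE_USE) {
                offsets.add(i);
            }
            // Advance by one or two chars depending on whether this is a surrogate pair.
            i += Character.charCount(codePoint);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // U+F0B7 lies in the BMP Private Use Area (U+E000..U+F8FF).
        System.out.println(findPrivateUseOffsets("Bullet: \uF0B7 done")); // prints [8]
    }
}
```

A non-empty result only signals that the text may conflict with PDF/A-2a; how Aspose.Words treats such characters during export is not described in these notes.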

PdfCompliance.PdfA1a and PdfCompliance.PdfA1b are marked as obsolete. Aspose.Words PDF/A-2 output does not have the PDF/A-1 limitations described above and is more consistent, so there is no reason to keep saving to PDF/A-1. However, if you specifically need to save to PDF/A-1, please report it and we may reconsider this decision.

Several PdfSaveOptions settings are prohibited when saving to PDF/A-1 and PDF/A-2

public class PdfSaveOptions
{
        /// <summary>
        /// Specifies whether to preserve Microsoft Word form fields as form fields in PDF or convert them to text.
        /// Default is <c>false</c>.
        /// </summary>
        /// <remarks>
        ...
        /// <para>This value is ignored when saving to PDF/A compliance because editable forms are prohibited.</para>
        /// </remarks>
        public bool PreserveFormFields;

        /// <summary>
        /// Gets or sets a value determining the way <see cref="Document.CustomDocumentProperties"/> are exported to PDF file.
        /// </summary>
        /// <remarks>
        ...
        /// <para><see cref="PdfCustomPropertiesExport.Metadata"/> value is not supported when saving to PDF/A.</para>
        /// </remarks>
        public PdfCustomPropertiesExport CustomPropertiesExport;
     
        /// <summary>
        /// Gets or sets a value determining whether hyperlinks in the output Pdf document
        /// are forced to be opened in a new window (or tab) of a browser.
        /// </summary>
        /// <remarks>
        ...
        /// <para>Setting this value to <c>true</c> is not allowed when saving to PDF/A because JavaScript actions are prohibited.</para>
        /// </remarks>
        public bool OpenHyperlinksInNewWindow;

        /// <summary>
        /// Specifies how the color space will be selected for the images in PDF document.
        /// </summary>
        /// <remarks>
        ...
        /// <para><see cref="PdfImageColorSpaceExportMode.SimpleCmyk"/> value is not supported when saving to PDF/A.</para>
        /// </remarks>
        public PdfImageColorSpaceExportMode ImageColorSpaceExportMode;
     
        /// <summary>
        /// A flag indicating whether image interpolation shall be performed by a conforming reader.
        /// When <c>false</c> is specified, the flag is not written to the output document and
        /// the default behaviour of reader is used instead.
        /// </summary>
        /// <remarks>
        ...
        /// <para>Setting this value to <c>true</c> is not allowed when saving to PDF/A according to compliance requirements.</para>
        /// </remarks>
        public bool InterpolateImages;
}
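The five remarks above can be summarized as a pre-flight check. The sketch below is plain Java with stand-in types: Settings is a hypothetical mirror of the relevant option values, not the Aspose.Words PdfSaveOptions class, and the check only reports conflicts. These notes do not state how the library itself reacts to each disallowed value, so nothing here should be read as its actual behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker built from the remarks above; not part of Aspose.Words. */
final class PdfASettingsCheck {

    /** Stand-in for the five PdfSaveOptions values discussed in this section. */
    static final class Settings {
        boolean preserveFormFields;
        boolean customPropertiesAsMetadata; // CustomPropertiesExport == Metadata
        boolean openHyperlinksInNewWindow;
        boolean simpleCmykImages;           // ImageColorSpaceExportMode == SimpleCmyk
        boolean interpolateImages;
    }

    /** Lists every setting that conflicts with PDF/A output according to the remarks. */
    static List<String> conflictsWithPdfA(Settings s) {
        List<String> issues = new ArrayList<>();
        if (s.preserveFormFields) {
            issues.add("PreserveFormFields is ignored: editable forms are prohibited");
        }
        if (s.customPropertiesAsMetadata) {
            issues.add("CustomPropertiesExport.Metadata is not supported");
        }
        if (s.openHyperlinksInNewWindow) {
            issues.add("OpenHyperlinksInNewWindow is not allowed: JavaScript actions are prohibited");
        }
        if (s.simpleCmykImages) {
            issues.add("ImageColorSpaceExportMode.SimpleCmyk is not supported");
        }
        if (s.interpolateImages) {
            issues.add("InterpolateImages is not allowed by the compliance requirements");
        }
        return issues;
    }

    public static void main(String[] args) {
        Settings settings = new Settings();
        settings.interpolateImages = true;
        conflictsWithPdfA(settings).forEach(System.out::println);
    }
}
```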

Corrected a typo in an enum name introduced in 21.6

The Aspose.Words.Layout.ContinuosSectionRestart enum introduced in 21.6 has been renamed to ContinuousSectionRestart to correct a typo.

Implemented Frameset API

Related issue: WORDSNET-17557

Implemented a new Frameset API for accessing and updating a frame URL. It is available through the new Frameset property of a document.

namespace Aspose.Words
{
    public class Document
    {
        ...

        /// <summary>
        /// Returns a <see cref="Frameset"/> instance if this document represents a frames page.
        /// </summary>
        /// <remarks>
        /// If the document is not framed, the property has the <b>null</b> value.
        /// </remarks>
        public Frameset Frameset { get; }
    }
}

namespace Aspose.Words.Framesets
{
    /// <summary>
    /// Represents a frames page or a single frame on a frames page.
    /// </summary>
    /// <remarks>
    /// If the <see cref="ChildFramesets"/> property contains items, this instance is a frames page, otherwise it is
    /// a single frame.
    /// </remarks>
    public class Frameset
    {
        /// <summary>
        /// Gets or sets the web page URL or document file name to display in this frame.
        /// </summary>
        public string FrameDefaultUrl { get; set; }

        /// <summary>
        /// Gets or sets a value indicating whether the web page or document file name specified in the
        /// <see cref="FrameDefaultUrl"/> property is an external resource the frame is linked with.
        /// </summary>
        public bool IsFrameLinkToFile { get; set; }
     
        /// <summary>
        /// Gets the collection of child frames and frames pages.
        /// </summary>
        public FramesetCollection ChildFramesets { get; }
    }
     
    /// <summary>
    /// Represents a collection of instances of the <see cref="Frameset"/> class.
    /// </summary>
    public class FramesetCollection : IEnumerable<Frameset>
    {
        /// <summary>
        /// Gets the number of frames/frames pages contained in the collection.
        /// </summary>
        public int Count { get; }
     
        /// <summary>
        /// Gets a frame or frames page at the specified index.
        /// </summary>
        public Frameset this[int index] { get; }
    }
}

Use Case:

Document doc = new Document("input.docx");

doc.Frameset.ChildFramesets[0].FrameDefaultUrl = "https://www.aspose.com";
doc.Frameset.ChildFramesets[0].IsFrameLinkToFile = true;

doc.Save("output.docx");
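Because a Frameset is either a frames page (it has child framesets) or a single frame (it has none), the structure is a tree, and Document.Frameset is null for a document without frames. The sketch below models that shape in plain Java to show a null-safe, depth-first walk that collects the URL of every single frame. FrameNode is a hypothetical stand-in, not the Aspose.Words Frameset class.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a frameset node; mirrors FrameDefaultUrl and ChildFramesets only. */
final class FrameNode {
    final String frameDefaultUrl;
    final List<FrameNode> children = new ArrayList<>();

    FrameNode(String frameDefaultUrl) {
        this.frameDefaultUrl = frameDefaultUrl;
    }

    /** Adds a child and returns this node so that trees can be built fluently. */
    FrameNode add(FrameNode child) {
        children.add(child);
        return this;
    }

    /** Collects the URLs of all single frames (nodes without children), depth-first. */
    static List<String> leafUrls(FrameNode root) {
        List<String> urls = new ArrayList<>();
        if (root != null) { // a null root models a document that is not a frames page
            collect(root, urls);
        }
        return urls;
    }

    private static void collect(FrameNode node, List<String> out) {
        if (node.children.isEmpty()) {
            out.add(node.frameDefaultUrl);
            return;
        }
        for (FrameNode child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        FrameNode page = new FrameNode(null)
                .add(new FrameNode("menu.htm"))
                .add(new FrameNode("content.htm"));
        System.out.println(leafUrls(page)); // prints [menu.htm, content.htm]
    }
}
```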

Introduced a new overload of DocumentBuilder.InsertHtml and a new enumeration HtmlInsertOptions

Related issue: WORDSNET-18852

The following new overload of DocumentBuilder.InsertHtml has been implemented:

/// <summary>
/// Inserts an HTML string into the document. Allows specifying additional options.
/// </summary>
/// <param name="html">An HTML string to insert into the document.</param>
/// <param name="options">Options that are used when HTML string is inserted.</param>
/// <remarks>
/// You can use this method to insert an HTML fragment or whole HTML document.
/// </remarks>
public void InsertHtml(string html, HtmlInsertOptions options)

And the following public enumeration has been introduced:

/// <summary>
/// Specifies options for the <see cref="DocumentBuilder.InsertHtml(string, HtmlInsertOptions)"/> method.
/// </summary>
[Flags]
public enum HtmlInsertOptions
{
    /// <summary>
    /// Use the default options when inserting HTML.
    /// </summary>
    None = 0,

    /// <summary>
    /// Use font and paragraph formatting specified in <see cref="DocumentBuilder"/> as base formatting for text
    /// inserted from HTML.
    /// </summary>
    /// <remarks>
    /// <para>
    /// If this option is not specified, formatting of <see cref="DocumentBuilder"/> is ignored and text is inserted
    /// with default HTML formatting. As a result, the text looks as it is rendered in browsers.
    /// </para>
    /// <para>
    /// If this option is specified, formatting of inserted text is based on formatting specified in
    /// <see cref="DocumentBuilder"/>, and the text looks as if it were inserted using <see cref="DocumentBuilder.Write"/>.
    /// </para>
    /// </remarks>
    UseBuilderFormatting = 1,
     
    /// <summary>
    /// Remove the empty paragraph that is normally inserted after HTML that ends with a block-level element.
    /// </summary>
    /// <remarks>
    /// By default, <see cref="DocumentBuilder"/> makes sure that the last block-level element imported from HTML
    /// is closed after import and inserts a paragraph break after the element. This paragraph break separates
    /// content imported from HTML from content of the template document. However, if an HTML fragment is inserted into
    /// an empty paragraph, that paragraph break will create an extra empty paragraph. If this behavior is undesired,
    /// specify this option.
    /// </remarks>
    RemoveLastEmptyParagraph = 2
}

The old DocumentBuilder.InsertHtml overloads are now aliases for the new overload of DocumentBuilder.InsertHtml as follows:

builder.InsertHtml(html);
// Is equivalent to:
builder.InsertHtml(html, HtmlInsertOptions.None);

builder.InsertHtml(html, false);
// Is equivalent to:
builder.InsertHtml(html, HtmlInsertOptions.None);

builder.InsertHtml(html, true);
// Is equivalent to:
builder.InsertHtml(html, HtmlInsertOptions.UseBuilderFormatting);
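The equivalences above follow from HtmlInsertOptions being a flags enumeration with the values 0, 1 and 2. The sketch below reproduces that arithmetic in plain Java. The constants are local stand-ins that copy the values from the listing, and fromLegacy and has are hypothetical helper names; none of this is the Aspose.Words API itself.

```java
/** Local model of the HtmlInsertOptions flag values; not the Aspose.Words type. */
final class HtmlInsertFlags {
    // Values copied from the HtmlInsertOptions listing above.
    static final int NONE = 0;
    static final int USE_BUILDER_FORMATTING = 1;
    static final int REMOVE_LAST_EMPTY_PARAGRAPH = 2;

    /** Maps the boolean argument of the old overload to the equivalent flag value. */
    static int fromLegacy(boolean useBuilderFormatting) {
        return useBuilderFormatting ? USE_BUILDER_FORMATTING : NONE;
    }

    /** Returns true if the given non-zero flag is set in the combined options value. */
    static boolean has(int options, int flag) {
        return flag != 0 && (options & flag) == flag;
    }

    public static void main(String[] args) {
        // Flags combine with bitwise OR, as in the C# use case that follows.
        int options = USE_BUILDER_FORMATTING | REMOVE_LAST_EMPTY_PARAGRAPH;
        System.out.println(options);                                   // prints 3
        System.out.println(has(options, REMOVE_LAST_EMPTY_PARAGRAPH)); // prints true
        System.out.println(has(fromLegacy(true), REMOVE_LAST_EMPTY_PARAGRAPH)); // prints false
    }
}
```

The last line illustrates why the old boolean overload cannot express RemoveLastEmptyParagraph: it maps only to None or UseBuilderFormatting.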

Use Case: When DocumentBuilder.InsertHtml inserts an HTML fragment that ends with a block-level HTML element (for example, a paragraph or a list), it normally closes that block-level element and inserts a paragraph break. As a result, a new empty paragraph appears after the inserted content. This behavior may be undesired when HTML fragments are inserted into a template document. For example, consider the following mail merge scenario.

// Default behavior.
builder.MoveToMergeField("NAME");
builder.InsertHtml("<p>John Smith</p>", true);
builder.MoveToMergeField("EMAIL");
builder.InsertHtml("<p>jsmith@example.com</p>", true);

In the resulting document, there will be an extra empty paragraph after each inserted HTML paragraph. Specifying HtmlInsertOptions.RemoveLastEmptyParagraph avoids this:

// RemoveLastEmptyParagraph is specified.
builder.MoveToMergeField("NAME");
builder.InsertHtml("<p>John Smith</p>", HtmlInsertOptions.UseBuilderFormatting | HtmlInsertOptions.RemoveLastEmptyParagraph);
builder.MoveToMergeField("EMAIL");
builder.InsertHtml("<p>jsmith@example.com</p>", HtmlInsertOptions.UseBuilderFormatting | HtmlInsertOptions.RemoveLastEmptyParagraph);

In the resulting document, the extra empty paragraphs are removed.

Introduced FieldOptions.TemplateName property

Related issue: WORDSNET-18137

As part of implementing WORDSNET-18137, we have introduced the FieldOptions.TemplateName property, which specifies the template file name for the TEMPLATE field:

/// <summary>
/// Gets or sets the file name of the template used by the document.
/// </summary>
/// <remarks>
/// <p>This property is used by the <see cref="FieldTemplate"/> field if the <see cref="Document.AttachedTemplate"/> property is empty.</p>
/// <p>If this property is empty the default template file name <c>Normal.dotm</c> is used.</p>
/// </remarks>
public string TemplateName { get; set; }

Use Case:

document.FieldOptions.TemplateName = @"C:\Users\AW\AppData\Roaming\Microsoft\Templates\Normal.dotm";
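The remarks above imply a fallback order: Document.AttachedTemplate first, then FieldOptions.TemplateName, then Normal.dotm. The sketch below spells that order out in plain Java. TemplateNameResolver and resolve are hypothetical names, and treating null the same as an empty string is an assumption of this sketch rather than documented behavior.

```java
/** Hypothetical model of the TEMPLATE field fallback order; not part of Aspose.Words. */
final class TemplateNameResolver {
    // Default named in the remarks of FieldOptions.TemplateName.
    static final String DEFAULT_TEMPLATE = "Normal.dotm";

    /** Picks the first non-empty value: attached template, then TemplateName, then the default. */
    static String resolve(String attachedTemplate, String templateName) {
        if (attachedTemplate != null && !attachedTemplate.isEmpty()) {
            return attachedTemplate;
        }
        if (templateName != null && !templateName.isEmpty()) {
            return templateName;
        }
        return DEFAULT_TEMPLATE;
    }

    public static void main(String[] args) {
        System.out.println(resolve("", "Report.dotx"));           // prints Report.dotx
        System.out.println(resolve("Letter.dotx", "Report.dotx")); // prints Letter.dotx
        System.out.println(resolve("", ""));                       // prints Normal.dotm
    }
}
```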

Public API for working with patterns has been introduced

Related issue: WORDSNET-22279

The following new public methods have been added to the Fill class:

/// <summary>
/// Sets the specified fill to a pattern.
/// </summary>
/// <param name="patternType"><see cref="Drawing.PatternType"/></param>
public void Patterned(PatternType patternType)

/// <summary>
/// Sets the specified fill to a pattern.
/// </summary>
/// <param name="patternType"><see cref="Drawing.PatternType"/></param>
/// <param name="foreColor">The color of the foreground fill.</param>
/// <param name="backColor">The color of the background fill.</param>
public void Patterned(PatternType patternType, Color foreColor, Color backColor)

A new public property Fill.Pattern has been added:

/// <summary>
/// Gets a <see cref="Drawing.PatternType"/> for the fill.
/// </summary>
public PatternType Pattern { get; }

A new public enum has been introduced:

/// <summary>
/// Specifies the fill pattern to be used to fill a shape.
/// </summary>
public enum PatternType

Use Case: How to read the current pattern of a fill and apply a new pattern.

// Open some document with a shape.
Document doc = new Document("DocWithShape.docx");

// Get Fill object for the first shape.
Fill fill = doc.FirstSection.Body.Shapes[0].Fill;

// Check Fill Pattern value.
Console.WriteLine("Pattern value is: {0}", fill.Pattern);

// Apply DiagonalBrick pattern to the shape fill.
fill.Patterned(PatternType.DiagonalBrick);

doc.Save("DiagonalBrick.docx");