Aspose.Words for Python via .NET 22.5 Release Notes

This page contains release notes for Aspose.Words for Python via .NET 22.5.0.

Major Features

There are 126 improvements and fixes in this regular monthly release. The most notable are:

Added support for loading EPUB documents.
Added support for loading XML documents.
Added support for the “Envelope No. 10” page size for printing.
Implemented rendering of the border box and strike lines around MathML formulas.
Improved font detection when rendering characters in MathML formulas.
Improved text wrapping for RTL paragraphs with custom left indent.

Full List of Issues Covering all Changes in this Release (Reported by .NET Users)

Key	Summary	Category
WORDSNET-3822	Table headers are not wrapped properly	New Feature
WORDSNET-8319	Table column widths are calculated incorrectly during rendering	New Feature
WORDSNET-8487	Paragraphs followed by Tightly wrapped Shapes render incorrectly in PDF	New Feature
WORDSNET-8838	Support loading EPUB file format	New Feature
WORDSNET-8931	Tab spacing is not respected in fixed page formats	New Feature
WORDSNET-9253	Shaping issues with Telugu, Tamil, and Chinese characters	New Feature
WORDSNET-10869	Add feature to format page number	New Feature
WORDSNET-12720	Table contents do not render correctly in output PDF	New Feature
WORDSNET-14941	FILLIN fields are lost in output PDF and print	New Feature
WORDSNET-22284	Text position is changed after DOC to PDF conversion	New Feature
WORDSNET-22697	Add support for loading of XML documents	New Feature
WORDSNET-22887	Add loading progress notification	New Feature
WORDSNET-23577	Add .NET 6.0 assemblies to the release build	New Feature
WORDSNET-7128	Text wrapping in Cell is not correct in PDF	Enhancement
WORDSNET-8325	WordML to PDF conversion issue with table rendering	Enhancement
WORDSNET-9075	Table column widths are calculated incorrectly during rendering	Enhancement
WORDSNET-12186	Picture and Textbox cause Aspose.Words to render content on one additional page	Enhancement
WORDSNET-13405	Table width in percent is not honored when converted from DOCX to XPS	Enhancement
WORDSNET-5460	Table inside header of RTF was not rendered in PDF	Bug
WORDSNET-5619	Table widths are disturbed upon rendering to PDF	Bug
WORDSNET-8037	WordML to PDF conversion issue with text rendering	Bug
WORDSNET-8327	WordML to Pdf conversion issue with shape rendering	Bug
WORDSNET-9172	DOCX to PDF conversion issue with table formatting	Bug
WORDSNET-9788	DOC to PDF conversion issue with text (date) alignment	Bug
WORDSNET-10017	DrawingML TextBoxes are pushed to the left beyond the left boundary in fixed page formats	Bug
WORDSNET-10410	Table indentation is not preserved during rendering	Bug
WORDSNET-10700	RTF to PDF conversion issue with table rendering	Bug
WORDSNET-10947	Incorrect tab positioning causes incorrect text wrapping	Bug
WORDSNET-11123	Table widths are not calculated correctly during rendering to PDF	Bug
WORDSNET-11500	Incorrect position of wrapped text on conversion to PDF	Bug
WORDSNET-11641	Widths of Tables and cells are not preserved during rendering to PDF	Bug
WORDSNET-11806	DOC to PDF conversion issue with table layout	Bug
WORDSNET-12099	Table layouts are not correct in PDF	Bug
WORDSNET-12381	Table Cells widths are incorrect in rendered PDF	Bug
WORDSNET-12750	Table Cells widths are incorrect in rendered PDF	Bug
WORDSNET-12979	RenderedDocument and lines issue within table cells	Bug
WORDSNET-13196	Thai font is displayed in the wrong way in PDF	Bug
WORDSNET-14989	Thai characters are not preserved when rendered to PDF	Bug
WORDSNET-16037	Field.isDirty value always false	Bug
WORDSNET-16742	Arabic text is not rendered correctly in output PDF	Bug
WORDSNET-18524	Conversion RTF to PDF inconsistent table width	Bug
WORDSNET-19215	OfficeMath enclosing formula is crushed when outputting PDF	Bug
WORDSNET-19798	Cells in Table gets misplaced during open/save a DOC	Bug
WORDSNET-22023	Text alignments in narrow cells of PDF differs from Word after conversion	Bug
WORDSNET-22605	Split string in LINQ Reporting not working as expected	Bug
WORDSNET-22669	Table Content Pushed Down from its Original Position in PDF	Bug
WORDSNET-22725	Table Cut off Issue when converting Html to Word	Bug
WORDSNET-22726	Exception is thrown while converting from DOCX to HTML	Bug
WORDSNET-22733	Extra vertical spacing added between Rows of a Table with Merged Cells	Bug
WORDSNET-22736	Image position is changed after MHTML to PDF Conversion	Bug
WORDSNET-22843	Incorrect rendering of Column3D in PDF	Bug
WORDSNET-22987	Import differs from what is in browser	Bug
WORDSNET-23025	ArgumentException: Incorrect hex length	Bug
WORDSNET-23225	Aspose.Words hangs on document rendering	Bug
WORDSNET-23279	Horizontal axis labels are wrapped improperly	Bug
WORDSNET-23330	Image is not visible after import from AZW3	Bug
WORDSNET-23332	Aspose.Words hangs when loading a MOBI document	Bug
WORDSNET-23370	UpdatePageLayout throws exception	Bug
WORDSNET-23371	Structured Document Tag gets removed	Bug
WORDSNET-23394	Document.UpdatePageLayout() throws System.InvalidOperationException : Infinite loop detected	Bug
WORDSNET-23396	Text wrapping does not match Word	Bug
WORDSNET-23485	Tab is lost upon converting document to HTML	Bug
WORDSNET-23500	Content is shifted upon rendering document	Bug
WORDSNET-23504	Text is wrapped improperly upon rendering	Bug
WORDSNET-23505	Aspose.Words improperly selects paper source upon printing.	Bug
WORDSNET-23511	RemoveEmptyParagraphs cleanup option does not work in case of nested IF fields	Bug
WORDSNET-23527	Graphics is lost on PDF import	Bug
WORDSNET-23531	Math equations alignment issue	Bug
WORDSNET-23535	Consider disabling LoadOptions.ResourceLoadingCallback invocations for data URLs	Bug
WORDSNET-23536	FileCorruptedException is thrown upon loading HTML document	Bug
WORDSNET-23540	DOCX to PDF: Text overlapping the document layout	Bug
WORDSNET-23545	Problem when editing PDF form field in Chrome	Bug
WORDSNET-23563	Content is lost upon loading PDF document	Bug
WORDSNET-23565	Numbers are rendered as tofu when use NumeralFormat.ArabicIndic	Bug
WORDSNET-23578	Inaccurate vertical alignment in equations when saving to PDF	Bug
WORDSNET-23588	ArgumentException is thrown upon loading MHTML document	Bug
WORDSNET-23596	Text alignment in table is incorrect	Bug
WORDSNET-23604	List numbering is wrong for lists from HTML altChunk’s	Bug
WORDSNET-23607	“Unsupported file format: Unknown” on loading TXT file	Bug
WORDSNET-23642	DOCX to PDF conversion causes layout issues in output PDF file	Bug
WORDSNET-23643	Chart series are lost after DOCX to PDF conversion	Bug
WORDSNET-23644	Bar charts’ height decreases after DOCX to PDF conversion	Bug
WORDSNET-23660	AW does not imitate MS Word handling of an unsupported xml element	Bug
WORDSNET-23661	ReportingEngine.BuildReport throws an exception on .NET 6 when reflection optimization is on	Bug
WORDSNET-23665	Text in category labels is not wrapped	Bug
WORDSNET-23667	Font name and size does not match MS Word on WML to DOCX conversion	Bug
WORDSNET-23668	Extra paragraph in header on WML to DOCX conversion	Bug
WORDSNET-23672	Incorrect shape positions on WML to DOCX conversion	Bug
WORDSNET-23677	Do not invoke ResourceLoadingCallback for empty URLs	Bug
WORDSNET-23685	Document.ExtractPages() causes line numbers restarting	Bug
WORDSNET-23693	InvalidOperationException: Sequence contains more than one matching element	Bug
WORDSNET-23696	TestSaveOdt performance test fails on net5 and net6 CLR	Bug
WORDSNET-23698	DOC to PDF: Text with Shadow effect not correctly converted	Bug
WORDSNET-23699	RTL paragraph is positioned incorrectly inside an inline table with different left and right spacings	Bug
WORDSNET-23703	Font is changed after appending document with KeepSourceFormatting	Bug
WORDSNET-23707	DOC Compare System.InvalidOperationException: Custom XML part is not found.	Bug
WORDSNET-23715	FileCorruptedException is thrown upon loading DOCX document	Bug
WORDSNET-23717	SVG letter-spacing style gets ignored when converting DOCX to PDF	Bug
WORDSNET-23718	Document.ExtractPages changes list numbering	Bug
WORDSNET-23725	Wrong paragraph format when adding an image after Pdf2Word conversion	Bug
WORDSNET-23730	Fix StringComparison warnings	Bug
WORDSNET-23732	Fix StringComparison warnings	Bug
WORDSNET-23733	Fix StringComparison warnings	Bug
WORDSNET-23735	Wrong list numbering due to loss and non-use of DurableId attribute values	Bug
WORDSNET-23743	Part of content is moved into table upon reading RTF	Bug
WORDSNET-23745	Fix StringComparison warnings in fields/mailmerge domain	Bug
WORDSNET-23757	Comments anchor is misplaced after the saving	Bug
WORDSNET-23760	PDF can’t be loaded because of “Sequence contains more than one matching element” error	Bug
WORDSNET-23791	Fix customer issues using SonarQube analysis	Bug

Full List of Issues Covering all Changes in this Release (Reported by Java Users)

Key	Summary	Category
WORDSNET-15581	RTF to PDF conversion issue with table’s cell width	New Feature
WORDSNET-19386	Text-shift observed during Word to PDF conversion	New Feature
WORDSNET-17061	Wrong Font for certain Arabic Characters used in PDF	Bug
WORDSNET-19196	Text position is changed in output PDF	Bug
WORDSNET-20866	DOC to HTML conversion throws System.NullReferenceException	Bug
WORDSNET-21486	Imported SVG-based 3D Pie Chart Renders Incorrectly in Word	Bug
WORDSNET-22835	Unexpected Column Widths after HTML with Merged Cells is Converted to DOCX	Bug
WORDSNET-23277	Axis labels are wrapped improperly	Bug
WORDSNET-23569	FileCorruptedException is thrown upon loading HTML document	Bug
WORDSNET-23571	Uppercase text is rendered as regular text	Bug
WORDSNET-23592	UpdateFields() fails with NPE	Bug
WORDSNET-23658	System.InvalidOperationException: Stack empty. is thrown on Range.Replace	Bug
WORDSNET-23673	FileCorruptedException is thrown upon loading DOCX document	Bug
WORDSNET-23678	Aspose.Words hangs upon rendering document	Bug
WORDSNET-23695	System.InvalidOperationException: Infinite loop detected. exception thrown	Bug
WORDSNET-23716	Images are lost after loading word 2003 XML document	Bug
WORDSNET-23766	Ident of list item is incorrect after comparing documents	Bug

Public API and Backward Incompatible Changes

This section lists public API changes introduced in Aspose.Words 22.5. It covers new and obsolete public methods as well as changes in behind-the-scenes behavior that may affect existing code. Any change that modifies existing behavior and could be seen as a regression is especially important and is documented here.

Added support for loading EPUB documents

Related issue: WORDSNET-8838

Aspose.Words can now load EPUB 2.0 documents.

EPUB is an e-book file format that uses the “.epub” file extension. An EPUB document is a collection of XHTML documents. Currently, Aspose.Words always loads all XHTML files from an EPUB document in the order in which they appear in the content file (OPF).

The following publicly visible enum value was added:

LoadFormat.EPUB

The FileFormatUtil class can now be used to determine if a file is an EPUB document. For example, the following call

info = aw.FileFormatUtil.detect_file_format("book.epub")

will return an info instance with the FileFormatInfo.load_format property set to LoadFormat.EPUB.

An EPUB document can be loaded as follows:

doc = aw.Document("book.epub")
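
To make the EPUB structure described above more concrete, the following standard-library sketch builds a tiny EPUB in memory and lists its XHTML files in reading order. It does not use Aspose.Words and is not how the library is implemented. The helper names (build_demo_epub, xhtml_reading_order) are hypothetical, and the sketch assumes that "order in the OPF" means the order of the spine; the notes above do not say whether the manifest or the spine is used.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def build_demo_epub() -> bytes:
    """Hypothetical helper: create a minimal EPUB-like archive in memory."""
    container = (
        '<?xml version="1.0"?>'
        '<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        '</container>'
    )
    opf = (
        '<?xml version="1.0"?>'
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="id">'
        '<manifest>'
        '<item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="css" href="style.css" media-type="text/css"/>'
        '</manifest>'
        '<spine><itemref idref="ch1"/><itemref idref="ch2"/></spine>'
        '</package>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", container)
        archive.writestr("content.opf", opf)
    return buffer.getvalue()


def xhtml_reading_order(epub_bytes: bytes) -> list:
    """Return the XHTML hrefs (as written in the OPF) in spine order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        # META-INF/container.xml points at the OPF package file.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).attrib["full-path"]
        package = ET.fromstring(archive.read(opf_path))

    # Manifest: id -> href for every XHTML resource.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
        if item.attrib.get("media-type") == "application/xhtml+xml"
    }
    # Spine: the reading order, expressed as references to manifest ids.
    return [
        manifest[ref.attrib["idref"]]
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
        if ref.attrib["idref"] in manifest
    ]


print(xhtml_reading_order(build_demo_epub()))
```

Note that the manifest in the demo lists chapter2.xhtml first while the spine lists chapter1.xhtml first, so the two possible readings of "order in the OPF" can give different results.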

Added support for loading XML documents

Related issue: WORDSNET-22697

Aspose.Words can now load XML documents. The Extensible Markup Language (XML) is a simple text-based format for representing structured information: documents, data, configuration, books, transactions, invoices, and much more. Aspose.Words mimics MS Word behavior when importing XML documents.

The following publicly visible enum value was added:

LoadFormat.XML

The FileFormatUtil class can now be used to determine if a file is an XML document. For example, the following call

info = aw.FileFormatUtil.detect_file_format("sample.xml")

will return an info instance with the FileFormatInfo.load_format property set to LoadFormat.XML.

An XML document can be loaded as follows:

doc = aw.Document("sample.xml")

Introduced ChapterPageSeparator enum and added PageSetup.chapter_page_separator and PageSetup.heading_level_for_chapter properties

Related issue: WORDSNET-10869

The ChapterPageSeparator enum is introduced:

class ChapterPageSeparator(enum.IntEnum):
    """Defines the separator character that appears between the chapter and page number."""

    # A hyphen.
    HYPHEN = 0
    
    # A period.
    PERIOD = 1
    
    # A colon.
    COLON = 2
    
    # An emphasized dash.
    EM_DASH = 3

    # A standard dash.
    EN_DASH = 4

The following public properties are added to the PageSetup class:

class PageSetup:
    ...

    @property
    def heading_level_for_chapter(self) -> int:
        """Gets or sets the heading level style that is applied to the chapter titles in the document.
        
        Can be a number from 0 through 9. 0 means no chapter number if applied to page number.
        Before you can create page numbers that include chapter numbers, the document headings must have a numbered outline format applied."""
        ...

    @property
    def chapter_page_separator(self) -> ChapterPageSeparator:
        """Gets or sets the separator character that appears between the chapter number and the page number.
        
        Before you can create page numbers that include chapter numbers, the document headings must have a numbered outline format applied."""
        ...

Use Case:

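# file_name is the path to an existing document.
doc = aw.Document(file_name)

page_setup = doc.first_section.page_setup

# Show the page part of the number in uppercase Roman numerals.
page_setup.page_number_style = aw.NumberStyle.UPPERCASE_ROMAN
# Put a colon between the chapter number and the page number.
page_setup.chapter_page_separator = aw.ChapterPageSeparator.COLON
# Use heading level 1 for the chapter titles.
page_setup.heading_level_for_chapter = 1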
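
As a rough illustration of what these three settings are expected to produce, the sketch below composes a chapter-page label in plain Python. It does not call Aspose.Words: the enum is a local stand-in that mirrors the values listed above, while the SEPARATOR_TEXT mapping and the helper names (to_upper_roman, chapter_page_label) are assumptions made for this example only.

```python
import enum


class ChapterPageSeparator(enum.IntEnum):
    """Local stand-in mirroring the enum values listed in these notes."""
    HYPHEN = 0
    PERIOD = 1
    COLON = 2
    EM_DASH = 3
    EN_DASH = 4


# Assumed mapping from each enum member to the character it stands for.
SEPARATOR_TEXT = {
    ChapterPageSeparator.HYPHEN: "-",
    ChapterPageSeparator.PERIOD: ".",
    ChapterPageSeparator.COLON: ":",
    ChapterPageSeparator.EM_DASH: "\u2014",
    ChapterPageSeparator.EN_DASH: "\u2013",
}

ROMAN_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
    (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_upper_roman(number: int) -> str:
    """Convert a positive integer to an uppercase Roman numeral."""
    parts = []
    for value, symbol in ROMAN_NUMERALS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def chapter_page_label(chapter: int, page: int, separator: ChapterPageSeparator) -> str:
    """Join a chapter number and a Roman-numeral page number with the separator."""
    return f"{chapter}{SEPARATOR_TEXT[separator]}{to_upper_roman(page)}"


print(chapter_page_label(2, 14, ChapterPageSeparator.COLON))  # 2:XIV
```

In a real document the chapter number comes from the numbered headings, which is why the headings need a numbered outline format before chapter numbers can appear.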

Slight changes in markup nodes typed collection

Related issue: WORDSNET-23774

The default indexer of the markup nodes collection has changed: it now takes the index number of a structured document tag in the collection rather than its identifier.

class StructuredDocumentTagCollection:
    ...

    def __getitem__(self, index: int) -> IStructuredDocumentTag:
        """Returns the structured document tag at the specified index.
        
        :param index: An index into the collection."""
        ...

In addition, a structured document tag can now be removed either at a specified index number or by its identifier.

class StructuredDocumentTagCollection:
    ...

    def remove(self, id: int):
        """Removes the structured document tag with the specified identifier.
        
        :param id: The structured document tag identifier."""
        ...

    def remove_at(self, index: int):
        """Removes a structured document tag at the specified index.
        
        :param index: An index into the collection."""
        ...

The lookup by ID that the indexer previously performed is now available through the get_by_id() method.

class StructuredDocumentTagCollection:
    ...

    def get_by_id(self, id: int) -> IStructuredDocumentTag:
        """Returns the structured document tag by identifier.
        
        Returns None if the structured document tag with the specified identifier cannot be found.
        
        :param id: The structured document tag identifier."""
        ...

Use Case:

structured_document_tags = doc.range.structured_document_tags
# We iterate through all collection elements, getting each element by its index number.
for i in range(structured_document_tags.count):
    sdt = structured_document_tags[i]
    print(sdt.title)

# Get the structured document tag by its Id.
sdt = structured_document_tags.get_by_id(1160505028)
if sdt is not None:
    print(sdt.title)

# Remove the structured document tag by its Id.
structured_document_tags.remove(1160505028)

# Remove the structured document tag at position 0.
structured_document_tags.remove_at(0)
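
One practical consequence of index-based access is that removing items while iterating forward can skip elements. The sketch below shows two small helpers built only on the members described above (count, the indexer, get_by_id, remove_at). The helper names and the FakeTagCollection class are hypothetical; the fake exists only so the sketch runs without a document, and it assumes the real collection reflects removals immediately.

```python
from types import SimpleNamespace


def remove_untitled_tags(structured_document_tags) -> int:
    """Remove every tag whose title is empty; return how many were removed."""
    removed = 0
    # Walk backwards so remove_at() never shifts an index still to be visited.
    for i in range(structured_document_tags.count - 1, -1, -1):
        if not structured_document_tags[i].title:
            structured_document_tags.remove_at(i)
            removed += 1
    return removed


def title_by_id(structured_document_tags, sdt_id: int, default: str = "") -> str:
    """Look a tag up by identifier (what the indexer did before 22.5)."""
    sdt = structured_document_tags.get_by_id(sdt_id)
    return default if sdt is None else sdt.title


class FakeTagCollection:
    """Hypothetical in-memory stand-in exposing only the members used above."""

    def __init__(self, tags):
        self._tags = list(tags)

    @property
    def count(self):
        return len(self._tags)

    def __getitem__(self, index):
        return self._tags[index]

    def get_by_id(self, sdt_id):
        return next((t for t in self._tags if t.id == sdt_id), None)

    def remove_at(self, index):
        del self._tags[index]


tags = FakeTagCollection([
    SimpleNamespace(id=10, title="Customer"),
    SimpleNamespace(id=11, title=""),
    SimpleNamespace(id=12, title=""),
    SimpleNamespace(id=13, title="Total"),
])

print(remove_untitled_tags(tags))  # 2
print(title_by_id(tags, 13))       # Total
```

With a real document, doc.range.structured_document_tags would be passed in place of the fake collection.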

Added “NUMBER_10_ENVELOPE” value to PaperSize enum

Related issue: WORDSNET-23505

Added support for the “Envelope No. 10” page size (4.125 x 9.5 inches) for printing.

Use Case:

# This value is used to set the page size as follows:
doc = aw.Document(file_name)
doc.first_section.page_setup.paper_size = aw.PaperSize.NUMBER_10_ENVELOPE
 
# Or in a similar way using DocumentBuilder:
builder = aw.DocumentBuilder(doc)
builder.page_setup.paper_size = aw.PaperSize.NUMBER_10_ENVELOPE
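
Because Aspose.Words expresses page dimensions in points, it can help to see the envelope size converted from inches. The short calculation below uses the standard 72 points per inch; the constant and function names are made up for this example, and the values are what PageSetup.page_width and PageSetup.page_height would be expected to report in portrait orientation, which is an assumption rather than something stated in these notes.

```python
POINTS_PER_INCH = 72  # 1 inch = 72 points


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to points."""
    return inches * POINTS_PER_INCH


# "Envelope No. 10" dimensions from the notes above: 4.125 x 9.5 inches.
ENVELOPE_NO_10_INCHES = (4.125, 9.5)

width_pt, height_pt = (inches_to_points(v) for v in ENVELOPE_NO_10_INCHES)
print(width_pt, height_pt)  # 297.0 684.0
```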

HtmlSaveOptions.export_text_box_as_svg was marked as obsolete

Related issue: WORDSNET-23514

The HtmlSaveOptions.export_text_box_as_svg property is now obsolete. Use the HtmlSaveOptions.export_shapes_as_svg property instead; it affects text boxes as well.