
Aspose.Words for Python via .NET 22.4 Release Notes

This page contains release notes for Aspose.Words for Python via .NET 22.4.0.

Major Features

There are 66 improvements and fixes in this regular monthly release. The most notable are:

Added support for Python 3.10.
Added saving to PDF/A-4 and several other improvements to PDF output.
Implemented reading of Photoshop metadata resolution in JPEG images.
Added the ability to manipulate DrawingML chart legend entries.
Implemented the ability to specify the name of the xls/xlsx file that a DrawingML chart is linked to.
Implemented a new mode for importing HTML block-level elements.

Full List of Issues Covering all Changes in this Release (Reported by .NET Users)

Key	Summary	Category
WORDSNET-23432	Implement column widths re-calculation for tables with more than 63 columns	New Feature
WORDSNET-23475	Add saving to PDF/A-4	New Feature
WORDSNET-23522	Provide public setter for Chart.SourceFullName property	New Feature
WORDSNET-23547	New OpenXML File Format attribute for bulleted and numbered lists	New Feature
WORDSNET-23594	Implement reading of Photoshop metadata resolution in Jpeg images	New Feature
WORDSNET-22888	Add loading progress notification for RTF loading	Enhancement
WORDSNET-22889	Add loading progress notification for WML loading	Enhancement
WORDSNET-23523	Performance test fails with great time excess	Enhancement
WORDSNET-14098	Left and Top margins of Div are not lost after re-saving HTML	Bug
WORDSNET-19222	Compare generates incorrect result	Bug
WORDSNET-22184	Cannot compile Xamarin.Mac project with Aspose.Words	Bug
WORDSNET-22460	Square blue points in Chart become round in the PDF	Bug
WORDSNET-22619	Sizes of Series Point Shapes in Combo Chart Increased during Word to PDF Conversion	Bug
WORDSNET-22718	DOCX to HTML image not visible	Bug
WORDSNET-22915	The rotation of the horizontal axis labels is changed after converting to PDF	Bug
WORDSNET-22930	Incorrect charts rendering for round join style outline	Bug
WORDSNET-22982	Table cell preferred width does not match MS Word in Aspose.Words DOCX output	Bug
WORDSNET-23156	Vertical table cell merge disappears on saving to DOCX and PDF	Bug
WORDSNET-23275	Font in SmartArt diagram is smaller than in MS Word	Bug
WORDSNET-23353	Legend entry not removed when deleting chart series	Bug
WORDSNET-23363	Improving DOCX to HtmlFixed conversion	Bug
WORDSNET-23400	Incorrect line wrapping of a line with zero-width spaces	Bug
WORDSNET-23401	Incorrect line wrapping with a symbolic font	Bug
WORDSNET-23419	Large scan images are not removed from a Searchable PDF	Bug
WORDSNET-23442	System.NullReferenceException on UpdatePageLayout	Bug
WORDSNET-23443	Problem after converting DOCX to PDF	Bug
WORDSNET-23458	Incorrect markup after appending documents	Bug
WORDSNET-23459	Check how Aspose.Words works with .NET 6 performance	Bug
WORDSNET-23476	HtmlReader.HandleText method fails	Bug
WORDSNET-23499	Inaccurate Arabic text on PDF import	Bug
WORDSNET-23506	Colored cell background issues on PDF import	Bug
WORDSNET-23507	Table is distorted on PDF import	Bug
WORDSNET-23512	Tables are not merged	Bug
WORDSNET-23515	Aspose.Words.FileCorruptedException on loading DOCX document	Bug
WORDSNET-23517	Images are scaled down when saving to XPS	Bug
WORDSNET-23526	Revisions changed after adding CustomXmlPart	Bug
WORDSNET-23529	Aspose.Words hangs on document layout	Bug
WORDSNET-23538	Using Document.ExtractPages method causing list labels numbering issue	Bug
WORDSNET-23543	Legend entry text becomes hidden when updating font of a new/empty legend entry	Bug
WORDSNET-23546	Incorrect color in chart when saving to PDF	Bug
WORDSNET-23548	FootnoteDetector fails to find footnotes above a page number	Bug
WORDSNET-23554	NullReferenceException when save document to PDF	Bug
WORDSNET-23560	Unsupported file format on loading ODT	Bug
WORDSNET-23570	Aspose.Words does not work with .NET 6 Ready to Run option	Bug
WORDSNET-23580	Formatting cannot be applied because the table is empty	Bug
WORDSNET-23582	Issue with How to Define Default Options for ChartDataLabels of ChartSeries sample	Bug
WORDSNET-23602	DetectBackgroundColor fails with “InvalidOperationException: Sequence contains no elements”	Bug
WORDSNET-23603	REF field with relative position option is not localized when saving to PDF	Bug
WORDSNET-23608	FileCorruptedException on loading DOC	Bug
WORDSNET-23609	Comparison does not show changes between documents	Bug
WORDSNET-23610	Line break is lost when re-saving a PDF	Bug
WORDSNET-23616	Grid calculation fall-back is not detected for a nested table	Bug
WORDSNET-23622	Document model compatibility option value does not match MS Word UI	Bug
WORDSNET-23631	System.ArgumentNullException: Value cannot be null	Bug

Full List of Issues Covering all Changes in this Release (Reported by Java Users)

Key	Summary	Category
WORDSNET-23210	Add feature to show/hide items in the chart’s legend	New Feature
WORDSNET-23301	Consider providing API to access SDTs by id or name	New Feature
WORDSNET-18037	Document.Compare does not mimic MS Word	Bug
WORDSNET-21829	Aspose.Words.FileCorruptedException is thrown while loading DOCX	Bug
WORDSNET-22602	Multilevel list renders incorrectly after DOCX to HTML conversion	Bug
WORDSNET-23217	NullReferenceException is thrown upon UpdateFields	Bug
WORDSNET-23409	UpdateFields throws System.NullReferenceException	Bug
WORDSNET-23412	Issue regarding conversion of Docx with continuous section break to Html	Bug
WORDSNET-23465	Unexpected replacement of w:hyperlink during document comparison	Bug
WORDSNET-23566	Text becomes white after open/save DOCX document	Bug
WORDSNET-23586	ArgumentException upon setting bookmark text	Bug
WORDSNET-23593	DOCX to PDF: Italic Arabic characters not rendered properly	Bug

Public API and Backward Incompatible Changes

This section lists public API changes that were introduced in Aspose.Words 22.4. It includes not only new and obsoleted public methods, but also a description of any behind-the-scenes changes in Aspose.Words behavior that may affect existing code. Any newly introduced behavior that could be seen as a regression or that modifies existing behavior is especially important and is documented here.

Added saving to PDF/A-4

Related issue: WORDSNET-23475

PDF/A-4 (ISO 19005-4:2020) is the latest version of the PDF/A format. In PDF/A-4, conformance levels have been revised: unlike previous versions, PDF/A-4 does not provide A, B and U conformance levels. Regular PDF/A-4 conformance is equivalent to level U conformance of previous versions (i.e., preservation of the document's visual appearance and Unicode representation of text). Level A conformance (logical structure requirements) has been removed because the PDF/UA format serves that purpose.

A new value has been added to the PdfCompliance enum:

class PdfCompliance:
    ...
    # The output file will comply with the PDF/A-4 (ISO 19005-4:2020) standard.
    # PDF/A-4 has the objective of preserving document static visual appearance over time, independent of the tools
    # and systems used for creating, storing or rendering the files. Additionally any text contained in the document
    # can be reliably extracted as a series of Unicode codepoints.
    PDF_A4

The following options are prohibited or automatically overridden when saving to PDF/A-4:

class PdfSaveOptions:
    ...
    @property
    def preserve_form_fields(self) -> bool:
        """Specifies whether to preserve Microsoft Word form fields as form fields in PDF or convert them to text.
        Default is False.
        ...
        Editable forms are prohibited by PDF/A compliance. False value will be used automatically
        when saving to PDF/A.
        """

    @property
    def encryption_details(self) -> PdfEncryptionDetails:
        """Gets or sets the details for encrypting the output PDF document.
        ...
        Encryption is prohibited by PDF/A compliance. This option will be ignored when saving to PDF/A.
        """
      
    @property
    def font_embedding_mode(self) -> PdfFontEmbeddingMode:
        """Specifies the font embedding mode.
        ...
        PDF/A and PDF/UA compliance requires all fonts to be embedded.
        PdfFontEmbeddingMode.EMBED_ALL value will be used automatically when saving to
        PDF/A and PDF/UA.
        """
      
    @property
    def use_core_fonts(self) -> bool:
        """Gets or sets a value determining whether or not to substitute TrueType fonts Arial, Times New Roman,
        Courier New and Symbol with core PDF Type 1 fonts.
        ...
        PDF/A and PDF/UA compliance requires all fonts to be embedded. False value will be used
        automatically when saving to PDF/A and PDF/UA.
        """
      
    @property
    def custom_properties_export(self) -> PdfCustomPropertiesExport:
        """Gets or sets a value determining the way Document.custom_document_properties are exported to PDF file.
        ...
        PdfCustomPropertiesExport.METADATA value is not supported when saving to PDF/A.
        PdfCustomPropertiesExport.STANDARD will be used instead for PDF/A-1 and PDF/A-2 and
        PdfCustomPropertiesExport.NONE for PDF/A-4.
        """
      
    @property
    def image_color_space_export_mode(self) -> PdfImageColorSpaceExportMode:
        """Specifies how the color space will be selected for the images in PDF document.
        ...
        PdfImageColorSpaceExportMode.SIMPLE_CMYK value is not supported when saving to PDF/A.
        PdfImageColorSpaceExportMode.AUTO value will be used instead.
        """
    
    @property
    def interpolate_images(self) -> bool:
        """A flag indicating whether image interpolation shall be performed by a conforming reader.
        When False is specified, the flag is not written to the output document and
        the default behaviour of the reader is used instead.
        ...
        Interpolation flag is prohibited by PDF/A compliance. False value will be used automatically
        when saving to PDF/A."""
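To make the overrides above easier to reason about, here is a minimal pure-Python sketch that encodes them as a plain function. It does not call Aspose.Words: the function name `effective_pdf_a4_options`, the use of a dictionary keyed by the property names above, and the representation of enum members as strings are all assumptions made for illustration. It only restates the PDF/A-4 rules given in the docstrings.

```python
from typing import Any, Dict


def effective_pdf_a4_options(requested: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical model: option values that take effect when saving to PDF/A-4.

    Keys mirror the PdfSaveOptions property names; enum members are plain strings.
    """
    effective = dict(requested)

    # Editable forms are prohibited, so form fields are always converted to text.
    effective["preserve_form_fields"] = False

    # Encryption is prohibited, so any encryption details are ignored.
    effective["encryption_details"] = None

    # All fonts must be embedded, so core (non-embedded) Type 1 fonts are not used.
    effective["font_embedding_mode"] = "EMBED_ALL"
    effective["use_core_fonts"] = False

    # The image interpolation flag is prohibited.
    effective["interpolate_images"] = False

    # METADATA export of custom properties is not supported; PDF/A-4 falls back to NONE.
    if requested.get("custom_properties_export") == "METADATA":
        effective["custom_properties_export"] = "NONE"

    # SIMPLE_CMYK image color space is not supported; AUTO is used instead.
    if requested.get("image_color_space_export_mode") == "SIMPLE_CMYK":
        effective["image_color_space_export_mode"] = "AUTO"

    return effective
```

In real code there is no need for such a helper: setting `compliance` on `aw.saving.PdfSaveOptions` to `aw.saving.PdfCompliance.PDF_A4` lets Aspose.Words apply these overrides itself. The sketch is only meant to show at a glance which requested values will not survive.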

Implemented the ability to set the Chart.source_full_name property

Related issue: WORDSNET-23522.

It is now possible to specify the path and name of the xls/xlsx file that a DrawingML chart is linked to:

class Chart:
    ...
    @property
    def source_full_name(self) -> str:
        """Gets or sets the path and name of an xls/xlsx file this chart is linked to."""
        ...

Use Case:

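import aspose.words as aw

file_name = "Chart.docx"  # hypothetical: a document that contains a DrawingML chart
doc = aw.Document(file_name)
# Take the first shape in the document and link its chart to an Excel workbook.
shape = doc.get_child(aw.NodeType.SHAPE, 0, True).as_shape()
shape.chart.source_full_name = r"C:\Documents\ChartData.xlsx"
doc.save(file_name)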

Implemented a new mode for importing HTML block-level elements

Related issue: WORDSNET-16334

A new HTML loading option has been added to the HtmlLoadOptions class:

class HtmlLoadOptions:
    ...
    @property
    def block_import_mode(self) -> BlockImportMode:
        """Gets or sets a value that specifies how properties of block-level elements are imported.
        Default value is BlockImportMode.MERGE"""
        ...

The new BlockImportMode enum specifies how properties of block-level elements are imported:

class BlockImportMode(enum.IntEnum):
    """Specifies how properties of block-level elements are imported from HTML-based documents."""

    # Properties of parent blocks are merged and stored on child elements (i.e. paragraphs or tables).
    #
    # Properties of parent blocks are merged as follows: margins are added together; borders of higher-level blocks
    # are discarded and only the innermost borders are preserved. As a result, when this mode is specified,
    # some formatting of blocks from the original document will be lost.
    #
    # On the other hand, since all merged block-level properties are stored on document nodes, all formatting
    # in the resulting document will be available for modification.
    MERGE = 0

    # Properties of parent blocks are imported to a special logical structure and are stored separately from
    # document nodes.
    #
    # Only margins and borders of 'body', 'div', and 'blockquote' HTML elements are imported. Properties of each HTML
    # element are stored individually.
    #
    # This mode better preserves borders and margins seen in the HTML document and gives better conversion
    # results. The downside is that the resulting document gets harder to modify, since borders and margins stored
    # in the logical structure are not available for editing.
    #
    # This mode mimics MS Word's behavior regarding import of block properties.
    PRESERVE = 1
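The MERGE rule described above (margins of nested blocks are added together, while borders of higher-level blocks are discarded) can be illustrated with a small pure-Python sketch. The `Block` dataclass, its single `margin_left` field in points, and the `merge_blocks` function are hypothetical simplifications, not part of Aspose.Words. The sketch also assumes that "innermost borders" means the border of the innermost block; it only shows the arithmetic of the documented rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """Hypothetical, simplified block-level element such as a div."""
    margin_left: float = 0.0       # left margin in points
    border: Optional[str] = None   # CSS-like border style, e.g. "solid"


def merge_blocks(nested: List[Block]) -> Block:
    """Collapse nested blocks (outermost first) into the properties stored on the child element."""
    # Margins of all enclosing blocks are added together.
    total_margin = sum(block.margin_left for block in nested)
    # Borders of higher-level blocks are discarded; only the innermost block's border is kept.
    innermost_border = nested[-1].border if nested else None
    return Block(margin_left=total_margin, border=innermost_border)
```

For the HTML sample in the use case that follows, MERGE would therefore be expected to keep only the solid border around the paragraphs, whereas PRESERVE stores the dotted and solid borders of each div individually.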

Use Case: The new mode for importing HTML block-level elements better preserves the borders and margins seen in the HTML document and gives better conversion results.

import io

import aspose.words as aw

html = """
<html>
    <div style='border:dotted'>
        <div style='border:solid'>
            <p>paragraph 1</p>
            <p>paragraph 2</p>
        </div>
    </div>
</html>"""

load_options = aw.loading.HtmlLoadOptions()

# Set the new mode of import HTML block-level elements.
load_options.block_import_mode = aw.loading.BlockImportMode.PRESERVE
stream = io.BytesIO(html.encode('utf-8'))
doc = aw.Document(stream, load_options)
doc.save("sample.docx")

Implemented chart legend entry API

Related issue: WORDSNET-23210.

The ChartLegendEntry and ChartLegendEntryCollection public classes have been implemented.

class ChartLegendEntry:
    """Represents a chart legend entry.
    
    A legend entry corresponds to a specific chart series or trendline.
    The text of the entry is the name of the series or trendline. The text cannot be changed.
    """
    
    @property
    def is_hidden(self) -> bool:
        """Gets or sets a value indicating whether this entry is hidden in the chart legend.
        The default value is False.
        
        When a chart legend entry is hidden, it does not affect the corresponding chart series or trendline that
        is still displayed on the chart."""
        ...
     
    @property
    def font(self) -> Font:
        """Provides access to the font formatting of this legend entry."""
        ...

class ChartLegendEntryCollection:
    """Represents a collection of chart legend entries."""

    @property
    def count(self) -> int:
        """Returns the number of ChartLegendEntry in this collection."""
        ...
     
    def __getitem__(self, index: int) -> ChartLegendEntry:
        """Returns ChartLegendEntry for the specified index."""
        ...

The legend_entries public property has been added to the ChartLegend class.

class ChartLegend:
    ...
    @property
    def legend_entries(self) -> ChartLegendEntryCollection:
        """Returns a collection of legend entries for all series and trendlines of the parent chart."""
        ...

The legend_entry public property has been added to the ChartSeries class.

class ChartSeries:
    ...
    @property
    def legend_entry(self) -> ChartLegendEntry:
        """Gets a legend entry for this chart series."""
        ...

The constructor of the ChartLegend class has been marked as obsolete, so it will no longer be possible to create instances of this class.

Use Case:

import aspose.words as aw

doc = aw.Document()
builder = aw.DocumentBuilder(doc)

shape = builder.insert_chart(aw.drawing.charts.ChartType.COLUMN, 432, 252)

chart = shape.chart
series = chart.series

# Delete default generated series.
series.clear()

categories = ["AW Category 1", "AW Category 2"]

series1 = series.add("Series 1", categories, [1.0, 2.0])
series.add("Series 2", categories, [3.0, 4.0])
series.add("Series 3", categories, [5.0, 6.0])
series.add("Series 4", categories, [0.0, 0.0])

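# Hide the legend entry of "Series 4" (index 3); the series itself stays on the chart.
legend_entries = chart.legend.legend_entries
legend_entries[3].is_hidden = True

# Apply the same font size to every legend entry.
for legend_entry in legend_entries:
    legend_entry.font.size = 12

# A legend entry can also be reached through its series.
series1.legend_entry.font.italic = True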

doc.save("output.docx")
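Because hiding a legend entry does not affect the corresponding series, the relationship can be pictured with a tiny pure-Python model. `visible_legend_labels` is a hypothetical helper, not part of Aspose.Words, and it assumes, as the use case above suggests, that legend entries follow the order in which the series were added.

```python
from typing import List, Sequence, Set


def visible_legend_labels(series_names: Sequence[str], hidden: Set[int]) -> List[str]:
    """Hypothetical model: legend labels that remain visible after hiding entries by index.

    Hiding an entry only removes its label from the legend; the series list itself
    (and therefore the plotted data) is left untouched.
    """
    return [name for index, name in enumerate(series_names) if index not in hidden]
```

Applied to the four series above with index 3 hidden, the legend would list only "Series 1" to "Series 3", while all four series remain in `chart.series`.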

Implemented a typed collection for markup nodes

Related issue: WORDSNET-23301

Implemented an interface that exposes common properties of both StructuredDocumentTag nodes and StructuredDocumentTagRangeStart / StructuredDocumentTagRangeEnd nodes.

class IStructuredDocumentTag:

    def is_ranged(self) -> bool:
        """Returns True if this instance is a ranged structured document tag."""
        ...
     
    def structured_document_tag_node(self) -> Node:
        """Returns Node object that implements this interface."""
        ...
     
    @property
    def id(self) -> int:
        """Specifies a unique read-only persistent numerical Id for this SDT."""
        ...
     
    @property
    def tag(self) -> str:
        """Specifies a tag associated with the current SDT node.
        Can not be None."""
        ...
    
    @property
    def title(self) -> str:
        """Specifies the friendly name associated with this SDT.
        Can not be None."""
        ...
     
    @property
    def placeholder(self) -> BuildingBlock:
        """Gets the BuildingBlock containing placeholder text which should be displayed when this SDT run contents are empty,
        the associated mapped XML element is empty as specified via the "xml_mapping" element
        or the "is_showing_placeholder_text" element is True.
    
        Can be None, meaning that the placeholder is not applicable for this Sdt."""
        ...
     
    @property
    def placeholder_name(self) -> str:
        """Gets or sets name of the BuildingBlock containing placeholder text.
        
        A BuildingBlock with this name must be present in "Document.glossary_document";
        otherwise, an exception will be raised."""
        ...
     
    @property
    def is_showing_placeholder_text(self) -> bool:
        """Specifies whether the content of this SDT shall be interpreted to contain placeholder text
        (as opposed to regular text contents within the SDT).
        
        If set to True, this state shall be resumed (showing placeholder text) upon opening this document."""
        ...
     
    @property
    def level(self) -> MarkupLevel:
        """Gets the level at which this SDT occurs in the document tree."""
        ...
     
    @property
    def sdt_type(self) -> SdtType:
        """Gets type of this "Structured document tag"."""
        ...
     
    @property
    def lock_content_control(self) -> bool:
        """ When set to True, this property will prohibit a user from deleting this SDT."""
        ...
     
    @property
    def lock_contents(self) -> bool:
        """When set to True, this property will prohibit a user from editing the contents of this SDT."""
        ...
     
    @property
    def color(self) -> drawing.Color:
        """Gets or sets the color of the structured document tag."""
        ...
     
    @property
    def xml_mapping(self) -> XmlMapping:
        """Gets an object that represents the mapping of this structured document tag to XML data
        in a custom XML part of the current document.
    
        You can use the XmlMapping.set_mapping method of this object to map
        a structured document tag to XML data.
    
        If this element is present and the parent Sdt is not of a rich text type, then the current
        value of the Sdt shall be determined by finding the XML element (if any) which is
        determined by the attributes on this element.
        See Iso29500, chapter 1, 17.5.2.6 dataBinding (XML Mapping).
        If DataBinding information does not result in an XML element, then the
        application can use any algorithm desired to find the closest available match. If this information does result in an
        XML element, then the contents of that element shall be used to replace the current run content within the
        document."""
        ...
     
    @property
    def word_open_xml(self) -> str:
        """Gets a string that represents the XML contained within the node in the SaveFormat.FLAT_OPC format."""
        ...

Implemented a typed collection of IStructuredDocumentTag objects.

class StructuredDocumentTagCollection:

    def get_by_title(self, title: str) -> IStructuredDocumentTag:
        """Returns the first structured document tag encountered in the collection with the specified title.
        
        Returns None if the structured document tag with the specified title cannot be found.
        
        :param title: The title of structured document tag."""
        ...
     
    def get_by_tag(self, tag: str) -> IStructuredDocumentTag:
        """Returns the first structured document tag encountered in the collection with the specified tag.
        
        Returns None if the structured document tag with the specified tag cannot be found.
        
        :param tag: The tag of the structured document tag."""
        ...
     
    @property
    def count(self) -> int:
        """Returns the number of structured document tags in the collection."""
        ...
     
    def __getitem__(self, id: int) -> IStructuredDocumentTag:
        """Returns the structured document tag by Id.
        
        :param id: The structured document tag identifier."""
        ...
     
    def remove(self, id: int):
        """Removes the structured document tag with the specified identifier.
        
        :param id: The structured document tag identifier."""
        ...
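The lookup rules of StructuredDocumentTagCollection (indexing by Id, first match wins for get_by_title and get_by_tag, None when nothing matches) can be sketched with a plain in-memory class. `SdtInfo` and `SdtCollectionModel` are hypothetical stand-ins written for illustration and do not wrap the real Aspose.Words classes; raising KeyError for an unknown Id and ignoring removal of an unknown Id are assumptions of this sketch, not documented library behavior.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class SdtInfo:
    """Hypothetical stand-in for an IStructuredDocumentTag."""
    id: int
    title: str
    tag: str


class SdtCollectionModel:
    """In-memory model of the lookup rules documented for StructuredDocumentTagCollection."""

    def __init__(self, tags: Iterable[SdtInfo]) -> None:
        # Dicts preserve insertion order, which stands in for document order here.
        self._by_id: Dict[int, SdtInfo] = {sdt.id: sdt for sdt in tags}

    @property
    def count(self) -> int:
        return len(self._by_id)

    def __getitem__(self, sdt_id: int) -> SdtInfo:
        # Lookup is by Id, not by position (KeyError if absent, in this sketch).
        return self._by_id[sdt_id]

    def get_by_title(self, title: str) -> Optional[SdtInfo]:
        # First tag in document order with a matching title, or None.
        return next((sdt for sdt in self._by_id.values() if sdt.title == title), None)

    def get_by_tag(self, tag: str) -> Optional[SdtInfo]:
        # First tag in document order with a matching tag, or None.
        return next((sdt for sdt in self._by_id.values() if sdt.tag == tag), None)

    def remove(self, sdt_id: int) -> None:
        # Removing an unknown Id is a no-op in this sketch.
        self._by_id.pop(sdt_id, None)
```

The key point the model captures is that the real indexer takes an Id, as in `structured_document_tags[1160505028]` in the use case below, so the collection should not be treated like a positional list.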

Added a new property to the Range class.

class Range:
    ...
    @property
    def structured_document_tags(self) -> StructuredDocumentTagCollection:
        """Returns a StructuredDocumentTagCollection that represents all structured document tags in the range."""
        ...

Use Case:

import aspose.words as aw

# Placeholder path: a document that contains structured document tags.
doc = aw.Document("some document with markup")

# Get the structured document tag by Id.
sdt = doc.range.structured_document_tags[1160505028]
print(sdt.is_ranged())
print(sdt.title)

# Get the structured document tag or ranged tag by Title.
sdt = doc.range.structured_document_tags.get_by_title("Alias4")
print(sdt.id)

Document.update_table_layout() method marked as obsolete

Related issue: WORDSNET-23539

The update_table_layout() method was an early attempt to reproduce the MS Word logic for recalculating table column widths without relying on the column widths stored in the document. It was mostly intended as an alternative way to recalculate table layouts when relying on the stored column widths produced incorrect results (e.g., for generated documents in which Aspose.Words itself had stored incorrect column widths). Although the method produced correct results in some cases, it never fully reproduced the MS Word table layout logic. As a result, applying it to an arbitrary document could often produce incorrect results for tables that the default method, which relies on the stored column widths, handled correctly.

After the method was recommended to customers, we frequently received reports of incorrect table layouts after applying it. Customers were frustrated because some tables were laid out correctly only with update_table_layout(), while others were laid out correctly only without it.

Since then, much effort has been invested in reproducing the MS Word table layout logic without relying on stored column widths. It turned out that replacing the default logic for arbitrary tables and contents is not feasible: there are too many nuances in content metrics and too many combinations of table, cell, and container properties to account for while guaranteeing correct results. Therefore, a limited approach was adopted: Aspose.Words replaces stored column widths only after checking that everything that can influence the layout of a specific table is supported by the recalculation algorithm. The set of supported tables was widened significantly in release 22.3 and will be widened further in future releases.

Currently, the most common combinations of table content and table/cell properties are supported by the new approach, which also replaces the update_table_layout() logic for supported tables. As a result, update_table_layout() can now produce different results only for tables that the new layout logic does not support. Since these are exactly the tables for which reproducing the MS Word logic has known issues, it is highly likely that update_table_layout() will not produce the correct layout for them either.

Marking update_table_layout() as obsolete clearly indicates that the method should no longer be used.