Working with Hyphenation
Sometimes it is necessary to use hyphenation for a more compact arrangement of text in a document. At the same time, it is important to understand that the specifics of word hyphenation may differ for each language.
Today, hyphenation is used less often than it once was, especially in English texts. Nevertheless, it can significantly affect user documents: hyphenation changes the layout and, as a result, the appearance of output files, for example in PDF format.
For correct splitting of words, language-specific hyphenation dictionaries are used. Aspose.Words uses advanced algorithms to work with such dictionaries and allows you to get the same hyphenation as in Microsoft Word.
Hyphenation Dictionaries
Since different languages use different norms and rules for word hyphenation, the optimal solution for correct hyphenation is to use special dictionaries. Aspose.Words uses OpenOffice dictionaries.
For spell checking, OpenOffice uses the Hunspell library; for hyphenation, it relies on the companion Hyphen library. Hyphen is based on a generalization of TeX’s hyphenation algorithm that allows automatic non-standard hyphenation by means of competing standard and non-standard hyphenation patterns.
Hyphenation Algorithm
Aspose.Words implements the TeX hyphenation algorithm and can reuse OpenOffice hyphenation dictionaries.
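To make the pattern-based approach concrete, here is a minimal Python sketch of TeX-style (Liang) hyphenation. It is not the Aspose.Words implementation. The pattern list is a tiny set chosen for this one example rather than a real dictionary, and `left_min` and `right_min` are hypothetical parameters that stand in for minimum-distance limits.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and inter-letter weights."""
    letters = []
    weights = [0]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)   # weight of the gap before the next letter
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return the pieces of `word` separated at the permitted hyphenation points."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."          # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)           # scores[i]: gap before padded[i]

    # Every substring that matches a pattern contributes its weights;
    # the highest weight wins at each gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for offset, weight in enumerate(weights):
                    pos = start + offset
                    scores[pos] = max(scores[pos], weight)

    # An odd score allows a break; left_min/right_min protect the word edges.
    pieces, last = [], 0
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces


# Illustrative pattern set, sufficient only for the word below.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

print(hyphenate("hyphenation", PATTERNS))              # ['hy', 'phen', 'ation']
print(hyphenate("hyphenation", PATTERNS, left_min=3))  # ['hyphen', 'ation']
```

Raising `left_min` removes the break after "hy", which shows how distance limits change the result without touching the patterns.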
The following features of Aspose.Words algorithms should be taken into account:
- Hyphenation distance parameters (LEFTHYPHENMIN, RIGHTHYPHENMIN, COMPOUNDLEFTHYPHENMIN, COMPOUNDRIGHTHYPHENMIN) specified in the hyphenation dictionary are ignored. Aspose.Words uses its own set of distance parameters depending on the document compatibility mode.
- The hyphenation algorithm in Aspose.Words supports composite hyphenation. Character sequences that mix alphabetic and non-alphabetic characters are split into alphabetic-only parts (words), and each part is hyphenated separately. Note that Microsoft Word's logic for hyphenating compound words depends on the document compatibility mode.
- The hyphenation algorithm in Aspose.Words does not implement non-standard hyphenation; non-standard patterns are ignored.
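The splitting of mixed sequences into alphabetic-only parts can be pictured with a short Python sketch. It assumes that "alphabetic" means any Unicode letter, which may not match the exact character classes Aspose.Words uses, and the function name is hypothetical.

```python
import re

# Matches maximal runs of letters: word characters that are not digits or "_".
LETTER_RUN = re.compile(r"[^\W\d_]+")


def alphabetic_parts(token):
    """Return (offset, text) for every alphabetic-only part of `token`."""
    return [(m.start(), m.group()) for m in LETTER_RUN.finditer(token)]


print(alphabetic_parts("well-known2day"))
# [(0, 'well'), (5, 'known'), (11, 'day')]
```

Each returned part would then be hyphenated on its own, and the offsets let the break positions be mapped back to the original sequence.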
Loading Hyphenation Dictionaries
To use the hyphenation feature, first register a hyphenation dictionary. The following code example shows how to load hyphenation dictionaries for the specified languages from a file:
The following code example shows how to load hyphenation dictionaries for the specified language from a stream:
As an alternative to pre-registering hyphenation dictionaries, you can register only the required dictionaries on request. To do so, implement the IHyphenationCallback interface and assign your implementation to the static Callback property.
The following code example shows how to implement the IHyphenationCallback interface:
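The on-request idea can also be illustrated independently of the library. The sketch below is plain Python and does not use the Aspose.Words API: `DictionaryRegistry`, `OnDemandLoader` and `request_dictionary` are hypothetical stand-ins, and dictionaries are represented as simple pattern lists held in memory.

```python
class DictionaryRegistry:
    """Hypothetical registry that asks a callback for missing dictionaries."""

    def __init__(self, callback=None):
        self._dictionaries = {}
        self.callback = callback

    def register(self, language, patterns):
        # Registering None records that no dictionary exists for the language,
        # so the callback is not asked again for it.
        self._dictionaries[language] = patterns

    def lookup(self, language):
        if language not in self._dictionaries and self.callback is not None:
            self.callback.request_dictionary(self, language)
        return self._dictionaries.get(language)


class OnDemandLoader:
    """Hypothetical callback that registers a dictionary the first time it is needed."""

    def __init__(self, sources):
        self.sources = sources      # language -> patterns (stands in for files)
        self.requests = []          # log of requested languages

    def request_dictionary(self, registry, language):
        self.requests.append(language)
        registry.register(language, self.sources.get(language))


loader = OnDemandLoader({"en-US": ["hy3ph", "he2n"]})
registry = DictionaryRegistry(callback=loader)

registry.lookup("en-US")   # triggers one request and registers the dictionary
registry.lookup("en-US")   # served from the registry, no new request
registry.lookup("fr-FR")   # unknown language: recorded as missing
print(loader.requests)     # ['en-US', 'fr-FR']
```

Recording missing languages avoids repeated lookups for a dictionary that is known to be unavailable.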
Impact of Hyphenation on Layout
When breaking text into lines, Aspose.Words checks whether each word fits entirely on the current line. If a word is too long to fit at the end of the line, Aspose.Words by default moves it to the beginning of the next line instead of hyphenating it.
However, the hyphenation feature in Aspose.Words can insert hyphens into words to eliminate gaps in justified text or to maintain an even line length in narrow columns. This can change the number of lines and therefore the number of pages. In other words, hyphenation affects the document layout.
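The effect on line count can be seen in a small Python sketch of greedy line breaking. It is a simplification under stated assumptions: width is measured in characters rather than real glyph widths, the hyphenation points come from a hand-written lookup instead of a dictionary, and `break_lines` is a hypothetical helper, not how Aspose.Words lays out text.

```python
def break_lines(words, width, hyphenate=None):
    """Greedily fill lines of at most `width` characters.

    `hyphenate`, if given, maps a word to its list of pieces; a word that does
    not fit is then split at the last hyphenation point that still fits.
    """
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if len(candidate) <= width:
            current = candidate
            continue

        rest = word                      # default: move the whole word down
        if hyphenate is not None:
            pieces = hyphenate(word)
            for cut in range(len(pieces) - 1, 0, -1):
                head = "".join(pieces[:cut]) + "-"
                candidate = head if not current else current + " " + head
                if len(candidate) <= width:
                    current = candidate
                    rest = "".join(pieces[cut:])
                    break

        if current:
            lines.append(current)
        current = rest
    if current:
        lines.append(current)
    return lines


# Hand-written hyphenation points, for illustration only.
POINTS = {"hyphenation": ["hy", "phen", "ation"]}
words = "use hyphenation in narrow text".split()

print(break_lines(words, 12))
# ['use', 'hyphenation', 'in narrow', 'text']
print(break_lines(words, 12, lambda w: POINTS.get(w, [w])))
# ['use hyphen-', 'ation in', 'narrow text']
```

With hyphenation enabled the same text takes three lines instead of four, which is the kind of change that can shift page breaks in a longer document.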
Hyphenation and Justification (H&J)
Microsoft Word uses complex logic to choose a breakpoint when text is justified and hyphenation is enabled. In short, Microsoft Word may prefer to shrink or stretch spaces rather than hyphenate a word at the end of a line. This logic is most likely based on Knuth’s article.
Aspose.Words implements its own H&J algorithm that gives the same result as Microsoft Word and provides identical line breaking in the output document.
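The trade-off between adjusting spaces and hyphenating can be illustrated with a toy Python model. It is not the algorithm used by Microsoft Word or Aspose.Words: the thresholds `max_space` and `min_space`, the character-based widths, and the function names are all assumptions made for this sketch.

```python
def required_space(words, width):
    """Width each inter-word space must take so `words` exactly fill `width`."""
    gaps = len(words) - 1
    if gaps < 1:
        raise ValueError("need at least two words to justify a line")
    return (width - sum(len(w) for w in words)) / gaps


def choose_break(line_words, pieces, width, max_space=2.0, min_space=0.75):
    """Decide whether the next word (given as `pieces`) should be hyphenated.

    Returns (words_on_line, text_carried_to_next_line). A natural space is 1.0.
    """
    # Option 1: break before the word and stretch the spaces, if tolerable.
    if required_space(line_words, width) <= max_space:
        return line_words, "".join(pieces)

    # Option 2: hyphenate, choosing the cut that keeps spaces closest to natural.
    best = None
    for cut in range(1, len(pieces)):
        head = "".join(pieces[:cut]) + "-"
        candidate = line_words + [head]
        space = required_space(candidate, width)
        if space < min_space:
            continue                      # spaces would be squeezed too much
        if best is None or abs(space - 1) < abs(best[0] - 1):
            best = (space, candidate, "".join(pieces[cut:]))

    if best is None:
        return line_words, "".join(pieces)
    return best[1], best[2]


pieces = ["hy", "phen", "ation"]
print(choose_break(["breaking", "text"], pieces, 14))
# (['breaking', 'text'], 'hyphenation')   spaces stretch to 2.0, no hyphen needed
print(choose_break(["use", "the"], pieces, 16))
# (['use', 'the', 'hyphen-'], 'ation')    stretching would need a 10.0-wide space
```

In the first call a modest stretch is enough, so the word moves down intact; in the second, the gap would be too large, so the model hyphenates instead.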