unicode(n) 1.0.0 tcllib "Unicode normalization"


unicode - Implementation of Unicode normalization

This is an implementation in Tcl of the Unicode normalization forms.


::unicode::fromstring string

Converts string to a list of integer Unicode code points. This list is the representation the unicode package uses internally for strings.

::unicode::tostring uclist

Converts the list of integer code points uclist back to a Tcl string.

::unicode::normalize form uclist

Normalizes the list of Unicode code points uclist according to form and returns the normalized list. The argument form takes one of the following values: D (canonical decomposition), C (canonical decomposition, followed by canonical composition), KD (compatibility decomposition), or KC (compatibility decomposition, followed by canonical composition).

::unicode::normalizeS form string

A shortcut for ::unicode::tostring [::unicode::normalize $form [::unicode::fromstring $string]]. Normalizes the Tcl string string according to form and returns the normalized string.
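The relationship between the four commands can be checked directly. The following is a minimal sketch, assuming the package is loadable as unicode via package require; the variable names are illustrative only. It runs the same input through ::unicode::normalizeS and through the explicit fromstring, normalize, tostring pipeline, and compares the results.

```tcl
package require unicode

set form KD
set s "\u1d2c"

# One-step variant: string in, string out.
set viaShortcut [::unicode::normalizeS $form $s]

# Explicit variant: string -> code point list -> normalized list -> string.
set codes      [::unicode::fromstring $s]
set normalized [::unicode::normalize $form $codes]
set viaLists   [::unicode::tostring $normalized]

# Per the description of normalizeS, both variants should agree.
puts [string equal $viaShortcut $viaLists]
```

The list-based commands are the better fit when several operations are applied to the same text, since the conversion to and from a Tcl string then happens only once.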


% ::unicode::fromstring "\u0410\u0411\u0412\u0413"
1040 1041 1042 1043
% ::unicode::tostring {49 50 51 52 53}
12345
% ::unicode::normalize D {7692 775}
68 803 775
% ::unicode::normalizeS KD "\u1d2c"
A
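To compare the four forms side by side, the same code point list can be passed through each of them. This is a sketch rather than part of the package: showCodes is a hypothetical helper defined here only to print code points in the usual U+XXXX notation, and the package name unicode in package require is assumed.

```tcl
package require unicode

# Hypothetical helper (not provided by the package):
# render a list of integer code points as "U+XXXX U+XXXX ...".
proc showCodes {uclist} {
    set out {}
    foreach c $uclist {
        lappend out [format U+%04X $c]
    }
    return [join $out " "]
}

# U+1E0C (D with dot below) followed by U+0307 (combining dot above).
set input {7692 775}

foreach form {D C KD KC} {
    set result [::unicode::normalize $form $input]
    puts "$form: [showCodes $result]"
}
```

Printing code points instead of the resulting strings makes the difference between composed and decomposed sequences visible, since both usually render identically on screen.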


  1. "Unicode Standard Annex #15: Unicode Normalization Forms" (http://unicode.org/reports/tr15/)


Sergei Golovan

Bugs, Ideas, Feedback

This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category stringprep of the Tcllib Trackers. Please also report any ideas for enhancements you may have for either package and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

Note further that attachments are strongly preferred over inlined patches. Attachments can be made by going to the Edit form of the ticket immediately after its creation, and then using the left-most button in the secondary navigation bar.

See Also



Keywords

normalization, unicode