icu

Selected bindings to the ICU Unicode library.

Introduction

This library is partially inspired by Python's unicodedata library. Because it deals with Unicode, it also re-exports the utf8 egg for ease of use.

Procedures

Names

[procedure] (char-from-name name)

Returns the character whose Unicode name is the string name. name is passed through string-upcase first, so the lookup is case-insensitive.

(char-from-name "fire") ;; => #\x1f525
(char-from-name "FIRE") ;; => #\x1f525

[procedure] (char-string-name char)

Returns the Unicode name of char as a string.

(char-string-name #\x1f525) ;; => "FIRE"
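For readers coming from Python's unicodedata module, which the introduction cites as an inspiration, the same round trip can be sketched there. This is Python, not this egg's API; the helper names char_from_name and char_string_name are hypothetical wrappers chosen only to mirror the procedures above, and the explicit upper-casing imitates the string-upcase step.

```python
import unicodedata


def char_from_name(name: str) -> str:
    """Hypothetical helper: upper-case the name, then look it up."""
    return unicodedata.lookup(name.upper())


def char_string_name(char: str) -> str:
    """Hypothetical helper: return the Unicode name of a character."""
    return unicodedata.name(char)


fire = char_from_name("fire")
print(f"U+{ord(fire):04X}")    # U+1F525
print(char_string_name(fire))  # FIRE
```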

Decomposition and Normalization

[procedure] (char-decomposition char)

Returns the decomposition mapping of char.

For example, for ¼, VULGAR FRACTION ONE QUARTER:

(char-decomposition #\xBC) ;; => '(#\1 #\x2044 #\4)

[procedure] (string-normalize str [form])

Returns str normalized according to form, which can be any of "nfc", "nfkc", "nfd", or "nfkd".

(string-normalize "¼") ;; => "1⁄4" (the slash is U+2044 FRACTION SLASH)
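The difference between the canonical and compatibility forms is easiest to see on the same character. The sketch below uses Python's unicodedata module as a cross-check, not this egg's procedures. It assumes only that Python's form names "NFC" and "NFKC" correspond to the "nfc" and "nfkc" strings listed above.

```python
import unicodedata

quarter = "\u00bc"  # VULGAR FRACTION ONE QUARTER

# decomposition() returns a string: an optional <tag> followed by hex code points.
mapping = unicodedata.decomposition(quarter)
print(mapping)  # <fraction> 0031 2044 0034

# Drop the tag and convert the remaining fields to characters.
chars = [chr(int(field, 16)) for field in mapping.split() if not field.startswith("<")]

# NFC leaves compatibility characters alone; NFKC expands them.
nfc = unicodedata.normalize("NFC", quarter)
nfkc = unicodedata.normalize("NFKC", quarter)

print([f"U+{ord(c):04X}" for c in chars])  # ['U+0031', 'U+2044', 'U+0034']
print(nfc == quarter)                      # True
print(nfkc == "".join(chars))              # True
```

The middle character is U+2044 FRACTION SLASH rather than the ASCII solidus, which matters when comparing the normalized result against plain "1/4".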

Numbers

[procedure] (char-digit-value char)

Binding for u_charDigitValue. Returns the decimal digit value of a decimal digit character.

(char-digit-value #\4) ;; => 4

[procedure] (char-numeric-value char)

Binding for u_getNumericValue. Returns the numeric value (as a double) of a Unicode code point, as defined in the Unicode Character Database.

(char-numeric-value #\4) ;; => 4.0
(char-numeric-value #\xBC) ;; => 0.25

[procedure] (char-digit char radix)

Binding for u_digit. Returns the value of char as a digit in the specified radix.

(char-digit #\f 16) ;; => 15

[procedure] (char-for-digit digit radix)

Binding for u_forDigit. Determines the character representation for a specific digit in the specified radix.

(char-for-digit 15 16) ;; => #\f

[procedure] (char-digit? char)

Binding for u_isdigit. Determines whether the specified code point is a digit character according to Java.

[procedure] (char-xdigit? char)

Binding for u_isxdigit. Determines whether the specified code point is a hexadecimal digit.
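The distinction between a decimal digit value, a numeric value, and a digit in a given radix can be illustrated with Python's standard library. This is a comparison sketch, not this egg's API, and the pairing of each Python call with a procedure above is an assumption based on the descriptions.

```python
import unicodedata

QUARTER = "\u00bc"  # VULGAR FRACTION ONE QUARTER

# Decimal digit value (the role of char-digit-value / u_charDigitValue).
digit_value = unicodedata.decimal("4")

# Numeric value as a float (the role of char-numeric-value / u_getNumericValue).
four = unicodedata.numeric("4")
quarter = unicodedata.numeric(QUARTER)

# A fraction has a numeric value but is not a decimal digit; the second
# argument is the default returned instead of raising ValueError.
not_a_digit = unicodedata.decimal(QUARTER, None)

# Digit value in a radix and back (the roles of char-digit and char-for-digit).
fifteen = int("f", 16)
as_char = format(15, "x")

print(digit_value, four, quarter, not_a_digit, fifteen, as_char)
# 4 4.0 0.25 None 15 f
```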

Operators and transformers

[procedure] (char-mirror char)

Binding for u_charMirror. Maps the specified character to a "mirror-image" character.

[procedure] (char-bidi-paired-bracket char)

Binding for u_getBidiPairedBracket. Maps the specified character to its paired bracket character.

[procedure] (char->lower char)
[procedure] (char->upper char)
[procedure] (char->title char)

Bindings for u_tolower, u_toupper, and u_totitle, respectively. Each maps char to its lowercase, uppercase, or titlecase equivalent.
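Titlecase differs from uppercase only for a handful of characters, mostly digraphs. The sketch below shows one such character using Python's built-in string methods as a stand-in. It is not this egg's API, and it assumes the three bindings above follow the same simple Unicode case mappings.

```python
import unicodedata

DZ_SMALL = "\u01c6"  # LATIN SMALL LETTER DZ WITH CARON

upper = DZ_SMALL.upper()  # both letters of the digraph capitalized
title = DZ_SMALL.title()  # only the first letter capitalized
lower = upper.lower()     # back to the starting character

for label, ch in (("upper", upper), ("title", title), ("lower", lower)):
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The three results are U+01C4, U+01C5, and U+01C6, so upper and title give different characters here, whereas for an ordinary letter such as "a" they coincide.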

Properties

[procedure] (char-category char)

Binding for u_charType. Returns the general category value for the code point (an integer, see below).

You can convert this integer to a symbol with integer->category, and back with category->integer.

Categories:

category/unassigned
category/uppercase-letter
category/lowercase-letter
category/titlecase-letter
category/modifier-letter
category/other-letter
category/non-spacing-mark
category/enclosing-mark
category/combining-spacing-mark
category/decimal-digit-number
category/letter-number
category/other-number
category/space-separator
category/line-separator
category/paragraph-separator
category/control-char
category/format-char
category/private-use-char
category/surrogate
category/dash-punctuation
category/start-punctuation
category/end-punctuation
category/connector-punctuation
category/other-punctuation
category/math-symbol
category/currency-symbol
category/modifier-symbol
category/other-symbol
category/initial-punctuation
category/final-punctuation
category/char-category-count

[procedure] (char-direction char)

Binding for u_charDirection. Returns the bidirectional category value for the code point, which is used in the Unicode bidirectional algorithm (an integer, see below).

You can convert this integer to a symbol with integer->direction, and back with direction->integer.

Directions:

direction/left-to-right
direction/right-to-left
direction/european-number
direction/european-number-separator
direction/european-number-terminator
direction/arabic-number
direction/common-number-separator
direction/block-separator
direction/segment-separator
direction/white-space-neutral
direction/other-neutral
direction/left-to-right-embedding
direction/left-to-right-override
direction/right-to-left-arabic
direction/right-to-left-embedding
direction/right-to-left-override
direction/pop-directional-format
direction/dir-non-spacing-mark
direction/boundary-neutral
direction/first-strong-isolate
direction/left-to-right-isolate
direction/right-to-left-isolate
direction/pop-directional-isolate
direction/char-direction-count

[procedure] (char-combining-class char)

Binding for u_getCombiningClass. Returns the combining class of the code point as specified in UnicodeData.txt.
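To get a feel for what these three properties look like on concrete characters, the sketch below queries them through Python's unicodedata module. It is a cross-check rather than this egg's API. The two dictionaries pairing Unicode's short abbreviations with the category/ and direction/ names listed above are assumptions made for illustration.

```python
import unicodedata

# Assumed correspondence between Unicode abbreviations and the names listed above.
CATEGORY_NAMES = {
    "Lu": "category/uppercase-letter",
    "Nd": "category/decimal-digit-number",
    "Mn": "category/non-spacing-mark",
}
DIRECTION_NAMES = {
    "L": "direction/left-to-right",
    "R": "direction/right-to-left",
    "EN": "direction/european-number",
    "NSM": "direction/dir-non-spacing-mark",
}

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
ALEF = "\u05d0"   # HEBREW LETTER ALEF

for ch in ("A", "4", ALEF, ACUTE):
    category = unicodedata.category(ch)
    direction = unicodedata.bidirectional(ch)
    print(
        f"U+{ord(ch):04X}",
        CATEGORY_NAMES.get(category, category),     # fall back to the abbreviation
        DIRECTION_NAMES.get(direction, direction),
        unicodedata.combining(ch),                  # canonical combining class
    )
```

Only the combining accent has a non-zero combining class (230); base characters such as "A" report 0.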

Predicates

char-mirrored?
char-ualphabetic?
char-ulowercase?
char-uuppercase?
char-uwhitespace?
char-whitespace?
char-java-space?
char-space?
char-blank?
char-lower?
char-upper?
char-digit?
char-alpha?
char-alnum?
char-xdigit?
char-punct?
char-graph?
char-defined?
char-cntrl?
char-iso-control?
char-print?
char-base?