eggref/3/stringprep (historical revision 11308)

==Introduction

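An implementation of RFC 3454, Preparation of Internationalized Strings ("stringprep"). The egg provides make-stringprepper, which builds a string preparation procedure for a particular profile, along with the mapping tables of Appendix B and the prohibited-character tables of Appendix C.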

==Examples

The XMPP Nodeprep profile (RFC 3920, Appendix A), defined with make-stringprepper:

 (require-extension srfi-14 stringprep)

 (define nodeprep
   (make-stringprepper
     (list appendix-b1 appendix-b2)   ; Mappings: Tables B.1 and B.2
     #t                               ; Normalize to NFKC
     (char-set-union
       appendix-c                     ; Prohibit everything in Appendix C
       (char-set #\" #\& #\' #\/ #\: #\< #\> #\@)) ; plus these ASCII characters
     #t))                             ; Perform the bidirectional check

==Authors

Adam C. Emerson <azure@umich.edu>

==License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

==Requirements

==make-stringprepper

(make-stringprepper mappings normalize? prohibited bidi?)

mappings: a list of mappings. Each mapping is either a char-set (every character in the set is mapped to nothing, i.e. deleted) or a vector of pairs sorted in ascending order by their car. In each pair, the car is the character being mapped and the cdr is either the single character it maps to or a list of characters.
normalize?: If true, normalize the string to Unicode Normalization Form KC (NFKC).
prohibited: a char-set of prohibited characters. char-set-union is convenient for combining several of the tables below.
bidi?: If true, perform the bidirectionality check. NOTE: The RFC requires that the characters in appendix-c8 be prohibited whenever this check is performed. Thus, you will get an error if this flag is true and appendix-c8 is not a subset of your prohibited char-set.
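To make the vector form of a mapping concrete, the following portable Scheme sketch defines a small mapping in that format and looks characters up in it by binary search, which the ascending sort order allows. The names my-mapping, mapping-lookup, and apply-vector-mapping are hypothetical and not part of the egg; this illustrates the data format only and is not the egg's own implementation. The &-to-"and" entry is invented purely to show a list-valued cdr.

```scheme
;; Hypothetical mapping in the vector format described above:
;; pairs sorted in ascending order by the car (#\& < #\A < #\B).
;; A cdr is either one character or a list of characters.
(define my-mapping
  (vector (cons #\& (list #\a #\n #\d))   ; invented multi-character mapping
          (cons #\A #\a)
          (cons #\B #\b)))

;; Binary search for CH in a sorted mapping vector (hypothetical helper).
;; Returns the replacement as a list of characters, or #f if CH is unmapped.
(define (mapping-lookup mapping ch)
  (let loop ((lo 0) (hi (- (vector-length mapping) 1)))
    (if (> lo hi)
        #f
        (let* ((mid (quotient (+ lo hi) 2))
               (entry (vector-ref mapping mid))
               (key (car entry)))
          (cond ((char<? ch key) (loop lo (- mid 1)))
                ((char<? key ch) (loop (+ mid 1) hi))
                (else
                 ;; Normalize a single-character cdr to a one-element list.
                 (let ((to (cdr entry)))
                   (if (char? to) (list to) to))))))))

;; Replace every mapped character in STR; unmapped characters are kept.
;; An empty-list cdr would delete the character, since '() is not #f.
(define (apply-vector-mapping mapping str)
  (list->string
   (let loop ((chars (string->list str)))
     (if (null? chars)
         '()
         (let ((replacement (mapping-lookup mapping (car chars))))
           (append (or replacement (list (car chars)))
                   (loop (cdr chars))))))))

(apply-vector-mapping my-mapping "AB&c") ; => "abandc"
```

A char-set mapping is simpler still: each character in the set is dropped from the output.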

make-stringprepper returns a procedure that takes a string and returns the prepared string.

That procedure signals an (exn invalid) condition if its argument contains prohibited characters or fails the bidirectionality check.
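The bidirectionality check corresponds to section 6 of RFC 3454: besides requiring that Table C.8 be prohibited, it says that if a string contains any RandALCat character it must contain no LCat character, and it must both begin and end with a RandALCat character. A minimal sketch of those two rules follows, assuming caller-supplied predicates for RandALCat (Table D.1) and LCat (Table D.2) membership. bidi-ok? and the toy predicates are hypothetical; the egg itself signals (exn invalid) on failure rather than returning #f.

```scheme
;; Sketch of RFC 3454, section 6, rules 2 and 3 (rule 1, prohibiting
;; Table C.8, is handled by the prohibited char-set).
;; RANDALCAT? and LCAT? are caller-supplied predicates standing in for
;; Tables D.1 and D.2; they are hypothetical, not part of the egg's API.
;; Returns #t if STR passes, #f otherwise.
(define (bidi-ok? str randalcat? lcat?)
  (define (any? pred chars)
    (cond ((null? chars) #f)
          ((pred (car chars)) #t)
          (else (any? pred (cdr chars)))))
  (let ((chars (string->list str)))
    (if (not (any? randalcat? chars))
        #t                                      ; no RandALCat: nothing to check
        (and (not (any? lcat? chars))           ; rule 2: no LCat characters
             (randalcat? (string-ref str 0))    ; rule 3: starts with RandALCat
             (randalcat? (string-ref str (- (string-length str) 1))) ; and ends with one
             #t))))

;; Toy predicates for illustration only: uppercase ASCII letters play the
;; role of RandALCat characters and lowercase ASCII letters that of LCat.
(define (toy-randalcat? c) (char-upper-case? c))
(define (toy-lcat? c) (char-lower-case? c))

(bidi-ok? "abc" toy-randalcat? toy-lcat?) ; => #t (no RandALCat characters)
(bidi-ok? "A1B" toy-randalcat? toy-lcat?) ; => #t
(bidi-ok? "Ab"  toy-randalcat? toy-lcat?) ; => #f (mixes RandALCat and LCat)
(bidi-ok? "A1"  toy-randalcat? toy-lcat?) ; => #f (does not end with RandALCat)
```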

==appendix-b1

Mapping given in Table B.1 of Appendix B, "Commonly mapped to nothing."

==appendix-b2

Mapping given in Table B.2 of Appendix B, "Mapping for case-folding used with NFKC."

==appendix-b3

Mapping given in Table B.3 of Appendix B, "Mapping for case-folding used with no normalization."

==appendix-c1.1

Character set given in Table C.1.1 of Appendix C, "ASCII space characters."

==appendix-c1.2

Character set given in Table C.1.2 of Appendix C, "Non-ASCII space characters."

==appendix-c1

Union of appendix-c1.1 and appendix-c1.2.

==appendix-c2.1

Character set given in Table C.2.1 of Appendix C, "ASCII control characters."

==appendix-c2.2

Character set given in Table C.2.2 of Appendix C, "Non-ASCII control characters."

==appendix-c2

Union of appendix-c2.1 and appendix-c2.2.

==appendix-c3

Character set given in Table C.3 of Appendix C, "Private use"

==appendix-c4

Character set given in Table C.4 of Appendix C, "Non-character code points"

==appendix-c5

Character set given in Table C.5 of Appendix C, "Surrogate codes"

==appendix-c6

Character set given in Table C.6 of Appendix C, "Inappropriate for plain text"

==appendix-c7

Character set given in Table C.7 of Appendix C, "Inappropriate for canonical representation"

==appendix-c8

Character set given in Table C.8 of Appendix C, "Change display properties or are deprecated"

==appendix-c9

Character set given in Table C.9 of Appendix C, "Tagging characters"

==appendix-c

Union of appendix-c1, appendix-c2, appendix-c3, appendix-c4, appendix-c5, appendix-c6, appendix-c7, appendix-c8, and appendix-c9

==Version History

1.0 Initial Release