Description

An interface to the ChaSen Japanese morphological analyzer library from NAIST. The ChaSen homepage is at http://chasen.naist.jp/hiki/ChaSen/.

Synopsis

> (use chasen)
> (chasen-parse "日本語の文字列")
((("日本語" "ニホンゴ" "日本語" "名詞-一般")
 ("の" "ノ" "の" "助詞-連体化")
 ("文字" "モジ" "文字" "名詞-一般")
 ("列" "レツ" "列" "名詞-一般")))
>
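
The four strings in each entry above look like the surface form, the katakana reading, the base form and the part of speech, in that order. This reading of the fields is inferred from the sample output and is not confirmed elsewhere on this page. Under that assumption, a minimal sketch of picking the fields apart could look like the following. The accessor names are hypothetical helpers and not part of the egg, and the data is copied from the Synopsis so the sketch runs without ChaSen installed.

```scheme
;; Sample data copied from the Synopsis output; ChaSen itself is not called.
(define sample-result
  '((("日本語" "ニホンゴ" "日本語" "名詞-一般")
     ("の" "ノ" "の" "助詞-連体化")
     ("文字" "モジ" "文字" "名詞-一般")
     ("列" "レツ" "列" "名詞-一般"))))

;; Hypothetical accessors: the field meanings are an assumption
;; based on the sample above, not a documented part of the egg.
(define (morpheme-surface m) (car m))
(define (morpheme-reading m) (cadr m))
(define (morpheme-base m) (caddr m))
(define (morpheme-pos m) (cadddr m))

;; The result is a list of lists of entries; take the first inner list.
(define first-segment (car sample-result))

;; Collect one field from every entry in the segment.
(define surfaces (map morpheme-surface first-segment))
(define readings (map morpheme-reading first-segment))
(define parts-of-speech (map morpheme-pos first-segment))
```

The haiku example further down relies on the same layout: it takes the car of the parse result, reads each entry's second field with cadar, and prints the first field of each entry.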

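Procedures

chasen-parse is the only procedure shown on this page. As used in the Synopsis, it takes a string of Japanese text and returns a list of lists of morpheme entries, each entry being a list of four strings.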

Example

;; Split a haiku into the 5, 7 and 5 syllable phrases

(use chasen syntax-case utf8 srfi-1)
(import utf8)

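;; non-syllables lists the characters that are skipped when counting:
;; small kana, ッ, ン, punctuation and whitespace.
(define haiku-split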
  (let ((non-syllables (string->list "ャュョッン、。!?  \t\n")))
    (lambda (str)
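      ;; Take morphemes from ls until the counted characters of their
      ;; readings (the second field of each entry) reach at least n.
      ;; Returns two values: the morphemes taken and the remainder.
      (define (take-n ls n)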
        (let lp ((i 0) (ls ls) (res '()))
          (if (or (>= i n) (null? ls))
            (values (reverse res) ls)
            (lp (+ i (length (remove (cut memv <> non-syllables)
                                     (string->list (cadar ls)))))
                (cdr ls)
                (cons (car ls) res)))))
      (receive (first-5 rest) (take-n (car (chasen-parse str)) 5)
        (receive (next-7 last-5) (take-n rest 7)
          (list first-5 next-7 last-5))))))

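;; Print each phrase on its own line, joining the surface forms
;; (the first field) of its morphemes.
(for-each (lambda (x) (apply print (map car x)))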
          (haiku-split "古池や蛙飛込む水の音"))

Requirements

iconv

Author

Alex Shinn

License

BSD

History

1.0
Initial release