hash-trie

  1. hash-trie
    1. Documentation
    2. Usage
    3. make-hash-trie-type
    4. hash-trie-type?
    5. hash-trie-type/key-equality-predicate
    6. hash-trie-type/key-hash-function
    7. make-hash-trie
    8. hash-trie/type
    9. hash-trie/count
    10. hash-trie/empty?
    11. hash-trie/search
    12. hash-trie/lookup
    13. hash-trie/member?
    14. hash-trie/update
    15. hash-trie/insert
    16. hash-trie/modify
    17. hash-trie/intern
    18. hash-trie/delete
    19. hash-trie/fold
    20. hash-trie->alist
    21. hash-trie/key->alist
    22. hash-trie/datum->alist
    23. alist->hash-trie
    24. string-hash
    25. symbol-hash
    26. exact-integer-hash
    27. real-number-hash
    28. complex-number-hash
    29. hash-trie-type:string
    30. hash-trie-type:symbol
    31. hash-trie-type:exact-integer
    32. hash-trie-type:real-number
    33. hash-trie-type:complex-number
    34. Iterator API
      1. Example
    35. Author
    36. Maintainer
    37. Version history
    38. License

Documentation

Hash Array Mapped Tries (HAMT)

(Phil Bagwell, `Ideal Hash Trees', Technical Report, 2001)

Provided that the hash function have no collisions and yield hash values in a bounded interval, hash tries admit search and update in constant worst-case time (!), bounded by a somewhat larger constant than what one usually finds for the worst-case time of search or replacement, and the amortized time of insertion or deletion, in hash tables. There is no complicated incremental rehashing going on like in some real-time hash tables to attain these constant time bounds; in fact, there is never any rehashing. A little more precisely, `constant' means logarithmically proportional to the length of the interval of the hash values, or, practically, linearly proportional to the number of bits in the hash, with small constant factors.

Yes, this is the same data structure as Clojure uses to implement its hash maps and hash sets. Similarly to Clojure, this code uses hash collision buckets rather than the paper's suggestion of repeating hashes salted by the trie depth.

Although the pronunciation is identical, and despite the title of Bagwell's paper, a hash trie is not a hash tree. Sorry. Nor do these hash tries have any relation to what Knuth calls hash tries.

Except where this is obviously not the case (hash-table/fold, for example), every procedure here runs in constant expected time under the assumption of a good key hash function, ignoring the time taken by garbage collection and the time taken by the key hash function and the key equality predicate of the relevant hash trie type. When searching in a hash trie, the key equality predicate is applied to only as many keys extra as share a common hash value with the key whose association is sought. Thus in a hash trie with no collisions, every search involves at most one invocation of the key hash function, and at most one invocation of the key equality predicate.

Usage

(import hash-trie)

make-hash-trie-type

[procedure] (make-hash-trie-type KEY=? KEY-HASH) -> <hash-trie-type>

Constructor for hash trie types. KEY=? must be a key equality predicate, a procedure of two arguments that returns true to indicate that they are equal and false to indicate that they are not, and that behaves transitively, symmetrically, and reflexively. KEY-HASH must be a key hash function that preserves the key equality predicate, i.e. for keys A and B, it must be that if (KEY=? A B), then (= (KEY-HASH A) (KEY-HASH B)).

hash-trie-type?

[procedure] (hash-trie-type? OBJECT) -> boolean

Disjoint type predicate for hash trie types.

hash-trie-type/key-equality-predicate

hash-trie-type/key-hash-function

[procedure] (hash-trie-type/key-equality-predicate HASH-TRIE-TYPE) -> key-equality-predicate
[procedure] (hash-trie-type/key-hash-function HASH-TRIE-TYPE) -> key-hash-function

Accessors for the key equality predicates and key hash functions of hash trie types.

make-hash-trie

[procedure] (make-hash-trie HASH-TRIE-TYPE) -> <hash-trie>

Hash trie constructor.

hash-trie/type

[procedure] (hash-trie/type HASH-TRIE) -> <hash-trie-type>

Returns the <hash-trie-type> of HASH-TRIE.

hash-trie/count

[procedure] (hash-trie/count HASH-TRIE) -> fixnum

Returns the number of associations in HASH-TRIE.

hash-trie/empty?

[procedure] (hash-trie/empty? HASH-TRIE) -> boolean

Returns true if HASH-TRIE has no associations, or false if it has any.

hash-trie/search

[procedure] (hash-trie/search HASH-TRIE KEY IF-FOUND IF-NOT-FOUND)

Searches for an association for KEY in HASH-TRIE. If there is one, tail-calls IF-FOUND with one argument, the datum associated with key. If not, tail-calls IF-NOT-FOUND with zero arguments.

hash-trie/lookup

[procedure] (hash-trie/lookup HASH-TRIE KEY DEFAULT) -> *

Searches for an association for KEY in HASH-TRIE. If there is one, returns its associated datum; otherwise returns DEFAULT.

hash-trie/member?

[procedure] (hash-trie/member? HASH-TRIE KEY) -> boolean

Returns true if HASH-TRIE has an association for KEY, or false if not.

hash-trie/update

[procedure] (hash-trie/update HASH-TRIE KEY IF-FOUND IF-NOT-FOUND)

Searches for an association for KEY in HASH-TRIE. If there is one, tail-calls IF-FOUND with three arguments:

If there is no such association, tail-calls IF-NOT-FOUND with one argument, a procedure (INSERT DATUM) that returns a new hash trie with all the associations in HASH-TRIE as well as an association of DATUM with KEY.

hash-trie/insert

[procedure] (hash-trie/insert HASH-TRIE KEY DATUM) -> <hash-trie>

Returns a hash trie with all the associations in HASH-TRIE, but associating DATUM with KEY, whether HASH-TRIE had an association for KEY or not.

hash-trie/modify

[procedure] (hash-trie/modify HASH-TRIE KEY DEFAULT MODIFIER) -> <hash-trie>

Returns a hash trie with all the associations in HASH-TRIE, but associating (MODIFIER D) with KEY if HASH-TRIE associated a datum D with KEY, or associating (MODIFIER DEFAULT) with KEY if HASH-TRIE had no association for KEY.

hash-trie/intern

[procedure] (hash-trie/intern HASH-TRIE KEY GENERATOR) -> * <hash-trie>

If HASH-TRIE has an association for KEY, returns its associated datum and HASH-TRIE. Otherwise, calls (GENERATOR KEY) to obtain a datum D, and returns D and a hash trie with all the associations in HASH-TRIE as well as an association of D with KEY.

hash-trie/delete

[procedure] (hash-trie/delete HASH-TRIE KEY) -> <hash-trie>

Returns a hash trie with all the associations in HASH-TRIE, excluding its association, if any, for KEY.

hash-trie/fold

[procedure] (hash-trie/fold HASH-TRIE INITIAL-VALUE COMBINATOR) -> *

Folds HASH-TRIE by COMBINATOR, starting with an initial value V of INITIAL-VALUE and updating it for each association of a datum D with a key K, in no particular order, by (COMBINATOR K D V).

hash-trie->alist

hash-trie/key->alist

hash-trie/datum->alist

[procedure] (hash-trie->alist HASH-TRIE) -> alist
[procedure] (hash-trie/key->alist HASH-TRIE) -> alist
[procedure] (hash-trie/datum->alist HASH-TRIE) -> alist

hash-trie->alist returns a list of pairs, in no particular order, corresponding with the associations in HASH-TRIE, with keys in the cars and associated data in the respective cdrs.

hash-trie/key-list returns a list of all the keys in HASH-TRIE, in no particular order.

hash-trie/datum-list returns a list of all the data in HASH-TRIE, in no particular order.

alist->hash-trie

[procedure] (alist->hash-trie ALIST HASH-TRIE-TYPE) -> <hash-trie>

Returns a hash trie of the given type with the associations listed in ALIST, taking keys from the cars and corresponding data from the respective cdrs.

string-hash

symbol-hash

exact-integer-hash

real-number-hash

complex-number-hash

[procedure] (string-hash STRING) -> fixnum
[procedure] (symbol-hash SYMBOL) -> fixnum
[procedure] (exact-integer-hash EXACT-INTEGER) -> fixnum
[procedure] (real-number-hash REAL-NUMBER) -> fixnum
[procedure] (complex-number-hash COMPLEX-NUMBER) -> fixnum

Hash functions for various types of data. exact-integer-hash, real-number-hash, and complex-number-hash all agree where their domains coincide. The current implementations of these hash functions are all based on the FNV (Fowler-Noll-Vo) family of hash functions, tweaked slightly so that it is more likely to fit into the range of fixnums (small exact integers that can be represented immediately) for most Scheme systems on 32-bit machines.

hash-trie-type:string

hash-trie-type:symbol

hash-trie-type:exact-integer

hash-trie-type:real-number

hash-trie-type:complex-number

[constant] hash-trie-type:string -> <hash-trie-type>
[constant] hash-trie-type:symbol -> <hash-trie-type>
[constant] hash-trie-type:exact-integer -> <hash-trie-type>
[constant] hash-trie-type:real-number -> <hash-trie-type>
[constant] hash-trie-type:complex-number -> <hash-trie-type>

<hash-trie-type> of the above hash functions, with appropriate key equality predicates: string=? for strings, eq? for symbols, and = for the numeric types.

Iterator API

[procedure] (hash-trie-root HASH-TRIE) -> HASH-TRIE-NODE
[procedure] (hash-trie-bucket? HASH-TRIE-NODE) --> boolean
[procedure] (hash-trie-bucket-list HASH-TRIE-BUCKET-NODE) -> (list-of HASH-TRIE-NODE)
[procedure] (hash-trie-branch? HASH-TRIE-NODE) --> boolean
[procedure] (hash-trie-branch-count HASH-TRIE-BRANCH-NODE) -> exact-positive-integer
HASH-TRIE-NODE : (or HASH-TRIE-BUCKET-NODE HASH-TRIE-BRANCH-NODE)
HASH-TRIE-BRANCH-NODE
vector ; hash-trie-branch-count returns last index
Example
(import hash-trie)
(import (srfi 41))

;see https://mumble.net/~campbell/scheme/hash-trie.scm
(define (hash-trie->stream hash-trie)
  (define (bucket->stream list tail)
    (if (pair? list)
      (stream-cons (car list) ((stream-lambda () (bucket->stream (cdr list) tail))))
      tail ) )
  (define (branch->stream branch tail)
    (let ((vec (hash-trie-branch-vector branch)))
      (let recur ((index (sub1 (vector-length vec))) (tail tail))
        (if (positive? index)
          (node->stream (vector-ref branch index) ((stream-lambda () (recur (sub1 index) tail))))
          tail ) ) ) )
  (define (node->stream node tail)
    (cond
      ((hash-trie-branch? node) (branch->stream node tail))
      ((hash-trie-bucket? node) (bucket->stream (hash-trie-bucket-list node) tail))
      (else
        (error 'hash-trie->stream "(internal) invalid hash-trie node" node hash-trie)) ) )
  (let ((root (hash-trie-root hash-trie)))
    (if root
      ((stream-lambda () (node->stream root stream-null)))
      stream-null ) ) )

(define (stream->hash-trie stream hash-trie-type)
  (let loop ((stream stream) (hash-trie (make-hash-trie hash-trie-type)))
    (if (stream-pair? stream)
      (let ((cell (stream-car stream)))
        (loop (stream-cdr stream) (hash-trie/insert hash-trie (car cell) (cdr cell))) )
      hash-trie ) ) )

Author

Taylor R. Campbell

Maintainer

Kon Lovett

Version history

1.1.0
Provide Iterator API.
1.0.0
C5 release.

License

Copyright (c) 2009, Taylor R. Campbell

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.