sfht

A dictionary data structure based on counting Bloom filters.

  1. sfht
  2. Usage
  3. Documentation
    1. SFHT procedures
  4. Examples
  5. About this egg
    1. Author
    2. Version history
    3. License

Usage

(require-extension sfht)

Documentation

The sfht library is an implementation of the Shared-node Fast Hash Table (SFHT) data structure described by Song, et al., in

 Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing. (SIGCOMM'05)

The present implementation code defines an SFHT typeclass that implements an ordered dictionary mapping of keys to values.

The typeclass containts methods for creating new hash tables, querying, insertion of new elements, and deletion of existing elements. The SFHT interface is particularly suitable for situations where the keys are represented by bit vectors or vectors of fixnum values.

A counting Bloom filter is a Bloom filter that has been extended so that each bit of the filter has a counter associated with it. Upon insertion or deletion of an element, the counter is incremented or decremented, respectively. In order to find an element efficiently, we need to compute the k hash values, read the counters at the k locations, determine the smallest bucket size, and perform a linear search of that bucket for the element.

SFHT procedures

The SFHT type class is created by procedure sfht-map:

[procedure] make-sfht:: N P MAKE-RNG-STATE RANDOM! KEY->VECTOR KEY-VECTOR-REF KEY-VECTOR-LENGTH [KEY-EQUAL?] -> <SFHT>

where

MAKE-RNG-STATE
is a user-supplied function that takes in an integer argument and returns an RNG state value.
RANDOM!
is a user-supplied function that generates a random positive integer, given a state value, which is expected to be mutated.
KEY->VECTOR
is a user-supplied function that takes a key value and returns a vector.
KEY-VECTOR-REF
is a user-supplied function that retrieves an element from the vector returned by KEY-VECTOR.
KEY-VECTOR-LENGTH
is a user-supplied function that returns the length of the key vector.
KEY->EQUAL?
is a user-supplied predicate that takes two keys and returns #t if they are equal. The default function used is equal?

The SFHT typeclass consists of the following methods:

empty
creates a new empty hash table
get
procedure of the form SFHT KEY . DEFAULT-CLAUSE which searches the hash table for an association with a given KEY, and returns a (key . value) pair of the found association. If an association with KEY cannot be located in the hash table, the PROC returns the result of evaluating the DEFAULT-CLAUSE. If the default clause is omitted, an error is signaled. KEY must be comparable to the keys in the hash table by the KEY-EQUAL? predicate specified when the hash table was created)
empty?
returns #t if the hash table is empty
size
returns the size (the number of associations) in the hash table
clear!
removes all associations from the hash table (thus making it empty)
put!
procedure of the form SFHT KEY VALUE which, given a KEY and a VALUE, adds the corresponding association to the hash table. If an association with the same KEY already exists, its value is replaced with the VALUE. The return value is #f.
delete!
procedure of the form SFHT KEY . DEFAULT-CLAUSE which searches the hash table for an association with a given KEY, deletes it, and returns a (key . value) pair of the found and deleted association. If an association with the KEY cannot be located in the hash table, the PROC returns the result of evaluating DEFAULT-CLAUSE. If the default clause is omitted, an error is signaled.

Examples

(let ((m (sfht-map 100000 0.0001

                   (lambda (i) (make-swb-random-state i (fx+ i 17)))
                   swb:random!
                   integer->bit-vector 
                   (compose (lambda (x) (if x 1 0)) bit-vector-ref)
                   bit-vector-length)))
    (with-instance ((<SFHT> m))

      (let ((t (empty)))
        (put t 1 'one)
        (get t)
        ))
        )

About this egg

Author

Ivan Raikov

Version history

3.0
Converted to type class interface
2.6
Ensure test script returns proper exit status
2.5
Documentation converted to wiki format
2.4
Typo fix
2.3
Ported to Chicken 4
2.1
Build script updated for better cross-platform compatibility
2.0
Introduced an API that is independent of the RNG used
1.3
Documentation updates
1.2
License upgrade to GPL v3
1.1
Added random-swb to the list of dependencies
1.0
Initial release

License

Copyright 2007-2013 Ivan Raikov.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
at your option any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

A full copy of the GPL license can be found at
<http://www.gnu.org/licenses/>.