A dictionary data structure based on counting Bloom filters.
The sfht library is an implementation of the Shared-node Fast Hash Table (SFHT) data structure described by Song, et al., in
Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing. (SIGCOMM'05)
This code defines an sfht object that implements a dictionary mapping of keys to values. The object responds to messages for querying, insertion of new elements, and deletion of existing elements. The interface of the sfht object is particularly suitable for situations where the keys are represented by bit vectors or vectors of fixnum values.
A counting Bloom filter is a Bloom filter that has been extended so that each bit of the filter has a counter associated with it. Upon insertion or deletion of an element, the counter is incremented or decremented, respectively. In order to find an element efficiently, we need to compute the k hash values, read the counters at the k locations, determine the smallest bucket size, and perform a linear search of that bucket for the element.
The sfht object is created by a make-sfht function, the only user-visible function defined in this egg:[procedure] make-sfht:: N P MAKE-RNG-STATE RANDOM! KEY->VECTOR KEY-VECTOR-REF KEY-VECTOR-LENGTH [KEY-EQUAL?] -> SELECTOR
- is a user-supplied function that takes in an integer argument and returns an RNG state value.
- is a user-supplied function that generates a random positive integer, given a state value, which is expected to be mutated.
- is a user-supplied function that takes a key value and returns a vector.
- is a user-supplied function that retrieves an element from the vector returned by KEY-VECTOR.
- is a user-supplied function that returns the length of the key vector.
- is a user-supplied predicate that takes two keys and returns #t if they are equal. The default function used is equal?
The returned selector procedure can take one of the following arguments:
- returns a procedure LAMBDA KEY . DEFAULT-CLAUSE which searches the hash table for an association with a given KEY, and returns a (key . value) pair of the found association. If an association with KEY cannot be located in the hash table, the PROC returns the result of evaluating the DEFAULT-CLAUSE. If the default clause is omitted, an error is signaled. KEY must be comparable to the keys in the hash table by the KEY-EQUAL? predicate specified when the hash table was created)
- returns #t if the hash table is empty
- returns the size (the number of associations) in the hash table
- removes all associations from the hash table (thus making it empty)
- returns a procedure LAMBDA KEY VALUE which, given a KEY and a VALUE, adds the corresponding association to the hash table. If an association with the same KEY already exists, its value is replaced with the VALUE. The return value is #f.
- returns a procedure LAMBDA KEY . DEFAULT-CLAUSE which searches the hash table for an association with a given KEY, deletes it, and returns a (key . value) pair of the found and deleted association. If an association with the KEY cannot be located in the hash table, the PROC returns the result of evaluating DEFAULT-CLAUSE. If the default clause is omitted, an error is signaled.
- prints out all the contents the Bloom filter, for debug purposes
(require-extension iset) (require-extension sfht) (require-extension random-swb) (define sfht (make-sfht 100000 0.0001 (lambda (i) (make-swb-random-state i (fx+ i 17))) swb:random! integer->bit-vector (compose (lambda (x) (if x 1 0)) bit-vector-ref) bit-vector-length)) ((sfht 'put!) 1 'one) ((sfht 'get))
About this egg
- Ensure test script returns proper exit status
- Documentation converted to wiki format
- Typo fix
- Ported to Chicken 4
- Build script updated for better cross-platform compatibility
- Introduced an API that is independent of the RNG used
- Documentation updates
- License upgrade to GPL v3
- Added random-swb to the list of dependencies
- Initial release
Copyright 2007-2011 Ivan Raikov. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or at your option any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. A full copy of the GPL license can be found at <http://www.gnu.org/licenses/>.