bloom-filter

Documentation

Provides a simple Bloom filter.

Bloom Filter Object

make-bloom-filter

[procedure] (make-bloom-filter M MESSAGE-DIGEST-PRIMITIVES [K]) => bloom-filter

Returns a bloom-filter object with M bits of discrimination and a set of hash functions built from the supplied MESSAGE-DIGEST-PRIMITIVES.

The number of hash functions, K, is not necessarily the same as the number of message-digests. A hash function is defined as returning an unsigned 32-bit integer. Most message-digests return more than 32 bits of hash, so each digest is divided into 32-bit blocks and each block serves as an individual hash function.

The argument K restricts the actual number of hash functions to the first K, no matter how many more the supplied message-digests create. "First" refers to the order of MESSAGE-DIGEST-PRIMITIVES.
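The block counting described above can be sketched in a few lines. This is only an illustration: `sketch-hash-count` is a hypothetical helper, not part of the egg, it takes digest sizes in bytes rather than message-digest primitives, and it assumes any bits left over after the last full 32-bit block are dropped (the text does not say how the egg treats them).

```scheme
;; Hypothetical helper, not part of the bloom-filter egg.
;; digest-byte-lengths: digest sizes in bytes, in the same order as the
;; message-digest primitives would be supplied.
;; Optional second argument: the cap K.
(define (sketch-hash-count digest-byte-lengths . maybe-k)
  (let ((total (apply + (map (lambda (len)
                               ;; number of whole 32-bit blocks in one digest
                               (quotient (* 8 len) 32))
                             digest-byte-lengths))))
    (if (null? maybe-k)
        total
        (min total (car maybe-k)))))

;; A 16-byte digest (e.g. MD5) gives 4 blocks, a 20-byte digest
;; (e.g. SHA-1) gives 5.
(sketch-hash-count '(16 20))     ; => 9
(sketch-hash-count '(16 20) 6)   ; => 6, only the first 6 are kept
```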

Selecting the optimal set of message-digests is beyond the scope of make-bloom-filter.

bloom-filter-n

[procedure] (bloom-filter-n BLOOM-FILTER) => fixnum

The current population - the number of objects added to the filter.

bloom-filter-m

[procedure] (bloom-filter-m BLOOM-FILTER) => fixnum

The number of bits of discrimination.

bloom-filter-k

[procedure] (bloom-filter-k BLOOM-FILTER) => fixnum

The number of hash functions. (See above.)

bloom-filter-p-false-positive

[procedure] (bloom-filter-p-false-positive BLOOM-FILTER [N]) => number

The probability of a false positive for the population size N. When N is omitted, the current population is assumed.

bloom-filter-set!

[procedure] (bloom-filter-set! BLOOM-FILTER OBJECT)

Add the specified OBJECT to the indicated BLOOM-FILTER.

bloom-filter-exists?

[procedure] (bloom-filter-exists? BLOOM-FILTER OBJECT) => boolean

Is the specified OBJECT in the indicated BLOOM-FILTER? A true result may be a false positive.

Auxiliary Procedures

optimum-k

[procedure] (optimum-k N M) => fixnum

Optimal count of hash functions for the given population size N and M bits of discrimination.
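For orientation, the usual textbook estimate is k = (M/N) ln 2, rounded to an integer. The sketch below computes that estimate; `sketch-optimum-k` is a hypothetical name, and the egg's own rounding and overflow handling may differ.

```scheme
;; Hypothetical helper: textbook optimum k = (m / n) * ln 2.
;; Not the egg's implementation.
(define (sketch-optimum-k n m)
  (let ((ratio (/ (exact->inexact m) n)))
    ;; round to the nearest integer, but never go below one hash function
    (max 1 (inexact->exact (round (* ratio (log 2)))))))

(sketch-optimum-k 1000 10000)   ; 10 bits per object => 7
```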

optimum-m

[procedure] (optimum-m K N) => fixnum

Optimal count of bits of discrimination for the given population size N and K number of hash functions.
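One way to read "optimal" here is to invert the textbook relation k = (M/N) ln 2, which gives M = K N / ln 2. The sketch below assumes that reading; `sketch-optimum-m` is a hypothetical name and the egg may compute the value differently.

```scheme
;; Hypothetical helper: m = k * n / ln 2, rounded up to a whole bit.
;; Assumes the textbook relation k = (m / n) * ln 2; not the egg's code.
(define (sketch-optimum-m k n)
  (inexact->exact (ceiling (/ (* 1.0 k n) (log 2)))))

(sketch-optimum-m 7 1000)   ; => 10099
```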

p-false-positive

[procedure] (p-false-positive K N M) => number

The probability of a false positive for the population size N, assuming K hash functions and M bits of discrimination.
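A common approximation for this probability is (1 - e^(-KN/M))^K. The sketch below evaluates it with the same argument order as above; `sketch-p-false-positive` is a hypothetical name, and whether the egg uses this approximation or an exact form is not stated here.

```scheme
;; Hypothetical helper: textbook approximation (1 - e^(-k*n/m))^k.
;; Not necessarily the formula used by the egg.
(define (sketch-p-false-positive k n m)
  (let ((fill (- 1.0 (exp (/ (* -1.0 k n) m)))))  ; expected fraction of set bits
    (expt fill k)))

;; 7 hash functions, 1000 objects, 10000 bits: a little under 1%
(sketch-p-false-positive 7 1000 10000)
```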

desired-m

[procedure] (desired-m P N [K]) => (fixnum fixnum number)

Calculates a near-optimal number of bits of discrimination to meet the desired probability of false positives P, with the given population size N and number of hash functions K. When K is omitted, optimum-k is used to calculate a value.

Returns multiple values: the calculated M, K, and P. The calculated probability may be lower than the desired one. The calculated M value will always be a fixnum.
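The calculation can be approximated by inverting the textbook false-positive formula (1 - e^(-KN/M))^K for M. The sketch below is one such inversion under stated assumptions: `sketch-desired-m` is a hypothetical name, a missing K is estimated as log2(1/P) rather than through optimum-k, and no fixnum overflow protection is attempted.

```scheme
;; Hypothetical helper, not the egg's implementation.
;; Solves (1 - e^(-k*n/m))^k <= p for m, rounding m up.
(define (sketch-desired-m p n . maybe-k)
  (let* ((k (if (null? maybe-k)
                ;; assumption: textbook estimate k = log2(1/p)
                (max 1 (inexact->exact
                         (round (/ (log (/ 1.0 p)) (log 2)))))
                (car maybe-k)))
         (m (inexact->exact
              (ceiling (/ (* -1.0 k n)
                          (log (- 1.0 (expt p (/ 1.0 k))))))))
         ;; probability actually achieved with the rounded-up m
         (actual (expt (- 1.0 (exp (/ (* -1.0 k n) m))) k)))
    (values m k actual)))

;; Receive the three values for a 1% target and 1000 objects.
(call-with-values
  (lambda () (sketch-desired-m 0.01 1000))
  (lambda (m k p) (list m k p)))
```

Because M is rounded up, the achieved probability in this sketch comes out at or slightly below the requested P, which matches the note above that the calculated value may be lower than the desired one.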

actual-k

[procedure] (actual-k MESSAGE-DIGEST-PRIMITIVES) => fixnum

Calculates the actual number of hash functions for the MESSAGE-DIGEST-PRIMITIVES.

p-random-one-bit

[procedure] (p-random-one-bit K N M) => number

Calculates the probability of a random set bit for the given number of hash functions K, population size N, and bits of discrimination M.
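Assuming this means the chance that an arbitrary bit is set after N objects have been added, the textbook expression is 1 - (1 - 1/M)^(KN). The sketch below evaluates it; `sketch-p-random-one-bit` is a hypothetical name and the interpretation is an assumption, not taken from the egg's source.

```scheme
;; Hypothetical helper: probability that a given bit is 1 after n insertions
;; with k hash functions into m bits, i.e. 1 - (1 - 1/m)^(k*n).
(define (sketch-p-random-one-bit k n m)
  (- 1.0 (expt (- 1.0 (/ 1.0 m)) (* k n))))

;; 7 hash functions, 1000 objects, 10000 bits: roughly half the bits are set
(sketch-p-random-one-bit 7 1000 10000)
```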

Usage

(require-extension bloom-filter)

References

Nice exposition of Bloom Filter False Positive Probability.

Requirements

moremacros iset message-digest record-variants check-errors hashes

Author

kon lovett

Version history

1.1.3
A little faster (10%). Better fixnum overflow detection.
1.1.2
Protect desired-m from fixnum representation overflow.
1.1.1
"Fix" for call of non-procedure - maybe. (Nope.)
1.1.0
A little faster (25%).
1.0.0
From the Chicken 3 version, with some changes. (No message-digest registry, for example.)

License

Copyright (C) 2010 Kon Lovett. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.