source: project/wiki/eggref/5/bloom-filter @ 36236

Last change on this file since 36236 was 36236, checked in by Kon Lovett, 2 years ago

rel 2.0.0

File size: 6.3 KB
Line 
1[[tags: egg]]
2
3== bloom-filter
4
5* UNMAINTAINED *
6
7[[toc:]]
8
9
10== Documentation
11
12Provides a simple Bloom Filter
13
14Bloom Filter Object
15
16=== make-bloom-filter
17
18<procedure>(make-bloom-filter M MDPS [K]) --> bloom-filter</procedure>
19
20Returns a bloom-filter object with {{M}} bits of discrimination and a set of
21hash functions built from the supplied {{MDPS}}, a {{(list-of
22message-digest-primitive)}} objects.
23
24The number of hashes, {{K}}, is not necessarily the same as the number of
25message-digests. A hash (here) is defined as an unsigned 32 bit integer. Most
26message-digests return more 32 bits of hash. The actual length of the hash is
27divided into 32 bit blocks to get the individual hashes.
28
29The argument {{K}} will restrict the actual number of hashes to the "first"
30''K'', no matter how many more the supplied message-digests create. First in
31the order of {{MDPS}}.
32
33<procedure>(make-bloom-filter P N MDPS) --> bloom-filter</procedure>
34
35Returns a bloom-filter object with {{M}} and {{K}} values chosen for the
36given population capacity {{N}} and probablity of false-positives {{P}}.
37
38Selecting the optimal set of message-digests is beyond the scope of
39make-bloom-filter.
40
41=== bloom-filter-set!
42
43<procedure>(bloom-filter-set! BLOOM-FILTER OBJECT)</procedure>
44
45Add the specified {{OBJECT}} to the indicated {{BLOOM-FILTER}}.
46
47=== bloom-filter-exists?
48
49<procedure>(bloom-filter-exists? BLOOM-FILTER OBJECT) --> boolean</procedure>
50
51Is the specified {{OBJECT}} in the indicated {{BLOOM-FILTER}}.
52
53=== bloom-filter?
54=== check-bloom-filter
55=== error-bloom-filter
56
57<procedure>(bloom-filter? OBJ) --> boolean</procedure>
58<procedure>(check-bloom-filter LOC OBJ [NAM]) --> bloom-filter</procedure>
59<procedure>(error-bloom-filter LOC OBJ [NAM])</procedure>
60
61=== bloom-filter-algorithms
62
63<procedure>(bloom-filter-algorithms BLOOM-FILTER) --> (list-of message-digest-primitive)</procedure>
64
65The mdps used for the filter.
66
67=== bloom-filter-n
68
69<procedure>(bloom-filter-n BLOOM-FILTER) --> fixnum</procedure>
70
71The current population - the number of objects added to the filter.
72
73Not the population capacity.
74
75=== bloom-filter-m
76
77<procedure>(bloom-filter-m BLOOM-FILTER) --> fixnum</procedure>
78
79The number of bits of discrimination.
80
81=== bloom-filter-k
82
83<procedure>(bloom-filter-k BLOOM-FILTER) --> fixnum</procedure>
84
85The number of hashes. (See above.)
86
87=== bloom-filter-p-false-positive
88
89<procedure>(bloom-filter-p-false-positive BLOOM-FILTER [N]) --> float</procedure>
90
91The probability of a false-positive for the population capacity {{N}},
92default is the current population, {{bloom-filter-n}}.
93
94=== actual-k
95
96<procedure>(actual-k MDPS) --> fixnum</procedure>
97
98Calculates the actual number of hashes for the {{MDPS}}.
99
100=== optimum-size
101
102<procedure>(optimum-size P N) --> fixnum fixnum</procedure>
103
104Returns 2 values, an optimal {{M}}, bits of discrimination, and {{K}},
105number of hashes, for the given population size {{N}} and probability of
106false-positives {{P}}.
107
108=== desired-m
109
110<procedure>(desired-m P N [K]) --> fixnum fixnum float</procedure>
111
112Calculates a near-optimal number of bits of discrimination to meet the
113desired probability of false positives {{P}}, with the given population
114size {{N}} and number of hashes {{K}}. When the {{K}} parameter is
115missing {{optimum-k}} is used to calculate a value.
116
117A multi-valued return of the calculated {{M}}, {{K}}, and {{P}} values.
118The calculated probability may be lower than the desired. The calculated
119{{M}} value will always be a fixnum.
120
121=== optimum-k
122
123<procedure>(optimum-k N M) --> fixnum</procedure>
124
125Optimal count of hashes for the given population size {{N}} and {{M}}
126bits of discrimination.
127
128=== optimum-m
129
130<procedure>(optimum-m K N) --> fixnum</procedure>
131
132Optimal count of bits of discrimination for the given population size
133{{N}} and {{K}} number of hashes.
134
135=== p-false-positive
136
137<procedure>(p-false-positive K N M) --> float</procedure>
138
139What is the probability of false positives for the population size {{N}}
140assuming {{K}} hashes and {{M}} bits of discrimination.
141
142=== p-random-one-bit
143
144<procedure>(p-random-one-bit K N M) --> float</procedure>
145
146Calculates the probablility of a random set bit for the given number of
147hash functions {{K}}, population size {{N}}, and bits of discrimination
148{{M}}.
149
150
151== Usage
152
153<enscript language=scheme>
154(import bloom-filter)
155</enscript>
156
157
158== References
159
160[[http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html|Nice exposition of Bloom Filter False Positive Probability.]]
161
162
163== Requirements
164
165[[iset]]
166[[message-digest-primitive]]
167[[message-digest-type]]
168[[message-digest-utils]]
169[[record-variants]]
170[[check-errors]]
171
172[[sha1]]
173[[md5]]
174[[sha2]]
175[[tiger-hash]]
176[[ripemd]]
177[[test]]
178
179
180== Author
181
182[[/users/kon-lovett|Kon Lovett]]
183
184
185== Version history
186
187; 2.0.0 : CHICKEN 5 release.
188; 1.2.1 : Fix {{bloom-filter-p-false-positive}}. Add types.
189; 1.2.0 : Add missing API.
190; 1.1.8 : Pick up the scraps.
191; 1.1.7 : Re-flow.
192; 1.1.6 : One more time.
193; 1.1.5 : * UNMAINTAINED *
194; 1.1.4 : Added {{optimum-size}} & {{make-bloom-filter}} variant. Calculations take the {{ceiling}}.
195; 1.1.3 : A little faster (10%). Better fixnum overflow detection.
196; 1.1.2 : Protect {{desired-m}} from fixnum representation overflow.
197; 1.1.1 : "Fix" for call of non-procedure - maybe. (Nope.)
198; 1.1.0 : A little faster (25%).
199; 1.0.0 : From the Chicken 3 version, with some changes. (No message-digest registry, for example.)
200
201
202== License
203
204Copyright (C) 2010-2018 Kon Lovett.  All rights reserved.
205
206Permission is hereby granted, free of charge, to any person obtaining a
207copy of this software and associated documentation files (the Software),
208to deal in the Software without restriction, including without limitation
209the rights to use, copy, modify, merge, publish, distribute, sublicense,
210and/or sell copies of the Software, and to permit persons to whom the
211Software is furnished to do so, subject to the following conditions:
212
213The above copyright notice and this permission notice shall be included
214in all copies or substantial portions of the Software.
215
216THE SOFTWARE IS PROVIDED ASIS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
217IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
218FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
219THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
220OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
221ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
222OTHER DEALINGS IN THE SOFTWARE.
Note: See TracBrowser for help on using the repository browser.