1 | [[tags: egg]] |
---|
2 | |
---|
3 | == bloom-filter |
---|
4 | |
---|
5 | * UNMAINTAINED * |
---|
6 | |
---|
7 | [[toc:]] |
---|
8 | |
---|
9 | |
---|
10 | == Documentation |
---|
11 | |
---|
12 | Provides a simple Bloom Filter |
---|
13 | |
---|
14 | Bloom Filter Object |
---|
15 | |
---|
16 | === make-bloom-filter |
---|
17 | |
---|
18 | <procedure>(make-bloom-filter M MDPS [K]) --> bloom-filter</procedure> |
---|
19 | |
---|
20 | Returns a bloom-filter object with {{M}} bits of discrimination and a set of |
---|
21 | hash functions built from the supplied {{MDPS}}, a {{(list-of |
---|
22 | message-digest-primitive)}} objects. |
---|
23 | |
---|
24 | The number of hashes, {{K}}, is not necessarily the same as the number of |
---|
25 | message-digests. A hash (here) is defined as an unsigned 32 bit integer. Most |
---|
26 | message-digests return more 32 bits of hash. The actual length of the hash is |
---|
27 | divided into 32 bit blocks to get the individual hashes. |
---|
28 | |
---|
29 | The argument {{K}} will restrict the actual number of hashes to the "first" |
---|
30 | ''K'', no matter how many more the supplied message-digests create. First in |
---|
31 | the order of {{MDPS}}. |
---|
32 | |
---|
33 | <procedure>(make-bloom-filter P N MDPS) --> bloom-filter</procedure> |
---|
34 | |
---|
35 | Returns a bloom-filter object with {{M}} and {{K}} values chosen for the |
---|
36 | given population capacity {{N}} and probablity of false-positives {{P}}. |
---|
37 | |
---|
38 | Selecting the optimal set of message-digests is beyond the scope of |
---|
39 | make-bloom-filter. |
---|
40 | |
---|
41 | === bloom-filter-set! |
---|
42 | |
---|
43 | <procedure>(bloom-filter-set! BLOOM-FILTER OBJECT)</procedure> |
---|
44 | |
---|
45 | Add the specified {{OBJECT}} to the indicated {{BLOOM-FILTER}}. |
---|
46 | |
---|
47 | === bloom-filter-exists? |
---|
48 | |
---|
49 | <procedure>(bloom-filter-exists? BLOOM-FILTER OBJECT) --> boolean</procedure> |
---|
50 | |
---|
51 | Is the specified {{OBJECT}} in the indicated {{BLOOM-FILTER}}. |
---|
52 | |
---|
53 | === bloom-filter? |
---|
54 | === check-bloom-filter |
---|
55 | === error-bloom-filter |
---|
56 | |
---|
57 | <procedure>(bloom-filter? OBJ) --> boolean</procedure> |
---|
58 | <procedure>(check-bloom-filter LOC OBJ [NAM]) --> bloom-filter</procedure> |
---|
59 | <procedure>(error-bloom-filter LOC OBJ [NAM])</procedure> |
---|
60 | |
---|
61 | === bloom-filter-algorithms |
---|
62 | |
---|
63 | <procedure>(bloom-filter-algorithms BLOOM-FILTER) --> (list-of message-digest-primitive)</procedure> |
---|
64 | |
---|
65 | The mdps used for the filter. |
---|
66 | |
---|
67 | === bloom-filter-n |
---|
68 | |
---|
69 | <procedure>(bloom-filter-n BLOOM-FILTER) --> fixnum</procedure> |
---|
70 | |
---|
71 | The current population - the number of objects added to the filter. |
---|
72 | |
---|
73 | Not the population capacity. |
---|
74 | |
---|
75 | === bloom-filter-m |
---|
76 | |
---|
77 | <procedure>(bloom-filter-m BLOOM-FILTER) --> fixnum</procedure> |
---|
78 | |
---|
79 | The number of bits of discrimination. |
---|
80 | |
---|
81 | === bloom-filter-k |
---|
82 | |
---|
83 | <procedure>(bloom-filter-k BLOOM-FILTER) --> fixnum</procedure> |
---|
84 | |
---|
85 | The number of hashes. (See above.) |
---|
86 | |
---|
87 | === bloom-filter-p-false-positive |
---|
88 | |
---|
89 | <procedure>(bloom-filter-p-false-positive BLOOM-FILTER [N]) --> float</procedure> |
---|
90 | |
---|
91 | The probability of a false-positive for the population capacity {{N}}, |
---|
92 | default is the current population, {{bloom-filter-n}}. |
---|
93 | |
---|
94 | === actual-k |
---|
95 | |
---|
96 | <procedure>(actual-k MDPS) --> fixnum</procedure> |
---|
97 | |
---|
98 | Calculates the actual number of hashes for the {{MDPS}}. |
---|
99 | |
---|
100 | === optimum-size |
---|
101 | |
---|
102 | <procedure>(optimum-size P N) --> fixnum fixnum</procedure> |
---|
103 | |
---|
104 | Returns 2 values, an optimal {{M}}, bits of discrimination, and {{K}}, |
---|
105 | number of hashes, for the given population size {{N}} and probability of |
---|
106 | false-positives {{P}}. |
---|
107 | |
---|
108 | === desired-m |
---|
109 | |
---|
110 | <procedure>(desired-m P N [K]) --> fixnum fixnum float</procedure> |
---|
111 | |
---|
112 | Calculates a near-optimal number of bits of discrimination to meet the |
---|
113 | desired probability of false positives {{P}}, with the given population |
---|
114 | size {{N}} and number of hashes {{K}}. When the {{K}} parameter is |
---|
115 | missing {{optimum-k}} is used to calculate a value. |
---|
116 | |
---|
117 | A multi-valued return of the calculated {{M}}, {{K}}, and {{P}} values. |
---|
118 | The calculated probability may be lower than the desired. The calculated |
---|
119 | {{M}} value will always be a fixnum. |
---|
120 | |
---|
121 | === optimum-k |
---|
122 | |
---|
123 | <procedure>(optimum-k N M) --> fixnum</procedure> |
---|
124 | |
---|
125 | Optimal count of hashes for the given population size {{N}} and {{M}} |
---|
126 | bits of discrimination. |
---|
127 | |
---|
128 | === optimum-m |
---|
129 | |
---|
130 | <procedure>(optimum-m K N) --> fixnum</procedure> |
---|
131 | |
---|
132 | Optimal count of bits of discrimination for the given population size |
---|
133 | {{N}} and {{K}} number of hashes. |
---|
134 | |
---|
135 | === p-false-positive |
---|
136 | |
---|
137 | <procedure>(p-false-positive K N M) --> float</procedure> |
---|
138 | |
---|
139 | What is the probability of false positives for the population size {{N}} |
---|
140 | assuming {{K}} hashes and {{M}} bits of discrimination. |
---|
141 | |
---|
142 | === p-random-one-bit |
---|
143 | |
---|
144 | <procedure>(p-random-one-bit K N M) --> float</procedure> |
---|
145 | |
---|
146 | Calculates the probablility of a random set bit for the given number of |
---|
147 | hash functions {{K}}, population size {{N}}, and bits of discrimination |
---|
148 | {{M}}. |
---|
149 | |
---|
150 | |
---|
151 | == Usage |
---|
152 | |
---|
153 | <enscript language=scheme> |
---|
154 | (import bloom-filter) |
---|
155 | </enscript> |
---|
156 | |
---|
157 | |
---|
158 | == References |
---|
159 | |
---|
160 | [[http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html|Nice exposition of Bloom Filter False Positive Probability.]] |
---|
161 | |
---|
162 | |
---|
163 | == Requirements |
---|
164 | |
---|
165 | [[iset]] |
---|
166 | [[message-digest-primitive]] |
---|
167 | [[message-digest-type]] |
---|
168 | [[message-digest-utils]] |
---|
169 | [[record-variants]] |
---|
170 | [[check-errors]] |
---|
171 | |
---|
172 | [[sha1]] |
---|
173 | [[md5]] |
---|
174 | [[sha2]] |
---|
175 | [[tiger-hash]] |
---|
176 | [[ripemd]] |
---|
177 | [[test]] |
---|
178 | |
---|
179 | |
---|
180 | == Author |
---|
181 | |
---|
182 | [[/users/kon-lovett|Kon Lovett]] |
---|
183 | |
---|
184 | |
---|
185 | == Version history |
---|
186 | |
---|
187 | ; 2.0.0 : CHICKEN 5 release. |
---|
188 | ; 1.2.1 : Fix {{bloom-filter-p-false-positive}}. Add types. |
---|
189 | ; 1.2.0 : Add missing API. |
---|
190 | ; 1.1.8 : Pick up the scraps. |
---|
191 | ; 1.1.7 : Re-flow. |
---|
192 | ; 1.1.6 : One more time. |
---|
193 | ; 1.1.5 : * UNMAINTAINED * |
---|
194 | ; 1.1.4 : Added {{optimum-size}} & {{make-bloom-filter}} variant. Calculations take the {{ceiling}}. |
---|
195 | ; 1.1.3 : A little faster (10%). Better fixnum overflow detection. |
---|
196 | ; 1.1.2 : Protect {{desired-m}} from fixnum representation overflow. |
---|
197 | ; 1.1.1 : "Fix" for call of non-procedure - maybe. (Nope.) |
---|
198 | ; 1.1.0 : A little faster (25%). |
---|
199 | ; 1.0.0 : From the Chicken 3 version, with some changes. (No message-digest registry, for example.) |
---|
200 | |
---|
201 | |
---|
202 | == License |
---|
203 | |
---|
204 | Copyright (C) 2010-2018 Kon Lovett. All rights reserved. |
---|
205 | |
---|
206 | Permission is hereby granted, free of charge, to any person obtaining a |
---|
207 | copy of this software and associated documentation files (the Software), |
---|
208 | to deal in the Software without restriction, including without limitation |
---|
209 | the rights to use, copy, modify, merge, publish, distribute, sublicense, |
---|
210 | and/or sell copies of the Software, and to permit persons to whom the |
---|
211 | Software is furnished to do so, subject to the following conditions: |
---|
212 | |
---|
213 | The above copyright notice and this permission notice shall be included |
---|
214 | in all copies or substantial portions of the Software. |
---|
215 | |
---|
216 | THE SOFTWARE IS PROVIDED ASIS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
---|
217 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
---|
218 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL |
---|
219 | THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR |
---|
220 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, |
---|
221 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR |
---|
222 | OTHER DEALINGS IN THE SOFTWARE. |
---|