1 | == Statistics |
---|
2 | |
---|
3 | Still under testing! |
---|
4 | |
---|
5 | This library is a port of [[http://compbio.ucdenver.edu/hunter/|Larry Hunter]]'s Lisp statistics library to CHICKEN Scheme. |
---|
6 | |
---|
7 | The library provides a number of formulae and methods taken from the book "Fundamentals of Biostatistics" by Bernard Rosner (5th edition). |
---|
8 | |
---|
9 | === Statistical Distributions |
---|
10 | |
---|
11 | To use this library, you need to understand the underlying statistics. In brief: |
---|
12 | |
---|
13 | The [[http://en.wikipedia.org/wiki/Binomial_distribution|Binomial distribution]] is used when counting discrete events in a series of trials, each of which has a probability {{p}} of producing a positive outcome. An example would be tossing a coin {{n}} times: the probability of a head is {{p}}, and the distribution gives the probability of each possible number of heads in the {{n}} trials. The binomial distribution is written B(n, p). |
---|
14 | |
---|
15 | The [[http://en.wikipedia.org/wiki/Poisson_distribution|Poisson distribution]] is used to count discrete events which occur independently at a known average rate. A typical example is the decay of radioactive elements. A Poisson distribution is written Pois(mu), where mu is the average number of events. |
---|
16 | |
---|
17 | The [[http://en.wikipedia.org/wiki/Normal_distribution|Normal distribution]] is used for real-valued quantities which cluster symmetrically around a mean, with a spread given by the variance. A typical example would be the distribution of people's heights. A normal distribution is written N(mean, variance). |
---|
18 | |
---|
19 | === Provided Functions |
---|
20 | |
---|
21 | ==== Utilities |
---|
22 | |
---|
23 | <procedure>(average-rank value sorted-values)</procedure> |
---|
24 | returns the average position of the given {{value}} in the list of {{sorted-values}}; ranks are counted from 1. |
---|
25 | > (average-rank 2 '(1 2 2 3 4)) |
---|
26 | 5/2 |
---|
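The averaging of tied positions can be illustrated with a minimal sketch. This is not the library's implementation: {{average-rank-sketch}} is a hypothetical name, and returning {{#f}} for a missing value is an assumption made only for this example.

```scheme
;; Hypothetical sketch of average-rank; not taken from the library source.
;; Walks the sorted list once, remembering the 1-based position of the
;; first match and the number of matches.
(define (average-rank-sketch value sorted-values)
  (let loop ((rest sorted-values) (pos 1) (first #f) (count 0))
    (cond ((null? rest)
           (if first
               ;; midpoint between the first and last matching positions
               (+ first (/ (- count 1) 2))
               #f))                      ; assumption: #f when value is absent
          ((= (car rest) value)
           (loop (cdr rest) (+ pos 1) (or first pos) (+ count 1)))
          (else
           (loop (cdr rest) (+ pos 1) first count)))))

;; The value 2 occupies positions 2 and 3, so its average rank is 5/2.
(average-rank-sketch 2 '(1 2 2 3 4))
```

With the list from the example above, the two occurrences of 2 sit at positions 2 and 3, which is where the documented result of 5/2 comes from.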
27 | |
---|
28 | <procedure>(beta-incomplete x a b)</procedure> |
---|
29 | |
---|
30 | <procedure>(bin-and-count items n)</procedure> |
---|
31 | Divides the range of the list of {{items}} into {{n}} bins, and returns a vector of the number of items which fall into each bin. |
---|
32 | > (bin-and-count '(1 1 2 3 3 4 5) 5) |
---|
33 | #(2 1 2 1 1) |
---|
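One way to reproduce the counts in the example is to split the range into {{n}} equal-width bins and place the maximum value in the last bin. The sketch below assumes that scheme; {{bin-and-count-sketch}} is a hypothetical name, and the library may assign boundary values differently.

```scheme
;; Hypothetical sketch of equal-width binning; not the library's code.
;; Assumes the items are not all equal (otherwise the bin width is zero).
(define (bin-and-count-sketch items n)
  (let* ((lo (apply min items))
         (hi (apply max items))
         (width (/ (- hi lo) n))          ; width of each bin
         (counts (make-vector n 0)))
    (for-each
     (lambda (x)
       (let* ((raw (inexact->exact (floor (/ (- x lo) width))))
              ;; the maximum value would index bin n, so clamp it to n-1
              (i (min raw (- n 1))))
         (vector-set! counts i (+ 1 (vector-ref counts i)))))
     items)
    counts))

(bin-and-count-sketch '(1 1 2 3 3 4 5) 5)
```

For the list above the bin width is 4/5, so the items 1, 1 fall in the first bin, 2 in the second, 3, 3 in the third, 4 in the fourth and 5 in the last, matching the vector shown in the example.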
34 | |
---|
35 | <procedure>(combinations n k)</procedure> |
---|
36 | returns the number of ways to select {{k}} items from {{n}}, where the order does not matter. |
---|
37 | |
---|
38 | <procedure>(factorial n)</procedure> |
---|
39 | returns the factorial of {{n}}. |
---|
40 | |
---|
41 | <procedure>(find-critical-value p-function p-value)</procedure> |
---|
42 | |
---|
43 | <procedure>(fisher-z-transform r)</procedure> |
---|
44 | returns the transformation of a correlation coefficient {{r}} into an approximately normal distribution. |
---|
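The standard definition of the Fisher z-transformation is z = 0.5 * ln((1 + r) / (1 - r)). The following is a minimal sketch of that formula, assuming the library follows the standard definition; {{fisher-z-sketch}} is a hypothetical name.

```scheme
;; Hypothetical sketch of the standard Fisher z-transformation.
;; Defined for correlation coefficients strictly between -1 and 1.
(define (fisher-z-sketch r)
  (* 0.5 (log (/ (+ 1 r) (- 1 r)))))

;; r = 0.5 gives half the natural log of 3.
(fisher-z-sketch 0.5)
```

The transformation is symmetric around zero, so a coefficient of 0 maps to 0 and negating {{r}} negates the result.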
45 | |
---|
46 | <procedure>(gamma-incomplete a x)</procedure> |
---|
47 | |
---|
48 | <procedure>(gamma-ln x)</procedure> |
---|
49 | |
---|
50 | <procedure>(permutations n k)</procedure> |
---|
51 | returns the number of ways to select {{k}} items from {{n}}, where the order does matter. |
---|
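The relationship between {{factorial}}, {{permutations}} and {{combinations}} described above can be sketched directly from the textbook formulas n!/(n-k)! and n!/(k!(n-k)!). The names ending in {{-sketch}} are hypothetical and the code is illustrative only.

```scheme
;; Hypothetical sketches of the textbook formulas; not the library's code.
(define (factorial-sketch n)
  (let loop ((i n) (acc 1))
    (if (<= i 1)
        acc
        (loop (- i 1) (* acc i)))))

;; Ordered selections of k items from n: n! / (n-k)!
(define (permutations-sketch n k)
  (/ (factorial-sketch n) (factorial-sketch (- n k))))

;; Unordered selections: divide out the k! orderings of each selection.
(define (combinations-sketch n k)
  (/ (permutations-sketch n k) (factorial-sketch k)))

(permutations-sketch 5 2)   ; 20 ordered pairs
(combinations-sketch 5 2)   ; 10 unordered pairs
```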
52 | |
---|
53 | <procedure>(random-normal mean sd)</procedure> |
---|
54 | returns a random number drawn from a normal distribution with the specified mean and standard deviation. |
---|
55 | |
---|
56 | <procedure>(random-pick items)</procedure> |
---|
57 | returns a random item from the given list of items. |
---|
58 | |
---|
59 | <procedure>(random-sample n items)</procedure> |
---|
60 | returns a random sample of size {{n}} from the list of {{items}}, drawn without replacement. |
---|
61 | |
---|
62 | <procedure>(sign n)</procedure> |
---|
63 | returns 0, 1 or -1 according to whether {{n}} is zero, positive or negative. |
---|
64 | |
---|
65 | <procedure>(square n)</procedure> |
---|
66 | |
---|
67 | ==== Descriptive statistics |
---|
68 | |
---|
69 | These functions provide information on a given list of numbers, the {{items}}. Note, the list does not have to be sorted. |
---|
70 | |
---|
71 | <procedure>(mean items)</procedure> |
---|
72 | returns the arithmetic mean of the {{items}} (the sum of the numbers divided by the number of numbers). |
---|
73 | (mean '(1 2 3 4 5)) => 3 |
---|
74 | |
---|
75 | <procedure>(median items)</procedure> |
---|
76 | returns the value which separates the upper and lower halves of the list of numbers. |
---|
77 | (median '(1 2 3 4)) => 5/2 |
---|
78 | |
---|
79 | <procedure>(mode items)</procedure> |
---|
80 | returns two '''values'''. The first is a list of the ''modes'' and the second is the frequency. (A mode of a list of numbers is the most frequently occurring value.) |
---|
81 | > (mode '(1 2 3 4)) |
---|
82 | (1 2 3 4) |
---|
83 | 1 |
---|
84 | > (mode '(1 2 2 3 4)) |
---|
85 | (2) |
---|
86 | 2 |
---|
87 | > (mode '(1 2 2 3 3 4)) |
---|
88 | (2 3) |
---|
89 | 2 |
---|
90 | |
---|
91 | <procedure>(geometric-mean items)</procedure> |
---|
92 | returns the geometric mean of the {{items}} (the result of multiplying the items together and then taking the nth root, where n is the number of items). |
---|
93 | (geometric-mean '(1 2 3 4 5)) => 2.60517108469735 |
---|
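The geometric mean can equivalently be computed as the exponential of the mean of the logarithms, which avoids forming a very large product. The sketch below uses that form; {{geometric-mean-sketch}} is a hypothetical name and the library may compute the value differently.

```scheme
;; Hypothetical sketch; assumes every item is strictly positive.
;; exp(mean(log x)) equals the nth root of the product of the items.
(define (geometric-mean-sketch items)
  (exp (/ (apply + (map log items))
          (length items))))

;; Fifth root of 1*2*3*4*5 = 120, approximately 2.605
(geometric-mean-sketch '(1 2 3 4 5))
```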
94 | |
---|
95 | <procedure>(range items)</procedure> |
---|
96 | returns the difference between the biggest and the smallest value from the list of {{items}}. |
---|
97 | (range '(5 1 2 3 4)) => 4 |
---|
98 | |
---|
99 | <procedure>(percentile items percent)</procedure> |
---|
100 | returns the value at the given {{percent}} position once the {{items}} are sorted into order; the result may be an item from the list or the average of two adjacent items. |
---|
101 | (percentile '(1 2 3 4) 50) => 5/2 |
---|
102 | (percentile '(1 2 3 4) 67) => 3 |
---|
103 | |
---|
104 | <procedure>(variance items)</procedure> |
---|
105 | |
---|
106 | <procedure>(standard-deviation items)</procedure> |
---|
107 | |
---|
108 | <procedure>(coefficient-of-variation items)</procedure> |
---|
109 | returns 100 * (std-dev / mean) of the {{items}}. |
---|
110 | (coefficient-of-variation '(1 2 3 4)) => 51.6397779494322 |
---|
111 | |
---|
112 | <procedure>(standard-error-of-the-mean items)</procedure> |
---|
113 | returns std-dev / sqrt(length items). |
---|
114 | (standard-error-of-the-mean '(1 2 3 4)) => 0.645497224367903 |
---|
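The example values for {{coefficient-of-variation}} and {{standard-error-of-the-mean}} are consistent with a sample standard deviation, that is, one computed with an n-1 denominator. The sketch below works through that arithmetic under this assumption; all names ending in {{-sketch}} are hypothetical.

```scheme
;; Hypothetical sketches; the n-1 denominator is inferred from the
;; documented example outputs, not from the library source.
(define (mean-sketch items)
  (/ (apply + items) (length items)))

(define (variance-sketch items)
  (let ((m (mean-sketch items)))
    (/ (apply + (map (lambda (x) (* (- x m) (- x m))) items))
       (- (length items) 1))))           ; sample variance

(define (sd-sketch items)
  (sqrt (variance-sketch items)))

;; 100 * (std-dev / mean)
(define (cv-sketch items)
  (* 100 (/ (sd-sketch items) (mean-sketch items))))

;; std-dev / sqrt(n)
(define (sem-sketch items)
  (/ (sd-sketch items) (sqrt (length items))))

(cv-sketch '(1 2 3 4))    ; approximately 51.64
(sem-sketch '(1 2 3 4))   ; approximately 0.6455
```

For the list (1 2 3 4) the mean is 5/2 and the squared deviations sum to 5, giving a variance of 5/3 and a standard deviation of about 1.291.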
115 | |
---|
116 | <procedure>(mean-sd-n items)</procedure> |
---|
117 | returns three '''values''', one for the mean, one for the standard deviation, and one for the length of the list. |
---|
118 | > (mean-sd-n '(1 2 3 4)) |
---|
119 | 5/2 |
---|
120 | 1.29099444873581 |
---|
121 | 4 |
---|
122 | |
---|
123 | ==== Distributional functions |
---|
124 | |
---|
125 | <procedure>(binomial-probability n k p)</procedure> |
---|
126 | returns the probability that the number of positive outcomes for a binomial distribution B(n, p) is exactly {{k}}. |
---|
127 | > (do-ec (: i 0 11) |
---|
128 | (format #t "i = ~d P = ~f~&" i (binomial-probability 10 i 0.5))) |
---|
129 | i = 0 P = 0.0009765625 |
---|
130 | i = 1 P = 0.009765625 |
---|
131 | i = 2 P = 0.0439453125 |
---|
132 | i = 3 P = 0.1171875 |
---|
133 | i = 4 P = 0.205078125 |
---|
134 | i = 5 P = 0.24609375 |
---|
135 | i = 6 P = 0.205078125 |
---|
136 | i = 7 P = 0.1171875 |
---|
137 | i = 8 P = 0.0439453125 |
---|
138 | i = 9 P = 0.009765625 |
---|
139 | i = 10 P = 0.0009765625 |
---|
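The table above follows from the binomial probability formula C(n, k) * p^k * (1-p)^(n-k). The following is a minimal sketch of that formula; {{choose-sketch}} and {{binomial-probability-sketch}} are hypothetical names, and the library's own implementation may differ.

```scheme
;; Hypothetical sketch of the binomial probability mass function.
;; choose-sketch builds C(n, k) incrementally; every intermediate
;; value is itself a binomial coefficient, so each division is exact.
(define (choose-sketch n k)
  (let loop ((i 1) (acc 1))
    (if (> i k)
        acc
        (loop (+ i 1) (/ (* acc (+ (- n k) i)) i)))))

(define (binomial-probability-sketch n k p)
  (* (choose-sketch n k)
     (expt p k)
     (expt (- 1 p) (- n k))))

;; C(10, 5) = 252 and 252 / 2^10 = 0.24609375
(binomial-probability-sketch 10 5 0.5)
```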
140 | |
---|
141 | <procedure>(binomial-cumulative-probability n k p)</procedure> |
---|
142 | returns the probability that fewer than {{k}} positive outcomes occur for a binomial distribution B(n, p); note that {{k}} itself is excluded, as the first row of the example shows. |
---|
143 | > (do-ec (: i 0 11) |
---|
144 | (format #t "i = ~d P = ~f~&" i (binomial-cumulative-probability 10 i 0.5))) |
---|
145 | i = 0 P = 0.0 |
---|
146 | i = 1 P = 0.0009765625 |
---|
147 | i = 2 P = 0.0107421875 |
---|
148 | i = 3 P = 0.0546875 |
---|
149 | i = 4 P = 0.171875 |
---|
150 | i = 5 P = 0.376953125 |
---|
151 | i = 6 P = 0.623046875 |
---|
152 | i = 7 P = 0.828125 |
---|
153 | i = 8 P = 0.9453125 |
---|
154 | i = 9 P = 0.9892578125 |
---|
155 | i = 10 P = 0.9990234375 |
---|
156 | |
---|
157 | <procedure>(binomial-ge-probability n k p)</procedure> |
---|
158 | returns the probability of {{k}} or more positive outcomes for a binomial distribution B(n, p). |
---|
159 | |
---|
160 | <procedure>(binomial-le-probability n k p)</procedure> |
---|
161 | returns the probability of {{k}} or fewer positive outcomes for a binomial distribution B(n, p). |
---|
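The three cumulative forms are related: "fewer than k" sums the point probabilities below {{k}}, "k or fewer" is "fewer than k+1", and "k or more" is the complement of "fewer than k". The sketch below illustrates these identities with hypothetical helper names; it is not the library's code.

```scheme
;; Hypothetical sketch relating the cumulative binomial forms.
(define (binomial-pmf-sketch n k p)
  ;; C(n, k) * p^k * (1-p)^(n-k), with C(n, k) built incrementally
  (let loop ((i 1) (c 1))
    (if (> i k)
        (* c (expt p k) (expt (- 1 p) (- n k)))
        (loop (+ i 1) (/ (* c (+ (- n k) i)) i)))))

;; P(X < k): sum of point probabilities for 0 .. k-1
(define (binomial-lt-sketch n k p)
  (let loop ((i 0) (acc 0))
    (if (>= i k)
        acc
        (loop (+ i 1) (+ acc (binomial-pmf-sketch n i p))))))

;; P(X <= k) = P(X < k+1)
(define (binomial-le-sketch n k p)
  (binomial-lt-sketch n (+ k 1) p))

;; P(X >= k) = 1 - P(X < k)
(define (binomial-ge-sketch n k p)
  (- 1 (binomial-lt-sketch n k p)))

(binomial-lt-sketch 10 5 0.5)   ; 0.376953125
(binomial-le-sketch 10 5 0.5)   ; 0.623046875
(binomial-ge-sketch 10 5 0.5)   ; 0.623046875
```

For B(10, 0.5) the distribution is symmetric, which is why the probability of 5 or fewer equals the probability of 5 or more.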
162 | |
---|
163 | <procedure>(poisson-probability mu k)</procedure> |
---|
164 | returns the probability of {{k}} events occurring when the average is {{mu}}. |
---|
165 | > (do-ec (: i 0 20) |
---|
166 | (format #t "P(X=~2d) = ~,4f~&" i (poisson-probability 10 i))) |
---|
167 | P(X= 0) = 0.0000 |
---|
168 | P(X= 1) = 0.0005 |
---|
169 | P(X= 2) = 0.0023 |
---|
170 | P(X= 3) = 0.0076 |
---|
171 | P(X= 4) = 0.0189 |
---|
172 | P(X= 5) = 0.0378 |
---|
173 | P(X= 6) = 0.0631 |
---|
174 | P(X= 7) = 0.0901 |
---|
175 | P(X= 8) = 0.1126 |
---|
176 | P(X= 9) = 0.1251 |
---|
177 | P(X=10) = 0.1251 |
---|
178 | P(X=11) = 0.1137 |
---|
179 | P(X=12) = 0.0948 |
---|
180 | P(X=13) = 0.0729 |
---|
181 | P(X=14) = 0.0521 |
---|
182 | P(X=15) = 0.0347 |
---|
183 | P(X=16) = 0.0217 |
---|
184 | P(X=17) = 0.0128 |
---|
185 | P(X=18) = 0.0071 |
---|
186 | P(X=19) = 0.0037 |
---|
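These values follow from the Poisson formula e^(-mu) * mu^k / k!. The following is a minimal sketch of that formula, built up one factor at a time so that no large factorial is formed; {{poisson-probability-sketch}} is a hypothetical name.

```scheme
;; Hypothetical sketch of the Poisson probability mass function.
;; Starts from e^(-mu) and multiplies by mu/i for i = 1 .. k.
(define (poisson-probability-sketch mu k)
  (let loop ((i 1) (acc (exp (- mu))))
    (if (> i k)
        acc
        (loop (+ i 1) (* acc (/ mu i))))))

;; With mu = 10, the probabilities of 9 and 10 events are equal
;; (approximately 0.1251), because the extra factor is mu/10 = 1.
(poisson-probability-sketch 10 9)
(poisson-probability-sketch 10 10)
```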
187 | |
---|
188 | <procedure>(poisson-cumulative-probability mu k)</procedure> |
---|
189 | returns the probability of fewer than {{k}} events occurring when the average is {{mu}}; note that {{k}} itself is excluded, as the first rows of the example show. |
---|
190 | > (do-ec (: i 0 20) |
---|
191 | (format #t "P(X=~2d) = ~,4f~&" i (poisson-cumulative-probability 10 i))) |
---|
192 | P(X= 0) = 0.0000 |
---|
193 | P(X= 1) = 0.0000 |
---|
194 | P(X= 2) = 0.0005 |
---|
195 | P(X= 3) = 0.0028 |
---|
196 | P(X= 4) = 0.0103 |
---|
197 | P(X= 5) = 0.0293 |
---|
198 | P(X= 6) = 0.0671 |
---|
199 | P(X= 7) = 0.1301 |
---|
200 | P(X= 8) = 0.2202 |
---|
201 | P(X= 9) = 0.3328 |
---|
202 | P(X=10) = 0.4579 |
---|
203 | P(X=11) = 0.5830 |
---|
204 | P(X=12) = 0.6968 |
---|
205 | P(X=13) = 0.7916 |
---|
206 | P(X=14) = 0.8645 |
---|
207 | P(X=15) = 0.9165 |
---|
208 | P(X=16) = 0.9513 |
---|
209 | P(X=17) = 0.9730 |
---|
210 | P(X=18) = 0.9857 |
---|
211 | P(X=19) = 0.9928 |
---|
212 | |
---|
213 | <procedure>(poisson-ge-probability mu k)</procedure> |
---|
214 | returns the probability of {{k}} or more events occurring when the average is {{mu}}. |
---|
215 | |
---|
216 | <procedure>(normal-pdf x mean variance)</procedure> |
---|
217 | returns the probability density at {{x}} of a normal distribution with the stated mean and variance. |
---|
218 | > (do-ec (: i 0 11) |
---|
219 | (format #t "~3d ~,4f~&" i (normal-pdf i 5 4))) |
---|
220 | 0 0.0088 |
---|
221 | 1 0.0270 |
---|
222 | 2 0.0648 |
---|
223 | 3 0.1210 |
---|
224 | 4 0.1760 |
---|
225 | 5 0.1995 |
---|
226 | 6 0.1760 |
---|
227 | 7 0.1210 |
---|
228 | 8 0.0648 |
---|
229 | 9 0.0270 |
---|
230 | 10 0.0088 |
---|
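The table is consistent with the usual normal density, exp(-(x - mean)^2 / (2 * variance)) / sqrt(2 * pi * variance), with the third argument treated as a variance. The following is a minimal sketch of that formula; {{normal-pdf-sketch}} is a hypothetical name.

```scheme
;; Hypothetical sketch of the normal probability density function,
;; parameterised by mean and variance as in the example above.
(define (normal-pdf-sketch x mean variance)
  (let ((two-pi (* 8 (atan 1)))          ; atan(1) = pi/4
        (d (- x mean)))
    (/ (exp (- (/ (* d d) (* 2 variance))))
       (sqrt (* two-pi variance)))))

;; At the mean, the density is 1 / sqrt(2 * pi * 4), approximately 0.1995
(normal-pdf-sketch 5 5 4)
```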
231 | |
---|
232 | <procedure>(convert-to-standard-normal x mean variance)</procedure> |
---|
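233 | returns the value of {{x}} rescaled from the given normal distribution to the standard normal N(0, 1). Note that in the example below the result is ({{x}} - {{mean}}) divided directly by the third argument, so that argument behaves as a standard deviation rather than a variance. |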
---|
234 | > (convert-to-standard-normal 5 6 2) |
---|
235 | -1/2 |
---|
236 | |
---|
237 | <procedure>(phi x)</procedure> |
---|
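238 | returns the value at {{x}} of the cumulative distribution function (CDF) of the standard normal distribution, i.e. the probability that a standard normal variable is less than or equal to {{x}}. |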
---|
239 | > (do-ec (: x -2 2 0.4) |
---|
240 | (format #t "~4,1f ~,4f~&" x (phi x))) |
---|
241 | -2.0 0.0228 |
---|
242 | -1.6 0.0548 |
---|
243 | -1.2 0.1151 |
---|
244 | -0.8 0.2119 |
---|
245 | -0.4 0.3446 |
---|
246 | 0.0 0.5000 |
---|
247 | 0.4 0.6554 |
---|
248 | 0.8 0.7881 |
---|
249 | 1.2 0.8849 |
---|
250 | 1.6 0.9452 |
---|
251 | |
---|
252 | * z |
---|
253 | * t-distribution |
---|
254 | * chi-square |
---|
255 | * chi-square-cdf |
---|
256 | |
---|
257 | ==== Confidence intervals |
---|
258 | |
---|
259 | These functions report bounds for an observed property of a distribution. The confidence level is {{(1-alpha)}}, so the bounds become tighter as {{alpha}} increases from 0.0 towards 1.0. |
---|
260 | |
---|
261 | <procedure>(binomial-probability-ci n p alpha)</procedure> |
---|
262 | returns two values, the lower and upper bounds on an observed probability {{p}} from {{n}} trials with confidence {{(1-alpha)}}. |
---|
263 | > (binomial-probability-ci 10 0.8 0.9) |
---|
264 | 0.724273681640625 |
---|
265 | 0.851547241210938 |
---|
266 | ; 2 values |
---|
267 | |
---|
268 | <procedure>(poisson-mu-ci k alpha)</procedure> |
---|
269 | returns two values, the lower and upper bounds on the Poisson parameter if {{k}} events are observed; the bounds are for confidence {{(1-alpha)}}. |
---|
270 | > (poisson-mu-ci 10 0.9) |
---|
271 | 8.305419921875 |
---|
272 | 10.0635986328125 |
---|
273 | ; 2 values |
---|
274 | |
---|
275 | <procedure>(normal-mean-ci mean standard-deviation k alpha)</procedure> |
---|
276 | returns two values, the lower and upper bounds on the mean of the normal distribution if {{k}} events are observed; the bounds are for confidence {{(1-alpha)}}. |
---|
277 | > (normal-mean-ci 0.5 0.1 10 0.8) |
---|
278 | 0.472063716520217 |
---|
279 | 0.527936283479783 |
---|
280 | ; 2 values |
---|
281 | |
---|
282 | <procedure>(normal-mean-ci-on-sequence items alpha)</procedure> |
---|
283 | returns two values, the lower and upper bounds on the mean of the given {{items}}, assuming they are normally distributed; the bounds are for confidence {{(1-alpha)}}. |
---|
284 | > (normal-mean-ci-on-sequence '(1 2 3 4 5) 0.9) |
---|
285 | 2.40860081649174 |
---|
286 | 3.59139918350826 |
---|
287 | ; 2 values |
---|
288 | |
---|
289 | <procedure>(normal-variance-ci standard-deviation k alpha)</procedure> |
---|
290 | returns two values, the lower and upper bounds on the variance of the normal distribution if {{k}} events are observed; the bounds are for confidence {{(1-alpha)}}. |
---|
291 | |
---|
292 | <procedure>(normal-variance-ci-on-sequence items alpha)</procedure> |
---|
293 | returns two values, the lower and upper bounds on the variance of the given {{items}}, assuming they are normally distributed; the bounds are for confidence {{(1-alpha)}}. |
---|
294 | |
---|
295 | <procedure>(normal-sd-ci standard-deviation k alpha)</procedure> |
---|
296 | returns two values, the lower and upper bounds on the standard deviation of the normal distribution if {{k}} events are observed; the bounds are for confidence {{(1-alpha)}}. |
---|
297 | |
---|
298 | <procedure>(normal-sd-ci-on-sequence items alpha)</procedure> |
---|
299 | returns two values, the lower and upper bounds on the standard deviation of the given {{items}}, assuming they are normally distributed; the bounds are for confidence {{(1-alpha)}}. |
---|
300 | |
---|
301 | ==== Hypothesis testing |
---|
302 | |
---|
303 | ===== (parametric) |
---|
304 | |
---|
305 | * z-test |
---|
306 | * z-test-on-sequence |
---|
307 | * t-test-one-sample |
---|
308 | * t-test-one-sample-on-sequence |
---|
309 | * t-test-paired |
---|
310 | * t-test-paired-on-sequences |
---|
311 | * t-test-two-sample |
---|
312 | * t-test-two-sample-on-sequences |
---|
313 | * f-test |
---|
314 | * chi-square-test-one-sample |
---|
315 | * binomial-test-one-sample |
---|
316 | * binomial-test-two-sample |
---|
317 | * fisher-exact-test |
---|
318 | * mcnemars-test |
---|
319 | * poisson-test-one-sample |
---|
320 | |
---|
321 | ===== (non-parametric) |
---|
322 | |
---|
323 | * sign-test |
---|
324 | * sign-test-on-sequence |
---|
325 | * wilcoxon-signed-rank-test |
---|
326 | * wilcoxon-signed-rank-test-on-sequences |
---|
327 | * chi-square-test-rxc |
---|
328 | * chi-square-test-for-trend |
---|
329 | |
---|
330 | ==== Sample size estimates |
---|
331 | |
---|
332 | * t-test-one-sample-sse |
---|
333 | * t-test-two-sample-sse |
---|
334 | * t-test-paired-sse |
---|
335 | * binomial-test-one-sample-sse |
---|
336 | * binomial-test-two-sample-sse |
---|
337 | * binomial-test-paired-sse |
---|
338 | * correlation-sse |
---|
339 | |
---|
340 | ==== Correlation and regression |
---|
341 | |
---|
342 | * linear-regression |
---|
343 | * correlation-coefficient |
---|
344 | * correlation-test-two-sample |
---|
345 | * correlation-test-two-sample-on-sequences |
---|
346 | * spearman-rank-correlation |
---|
347 | |
---|
348 | ==== Significance test functions |
---|
349 | |
---|
350 | * t-significance |
---|
351 | * f-significance |
---|
352 | |
---|
353 | |
---|
354 | === Authors |
---|
355 | |
---|
356 | [[http://wiki.call-cc.org/users/peter-lane|Peter Lane]] wrote the Scheme version of this library. The original Lisp version was written by [[http://compbio.ucdenver.edu/hunter/|Larry Hunter]]. |
---|
357 | |
---|
358 | === License |
---|
359 | |
---|
360 | GPL version 3.0. |
---|
361 | |
---|
362 | === Requirements |
---|
363 | |
---|
364 | Needs srfi-1, srfi-25, srfi-69, vector-lib, numbers, extras, foreign, format |
---|
365 | |
---|
366 | Uses the GNU scientific library for basic numeric processing, so requires libgsl, libgslcblas and the development files for libgsl. |
---|
367 | |
---|
368 | === Version History |
---|
369 | |
---|
370 | trunk, for testing |
---|