Still under testing!

== Introduction

This library is a port of [[http://compbio.uchsc.edu/Hunter_lab/Hunter/|Larry Hunter]]'s Lisp statistics library to Chicken Scheme.

The library provides a number of formulae and methods taken from the book "Fundamentals of Biostatistics" by Bernard Rosner (5th edition).

=== Statistical Distributions

To use this library, you need to understand the underlying statistics.  In brief:

The [[http://en.wikipedia.org/wiki/Binomial_distribution|Binomial distribution]] is used when counting discrete events in a series of trials, each of which has a probability {{p}} of producing a positive outcome.  An example would be tossing a coin {{n}} times: the probability of a head is {{p}}, and the distribution gives the probability of each possible number of heads in the {{n}} trials.

The [[http://en.wikipedia.org/wiki/Poisson_distribution|Poisson distribution]] is used to count discrete events which occur with a known average rate.  A typical example is the decay of radioactive elements.

The [[http://en.wikipedia.org/wiki/Normal_distribution|Normal distribution]] is used for real-valued events which cluster symmetrically around a specific mean.  A typical example would be the distribution of people's heights.

== Provided Functions

=== Utilities

<procedure>(average-rank value sorted-values)</procedure>
returns the average position of the given value in the list of sorted values; ranks start from 1.
 > (average-rank 2 '(1 2 2 3 4))
 5/2
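The ranking rule described above can be sketched in plain Scheme. This is only an illustration, not the egg's source: {{my-average-rank}} is a hypothetical name, and returning {{#f}} for a value that is not in the list is an assumption of the sketch.

```scheme
;; Hypothetical helper, not the egg's source: average the 1-based
;; positions at which VALUE occurs in SORTED-VALUES.
(define (my-average-rank value sorted-values)
  (let loop ((rest sorted-values) (pos 1) (sum 0) (hits 0))
    (cond ((null? rest)
           ;; returning #f for a missing value is an assumption of this sketch
           (if (zero? hits) #f (/ sum hits)))
          ((= (car rest) value)
           (loop (cdr rest) (+ pos 1) (+ sum pos) (+ hits 1)))
          (else
           (loop (cdr rest) (+ pos 1) sum hits)))))
```

For {{'(1 2 2 3 4)}} the value 2 sits at positions 2 and 3, so the average rank is 5/2, in line with the example above.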

<procedure>(beta-incomplete x a b)</procedure>

<procedure>(bin-and-count items n)</procedure>
Divides the range of the list of {{items}} into {{n}} bins, and returns a vector of the number of items which fall into each bin.
 > (bin-and-count '(1 1 2 3 3 4 5) 5)
 #(2 1 2 1 1)
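One way to read the binning rule is equal-width bins spanning the smallest to the largest item, with the largest item placed in the last bin. The sketch below follows that reading; {{my-bin-and-count}} is a hypothetical name, and the sketch assumes the items are not all equal (otherwise the bin width would be zero).

```scheme
;; Hypothetical sketch of equal-width binning, not the egg's source.
(define (my-bin-and-count items n)
  (let* ((lo (apply min items))
         (hi (apply max items))
         (width (/ (- hi lo) n))          ; assumes hi > lo
         (bins (make-vector n 0)))
    (for-each
     (lambda (x)
       (let* ((raw (inexact->exact (floor (/ (- x lo) width))))
              (i (min raw (- n 1))))      ; the maximum falls in the last bin
         (vector-set! bins i (+ 1 (vector-ref bins i)))))
     items)
    bins))
```

With {{'(1 1 2 3 3 4 5)}} and 5 bins the width is 0.8, and the counts come out as in the example above.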

<procedure>(combinations n k)</procedure>
returns the number of ways to select {{k}} items from {{n}}, where the order does not matter.

<procedure>(factorial n)</procedure>
returns the factorial of {{n}}.

<procedure>(find-critical-value p-function p-value)</procedure>

<procedure>(fisher-z-transform r)</procedure>
transforms a correlation coefficient {{r}} into a value whose distribution is approximately normal.
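The Fisher transformation is usually written as z = 0.5 * ln((1 + r) / (1 - r)). The sketch below implements that textbook formula; {{my-fisher-z}} is a hypothetical name, and it is an assumption that the egg computes exactly this expression.

```scheme
;; Textbook Fisher z-transformation; valid for -1 < r < 1.
;; Hypothetical name, not the egg's source.
(define (my-fisher-z r)
  (* 0.5 (log (/ (+ 1 r) (- 1 r)))))
```

A correlation of 0 maps to 0, and values approaching 1 or -1 are stretched towards positive or negative infinity.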

<procedure>(gamma-incomplete a x)</procedure>

<procedure>(gamma-ln x)</procedure>

<procedure>(permutations n k)</procedure>
returns the number of ways to select {{k}} items from {{n}}, where the order does matter.
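The counting functions in this section are related: an ordered selection of {{k}} items can be arranged in {{k!}} ways, so the number of unordered selections is the number of ordered ones divided by {{k!}}. A minimal sketch with hypothetical {{my-}} names (not the egg's source):

```scheme
;; n! = n * (n-1) * ... * 1
(define (my-factorial n)
  (let loop ((i n) (acc 1))
    (if (<= i 1) acc (loop (- i 1) (* acc i)))))

;; Ordered selections: n * (n-1) * ... * (n-k+1)
(define (my-permutations n k)
  (let loop ((i 0) (acc 1))
    (if (= i k) acc (loop (+ i 1) (* acc (- n i))))))

;; Unordered selections: divide out the k! orderings of each selection
(define (my-combinations n k)
  (/ (my-permutations n k) (my-factorial k)))
```

For example, choosing 2 items from 5 gives 20 ordered selections and 10 unordered ones.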

<procedure>(random-normal mean sd)</procedure>
returns a random number drawn from a normal distribution with the specified mean and standard deviation.

<procedure>(random-pick items)</procedure>
returns a random item from the given list of items.

<procedure>(random-sample n items)</procedure>
returns a random sample of size {{n}} from the list of items, without replacement.

<procedure>(sign n)</procedure>
returns 0, 1 or -1 according to whether {{n}} is zero, positive or negative.

<procedure>(square n)</procedure>

=== Descriptive statistics

These functions provide information on a given list of numbers, the {{items}}.  Note that the list does not have to be sorted.

<procedure>(mean items)</procedure>
returns the arithmetic mean of the {{items}} (the sum of the numbers divided by the number of numbers).
 (mean '(1 2 3 4 5)) => 3

<procedure>(median items)</procedure>
returns the value which separates the upper and lower halves of the list of numbers.
 (median '(1 2 3 4)) => 5/2
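A sketch of the median rule: sort the items, then take the middle item for an odd count, or the average of the two middle items for an even count. The names below are hypothetical, and a small insertion sort is included only to keep the example self-contained.

```scheme
;; Insert X into an already sorted list.
(define (insert-sorted x sorted)
  (cond ((null? sorted) (list x))
        ((<= x (car sorted)) (cons x sorted))
        (else (cons (car sorted) (insert-sorted x (cdr sorted))))))

;; Simple insertion sort, adequate for short lists.
(define (sort-numbers items)
  (let loop ((rest items) (acc '()))
    (if (null? rest)
        acc
        (loop (cdr rest) (insert-sorted (car rest) acc)))))

;; Hypothetical median, not the egg's source; assumes a non-empty list.
(define (my-median items)
  (let* ((sorted (sort-numbers items))
         (n (length sorted))
         (mid (quotient n 2)))
    (if (odd? n)
        (list-ref sorted mid)
        (/ (+ (list-ref sorted (- mid 1)) (list-ref sorted mid)) 2))))
```

For {{'(1 2 3 4)}} the two middle items are 2 and 3, giving 5/2 as in the example above.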

<procedure>(mode items)</procedure>
returns two '''values'''.  The first is a list of the ''modes'' and the second is the frequency.  (A mode of a list of numbers is the most frequently occurring value.)
 > (mode '(1 2 3 4))
 (1 2 3 4)
 1
 > (mode '(1 2 2 3 4))
 (2)
 2
 > (mode '(1 2 2 3 3 4))
 (2 3)
 2
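The two-value result can be sketched by tallying each distinct item and keeping those whose count equals the maximum. The names {{tally}} and {{my-mode}} are hypothetical, the list is assumed to be non-empty, and reporting modes in first-seen order is a choice of this sketch rather than a documented property of the egg.

```scheme
;; Build an association list of (value . count), in first-seen order.
(define (tally items)
  (let loop ((rest items) (counts '()))
    (if (null? rest)
        (reverse counts)
        (let ((entry (assv (car rest) counts)))
          (if entry
              (begin (set-cdr! entry (+ 1 (cdr entry)))
                     (loop (cdr rest) counts))
              (loop (cdr rest) (cons (cons (car rest) 1) counts)))))))

;; Return two values: the list of modes and their frequency.
(define (my-mode items)
  (let* ((counts (tally items))
         (best (apply max (map cdr counts))))
    (values (let keep ((cs counts))
              (cond ((null? cs) '())
                    ((= (cdr (car cs)) best)
                     (cons (car (car cs)) (keep (cdr cs))))
                    (else (keep (cdr cs)))))
            best)))
```

The results can be received with {{call-with-values}}, for example {{(call-with-values (lambda () (my-mode '(1 2 2 3 3 4))) list)}}.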

<procedure>(geometric-mean items)</procedure>
returns the geometric mean of the {{items}} (the result of multiplying the items together and then taking the nth root, where n is the number of items).
 (geometric-mean '(1 2 3 4 5)) => 2.60517108469735
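The definition above translates directly into code: multiply the items and raise the product to the power 1/n. {{my-geometric-mean}} is a hypothetical name, and the sketch assumes a non-empty list of positive numbers.

```scheme
;; nth root of the product of the items; hypothetical name.
(define (my-geometric-mean items)
  (expt (apply * items) (/ 1.0 (length items))))
```

For {{'(1 2 3 4 5)}} the product is 120, and its fifth root is approximately 2.605, matching the example above. For long lists, summing logarithms and exponentiating the average is a common alternative that avoids very large intermediate products.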

<procedure>(range items)</procedure>
returns the difference between the biggest and the smallest value from the list of {{items}}.
 (range '(5 1 2 3 4)) => 4

<procedure>(percentile items percent)</procedure>
returns the value at the {{percent}} position when the {{items}} are sorted into order; the returned value may be an item in the list, or the average of two adjacent items.
 (percentile '(1 2 3 4) 50) => 5/2
 (percentile '(1 2 3 4) 67) => 3
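Both examples above are consistent with the following rule: compute the position n * percent / 100; if it is a whole number k, average the k-th and (k+1)-th sorted items, otherwise take the next item up. The sketch below implements that rule. It is an assumption that the egg works exactly this way, the names are hypothetical, and the sketch only handles percentages strictly between 0 and 100.

```scheme
;; Insert X into an already sorted list.
(define (insert-sorted x sorted)
  (cond ((null? sorted) (list x))
        ((<= x (car sorted)) (cons x sorted))
        (else (cons (car sorted) (insert-sorted x (cdr sorted))))))

;; Simple insertion sort, adequate for short lists.
(define (sort-numbers items)
  (let loop ((rest items) (acc '()))
    (if (null? rest)
        acc
        (loop (cdr rest) (insert-sorted (car rest) acc)))))

;; Hypothetical percentile; assumes 0 < percent < 100.
(define (my-percentile items percent)
  (let* ((sorted (sort-numbers items))
         (n (length sorted))
         (pos (/ (* n percent) 100)))     ; 1-based position n*p/100
    (if (integer? pos)
        ;; whole-number position k: average the k-th and (k+1)-th items
        (let ((k (inexact->exact pos)))
          (/ (+ (list-ref sorted (- k 1)) (list-ref sorted k)) 2))
        ;; otherwise take the next item up
        (list-ref sorted (inexact->exact (floor pos))))))
```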

<procedure>(variance items)</procedure>

<procedure>(standard-deviation items)</procedure>

<procedure>(coefficient-of-variation items)</procedure>
returns 100 * (std-dev / mean) of the {{items}}.
 (coefficient-of-variation '(1 2 3 4)) => 51.6397779494322

<procedure>(standard-error-of-the-mean items)</procedure>
returns std-dev / sqrt(length items).
 (standard-error-of-the-mean '(1 2 3 4)) => 0.645497224367903
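The example values above are consistent with the sample variance, which divides by n - 1 rather than n. The sketch below is written on that assumption, using hypothetical {{my-}} names; it is not the egg's source and needs at least two items.

```scheme
(define (my-mean items)
  (/ (apply + items) (length items)))

;; Sample variance: sum of squared deviations divided by (n - 1).
(define (my-variance items)
  (let ((m (my-mean items)))
    (/ (apply + (map (lambda (x) (* (- x m) (- x m))) items))
       (- (length items) 1))))

(define (my-standard-deviation items)
  (sqrt (my-variance items)))

;; 100 * (std-dev / mean)
(define (my-coefficient-of-variation items)
  (* 100 (/ (my-standard-deviation items) (my-mean items))))

;; std-dev / sqrt(n)
(define (my-standard-error-of-the-mean items)
  (/ (my-standard-deviation items) (sqrt (length items))))
```

For {{'(1 2 3 4)}} the squared deviations sum to 5, so the variance is 5/3 and the standard deviation is about 1.291, from which the two example values above follow.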

<procedure>(mean-sd-n items)</procedure>
returns three '''values''', one for the mean, one for the standard deviation, and one for the length of the list.
 > (mean-sd-n '(1 2 3 4))
 5/2
 1.29099444873581
 4

=== Distributional functions

<procedure>(binomial-probability n k p)</procedure>
returns the probability that the number of positive outcomes for a binomial distribution B(n, p) is {{k}}.
 > (do-ec (: i 0 11)
          (format #t "i = ~d P = ~f~&" i (binomial-probability 10 i 0.5)))
 i = 0 P = 0.0009765625
 i = 1 P = 0.009765625
 i = 2 P = 0.0439453125
 i = 3 P = 0.1171875
 i = 4 P = 0.205078125
 i = 5 P = 0.24609375
 i = 6 P = 0.205078125
 i = 7 P = 0.1171875
 i = 8 P = 0.0439453125
 i = 9 P = 0.009765625
 i = 10 P = 0.0009765625
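The table above follows the binomial probability mass function, P(X = k) = C(n, k) * p^k * (1 - p)^(n - k). A minimal sketch of that formula, with hypothetical names {{choose}} and {{my-binomial-probability}} (not the egg's source):

```scheme
;; C(n, k), built up one factor at a time so every intermediate is an integer.
(define (choose n k)
  (let loop ((i 1) (acc 1))
    (if (> i k)
        acc
        (loop (+ i 1) (/ (* acc (+ (- n k) i)) i)))))

;; P(X = k) for X ~ B(n, p)
(define (my-binomial-probability n k p)
  (* (choose n k) (expt p k) (expt (- 1 p) (- n k))))
```

With n = 10, k = 5 and p = 0.5 this is 252 / 1024 = 0.24609375, the peak of the table above.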

<procedure>(binomial-cumulative-probability n k p)</procedure>
returns the probability that fewer than {{k}} positive outcomes occur for a binomial distribution B(n, p).
 > (do-ec (: i 0 11)
          (format #t "i = ~d P = ~f~&" i (binomial-cumulative-probability 10 i 0.5)))
 i = 0 P = 0.0
 i = 1 P = 0.0009765625
 i = 2 P = 0.0107421875
 i = 3 P = 0.0546875
 i = 4 P = 0.171875
 i = 5 P = 0.376953125
 i = 6 P = 0.623046875
 i = 7 P = 0.828125
 i = 8 P = 0.9453125
 i = 9 P = 0.9892578125
 i = 10 P = 0.9990234375

<procedure>(binomial-ge-probability n k p)</procedure>
returns the probability of {{k}} or more positive outcomes for a binomial distribution B(n, p).

<procedure>(binomial-le-probability n k p)</procedure>
returns the probability of {{k}} or fewer positive outcomes for a binomial distribution B(n, p).
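The three cumulative functions differ only in which outcomes they sum over: fewer than {{k}}, at most {{k}}, or at least {{k}}. The sketch below makes the boundaries explicit using a direct sum of the probability mass function. All names are hypothetical, and the egg may well compute these values differently.

```scheme
;; C(n, k), with integer intermediates.
(define (choose n k)
  (let loop ((i 1) (acc 1))
    (if (> i k)
        acc
        (loop (+ i 1) (/ (* acc (+ (- n k) i)) i)))))

;; P(X = k) for X ~ B(n, p)
(define (binomial-pmf n k p)
  (* (choose n k) (expt p k) (expt (- 1 p) (- n k))))

;; Sum P(X = i) for i from FROM to TO, inclusive.
(define (sum-pmf n p from to)
  (let loop ((i from) (acc 0))
    (if (> i to)
        acc
        (loop (+ i 1) (+ acc (binomial-pmf n i p))))))

(define (my-binomial-cumulative-probability n k p)   ; P(X < k)
  (sum-pmf n p 0 (- k 1)))

(define (my-binomial-le-probability n k p)           ; P(X <= k)
  (sum-pmf n p 0 k))

(define (my-binomial-ge-probability n k p)           ; P(X >= k)
  (sum-pmf n p k n))
```

Since the first and last cover complementary outcomes, P(X < k) + P(X >= k) = 1 for any {{k}}.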

<procedure>(poisson-probability mu k)</procedure>
returns the probability of {{k}} events occurring when the average is {{mu}}.
 > (do-ec (: i 0 20)
          (format #t "P(X=~2d) = ~,4f~&" i (poisson-probability 10 i)))
 P(X= 0) = 0.0000
 P(X= 1) = 0.0005
 P(X= 2) = 0.0023
 P(X= 3) = 0.0076
 P(X= 4) = 0.0189
 P(X= 5) = 0.0378
 P(X= 6) = 0.0631
 P(X= 7) = 0.0901
 P(X= 8) = 0.1126
 P(X= 9) = 0.1251
 P(X=10) = 0.1251
 P(X=11) = 0.1137
 P(X=12) = 0.0948
 P(X=13) = 0.0729
 P(X=14) = 0.0521
 P(X=15) = 0.0347
 P(X=16) = 0.0217
 P(X=17) = 0.0128
 P(X=18) = 0.0071
 P(X=19) = 0.0037
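The table above follows the Poisson probability mass function, P(X = k) = e^(-mu) * mu^k / k!. The sketch below builds the term up one factor at a time, which avoids computing large factorials; {{my-poisson-probability}} is a hypothetical name, not the egg's source.

```scheme
;; P(X = k) for a Poisson distribution with mean MU.
;; Starts from e^(-mu) and multiplies by mu/i for i = 1..k.
(define (my-poisson-probability mu k)
  (let loop ((i 1) (acc (exp (- mu))))
    (if (> i k)
        acc
        (loop (+ i 1) (* acc (/ mu i))))))
```

With mu = 10 the last factor for k = 10 is 10/10 = 1, which is why the table shows the same probability for 9 and 10 events.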

<procedure>(poisson-cumulative-probability mu k)</procedure>
returns the probability of fewer than {{k}} events occurring when the average is {{mu}}.
 > (do-ec (: i 0 20)
          (format #t "P(X=~2d) = ~,4f~&" i (poisson-cumulative-probability 10 i)))
 P(X= 0) = 0.0000
 P(X= 1) = 0.0000
 P(X= 2) = 0.0005
 P(X= 3) = 0.0028
 P(X= 4) = 0.0103
 P(X= 5) = 0.0293
 P(X= 6) = 0.0671
 P(X= 7) = 0.1301
 P(X= 8) = 0.2202
 P(X= 9) = 0.3328
 P(X=10) = 0.4579
 P(X=11) = 0.5830
 P(X=12) = 0.6968
 P(X=13) = 0.7916
 P(X=14) = 0.8645
 P(X=15) = 0.9165
 P(X=16) = 0.9513
 P(X=17) = 0.9730
 P(X=18) = 0.9857
 P(X=19) = 0.9928

<procedure>(poisson-ge-probability mu k)</procedure>
returns the probability of {{k}} or more events occurring when the average is {{mu}}.

<procedure>(normal-pdf x mean variance)</procedure>
returns the likelihood (probability density) of {{x}} given a normal distribution with the stated mean and variance.
 > (do-ec (: i 0 11)
          (format #t "~3d ~,4f~&" i (normal-pdf i 5 4)))
   0 0.0088
   1 0.0270
   2 0.0648
   3 0.1210
   4 0.1760
   5 0.1995
   6 0.1760
   7 0.1210
   8 0.0648
   9 0.0270
  10 0.0088
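The values above match the normal density written in terms of the variance, f(x) = exp(-(x - mean)^2 / (2 * variance)) / sqrt(2 * pi * variance). A minimal sketch of that formula, with the hypothetical name {{my-normal-pdf}} (not the egg's source):

```scheme
;; Density of N(mean, variance) at X.
(define (my-normal-pdf x mean variance)
  (let ((two-pi (* 2 3.141592653589793))
        (d (- x mean)))
    (/ (exp (- (/ (* d d) (* 2 variance))))
       (sqrt (* two-pi variance)))))
```

At the mean the exponent is zero, so with a variance of 4 the density is 1 / sqrt(8 * pi), approximately 0.1995, the peak of the table above.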

<procedure>(convert-to-standard-normal x mean variance)</procedure>
returns {{x}} rescaled from the given normal distribution to the standard normal N(0, 1).  In the example below the result equals {{(x - mean)}} divided by the third argument, so that argument is acting as a standard deviation rather than a variance.
 > (convert-to-standard-normal 5 6 2)
 -1/2

<procedure>(phi x)</procedure>
returns the cumulative distribution function (CDF) of the standard normal distribution, evaluated at {{x}}.
 > (do-ec (: x -2 2 0.4)
         (format #t "~4,1f ~,4f~&" x (phi x)))
 -2.0 0.0228
 -1.6 0.0548
 -1.2 0.1151
 -0.8 0.2119
 -0.4 0.3446
  0.0 0.5000
  0.4 0.6554
  0.8 0.7881
  1.2 0.8849
  1.6 0.9452
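The standard normal CDF can be expressed through the error function as phi(x) = 0.5 * (1 + erf(x / sqrt(2))). The sketch below uses the Abramowitz and Stegun polynomial approximation to erf (formula 7.1.26, accurate to roughly seven decimal places), which was chosen here for brevity. The names are hypothetical, and nothing in this page says the egg computes {{phi}} this way.

```scheme
;; Abramowitz & Stegun 7.1.26 approximation to erf, extended to
;; negative arguments using erf(-x) = -erf(x).
(define (erf-approx x)
  (let* ((s (if (< x 0) -1 1))
         (ax (abs x))
         (t (/ 1 (+ 1 (* 0.3275911 ax))))
         (poly (* t (+ 0.254829592 (* t (+ -0.284496736 (* t (+ 1.421413741 (* t (+ -1.453152027 (* t 1.061405429))))))))))
         (tail (* poly (exp (- (* ax ax))))))
    (* s (- 1 tail))))

;; CDF of the standard normal distribution.
(define (my-phi x)
  (* 0.5 (+ 1 (erf-approx (/ x (sqrt 2))))))
```

A value from any normal distribution can be looked up by standardising it first, for example {{(my-phi (/ (- x mean) sd))}}, where {{sd}} is the standard deviation.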

* z
* t-distribution
* chi-square
* chi-square-cdf

=== Confidence intervals

These functions report bounds for an observed property of a distribution.  Each takes a parameter {{alpha}} and reports bounds for confidence {{(1-alpha)}}, so the bounds become tighter as {{alpha}} increases from 0.0 to 1.0.

<procedure>(binomial-probability-ci n p alpha)</procedure>
returns two values, the lower and upper bounds on an observed probability {{p}} from {{n}} trials with confidence {{(1-alpha)}}.
 > (binomial-probability-ci 10 0.8 0.9)
 0.724273681640625
 0.851547241210938
 ; 2 values

<procedure>(poisson-mu-ci k alpha)</procedure>
returns two values, the lower and upper bounds on the Poisson parameter if {{k}} events are observed; the bound is for confidence {{(1-alpha)}}.
 > (poisson-mu-ci 10 0.9)
 8.305419921875
 10.0635986328125
 ; 2 values

<procedure>(normal-mean-ci mean standard-deviation k alpha)</procedure>
returns two values, the lower and upper bounds on the mean of the normal distribution if {{k}} events are observed; the bound is for confidence {{(1-alpha)}}.
 > (normal-mean-ci 0.5 0.1 10 0.8)
 0.472063716520217
 0.527936283479783
 ; 2 values

<procedure>(normal-mean-ci-on-sequence items alpha)</procedure>
returns two values, the lower and upper bounds on the mean of the given {{items}}, assuming they are normally distributed; the bound is for confidence {{(1-alpha)}}.
 > (normal-mean-ci-on-sequence '(1 2 3 4 5) 0.9)
 2.40860081649174
 3.59139918350826
 ; 2 values

<procedure>(normal-variance-ci standard-deviation k alpha)</procedure>
returns two values, the upper and lower bounds on the variance of the normal distribution if {{k}} events are observed; the bound is for confidence {{(1-alpha)}}.

<procedure>(normal-variance-ci-on-sequence items alpha)</procedure>
returns two values, the upper and lower bounds on the variance of the given {{items}}, assuming they are normally distributed; the bound is for confidence {{(1-alpha)}}.

<procedure>(normal-sd-ci standard-deviation k alpha)</procedure>
returns two values, the upper and lower bounds on the standard deviation of the normal distribution if {{k}} events are observed; the bound is for confidence {{(1-alpha)}}.

<procedure>(normal-sd-ci-on-sequence items alpha)</procedure>
returns two values, the upper and lower bounds on the standard deviation of the given {{items}}, assuming they are normally distributed; the bound is for confidence {{(1-alpha)}}.

=== Hypothesis testing

==== (parametric)

* z-test
* z-test-on-sequence
* t-test-one-sample
* t-test-one-sample-on-sequence
* t-test-paired
* t-test-paired-on-sequences
* t-test-two-sample
* t-test-two-sample-on-sequences
* f-test
* chi-square-test-one-sample
* binomial-test-one-sample
* binomial-test-two-sample
* fisher-exact-test
* mcnemars-test
* poisson-test-one-sample

==== (non parametric)

* sign-test
* sign-test-on-sequence
* wilcoxon-signed-rank-test
* wilcoxon-signed-rank-test-on-sequences
* chi-square-test-rxc
* chi-square-test-for-trend

=== Sample size estimates

* t-test-one-sample-sse
* t-test-two-sample-sse
* t-test-paired-sse
* binomial-test-one-sample-sse
* binomial-test-two-sample-sse
* binomial-test-paired-sse
* correlation-sse

=== Correlation and regression

* linear-regression
* correlation-coefficient
* correlation-test-two-sample
* correlation-test-two-sample-on-sequences
* spearman-rank-correlation

=== Significance test functions

* t-significance
* f-significance

== Authors

[[http://wiki.call-cc.org/users/peter-lane|Peter Lane]] wrote the Scheme version of this library.  The original Lisp version was written by [[http://compbio.uchsc.edu/Hunter_lab/Hunter/|Larry Hunter]].

== License

GPL version 3.0.

== Requirements

Needs srfi-1, srfi-25, srfi-69, vector-lib, numbers, extras, foreign, format

Uses the GNU Scientific Library for basic numeric processing, so requires libgsl, libgslcblas and the development files for libgsl.

== Version History

trunk, for testing