Still under testing!

== Introduction

This library is a port of [[http://compbio.uchsc.edu/Hunter_lab/Hunter/|Larry Hunter]]'s Lisp statistics library to CHICKEN Scheme.

The library provides a number of formulae and methods taken from the book "Fundamentals of Biostatistics" by Bernard Rosner (5th edition).

=== Binomial Distribution

The [[http://en.wikipedia.org/wiki/Binomial_distribution|Binomial distribution]] describes the number of positive outcomes in a series of {{n}} independent trials, each of which has the same probability {{p}} of producing a positive outcome.  An example would be tossing a coin {{n}} times: the probability of a head is {{p}}, and the distribution gives the probability of obtaining each possible number of heads in the {{n}} trials.

== Provided Functions

=== Utilities

<procedure>(average-rank value sorted-values)</procedure>
returns the average position of {{value}} in the list {{sorted-values}}, counting positions from 1; if the value occurs more than once, its positions are averaged.
 > (average-rank 2 '(1 2 2 3 4))
 5/2
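
Because the list is sorted, the average rank can be found in one pass: record the 1-based position of every element equal to {{value}} and average those positions. The following is a minimal sketch of that idea in plain Scheme, not the egg's own code; the name {{my-average-rank}} is hypothetical, and the sketch assumes {{value}} occurs at least once in the list.

```scheme
;; Illustrative sketch only, not the statistics egg's implementation.
;; Average 1-based position of VALUE in the sorted list SORTED-VALUES.
;; Assumes VALUE occurs at least once (otherwise COUNT stays 0).
(define (my-average-rank value sorted-values)
  (let loop ((rest sorted-values)
             (position 1)   ; rank of (car rest), counted from 1
             (rank-sum 0)   ; sum of the positions where VALUE was seen
             (count 0))     ; number of times VALUE was seen
    (cond ((null? rest) (/ rank-sum count))
          ((= (car rest) value)
           (loop (cdr rest) (+ position 1) (+ rank-sum position) (+ count 1)))
          (else
           (loop (cdr rest) (+ position 1) rank-sum count)))))
```

For {{'(1 2 2 3 4)}} the value 2 sits at positions 2 and 3, so the sketch averages them to 5/2, in line with the example above.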

<procedure>(beta-incomplete x a b)</procedure>

<procedure>(bin-and-count items n)</procedure>
Divides the range of the list of {{items}} into {{n}} bins, and returns a vector of the number of items which fall into each bin.
 > (bin-and-count '(1 1 2 3 3 4 5) 5)
 #(2 1 2 1 1)
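
The example above is reproduced by splitting the interval from the smallest to the largest item into {{n}} bins of equal width and counting the items in each, with the largest item placed in the last bin. The sketch below implements that rule; the equal-width binning is inferred from the example rather than taken from the egg's source, the name {{my-bin-and-count}} is hypothetical, and it assumes the items are not all equal (which would make the bin width zero).

```scheme
;; Illustrative sketch only; equal-width binning is an assumption
;; that matches the documented example.
;; Count how many ITEMS fall into each of N equal-width bins spanning
;; [smallest item, largest item].
(define (my-bin-and-count items n)
  (let* ((lo (apply min items))
         (hi (apply max items))
         (width (/ (- hi lo) n))      ; assumes hi > lo
         (counts (make-vector n 0)))
    (for-each
     (lambda (x)
       ;; bin index from the offset of x, clamped so that x = hi
       ;; lands in the last bin rather than one past the end
       (let ((i (min (- n 1)
                     (inexact->exact (floor (/ (- x lo) width))))))
         (vector-set! counts i (+ 1 (vector-ref counts i)))))
     items)
    counts))
```

With 1 to 5 split into five bins of width 4/5, the items 1, 1 fall in the first bin, 2 in the second, 3, 3 in the third, 4 in the fourth and 5 (clamped) in the last.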

<procedure>(combinations n k)</procedure>
returns the number of ways to select {{k}} items from {{n}}, where the order does not matter.
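
The number of combinations is n!/(k!(n-k)!). Rather than computing three factorials, the sketch below (hypothetical name {{my-combinations}}, not necessarily how the egg computes it) builds the result as a running product; after step i the accumulator equals C(n-k+i, i), so every integer division is exact.

```scheme
;; Illustrative sketch only, not the statistics egg's implementation.
;; Number of ways to choose K of N items, ignoring order: n! / (k! (n-k)!).
;; After step i the accumulator holds C(n-k+i, i), so QUOTIENT is exact.
(define (my-combinations n k)
  (let loop ((i 1) (acc 1))
    (if (> i k)
        acc
        (loop (+ i 1) (quotient (* acc (+ (- n k) i)) i)))))
```

For example, C(10, 5) is built up as 6, 21, 56, 126 and finally 252.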

<procedure>(factorial n)</procedure>
returns the factorial of {{n}}.

<procedure>(find-critical-value p-function p-value)</procedure>

<procedure>(fisher-z-transform r)</procedure>
returns Fisher's z-transformation of a correlation coefficient {{r}}, a value whose sampling distribution is approximately normal.
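
Fisher's transformation is z = (1/2) ln((1 + r)/(1 - r)), the inverse hyperbolic tangent of {{r}}, defined for -1 < r < 1. A direct sketch, using the hypothetical name {{my-fisher-z-transform}}:

```scheme
;; Illustrative sketch only, not the statistics egg's implementation.
;; Fisher's z-transformation of a correlation coefficient R, -1 < r < 1:
;;   z = 1/2 * ln((1 + r) / (1 - r))
(define (my-fisher-z-transform r)
  (* 0.5 (log (/ (+ 1 r) (- 1 r)))))
```

The transform is odd (z(-r) = -z(r)), maps r = 0 to 0, and grows without bound as {{r}} approaches 1.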

<procedure>(gamma-incomplete a x)</procedure>

<procedure>(gamma-ln x)</procedure>

<procedure>(permutations n k)</procedure>
returns the number of ways to select {{k}} items from {{n}}, where the order does matter.
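
Ordered selections number n!/(n-k)!, which is the falling product n(n-1)...(n-k+1) of k factors; equivalently, the number of combinations multiplied by k!. A sketch under the hypothetical name {{my-permutations}}, assuming 0 <= k:

```scheme
;; Illustrative sketch only, not the statistics egg's implementation.
;; Number of ordered selections of K items from N: n! / (n-k)!,
;; computed as the falling product n * (n-1) * ... * (n-k+1).
;; Assumes K >= 0; when K > N a zero factor makes the result 0.
(define (my-permutations n k)
  (let loop ((i 0) (acc 1))
    (if (= i k)
        acc
        (loop (+ i 1) (* acc (- n i))))))
```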

<procedure>(random-normal mean sd)</procedure>
returns a random number drawn from a normal distribution with the given {{mean}} and standard deviation {{sd}}.

<procedure>(random-pick items)</procedure>
returns a random item from the given list of {{items}}.

<procedure>(random-sample n items)</procedure>
returns a random sample of size {{n}}, drawn without replacement from the list of {{items}}.

<procedure>(sign n)</procedure>
returns 0, 1 or -1 according to whether {{n}} is zero, positive or negative.

<procedure>(square n)</procedure>

=== Descriptive statistics

These functions describe a given list of numbers, the {{items}}.  The list does not have to be sorted.

<procedure>(mean items)</procedure>
returns the arithmetic mean of the {{items}} (the sum of the numbers divided by the number of numbers).
 (mean '(1 2 3 4 5)) => 3

<procedure>(median items)</procedure>
returns the value which separates the upper and lower halves of the {{items}}; when the list has an even length, this is the average of the two middle values.
 (median '(1 2 3 4)) => 5/2
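
A minimal sketch of the median rule described above: sort a copy of the items, then take the middle element, or the average of the two middle elements when the list has even length. The insertion sort and the names {{insert-sorted}}, {{sort-numbers}} and {{my-median}} are hypothetical helpers chosen to keep the example self-contained; they are not part of the egg.

```scheme
;; Illustrative sketch only, not the statistics egg's implementation.

;; Insert X into the ascending list SORTED.
(define (insert-sorted x sorted)
  (cond ((null? sorted) (list x))
        ((<= x (car sorted)) (cons x sorted))
        (else (cons (car sorted) (insert-sorted x (cdr sorted))))))

;; Ascending insertion sort; adequate for short illustrative lists.
(define (sort-numbers items)
  (let loop ((rest items) (acc '()))
    (if (null? rest)
        acc
        (loop (cdr rest) (insert-sorted (car rest) acc)))))

;; Median: the middle element of the sorted ITEMS, or the average of
;; the two middle elements when the list has even length.
(define (my-median items)
  (let* ((v (list->vector (sort-numbers items)))
         (n (vector-length v))
         (mid (quotient n 2)))
    (if (odd? n)
        (vector-ref v mid)
        (/ (+ (vector-ref v (- mid 1)) (vector-ref v mid)) 2))))
```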

<procedure>(mode items)</procedure>
returns two '''values''': a list of the ''modes'', and the number of times each mode occurs.  (A mode of a list of numbers is a value that occurs most frequently; a list may have several.)
 > (mode '(1 2 3 4))
 (1 2 3 4)
 1
 > (mode '(1 2 2 3 4))
 (2)
 2
 > (mode '(1 2 2 3 3 4))
 (2 3)
 2
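
One way to obtain both values is to tally how often each distinct number occurs, find the largest count, and collect every number with that count. The sketch below (hypothetical names {{tally}} and {{my-mode}}, not the egg's code) returns the modes in order of first appearance, which agrees with the examples above, and assumes a non-empty list.

```scheme
;; Illustrative sketch only, not the statistics egg's implementation.
;; Tally occurrences of each distinct number in ITEMS, returning an
;; association list of (value . count) pairs, newest value first.
(define (tally items)
  (let loop ((rest items) (counts '()))
    (if (null? rest)
        counts
        (let ((entry (assv (car rest) counts)))
          (if entry
              (begin
                (set-cdr! entry (+ (cdr entry) 1)) ; seen before: bump count
                (loop (cdr rest) counts))
              (loop (cdr rest) (cons (cons (car rest) 1) counts)))))))

;; Returns two values: the list of modes, in order of first appearance,
;; and the number of times each mode occurs.  Assumes ITEMS is non-empty.
(define (my-mode items)
  (let* ((counts (tally items))
         (best (apply max (map cdr counts))))
    ;; COUNTS is newest-first, so consing while walking it
    ;; restores first-appearance order
    (let loop ((rest counts) (modes '()))
      (cond ((null? rest) (values modes best))
            ((= (cdar rest) best)
             (loop (cdr rest) (cons (caar rest) modes)))
            (else (loop (cdr rest) modes))))))
```

Since {{my-mode}} returns multiple values, a caller receives them with {{call-with-values}} or {{receive}}.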

<procedure>(geometric-mean items)</procedure>
returns the geometric mean of the {{items}} (the result of multiplying the items together and then taking the nth root, where n is the number of items).
 (geometric-mean '(1 2 3 4 5)) => 2.60517108469735
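
Multiplying all the items first can overflow or lose precision for long lists; an equivalent route is to average the logarithms and exponentiate, since (x1 x2 ... xn)^(1/n) = exp((1/n)(ln x1 + ... + ln xn)). This requires every item to be positive. A sketch under the hypothetical name {{my-geometric-mean}} (the egg itself may compute the product directly):

```scheme
;; Illustrative sketch only, not the statistics egg's implementation.
;; Geometric mean of positive ITEMS: exp of the arithmetic mean of the
;; logs, which equals the n-th root of the product of the items.
(define (my-geometric-mean items)
  (exp (/ (apply + (map log items)) (length items))))
```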

<procedure>(range items)</procedure>
returns the difference between the largest and the smallest value in the list of {{items}}.
 (range '(5 1 2 3 4)) => 4

<procedure>(percentile items percent)</procedure>
returns the value lying {{percent}} per cent of the way through the {{items}} once they are sorted into order; the result may be an element of the list or the average of two adjacent elements.
 (percentile '(1 2 3 4) 50) => 5/2
 (percentile '(1 2 3 4) 67) => 3
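
Both examples above are consistent with the following rule: sort the items, compute k = n * percent / 100, and return the element at (0-based) position floor(k) when k is not a whole number, or the average of the elements at positions k - 1 and k when it is. The sketch below implements that rule; it is inferred from the examples rather than taken from the egg's source, the names {{insert-sorted}}, {{sort-numbers}} and {{my-percentile}} are hypothetical, and it assumes 0 < percent < 100.

```scheme
;; Illustrative sketch only; the rule is inferred from the documented
;; examples, not copied from the statistics egg.

;; Insert X into the ascending list SORTED.
(define (insert-sorted x sorted)
  (cond ((null? sorted) (list x))
        ((<= x (car sorted)) (cons x sorted))
        (else (cons (car sorted) (insert-sorted x (cdr sorted))))))

;; Ascending insertion sort; adequate for short illustrative lists.
(define (sort-numbers items)
  (let loop ((rest items) (acc '()))
    (if (null? rest)
        acc
        (loop (cdr rest) (insert-sorted (car rest) acc)))))

;; Value at PERCENT of the sorted ITEMS.  Assumes 0 < percent < 100.
(define (my-percentile items percent)
  (let* ((v (list->vector (sort-numbers items)))
         (k (/ (* (vector-length v) percent) 100)) ; position, possibly fractional
         (i (inexact->exact (floor k))))           ; 0-based index at or below k
    (if (= k i)
        ;; k is a whole number: average the elements on either side of it
        (/ (+ (vector-ref v (- i 1)) (vector-ref v i)) 2)
        (vector-ref v i))))
```

For four items, 50 gives k = 2, so the 2nd and 3rd smallest values are averaged (5/2); 67 gives k = 2.68, so the element at index 2 (the value 3) is returned.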

<procedure>(variance items)</procedure>

<procedure>(standard-deviation items)</procedure>

<procedure>(coefficient-of-variation items)</procedure>
returns 100 * (std-dev / mean) of the {{items}}.
 (coefficient-of-variation '(1 2 3 4)) => 51.6397779494322

<procedure>(standard-error-of-the-mean items)</procedure>
returns std-dev / sqrt(length items).
 (standard-error-of-the-mean '(1 2 3 4)) => 0.645497224367903

<procedure>(mean-sd-n items)</procedure>
returns three '''values''': the mean, the standard deviation, and the length of the list.
 > (mean-sd-n '(1 2 3 4))
 5/2
 1.29099444873581
 4
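
The outputs shown for {{coefficient-of-variation}}, {{standard-error-of-the-mean}} and {{mean-sd-n}} on {{'(1 2 3 4)}} all agree with the ''sample'' standard deviation, which divides the sum of squared deviations by n - 1 rather than n. The sketch below recomputes the related quantities from that definition; the procedure names are hypothetical, and the n - 1 divisor is inferred from the examples rather than from the egg's source.

```scheme
;; Illustrative sketch only, not the statistics egg's implementation.
;; The n - 1 divisor is inferred from the documented example outputs.
(define (my-mean items)
  (/ (apply + items) (length items)))

;; Sample standard deviation: sqrt(sum of squared deviations / (n - 1)).
(define (my-sample-sd items)
  (let ((m (my-mean items))
        (n (length items)))
    (sqrt (/ (apply + (map (lambda (x) (* (- x m) (- x m))) items))
             (- n 1)))))

;; 100 * sd / mean
(define (my-coefficient-of-variation items)
  (* 100 (/ (my-sample-sd items) (my-mean items))))

;; sd / sqrt(n)
(define (my-standard-error-of-the-mean items)
  (/ (my-sample-sd items) (sqrt (length items))))
```

For {{'(1 2 3 4)}} the squared deviations from 5/2 sum to 5, giving a sample variance of 5/3 and a standard deviation of about 1.29099.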

=== Distributional functions

<procedure>(binomial-probability n k p)</procedure>
returns the probability of exactly {{k}} positive outcomes for a binomial distribution B(n, p).
 > (do-ec (: i 0 11)
          (format #t "i = ~d P = ~f~&" i (binomial-probability 10 i 0.5)))
 i = 0 P = 0.0009765625
 i = 1 P = 0.009765625
 i = 2 P = 0.0439453125
 i = 3 P = 0.1171875
 i = 4 P = 0.205078125
 i = 5 P = 0.24609375
 i = 6 P = 0.205078125
 i = 7 P = 0.1171875
 i = 8 P = 0.0439453125
 i = 9 P = 0.009765625
 i = 10 P = 0.0009765625
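
The binomial probability has the closed form P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below evaluates it directly, with a self-contained binomial-coefficient helper; the names {{choose}} and {{my-binomial-probability}} are hypothetical, and an implementation aimed at large {{n}} might work with logarithms instead to avoid overflow.

```scheme
;; Illustrative sketch only, not the statistics egg's implementation.

;; Binomial coefficient C(n, k) as an exact running product.
(define (choose n k)
  (let loop ((i 1) (acc 1))
    (if (> i k)
        acc
        (loop (+ i 1) (quotient (* acc (+ (- n k) i)) i)))))

;; P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)
(define (my-binomial-probability n k p)
  (* (choose n k) (expt p k) (expt (- 1 p) (- n k))))
```

For n = 10, p = 0.5 every outcome has probability 1/1024, so each entry in the table above is C(10, i)/1024; for i = 5 that is 252/1024 = 0.24609375.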

<procedure>(binomial-cumulative-probability n k p)</procedure>
returns the probability that fewer than {{k}} positive outcomes occur for a binomial distribution B(n, p).
 > (do-ec (: i 0 11)
          (format #t "i = ~d P = ~f~&" i (binomial-cumulative-probability 10 i 0.5)))
 i = 0 P = 0.0
 i = 1 P = 0.0009765625
 i = 2 P = 0.0107421875
 i = 3 P = 0.0546875
 i = 4 P = 0.171875
 i = 5 P = 0.376953125
 i = 6 P = 0.623046875
 i = 7 P = 0.828125
 i = 8 P = 0.9453125
 i = 9 P = 0.9892578125
 i = 10 P = 0.9990234375

<procedure>(binomial-ge-probability n k p)</procedure>
returns the probability of {{k}} or more positive outcomes for a binomial distribution B(n, p).

<procedure>(binomial-le-probability n k p)</procedure>
returns the probability of {{k}} or fewer positive outcomes for a binomial distribution B(n, p).
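
All three quantities can be written in terms of the point probabilities: P(X < k) sums P(X = i) for i from 0 to k - 1, P(X >= k) is its complement, and P(X <= k) = P(X < k + 1). A sketch of those relationships follows, consistent with the {{binomial-cumulative-probability}} table above; all names are hypothetical, and the binomial helpers are repeated so the block stands alone.

```scheme
;; Illustrative sketch only, not the statistics egg's implementation.

;; Binomial coefficient C(n, k) as an exact running product.
(define (choose n k)
  (let loop ((i 1) (acc 1))
    (if (> i k)
        acc
        (loop (+ i 1) (quotient (* acc (+ (- n k) i)) i)))))

;; P(X = k) for X ~ B(n, p).
(define (my-binomial-probability n k p)
  (* (choose n k) (expt p k) (expt (- 1 p) (- n k))))

;; P(X < k): sum of P(X = i) for i = 0 .. k-1 (0 when k = 0).
(define (my-binomial-cumulative-probability n k p)
  (let loop ((i 0) (total 0))
    (if (>= i k)
        total
        (loop (+ i 1) (+ total (my-binomial-probability n i p))))))

;; P(X >= k) is the complement of P(X < k).
(define (my-binomial-ge-probability n k p)
  (- 1 (my-binomial-cumulative-probability n k p)))

;; P(X <= k) equals P(X < k + 1).
(define (my-binomial-le-probability n k p)
  (my-binomial-cumulative-probability n (+ k 1) p))
```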

*    poisson-probability
*    poisson-cumulative-probability
*    poisson-ge-probability
*    normal-pdf
*    convert-to-standard-normal
*    phi
*    z
*    t-distribution
*    chi-square
*    chi-square-cdf

=== Confidence intervals

<procedure>(binomial-probability-ci n p alpha)</procedure>
returns two values, the lower and upper bounds on an observed probability {{p}} from {{n}} trials, with confidence {{(1-alpha)}}.
 > (binomial-probability-ci 10 0.8 0.9)
 0.724273681640625
 0.851547241210938
 ; 2 values

*    poisson-mu-ci
*    normal-mean-ci
*    normal-mean-ci-on-sequence
*    normal-variance-ci
*    normal-variance-ci-on-sequence
*    normal-sd-ci
*    normal-sd-ci-on-sequence

=== Hypothesis testing

==== Parametric tests

*    z-test
*    z-test-on-sequence
*    t-test-one-sample
*    t-test-one-sample-on-sequence
*    t-test-paired
*    t-test-paired-on-sequences
*    t-test-two-sample
*    t-test-two-sample-on-sequences
*    f-test
*    chi-square-test-one-sample
*    binomial-test-one-sample
*    binomial-test-two-sample
*    fisher-exact-test
*    mcnemars-test
*    poisson-test-one-sample

==== Non-parametric tests

*    sign-test
*    sign-test-on-sequence
*    wilcoxon-signed-rank-test
*    wilcoxon-signed-rank-test-on-sequences
*    chi-square-test-rxc
*    chi-square-test-for-trend

=== Sample size estimates

*    t-test-one-sample-sse
*    t-test-two-sample-sse
*    t-test-paired-sse
*    binomial-test-one-sample-sse
*    binomial-test-two-sample-sse
*    binomial-test-paired-sse
*    correlation-sse

=== Correlation and regression

*    linear-regression
*    correlation-coefficient
*    correlation-test-two-sample
*    correlation-test-two-sample-on-sequences
*    spearman-rank-correlation

=== Significance test functions

*    t-significance
*    f-significance

== Authors

[[http://wiki.call-cc.org/users/peter-lane|Peter Lane]] wrote the Scheme version of this library.  The original Lisp version was written by [[http://compbio.uchsc.edu/Hunter_lab/Hunter/|Larry Hunter]].