Still under testing!

== Introduction

This library is a port of [[http://compbio.uchsc.edu/Hunter_lab/Hunter/|Larry Hunter]]'s Lisp statistics library to Chicken Scheme.

The library provides a number of formulae and methods taken from the book "Fundamentals of Biostatistics" by Bernard Rosner (5th edition).

=== Statistical Distributions

To use this library, you need to understand the underlying statistics.  In brief:

The [[http://en.wikipedia.org/wiki/Binomial_distribution|Binomial distribution]] is used when counting discrete events in a series of trials, each of which has a probability {{p}} of producing a positive outcome.  An example would be tossing a coin {{n}} times: the probability of a head on each toss is {{p}}, and the distribution gives the probability of each possible number of heads in the {{n}} trials.

The [[http://en.wikipedia.org/wiki/Poisson_distribution|Poisson distribution]] is used to count discrete events which occur with a known average rate.  A typical example is the decay of radioactive elements.

The [[http://en.wikipedia.org/wiki/Normal_distribution|Normal distribution]] is used for real-valued quantities which cluster symmetrically around a specific mean.  A typical example would be the distribution of people's heights.

== Provided Functions

=== Utilities

<procedure>(average-rank value sorted-values)</procedure>
returns the average position of the given {{value}} in the list of sorted values; ranks are counted from 1.
 > (average-rank 2 '(1 2 2 3 4))
 5/2

<procedure>(beta-incomplete x a b)</procedure>

<procedure>(bin-and-count items n)</procedure>
divides the range of the list of {{items}} into {{n}} bins, and returns a vector of the number of items which fall into each bin.
 > (bin-and-count '(1 1 2 3 3 4 5) 5)
 #(2 1 2 1 1)

<procedure>(combinations n k)</procedure>
returns the number of ways to select {{k}} items from {{n}}, where the order does not matter.

<procedure>(factorial n)</procedure>
returns the factorial of {{n}}.

<procedure>(find-critical-value p-function p-value)</procedure>

<procedure>(fisher-z-transform r)</procedure>
returns the Fisher z-transformation of a correlation coefficient {{r}}; the transformed value is approximately normally distributed.
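
As an illustration of the transformation itself, the standard textbook formula is z = 1/2 * ln((1 + r) / (1 - r)).  The sketch below is a stand-alone helper with the hypothetical name {{fisher-z-sketch}}; it is not taken from the library's source, and it assumes {{r}} lies strictly between -1 and 1.

```scheme
;; Hypothetical stand-alone helper, not the library's own code.
;; Fisher's z-transformation: z = 1/2 * ln((1 + r) / (1 - r)), for -1 < r < 1.
(define (fisher-z-sketch r)
  (* 0.5 (log (/ (+ 1 r) (- 1 r)))))

(display (fisher-z-sketch 0.0))   ; r = 0 maps to z = 0
(newline)
(display (fisher-z-sketch 0.5))   ; roughly 0.5493, i.e. (ln 3) / 2
(newline)
```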

<procedure>(gamma-incomplete a x)</procedure>

<procedure>(gamma-ln x)</procedure>

<procedure>(permutations n k)</procedure>
returns the number of ways to select {{k}} items from {{n}}, where the order does matter.
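
The relationship between the two counts can be sketched in plain Scheme.  The names {{permutations-sketch}} and {{combinations-sketch}} are hypothetical and chosen to avoid clashing with the library's procedures; this is one possible way to compute the counts, not necessarily how the library does it.

```scheme
;; Hypothetical helpers for illustration only; not part of the library.
;; Ordered selections of k items from n: n * (n-1) * ... * (n-k+1).
(define (permutations-sketch n k)
  (let loop ((i 0) (acc 1))
    (if (= i k)
        acc
        (loop (+ i 1) (* acc (- n i))))))

;; Unordered selections: each set of k items is counted k! times above,
;; and k! is the same as the number of orderings of k items from k.
(define (combinations-sketch n k)
  (quotient (permutations-sketch n k)
            (permutations-sketch k k)))

(display (permutations-sketch 5 2)) (newline)  ; 20
(display (combinations-sketch 5 2)) (newline)  ; 10
```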

<procedure>(random-normal mean sd)</procedure>
returns a random number drawn from a normal distribution with the specified mean and standard deviation.

<procedure>(random-pick items)</procedure>
returns a random item from the given list of items.

<procedure>(random-sample n items)</procedure>
returns a random sample of size {{n}} from the list of items, drawn without replacement.

<procedure>(sign n)</procedure>
returns 0, 1 or -1 according to whether {{n}} is zero, positive or negative.

<procedure>(square n)</procedure>

=== Descriptive statistics

These functions provide information on a given list of numbers, the {{items}}.  Note that the list does not have to be sorted.

<procedure>(mean items)</procedure>
returns the arithmetic mean of the {{items}} (the sum of the numbers divided by the number of numbers).
 (mean '(1 2 3 4 5)) => 3

<procedure>(median items)</procedure>
returns the value which separates the upper and lower halves of the list of numbers.
 (median '(1 2 3 4)) => 5/2

<procedure>(mode items)</procedure>
returns two '''values'''.  The first is a list of the ''modes'' and the second is the frequency.  (A mode of a list of numbers is the most frequently occurring value.)
 > (mode '(1 2 3 4))
 (1 2 3 4)
 1
 > (mode '(1 2 2 3 4))
 (2)
 2
 > (mode '(1 2 2 3 3 4))
 (2 3)
 2

<procedure>(geometric-mean items)</procedure>
returns the geometric mean of the {{items}} (the result of multiplying the items together and then taking the nth root, where n is the number of items).
 (geometric-mean '(1 2 3 4 5)) => 2.60517108469735
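
The definition above can be checked with a short stand-alone sketch.  The helper name {{geometric-mean-sketch}} is hypothetical, and the log-based formulation is an assumption made for illustration rather than the library's confirmed method; it requires all items to be positive.

```scheme
;; Hypothetical helper, not the library's own code.
;; Computes the geometric mean as exp of the mean of the logs, which equals
;; the nth root of the product but avoids overflow for long lists.
;; Assumes every item is a positive number.
(define (geometric-mean-sketch items)
  (exp (/ (apply + (map log items))
          (length items))))

(display (geometric-mean-sketch '(1 2 3 4 5)))  ; approximately 2.605, the fifth root of 120
(newline)
```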

<procedure>(range items)</procedure>
returns the difference between the biggest and the smallest value from the list of {{items}}.
 (range '(5 1 2 3 4)) => 4

<procedure>(percentile items percent)</procedure>
returns the value at the {{percent}} position once the {{items}} are sorted into order; the result may be an item in the list, or the average of two adjacent items.
 (percentile '(1 2 3 4) 50) => 5/2
 (percentile '(1 2 3 4) 67) => 3

<procedure>(variance items)</procedure>

<procedure>(standard-deviation items)</procedure>

<procedure>(coefficient-of-variation items)</procedure>
returns 100 * (std-dev / mean) of the {{items}}.
 (coefficient-of-variation '(1 2 3 4)) => 51.6397779494322

<procedure>(standard-error-of-the-mean items)</procedure>
returns std-dev / sqrt(length items).
 (standard-error-of-the-mean '(1 2 3 4)) => 0.645497224367903
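
The two formulas above can be reproduced with a stand-alone sketch.  All names ending in {{-sketch}} are hypothetical.  The use of the sample standard deviation (dividing by n - 1) is an assumption inferred from the example outputs on this page, not from the library's source.

```scheme
;; Hypothetical helpers for illustration only; not part of the library.
(define (mean-sketch items)
  (/ (apply + items) (length items)))

;; Sample standard deviation: the sum of squared deviations from the mean
;; is divided by n - 1 (an assumption inferred from the documented outputs).
(define (sd-sketch items)
  (let ((m (mean-sketch items)))
    (sqrt (/ (apply + (map (lambda (x) (* (- x m) (- x m))) items))
             (- (length items) 1)))))

;; Standard error of the mean: std-dev / sqrt(n).
(define (sem-sketch items)
  (/ (sd-sketch items) (sqrt (length items))))

;; Coefficient of variation: 100 * (std-dev / mean).
(define (cv-sketch items)
  (* 100 (/ (sd-sketch items) (mean-sketch items))))

(display (sd-sketch '(1 2 3 4))) (newline)   ; approximately 1.29099
(display (sem-sketch '(1 2 3 4))) (newline)  ; approximately 0.645497
(display (cv-sketch '(1 2 3 4))) (newline)   ; approximately 51.6398
```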

<procedure>(mean-sd-n items)</procedure>
returns three '''values''', one for the mean, one for the standard deviation, and one for the length of the list.
 > (mean-sd-n '(1 2 3 4))
 5/2
 1.29099444873581
 4

=== Distributional functions

<procedure>(binomial-probability n k p)</procedure>
returns the probability that the number of positive outcomes for a binomial distribution B(n, p) is k.
 > (do-ec (: i 0 11)
          (format #t "i = ~d P = ~f~&" i (binomial-probability 10 i 0.5)))
 i = 0 P = 0.0009765625
 i = 1 P = 0.009765625
 i = 2 P = 0.0439453125
 i = 3 P = 0.1171875
 i = 4 P = 0.205078125
 i = 5 P = 0.24609375
 i = 6 P = 0.205078125
 i = 7 P = 0.1171875
 i = 8 P = 0.0439453125
 i = 9 P = 0.009765625
 i = 10 P = 0.0009765625
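
The figures in this table follow the usual binomial formula, P(k) = C(n, k) * p^k * (1 - p)^(n - k).  The sketch below is a stand-alone illustration with hypothetical names ({{choose-sketch}}, {{binomial-probability-sketch}}); it is not the library's implementation.

```scheme
;; Hypothetical helpers for illustration only; not part of the library.
;; Binomial coefficient C(n, k), built up one factor at a time so that
;; every intermediate value is itself a binomial coefficient (an integer).
(define (choose-sketch n k)
  (let loop ((i 1) (acc 1))
    (if (> i k)
        acc
        (loop (+ i 1) (/ (* acc (+ (- n k) i)) i)))))

;; Probability of exactly k positive outcomes in n trials,
;; each with success probability p.
(define (binomial-probability-sketch n k p)
  (* (choose-sketch n k)
     (expt p k)
     (expt (- 1 p) (- n k))))

(display (binomial-probability-sketch 10 5 0.5))  ; 0.24609375, matching the i = 5 row
(newline)
```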

<procedure>(binomial-cumulative-probability n k p)</procedure>
returns the probability that fewer than {{k}} positive outcomes occur for a binomial distribution B(n, p).
 > (do-ec (: i 0 11)
          (format #t "i = ~d P = ~f~&" i (binomial-cumulative-probability 10 i 0.5)))
 i = 0 P = 0.0
 i = 1 P = 0.0009765625
 i = 2 P = 0.0107421875
 i = 3 P = 0.0546875
 i = 4 P = 0.171875
 i = 5 P = 0.376953125
 i = 6 P = 0.623046875
 i = 7 P = 0.828125
 i = 8 P = 0.9453125
 i = 9 P = 0.9892578125
 i = 10 P = 0.9990234375

<procedure>(binomial-ge-probability n k p)</procedure>
returns the probability of {{k}} or more positive outcomes for a binomial distribution B(n, p).

<procedure>(binomial-le-probability n k p)</procedure>
returns the probability of {{k}} or fewer positive outcomes for a binomial distribution B(n, p).

* poisson-probability
* poisson-cumulative-probability
* poisson-ge-probability
* normal-pdf
* convert-to-standard-normal
* phi
* z
* t-distribution
* chi-square
* chi-square-cdf

=== Confidence intervals

<procedure>(binomial-probability-ci n p alpha)</procedure>
returns two values, the lower and upper bounds on an observed probability {{p}} from {{n}} trials with confidence {{(1-alpha)}}.
 > (binomial-probability-ci 10 0.8 0.9)
 0.724273681640625
 0.851547241210938
 ; 2 values

* poisson-mu-ci
* normal-mean-ci
* normal-mean-ci-on-sequence
* normal-variance-ci
* normal-variance-ci-on-sequence
* normal-sd-ci
* normal-sd-ci-on-sequence

=== Hypothesis testing

==== (parametric)

* z-test
* z-test-on-sequence
* t-test-one-sample
* t-test-one-sample-on-sequence
* t-test-paired
* t-test-paired-on-sequences
* t-test-two-sample
* t-test-two-sample-on-sequences
* f-test
* chi-square-test-one-sample
* binomial-test-one-sample
* binomial-test-two-sample
* fisher-exact-test
* mcnemars-test
* poisson-test-one-sample

==== (non-parametric)

* sign-test
* sign-test-on-sequence
* wilcoxon-signed-rank-test
* wilcoxon-signed-rank-test-on-sequences
* chi-square-test-rxc
* chi-square-test-for-trend

=== Sample size estimates

* t-test-one-sample-sse
* t-test-two-sample-sse
* t-test-paired-sse
* binomial-test-one-sample-sse
* binomial-test-two-sample-sse
* binomial-test-paired-sse
* correlation-sse

=== Correlation and regression

* linear-regression
* correlation-coefficient
* correlation-test-two-sample
* correlation-test-two-sample-on-sequences
* spearman-rank-correlation

=== Significance test functions

* t-significance
* f-significance

== Authors

[[http://wiki.call-cc.org/users/peter-lane|Peter Lane]] wrote the Scheme version of this library.  The original Lisp version was written by [[http://compbio.uchsc.edu/Hunter_lab/Hunter/|Larry Hunter]].

== License

GPL version 3.0.

== Requirements

Needs srfi-1, srfi-25, srfi-69, vector-lib, numbers, extras, foreign, format

Uses the GNU Scientific Library for basic numeric processing, so requires libgsl, libgslcblas and the development files for libgsl.

== Version History

trunk, for testing