4 | == Statistics |
5 | |
6 | This library is a port of [[http://compbio.ucdenver.edu/hunter/|Larry Hunter]]'s Lisp statistics library to CHICKEN Scheme. |
7 | |
8 | The library provides a number of formulae and methods taken from the book "Fundamentals of Biostatistics" by Bernard Rosner (5th edition). |
9 | |
10 | === Statistical Distributions |
11 | |
12 | To use this library, you need to understand the underlying statistics. In brief: |
13 | |
14 | The [[http://en.wikipedia.org/wiki/Binomial_distribution|Binomial |
15 | distribution]] is used when counting discrete events in a series of |
16 | independent trials, each of which has a probability {{p}} of producing a |
17 | positive outcome. An example would be tossing a coin {{n}} times: the |
18 | probability of a head is {{p}}, and the distribution gives the |
19 | probability of each possible number of heads in the {{n}} trials. The binomial |
20 | distribution is written B(n, p). |
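
As a minimal sketch of the definition (plain Scheme, independent of this library; the names {{choose}} and {{binomial-pmf}} are illustrative and not part of the API), the probability of exactly {{k}} positive outcomes under B(n, p) is C(n, k) p^k (1-p)^(n-k):

```scheme
;; Number of ways to choose k items from n, built up incrementally
;; so that every intermediate value is an exact integer.
(define (choose n k)
  (let loop ((i 1) (acc 1))
    (if (> i k)
        acc
        (loop (+ i 1) (/ (* acc (+ (- n k) i)) i)))))

;; Probability of exactly k positive outcomes in n trials,
;; each trial succeeding with probability p.
(define (binomial-pmf n k p)
  (* (choose n k) (expt p k) (expt (- 1 p) (- n k))))

(binomial-pmf 10 5 0.5) ; 252 / 1024 = 0.24609375
```

The value for 5 heads in 10 fair tosses agrees with the {{binomial-probability}} table further down this page.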
21 | |
22 | The [[http://en.wikipedia.org/wiki/Poisson_distribution|Poisson |
23 | distribution]] is used to count discrete events which occur with a |
24 | known average rate. A typical example is the decay of radioactive |
25 | elements. A Poisson distribution is written Pois(mu). |
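
A small sketch of the Poisson probability mass function, e^(-mu) mu^k / k!, in plain Scheme. The names {{factorial-sketch}} and {{poisson-pmf}} are hypothetical helpers for illustration, not the library's own procedures:

```scheme
;; k! for a non-negative integer k.
(define (factorial-sketch k)
  (if (< k 2) 1 (* k (factorial-sketch (- k 1)))))

;; Probability of observing exactly k events when the mean count is mu.
(define (poisson-pmf mu k)
  (/ (* (exp (- mu)) (expt mu k))
     (factorial-sketch k)))

(poisson-pmf 10 10) ; approximately 0.1251
```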
26 | |
27 | The [[http://en.wikipedia.org/wiki/Normal_distribution|Normal |
28 | distribution]] is used for real-valued measurements which cluster |
29 | symmetrically around a mean. A typical example would be |
30 | the distribution of people's heights. A normal distribution is |
31 | written N(mean, variance). |
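
For reference, the density of N(mean, variance) can be sketched in a few lines of plain Scheme. {{pi-value}} and {{normal-density}} are illustrative names, assumed here and not taken from the library:

```scheme
(define pi-value (* 4 (atan 1)))

;; Density of the normal distribution with the given mean and
;; variance, evaluated at x.
(define (normal-density x mean variance)
  (/ (exp (- (/ (* (- x mean) (- x mean)) (* 2 variance))))
     (sqrt (* 2 pi-value variance))))

(normal-density 5 5 4) ; approximately 0.1995
```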
32 | |
33 | === Provided Functions |
34 | |
35 | ==== Utilities |
36 | |
37 | <procedure>(average-rank value sorted-values)</procedure> |
38 | returns the average position of the given value in the list of sorted values; ranks are counted from 1. |
39 | > (average-rank 2 '(1 2 2 3 4)) |
40 | 5/2 |
41 | |
42 | <procedure>(beta-incomplete x a b)</procedure> |
43 | |
44 | <procedure>(bin-and-count items n)</procedure> |
45 | Divides the range of the list of {{items}} into {{n}} bins, and returns a vector of the number of items which fall into each bin. |
46 | > (bin-and-count '(1 1 2 3 3 4 5) 5) |
47 | #(2 1 2 1 1) |
48 | |
49 | <procedure>(combinations n k)</procedure> |
50 | returns the number of ways to select {{k}} items from {{n}}, where the order does not matter. |
51 | |
52 | <procedure>(factorial n)</procedure> |
53 | returns the factorial of {{n}}. |
54 | |
55 | <procedure>(find-critical-value p-function p-value #:increasing?)</procedure> |
56 | given a monotonic function {{p-function}} taking a single value {{x}} to {{y}}, returns the value of {{x}} which makes {{(p-function x)}} closest to {{p-value}}. A boolean keyword parameter {{#:increasing?}} states whether the function is increasing or decreasing (the default). |
57 | |
58 | <procedure>(fisher-z-transform r)</procedure> |
59 | returns the transformation of a correlation coefficient {{r}} into an approximately normal distribution. |
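
The standard Fisher transformation is z = (1/2) ln((1 + r) / (1 - r)). The sketch below assumes the library follows that standard definition; {{fisher-z}} is an illustrative name, not the library procedure:

```scheme
;; Fisher's z-transformation of a correlation coefficient r, -1 < r < 1.
(define (fisher-z r)
  (* 0.5 (log (/ (+ 1 r) (- 1 r)))))

(fisher-z 0.8) ; approximately 1.0986
```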
60 | |
61 | <procedure>(gamma-incomplete a x)</procedure> |
62 | |
63 | <procedure>(gamma-ln x)</procedure> |
64 | |
65 | <procedure>(permutations n k)</procedure> |
66 | returns the number of ways to select {{k}} items from {{n}}, where the order does matter. |
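
The relationship between ordered and unordered selections can be sketched as follows. This is plain Scheme with hypothetical names ({{n-permute-k}}, {{n-choose-k}}), shown only to illustrate the counting rule, not the library's implementation:

```scheme
;; Ordered selections: n * (n-1) * ... * (n-k+1).
(define (n-permute-k n k)
  (let loop ((i 0) (acc 1))
    (if (= i k)
        acc
        (loop (+ i 1) (* acc (- n i))))))

;; Unordered selections: each one corresponds to k! orderings,
;; and k! is the number of ways to order k items out of k.
(define (n-choose-k n k)
  (/ (n-permute-k n k) (n-permute-k k k)))

(n-permute-k 5 2) ; 20
(n-choose-k 5 2)  ; 10
```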
67 | |
68 | <procedure>(random-normal mean sd)</procedure> |
69 | returns a random number drawn from a normal distribution with the specified mean and standard deviation. |
70 | |
71 | <procedure>(random-pick items)</procedure> |
72 | returns a random item from the given list of items. |
73 | |
74 | <procedure>(random-sample n items)</procedure> |
75 | returns a random sample of size {{n}} from the list of items, without replacement. |
76 | |
77 | <procedure>(random-weighted-sample n items weights)</procedure> |
78 | returns a random sample of size {{n}} from the list of items, without replacement, where each item has a defined probability of selection (weight). |
79 | |
80 | <procedure>(sign n)</procedure> |
81 | returns 0, 1 or -1 according to whether {{n}} is zero, positive or negative. |
82 | |
83 | <procedure>(square n)</procedure> |
84 | |
85 | <procedure>(cumsum sequences)</procedure> |
86 | returns the cumulative sum of a sequence. |
87 | |
88 | ==== Descriptive statistics |
89 | |
90 | These functions provide information on a given list of numbers, the {{items}}. Note, the list does not have to be sorted. |
91 | |
92 | <procedure>(mean items)</procedure> |
93 | returns the arithmetic mean of the {{items}} (the sum of the numbers divided by the number of numbers). |
94 | (mean '(1 2 3 4 5)) => 3 |
95 | |
96 | <procedure>(median items)</procedure> |
97 | returns the value which separates the upper and lower halves of the list of numbers. |
98 | (median '(1 2 3 4)) => 5/2 |
99 | |
100 | <procedure>(mode items)</procedure> |
101 | returns two '''values'''. The first is a list of the ''modes'' and the second is the frequency. (A mode of a list of numbers is the most frequently occurring value.) |
102 | > (mode '(1 2 3 4)) |
103 | (1 2 3 4) |
104 | 1 |
105 | > (mode '(1 2 2 3 4)) |
106 | (2) |
107 | 2 |
108 | > (mode '(1 2 2 3 3 4)) |
109 | (2 3) |
110 | 2 |
111 | |
112 | <procedure>(geometric-mean items)</procedure> |
113 | returns the geometric mean of the {{items}} (the result of multiplying the items together and then taking the nth root, where n is the number of items). |
114 | (geometric-mean '(1 2 3 4 5)) => 2.60517108469735 |
115 | |
116 | <procedure>(range items)</procedure> |
117 | returns the difference between the biggest and the smallest value from the list of {{items}}. |
118 | (range '(5 1 2 3 4)) => 4 |
119 | |
120 | <procedure>(percentile items percent)</procedure> |
121 | returns the item closest to the {{percent}} value if the {{items}} are sorted into order; the returned item may be in the list, or the average of adjacent values. |
122 | (percentile '(1 2 3 4) 50) => 5/2 |
123 | (percentile '(1 2 3 4) 67) => 3 |
124 | |
125 | <procedure>(variance items)</procedure> |
126 | |
127 | <procedure>(standard-deviation items)</procedure> |
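
This page does not state which estimator {{variance}} and {{standard-deviation}} use. The standard deviation that {{mean-sd-n}} reports for {{'(1 2 3 4)}} (about 1.291) is consistent with the sample formula, which divides by n - 1. The following plain-Scheme sketch (illustrative names, not the library's code) reproduces that figure:

```scheme
(define (mean-sketch items)
  (/ (apply + items) (length items)))

;; Sample variance: sum of squared deviations divided by n - 1.
(define (sample-variance items)
  (let ((m (mean-sketch items)))
    (/ (apply + (map (lambda (x) (* (- x m) (- x m))) items))
       (- (length items) 1))))

(define (sample-sd items)
  (sqrt (sample-variance items)))

(sample-variance '(1 2 3 4)) ; 5/3
(sample-sd '(1 2 3 4))       ; approximately 1.29099
```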
128 | |
129 | <procedure>(coefficient-of-variation items)</procedure> |
130 | returns 100 * (std-dev / mean) of the {{items}}. |
131 | (coefficient-of-variation '(1 2 3 4)) => 51.6397779494322 |
132 | |
133 | <procedure>(standard-error-of-the-mean items)</procedure> |
134 | returns std-dev / sqrt(length items). |
135 | (standard-error-of-the-mean '(1 2 3 4)) => 0.645497224367903 |
136 | |
137 | <procedure>(mean-sd-n items)</procedure> |
138 | returns three '''values''', one for the mean, one for the standard deviation, and one for the length of the list. |
139 | > (mean-sd-n '(1 2 3 4)) |
140 | 5/2 |
141 | 1.29099444873581 |
142 | 4 |
143 | |
144 | ==== Distributional functions |
145 | |
146 | <procedure>(binomial-probability n k p)</procedure> |
147 | returns the probability that the number of positive outcomes for a binomial distribution B(n, p) is k. |
148 | > (do-ec (: i 0 11) |
149 | (format #t "i = ~d P = ~f~&" i (binomial-probability 10 i 0.5))) |
150 | i = 0 P = 0.0009765625 |
151 | i = 1 P = 0.009765625 |
152 | i = 2 P = 0.0439453125 |
153 | i = 3 P = 0.1171875 |
154 | i = 4 P = 0.205078125 |
155 | i = 5 P = 0.24609375 |
156 | i = 6 P = 0.205078125 |
157 | i = 7 P = 0.1171875 |
158 | i = 8 P = 0.0439453125 |
159 | i = 9 P = 0.009765625 |
160 | i = 10 P = 0.0009765625 |
161 | |
162 | <procedure>(binomial-cumulative-probability n k p)</procedure> |
163 | returns the probability that fewer than {{k}} positive outcomes occur for a binomial distribution B(n, p). |
164 | > (do-ec (: i 0 11) |
165 | (format #t "i = ~d P = ~f~&" i (binomial-cumulative-probability 10 i 0.5))) |
166 | i = 0 P = 0.0 |
167 | i = 1 P = 0.0009765625 |
168 | i = 2 P = 0.0107421875 |
169 | i = 3 P = 0.0546875 |
170 | i = 4 P = 0.171875 |
171 | i = 5 P = 0.376953125 |
172 | i = 6 P = 0.623046875 |
173 | i = 7 P = 0.828125 |
174 | i = 8 P = 0.9453125 |
175 | i = 9 P = 0.9892578125 |
176 | i = 10 P = 0.9990234375 |
177 | |
178 | <procedure>(binomial-ge-probability n k p)</procedure> |
179 | returns the probability of {{k}} or more positive outcomes for a binomial distribution B(n, p). |
180 | |
181 | <procedure>(binomial-le-probability n k p)</procedure> |
182 | returns the probability of {{k}} or fewer positive outcomes for a binomial distribution B(n, p). |
183 | |
184 | <procedure>(poisson-probability mu k)</procedure> |
185 | returns the probability of {{k}} events occurring when the average is {{mu}}. |
186 | > (do-ec (: i 0 20) |
187 | (format #t "P(X=~2d) = ~,4f~&" i (poisson-probability 10 i))) |
188 | P(X= 0) = 0.0000 |
189 | P(X= 1) = 0.0005 |
190 | P(X= 2) = 0.0023 |
191 | P(X= 3) = 0.0076 |
192 | P(X= 4) = 0.0189 |
193 | P(X= 5) = 0.0378 |
194 | P(X= 6) = 0.0631 |
195 | P(X= 7) = 0.0901 |
196 | P(X= 8) = 0.1126 |
197 | P(X= 9) = 0.1251 |
198 | P(X=10) = 0.1251 |
199 | P(X=11) = 0.1137 |
200 | P(X=12) = 0.0948 |
201 | P(X=13) = 0.0729 |
202 | P(X=14) = 0.0521 |
203 | P(X=15) = 0.0347 |
204 | P(X=16) = 0.0217 |
205 | P(X=17) = 0.0128 |
206 | P(X=18) = 0.0071 |
207 | P(X=19) = 0.0037 |
208 | |
209 | <procedure>(poisson-cumulative-probability mu k)</procedure> |
210 | returns the probability of fewer than {{k}} events occurring when the average is {{mu}}. |
211 | > (do-ec (: i 0 20) |
212 | (format #t "P(X=~2d) = ~,4f~&" i (poisson-cumulative-probability 10 i))) |
213 | P(X= 0) = 0.0000 |
214 | P(X= 1) = 0.0000 |
215 | P(X= 2) = 0.0005 |
216 | P(X= 3) = 0.0028 |
217 | P(X= 4) = 0.0103 |
218 | P(X= 5) = 0.0293 |
219 | P(X= 6) = 0.0671 |
220 | P(X= 7) = 0.1301 |
221 | P(X= 8) = 0.2202 |
222 | P(X= 9) = 0.3328 |
223 | P(X=10) = 0.4579 |
224 | P(X=11) = 0.5830 |
225 | P(X=12) = 0.6968 |
226 | P(X=13) = 0.7916 |
227 | P(X=14) = 0.8645 |
228 | P(X=15) = 0.9165 |
229 | P(X=16) = 0.9513 |
230 | P(X=17) = 0.9730 |
231 | P(X=18) = 0.9857 |
232 | P(X=19) = 0.9928 |
233 | |
234 | <procedure>(poisson-ge-probability mu k)</procedure> |
235 | returns the probability of {{k}} or more events occurring when the average is {{mu}}. |
236 | |
237 | <procedure>(normal-pdf x mean variance)</procedure> |
238 | returns the probability density at {{x}} of a normal distribution with the stated mean and variance. |
239 | > (do-ec (: i 0 11) |
240 | (format #t "~3d ~,4f~&" i (normal-pdf i 5 4))) |
241 | 0 0.0088 |
242 | 1 0.0270 |
243 | 2 0.0648 |
244 | 3 0.1210 |
245 | 4 0.1760 |
246 | 5 0.1995 |
247 | 6 0.1760 |
248 | 7 0.1210 |
249 | 8 0.0648 |
250 | 9 0.0270 |
251 | 10 0.0088 |
252 | |
253 | <procedure>(convert-to-standard-normal x mean variance)</procedure> |
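254 | returns the value of {{x}} rescaled from the given normal distribution to the standard normal N(0, 1). Note that the example below divides by the third argument directly, (5 - 6) / 2, which suggests it is treated as a standard deviation rather than a variance. |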
255 | > (convert-to-standard-normal 5 6 2) |
256 | -1/2 |
257 | |
258 | <procedure>(phi x)</procedure> |
259 | returns the value at {{x}} of the cumulative distribution function (CDF) of the standard normal distribution. |
260 | > (do-ec (: x -2 2 0.4) |
261 | (format #t "~4,1f ~,4f~&" x (phi x))) |
262 | -2.0 0.0228 |
263 | -1.6 0.0548 |
264 | -1.2 0.1151 |
265 | -0.8 0.2119 |
266 | -0.4 0.3446 |
267 | 0.0 0.5000 |
268 | 0.4 0.6554 |
269 | 0.8 0.7881 |
270 | 1.2 0.8849 |
271 | 1.6 0.9452 |
272 | |
273 | <procedure>(z percentile)</procedure> |
274 | returns the inverse of the standard normal CDF: the point which has the given {{percentile}} of the distribution to its left. Input is a percentile, between 0.0 and 1.0. |
275 | |
276 | <procedure>(t-distribution degrees-of-freedom percentile)</procedure> |
277 | returns the point in the t-distribution given the {{degrees-of-freedom}} and {{percentile}}. {{degrees-of-freedom}} must be a positive integer, and {{percentile}} a value between 0.0 and 1.0. |
278 | |
279 | <procedure>(chi-square degrees-of-freedom percentile)</procedure> |
280 | returns the point at which chi-square distribution has {{percentile}} to its '''left''', using given {{degrees-of-freedom}}. |
281 | |
282 | <procedure>(chi-square-cdf x degrees-of-freedom)</procedure> |
283 | returns the probability that a random variable is to the '''left''' of {{x}} using the chi-square distribution with given {{degrees-of-freedom}}. |
284 | |
285 | ==== Confidence intervals |
286 | |
287 | These functions report bounds for an observed property of a distribution. The confidence level is {{(1-alpha)}}, so the bounds tighten as alpha increases from 0.0 to 1.0. |
288 | |
289 | <procedure>(binomial-probability-ci n p alpha)</procedure> |
290 | returns two values, the lower and upper bounds on an observed probability {{p}} from {{n}} trials with confidence {{(1-alpha)}}. |
291 | > (binomial-probability-ci 10 0.8 0.9) |
292 | 0.724273681640625 |
293 | 0.851547241210938 |
294 | ; 2 values |
295 | |
296 | <procedure>(poisson-mu-ci k alpha)</procedure> |
297 | returns two values, the lower and upper bounds on the Poisson parameter if {{k}} events are observed; the bound is for confidence {{(1-alpha)}}. |
298 | > (poisson-mu-ci 10 0.9) |
299 | 8.305419921875 |
300 | 10.0635986328125 |
301 | ; 2 values |
302 | |
303 | <procedure>(normal-mean-ci mean standard-deviation k alpha)</procedure> |
304 | returns two values, the lower and upper bounds on the mean of the normal distribution if {{k}} events are observed; the bound is for confidence {{(1-alpha)}}. |
305 | > (normal-mean-ci 0.5 0.1 10 0.8) |
306 | 0.491747852700165 |
307 | 0.508252147299835 |
308 | ; 2 values |
309 | |
310 | <procedure>(normal-mean-ci-on-sequence items alpha)</procedure> |
311 | returns two values, the lower and upper bounds on the mean of the given {{items}}, assuming they are normally distributed; the bound is for confidence {{(1-alpha)}}. |
312 | > (normal-mean-ci-on-sequence '(1 2 3 4 5) 0.9) |
313 | 2.40860081649174 |
314 | 3.59139918350826 |
315 | ; 2 values |
316 | |
317 | <procedure>(normal-variance-ci standard-deviation k alpha)</procedure> |
318 | returns two values, the lower and upper bounds on the variance of the normal distribution if {{k}} events are observed; the bound is for confidence {{(1-alpha)}}. |
319 | |
320 | <procedure>(normal-variance-ci-on-sequence items alpha)</procedure> |
321 | returns two values, the lower and upper bounds on the variance of the given {{items}}, assuming they are normally distributed; the bound is for confidence {{(1-alpha)}}. |
322 | |
323 | <procedure>(normal-sd-ci standard-deviation k alpha)</procedure> |
324 | returns two values, the lower and upper bounds on the standard deviation of the normal distribution if {{k}} events are observed; the bound is for confidence {{(1-alpha)}}. |
325 | |
326 | <procedure>(normal-sd-ci-on-sequence items alpha)</procedure> |
327 | returns two values, the lower and upper bounds on the standard deviation of the given {{items}}, assuming they are normally distributed; the bound is for confidence {{(1-alpha)}}. |
328 | |
329 | ==== Hypothesis testing |
330 | |
331 | These functions report on the significance of an observed sample against a given distribution. |
332 | |
333 | ===== (parametric) |
334 | |
335 | <procedure>(z-test x-bar n #:mu #:sigma #:tails)</procedure> |
336 | Given {{x-bar}} the sample mean, {{n}} the number in the sample, {{#:mu}} the distribution mean (defaults to 0), {{#:sigma}} the distribution standard deviation (defaults to 1), and {{#:tails}} the significance to report on: |
337 | |
338 | * {{':both}}, the two-sided probability of the difference between {{x-bar}} and {{#:mu}} (the default) |
339 | * {{':positive}}, the probability that observation is {{>= x-bar}} |
340 | * {{':negative}}, the probability that observation is {{<= x-bar}} |
341 | |
342 | e.g. given a distribution with mean 50 and standard deviation 10 |
343 | |
344 | ; probability that a single observation is <= 40 |
345 | > (z-test 40 1 #:mu 50 #:sigma 10 #:tails ':negative) |
346 | 0.158655 |
347 | ; probability that the mean of 10 observations is <= 40 |
348 | > (z-test 40 10 #:mu 50 #:sigma 10 #:tails ':negative) |
349 | 0.000783 |
350 | ; two-sided probability that the mean of 5 observations is at least as far from 50 as 40 is |
351 | > (z-test 40 5 #:mu 50 #:sigma 10) |
352 | 0.025347 |
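
The one-sided figures above can be reproduced with a short plain-Scheme sketch: compute z = (x-bar - mu) / (sigma / sqrt(n)) and look it up in the standard normal CDF. The CDF here uses the Abramowitz and Stegun 7.1.26 approximation to erf, which is an assumption made for this illustration and is not necessarily how the library computes {{phi}}; all names are hypothetical.

```scheme
;; erf via Abramowitz & Stegun 7.1.26 (absolute error around 1.5e-7).
(define (erf-approx x)
  (let* ((s (if (< x 0) -1 1))
         (ax (abs x))
         (u (/ 1 (+ 1 (* 0.3275911 ax))))
         (poly (* u (+ 0.254829592
                       (* u (+ -0.284496736
                               (* u (+ 1.421413741
                                       (* u (+ -1.453152027
                                               (* u 1.061405429))))))))))
         (tail (* poly (exp (- (* ax ax))))))
    (* s (- 1 tail))))

;; Standard normal CDF.
(define (phi-approx z)
  (* 0.5 (+ 1 (erf-approx (/ z (sqrt 2))))))

;; z statistic for a sample mean x-bar of n observations.
(define (z-statistic x-bar n mu sigma)
  (/ (- x-bar mu) (/ sigma (sqrt n))))

;; Lower-tail probability, corresponding to #:tails ':negative.
(define (z-test-negative x-bar n mu sigma)
  (phi-approx (z-statistic x-bar n mu sigma)))

(z-test-negative 40 1 50 10)  ; approximately 0.158655
(z-test-negative 40 10 50 10) ; approximately 0.000783
```

For the two-sided case, the lower-tail probability of a negative z is doubled, which gives about 0.0253 for the third example.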
353 | |
354 | <procedure>(z-test-on-sequence observations #:mu #:sigma #:tails)</procedure> |
355 | As for {{z-test}} except {{x-bar}} and {{n}} are computed from given {{observations}}. |
356 | |
357 | <procedure>(t-test-one-sample x-bar sd n mu #:tails)</procedure> |
358 | Given observed data with mean {{x-bar}}, standard deviation {{sd}} and number of observations {{n}} ({{n < 30}}), return the significance of the sample compared with the population mean {{mu}}. {{#:tails}} is one of: |
359 | |
360 | * {{':both}} two-sided (default) |
361 | * {{':positive}} one-sided, {{x-bar >= mu}} |
362 | * {{':negative}} one-sided, {{x-bar <= mu}} |
363 | |
364 | <procedure>(t-test-one-sample-on-sequence observations mu #:tails)</procedure> |
365 | As for {{t-test-one-sample}} except {{x-bar}}, {{sd}} and {{n}} are computed from given {{observations}}. |
366 | |
367 | <procedure>(t-test-paired t-bar sd n #:tails)</procedure> |
368 | Computes the significance of the differences between two sequences of data: the differences are given as their mean, {{t-bar}}, standard deviation, {{sd}}, and number of measurements, {{n}}. |
369 | |
370 | <procedure>(t-test-paired-on-sequences before after #:tails)</procedure> |
371 | Computes the significance of the difference between two sequences of data: one before an experimental change and one after. {{#:tails}} is as for {{t-significance}}. |
372 | |
373 | > (t-test-paired-on-sequences '(4 3 5) '(1 1 3)) |
374 | 0.0198039411803931 |
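
The statistic behind this result can be sketched in plain Scheme. This shows only the usual paired t statistic, the mean difference divided by its standard error; the p-value additionally needs the t distribution (see {{t-significance}}). {{paired-t-statistic}} is an illustrative name and not part of the library:

```scheme
;; t statistic for paired observations, with n - 1 degrees of freedom.
(define (paired-t-statistic before after)
  (let* ((diffs (map - before after))
         (n (length diffs))
         (d-bar (/ (apply + diffs) n))
         (var (/ (apply + (map (lambda (d) (* (- d d-bar) (- d d-bar)))
                               diffs))
                 (- n 1)))
         (se (sqrt (/ var n))))
    (/ d-bar se)))

(paired-t-statistic '(4 3 5) '(1 1 3)) ; approximately 7
```

With 2 degrees of freedom, a t value of 7 has a two-sided significance of 1 - 7/sqrt(51), about 0.0198, which agrees with the output above.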
375 | |
376 | <procedure>(t-test-two-sample mean-1 sd-1 n-1 mean-2 sd-2 n-2 #:variances-equal? #:variance-significance-cutoff #:tails)</procedure> |
377 | Computes the significance of the difference of two means given the sample standard deviations and sizes. |
378 | |
379 | <procedure>(t-test-two-sample-on-sequences sequence-1 sequence-2 #:tails)</procedure> |
380 | Significance of difference of two sequences of observations. |
381 | |
382 | <procedure>(f-test variance-1 n1 variance-2 n2 #:tails)</procedure> |
383 | Tests for the equality of two variances. |
384 | |
385 | <procedure>(chi-square-test-one-sample observed-variance sample-size test-variance #:tails)</procedure> |
386 | Tests for significance of difference between an observed and a test variance. |
387 | |
388 | <procedure>(binomial-test-one-sample p-hat n p #:tails #:exact?)</procedure> |
389 | Returns the significance of a one sample test with {{n}} observations, observed probability {{p-hat}} and expected probability {{p}}. |
390 | |
391 | <procedure>(binomial-test-two-sample p-hat-1 n-1 p-hat-2 n-2 #:tails #:exact?)</procedure> |
392 | Returns the significance of a two sample test. |
393 | |
394 | <procedure>(fisher-exact-test a b c d #:tails)</procedure> |
395 | Given a 2x2 contingency table, returns a p value using Fisher's exact test. {{a}} and {{b}} form the first row of the contingency table, {{c}} and {{d}} the second row. |
396 | |
397 | <procedure>(mcnemars-test a-discordant-count b-discordant-count #:exact?)</procedure> |
398 | For measuring effectiveness of, e.g., one treatment over another. {{a-discordant-count}} is the number of times when A worked, {{b-discordant-count}} the number of times B worked. |
399 | |
400 | <procedure>(poisson-test-one-sample observed mu #:tails #:approximate?)</procedure> |
401 | Computes significance of the number of observed events under a Poisson distribution against {{mu}} expected events. |
402 | |
403 | ===== (non parametric) |
404 | |
405 | <procedure>(sign-test plus-count minus-count #:exact? #:tails)</procedure> |
406 | |
407 | <procedure>(sign-test-on-sequence sequence-1 sequence-2 #:exact? #:tails)</procedure> |
408 | Takes two equal-sized sequences of observations, and reports whether the entries of one differ significantly from those of the other. |
409 | |
410 | <procedure>(wilcoxon-signed-rank-test differences #:tails)</procedure> |
411 | Given at least 16 differences, reports if the positive differences are significantly larger or smaller than the negative differences. |
412 | |
413 | <procedure>(wilcoxon-signed-rank-test-on-sequences sequence-1 sequence-2 #:tails)</procedure> |
414 | Given two sequences of at least 16 observations, computes {{wilcoxon-signed-rank-test}} on the differences. |
415 | |
416 | <procedure>(chi-square-test-rxc contingency-table)</procedure> |
417 | Given a contingency table (a SRFI-63 array), returns significance of relation between row and column variable. |
418 | |
419 | <procedure>(chi-square-test-for-trend row1-counts row2-counts)</procedure> |
420 | Returns p significance of trend, and prints a string to show if increasing or decreasing. |
421 | |
422 | ==== Sample size estimates |
423 | |
424 | <procedure>(t-test-one-sample-sse mean-1 mean-2 sigma-1 #:alpha #:1-beta #:tails)</procedure> |
425 | Returns the size of sample necessary to distinguish a normally distributed sample with mean {{mean-2}} from a population with mean {{mean-1}} and standard deviation {{sigma-1}}. The significance {{#:alpha}} (defaults to 0.05), power {{#:1-beta}} (0.95) and sides {{#:tails}} (':both) may be altered. |
426 | > (t-test-one-sample-sse 5.0 5.2 0.5) |
427 | 163 |
428 | |
429 | <procedure>(t-test-two-sample-sse mean-1 sigma-1 mean-2 sigma-2 #:alpha #:1-beta #:tails #:sample-ratio)</procedure> |
430 | Returns the size of sample necessary to distinguish a normally distributed sample N(mean-1, sigma-1) from a normally distributed sample N(mean-2, sigma-2). The significance {{#:alpha}} (defaults to 0.05), power {{#:1-beta}} (0.95), sides {{#:tails}} (':both) and sample-ratio {{#:sample-ratio}} (1) may be altered. |
431 | |
432 | <procedure>(t-test-paired-sse difference-mean difference-sigma #:alpha #:1-beta #:tails)</procedure> |
433 | Returns the size of sample to produce a given mean and standard deviation in the differences of two samples. |
434 | |
435 | <procedure>(binomial-test-one-sample-sse p-estimated p-null #:alpha #:1-beta #:tails)</procedure> |
436 | Returns the size of sample needed to test whether an observed probability is significantly different from a particular binomial null hypothesis with a significance alpha and a power 1-beta. |
437 | |
438 | <procedure>(binomial-test-two-sample-sse p-one p-two #:alpha #:1-beta #:tails #:sample-ratio)</procedure> |
439 | Returns the size of sample needed to test if given two binomial probabilities are significantly different. {{#:sample-ratio}} can be given if the two samples differ in size. |
440 | |
441 | <procedure>(binomial-test-paired-sse pd pa #:alpha #:1-beta #:tails)</procedure> |
442 | Sample size estimate for McNemar's discordant pairs test. |
443 | |
444 | <procedure>(correlation-sse rho #:alpha #:1-beta)</procedure> |
445 | Returns the size of sample necessary to find a correlation of value {{rho}} with significance {{#:alpha}} (defaults to 0.05) and power {{#:1-beta}} (defaults to 0.95). |
446 | > (correlation-sse 0.80 #:alpha 0.05 #:1-beta 0.9) |
447 | 11 |
448 | |
449 | ==== Correlation and regression |
450 | |
451 | <procedure>(linear-regression xs ys)</procedure> |
452 | |
453 | Given two lists holding the x and y coordinates of a set of points, first prints to |
454 | the terminal and then returns 5 '''values''' for the best fitting line |
455 | through the points: |
456 | |
457 | * the y-intercept |
458 | * the slope |
459 | * the correlation coefficient, r |
460 | * the square of the correlation coefficient, r^2 |
461 | * the significance of the difference of the slope from zero, p |
462 | |
463 | (This is also called the Pearson correlation; used when relation expected to be linear. Also see {{spearman-rank-correlation}}.) |
464 | |
465 | > (linear-regression '(1.0 2.0 3.0) '(0.1 0.3 0.8)) |
466 | Intercept = -0.3, slope = 0.35, r = 0.970725343394151, R^2 = 0.942307692307692, p = 0.154420958311267 |
467 | -0.3 |
468 | 0.35 |
469 | 0.970725343394151 |
470 | 0.942307692307692 |
471 | 0.154420958311267 |
472 | ; 5 values |
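
The first three values can be checked with an ordinary least-squares sketch in plain Scheme. {{fit-line}} is a hypothetical helper written for this illustration; it is not the library's implementation and does not compute the p value:

```scheme
;; Least-squares line through the points (x_i, y_i).
;; Returns a list: (intercept slope r).
(define (fit-line xs ys)
  (let* ((n (length xs))
         (x-bar (/ (apply + xs) n))
         (y-bar (/ (apply + ys) n))
         (sxy (apply + (map (lambda (x y) (* (- x x-bar) (- y y-bar)))
                            xs ys)))
         (sxx (apply + (map (lambda (x) (* (- x x-bar) (- x x-bar))) xs)))
         (syy (apply + (map (lambda (y) (* (- y y-bar) (- y y-bar))) ys)))
         (slope (/ sxy sxx))
         (intercept (- y-bar (* slope x-bar)))
         (r (/ sxy (sqrt (* sxx syy)))))
    (list intercept slope r)))

(fit-line '(1.0 2.0 3.0) '(0.1 0.3 0.8))
;; approximately (-0.3 0.35 0.9707)
```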
473 | |
474 | <procedure>(correlation-coefficient xs ys)</procedure> |
475 | As above, but only returns the value of ''r'': |
476 | |
477 | > (correlation-coefficient '(1.0 2.0 3.0) '(0.1 0.3 0.8)) |
478 | 0.970725343394151 |
479 | |
480 | <procedure>(correlation-test-two-sample r1 n1 r2 n2 #:tails)</procedure> |
481 | Returns the significance of the similarity between two correlations. {{#:tails}} determines how the comparison is made: {{':both}} measures the difference, {{':negative}} tests if {{r1 < r2}} and {{':positive}} tests if {{r1 > r2}}. |
482 | |
483 | <procedure>(correlation-test-two-sample-on-sequences points-1 points-2 #:tails)</procedure> |
484 | As above, but computes the correlations from given lists of points. |
485 | |
486 | <procedure>(spearman-rank-correlation xs ys)</procedure> |
487 | Returns two '''values''', the Spearman Rank measure of correlation between the given lists of point coordinates, and the p-significance of the correlation. (This correlation is used for non-linear relations; compare with {{linear-regression}}.) |
488 | |
489 | ==== Significance test functions |
490 | |
491 | <procedure>(t-significance t-value degrees-of-freedom #:tails)</procedure> |
492 | returns the probability of {{t-value}} for given {{degrees-of-freedom}}. The keyword {{#:tails}} modifies the calculation to be two-sided (the default) with {{':both}}, or one-sided, {{':positive}} or {{':negative}}. |
493 | |
494 | > (t-significance 0.2 5) |
495 | 0.849360513995829 |
496 | > (t-significance 0.2 5 #:tails ':positive) |
497 | 0.424680256997915 |
498 | > (t-significance 0.2 5 #:tails ':negative) |
499 | 0.575319743002086 |
500 | |
501 | <procedure>(f-significance f-value numerator-dof denominator-dof #:one-tailed?)</procedure> |
502 | returns the probability of {{f-value}} for given {{numerator-dof}} and {{denominator-dof}}. The boolean keyword {{#:one-tailed?}} indicates if calculation is two-sided (the default) or not. |
503 | |
504 | > (f-significance 1.5 8 2) |
505 | 0.920449812578091 |
506 | > (f-significance 1.5 8 2 #:one-tailed? #t) |
507 | 0.460224906289046 |
508 | |
509 | |
510 | === Authors |
511 | |
512 | [[/users/peter-lane|Peter Lane]] wrote the Scheme version of this library. The original Lisp version was written by [[http://compbio.ucdenver.edu/hunter/|Larry Hunter]]. |
513 | |
514 | === License |
515 | |
516 | GPL version 3.0. |
517 | |
518 | === Version History |
519 | |
520 | * 0.12: removed GSL dependency |
521 | * 0.11: refactoring correlation and regression interface to take two separate dataset arguments |
522 | * 0.9: ported to CHICKEN 5 |
523 | * 0.8: added cumsum and random-weighted-sample |
524 | * 0.5: fixed warning in compilation (thanks to Felix for pointing it out) |
525 | * 0.4: all functions should now be working |
526 | * 0.3: some error fixes and addition of tests for majority of functions |
527 | * 0.2: fixed some errors in keywords and find-critical-value |
528 | * 0.1: initial package |
