Still under testing!

## Introduction

This library is a port of Larry Hunter's Lisp statistics library to Chicken Scheme.

The library provides a number of formulae and methods taken from the book "Fundamentals of Biostatistics" by Bernard Rosner (5th edition).

### Statistical Distributions

To use this library, you need to understand the underlying statistics. In brief:

The Binomial distribution is used when counting discrete events in a series of independent trials, each of which has a probability `p` of producing a positive outcome. An example would be tossing a coin `n` times: the probability of a head is `p`, and the distribution gives the probability of each possible number of heads in the `n` trials. The binomial distribution is written B(n, p).

The Poisson distribution is used to count discrete events which occur independently at a known average rate. A typical example is the number of decays of a radioactive sample in a fixed time interval. A Poisson distribution is written Pois(mu).

The Normal distribution is used for real-valued measurements which cluster symmetrically around a specific mean, with a spread given by the variance. A typical example would be the distribution of people's heights. A normal distribution is written N(mean, variance).

## Provided Functions

### Utilities

*[procedure]*

`(average-rank value sorted-values)`

returns the average position of the given value in the list of sorted values; ranks are counted from 1.

    > (average-rank 2 '(1 2 2 3 4))
    5/2
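As a rough illustration of what "average rank" means here, the following self-contained sketch averages the 1-based positions at which a value occurs in a sorted list. The name `sketch-average-rank` and the choice to return `#f` for a missing value are assumptions made for this example; it is not the library's implementation.

```scheme
;; Hypothetical helper: average the 1-based positions of `value`
;; within `sorted-values`. Returns #f if the value never occurs
;; (an assumption of this sketch).
(define (sketch-average-rank value sorted-values)
  (let loop ((rest sorted-values) (pos 1) (sum 0) (count 0))
    (cond ((null? rest)
           (if (= count 0) #f (/ sum count)))
          ((= (car rest) value)
           (loop (cdr rest) (+ pos 1) (+ sum pos) (+ count 1)))
          (else
           (loop (cdr rest) (+ pos 1) sum count)))))

;; The value 2 sits at positions 2 and 3, so its average rank is 2.5
;; (shown as 5/2 when exact rationals are available).
(sketch-average-rank 2 '(1 2 2 3 4))
```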

*[procedure]*

`(beta-incomplete x a b)`

*[procedure]*

`(bin-and-count items n)`

Divides the range of the list of `items` into `n` bins, and returns a vector of the number of items which fall into each bin.

    > (bin-and-count '(1 1 2 3 3 4 5) 5)
    #(2 1 2 1 1)
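One way to picture the binning is sketched below: the range from the smallest to the largest item is cut into `n` equal-width bins, and the largest item is placed in the last bin. The helper name `sketch-bin-and-count` is hypothetical, and the sketch assumes the items are not all equal (otherwise the bin width would be zero); the library may handle edge cases differently.

```scheme
;; Hypothetical helper: count how many items fall into each of
;; n equal-width bins spanning (min items) .. (max items).
;; Assumes at least two distinct values in `items`.
(define (sketch-bin-and-count items n)
  (let* ((low (apply min items))
         (high (apply max items))
         (width (/ (- high low) n))
         (bins (make-vector n 0)))
    (for-each
     (lambda (item)
       (let* ((raw (inexact->exact (floor (/ (- item low) width))))
              ;; clamp so the maximum value lands in the last bin
              (index (min raw (- n 1))))
         (vector-set! bins index (+ 1 (vector-ref bins index)))))
     items)
    bins))

;; Reproduces the counts from the example above: 2, 1, 2, 1, 1
(sketch-bin-and-count '(1 1 2 3 3 4 5) 5)
```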

*[procedure]*

`(combinations n k)`

returns the number of ways to select `k` items from `n`, where the order does not matter.
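The count is the binomial coefficient n! / (k! (n - k)!). A minimal sketch, using a hypothetical helper `sketch-combinations` and a multiplicative form that avoids computing large factorials, might look like this:

```scheme
;; Hypothetical helper: number of k-element subsets of n items.
;; After step i the running result equals C(n - k + i, i), so the
;; integer division by i is always exact.
(define (sketch-combinations n k)
  (let loop ((i 1) (result 1))
    (if (> i k)
        result
        (loop (+ i 1)
              (quotient (* result (+ (- n k) i)) i)))))

;; 10 ways to choose 2 items from 5
(sketch-combinations 5 2)
```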

*[procedure]*

`(factorial n)`

returns the factorial of `n`.

*[procedure]*

`(find-critical-value p-function p-value)`

*[procedure]*

`(fisher-z-transform r)`

returns the Fisher z-transformation of a correlation coefficient `r`; the transformed value is approximately normally distributed.
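The standard definition of the Fisher transformation is z = 0.5 * ln((1 + r) / (1 - r)). The sketch below evaluates that formula directly; `sketch-fisher-z` is a hypothetical name, and the sketch assumes `r` lies strictly between -1 and 1.

```scheme
;; Hypothetical helper: Fisher z-transformation of a correlation
;; coefficient r, assuming -1 < r < 1.
(define (sketch-fisher-z r)
  (* 0.5 (log (/ (+ 1 r) (- 1 r)))))

;; For r = 0.5 this is 0.5 * ln(3), roughly 0.549
(sketch-fisher-z 0.5)
```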

*[procedure]*

`(gamma-incomplete a x)`

*[procedure]*

`(gamma-ln x)`

*[procedure]*

`(permutations n k)`

returns the number of ways to select `k` items from `n`, where the order does matter.
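When order matters, the count is n! / (n - k)!, which is the product n * (n - 1) * ... * (n - k + 1). A small sketch with a hypothetical helper name:

```scheme
;; Hypothetical helper: number of ordered selections of k items
;; from n, computed as the product of k descending factors.
(define (sketch-permutations n k)
  (let loop ((i 0) (result 1))
    (if (= i k)
        result
        (loop (+ i 1) (* result (- n i))))))

;; 5 * 4 = 20 ordered pairs from 5 items
(sketch-permutations 5 2)
```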

*[procedure]*

`(random-normal mean sd)`

returns a random number drawn from a normal distribution with the specified mean and standard deviation.

*[procedure]*

`(random-pick items)`

returns a random item from the given list of items.

*[procedure]*

`(random-sample n items)`

returns a random sample of size `n` from the list of items, drawn without replacement.

*[procedure]*

`(sign n)`

returns 0, 1 or -1 according to whether `n` is zero, positive or negative.

*[procedure]*

`(square n)`

### Descriptive statistics

These functions provide information on a given list of numbers, the `items`. Note, the list does not have to be sorted.

*[procedure]*

`(mean items)`

returns the arithmetic mean of the `items` (the sum of the numbers divided by the number of numbers).

(mean '(1 2 3 4 5)) => 3

*[procedure]*

`(median items)`

returns the value which separates the upper and lower halves of the list of numbers.

(median '(1 2 3 4)) => 5/2

*[procedure]*

`(mode items)`

returns two **values**. The first is a list of the *modes* and the second is the frequency. (A mode of a list of numbers is the most frequently occurring value.)

    > (mode '(1 2 3 4))
    (1 2 3 4)
    1
    > (mode '(1 2 2 3 4))
    (2)
    2
    > (mode '(1 2 2 3 3 4))
    (2 3)
    2
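The sketch below shows one way such a result could be computed and, more usefully, how a caller can receive two values with `call-with-values`. The helper `sketch-mode` is hypothetical and assumes a non-empty list; the order in which tied modes are listed is a choice of this sketch.

```scheme
;; Hypothetical helper: returns two values, the list of most
;; frequent items and their frequency. Assumes `items` is non-empty.
(define (sketch-mode items)
  (let loop ((rest items) (counts '()))   ; counts: alist of (value . count)
    (if (null? rest)
        (let* ((best (apply max (map cdr counts)))
               (modes (let keep ((cs counts) (acc '()))
                        (cond ((null? cs) acc)
                              ((= (cdr (car cs)) best)
                               (keep (cdr cs) (cons (car (car cs)) acc)))
                              (else (keep (cdr cs) acc))))))
          (values modes best))
        (let ((entry (assv (car rest) counts)))
          (if entry
              (begin (set-cdr! entry (+ 1 (cdr entry)))
                     (loop (cdr rest) counts))
              (loop (cdr rest) (cons (cons (car rest) 1) counts)))))))

;; Receiving both values: modes is (2 3), frequency is 2
(call-with-values
  (lambda () (sketch-mode '(1 2 2 3 3 4)))
  (lambda (modes frequency)
    (list modes frequency)))
```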

*[procedure]*

`(geometric-mean items)`

returns the geometric mean of the `items` (the result of multiplying the items together and then taking the nth root, where n is the number of items).

(geometric-mean '(1 2 3 4 5)) => 2.60517108469735
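Multiplying many items together can overflow or lose precision, so a common alternative is to average the logarithms and exponentiate. The following sketch takes that route; `sketch-geometric-mean` is a hypothetical name and the sketch assumes all items are positive. Whether the library computes it this way is not stated here.

```scheme
;; Hypothetical helper: geometric mean via the mean of logarithms.
;; Assumes every item is strictly positive.
(define (sketch-geometric-mean items)
  (exp (/ (apply + (map log items))
          (length items))))

;; Roughly 2.605 for the list used in the example above
(sketch-geometric-mean '(1 2 3 4 5))
```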

*[procedure]*

`(range items)`

returns the difference between the biggest and the smallest value from the list of `items`.

(range '(5 1 2 3 4)) => 4

*[procedure]*

`(percentile items percent)`

returns the value at the given `percent` position in the `items` once they are sorted into order; the returned value may be an item in the list, or the average of two adjacent items.

    (percentile '(1 2 3 4) 50) => 5/2
    (percentile '(1 2 3 4) 67) => 3

*[procedure]*

`(variance items)`

*[procedure]*

`(standard-deviation items)`

*[procedure]*

`(coefficient-of-variation items)`

returns 100 * (std-dev / mean) of the `items`.

(coefficient-of-variation '(1 2 3 4)) => 51.6397779494322

*[procedure]*

`(standard-error-of-the-mean items)`

returns std-dev / sqrt(length items).

(standard-error-of-the-mean '(1 2 3 4)) => 0.645497224367903

*[procedure]*

`(mean-sd-n items)`

returns three **values**, one for the mean, one for the standard deviation, and one for the length of the list.

    > (mean-sd-n '(1 2 3 4))
    5/2
    1.29099444873581
    4
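The example values in this section are consistent with a sample standard deviation, that is, one that divides the sum of squared deviations by n - 1. The sketch below recomputes them under that assumption; all `sketch-` names are hypothetical helpers rather than the library's own procedures.

```scheme
;; Hypothetical helpers, assuming the sample (n - 1) form of variance.
(define (sketch-mean items)
  (/ (apply + items) (length items)))

(define (sketch-sample-variance items)
  (let ((m (sketch-mean items)))
    (/ (apply + (map (lambda (x) (* (- x m) (- x m))) items))
       (- (length items) 1))))

(define (sketch-sd items)
  (sqrt (sketch-sample-variance items)))

;; 100 * (std-dev / mean)
(define (sketch-cv items)
  (* 100 (/ (sketch-sd items) (sketch-mean items))))

;; std-dev / sqrt(length items)
(define (sketch-sem items)
  (/ (sketch-sd items) (sqrt (length items))))

(sketch-sd '(1 2 3 4))    ; about 1.291
(sketch-cv '(1 2 3 4))    ; about 51.64
(sketch-sem '(1 2 3 4))   ; about 0.6455
```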

### Distributional functions

*[procedure]*

`(binomial-probability n k p)`

returns the probability that the number of positive outcomes for a binomial distribution B(n, p) is exactly `k`.

    > (do-ec (: i 0 11) (format #t "i = ~d P = ~f~&" i (binomial-probability 10 i 0.5)))
    i = 0 P = 0.0009765625
    i = 1 P = 0.009765625
    i = 2 P = 0.0439453125
    i = 3 P = 0.1171875
    i = 4 P = 0.205078125
    i = 5 P = 0.24609375
    i = 6 P = 0.205078125
    i = 7 P = 0.1171875
    i = 8 P = 0.0439453125
    i = 9 P = 0.009765625
    i = 10 P = 0.0009765625
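The values above follow the usual binomial formula P(X = k) = C(n, k) * p^k * (1 - p)^(n - k). A self-contained sketch of that formula, with hypothetical helper names, is:

```scheme
;; Hypothetical helper: binomial coefficient C(n, k).
(define (sketch-choose n k)
  (let loop ((i 1) (result 1))
    (if (> i k)
        result
        (loop (+ i 1)
              (quotient (* result (+ (- n k) i)) i)))))

;; Hypothetical helper: P(X = k) for X ~ B(n, p).
(define (sketch-binomial-probability n k p)
  (* (sketch-choose n k)
     (expt p k)
     (expt (- 1 p) (- n k))))

;; 252 / 1024 = 0.24609375, matching the i = 5 row above
(sketch-binomial-probability 10 5 0.5)
```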

*[procedure]*

`(binomial-cumulative-probability n k p)`

returns the probability that fewer than `k` positive outcomes occur for a binomial distribution B(n, p).

    > (do-ec (: i 0 11) (format #t "i = ~d P = ~f~&" i (binomial-cumulative-probability 10 i 0.5)))
    i = 0 P = 0.0
    i = 1 P = 0.0009765625
    i = 2 P = 0.0107421875
    i = 3 P = 0.0546875
    i = 4 P = 0.171875
    i = 5 P = 0.376953125
    i = 6 P = 0.623046875
    i = 7 P = 0.828125
    i = 8 P = 0.9453125
    i = 9 P = 0.9892578125
    i = 10 P = 0.9990234375

*[procedure]*

`(binomial-ge-probability n k p)`

returns the probability of `k` or more positive outcomes for a binomial distribution B(n, p).

*[procedure]*

`(binomial-le-probability n k p)`

returns the probability of `k` or fewer positive outcomes for a binomial distribution B(n, p).
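The three cumulative forms are related: the cumulative probability is P(X < k), "k or more" is its complement, and "k or fewer" is the cumulative probability at k + 1. The sketch below expresses those relations for any discrete distribution given as a procedure from a count to its probability. All names are hypothetical, and the tiny table for B(2, 0.5) is only there to make the example self-contained.

```scheme
;; pmf: a procedure mapping a count j to P(X = j).

;; P(X < k): sum the pmf over 0 .. k-1
(define (sketch-cumulative pmf k)
  (let loop ((j 0) (total 0))
    (if (>= j k)
        total
        (loop (+ j 1) (+ total (pmf j))))))

;; P(X >= k) is the complement of P(X < k)
(define (sketch-ge pmf k)
  (- 1 (sketch-cumulative pmf k)))

;; P(X <= k) is P(X < k + 1)
(define (sketch-le pmf k)
  (sketch-cumulative pmf (+ k 1)))

;; Illustrative pmf for two tosses of a fair coin, B(2, 0.5)
(define (two-coin-pmf j)
  (case j
    ((0) 0.25)
    ((1) 0.5)
    ((2) 0.25)
    (else 0)))

(sketch-cumulative two-coin-pmf 1)   ; 0.25
(sketch-ge two-coin-pmf 1)           ; 0.75
(sketch-le two-coin-pmf 1)           ; 0.75
```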

*[procedure]*

`(poisson-probability mu k)`

returns the probability of `k` events occurring when the average is `mu`.

    > (do-ec (: i 0 20) (format #t "P(X=~2d) = ~,4f~&" i (poisson-probability 10 i)))
    P(X= 0) = 0.0000
    P(X= 1) = 0.0005
    P(X= 2) = 0.0023
    P(X= 3) = 0.0076
    P(X= 4) = 0.0189
    P(X= 5) = 0.0378
    P(X= 6) = 0.0631
    P(X= 7) = 0.0901
    P(X= 8) = 0.1126
    P(X= 9) = 0.1251
    P(X=10) = 0.1251
    P(X=11) = 0.1137
    P(X=12) = 0.0948
    P(X=13) = 0.0729
    P(X=14) = 0.0521
    P(X=15) = 0.0347
    P(X=16) = 0.0217
    P(X=17) = 0.0128
    P(X=18) = 0.0071
    P(X=19) = 0.0037
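These values agree with the Poisson formula P(X = k) = e^(-mu) * mu^k / k!. The sketch below builds the term up one factor at a time, which avoids forming large powers and factorials separately; `sketch-poisson-probability` is a hypothetical name.

```scheme
;; Hypothetical helper: P(X = k) for X ~ Pois(mu).
;; Starts from e^(-mu) and multiplies by mu / i for i = 1 .. k.
(define (sketch-poisson-probability mu k)
  (let loop ((i 1) (term (exp (- mu))))
    (if (> i k)
        term
        (loop (+ i 1) (/ (* term mu) i)))))

;; About 0.1251, matching the P(X=10) row above
(sketch-poisson-probability 10 10)
```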

*[procedure]*

`(poisson-cumulative-probability mu k)`

returns the probability of fewer than `k` events occurring when the average is `mu`.

    > (do-ec (: i 0 20) (format #t "P(X=~2d) = ~,4f~&" i (poisson-cumulative-probability 10 i)))
    P(X= 0) = 0.0000
    P(X= 1) = 0.0000
    P(X= 2) = 0.0005
    P(X= 3) = 0.0028
    P(X= 4) = 0.0103
    P(X= 5) = 0.0293
    P(X= 6) = 0.0671
    P(X= 7) = 0.1301
    P(X= 8) = 0.2202
    P(X= 9) = 0.3328
    P(X=10) = 0.4579
    P(X=11) = 0.5830
    P(X=12) = 0.6968
    P(X=13) = 0.7916
    P(X=14) = 0.8645
    P(X=15) = 0.9165
    P(X=16) = 0.9513
    P(X=17) = 0.9730
    P(X=18) = 0.9857
    P(X=19) = 0.9928

*[procedure]*

`(poisson-ge-probability mu k)`

returns the probability of `k` or more events occurring when the average is `mu`.

*[procedure]*

`(normal-pdf x mean variance)`

returns the probability density at `x` for a normal distribution with the stated mean and variance.

    > (do-ec (: i 0 11) (format #t "~3d ~,4f~&" i (normal-pdf i 5 4)))
    0 0.0088
    1 0.0270
    2 0.0648
    3 0.1210
    4 0.1760
    5 0.1995
    6 0.1760
    7 0.1210
    8 0.0648
    9 0.0270
    10 0.0088
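The output above matches the normal density exp(-(x - mean)^2 / (2 * variance)) / sqrt(2 * pi * variance), with the third argument treated as a variance (4 here, so a standard deviation of 2). A self-contained sketch of that formula, using hypothetical names, is:

```scheme
;; Hypothetical constant and helper for this sketch.
(define sketch-pi 3.141592653589793)

;; Density of N(mean, variance) at x.
(define (sketch-normal-pdf x mean variance)
  (/ (exp (- (/ (* (- x mean) (- x mean))
                (* 2 variance))))
     (sqrt (* 2 sketch-pi variance))))

;; About 0.1995 at the mean, matching the row for 5 above
(sketch-normal-pdf 5 5 4)
```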

*[procedure]*

`(convert-to-standard-normal x mean variance)`

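returns the value of `x` rescaled from the given normal distribution to the standard normal N(0, 1). Note that the example below evaluates to (5 - 6) / 2, which suggests that the third argument is applied directly as the divisor.

    > (convert-to-standard-normal 5 6 2)
    -1/2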

*[procedure]*

`(phi x)`

returns the value at `x` of the cumulative distribution function (CDF) of the standard normal distribution.

    > (do-ec (: x -2 2 0.4) (format #t "~4,1f ~,4f~&" x (phi x)))
    -2.0 0.0228
    -1.6 0.0548
    -1.2 0.1151
    -0.8 0.2119
    -0.4 0.3446
    0.0 0.5000
    0.4 0.6554
    0.8 0.7881
    1.2 0.8849
    1.6 0.9452
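To see where these numbers come from, the standard normal CDF can be approximated as 0.5 plus the integral of the standard normal density from 0 to `x`. The sketch below uses Simpson's rule with a fixed number of intervals; this is an illustrative technique chosen for the example, and the names and the interval count are assumptions rather than anything taken from the library.

```scheme
(define sketch-pi 3.141592653589793)

;; Hypothetical helper: approximate the standard normal CDF at x
;; by Simpson's rule over [0, x] with n = 200 intervals (n is even).
(define (sketch-phi x)
  (let* ((n 200)
         (h (/ x n)))
    ;; standard normal density
    (define (f u)
      (/ (exp (- (/ (* u u) 2)))
         (sqrt (* 2 sketch-pi))))
    (let loop ((i 1) (sum (+ (f 0) (f x))))
      (if (= i n)
          (+ 0.5 (* (/ h 3) sum))
          ;; interior points alternate weights 4, 2, 4, ...
          (loop (+ i 1)
                (+ sum (* (if (odd? i) 4 2) (f (* i h)))))))))

(sketch-phi 0.4)    ; about 0.6554
(sketch-phi -2.0)   ; about 0.0228
```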

- z
- t-distribution
- chi-square
- chi-square-cdf

### Confidence intervals

These functions report bounds for an observed property of a distribution. The confidence level is `(1-alpha)`, so the bounds become tighter as `alpha` increases from 0.0 to 1.0.

*[procedure]*

`(binomial-probability-ci n p alpha)`

returns two values, the lower and upper bounds on an observed probability `p` from `n` trials with confidence `(1-alpha)`.

    > (binomial-probability-ci 10 0.8 0.9)
    0.724273681640625
    0.851547241210938
    ; 2 values

*[procedure]*

`(poisson-mu-ci k alpha)`

returns two values, the lower and upper bounds on the Poisson parameter if `k` events are observed; the bound is for confidence `(1-alpha)`.

    > (poisson-mu-ci 10 0.9)
    8.305419921875
    10.0635986328125
    ; 2 values

*[procedure]*

`(normal-mean-ci mean standard-deviation k alpha)`

returns two values, the lower and upper bounds on the mean of the normal distribution if `k` events are observed; the bound is for confidence `(1-alpha)`.

    > (normal-mean-ci 0.5 0.1 10 0.8)
    0.472063716520217
    0.527936283479783
    ; 2 values

*[procedure]*

`(normal-mean-ci-on-sequence items alpha)`

returns two values, the lower and upper bounds on the mean of the given `items`, assuming they are normally distributed; the bound is for confidence `(1-alpha)`.

    > (normal-mean-ci-on-sequence '(1 2 3 4 5) 0.9)
    2.40860081649174
    3.59139918350826
    ; 2 values

*[procedure]*

`(normal-variance-ci standard-deviation k alpha)`

returns two values, the lower and upper bounds on the variance of the normal distribution if `k` events are observed; the bound is for confidence `(1-alpha)`.

*[procedure]*

`(normal-variance-ci-on-sequence items alpha)`

returns two values, the lower and upper bounds on the variance of the given `items`, assuming they are normally distributed; the bound is for confidence `(1-alpha)`.

*[procedure]*

`(normal-sd-ci standard-deviation k alpha)`

returns two values, the lower and upper bounds on the standard deviation of the normal distribution if `k` events are observed; the bound is for confidence `(1-alpha)`.

*[procedure]*

`(normal-sd-ci-on-sequence items alpha)`

returns two values, the lower and upper bounds on the standard deviation of the given `items`, assuming they are normally distributed; the bound is for confidence `(1-alpha)`.

### Hypothesis testing

#### (parametric)

- z-test
- z-test-on-sequence
- t-test-one-sample
- t-test-one-sample-on-sequence
- t-test-paired
- t-test-paired-on-sequences
- t-test-two-sample
- t-test-two-sample-on-sequences
- f-test
- chi-square-test-one-sample
- binomial-test-one-sample
- binomial-test-two-sample
- fisher-exact-test
- mcnemars-test
- poisson-test-one-sample

#### (non parametric)

- sign-test
- sign-test-on-sequence
- wilcoxon-signed-rank-test
- wilcoxon-signed-rank-test-on-sequences
- chi-square-test-rxc
- chi-square-test-for-trend

### Sample size estimates

- t-test-one-sample-sse
- t-test-two-sample-sse
- t-test-paired-sse
- binomial-test-one-sample-sse
- binomial-test-two-sample-sse
- binomial-test-paired-sse
- correlation-sse

### Correlation and regression

- linear-regression
- correlation-coefficient
- correlation-test-two-sample
- correlation-test-two-sample-on-sequences
- spearman-rank-correlation

### Significance test functions

- t-significance
- f-significance

## Authors

Peter Lane wrote the Scheme version of this library. The original Lisp version was written by Larry Hunter.

## License

GPL version 3.0.

## Requirements

Needs srfi-1, srfi-25, srfi-69, vector-lib, numbers, extras, foreign, format

Uses the GNU Scientific Library for basic numeric processing, so requires libgsl, libgslcblas and the development files for libgsl.

## Version History

trunk, for testing