Statistics

Still under testing! (But the documented procedures below should all work.)

This library is a port of Larry Hunter's Lisp statistics library to Chicken Scheme.

The library provides a number of formulae and methods taken from the book "Fundamentals of Biostatistics" by Bernard Rosner (5th edition).

Statistical Distributions

To use this library, you need to understand the underlying statistics. In brief:

The Binomial distribution is used when counting discrete events in a series of independent trials, each of which has a probability p of producing a positive outcome. An example would be tossing a coin n times: the probability of a head on each toss is p, and the distribution gives the probability of each possible number of heads in the n trials (the expected number is n * p). The binomial distribution is written B(n, p).

The Poisson distribution is used to count discrete events which occur independently at a known average rate. A typical example is the number of decays of a radioactive sample in a fixed period of time. The Poisson distribution is written Pois(mu), where mu is the average number of events.

The Normal distribution is used for real-valued measurements which cluster symmetrically around a mean, with a spread given by the variance. A typical example would be the distribution of people's heights. The normal distribution is written N(mean, variance).

Provided Functions

Utilities

[procedure] (average-rank value sorted-values)

returns the average position of the given value in the list of sorted values, counting positions from 1. If the value occurs more than once, the positions of all its occurrences are averaged.

> (average-rank 2 '(1 2 2 3 4))
5/2
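
The averaging of tied positions can be sketched in plain Scheme as follows. This is only an illustration: the name sketch-average-rank is hypothetical, returning #f for a missing value is an assumption, and the library's own definition may differ.

```scheme
;; Hypothetical sketch, not the library's implementation.
;; Walks the sorted list, summing the 1-based positions at which
;; `value` occurs, then divides by the number of occurrences.
(define (sketch-average-rank value sorted-values)
  (let loop ((rest sorted-values) (pos 1) (sum 0) (count 0))
    (cond ((null? rest)
           ;; Assumption: return #f when the value is not present.
           (if (= count 0) #f (/ sum count)))
          ((= (car rest) value)
           (loop (cdr rest) (+ pos 1) (+ sum pos) (+ count 1)))
          (else
           (loop (cdr rest) (+ pos 1) sum count)))))

;; The value 2 sits at positions 2 and 3, so the average rank is (2 + 3) / 2.
(display (sketch-average-rank 2 '(1 2 2 3 4)))
(newline)
```

For the list used in the example above, the sketch averages positions 2 and 3, which agrees with the documented result of 5/2.
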
[procedure] (beta-incomplete x a b)
[procedure] (bin-and-count items n)

Divides the range of the list of items into n bins, and returns a vector of the number of items which fall into each bin.

> (bin-and-count '(1 1 2 3 3 4 5) 5)
#(2 1 2 1 1)
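
One plausible way to arrive at these counts is sketched below. It assumes equal-width bins spanning the minimum to the maximum item, with the maximum placed in the last bin; the name sketch-bin-and-count is hypothetical and the library may treat bin edges differently.

```scheme
;; Hypothetical sketch, not the library's implementation.
;; Assumes equal-width bins over [min, max]; the maximum item is
;; clamped into the last bin.
(define (sketch-bin-and-count items n)
  (let* ((lo (apply min items))
         (hi (apply max items))
         (width (/ (- hi lo) n))
         (counts (make-vector n 0)))
    (for-each
     (lambda (x)
       (let* ((raw (if (= width 0)
                       0                       ; all items identical
                       (floor (/ (- x lo) width))))
              (idx (min (- n 1) (inexact->exact raw))))
         (vector-set! counts idx (+ 1 (vector-ref counts idx)))))
     items)
    counts))

(display (sketch-bin-and-count '(1 1 2 3 3 4 5) 5))
(newline)
```

With the items from the example above, the bin width is 4/5, and the sketch reproduces the documented vector.
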
[procedure] (combinations n k)

returns the number of ways to select k items from n, where the order does not matter.

[procedure] (factorial n)

returns the factorial of n.

[procedure] (find-critical-value p-function p-value)
[procedure] (fisher-z-transform r)

returns the transformation of a correlation coefficient r into an approximately normal distribution.
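
The standard Fisher transformation is z = 1/2 * ln((1 + r) / (1 - r)). A minimal sketch of that formula, using the hypothetical name sketch-fisher-z and assuming -1 < r < 1 (the library's own code is not shown here and may differ in detail):

```scheme
;; Hypothetical sketch of the textbook Fisher z-transformation.
;; Assumes -1 < r < 1; at r = 1 or r = -1 the result is undefined.
(define (sketch-fisher-z r)
  (* 0.5 (log (/ (+ 1 r) (- 1 r)))))

(display (sketch-fisher-z 0.5))   ; 0.5 * ln(3), roughly 0.549
(newline)
```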

[procedure] (gamma-incomplete a x)
[procedure] (gamma-ln x)
[procedure] (permutations n k)

returns the number of ways to select k items from n, where the order does matter.
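
The relationship between factorial, permutations and combinations can be sketched in plain Scheme. The sketch-* names are hypothetical and the code assumes 0 <= k <= n; it illustrates the definitions rather than the library's implementation.

```scheme
;; Hypothetical sketches; assume non-negative integers with k <= n.

;; n! computed with an accumulator.
(define (sketch-factorial n)
  (let loop ((i n) (acc 1))
    (if (<= i 1)
        acc
        (loop (- i 1) (* acc i)))))

;; Ordered selections of k items from n: n * (n-1) * ... * (n-k+1).
(define (sketch-permutations n k)
  (let loop ((i 0) (acc 1))
    (if (= i k)
        acc
        (loop (+ i 1) (* acc (- n i))))))

;; Unordered selections: every set of k items appears k! times
;; among the ordered selections.
(define (sketch-combinations n k)
  (quotient (sketch-permutations n k) (sketch-factorial k)))

(display (sketch-permutations 5 2)) (newline)   ; 20
(display (sketch-combinations 5 2)) (newline)   ; 10
```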

[procedure] (random-normal mean sd)

returns a random number distributed with specified mean and standard deviation.

[procedure] (random-pick items)

returns a random item from the given list of items.

[procedure] (random-sample n items)

returns a random sample of size n from the list of items, chosen without replacement.

[procedure] (sign n)

returns 0, 1 or -1 according to whether n is zero, positive or negative.

[procedure] (square n)

returns the square of n.

Descriptive statistics

These functions provide information on a given list of numbers, the items. Note that the list does not have to be sorted.

[procedure] (mean items)

returns the arithmetic mean of the items (the sum of the numbers divided by the number of numbers).

(mean '(1 2 3 4 5)) => 3
[procedure] (median items)

returns the value which separates the upper and lower halves of the list of numbers.

(median '(1 2 3 4)) => 5/2
[procedure] (mode items)

returns two values. The first is a list of the modes and the second is the frequency. (A mode of a list of numbers is the most frequently occurring value.)

> (mode '(1 2 3 4))
(1 2 3 4)
1
> (mode '(1 2 2 3 4))
(2)
2
> (mode '(1 2 2 3 3 4))
(2 3)
2
[procedure] (geometric-mean items)

returns the geometric mean of the items (the result of multiplying the items together and then taking the nth root, where n is the number of items).

(geometric-mean '(1 2 3 4 5)) => 2.60517108469735
[procedure] (range items)

returns the difference between the biggest and the smallest value from the list of items.

(range '(5 1 2 3 4)) => 4
[procedure] (percentile items percent)

returns the value found at the given percent of the way through the items once they are sorted into order; the returned value may be an item in the list, or the average of two adjacent items.

(percentile '(1 2 3 4) 50) => 5/2
(percentile '(1 2 3 4) 67) => 3
[procedure] (variance items)
[procedure] (standard-deviation items)
[procedure] (coefficient-of-variation items)

returns 100 * (std-dev / mean) of the items.

(coefficient-of-variation '(1 2 3 4)) => 51.6397779494322
[procedure] (standard-error-of-the-mean items)

returns std-dev / sqrt(length items).

(standard-error-of-the-mean '(1 2 3 4)) => 0.645497224367903
[procedure] (mean-sd-n items)

returns three values, one for the mean, one for the standard deviation, and one for the length of the list.

> (mean-sd-n '(1 2 3 4))
5/2
1.29099444873581
4
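
The example outputs above are consistent with a sample standard deviation, that is, a variance that divides by n - 1 rather than n: for '(1 2 3 4) the squared deviations sum to 5, and sqrt(5/3) is about 1.291. The sketch below reproduces the documented figures under that assumption. The sketch-* names are hypothetical and the library's own definitions may differ.

```scheme
;; Hypothetical sketches; assume a list of at least two numbers.

(define (sketch-mean items)
  (/ (apply + items) (length items)))

;; nth root of the product of the items.
(define (sketch-geometric-mean items)
  (expt (apply * items) (/ 1.0 (length items))))

;; Sample variance: squared deviations from the mean, divided by n - 1.
(define (sketch-variance items)
  (let ((m (sketch-mean items)))
    (/ (apply + (map (lambda (x) (let ((d (- x m))) (* d d))) items))
       (- (length items) 1))))

(define (sketch-sd items)
  (sqrt (sketch-variance items)))

;; 100 * (std-dev / mean)
(define (sketch-cv items)
  (* 100 (/ (sketch-sd items) (sketch-mean items))))

;; std-dev / sqrt(length items)
(define (sketch-sem items)
  (/ (sketch-sd items) (sqrt (length items))))

(display (sketch-sd '(1 2 3 4)))  (newline)   ; about 1.291
(display (sketch-cv '(1 2 3 4)))  (newline)   ; about 51.64
(display (sketch-sem '(1 2 3 4))) (newline)   ; about 0.6455
```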

Distributional functions

[procedure] (binomial-probability n k p)

returns the probability that the number of positive outcomes for a binomial distribution B(n, p) is k.

> (do-ec (: i 0 11) 
         (format #t "i = ~d P = ~f~&" i (binomial-probability 10 i 0.5)))
i = 0 P = 0.0009765625
i = 1 P = 0.009765625
i = 2 P = 0.0439453125
i = 3 P = 0.1171875
i = 4 P = 0.205078125
i = 5 P = 0.24609375
i = 6 P = 0.205078125
i = 7 P = 0.1171875
i = 8 P = 0.0439453125
i = 9 P = 0.009765625
i = 10 P = 0.0009765625
[procedure] (binomial-cumulative-probability n k p)

returns the probability that fewer than k positive outcomes occur for a binomial distribution B(n, p).

> (do-ec (: i 0 11) 
         (format #t "i = ~d P = ~f~&" i (binomial-cumulative-probability 10 i 0.5)))
i = 0 P = 0.0
i = 1 P = 0.0009765625
i = 2 P = 0.0107421875
i = 3 P = 0.0546875
i = 4 P = 0.171875
i = 5 P = 0.376953125
i = 6 P = 0.623046875
i = 7 P = 0.828125
i = 8 P = 0.9453125
i = 9 P = 0.9892578125
i = 10 P = 0.9990234375
[procedure] (binomial-ge-probability n k p)

returns the probability of k or more positive outcomes for a binomial distribution B(n, p).

[procedure] (binomial-le-probability n k p)

returns the probability of k or fewer positive outcomes for a binomial distribution B(n, p).
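
The four binomial procedures above differ only in which outcomes they sum over. A minimal sketch of the textbook formula P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) and the three sums follows; the sketch-* names are hypothetical, 0 <= k <= n is assumed, and the library's own implementation may be organised differently.

```scheme
;; Hypothetical sketches; assume integers 0 <= k <= n and 0 <= p <= 1.

;; C(n, k), built up so that every intermediate value is an integer.
(define (sketch-choose n k)
  (let loop ((i 1) (acc 1))
    (if (> i k)
        acc
        (loop (+ i 1) (quotient (* acc (+ (- n k) i)) i)))))

;; P(X = k) for X ~ B(n, p).
(define (sketch-binomial-probability n k p)
  (* (sketch-choose n k) (expt p k) (expt (- 1 p) (- n k))))

;; Sum of P(X = i) for i from lo to hi inclusive.
(define (sketch-binomial-sum n lo hi p)
  (let loop ((i lo) (acc 0))
    (if (> i hi)
        acc
        (loop (+ i 1) (+ acc (sketch-binomial-probability n i p))))))

(define (sketch-binomial-lt n k p) (sketch-binomial-sum n 0 (- k 1) p)) ; fewer than k
(define (sketch-binomial-le n k p) (sketch-binomial-sum n 0 k p))       ; k or fewer
(define (sketch-binomial-ge n k p) (sketch-binomial-sum n k n p))       ; k or more

(display (sketch-binomial-probability 10 5 0.5)) (newline) ; about 0.2461
(display (sketch-binomial-lt 10 5 0.5))          (newline) ; about 0.3770
```

Note that "fewer than k" and "k or more" are complementary, so the two sums for the same k add up to 1.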

[procedure] (poisson-probability mu k)

returns the probability of k events occurring when the average is mu.

> (do-ec (: i 0 20) 
         (format #t "P(X=~2d) = ~,4f~&" i (poisson-probability 10 i)))
P(X= 0) = 0.0000
P(X= 1) = 0.0005
P(X= 2) = 0.0023
P(X= 3) = 0.0076
P(X= 4) = 0.0189
P(X= 5) = 0.0378
P(X= 6) = 0.0631
P(X= 7) = 0.0901
P(X= 8) = 0.1126
P(X= 9) = 0.1251
P(X=10) = 0.1251
P(X=11) = 0.1137
P(X=12) = 0.0948
P(X=13) = 0.0729
P(X=14) = 0.0521
P(X=15) = 0.0347
P(X=16) = 0.0217
P(X=17) = 0.0128 
P(X=18) = 0.0071
P(X=19) = 0.0037
[procedure] (poisson-cumulative-probability mu k)

returns the probability of fewer than k events occurring when the average is mu.

> (do-ec (: i 0 20) 
         (format #t "P(X=~2d) = ~,4f~&" i (poisson-cumulative-probability 10 i)))
P(X= 0) = 0.0000
P(X= 1) = 0.0000
P(X= 2) = 0.0005
P(X= 3) = 0.0028
P(X= 4) = 0.0103
P(X= 5) = 0.0293
P(X= 6) = 0.0671
P(X= 7) = 0.1301
P(X= 8) = 0.2202
P(X= 9) = 0.3328
P(X=10) = 0.4579
P(X=11) = 0.5830
P(X=12) = 0.6968
P(X=13) = 0.7916
P(X=14) = 0.8645
P(X=15) = 0.9165
P(X=16) = 0.9513
P(X=17) = 0.9730
P(X=18) = 0.9857
P(X=19) = 0.9928
[procedure] (poisson-ge-probability mu k)

returns the probability of k or more events occurring when the average is mu.
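
The Poisson procedures above follow from the textbook formula P(X = k) = e^(-mu) * mu^k / k!. The sketch below builds that value term by term and derives the two sums from it; the sketch-* names are hypothetical, a non-negative integer k is assumed, and the library itself may compute these differently.

```scheme
;; Hypothetical sketches; assume mu > 0 and a non-negative integer k.

;; e^(-mu) * mu^k / k!, accumulated one factor at a time so that
;; no large factorial is formed.
(define (sketch-poisson-probability mu k)
  (let loop ((i 1) (acc (exp (- mu))))
    (if (> i k)
        acc
        (loop (+ i 1) (/ (* acc mu) i)))))

;; Probability of fewer than k events: sum of P(X = i) for i < k.
(define (sketch-poisson-lt mu k)
  (let loop ((i 0) (acc 0))
    (if (>= i k)
        acc
        (loop (+ i 1) (+ acc (sketch-poisson-probability mu i))))))

;; Probability of k or more events: the complement of "fewer than k".
(define (sketch-poisson-ge mu k)
  (- 1 (sketch-poisson-lt mu k)))

(display (sketch-poisson-probability 10 10)) (newline) ; about 0.1251
(display (sketch-poisson-lt 10 10))          (newline) ; about 0.4579
```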

[procedure] (normal-pdf x mean variance)

returns the probability density at x for a normal distribution with the stated mean and variance.

> (do-ec (: i 0 11) 
         (format #t "~3d ~,4f~&" i (normal-pdf i 5 4)))
 0 0.0088
 1 0.0270
 2 0.0648
 3 0.1210
 4 0.1760
 5 0.1995
 6 0.1760
 7 0.1210
 8 0.0648
 9 0.0270
10 0.0088
[procedure] (convert-to-standard-normal x mean variance)

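returns the value of x rescaled from the given normal distribution to the standard normal N(0, 1). Note that in the example below (5 - 6) / 2 gives -1/2, which suggests that the third argument is applied as a standard deviation rather than as a variance.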

> (convert-to-standard-normal 5 6 2)
-1/2
[procedure] (phi x)

returns the value at x of the cumulative distribution function (CDF) of the standard normal distribution.

> (do-ec (: x -2 2 0.4)
         (format #t "~4,1f ~,4f~&" x (phi x)))
-2.0 0.0228
-1.6 0.0548
-1.2 0.1151
-0.8 0.2119
-0.4 0.3446
 0.0 0.5000
 0.4 0.6554
 0.8 0.7881
 1.2 0.8849
 1.6 0.9452
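
The three normal-distribution procedures above can be sketched in plain Scheme. The sketch-* names are hypothetical. The rescaling divides by its third argument directly, following the documented example, and the CDF uses the Abramowitz and Stegun polynomial approximation to the error function (accurate to roughly 1e-7), which is a stand-in and not necessarily what the library uses.

```scheme
;; Hypothetical sketches, not the library's implementation.

(define sketch-pi (* 4 (atan 1)))

;; Density of N(mean, variance) at x.
(define (sketch-normal-pdf x mean variance)
  (let ((d (- x mean)))
    (/ (exp (- (/ (* d d) (* 2 variance))))
       (sqrt (* 2 sketch-pi variance)))))

;; Rescale x to the standard normal; sd is a standard deviation here.
(define (sketch-standardize x mean sd)
  (/ (- x mean) sd))

;; Error function via Abramowitz and Stegun formula 7.1.26.
(define (sketch-erf x)
  (if (< x 0)
      (- (sketch-erf (- x)))
      (let* ((t (/ 1 (+ 1 (* 0.3275911 x))))
             (poly (* t (+ 0.254829592
                           (* t (+ -0.284496736
                                   (* t (+ 1.421413741
                                           (* t (+ -1.453152027
                                                   (* t 1.061405429))))))))))) 
        (- 1 (* poly (exp (- (* x x))))))))

;; Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
(define (sketch-phi x)
  (* 0.5 (+ 1 (sketch-erf (/ x (sqrt 2))))))

(display (sketch-normal-pdf 5 5 4)) (newline)   ; about 0.1995
(display (sketch-phi 1.2))          (newline)   ; about 0.8849
```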

Confidence intervals

These functions report bounds for an observed property of a distribution. The bounds are computed for a confidence level of (1-alpha), so they become tighter as alpha increases from 0.0 to 1.0.

[procedure] (binomial-probability-ci n p alpha)

returns two values, the lower and upper bounds on an observed probability p from n trials, with confidence (1-alpha).

> (binomial-probability-ci 10 0.8 0.9)
0.724273681640625 
0.851547241210938
; 2 values
[procedure] (poisson-mu-ci k alpha)

returns two values, the lower and upper bounds on the Poisson parameter if k events are observed; the bound is for confidence (1-alpha).

> (poisson-mu-ci 10 0.9)
8.305419921875
10.0635986328125
; 2 values
[procedure] (normal-mean-ci mean standard-deviation k alpha)

returns two values, the lower and upper bounds on the mean of the normal distribution if k events are observed; the bound is for confidence (1-alpha).

> (normal-mean-ci 0.5 0.1 10 0.8)
0.472063716520217
0.527936283479783
; 2 values
[procedure] (normal-mean-ci-on-sequence items alpha)

returns two values, the lower and upper bounds on the mean of the given items, assuming they are normally distributed; the bound is for confidence (1-alpha).

> (normal-mean-ci-on-sequence '(1 2 3 4 5) 0.9)
2.40860081649174
3.59139918350826
; 2 values
[procedure] (normal-variance-ci standard-deviation k alpha)

returns two values, the lower and upper bounds on the variance of the normal distribution if k events are observed; the bound is for confidence (1-alpha).

[procedure] (normal-variance-ci-on-sequence items alpha)

returns two values, the lower and upper bounds on the variance of the given items, assuming they are normally distributed; the bound is for confidence (1-alpha).

[procedure] (normal-sd-ci standard-deviation k alpha)

returns two values, the lower and upper bounds on the standard deviation of the normal distribution if k events are observed; the bound is for confidence (1-alpha).

[procedure] (normal-sd-ci-on-sequence items alpha)

returns two values, the lower and upper bounds on the standard deviation of the given items, assuming they are normally distributed; the bound is for confidence (1-alpha).

Hypothesis testing

(parametric)
(non parametric)

Sample size estimates

Correlation and regression

Significance test functions

Authors

Peter Lane wrote the Scheme version of this library. The original Lisp version was written by Larry Hunter.

License

GPL version 3.0.

Requirements

Needs srfi-1, srfi-25, srfi-69, vector-lib, numbers, extras, foreign, format

Uses the GNU Scientific Library for basic numeric processing, so requires libgsl, libgslcblas and the development files for libgsl.

Version History