## probdist

Probability density functions.

## Usage

- (require-extension sampled-pdf)
- (require-extension normal-pdf)

## Documentation

`probdist` is a library of routines to compute univariate or multivariate normal probability function for a given mean and covariance; and to approximate an arbitrary probability distribution as a weighted set of samples (sampled PDF).

This code is based on code in the `dysii` project by Lawrence Murray.

The sampled PDF algorithm is based on the paper by Kitagawa:

Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models_, Journal of Computational and Graphical Statistics, 1996, 5, 1-25.

### Normal PDF

(define-datatype normal-pdf normal-pdf? (NormalPDF S mu sigma R sigmaInv sigmaDet ZI mu-zero? ))

A representation of a normal PDF:

`S`- number of dimensions
`mu`- expectation
`sigma`- covariance
`R`- upper triangular Cholesky decomposition of covariance matrix
`sigmaInv`- inverse of covariance matrix
`sigmaDet`- determinant of covariance matrix
`ZI`- the constant factor
`(2*pi)^(-S/2) / prod(diag(R))` `mu-zero?`- true if expectation is zero

*[procedure]*

`make-normal-pdf:: S * MU * SIGMA -> NORMAL-PDF`

Creates a new normal PDF object with the specified dimension, expectation, and covariance. If the dimension `S` is 1, then the expectation `MU` and the covariance `SIGMA` must be scalars; otherwise, they must be SRFI-4 `f64` vectors. In the latter case, `MU` must be a vector of size `S`, and `SIGMA` must be a matrix of size `S x S`.

*[procedure]*

`normal-pdf:density:: NORMAL-PDF * X -> DENSITY`

Computes the density of the distribution at the given point `X`. If the dimension `S` of the distribution is 1, then `X` must be a scalar, otherwise it must be an SRFI-4 `f64` vector of length `S`.

*[procedure]*

`normal-pdf:sample:: NORMAL-PDF * X -> SAMPLE`

Sample from the distribution. Let `X` be a sample from a standard normal distribution. Then the sample is `SAMPLE = MU + R*X`, where `R` is the upper triangular Cholesky decomposition of the covariance matrix.

*[procedure]*

`normal-pdf:expectation:: NORMAL-PDF -> MU`

Return the expectation value of the distribution (scalar or f64vector).

*[procedure]*

`normal-pdf:covariance:: NORMAL-PDF -> SIGMA`

Return the covariance value of the distribution (scalar or f64vector).

### Sampled PDF

(define-datatype sampled-pdf sampled-pdf? (SampledPDF N W WXS MU SIGMA))

A representation of a weighted sampled PDF:

`N`- sample set size
`W`- total weight of the sample set
`wxs`- weighted samples (see below)
`mu`- expectation
`sigma`- covariance

(define-datatype weighted-samples weighted-samples? (WeightedSamples S senv make-senv))

A representation of weighted samples:

`S`- number of dimensions
`senv`- an ordered dictionary data structure that follows the API of rb-tree
- make-senv
- a procedure to create an empty samples dictionary

*[procedure]*

`make-sampled-pdf:: S * MAKE-SENV * XS [* WS * XCAR * XCDR * XNULL?] -> SAMPLED-PDF`

Creates a new sampled PDF object with the specified dimension, samples environment creation procedure that follows the API of rb-tree, and weighted samples. The samples can be specified in one of two ways. If both `XS` and `WS` are provided, then `XS` is a sequence that contains the samples, and `WS` is a sequence that contains the corresponding weights. If only `XS` is provided, or `WS` is false, then `XS` must consist of cons cells, where the car is the weight, and the cdr is the sample. Optional arguments `XCAR XCDR XNULL?` could be used for sequences other than lists (e.g. streams).

*[procedure]*

`sampled-pdf:expectation:: SAMPLED-PDF -> MU`

Return the expectation value of the distribution (scalar or f64vector).

*[procedure]*

`sampled-pdf:covariance:: SAMPLED-PDF -> SIGMA`

Return the covariance value of the distribution (scalar or f64vector).

*[procedure]*

`sampled-pdf:find-sample:: U * SAMPLED-PDF -> X`

Given a number `u in [0,1]`, returns the sample `x(i)` such that `\sum_{j=1}^{i-1} w_j < Wu <= \sum_{j=1}^{i} w_j`, where `W` is the total weight of the sample set.

*[procedure]*

`sampled-pdf:normalize:: SAMPLED-PDF -> SAMPLED-PDF`

Normalizes the sample weights to sum to 1.

*[procedure]*

`sampled-pdf:resample:: MAKE-SENV * SAMPLED-PDF [ * M] -> SAMPLED-PDF`

Resamples the distribution. This produces a new approximation of the same distribution using a set of equally weighted sample points. Sample points are selected using the deterministic resampling method given in the appendix of Kitagawa. Optional argument `M` is the number of samples to take for the new distribution. If not given, defaults to the number of samples in the existing distribution.

## Examples

;; ;; Multivariate test of normal-pdf. ;; ;; This test creates a 3-dimensional normal distribution with a ;; random mean and variance. It then samples from this distribution ;; and calculates the sample mean and variance for comparison. ;; (require-extension srfi-1) (require-extension srfi-4) (require-extension srfi-4-utils) (require-extension matrix-utils) (require-extension blas) (require-extension normal-pdf) (require-extension random-mtzig) (define-utility-matrices f64) ;; Number of samples (define N 100000) ;; Number of dimensions (sample size) (define S 3) (define (f64vector-scale a x) (let ((n (f64vector-length x))) (dscal n a x))) (define (f64vector-sum x y) (let ((n (f64vector-length x))) (daxpy n 1.0 x y))) (define (f64vector-sub x y) (let ((n (f64vector-length x))) (daxpy n -1.0 y x))) (define (f64vector-mul a b) (f64vector-map (lambda (v1 v2) (* v1 v2)) a b)) (define f64matrix-map (make-matrix-map f64vector-ref f64vector-set!)) (define (upper M N A) (f64matrix-map RowMajor M N A (lambda (i j v) (if (>= j i) v 0.0)))) ;; ;; Computes the arithmetic mean of a list of f64 vectors using the ;; recurrence relation mean_(n) = mean(n-1) + (v[n] - ;; mean(n-1))/(n+1) ;; (define (mean s vs) (let loop ((i 0) (vs vs) (mean (make-f64vector s 0.0))) (if (null? vs) mean (loop (fx+ 1 i) (cdr vs) (let ((d (f64vector-sub (car vs) mean))) (f64vector-sum mean (f64vector-scale (/ 1 (+ 1 i)) d))))))) (define (covariance s vs mean) (let* ((cov (matrix-zeros S S))) (let ((ds (map (lambda (x) (f64vector-sub x mean)) vs))) (let ((n (fold (lambda (d n) (dsyr! RowMajor Upper s 1.0 d cov) (fx+ 1 n)) 0 ds))) (f64vector-scale (/ 1 (- n 1)) cov))))) ;; Given a vector of values in the range [0..1], scale all values in ;; the vector to the specified range. (define (f64vector-urange x lo hi) (if (< lo hi) (let ((d (- hi lo))) (f64vector-map (lambda (x) (+ lo (* d x))) x)))) (define rng (random-mtzig:init)) ;; distribution parameters (define mu (f64vector-urange (random-mtzig:f64vector-randu! S rng) -5 5)) (define sigma (let ((x (f64vector-urange (random-mtzig:f64vector-randu! (* S S) rng) 0 5))) (dgemm! RowMajor NoTrans Trans S S S 1.0 x x 0.0 (make-f64vector (* S S))))) (define gpdf (begin (print "mu = " mu) (print "sigma = " sigma) (make-normal-pdf S mu sigma))) (define input (list-tabulate N (lambda (i) (random-mtzig:f64vector-randn! S rng)))) (define samples (let loop ((i 0) (input input) (samples (list))) (if (null? input) samples (let ((samples (cons (normal-pdf:sample gpdf (car input)) samples))) (loop (fx+ 1 i) (cdr input) samples))))) (if (<= N 100) (print "input = " input)) (if (<= N 100) (print "samples = " samples)) ;; test for mean and covariance equality to one decimal place (equal? (f64vector-map (lambda (x) (truncate (* 10 x))) mu) (f64vector-map (lambda (x) (truncate (* 10 x))) (mean S samples))) (equal? (f64vector-map (lambda (x) (truncate (* 10 x))) (upper S S sigma)) (f64vector-map (lambda (x) (truncate (* 10 x))) (covariance S samples (mean S samples))))

## About this egg

### Author

### Version history

- 1.7
- Updated use of blas
- 1.6
- Documentation converted to wiki format
- 1.5
- Ported to Chicken 4.
- 1.4
- Added some error checking in make-normal-pdf and a record printer for the normal-pdf datatype.
- 1.3
- Build script updated for better cross-platform compatibility
- 1.2
- More metafile fixes
- 1.1
- Metafile fixes
- 1.0
- Initial release

### License

Copyright 2007-2011 Ivan Raikov and the Okinawa Institute of Science and Technology. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. A full copy of the GPL license can be found at <http://www.gnu.org/licenses/>.