Dataset Utilities

A set of routines to load and manage datasets for machine learning / data mining tasks.

A dataset is a table, for example:

Outlook  Temperature  Humidity  Windy  Plays
sunny    hot          high      false  no
sunny    hot          high      true   no

Each column in the table is an attribute, and each row is an instance; every instance has a value for each attribute. The whole table is called a relation, and can be given a name.

Exported Procedures

Creating datasets

[procedure] (make-nominal-attribute name value-1 ...)

Creates a nominal attribute with the given name and possible values, e.g.:

> (make-nominal-attribute 'outlook 'sunny 'overcast 'rainy)

[procedure] (make-numeric-attribute name)

Creates a numeric attribute with the given name, e.g.:

> (make-numeric-attribute 'temperature)

[procedure] (make-relation name attributes data)

Creates a relation with the given name. attributes must be a list of attributes (as returned by make-nominal-attribute or make-numeric-attribute), and data must be a list of lists: each sublist represents one instance and gives that instance's value for every attribute, in the same order as attributes.

> (make-relation 'plays-tennis
                  (list (make-nominal-attribute 'outlook 'sunny 'overcast 'rainy)
                        (make-nominal-attribute 'temperature 'hot 'mild 'cool)
                        (make-nominal-attribute 'humidity 'high 'normal)
                        (make-nominal-attribute 'windy 'true 'false)
                        (make-nominal-attribute 'plays 'yes 'no))
                  '((sunny hot high false no)
                    (sunny hot high true no)
                    (overcast hot high false yes)
                    (rainy mild high true no)))

Managing datasets

[procedure] (attribute-name attribute)

Returns the name of the given attribute.

[procedure] (attribute-definition attribute)

Returns a definition of the type of the given attribute. This definition will be one of:

[procedure] (relation-name relation)

Returns the name of the given relation.

[procedure] (relation-attributes relation)

Returns the list of attributes of the given relation.

[procedure] (relation-data relation)

Returns a list of the instances in the given relation.

[procedure] (get-attribute-values relation attribute-name)

Returns the values taken by the instances in relation for the attribute named attribute-name.
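Reading an attribute's values amounts to taking one column of the table. The sketch below illustrates this over plain Scheme lists; it is not the egg's implementation. It assumes, hypothetically, that attribute names are kept in a list in column order, that each instance is a list of values as passed to make-relation, and that one value is returned per instance (the description above does not say whether duplicates are removed). The helpers name-position and column-values are illustrative names only; name-position plays the role of find-attribute-index, with 0-based positions assumed.

```scheme
;; Hypothetical sketch, not this egg's implementation.
;; Attribute names in column order, and one list per instance.
(define attribute-names '(outlook temperature humidity windy plays))
(define instances
  '((sunny hot high false no)
    (sunny hot high true no)
    (overcast hot high false yes)
    (rainy mild high true no)))

;; 0-based position of NAME in NAMES, or #f if it is absent.
(define (name-position name names)
  (let loop ((names names) (i 0))
    (cond ((null? names) #f)
          ((eq? (car names) name) i)
          (else (loop (cdr names) (+ i 1))))))

;; The value of attribute NAME for every row, in row order.
(define (column-values names rows name)
  (let ((i (name-position name names)))
    (if i
        (map (lambda (row) (list-ref row i)) rows)
        (error "unknown attribute:" name))))

(column-values attribute-names instances 'outlook)
;; => (sunny sunny overcast rainy)
```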

[procedure] (entropy relation attribute-name)

Computes the entropy of relation, using the attribute named attribute-name to divide its instances into groups; attribute-name should name a nominal attribute.
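A minimal sketch of such an entropy computation over plain lists, assuming (this page does not specify it) the usual decision-tree definition: group the instances by their value v of the chosen attribute, let p(v) be the fraction of instances in each group, and compute H = -Σ p(v) log2 p(v). The helpers value-counts and entropy-of are hypothetical names, and the base-2 logarithm is an assumption.

```scheme
;; Hypothetical sketch, not this egg's implementation.
;; The plays-tennis instances as plain lists (plays is column 4).
(define instances
  '((sunny hot high false no)
    (sunny hot high true no)
    (overcast hot high false yes)
    (rainy mild high true no)))

;; Count each distinct value of XS: returns an alist ((value . count) ...).
(define (value-counts xs)
  (let loop ((xs xs) (counts '()))
    (if (null? xs)
        counts
        (let ((entry (assoc (car xs) counts)))
          (cond (entry
                 ;; seen before: increment its count in place
                 (set-cdr! entry (+ (cdr entry) 1))
                 (loop (cdr xs) counts))
                (else
                 ;; first occurrence: start a new count of 1
                 (loop (cdr xs)
                       (cons (cons (car xs) 1) counts))))))))

;; Assumed definition: H = - sum over distinct values v of p(v) * log2 p(v).
(define (entropy-of xs)
  (let ((total (length xs)))
    (let loop ((counts (value-counts xs)) (h 0))
      (if (null? counts)
          h
          (let ((p (/ (cdr (car counts)) total)))
            (loop (cdr counts)
                  (- h (* p (/ (log p) (log 2))))))))))

;; plays is (no no yes no): roughly 0.811
(entropy-of (map (lambda (row) (list-ref row 4)) instances))
;; outlook is (sunny sunny overcast rainy): about 1.5
(entropy-of (map car instances))
```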

[procedure] (filter-instances relation attribute-name value)

Returns a new relation containing those instances of relation which have the given value for attribute-name.
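The selection performed by filter-instances can be sketched over plain lists as follows. This is a hypothetical illustration, not the egg's code: it selects by 0-based column index rather than by attribute name, returns a list of instances rather than a relation, and defines a small keep helper so that it does not depend on SRFI-1's filter.

```scheme
;; Hypothetical sketch, not this egg's implementation.
(define instances
  '((sunny hot high false no)
    (sunny hot high true no)
    (overcast hot high false yes)
    (rainy mild high true no)))

;; Portable stand-in for SRFI-1 filter: elements of LST satisfying PRED.
(define (keep pred lst)
  (cond ((null? lst) '())
        ((pred (car lst)) (cons (car lst) (keep pred (cdr lst))))
        (else (keep pred (cdr lst)))))

;; Instances whose value in column INDEX (0-based) is equal? to VALUE.
(define (filter-rows rows index value)
  (keep (lambda (row) (equal? (list-ref row index) value)) rows))

(filter-rows instances 0 'sunny)
;; => ((sunny hot high false no) (sunny hot high true no))
(filter-rows instances 4 'yes)
;; => ((overcast hot high false yes))
```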

[procedure] (find-attribute-index relation attribute-name)

Returns the position (index) of the attribute named attribute-name within the attributes of relation.

[procedure] (split-instances relation attribute-name)

Given the name of a nominal attribute, returns a list of relations, each containing the instances of relation that share the same value for attribute-name.
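Splitting by a nominal attribute is in effect filtering once per distinct value. A hypothetical single-pass sketch over plain lists follows; split-rows is an illustrative name, it works on a 0-based column index and returns lists of instances rather than relations, and the order of the groups (first appearance) is a choice made here that the egg may not share.

```scheme
;; Hypothetical sketch, not this egg's implementation.
(define instances
  '((sunny hot high false no)
    (sunny hot high true no)
    (overcast hot high false yes)
    (rainy mild high true no)))

;; Group ROWS by their value in column INDEX (0-based).  Groups come back
;; in order of first appearance, each with its rows in original order.
(define (split-rows rows index)
  ;; GROUPS is an alist (value row ...) kept newest-first while scanning.
  (let loop ((rows rows) (groups '()))
    (if (null? rows)
        (reverse (map (lambda (group) (reverse (cdr group))) groups))
        (let* ((row (car rows))
               (group (assoc (list-ref row index) groups)))
          (if group
              (begin
                ;; existing value: push this row onto its group
                (set-cdr! group (cons row (cdr group)))
                (loop (cdr rows) groups))
              ;; new value: start a group holding just this row
              (loop (cdr rows)
                    (cons (list (list-ref row index) row) groups)))))))

(split-rows instances 4)
;; => (((sunny hot high false no) (sunny hot high true no) (rainy mild high true no))
;;     ((overcast hot high false yes)))
```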

Importing Data

[procedure] (read-arff filename)

Reads an ARFF file from filename and returns the relation it defines. Currently only the nominal and numeric attribute types are supported, and sparse ARFF files are not.
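For illustration, the plays-tennis relation from the make-relation example could be stored as the following ARFF file, hand-written here in the standard ARFF layout: an @relation line, one @attribute line per attribute with nominal values listed in braces, then one comma-separated row per instance after @data. A numeric attribute would instead be declared as, e.g., @attribute temperature numeric.

```
@relation plays-tennis

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {true, false}
@attribute plays {yes, no}

@data
sunny,hot,high,false,no
sunny,hot,high,true,no
overcast,hot,high,false,yes
rainy,mild,high,true,no
```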


Author

Peter Lane.

License

GPL version 3.0.

Version History

in trunk.