== Outdated egg!

This is an egg for CHICKEN 4, the unsupported old release. You're almost certainly looking for [[/eggref/5/dataset-utils|the CHICKEN 5 version of this egg]], if it exists.

If it does not exist, there may be equivalent functionality provided by another egg; have a look at the [[https://wiki.call-cc.org/chicken-projects/egg-index-5.html|egg index]]. Otherwise, please consider porting this egg to the current version of CHICKEN.

== Dataset Utilities

A set of routines to load and manage datasets for machine learning and data mining tasks. A dataset is a table:

<table>
<tr><th>Outlook</th><th>Temperature</th><th>Humidity</th><th>Windy</th><th>Plays</th></tr>
<tr><td>sunny</td><td>hot</td><td>high</td><td>false</td><td>no</td></tr>
<tr><td>sunny</td><td>hot</td><td>high</td><td>true</td><td>no</td></tr>
</table>

Each column in the table is an ''attribute'', and each row is an ''instance''. Each instance has a value for every attribute. The whole table is called a ''relation'' and can be given a name.

=== Exported Procedures

==== Creating datasets

<procedure>(make-nominal-attribute name value-1 ...)</procedure>

Creates a nominal attribute with the given possible values, e.g.:

 > (make-nominal-attribute 'outlook 'sunny 'overcast 'rainy)

<procedure>(make-numeric-attribute name)</procedure>

Creates a numeric attribute, e.g.:

 > (make-numeric-attribute 'temperature)

<procedure>(make-relation name attributes data)</procedure>

Creates a relation with the given {{name}}. {{attributes}} must be a list of attributes (as created by the procedures above), and {{data}} must be a list of lists: each sublist represents one instance and gives that instance's value for every attribute, in the same order as {{attributes}}.
 > (make-relation 'plays-tennis
     (list (make-nominal-attribute 'outlook 'sunny 'overcast 'rainy)
           (make-nominal-attribute 'temperature 'hot 'mild 'cool)
           (make-nominal-attribute 'humidity 'high 'normal)
           (make-nominal-attribute 'windy 'true 'false)
           (make-nominal-attribute 'plays 'yes 'no))
     '((sunny hot high false no)
       (sunny hot high true no)
       (overcast hot high false yes)
       ...
       (rainy mild high true no)))

==== Managing datasets

<procedure>(attribute-name attribute)</procedure>

Returns the name of the given attribute.

<procedure>(attribute-definition attribute)</procedure>

Returns a definition of the type of the given attribute. This definition will be one of:

* {{'(numeric)}} for numeric attributes
* {{'(nominal value-1 ...)}} for nominal attributes, listing the possible values

<procedure>(class-probability relation attribute-name value)</procedure>

Returns the proportion of instances in {{relation}} whose value for {{attribute-name}} is {{value}}.

<procedure>(entropy relation attribute-name)</procedure>

Computes the entropy of the given relation, using {{attribute-name}} to divide the relation into groups. {{attribute-name}} should name a nominal attribute.

<procedure>(filter-instances relation attribute-name value)</procedure>

Returns a new relation containing those instances of {{relation}} which have the given {{value}} for {{attribute-name}}.

<procedure>(find-attribute-index relation attribute-name)</procedure>

Returns the index of the given attribute name in {{relation}}.

<procedure>(get-attribute-values relation attribute-name)</procedure>

Returns the values taken by the instances in {{relation}} for the given attribute name.

<procedure>(information-gain relation target-class attribute-name)</procedure>

Computes the information gain from splitting the data in {{relation}} on {{attribute-name}}, that is, the reduction in entropy compared with the unsplit data; {{target-class}} names the attribute used to compute the entropy.

<procedure>(relation-attributes relation)</procedure>

Returns a list of the attributes of the given relation.
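To make the {{class-probability}}, {{entropy}} and {{information-gain}} descriptions concrete, here is a minimal, self-contained sketch of the underlying arithmetic. It does not use dataset-utils and is not its implementation. It assumes that an instance is a plain list of values, that attributes are addressed by column index rather than by name, and that logarithms are taken in base 2 (this page does not state the base the egg uses). All procedure names below ({{column-values}}, {{instances-with}}, {{value-proportion}}, {{column-entropy}}, {{column-information-gain}}) are hypothetical.

```scheme
;; Hypothetical sketch, not the dataset-utils implementation.
;; Assumptions: an instance is a plain list of values, attributes are
;; addressed by column index (not by name), and logarithms are base 2.

;; Distinct values found in column INDEX, in first-seen order.
(define (column-values instances index)
  (let loop ((rest instances) (seen '()))
    (if (null? rest)
        (reverse seen)
        (let ((v (list-ref (car rest) index)))
          (loop (cdr rest)
                (if (member v seen) seen (cons v seen)))))))

;; Instances whose value in column INDEX is equal? to VALUE.
(define (instances-with instances index value)
  (let loop ((rest instances) (kept '()))
    (cond ((null? rest) (reverse kept))
          ((equal? (list-ref (car rest) index) value)
           (loop (cdr rest) (cons (car rest) kept)))
          (else (loop (cdr rest) kept)))))

;; Fraction of INSTANCES taking VALUE in column INDEX
;; (the quantity described for class-probability).
(define (value-proportion instances index value)
  (/ (length (instances-with instances index value))
     (length instances)))

;; Logarithm in base 2 (assumed base).
(define (log-base-2 x) (/ (log x) (log 2)))

;; Entropy of the value distribution in column INDEX:
;;   H(S) = - sum over values v of p_v * log2(p_v)
(define (column-entropy instances index)
  (let loop ((vals (column-values instances index)) (h 0))
    (if (null? vals)
        h
        (let ((p (value-proportion instances index (car vals))))
          (loop (cdr vals) (- h (* p (log-base-2 p))))))))

;; Information gain from splitting on column SPLIT-INDEX, with entropy
;; measured on column TARGET-INDEX:
;;   IG = H(S) - sum over values v of |S_v|/|S| * H(S_v)
(define (column-information-gain instances target-index split-index)
  (let loop ((vals (column-values instances split-index)) (weighted 0))
    (if (null? vals)
        (- (column-entropy instances target-index) weighted)
        (let ((subset (instances-with instances split-index (car vals))))
          (loop (cdr vals)
                (+ weighted
                   (* (/ (length subset) (length instances))
                      (column-entropy subset target-index))))))))
```

Applied to the two rows of the example table at the top of this page, with Plays at column index 4, {{column-entropy}} gives zero entropy, because both instances have the value {{no}}. Column indices are used here only to keep the sketch independent of the egg's relation type, whose internal representation this page does not describe.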
<procedure>(relation-data relation)</procedure>

Returns a list of the instances in the given relation.

<procedure>(relation-name relation)</procedure>

Returns the name of the given relation.

<procedure>(split-instances relation attribute-name)</procedure>

Given a nominal attribute, returns a list of relations, one for each group of instances in {{relation}} that share the same value for {{attribute-name}}.

==== Metrics

<procedure>(euclidean-distance instance-1 instance-2)</procedure>

Computes the Euclidean distance between the two instances.

==== Importing Data

<procedure>(read-arff filename)</procedure>

Reads an ARFF definition from the given file and returns a relation. Currently only nominal and numeric attribute types are supported; sparse ARFF files are not.

=== Author

[[/users/peter-lane|Peter Lane]].

=== License

GPL version 3.0.

=== Version History

in trunk.