== Outdated egg!

This is an egg for CHICKEN 4, the unsupported old release. You're almost certainly looking for [[/eggref/5/link-grammar|the CHICKEN 5 version of this egg]], if it exists.

If it does not exist, there may be equivalent functionality provided by another egg; have a look at the [[https://wiki.call-cc.org/chicken-projects/egg-index-5.html|egg index]]. Otherwise, please consider porting this egg to the current version of CHICKEN.

== link-grammar

Bindings for the CMU link-grammar parser system.

[[toc:]]

=== Link Grammar

The link grammar parser is a syntactic parser of English, based on [[https://www.abisource.com/projects/link-grammar/|link grammar]], an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words. The parser also produces a 'constituent' representation of a sentence (showing noun phrases, verb phrases, etc.).

=== Author

David Ireland (djireland79 at gmail dot com)

=== Upstream

[[https://www.abisource.com/projects/link-grammar/]]

=== Egg Source Code

[[https://gitlab.com/maxwell79/chicken-link-grammar]]

==== {{link-grammar}}
'''[module]''' {{link-grammar}}

Documentation

* [[#parse-with-default]]
* [[#parse-sentence]]
* [[#display-off]]
* [[#display-multi-line]]
* [[#display-bracket-tree]]
* [[#display-single-line]]
* [[#display-max-styles]]
* [[#create-default-dictionary]]
* [[#create-dictionary-with-language]]
* [[#get-verbosity]]
* [[#get-version]]
* [[#get-dictionary-version]]
* [[#get-dictionary-locale]]
* [[#get-dictionary-language]]
* [[#get-dictionary-data-dir]]
* [[#set-dictionary-data-dir!]]
* [[#delete-dictionary!]]
* [[#create-sentence]]
* [[#split-sentence]]
* [[#sentence-length]]
* [[#sentence-null-count]]
* [[#sentence-disjunct-cost]]
* [[#sentence-link-cost]]
* [[#linkages-found]]
* [[#linkages-post-processed]]
* [[#linkages-violated]]
* [[#valid-linkages]]
* [[#delete-sentence!]]
* [[#create-linkage]]
* [[#corpus-cost]]
* [[#get-lword]]
* [[#get-rword]]
* [[#get-words]]
* [[#get-word]]
* [[#get-constituents]]
* [[#get-diagram]]
* [[#get-postscript]]
* [[#get-disjuncts]]
* [[#get-links-domains]]
* [[#get-violation-name]]
* [[#link-length]]
* [[#link-label]]
* [[#link-llabel]]
* [[#link-rlabel]]
* [[#link-cost]]
* [[#link-domain-names]]
* [[#num-words]]
* [[#num-links]]
* [[#num-domains]]
* [[#unused-word-cost]]
* [[#delete-linkage!]]
* [[#init-opts]]
* [[#set-max-parse-time!]]
* [[#set-linkage-limit!]]
* [[#set-short-length!]]
* [[#set-disjunct-cost!]]
* [[#set-min-null-count!]]
* [[#set-max-null-count!]]
* [[#set-islands-ok!]]
* [[#set-verbosity!]]
* [[#resources-exhausted?]]
* [[#memory-exhausted?]]
* [[#timer-expired?]]
* [[#reset-resources!]]
* [[#delete-parse-options!]]

=== Example Usage

<enscript highlight="scheme">
(import scheme)
(cond-expand
  (chicken-4 (use (prefix link-grammar lg:)))
  (chicken-5 (import (prefix link-grammar lg:))))

;; Print the constituents and diagram of each linkage, starting at index.
(define (display-linkage sentence opts index)
  (let* ((links-found (lg:linkages-found sentence))
         (linkage     (lg:create-linkage index sentence opts)))
    (when linkage
      (let ((constituents (lg:get-constituents linkage lg:display-multi-line))
            (diagram      (lg:get-diagram linkage #t 80)))
        (print constituents)
        (print diagram)
        (lg:delete-linkage! linkage)))
    ;; Recurse only while the next index is below the number of linkages found.
    (when (< (+ index 1) links-found)
      (display-linkage sentence opts (+ index 1)))))

;; Parse text; if no complete linkage is found, retry with null links allowed.
(define (parse text dictionary opts)
  (let* ((sentence     (lg:create-sentence text dictionary))
         (num-linkages (lg:parse-sentence sentence opts)))
    (when (= num-linkages 0)
      (lg:set-min-null-count! opts 1)
      (lg:set-max-null-count! opts (lg:sentence-length sentence))
      (set! num-linkages (lg:parse-sentence sentence opts)))
    (display-linkage sentence opts 0)
    (lg:delete-sentence! sentence)))

(define dictionary (lg:create-default-dictionary))
(define opts (lg:init-opts))

(lg:set-verbosity! opts 1)
(lg:set-max-parse-time! opts 30)
(lg:set-linkage-limit!
opts 1000)
(lg:set-min-null-count! opts 0)
(lg:set-max-null-count! opts 0)
(lg:set-short-length! opts 16)
(lg:set-islands-ok! opts #f)

(parse "The black fox ran from the hunters" dictionary opts)

(lg:delete-parse-options! opts)
(lg:delete-dictionary! dictionary)
</enscript>

    (S (NP the black.a fox.n) (VP ran.v-d (PP from (NP the hunters.n))))

        +------------------------Xp------------------------+
        +----------->WV----------->+                       |
        +---------Wd--------+      |                       |
        |      +----Ds**x---+      |      +----Jp----+     |
        |      |     +---A--+--Ss--+--MVp-+   +--Dmc-+     +--RW--+
        |      |     |      |      |      |   |      |     |      |
    LEFT-WALL the black.a fox.n ran.v-d from the hunters.n . RIGHT-WALL

    (S (NP the black.a fox.n) (VP ran.v-d (PP from (NP the hunters.n))))

        +------------------------Xp------------------------+
        +---------Wd--------+                              |
        |      +----Ds**x---+             +----Jp----+     |
        |      |     +---A--+--Ss--+--MVp-+   +--Dmc-+     +--RW--+
        |      |     |      |      |      |   |      |     |      |
    LEFT-WALL the black.a fox.n ran.v-d from the hunters.n . RIGHT-WALL

=== Simple Use

Parse a text using default values for the dictionary and parser.

==== {{parse-with-default}}
<procedure>(parse-with-default text) → (values words links diagrams postscript)</procedure>
Parse text using default values
; {{text}} : string to parse

=== Sentences

A sentence is the API's representation of an input string, tokenized and interpreted according to a specific Dictionary. After a Sentence is created and parsed, various attributes of the resulting set of linkages can be obtained.

==== {{create-sentence}}
<procedure>(create-sentence input dictionary) → sentence</procedure>
Creates a sentence object from the input string, using the Dictionary that was created earlier to tokenize and define words.
; {{input}} : Input string (string)
; {{dictionary}} : dictionary to use

==== {{delete-sentence!}}
<procedure>(delete-sentence!
sentence) → unspecified</procedure>
Deletes the specified sentence
; {{sentence}} : Sentence to be deleted (sentence)

==== {{split-sentence}}
<procedure>(split-sentence sentence parse-options) → number</procedure>
Splits (tokenizes) the sentence up into its component words and punctuation. This includes splitting up certain run-on expressions, such as '12ft.' which is split into '12' and 'ft.'. If spell-guessing is enabled in the parse-options, the tokenizer will also separate most run-on words, i.e. pairs of words without an intervening space. This routine returns zero if successful, and a non-zero value if an error occurred.
; {{sentence}} : Sentence to split (sentence)
; {{parse-options}} :

==== {{parse-sentence}}
<procedure>(parse-sentence sentence parse-options) → number</procedure>
This routine represents the heart of the program. Several things are done when a sentence is parsed:

# Word expressions are extracted from the dictionary and pruned.
# Disjuncts are built.
# A series of pruning operations is carried out.
# The linkages having the minimal number of null links are counted.
# A 'parse set' of linkages is built.
# The linkages are post-processed.

The 'parse set' is attached to the sentence, and this is one of the key reasons that the API is flexible and modular. All of the necessary information for building linkages is stored in the parse set. This means that other sentences can be parsed, possibly using different dictionaries and other parameters, without disturbing the information obtained from a call to parse-sentence. If another call to parse-sentence is made on the same sentence, the parsing information for the previous call is deleted. Like almost all of the other routines, this call is thread-safe: that is, sentences can be parsed concurrently in multiple threads.
; {{sentence}} :
; {{parse-options}} :

==== {{sentence-length}}
<procedure>(sentence-length sentence) → number</procedure>
Returns the length of the sentence
; {{sentence}} :

==== {{sentence-null-count}}
<procedure>(sentence-null-count) → number</procedure>
Returns the number of words that failed to be linked into the rest of the sentence during parsing. This number is greater than zero whenever a word doesn't seem to fit anywhere in the parse, either due to poor grammar or due to a shortcoming of the dictionary.

==== {{linkages-found}}
<procedure>(linkages-found sentence) → number</procedure>
Returns the number of linkages that the search found

==== {{valid-linkages}}
<procedure>(valid-linkages) → number</procedure>
Returns the number of linkages that had no post-processing violations

==== {{linkages-post-processed}}
<procedure>(linkages-post-processed) → number</procedure>
Returns the number of linkages that were actually post-processed

==== {{linkages-violated}}
<procedure>(linkages-violated) → number</procedure>
Returns the number of post-processing violations that the i-th linkage had during the last call to parse-sentence.

==== {{sentence-disjunct-cost}}
<procedure>(sentence-disjunct-cost sentence index) → number</procedure>
Returns the sum total of all of the costs of all of the disjuncts used in the i-th linkage of the sentence. The higher the cost, the less likely that the parse is correct. Very roughly, this can be interpreted as (minus) the log-likelihood of a parse being correct.
; {{sentence}} :
; {{index}} :

==== {{sentence-link-cost}}
<procedure>(sentence-link-cost sentence index) → number</procedure>
Returns the sum of the lengths of the links in the i-th parse. The ratio of this length to the total length of the sentence gives a rough measure of the complexity of the sentence. That is, long-range links between distant words indicate that the sentence may be hard to understand; alternatively, they may indicate that the parse is not very accurate.
; {{sentence}} :
; {{index}} :

=== Dictionary

A Dictionary is the programmer's handle on the set of word definitions that defines the grammar. A user creates a Dictionary from a grammar file and post-process knowledge file, and then passes it to the various parsing routines.

==== {{create-dictionary-with-language}}
<procedure>(create-dictionary-with-language language) → dictionary</procedure>
Creates a dictionary with the specified language
; {{language}} : Language to use (string)

==== {{create-default-dictionary}}
<procedure>(create-default-dictionary) → dictionary</procedure>
Looks for a dictionary in the same language as the current environment, and if one is found, creates a dictionary object.

==== {{get-dictionary-language}}
<procedure>(get-dictionary-language dictionary) → string</procedure>
Returns the language of the specified dictionary
; {{dictionary}} : specified dictionary (dictionary)

==== {{delete-dictionary!}}
<procedure>(delete-dictionary! dictionary) → unspecified</procedure>
Deletes the specified dictionary
; {{dictionary}} : specified dictionary (dictionary)

==== {{set-dictionary-data-dir!}}
<procedure>(set-dictionary-data-dir! path) → unspecified</procedure>
Specify the file path to the dictionaries to use; to be effective, this routine must be called before the dictionaries are opened.
; {{path}} : Filename with path

==== {{get-dictionary-data-dir}}
<procedure>(get-dictionary-data-dir) → string</procedure>
Returns the file path to the dictionaries

=== Linkages

==== {{create-linkage}}
<procedure>(create-linkage index sentence parse-options) → linkage</procedure>
This function creates the index-th linkage from the (parsed) sentence. Several operations can be carried out on the resulting linkage; for example it can be printed, post-processed with a different post-processor, or information on individual links can be extracted. If the parse has a conjunction, then the linkage will be made up of two or more sublinkages.

==== {{delete-linkage!}}
<procedure>(delete-linkage!
linkage) → unspecified</procedure>
Delete the given linkage
; {{linkage}} :

==== {{num-words}}
<procedure>(num-words linkage) → number</procedure>
The number of words in the sentence for which this is a linkage.
; {{linkage}} :

==== {{num-links}}
<procedure>(num-links linkage) → number</procedure>
The number of links used in the linkage.
; {{linkage}} :

==== {{link-length}}
<procedure>(link-length linkage index) → number</procedure>
The value returned is the number of words spanned by the index-th link of the linkage.
; {{linkage}} :
; {{index}} : (number)

==== {{get-lword}}
<procedure>(get-lword) → number</procedure>
The value returned is the number of the word on the left end of the index-th link of the current sublinkage.

==== {{get-rword}}
<procedure>(get-rword) → number</procedure>
The value returned is the number of the word on the right end of the index-th link of the current sublinkage.

==== {{link-label}}
<procedure>(link-label linkage index) → string</procedure>
The label on a link in a diagram is constructed by taking the 'intersection' of the left and right connectors that comprise the link. For example, in 'I.p eat, therefore I.p think.v', the Sp*i label on the link between the words I.p and eat is constructed from the Sp*i connector on its left word and the Sp connector on its right word. So, for this example, both link-label and link-llabel return 'Sp*i' while link-rlabel returns 'Sp' for this link.
; {{linkage}} :
; {{index}} :

==== {{link-llabel}}
<procedure>(link-llabel linkage index) → string</procedure>
See link-label
; {{linkage}} :
; {{index}} :

==== {{link-rlabel}}
<procedure>(link-rlabel linkage index) → string</procedure>
See link-label
; {{linkage}} :
; {{index}} :

==== {{num-domains}}
<procedure>(num-domains linkage index) → number</procedure>
num-domains and link-domain-names allow access to most of the domain structure extracted during post-processing.
The index parameter in both calls specifies which link in the linkage to extract the information for. In the 'I eat therefore I think' example above, the link between the words therefore and I.p belongs to two 'm' domains. If the linkage violated any post-processing rules, the name of the violated rule in the post-process knowledge file can be determined by a call to get-violation-name.
; {{linkage}} :
; {{index}} :

==== {{link-domain-names}}
<procedure>(link-domain-names linkage word-index) → list</procedure>
Gets the domain structure extracted during post-processing
; {{linkage}} :
; {{word-index}} : Specifies which link in the linkage to extract the information for.

==== {{get-words}}
<procedure>(get-words linkage) → list</procedure>
Returns the list of word spellings for the linkage. These are the subscripted spellings, such as 'dog.n'. The original spellings can be obtained by calls to sentence-get-word.
; {{linkage}} :

==== {{get-word}}
<procedure>(get-word linkage word-number) → string</procedure>
Returns the word spelling of an individual word
; {{linkage}} :
; {{word-number}} : The specific word

==== {{disjunct-str}}
<procedure>(disjunct-str linkage linkage word-number) → string</procedure>
Returns a string showing the disjuncts that were actually used in association with the specified word in the current linkage. The string shows the disjuncts in proper order; that is, left-to-right, in the order in which they link to other words. The returned string can be thought of as a very precise part-of-speech-like label for the word, indicating how it was used in the given sentence; this can be useful for corpus statistics.
; {{linkage}} : The specific linkage
; {{linkage}} :
; {{word-number}} : The specific word

==== {{disjunct-cost}}
<procedure>(disjunct-cost) → number</procedure>
Returns the cost of a word as used in a particular linkage, based on the dictionary.
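
As a rough illustration of how the linkage accessors documented above fit together, the following sketch prints the words of a linkage and then the label, connector labels and length of each link. It uses only procedures listed on this page; the helper name {{describe-linkage}} is hypothetical, and it is an assumption that link indices run from 0 to one less than {{(num-links linkage)}} and that {{create-linkage}} returns {{#f}} when no linkage is available (as the guard in the Example Usage suggests).

<enscript highlight="scheme">
(import scheme)
(cond-expand
  (chicken-4 (use (prefix link-grammar lg:)))
  (chicken-5 (import (prefix link-grammar lg:))))

;; Hypothetical helper, not part of the egg: print the subscripted words of
;; a linkage, then one line per link.
;; Assumption: link indices are zero-based.
(define (describe-linkage linkage)
  (print (lg:get-words linkage))
  (let loop ((i 0))
    (when (< i (lg:num-links linkage))
      (print "link " i ": " (lg:link-label linkage i)
             " (left " (lg:link-llabel linkage i)
             ", right " (lg:link-rlabel linkage i)
             ", length " (lg:link-length linkage i) ")")
      (loop (+ i 1)))))

(define dictionary (lg:create-default-dictionary))
(define opts (lg:init-opts))
(define sentence
  (lg:create-sentence "The black fox ran from the hunters" dictionary))

;; Only inspect the first linkage, and only if the parse produced one.
(when (> (lg:parse-sentence sentence opts) 0)
  (let ((linkage (lg:create-linkage 0 sentence opts)))
    (when linkage
      (describe-linkage linkage)
      (lg:delete-linkage! linkage))))

(lg:delete-sentence! sentence)
(lg:delete-parse-options! opts)
(lg:delete-dictionary! dictionary)
</enscript>

Each object is deleted explicitly once it is no longer needed, mirroring the clean-up order used in the Example Usage section.
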
==== {{disjunct-corpus-score}}
<procedure>(disjunct-corpus-score) → number</procedure>
Returns the cost based on the corpus-statistics database.

==== {{get-constituents}}
<procedure>(get-constituents linkage display-style) → string</procedure>
Returns the constituents for a particular linkage
; {{linkage}} :
; {{display-style}} : (number)

==== {{get-diagram}}
<procedure>(get-diagram linkage display-walls? screen-width) → string</procedure>
Returns the linkage diagram
; {{linkage}} :
; {{display-walls?}} : A boolean that indicates whether or not the wall-words, and the connectors to them, should be printed
; {{screen-width}} : An integer indicating the number of columns that should be used during printing; sentences that are wider than this number of columns are automatically wrapped so that they always fit.

==== {{get-postscript}}
<procedure>(get-postscript linkage display-walls? print-ps-header?) → string</procedure>
Returns the macros needed to print out the linkage in a PostScript file.
; {{linkage}} :
; {{display-walls?}} : A boolean that indicates whether or not the wall-words, and the connectors to them, should be printed
; {{print-ps-header?}} : A boolean that indicates whether or not PostScript header boilerplate should be included.

==== {{get-disjuncts}}
<procedure>(get-disjuncts linkage) → string</procedure>
Returns a string that shows all of the disjuncts, and their costs, that were used to create the linkage.
; {{linkage}} :

==== {{get-links-domains}}
<procedure>(get-links-domains linkage) → string</procedure>
Returns a string that lists all of the links and domain names for the linkage.
; {{linkage}} :

==== {{unused-word-cost}}
<procedure>(unused-word-cost linkage) → number</procedure>
Should return the same value as sentence-null-count.
; {{linkage}} :

==== {{disjunct-cost}}
<procedure>(disjunct-cost linkage) → number</procedure>
Should return the same value as sentence-disjunct-cost.
; {{linkage}} :

==== {{link-cost}}
<procedure>(link-cost linkage) → number</procedure>
Should return the same value as sentence-link-cost.
; {{linkage}} :

==== {{corpus-cost}}
<procedure>(corpus-cost linkage) → number</procedure>
Returns the total cost of this particular linkage, based on the cost of disjuncts stored in the corpus-statistics database.
; {{linkage}} :

==== {{linkage->eps-file}}
<procedure>(linkage->eps-file filename postscript) → unspecified</procedure>
Saves a linkage to a PostScript file
; {{filename}} : filename
; {{postscript}} : PostScript string

==== {{get-version}}
<procedure>(get-version) → string</procedure>
Gets link-grammar version

==== {{get-dictionary-version}}
<procedure>(get-dictionary-version dictionary) → string</procedure>
Gets dictionary version
; {{dictionary}} : Dictionary

==== {{get-dictionary-locale}}
<procedure>(get-dictionary-locale) → string</procedure>
Gets dictionary locale

==== {{display-off}}
<constant>display-off → 0</constant>
Turn off display

==== {{display-multi-line}}
<constant>display-multi-line → 1</constant>
Print diagram across multiple lines

==== {{display-bracket-tree}}
<constant>display-bracket-tree → 2</constant>
Use brackets when printing diagram

==== {{display-single-line}}
<constant>display-single-line → 3</constant>
Print diagram on single line

==== {{display-max-styles}}
<constant>display-max-styles → 3</constant>
Largest valid display-style value

==== {{set-display-morphology!}}
<procedure>(set-display-morphology! parse-options value) → unspecified</procedure>
Sets display morphology in parse-options
; {{parse-options}} :
; {{value}} : (number)

==== {{get-display-morphology}}
<procedure>(get-display-morphology parse-options) → number</procedure>
Gets display morphology value
; {{parse-options}} :

=== Parse Options

Parse-options specify the different parameters that are used to parse sentences.
Examples of the kinds of things that are controlled by parse-options include maximum parsing time and memory, whether to use null-links, and whether or not to use 'panic' mode. This data structure is passed in to the various parsing and printing routines along with the sentence.

Default values for parse-options members are:

* verbosity → 0
* linkage-limit → 10000
* min-null-count → 0
* max-null-count → 0
* null-block → 1
* islands-ok → #f
* short-length → 6
* all-short → #f
* display-short → #t
* display-word-subscripts → #t
* display-link-subscripts → #t
* display-walls → #f
* allow-null → #t
* echo-on → #f
* batch-mode → #f
* panic-mode → #f
* screen-width → 79
* display-on → #t
* display-postscript → #f
* display-bad → #f
* display-links → #f

==== {{init-opts}}
<procedure>(init-opts) → parse-options</procedure>
Initialise parse-options to default values

==== {{set-max-parse-time!}}
<procedure>(set-max-parse-time! parse-options value) → unspecified</procedure>
Set maximum parse time
; {{parse-options}} :
; {{value}} : (number)

==== {{set-linkage-limit!}}
<procedure>(set-linkage-limit! parse-options linkage-limit) → unspecified</procedure>
Set linkage limit
; {{parse-options}} :
; {{linkage-limit}} : (number)

==== {{set-short-length!}}
<procedure>(set-short-length! parse-options short-length) → unspecified</procedure>
The short-length parameter determines how long the links are allowed to be. The intended use of this is to speed up parsing by not considering very long links for most connectors, since they are very rarely used in a correct parse. An entry for UNLIMITED-CONNECTORS in the dictionary will specify which connectors are exempt from the length limit.
; {{parse-options}} :
; {{short-length}} : (number)

==== {{set-disjunct-cost!}}
<procedure>(set-disjunct-cost! parse-options disjunt-cost) → unspecified</procedure>
Determines the maximum disjunct cost used during parsing, where the cost of a disjunct is equal to the maximum cost of all of its connectors.
The default is that only disjuncts up to a cost of 2.9 are considered.
; {{parse-options}} :
; {{disjunt-cost}} :

==== {{set-min-null-count!}}
<procedure>(set-min-null-count! parse-options null-count) → unspecified</procedure>
When parsing a sentence, the parser will find all solutions having the minimum number of null links. It carries out its search in the range of null link counts between min-null-count and max-null-count. By default, the minimum and maximum number of null links is 0, so null links are not used.
; {{parse-options}} :
; {{null-count}} :

==== {{set-max-null-count!}}
<procedure>(set-max-null-count! parse-options null-count) → unspecified</procedure>
When parsing a sentence, the parser will find all solutions having the minimum number of null links. It carries out its search in the range of null link counts between min-null-count and max-null-count. By default, the minimum and maximum number of null links is 0, so null links are not used.
; {{parse-options}} :
; {{null-count}} :

==== {{reset-resources!}}
<procedure>(reset-resources! parse-options) → unspecified</procedure>
Reset acquired resources
; {{parse-options}} :

==== {{resources-exhausted?}}
<procedure>(resources-exhausted? parse-options) → boolean</procedure>
Resources are exhausted when either memory-exhausted? or timer-expired? holds.
; {{parse-options}} :

==== {{memory-exhausted?}}
<procedure>(memory-exhausted? parse-options) → number</procedure>
Checks whether the memory was exhausted during parsing
; {{parse-options}} :

==== {{timer-expired?}}
<procedure>(timer-expired? parse-options) → number</procedure>
Checks whether the timer was exceeded during parsing.
; {{parse-options}} :

==== {{set-islands-ok!}}
<procedure>(set-islands-ok! parse-options islands-ok?) → unspecified</procedure>
This option determines whether or not 'islands' of links are allowed.
; {{parse-options}} :
; {{islands-ok?}} : A boolean to indicate whether islands are allowed

==== {{set-verbosity!}}
<procedure>(set-verbosity!
parse-options verbosity-level) → unspecified</procedure>
Sets the level of detail printed to stderr/stdout about the parsing process.
; {{parse-options}} :
; {{verbosity-level}} :

==== {{get-verbosity}}
<procedure>(get-verbosity parse-options) → number</procedure>
Get the verbosity level
; {{parse-options}} :

==== {{delete-parse-options!}}
<procedure>(delete-parse-options! parse-options) → number</procedure>
Delete a parse-options object
; {{parse-options}} :

=== License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

=== About this egg

==== Author

[[/users/djireland|David Ireland]]

==== Repository

[[https://gitlab.com/maxwell79/chicken-link-grammar]]

==== License

LGPL-2.1

==== Dependencies

==== Versions

; [[https://gitlab.com/maxwell79/chicken-link-grammar/releases/tag/1.6|1.6]] :

==== Colophon

Documented by [[/egg/hahn|hahn]].