npdiff

Compute the longest common subsequence of two sequences, and generate a script to transform one sequence into the other.

  1. npdiff
  2. Documentation
    1. Data types
    2. Procedures
  3. About this egg
    1. Author
    2. Repository
    3. Version history
    4. License

Documentation

The npdiff library is an implementation of an algorithm for sequence comparison that has a worst-case running time of O(N*P), where P is the number of deletions required by the shortest edit script between the two sequences. The algorithm is described in the following paper:

Sun Wu, Udi Manber, Eugene Myers, and Webb Miller. An O(NP) sequence comparison algorithm. 
In Information Processing Letters, volume 35, pages 317--323, September 1990.

This library defines a data type that describes the three basic operations of an edit script (remove, insert, change) and a procedure that implements the algorithm for sequences of an arbitrary type, provided an equality predicate for that type, and accessors for the sequence.

Data types

(define-datatype diffop diffop? ...)

Representation of the three diff operations: insert, remove, change. The target in each operation is an index or a range of indices in the sequence being transformed (target sequence), and the source is a range of indices in the sequence being read from (source sequence). The operation definitions are:

(Insert target source seq context)
Insert operation
(Remove target seq context)
Remove operation
(Change target source seqin seqout contextin contextout)
Change operation

Procedures

[procedure] npdiff:: A B [CONTEXT-LEN] -> (HUNK ... )

Performs comparison on the input sequences where A and B are two sequences to be compared, and CONTEXT-LEN is an optional context length, which is used by the HUNKS procedure to create context in the hunks of the edit script

About this egg

Author

Ivan Raikov

Repository

https://github.com/iraikov/chicken-npdiff

Version history

2.0 Ported to CHICKEN 5 and yasos collections
1.16
Added missing else clause when constructing hunks for completely different sequences [thanks to Felix Winkelmann]
1.15
Increased amount of context used for remove hunks [related to a bug discovered by Daishi Kato]
1.14
Proper handling of boundary cases when either input sequence is empty [thanks to Daishi Kato]
1.13
Make sure context is included even if the two sequences are completely different
1.12
Updated version in setup file
1.11
Documentation converted to wiki format
1.10
Eliminated dependency on matchable
1.9
Ported to Chicken 4
1.8
Added missing files to the file manifest
1.7
Removed dependence on stack egg
1.5
Build script updated for better cross-platform compatibility
1.4
eggdoc documentation fix
1.3
License upgrade to GPL v3
1.2
Added license text to source files
1.1
Documentation fixes
1.0
Initial release

License

Copyright 2007-2019 Ivan Raikov. 

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

A full copy of the GPL license can be found at
<http://www.gnu.org/licenses/>.