npdiff
Compute the longest common subsequence of two sequences, and generate a script to transform one sequence into the other.
Documentation
The npdiff library is an implementation of an algorithm for sequence comparison that has a worst-case running time of O(N*P), where P is the number of deletions required by the shortest edit script between the two sequences. The algorithm is described in the following paper:
Sun Wu, Udi Manber, Eugene Myers, and Webb Miller. An O(NP) sequence comparison algorithm. In Information Processing Letters, volume 35, pages 317--323, September 1990.
This library defines a data type that describes the three basic operations of an edit script (remove, insert, change) and a procedure that implements the algorithm for sequences of an arbitrary type, provided an equality predicate for that type, and accessors for the sequence.
Data types
(define-datatype diffop diffop? ...)
Representation of the three diff operations: insert, remove, change. The target in each operation is an index or a range of indices in the sequence being transformed (target sequence), and the source is a range of indices in the sequence being read from (source sequence). The operation definitions are:
- (Insert target source seq context)
- Insert operation
- TARGET is the index at which the insertion takes place;
- SOURCE is a range in the form (x . y) that specifies the indices of those elements in the source sequence that are inserted in the target sequence;
- SEQ is the target sequence, and
- CONTEXT is an optional context provided in the form (BEFORE . AFTER) where BEFORE is a sequence of elements from the source preceding the elements being inserted, and AFTER is a sequence of elements from the source that follow the elements being inserted.
- (Remove target seq context)
- Remove operation
- TARGET is a range of elements in the form (x . y) that are being deleted from the target sequence;
- SEQ is the target sequence, and
- CONTEXT is an optional context provided in the form (BEFORE . AFTER) where BEFORE is a sequence of elements from the target preceding the elements being deleted, and AFTER is a sequence of elements from the target that follow the elements being deleted.
- (Change target source seqin seqout contextin contextout)
- Change operation
- TARGET is a range of elements in the form (x . y) that are being modified in the target sequence;
- SOURCE is a range of elements in the form (x . y) from the source sequence that are replacing the specified elements in the target sequence;
- SEQIN is the source sequence, and SEQOUT is the target sequence;
- CONTEXTOUT is an optional context provided in the form (BEFORE . AFTER) where BEFORE is a sequence of elements from the target preceding the elements being changed (SEQOUT), and AFTER is a sequence of element from the target that follow the elements being changed (SEQOUT); CONTEXTIN is an optional context provided in the form (BEFORE . AFTER) where BEFORE is a sequence of elements from the source preceding the replacement elements (SEQIN), and AFTER is a sequence of elements from the source that follow the replacement elements (SEQIN);
Procedures
[procedure] npdiff:: A B [CONTEXT-LEN] -> (HUNK ... )Performs comparison on the input sequences where A and B are two sequences to be compared, and CONTEXT-LEN is an optional context length, which is used by the HUNKS procedure to create context in the hunks of the edit script
About this egg
Author
Repository
https://github.com/iraikov/chicken-npdiff
Version history
- 2.0 Ported to CHICKEN 5 and yasos collections
- 1.16
- Added missing else clause when constructing hunks for completely different sequences [thanks to Felix Winkelmann]
- 1.15
- Increased amount of context used for remove hunks [related to a bug discovered by Daishi Kato]
- 1.14
- Proper handling of boundary cases when either input sequence is empty [thanks to Daishi Kato]
- 1.13
- Make sure context is included even if the two sequences are completely different
- 1.12
- Updated version in setup file
- 1.11
- Documentation converted to wiki format
- 1.10
- Eliminated dependency on matchable
- 1.9
- Ported to Chicken 4
- 1.8
- Added missing files to the file manifest
- 1.7
- Removed dependence on stack egg
- 1.5
- Build script updated for better cross-platform compatibility
- 1.4
- eggdoc documentation fix
- 1.3
- License upgrade to GPL v3
- 1.2
- Added license text to source files
- 1.1
- Documentation fixes
- 1.0
- Initial release
License
Copyright 2007-2019 Ivan Raikov. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. A full copy of the GPL license can be found at <http://www.gnu.org/licenses/>.