decatur-vote / text-diff
A minimal library that produces diffs in a fairly human readable, plain-text format. Diffs allow converting old text to new text & visa versa.
Requires (Dev)
- taeluf/code-scrawl: v0.8.x-dev
- taeluf/phtml: v0.1.x-dev
- taeluf/tester: v0.3.x-dev
Suggests
- taeluf/phtml: Used by TextDiff to beautify HTML, so that HTML is multi-line & can be diff'd.
This package is auto-updated.
Last update: 2024-10-30 01:54:12 UTC
README
PHP Text Diffs
A minimal library that produces diffs in a fairly human readable, plain-text format. Diffs allow converting old text to new text & visa versa.
Installation
composer require decatur-vote/text-diff v1.0.x-dev
or in your composer.json
{"require":{ "decatur-vote/text-diff": "v1.0.x-dev"}}
Usage
Generate a diff, and move forward or backward with the operations (opps
).
<?php
$differ = new \DecaturVote\Differ();
$old = 'old text';
// $old = $differ->fix_html($old); # optional - will beautify html using DOMDocument, via composer package taeluf/phtml
$new = 'new text';
$opps = $differ->get_opps($old, $new);
// file_put_contents('file_path.txt', $opps)
$recovered_old_text = $differ->backward($new, $opps);
$restored_new_text = $differ->forward($old, $opps);
Notes
- Takes 122ms (on my machine) to generate diffs for two 15,000 character strings and then run both
forward()
andbackward()
. Run viaphptest LargeRandomStringDiffs
phptest -test MultipleDiffs -run dup
- tests a specific diff test. See the test class.- There are 27 different diff tests that are all passing, several of which were added when bugs were found via LargeRandomStringDiffs test. Most (maybe all) bugs were because of duplicate lines not being handled.
TODO maybe?
- create diff html (I don't know how that'll work with my
move
operations, so I probably won't do this)
How it works
Overview of how the Differ works. We'll cover what diffs look like, how they're processed, THEN how they're created.
What diffs look like
This will remove lines 1 & 3, add lines 'c' & 'def' at positions 3 & 5, then move line 6 to line 4 & line 7 to line 3.
1-abc
3-baby
3+c
5+def
6>4
7>3
NOTE: 0-based line indexes. The first line is line 0
. The second line is line 1
. 1-abc
means there is a line BEFORE abc
NOTE: remove & add line operations depend on previous instructions having already been applied. 3-baby
, above, says 'Remove Line 3, which contains the text "baby".'. However, the original text has baby
as line 4. 3-baby
is processed AFTER 1-abc
... so the 4th line ("baby") becomes the 3rd line after line 1 is removed, so we get 3-baby
Processing Diffs
Forward diffs are processed from top to bottom, always in this order:
- remove lines
- add lines
- move lines
NOTE: For 6>4
, line 4 is replaced by line 6. Line 6 is unchanged, unless a different move opp overwrites line 6.
Backward diffs are processed from bottom to top:
- Move lines
- Remove lines that are listed as
add
operations (ex5+def
would be removed) - Add lines that are listed as
remove
operations (ex1-abc
would be added)
Generating Diffs
Duplicate lines are handled at each stage (most of the complexity of the algorithm comes from managing duplicate lines).
- Generate an array of lines for
$old_text
and$new_text
- Find any lines in
$old_text
that don't exist in$new_text
& generate opps for these lines to be removed - Find any lines in
$new_text
that don't exist in$old_text
& generate opps for these lines to be added - Apply the remove & add opps to the
$old_text
to create an out-of-order copy of the new text. - Compare each line of the out-of-order new text to the supplied
$new_text
. Find the correct location.