Text::Diff - Perform diffs on files and record sets
use Text::Diff;
## Mix and match filenames, strings, file handles, producer subs,
## or arrays of records; returns diff in a string.
## WARNING: can return B<large> diffs for large files.
my $diff = diff "file1.txt", "file2.txt", { STYLE => "Context" };
my $diff = diff \$string1, \$string2, \%options;
my $diff = diff \*FH1, \*FH2;
my $diff = diff \&reader1, \&reader2;
my $diff = diff \@records1, \@records2;
## May also mix input types:
my $diff = diff \@records1, "file_B.txt";
diff() provides a basic set of services akin to the GNU diff utility. It
is not anywhere near as feature complete as GNU diff , but it is better
integrated with Perl and available on all platforms. It is often faster than
shelling out to a system's diff executable for small files, and generally
slower on larger files.
Relies on the Algorithm::Diff manpage for, well, the algorithm. This may not produce
the same exact diff as a system's local diff executable, but it will be a
valid diff and comprehensible by patch . We haven't seen any differences
between the Algorithm::Diff manpage's logic and GNU diff 's, but we have not examined
them to make sure they are indeed identical.
Note: If you don't want to import the diff function, do one of the
following:
use Text::Diff ();
require Text::Diff;
That's a pretty rare occurrence,
so diff() is exported by default.
If you pass a filename, but the file can't be read,
then diff() will croak .
diff() takes two parameters from which to draw input and a set of
options to control its output. The options are:
- FILENAME_A, MTIME_A, FILENAME_B, MTIME_B
-
The name of the file and the modification time ``files''.
These are filled in automatically for each file when diff() is passed a
filename, unless a defined value is passed in.
If a filename is not passed in and FILENAME_A and FILENAME_B are not provided
or are undef , the header will not be printed.
Unused on OldStyle diffs.
- OFFSET_A, OFFSET_B
-
The index of the first line / element. These default to 1 for all
parameter types except ARRAY references, for which the default is 0. This
is because ARRAY references are presumed to be data structures, while the
others are line-oriented text.
- STYLE
-
``Unified'', ``Context'', ``OldStyle'', or an object or class reference for a class
providing
file_header() , hunk_header() , hunk() , hunk_footer() and
file_footer() methods. The two footer() methods are provided for
overloading only; none of the formats provide them.
Defaults to ``Unified'' (unlike standard diff , but Unified is what's most
often used in submitting patches and is the most human readable of the three.
If the package indicated by the STYLE has no hunk() method, diff() will
load it automatically (lazy loading). Since all such packages should inherit
from Text::Diff::Base , this should be marvy.
Styles may be specified as class names (STYLE => 'Foo' ),
in which case they will be new() ed with no parameters,
or as objects (STYLE => Foo->new ).
- CONTEXT
-
How many lines before and after each diff to display. Ignored on old-style
diffs. Defaults to 3.
- OUTPUT
-
Examples and their equivalent subroutines:
OUTPUT => \*FOOHANDLE, # like: sub { print FOOHANDLE shift() }
OUTPUT => \$output, # like: sub { $output .= shift }
OUTPUT => \@output, # like: sub { push @output, shift }
OUTPUT => sub { $output .= shift },
If no OUTPUT is supplied, returns the diffs in a string. If
OUTPUT is a CODE ref, it will be called once with the (optional)
file header, and once for each hunk body with the text to emit. If
OUTPUT is an the IO::Handle manpage, output will be emitted to that handle.
- FILENAME_PREFIX_A, FILENAME_PREFIX_B
-
The string to print before the filename in the header. Unused on
OldStyle
diffs. Defaults are "---" , "+++" for Unified and "***" , "+++" for
Context.
- KEYGEN, KEYGEN_ARGS
-
These are passed to traverse_sequences in the Algorithm::Diff manpage.
Note: if neither FILENAME_ option is defined, the header will not be
printed. If at least one is present, the other and both MTIME_ options must
be present or ``Use of undefined variable'' warnings will be generated (except
on OldStyle diffs, which ignores these options).
These functions implement the output formats. They are grouped in to classes
so diff() can use class names to call the correct set of output routines and
so that you may inherit from them easily. There are no constructors or
instance methods for these classes, though subclasses may provide them if need
be.
Each class has file_header() , hunk_header() , hunk() , and footer()
methods identical to those documented in the Text::Diff::Unified section.
header() is called before the hunk() is first called, footer()
afterwards. The default footer function is an empty method provided for
overloading:
sub footer { return "End of patch\n" }
Some output formats are provided by external modules (which are loaded
automatically), such as the Text::Diff::Table manpage. These are
are documented here to keep the documentation simple.
Returns ``'' for all methods (other than new() ).
--- A Mon Nov 12 23:49:30 2001
+++ B Mon Nov 12 23:49:30 2001
@@ -2,13 +2,13 @@
2
3
4
-5d
+5a
6
7
8
9
+9a
10
11
-11d
12
13
- Text::Diff::Unified::file_header
-
$s = Text::Diff::Unified->file_header( $options );
Returns a string containing a unified header. The sole parameter is the
options hash passed in to diff() , containing at least:
FILENAME_A => $fn1,
MTIME_A => $mtime1,
FILENAME_B => $fn2,
MTIME_B => $mtime2
May also contain
FILENAME_PREFIX_A => "---",
FILENAME_PREFIX_B => "+++",
to override the default prefixes (default values shown).
- Text::Diff::Unified::hunk_header
-
Text::Diff::Unified->hunk_header( \@ops, $options );
Returns a string containing the heading of one hunk of unified diff.
- Text::Diff::Unified::hunk
-
Text::Diff::Unified->hunk( \@seq_a, \@seq_b, \@ops, $options );
Returns a string containing the output of one hunk of unified diff.
+--+----------------------------------+--+------------------------------+
| |../Test-Differences-0.2/MANIFEST | |../Test-Differences/MANIFEST |
| |Thu Dec 13 15:38:49 2001 | |Sat Dec 15 02:09:44 2001 |
+--+----------------------------------+--+------------------------------+
| | * 1|Changes *
| 1|Differences.pm | 2|Differences.pm |
| 2|MANIFEST | 3|MANIFEST |
| | * 4|MANIFEST.SKIP *
| 3|Makefile.PL | 5|Makefile.PL |
| | * 6|t/00escape.t *
| 4|t/00flatten.t | 7|t/00flatten.t |
| 5|t/01text_vs_data.t | 8|t/01text_vs_data.t |
| 6|t/10test.t | 9|t/10test.t |
+--+----------------------------------+--+------------------------------+
This format also goes to some pains to highlight ``invisible'' characters on
differing elements by selectively escaping whitespace:
+--+--------------------------+--------------------------+
| |demo_ws_A.txt |demo_ws_B.txt |
| |Fri Dec 21 08:36:32 2001 |Fri Dec 21 08:36:50 2001 |
+--+--------------------------+--------------------------+
| 1|identical |identical |
* 2| spaced in | also spaced in *
* 3|embedded space |embedded tab *
| 4|identical |identical |
* 5| spaced in |\ttabbed in *
* 6|trailing spaces\s\s\n |trailing tabs\t\t\n *
| 7|identical |identical |
* 8|lf line\n |crlf line\r\n *
* 9|embedded ws |embedded\tws *
+--+--------------------------+--------------------------+
See the Text::Diff::Table manpage for more details, including how the whitespace
escaping works.
*** A Mon Nov 12 23:49:30 2001
--- B Mon Nov 12 23:49:30 2001
***************
*** 2,14 ****
2
3
4
! 5d
6
7
8
9
10
11
- 11d
12
13
--- 2,14 ----
2
3
4
! 5a
6
7
8
9
+ 9a
10
11
12
13
Note: hunk_header() returns only ``***************\n''.
5c5
< 5d
---
> 5a
9a10
> 9a
12d12
< 11d
Note: no file_header() .
Must suck both input files entirely in to memory and store them with a normal
amount of Perlish overhead (one array location) per record. This is implied by
the implementation of the Algorithm::Diff manpage, which takes two arrays. If
the Algorithm::Diff manpage ever offers an incremental mode, this can be changed
(contact the maintainers of the Algorithm::Diff manpage and Text::Diff if you need
this; it shouldn't be too terribly hard to tie arrays in this fashion).
Does not provide most of the more refined GNU diff options: recursive
directory tree scanning, ignoring blank lines / whitespace, etc., etc. These
can all be added as time permits and need arises, many are rather easy; patches
quite welcome.
Uses closures internally, this may lead to leaks on Perl versions 5.6.1 and
prior if used many times over a process' life time.
the Algorithm::Diff manpage - the underlying implementation of the diff algorithm
used by Text::Diff .
the YAML::Diff manpage - find difference between two YAML documents.
the HTML::Differences manpage - find difference between two HTML documents.
This uses a more sane approach than the HTML::Diff manpage.
the XML::Diff manpage - find difference between two XML documents.
the Array::Diff manpage - find the differences between two Perl arrays.
the Hash::Diff manpage - find the differences between two Perl hashes.
the Data::Diff manpage - find difference between two arbitrary data structures.
https://github.com/neilbowers/Text-Diff
Adam Kennedy <adamk@cpan.org>
Barrie Slaymaker <barries@slaysys.com>
Some parts copyright 2009 Adam Kennedy.
Copyright 2001 Barrie Slaymaker. All Rights Reserved.
You may use this under the terms of either the Artistic License or GNU Public
License v 2.0 or greater.
|