MSCFOSS/DIF122/Software Development Practices/Unit IV/Patching

From Amachu

Bug Fixing

  • Fixing a bug in software means identifying the section of code that causes the problem and modifying that section, which might involve
    • altering code
    • removing code
    • adding more lines of code
    • any other work that causes a change to the source code
  • It involves getting the version of the source in which the bug has been confirmed and making modifications to it.
  • After fixing the bug, the person who fixed it is expected to send the difference (comparison) between the unmodified version of the code and the modified version.
  • diff and patch are two command-line tools that are used to achieve this.

What Comparison Means

There are several ways to think about the differences between two files. One way to think of the differences is as a series of lines that were deleted from, inserted in, or changed in one file to produce the other file. diff compares two files line by line, finds groups of lines that differ, and reports each group of differing lines. It can report the differing lines in several formats, which have different purposes.

GNU diff can show whether files are different without detailing the differences. It also provides ways to suppress certain kinds of differences that are not important to you. Most commonly, such differences are changes in the amount of white space between words or lines. diff also provides ways to suppress differences in alphabetic case or in lines that match a regular expression that you provide. These options can accumulate; for example, you can ignore changes in both white space and alphabetic case.
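
The options described above can be tried out with a small sketch like the following. The file names a.txt and b.txt and their contents are made up for illustration; -q, -b and -i are the short forms of GNU diff's --brief, --ignore-space-change and --ignore-case options.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# Two hypothetical files that differ only in spacing and letter case.
printf 'Hello   World\nsecond line\n' > a.txt
printf 'hello World\nsecond line\n'   > b.txt

# -q reports only whether the files differ, without the details.
brief=$(diff -q a.txt b.txt) || true
echo "$brief"

# -b ignores changes in the amount of white space, -i ignores case.
# The options accumulate, so with both the files compare as equal
# and diff exits with status 0.
if diff -b -i a.txt b.txt > /dev/null; then
    result="same"
else
    result="different"
fi
echo "$result"
```

Used separately, neither -b nor -i is enough here, because the first lines differ in both spacing and case.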

Hunks

When comparing two files, diff finds sequences of lines common to both files, interspersed with groups of differing lines called hunks. Comparing two identical files yields one sequence of common lines and no hunks, because no lines differ. Comparing two entirely different files yields no common lines and one large hunk that contains all lines of both files. In general, there are many ways to match up lines between two given files. diff tries to minimize the total hunk size by finding large sequences of common lines interspersed with small hunks of differing lines.

For example, suppose the file F contains the three lines ‘a’, ‘b’, ‘c’, and the file G contains the same three lines in reverse order ‘c’, ‘b’, ‘a’. If diff finds the line ‘c’ as common, then the command ‘diff F G’ produces this output:

     1,2d0
     < a
     < b
     3a2,3
     > b
     > a

But if diff notices the common line ‘b’ instead, it produces this output:

     1c1
     < a
     ---
     > c
     3c3
     < c
     ---
     > a

It is also possible to find ‘a’ as the common line. diff does not always find an optimal matching between the files; it takes shortcuts to run faster. But its output is usually close to the shortest possible. You can adjust this tradeoff with the --minimal (-d) option.
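
The F and G example can be reproduced with a short sketch like the one below. Which of the possible matchings appears depends on the diff implementation, so no particular output is assumed here. The sketch also relies on the fact that diff exits with status 1 when the files differ, which is why the status is captured instead of being treated as an error.

```shell
work=$(mktemp -d)
cd "$work"

# F holds a, b, c; G holds the same three lines in reverse order.
printf 'a\nb\nc\n' > F
printf 'c\nb\na\n' > G

# Default matching: capture the exit status (1 means "files differ").
status=0
diff F G > default.out || status=$?
cat default.out

# -d (--minimal) asks diff to try harder to find a smaller set of changes.
diff -d F G > minimal.out || true
cat minimal.out
```

For files this small the two outputs will normally be identical; the option matters mainly for large inputs with many scattered changes.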

diff Output Formats

diff has several mutually exclusive options for output format. The following sections describe each format, illustrating how diff reports the differences between two sample input files.

Two Sample Input Files

Here are two sample files that we will use in numerous examples to illustrate the output of diff and how various options can change it.

This is the file lao:

     The Way that can be told of is not the eternal Way;
     The name that can be named is not the eternal name.
     The Nameless is the origin of Heaven and Earth;
     The Named is the mother of all things.
     Therefore let there always be non-being,
       so we may see their subtlety,
     And let there always be being,
       so we may see their outcome.
     The two are the same,
     But after they are produced,
       they have different names.

This is the file tzu:

     The Nameless is the origin of Heaven and Earth;
     The named is the mother of all things.
     
     Therefore let there always be non-being,
       so we may see their subtlety,
     And let there always be being,
       so we may see their outcome.
     The two are the same,
     But after they are produced,
       they have different names.
     They both may be called deep and profound.
     Deeper and more profound,
     The door of all subtleties!

In this example, the first hunk contains just the first two lines of lao, the second hunk contains the fourth line of lao opposing the second and third lines of tzu, and the last hunk contains just the last three lines of tzu.

Showing Differences in Their Context

Usually, when you are looking at the differences between files, you will also want to see the parts of the files near the lines that differ, to help you understand exactly what has changed. These nearby parts of the files are called the context.

GNU diff provides two output formats that show context around the differing lines: context format and unified format. It can optionally show in which function or section of the file the differing lines are found.

If you are distributing new versions of files to other people in the form of diff output, you should use one of the output formats that show context so that they can apply the diffs even if they have made small changes of their own to the files. patch can apply the diffs in this case by searching in the files for the lines of context around the differing lines; if those lines are actually a few lines away from where the diff says they are, patch can adjust the line numbers accordingly and still apply the diff correctly.

Context Format

The context output format shows several lines of context around the lines that differ. It is the standard format for distributing updates to source code.

To select this output format, use the --context[=lines] (-C lines) or -c option. The argument lines that some of these options take is the number of lines of context to show. If you do not specify lines, it defaults to three. For proper operation, patch typically needs at least two lines of context.

An Example of Context Format

Here is the output of ‘diff -c lao tzu’. Notice that up to three lines that are not different are shown around each line that is different; they are the context lines. Also notice that the first two hunks have run together, because their contents overlap.

     *** lao	2002-02-21 23:30:39.942229878 -0800
     --- tzu	2002-02-21 23:30:50.442260588 -0800
     ***************
     *** 1,7 ****
     - The Way that can be told of is not the eternal Way;
     - The name that can be named is not the eternal name.
       The Nameless is the origin of Heaven and Earth;
     ! The Named is the mother of all things.
       Therefore let there always be non-being,
         so we may see their subtlety,
       And let there always be being,
     --- 1,6 ----
       The Nameless is the origin of Heaven and Earth;
     ! The named is the mother of all things.
     ! 
       Therefore let there always be non-being,
         so we may see their subtlety,
       And let there always be being,
     ***************
     *** 9,11 ****
     --- 8,13 ----
       The two are the same,
       But after they are produced,
         they have different names.
     + They both may be called deep and profound.
     + Deeper and more profound,
     + The door of all subtleties!

An Example of Context Format with Less Context

Here is the output of ‘diff -C 1 lao tzu’. Notice that at most one context line is reported here.

     *** lao	2002-02-21 23:30:39.942229878 -0800
     --- tzu	2002-02-21 23:30:50.442260588 -0800
     ***************
     *** 1,5 ****
     - The Way that can be told of is not the eternal Way;
     - The name that can be named is not the eternal name.
       The Nameless is the origin of Heaven and Earth;
     ! The Named is the mother of all things.
       Therefore let there always be non-being,
     --- 1,4 ----
       The Nameless is the origin of Heaven and Earth;
     ! The named is the mother of all things.
     ! 
       Therefore let there always be non-being,
     ***************
     *** 11 ****
     --- 10,13 ----
         they have different names.
     + They both may be called deep and profound.
     + Deeper and more profound,
     + The door of all subtleties!

Detailed Description of Context Format

The context output format starts with a two-line header, which looks like this:

    *** from-file from-file-modification-time
    --- to-file to-file-modification-time

The time stamp normally looks like ‘2002-02-21 23:30:39.942229878 -0800’ to indicate the date, time with fractional seconds, and time zone in Internet RFC 2822 format. (The fractional seconds are omitted on hosts that do not support fractional time stamps.) However, a traditional time stamp like ‘Thu Feb 21 23:30:39 2002’ is used if the LC_TIME locale category is either ‘C’ or ‘POSIX’.

You can change the header's content with the --label=label option.

Next come one or more hunks of differences; each hunk shows one area where the files differ. Context format hunks look like this:

     ***************
     *** from-file-line-numbers ****
       from-file-line
       from-file-line...
     --- to-file-line-numbers ----
       to-file-line
       to-file-line...

If a hunk contains two or more lines, its line numbers look like ‘start,end’. Otherwise only its end line number appears. An empty hunk is considered to end at the line that precedes the hunk.

The lines of context around the lines that differ start with two space characters. The lines that differ between the two files start with one of the following indicator characters, followed by a space character:

‘!’

   A line that is part of a group of one or more lines that changed between the two files. There is a corresponding group of lines marked 
   with ‘!’ in the part of this hunk for the other file.

‘+’

   An “inserted” line in the second file that corresponds to nothing in the first file.

‘-’

   A “deleted” line in the first file that corresponds to nothing in the second file. 

If all of the changes in a hunk are insertions, the lines of from-file are omitted. If all of the changes are deletions, the lines of to-file are omitted.
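
A compact way to see the pieces of a context-format hunk is to diff two tiny files. The sketch below assumes GNU diff; old.txt, new.txt and the labels "before" and "after" are hypothetical names chosen for this illustration.

```shell
work=$(mktemp -d)
cd "$work"

# Five-line files that differ only on line 3.
printf 'one\ntwo\nthree\nfour\nfive\n' > old.txt
printf 'one\ntwo\n3\nfour\nfive\n'     > new.txt

# -C 1 keeps one line of context around the change; each --label
# replaces one header line's file name and time stamp.
diff -C 1 --label=before --label=after old.txt new.txt > ctx.out || true
cat ctx.out
# Output:
#   *** before
#   --- after
#   ***************
#   *** 2,4 ****
#     two
#   ! three
#     four
#   --- 2,4 ----
#     two
#   ! 3
#     four
```

The changed line is marked with ‘!’ in both halves of the hunk, and each half covers lines 2 through 4 of its file.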

Unified Format

The unified output format is a variation on the context format that is more compact because it omits redundant context lines. To select this output format, use the --unified[=lines] (-U lines), or -u option. The argument lines is the number of lines of context to show. When it is not given, it defaults to three.

The unified format originated with GNU diff and GNU patch; most current diff and patch implementations can now produce and apply it as well. For proper operation, patch typically needs at least three lines of context.

An Example of Unified Format

Here is the output of the command ‘diff -u lao tzu’ (see Two Sample Input Files above for the complete contents of the two files):

     --- lao	2002-02-21 23:30:39.942229878 -0800
     +++ tzu	2002-02-21 23:30:50.442260588 -0800
     @@ -1,7 +1,6 @@
     -The Way that can be told of is not the eternal Way;
     -The name that can be named is not the eternal name.
      The Nameless is the origin of Heaven and Earth;
     -The Named is the mother of all things.
     +The named is the mother of all things.
     +
      Therefore let there always be non-being,
        so we may see their subtlety,
      And let there always be being,
     @@ -9,3 +8,6 @@
      The two are the same,
      But after they are produced,
        they have different names.
     +They both may be called deep and profound.
     +Deeper and more profound,
     +The door of all subtleties!

Detailed Description of Unified Format

The unified output format starts with a two-line header, which looks like this:

     --- from-file from-file-modification-time
     +++ to-file to-file-modification-time

The time stamp looks like ‘2002-02-21 23:30:39.942229878 -0800’ to indicate the date, time with fractional seconds, and time zone. The fractional seconds are omitted on hosts that do not support fractional time stamps.

You can change the header's content with the --label=label option.

Next come one or more hunks of differences; each hunk shows one area where the files differ. Unified format hunks look like this:

     @@ from-file-line-numbers to-file-line-numbers @@
      line-from-either-file
      line-from-either-file...

If a hunk and its context contain two or more lines, its line numbers look like ‘start,count’. If they contain just one line, only that line's number appears. An empty hunk is shown with a count of 0 and is considered to end at the line that precedes the hunk.

The lines common to both files begin with a space character. The lines that actually differ between the two files have one of the following indicator characters in the left print column:

‘+’

   A line was added here to the first file.

‘-’

   A line was removed here from the first file.
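
The line-number conventions in hunk headers are easiest to see with no context lines at all. The following sketch recreates the lao and tzu sample files and prints only the ‘@@’ header lines of ‘diff -U0 lao tzu’; it assumes GNU diff.

```shell
work=$(mktemp -d)
cd "$work"

cat > lao <<'EOF'
The Way that can be told of is not the eternal Way;
The name that can be named is not the eternal name.
The Nameless is the origin of Heaven and Earth;
The Named is the mother of all things.
Therefore let there always be non-being,
  so we may see their subtlety,
And let there always be being,
  so we may see their outcome.
The two are the same,
But after they are produced,
  they have different names.
EOF

cat > tzu <<'EOF'
The Nameless is the origin of Heaven and Earth;
The named is the mother of all things.

Therefore let there always be non-being,
  so we may see their subtlety,
And let there always be being,
  so we may see their outcome.
The two are the same,
But after they are produced,
  they have different names.
They both may be called deep and profound.
Deeper and more profound,
The door of all subtleties!
EOF

# -U0 requests unified format with zero lines of context;
# grep keeps just the hunk headers.
headers=$(diff -U0 lao tzu | grep '^@@')
echo "$headers"
# Output:
#   @@ -1,2 +0,0 @@
#   @@ -4 +2,2 @@
#   @@ -11,0 +11,3 @@
```

The first header pairs a two-line range in lao with an empty range in tzu (a pure deletion), the second shows a one-line range written without a count, and the third shows an empty range in lao (a pure insertion after line 11).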

Showing Differences Side by Side

diff can produce a side by side difference listing of two files. The files are listed in two columns with a gutter between them. The gutter contains one of the following markers:

white space

   The corresponding lines are in common. That is, either the lines are identical, or the difference is ignored because of one of the --ignore options.

‘|’

   The corresponding lines differ, and they are either both complete or both incomplete.

‘<’

   The files differ and only the first file contains the line.

‘>’

   The files differ and only the second file contains the line.

‘(’

   Only the first file contains the line, but the difference is ignored.

‘)’

   Only the second file contains the line, but the difference is ignored.

‘\’

   The corresponding lines differ, and only the first line is incomplete.

‘/’

   The corresponding lines differ, and only the second line is incomplete. 

Normally, an output line is incomplete if and only if the lines that it contains are incomplete. However, when an output line represents two differing lines, one might be incomplete while the other is not. In this case, the output line is complete, but its gutter is marked ‘\’ if the first line is incomplete, ‘/’ if the second line is.

Side by side format is sometimes easiest to read, but it has limitations. It generates much wider output than usual, and truncates lines that are too long to fit. Also, it relies on lining up output more heavily than usual, so its output looks particularly bad if you use varying width fonts, nonstandard tab stops, or nonprinting characters.

Controlling Side by Side Format

The --side-by-side (-y) option selects side by side format. Because side by side output lines contain two input lines, the output is wider than usual: normally 130 print columns, which can fit onto a traditional printer line. You can set the width of the output with the --width=columns (-W columns) option. The output is split into two halves of equal width, separated by a small gutter to mark differences; the right half is aligned to a tab stop so that tabs line up. Input lines that are too long to fit in half of an output line are truncated for output.

The --left-column option prints only the left column of two common lines. The --suppress-common-lines option suppresses common lines entirely.
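
The effect of these options can be compared with a small sketch like the one below. It assumes GNU diff; the files old.txt and new.txt are invented for the illustration and differ on a single line.

```shell
work=$(mktemp -d)
cd "$work"

printf 'one\ntwo\nthree\nfour\nfive\n' > old.txt
printf 'one\ntwo\n3\nfour\nfive\n'     > new.txt

# Full side-by-side listing, 40 columns wide: one output row per line pair.
diff -y -W 40 old.txt new.txt > full.out || true
cat full.out

# --suppress-common-lines keeps only rows whose gutter marks a difference.
diff -y -W 40 --suppress-common-lines old.txt new.txt > changes.out || true
cat changes.out

# --left-column still prints every row, but shows common lines only once,
# in the left column.
diff -y -W 40 --left-column old.txt new.txt > left.out || true
cat left.out
```

With these inputs the full listing has five rows, while the suppressed listing has just the one row whose gutter contains ‘|’.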

An Example of Side by Side Format

Here is the output of the command ‘diff -y -W 72 lao tzu’.

     The Way that can be told of is n   <
     The name that can be named is no   <
     The Nameless is the origin of He        The Nameless is the origin of He
     The Named is the mother of all t   |    The named is the mother of all t
                                        >
     Therefore let there always be no        Therefore let there always be no
       so we may see their subtlety,           so we may see their subtlety,
     And let there always be being,          And let there always be being,
       so we may see their outcome.            so we may see their outcome.
     The two are the same,                   The two are the same,
     But after they are produced,            But after they are produced,
       they have different names.              they have different names.
                                        >    They both may be called deep and
                                        >    Deeper and more profound,
                                        >    The door of all subtleties!

Showing Differences Without Context

The “normal” diff output format shows each hunk of differences without any surrounding context. Sometimes such output is the clearest way to see how lines have changed, without the clutter of nearby unchanged lines (although you can get similar results with the context or unified formats by using 0 lines of context). However, this format is no longer widely used for sending out patches; for that purpose, the context format (see Context Format) and the unified format (see Unified Format) are superior. Normal format is the default for compatibility with older versions of diff and the POSIX standard. Use the --normal option to select this output format explicitly.

An Example of Normal Format

Here is the output of the command ‘diff lao tzu’. Notice that it shows only the lines that are different between the two files.

     1,2d0
     < The Way that can be told of is not the eternal Way;
     < The name that can be named is not the eternal name.
     4c2,3
     < The Named is the mother of all things.
     ---
     > The named is the mother of all things.
     > 
     11a11,13
     > They both may be called deep and profound.
     > Deeper and more profound,
     > The door of all subtleties!

Detailed Description of Normal Format

The normal output format consists of one or more hunks of differences; each hunk shows one area where the files differ. Normal format hunks look like this:

     change-command
     < from-file-line
     < from-file-line...
     ---
     > to-file-line
     > to-file-line...

There are three types of change commands. Each consists of a line number or comma-separated range of lines in the first file, a single character indicating the kind of change to make, and a line number or comma-separated range of lines in the second file. All line numbers are the original line numbers in each file. The types of change commands are:

‘lar’

   Add the lines in range r of the second file after line l of the first file. For example, ‘8a12,15’ means append lines 12–15 of file 2 
   after line 8 of file 1; or, if changing file 2 into file 1, delete lines 12–15 of file 2.

‘fct’

   Replace the lines in range f of the first file with lines in range t of the second file. This is like a combined add and delete, but 
   more compact. For example, ‘5,7c8,10’ means change lines 5–7 of file 1 to read as lines 8–10 of file 2; or, if changing file 2 into file
   1, change lines 8–10 of file 2 to read as lines 5–7 of file 1.

‘rdl’

   Delete the lines in range r from the first file; line l is where they would have appeared in the second file had they not been deleted. 
   For example, ‘5,7d3’ means delete lines 5–7 of file 1; or, if changing file 2 into file 1, append lines 5–7 of file 1 after line 3 of 
   file 2.
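
All three change commands can be seen in one run with a pair of small files. In the sketch below, old.txt and new.txt are hypothetical; the first line is deleted, one line is changed, and one line is appended.

```shell
work=$(mktemp -d)
cd "$work"

printf 'a\nb\nc\nd\ne\n' > old.txt
printf 'b\nC\nd\ne\nf\n' > new.txt

# Normal format is the default, so no format option is needed.
diff old.txt new.txt > normal.out || true
cat normal.out

# Change commands are the only lines that begin with a digit;
# content lines begin with '<', '>' or '---'.
commands=$(grep '^[0-9]' normal.out)
echo "$commands"
# Output:
#   1d0
#   3c2
#   5a5
```

Here ‘1d0’ deletes line 1 of old.txt, ‘3c2’ changes line 3 of old.txt into line 2 of new.txt, and ‘5a5’ appends line 5 of new.txt after line 5 of old.txt.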

Creating Patch

  • Assuming the two sample files shown above are saved as lao and tzu and you are in the directory containing them, the following command creates a patch file named lao_tzu.patch:
$ diff -c lao tzu > lao_tzu.patch
  • Now assume that this patch file is sent as a fix to the original author or team of the software. They would apply the change with the patch utility, either by hand or through an automated system.
  • To get a feel for it, copy lao to lao.orig and then issue the following command:
$ patch lao < lao_tzu.patch
  • Open the files lao and tzu; both will now have the same contents.
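
The round trip described above can be sketched end to end as follows. To keep the example short it uses two stand-in files, greeting.txt and greeting.fixed.txt, which are invented for this illustration; the same steps apply to lao and tzu. It assumes the diff and patch utilities are installed.

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical "buggy" file and its corrected version.
printf 'Hello\nWrold\nGoodbye\n' > greeting.txt
printf 'Hello\nWorld\nGoodbye\n' > greeting.fixed.txt

# Record the fix as a context-format patch (diff exits 1 when files differ).
diff -c greeting.txt greeting.fixed.txt > greeting.patch || true

# Keep the unmodified file around, then apply the patch to the working copy.
cp greeting.txt greeting.txt.orig
patch greeting.txt < greeting.patch

# cmp -s is silent and succeeds only if the two files are identical.
if cmp -s greeting.txt greeting.fixed.txt; then
    outcome="patched file matches the fixed version"
else
    outcome="files still differ"
fi
echo "$outcome"
```

Comparing with cmp is a quick substitute for opening both files by hand.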

Applying Patch

The 'p' directive

In practice, a patch usually applies to more than one file, often spread across several directories. This is where the slightly tricky -p option (-pN, or --strip=N) comes in: it tells patch to strip N leading directory components from each file name recorded in the patch before looking for the file. To understand why this matters,

  • imagine that you want to apply a patch to a list of files under the directory “dir1”.
  • This “dir1” directory contains another directory, “dir2”, with files that the patch has to change as well.
  • The person who created the patch recorded the file names relative to his or her own directory layout, for example dir1/dir2/file.
  • Therefore, if your working directory is “dir2”, patch will not find the files unless enough leading components are stripped from those names.
  • Thus, the general rule is to apply the patch from your top-level directory.
  • From there, -p1 is usually sufficient: it strips the first component (“dir1/”).
  • If you were in “dir2” instead, -p2 would be needed.
$ patch -p1 lao < lao_tzu.patch
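
The difference between -p1 and -p2 can be seen with a small directory tree. The sketch below builds a hypothetical dir1 containing dir2, creates a patch from the parent directory, and applies it to two untouched copies from two different working directories. The names notes.txt, fix.patch, copy_a and copy_b are assumptions made for the illustration.

```shell
work=$(mktemp -d)
cd "$work"

# Original tree and a modified tree with the same layout.
mkdir -p dir1.orig/dir2 dir1/dir2
printf 'alpha\nbeta\ngamma\n' > dir1.orig/dir2/notes.txt
printf 'alpha\nBETA\ngamma\n' > dir1/dir2/notes.txt

# The patch records the names dir1.orig/dir2/notes.txt and dir1/dir2/notes.txt.
diff -ru dir1.orig dir1 > fix.patch || true

# Two untouched copies to apply the patch to.
cp -r dir1.orig copy_a
cp -r dir1.orig copy_b

# From the top-level directory, -p1 strips "dir1/", leaving dir2/notes.txt.
(cd copy_a && patch -p1 < ../fix.patch)

# From inside dir2, -p2 strips "dir1/dir2/", leaving notes.txt.
(cd copy_b/dir2 && patch -p2 < ../../fix.patch)

grep BETA copy_a/dir2/notes.txt copy_b/dir2/notes.txt
```

Both copies end up with the changed line, which shows that the strip count has to match how far below the recorded top-level directory you are working.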

Removing patch

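The -R (--reverse) option applies a patch in reverse, which undoes a patch that was applied earlier:
$ patch -p1 -R lao < lao_tzu.patch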
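
A minimal sketch of applying a patch and then backing it out again is shown below. The files colors.txt and colors.new are hypothetical, and the patch is applied to an explicitly named file so that no -p option is needed.

```shell
work=$(mktemp -d)
cd "$work"

printf 'red\ngreen\nblue\n' > colors.txt
printf 'red\nGREEN\nblue\n' > colors.new
cp colors.txt colors.orig

# Unified-format patch that turns colors.txt into colors.new.
diff -u colors.txt colors.new > colors.patch || true

# Apply the patch, then reverse it with -R.
patch colors.txt < colors.patch
patch -R colors.txt < colors.patch

# After the reverse step the file should match the saved original again.
if cmp -s colors.txt colors.orig; then
    state="restored"
else
    state="modified"
fi
echo "$state"
```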

Create a Whole Directory Patch

  • The first step is to create a local backup of the whole directory that contains the files you are about to edit.
  • Then, after you finish your changes, create a recursive patch. Here -u selects unified format, -r recurses into subdirectories, and -N treats absent files as empty so that newly added files are included:
$ diff -urN dir.orig/ dir/ > dir.patch
  • Apply it to the whole directory as shown above, simply by making that directory your working directory and specifying -p1.
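
The whole-directory workflow can be sketched as follows. The directory contents (README and NEWS) and the name of the copy, fresh, are invented for this illustration; the diff command is the one given above.

```shell
work=$(mktemp -d)
cd "$work"

# A tiny project directory and its backup, made before any editing.
mkdir dir
printf 'version 1\n' > dir/README
cp -r dir dir.orig

# Edit an existing file and add a new one.
printf 'version 2\n' > dir/README
printf 'first news item\n' > dir/NEWS

# Because of -N, the new file NEWS is included in the patch.
diff -urN dir.orig/ dir/ > dir.patch || true

# Apply the patch to a pristine copy from inside that copy, using -p1.
cp -r dir.orig fresh
(cd fresh && patch -p1 < ../dir.patch)

# diff -r exits with status 0 only if the two trees are identical.
if diff -r dir fresh > /dev/null; then
    trees="match"
else
    trees="differ"
fi
echo "$trees"
```

Without -N, diff would only report that NEWS exists in one tree, and the patch would not create it.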