MSCFOSS/DIF122/Software Development Practices/Unit IV/Revision Control Systems SVN
Version Control Basics
A version control system (or revision control system) is a system that tracks incremental versions (or revisions) of files and, in some cases, directories over time. Of course, merely tracking the various versions of a user's (or group of users') files and directories isn't very interesting in itself. What makes a version control system useful is the fact that it allows you to explore the changes which resulted in each of those versions and facilitates the arbitrary recall of the same.
In this section, we'll introduce some fairly high-level version control system components and concepts. We'll limit our discussion to modern version control systems—in today's interconnected world, there is very little point in acknowledging version control systems which cannot operate across wide-area networks.
At the core of the version control system is a repository, which is the central store of that system's data. The repository usually stores information in the form of a filesystem tree—a hierarchy of files and directories. Any number of clients connect to the repository, and then read or write to these files. By writing data, a client makes the information available to others; by reading data, the client receives information from others. Figure below illustrates this.
Why is this interesting? So far, this sounds like the definition of a typical file server. And indeed, the repository is a kind of file server, but it's not your usual breed. What makes the repository special is that as the files in the repository are changed, the repository remembers each version of those files.
When a client reads data from the repository, it normally sees only the latest version of the filesystem tree. But what makes a version control client interesting is that it also has the ability to request previous states of the filesystem from the repository. A version control client can ask historical questions such as “What did this directory contain last Wednesday?” and “Who was the last person to change this file, and what changes did he make?” These are the sorts of questions that are at the heart of any version control system.
The Working Copy
A version control system's value comes from the fact that it tracks versions of files and directories, but the rest of the software universe doesn't operate on “versions of files and directories”. Most software programs understand how to operate only on a single version of a specific type of file. So how does a version control user interact with an abstract—and, often, remote—repository full of multiple versions of various files in a concrete fashion? How does his or her word processing software, presentation software, source code editor, web design software, or some other program—all of which trade in the currency of simple data files—get access to such files? The answer is found in the version control construct known as a working copy.
A working copy is, quite literally, a local copy of a particular version of a user's VCS-managed data upon which that user is free to work. Working copies appear to other software just as any other local directory full of files, so those programs don't have to be “version-control-aware” in order to read from and write to that data. The task of managing the working copy and communicating changes made to its contents to and from the repository falls squarely to the version control system's client software.
If the primary mission of a version control system is to track the various versions of digital information over time, a very close secondary mission in any modern version control system is to enable collaborative editing and sharing of that data. But different systems use different strategies to achieve this. It's important to understand these different strategies, for a couple of reasons. First, it will help you compare and contrast existing version control systems, in case you encounter other systems similar to Subversion. Beyond that, it will also help you make more effective use of Subversion, since Subversion itself supports a couple of different ways of working.
The problem of file sharing
All version control systems have to solve the same fundamental problem: how will the system allow users to share information, but prevent them from accidentally stepping on each other's feet? It's all too easy for users to accidentally overwrite each other's changes in the repository.
Consider the scenario shown in Figure below. Suppose we have two coworkers, Harry and Sally. They each decide to edit the same repository file at the same time. If Harry saves his changes to the repository first, it's possible that (a few moments later) Sally could accidentally overwrite them with her own new version of the file. While Harry's version of the file won't be lost forever (because the system remembers every change), any changes Harry made won't be present in Sally's newer version of the file, because she never saw Harry's changes to begin with. Harry's work is still effectively lost—or at least missing from the latest version of the file—and probably by accident. This is definitely a situation we want to avoid!
The lock-modify-unlock solution
Many version control systems use a lock-modify-unlock model to address the problem of many authors clobbering each other's work. In this model, the repository allows only one person to change a file at a time. This exclusivity policy is managed using locks. Harry must “lock” a file before he can begin making changes to it. If Harry has locked a file, Sally cannot also lock it, and therefore cannot make any changes to that file. All she can do is read the file and wait for Harry to finish his changes and release his lock. After Harry unlocks the file, Sally can take her turn by locking and editing the file. Figure below demonstrates this simple solution.
The problem with the lock-modify-unlock model is that it's a bit restrictive and often becomes a roadblock for users:
- Locking may cause administrative problems. Sometimes Harry will lock a file and then forget about it. Meanwhile, because Sally is still waiting to edit the file, her hands are tied. And then Harry goes on vacation. Now Sally has to get an administrator to release Harry's lock. The situation ends up causing a lot of unnecessary delay and wasted time.
- Locking may cause unnecessary serialization. What if Harry is editing the beginning of a text file, and Sally simply wants to edit the end of the same file? These changes don't overlap at all. They could easily edit the file simultaneously, and no great harm would come, assuming the changes were properly merged together. There's no need for them to take turns in this situation.
- Locking may create a false sense of security. Suppose Harry locks and edits file A, while Sally simultaneously locks and edits file B. But what if A and B depend on one another, and the changes made to each are semantically incompatible? Suddenly A and B don't work together anymore. The locking system was powerless to prevent the problem—yet it somehow provided a false sense of security. It's easy for Harry and Sally to imagine that by locking files, each is beginning a safe, insulated task, and thus they need not bother discussing their incompatible changes early on. Locking often becomes a substitute for real communication.
The copy-modify-merge solution
Subversion, CVS, and many other version control systems use a copy-modify-merge model as an alternative to locking. In this model, each user's client contacts the project repository and creates a personal working copy. Users then work simultaneously and independently, modifying their private copies. Finally, the private copies are merged together into a new, final version. The version control system often assists with the merging, but ultimately, a human being is responsible for making it happen correctly.
Here's an example. Say that Harry and Sally each create working copies of the same project, copied from the repository. They work concurrently and make changes to the same file A within their copies. Sally saves her changes to the repository first. When Harry attempts to save his changes later, the repository informs him that his file A is out of date. In other words, file A in the repository has somehow changed since he last copied it. So Harry asks his client to merge any new changes from the repository into his working copy of file A. Chances are that Sally's changes don't overlap with his own; once he has both sets of changes integrated, he saves his working copy back to the repository. The two figures that follows show this process.
But what if Sally's changes do overlap with Harry's changes? What then? This situation is called a conflict, and it's usually not much of a problem. When Harry asks his client to merge the latest repository changes into his working copy, his copy of file A is somehow flagged as being in a state of conflict: he'll be able to see both sets of conflicting changes and manually choose between them. Note that software can't automatically resolve conflicts; only humans are capable of understanding and making the necessary intelligent choices. Once Harry has manually resolved the overlapping changes—perhaps after a discussion with Sally—he can safely save the merged file back to the repository.
The copy-modify-merge model may sound a bit chaotic, but in practice, it runs extremely smoothly. Users can work in parallel, never waiting for one another. When they work on the same files, it turns out that most of their concurrent changes don't overlap at all; conflicts are infrequent. And the amount of time it takes to resolve conflicts is usually far less than the time lost by a locking system.
In the end, it all comes down to one critical factor: user communication. When users communicate poorly, both syntactic and semantic conflicts increase. No system can force users to communicate perfectly, and no system can detect semantic conflicts. So there's no point in being lulled into a false sense of security that a locking system will somehow prevent conflicts; in practice, locking seems to inhibit productivity more than anything else.
Subversion is an open source version control system. Using Subversion, you can record the history of source files and documents. It manages files and directories over time. A tree of files is placed into a central repository. The repository is much like an ordinary file server, except that it remembers every change ever made to files and directories.
Subversion is in the Ubuntu's main repository, so to install Subversion you can simply install the subversion package.
This step assumes you have installed above mentioned packages on your system. This section explains how to create SVN repository and access the project.
There are several typical places to put a Subversion repository; most common places are: /srv/svn, /usr/local/svn and /home/svn. For clarity's sake, we'll assume we are putting the Subversion repository in /home/svn, and your project's name is simply 'myproject'
There are also several common ways to set permissions on your repository. However, this area is the most common source of errors in installation, so we will cover it thoroughly. Typically, you should choose to create a new group called 'subversion' that will own the repository directory.
- Choose System > Administration > Users and Groups from your Ubuntu menu.
- Click the 'Manage Groups' button.
- Click the 'Add' button.
- Name the group 'subversion'
- Add yourself and www-data (the Apache user) as users to this group
- Click 'OK', then click 'Close' twice to commit your changes and exit the app.
You have to logout and login again before you are a member of the subversion group, and can do check ins.
Now issue the following commands:
$ sudo mkdir /home/svn $ cd /home/svn $ sudo mkdir myproject
The SVN repository can be created using the following command:
$ sudo svnadmin create /home/svn/myproject
And use the following commands to correct file permissions:
$ cd /home/svn $ sudo chown -R www-data:subversion myproject $ sudo chmod -R g+rws myproject
The last command sets gid for proper permissions on all new files added to your Subversion repository.
If you want to use WebDAV as an access method described below, repeat the chmod -R g+rws myproject command again. This is because svnadmin will create directories and files without group write access. This is no problem for read only access or using the custom svn protocol but when Apache tries to commit changes to the repository linux will deny it access. Also the owner and group are set as root. This can be changed by repeating the chown and chgrp commands listed above.
Subversion repositories can be accessed (checkout) through many different methods-on local disk, or through various network protocols. A repository location, however, is always a URL. Given below are the two commond methods.
- file:/// - direct repository access (on local disk)
- http:// - Access via WebDAV protocol to Subversion-aware Apache 2 web server
Direct repository access (file://)
This is the simplest of all access methods. It does not require any SVN server process to be running. This access method is used to access SVN from the same machine. The syntax is as follows:
$ svn co file:///home/svn/myproject or $ svn co file://localhost/home/svn/myproject
NOTE: Please note, if you do not specify the hostname, you must use three forward slashes (///). If you specify the hostname, you must use two forward slashes (//).
The repository permission is dependant on filesystem permission. If the user has read/write permission, he can checkout/commit the changes to the repository. If you set permissions as above, you can give new users the ability to checkout/commit by simply adding them to the Subversion group you added above.
Access via WebDAV protocol (http://)
To access the SVN repository via WebDAV protocol, you must configure your Apache 2 web server.
First install the following package libapache2-svn.
NOTE: If you have already tried to install the "dav" modules manually, package installation may fail. Simply remove all files beginning with "dav" from the mods-enabled directory, then remove and install the package again. Let the package put files in the correct place, then edit your configuration.
You must add the following snippet in your /etc/apache2/mods-available/dav_svn.conf file:
<Location /svn/myproject> DAV svn SVNPath /home/svn/myproject AuthType Basic AuthName "myproject subversion repository" AuthUserFile /etc/subversion/passwd <LimitExcept GET PROPFIND OPTIONS REPORT> Require valid-user </LimitExcept> </Location>
NOTE: The above configuration assumes that all Subversion repositories are available under /home/svn directory.
TIP: If you want the ability to browse all projects on this repository by going to the root url (http://www.serveraddress.com/svn) use the following in dav_svn.conf instead of the previous listing:
<Location /svn> DAV svn SVNParentPath /home/svn SVNListParentPath On AuthType Basic AuthName "Subversion Repository" AuthUserFile /etc/subversion/passwd <LimitExcept GET PROPFIND OPTIONS REPORT> Require valid-user </LimitExcept> </Location>
NOTE: The above configuration allows anonymous read access to the repositories. If you want to require authentication with any of the users specified within the file to access the repository remove the lines <LimitExcept ...> and </LimitExcept> (i.e. leave only the "Require valid-user" line). The previous listing would become:
<Location /svn> DAV svn SVNParentPath /home/svn SVNListParentPath On AuthType Basic AuthName "Subversion Repository" AuthUserFile /etc/subversion/passwd Require valid-user </Location>
NOTE: If a client tries to svn update which involves updating many files, the update request might result in an error Server sent unexpected return value (413 Request Entity Too Large) in response to REPORT request, because the size of the update request exceeds the limit allowed by the server. You can avoid this error by disabling the request size limit by adding the line LimitXMLRequestBody 0 between the <Location...> and </Location> lines.
Alternatively, you can allow svn access on a per-site basis. This is done by adding the previous snippet into the desired site configuration file located in /etc/apache2/sites-available/ directory.
Once you add the above lines, you must restart apache2 web server. To restart apache2 web server, you can run the following command:
sudo /etc/init.d/apache2 restart
Next, you must create /etc/subversion/passwd file. This file contains user authentication details.
If you have just installed SVN, the passwd file will not yet exist and needs to be created using the "-c" switch. Adding any users after that should be done without the "-c" switch to avoid overwriting the passwd file.
To add the first entry, ie.. to add the first user, you can run the following command:
sudo htpasswd -c /etc/subversion/passwd user_name
It prompts you to enter the password. Once you enter the password, the user is added.
To add more users after that, you can run the following command:
sudo htpasswd /etc/subversion/passwd second_user_name
If you are uncertain whether the passwd file exists, running the command below will tell you whether the file already exists:
Now, to access the repository you can run the following command:
$ svn co http://hostname/svn/myproject myproject --username user_name
It prompts you to enter the password. You must enter the password configured using htpasswd command. Once it is authenticated the project is checked out. If you encounter access denied, please remember to logout and login again for your memebership of the subversion user-group to take effect.