Package Management
A package management system, also called a package manager, is a collection of software tools that automates installing, upgrading, configuring, and removing software packages for a computer's operating system in a consistent manner. It typically maintains a database of software dependencies and version information to prevent software mismatches and missing prerequisites.
Packages are distributions of software, applications and data. Packages also contain metadata, such as the software's name, description of its purpose, version number, vendor, checksum, and a list of dependencies necessary for the software to run properly. Upon installation, metadata is stored in a local package database.
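Because the metadata records a checksum, a package manager can confirm that a downloaded package is complete and unmodified before installing it. The following is a minimal Python sketch that assumes the recorded checksum is a SHA-256 hex digest (Debian indexes carry such a SHA256 field, among others). The function names sha256_of and verify_package are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(path: Path, expected_sha256: str) -> bool:
    """True if the file matches the checksum recorded in the package metadata."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Reading in chunks keeps memory use constant even for large packages. A mismatch means the download is truncated or corrupted, and the package should be fetched again rather than installed.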
Package management systems are designed to save organizations time and money through remote administration and software distribution technology that eliminates the need for manual installs and updates. This is particularly useful for large enterprises running Linux and other Unix-like operating systems, which typically consist of hundreds or even thousands of distinct software packages: on a single small installation a package management system is a convenience, but at that scale it becomes essential.
Package management systems are charged with the task of organizing all of the packages installed on a system. Typical functions of a package management system include:
- Verifying file checksums to ensure correct and complete packages;
- Verifying digital signatures to authenticate the origin of packages;
- Applying file archivers to manage encapsulated files;
- Upgrading software to the latest versions, typically from a software repository;
- Grouping of packages by function to reduce user confusion;
- Managing dependencies to ensure a package is installed together with all the packages it requires. This addresses the problem known as dependency hell.
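At its core, dependency management is an ordering problem. Every prerequisite must be installed before the package that needs it, and a missing prerequisite must be reported instead of silently skipped. The following is a minimal Python sketch built on a simplified model in which each package maps to a flat list of required package names. Real Debian dependencies also carry version constraints and alternatives, and both are ignored here. The sketch also treats any cycle as an error, which is stricter than real package managers. The names install_order and DependencyError are hypothetical.

```python
from typing import Dict, List, Set


class DependencyError(Exception):
    """Raised when a prerequisite is missing or dependencies form a cycle."""


def install_order(target: str, depends: Dict[str, List[str]]) -> List[str]:
    """Return packages so that every dependency precedes its dependents.

    `depends` maps each available package to the names it requires.
    A depth-first traversal with an "in progress" set detects cycles.
    """
    order: List[str] = []
    done: Set[str] = set()
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise DependencyError(f"dependency cycle involving {name!r}")
        if name not in depends:
            raise DependencyError(f"missing prerequisite {name!r}")
        visiting.add(name)
        for dep in depends[name]:
            visit(dep)
        visiting.remove(name)
        done.add(name)
        order.append(name)  # appended only after all its dependencies

    visit(target)
    return order
```

For example, with {"app": ["libfoo", "libbar"], "libfoo": ["libc"], "libbar": ["libc"], "libc": []}, install_order("app", ...) returns ["libc", "libfoo", "libbar", "app"]. The shared dependency libc appears only once.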
Some additional challenges are met by only a few package management systems.
How Repositories Work
A repository consists of at least one directory containing some .deb packages, plus two special files: Packages.gz for the binary packages and Sources.gz for the source packages.
If your repository is listed correctly in sources.list (more on that later), apt-get will fetch the Packages.gz index if the binary packages are listed (with the deb keyword) and Sources.gz if the sources are listed (with the deb-src keyword).
Packages.gz contains the name, version, size, short and long description, and dependencies of each package, plus some additional information that is not of interest to us. This information is displayed (and used) by Debian package managers such as dselect and aptitude.
Sources.gz contains the name, version and build dependencies (the packages needed to build it) of each package, again plus some information that is not of interest to us; that information is used by apt-get source and similar tools.
There is also an optional Release file containing some information about your repository; it is used for pinning, an interesting trick I won't go into in this document. You can read more about pinning in the APT HOWTO.
Thus, once you have set up your repository, you can list and install all of your packages together with the ones in Debian; if you update a package, it'll be upgraded when the user runs apt-get update && apt-get upgrade; and every user will be able to easily see a short description and other important information about your packages.
But there's more to it. If created properly, repositories can offer different packages for each supported distribution and for each of the (currently eleven) supported architectures; apt will automatically fetch the right one for a user's machine, without him even having to know about all the different architectures. It also allows you to group your packages into components, just as Debian's packages are divided into main, non-free and contrib. So, especially if your software is cross-platform, you'll love package repositories.
How to Set Up a Repository
There are two types of repositories. In the more complex ones, the user only has to specify the base path to the repository, the distribution and the components wanted, and apt automatically fetches the packages for the right architecture, if available. In the simpler ones, the user has to specify an exact path, and apt does no magic to work out which packages are the right ones. The former are a bit more complex to set up but easier to use, and should always be used for complex and/or cross-platform repositories; the latter are simpler to set up, but should only be used for small or single-architecture repositories.
Although it is not really correct, I'll call the former Automatic Repositories and the latter Trivial Repositories.
The directory structure of an automatic repository with the standard Debian architectures and components looks like this:
Example 1. A Standard Debian Repository
(your repository root)
  |
  +-dists
     |
     |-stable
     |  |-main
     |  |  |-binary-alpha
     |  |  |-binary-arm
     |  |  |-binary-...
     |  |  +-source
     |  |-contrib
     |  |  |-binary-alpha
     |  |  |-binary-arm
     |  |  |-binary-...
     |  |  +-source
     |  +-non-free
     |     |-binary-alpha
     |     |-binary-arm
     |     |-binary-...
     |     +-source
     |
     |-testing
     |  |-main
     |  |  |-binary-alpha
     |  |  |-binary-arm
     |  |  |-binary-...
     |  |  +-source
     |  |-contrib
     |  |  |-binary-alpha
     |  |  |-binary-arm
     |  |  |-binary-...
     |  |  +-source
     |  +-non-free
     |     |-binary-alpha
     |     |-binary-arm
     |     |-binary-...
     |     +-source
     |
     +-unstable
        |-main
        |  |-binary-alpha
        |  |-binary-arm
        |  |-binary-...
        |  +-source
        |-contrib
        |  |-binary-alpha
        |  |-binary-arm
        |  |-binary-...
        |  +-source
        +-non-free
           |-binary-alpha
           |-binary-arm
           |-binary-...
           +-source
The free packages go into main; the non-free ones into non-free, and the free ones which depend on non-free ones into contrib. Debian currently supports 11 architectures; I've omitted most of them for the sake of brevity.
Each binary-* directory contains a Packages.gz and an optional Release file; each source directory contains a Sources.gz and an optional Release file. Notice that the packages do not have to be in the same directory as the index files, because the index files contain paths to the individual packages; in fact, they could be anywhere else in the repository. This makes it possible to create pools.
You are free to create as many distributions and components as you like and to name them however you wish; the ones I used in the example are simply the ones used in Debian. You could, for example, create the distributions current and beta (instead of stable, testing and unstable), and the components foo, bar, baz and qux (instead of main, contrib and non-free).
While you are free to name the components as you like, it is generally a good idea to use the standard Debian distribution names, because that is what Debian users expect.
Trivial repositories consist of one root directory and of as many subdirectories as you wish. As the users have to specify the path to the root of the repository and the relative path between the root and the directory with the index files in it, you are free to do whatever you want (even to put everything into the root of the repository; then, the relative path will be simply “/”).
Example 2. A Trivial Repository with Two Subdirectories
(your repository root)
  |
  |-binary
  +-source
Creating the Index Files
dpkg-scanpackages generates the Packages file and dpkg-scansources the Sources file.
They both send their output to stdout; thus, to generate compressed files, you can use a command chain like this one: dpkg-scanpackages arguments | gzip -9c > Packages.gz.
The two tools work the same way. Both take two arguments (in reality there are more, but I won't go into them here; see the manpages if you want to know more): the first is the directory containing the packages, and the second is the override file. Simple repositories don't need override files, so we simply pass /dev/null as the second argument.
dpkg-scanpackages scans the .deb packages; dpkg-scansources scans the .dsc files. It is therefore necessary to keep the .orig.tar.gz, .diff.gz and .dsc files of each source package together in the same directory. The .changes files are not needed.
Thus, if you have a trivial repository such as the one from Example 2, “A Trivial Repository with Two Subdirectories”, you can create the two index files as follows:
$ cd my-repository
$ dpkg-scanpackages binary /dev/null | gzip -9c > binary/Packages.gz
$ dpkg-scansources source /dev/null | gzip -9c > source/Sources.gz
If you have a repository as complex as the one in Example 1, “A Standard Debian Repository”, you'll have to write some scripts to automate this process.
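The following is a minimal Python sketch of such a script. It executes nothing. Instead, it generates the same command chain used for the trivial repository, once for every binary-* directory and once for every source directory of a layout like Example 1. The output can then be reviewed and piped to sh. The commands must be run from the repository root, so that the package paths written into the index files are relative to that root. The function name index_commands and the choice to emit commands rather than run them are assumptions, not part of any Debian tool.

```python
import shlex
from itertools import product
from typing import Iterable, List


def index_commands(dists: Iterable[str], components: Iterable[str],
                   archs: Iterable[str]) -> List[str]:
    """Build one scan command per index directory of an automatic repository.

    Paths follow dists/<dist>/<component>/binary-<arch> and
    dists/<dist>/<component>/source, relative to the repository root.
    """
    archs = list(archs)  # iterated once per (dist, component) pair
    commands: List[str] = []
    for dist, comp in product(dists, components):
        base = f"dists/{dist}/{comp}"
        for arch in archs:
            d = shlex.quote(f"{base}/binary-{arch}")
            commands.append(
                f"dpkg-scanpackages {d} /dev/null | gzip -9c > {d}/Packages.gz")
        src = shlex.quote(f"{base}/source")
        commands.append(
            f"dpkg-scansources {src} /dev/null | gzip -9c > {src}/Sources.gz")
    return commands


if __name__ == "__main__":
    # Hypothetical example: three distributions, three components, two architectures.
    for cmd in index_commands(["stable", "testing", "unstable"],
                              ["main", "contrib", "non-free"], ["alpha", "arm"]):
        print(cmd)
```

With the example arguments above, the script prints 27 commands: two binary scans plus one source scan for each of the nine distribution/component pairs.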
Creating the Release files
If you want to enable the users of your repository to use Pinning with your repository, you must include a Release file in every directory containing an index file. (You can read more about pinning in the APT HOWTO).
The Release files are simple and short text files of the following form:
Archive: archive
Component: component
Origin: YourCompany
Label: YourCompany Debian repository
Architecture: architecture
Archive: the name of the Debian distribution the packages in this directory belong to (or are designed for), e.g. stable, testing or unstable.
Component: the component of the packages in the directory, for example main, non-free, or contrib.
Origin: the name of whoever made the packages.
Label: some label suited to the packages or to your repository. Use your imagination.
Architecture: the architecture of the packages in this directory, such as i386, sparc or source.
It is important to get Archive and Architecture right, as they are the fields most used for pinning. The others are less important.
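In an automatic repository, Archive, Component and Architecture can be read straight from the directory path, so Release files are easy to generate. The following is a minimal Python sketch that assumes the dists/<distribution>/<component>/<binary-arch or source> layout from Example 1. The function name release_for is hypothetical.

```python
from pathlib import PurePosixPath

# Field order as in the Release template above.
FIELD_ORDER = ("Archive", "Component", "Origin", "Label", "Architecture")


def release_for(index_dir: PurePosixPath, origin: str, label: str) -> str:
    """Render the Release file for one index directory.

    Assumes the last three path parts are <distribution>/<component>/<leaf>,
    where <leaf> is either binary-<arch> or source.
    """
    dist, component, leaf = index_dir.parts[-3:]
    if leaf == "source":
        architecture = "source"
    elif leaf.startswith("binary-"):
        architecture = leaf[len("binary-"):]
    else:
        raise ValueError(f"not an index directory: {index_dir}")
    values = {
        "Archive": dist,
        "Component": component,
        "Origin": origin,
        "Label": label,
        "Architecture": architecture,
    }
    # One "Field: value" line per field.
    return "".join(f"{name}: {values[name]}\n" for name in FIELD_ORDER)
```

The result can be saved next to the index file, for example with Path(index_dir, "Release").write_text(text). Deriving Archive and Architecture from the path ensures the two fields most relevant for pinning always match the directory the file sits in.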
With automatic repositories, spreading the packages across all the different directories quickly leads to an unmanageable beast. It is also a waste of space and bandwidth, as many packages (for example, documentation packages) are identical for all architectures.
In these cases, a possible solution is a pool. A pool is an additional directory under the repository root containing all packages (the binaries for all architectures, distributions, and components, and all the sources). Through a smart combination of override files (which are not covered in this document) and of scripts many problems can be avoided. A nice example of a pooled repository is the Debian repository itself.
Pools are only useful for big repositories; I've never made one and I don't think I'll need to in the near future, and that's why I don't explain how to make one here. If you think that such a section should be added, feel free to write one, and contact me.
There are various tools to automate and ease the creation of Debian archives; I've listed the most notable of them here.
apt-ftparchive generates the index files (such as Packages, Sources and Release) that APT clients use. Contrary to some older descriptions, it does not move package files into an archive hierarchy like the official Debian archive; it only creates the index files. It is part of the apt-utils package.
apt-move is used to move a collection of Debian package files into a proper archive hierarchy as is used in the official Debian archive.
Using a Repository
Using a repository is very simple, but it depends on what type of repository you have made: binary or source, and automatic or trivial.
Each repository gets one line in sources.list; for a binary one, you use the deb command, and for a source one a deb-src command.
Each line has the following syntax:
deb|deb-src uri distribution [component1] [component2] [...]
The uri is the URI of the root of the repository, such as ftp://ftp.yoursite.com/debian, http://yoursite.com/debian, or, for local files, file:///home/joe/my-debian-repository. The trailing slash is optional.
For automatic repositories, you must specify one distribution and one or more components; the distribution must not end in a slash.
Example 3. Two Automatic Repositories from my sources.list
deb ftp://sunsite.cnlab-switch.ch/mirror/debian/ unstable main contrib non-free
deb-src ftp://sunsite.cnlab-switch.ch/mirror/debian/ unstable main contrib non-free
These two lines specify an automatic binary and source repository with root ftp://sunsite.cnlab-switch.ch/mirror/debian/, the distribution unstable and the components main, contrib and non-free.
If the repository is not automatic, then the distribution specifies the relative path to the index files and must end with a slash, and no components may be specified.
Example 4. Two Trivial Repositories from my sources.list
deb file:///home/aisotton/rep-exact binary/
deb-src file:///home/aisotton/rep-exact source/
The first of these two lines specifies a binary repository in /home/aisotton/rep-exact/binary on my local machine; the second specifies a source repository in /home/aisotton/rep-exact/source.
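The rules above for telling automatic and trivial entries apart can be sketched in Python. The following minimal example parses one sources.list line and builds the URLs of the index files it points to, following the layouts described in this document. It assumes the simple one-line syntax shown above, with no options in square brackets, and only the .gz index names. Real apt also understands other compression formats and signed index files. SourceLine, parse_source_line and index_urls are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceLine:
    kind: str              # "deb" (binary) or "deb-src" (source)
    uri: str               # repository root, trailing slash removed
    distribution: str
    components: List[str]

    @property
    def is_trivial(self) -> bool:
        # Trivial repositories give a relative path ending in "/".
        return self.distribution.endswith("/")


def parse_source_line(line: str) -> SourceLine:
    parts = line.split()
    if len(parts) < 3 or parts[0] not in ("deb", "deb-src"):
        raise ValueError(f"not a sources.list entry: {line!r}")
    kind, uri, dist, *components = parts
    entry = SourceLine(kind, uri.rstrip("/"), dist, components)
    if entry.is_trivial and components:
        raise ValueError("trivial repositories take no components")
    if not entry.is_trivial and not components:
        raise ValueError("automatic repositories need at least one component")
    return entry


def index_urls(src: SourceLine, arch: str) -> List[str]:
    """URLs of the compressed index files for this entry and architecture."""
    name = "Packages.gz" if src.kind == "deb" else "Sources.gz"
    if src.is_trivial:
        # The distribution is the relative path to the index files.
        return [f"{src.uri}/{src.distribution}{name}"]
    leaf = f"binary-{arch}" if src.kind == "deb" else "source"
    return [f"{src.uri}/dists/{src.distribution}/{c}/{leaf}/{name}"
            for c in src.components]
```

For the first line of Example 3 and the architecture i386, index_urls returns three URLs (one per component), starting with ftp://sunsite.cnlab-switch.ch/mirror/debian/dists/unstable/main/binary-i386/Packages.gz. For the second line of Example 4, it returns file:///home/aisotton/rep-exact/source/Sources.gz.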