XML Introduction

  • XML - eXtensible Markup Language
  • Markup Language - a language used to annotate text to make explicit the interpretation of the text, useful for computer/human processing.
  • It is not clear from the following plain text what each value represents.
Abin 9444107528 23783120
Sarma 9444101103 23718291
Sameer 9444193010 20403030
  • The same data marked up using XML.
    <name> Abin </name>
    <mobile> 9444107528 </mobile>
    <landl> 23783120 </landl>
    <name> Sarma </name>
    <mobile> 9444101103 </mobile>
    <landl> 23718291 </landl>
    <name> Sameer </name>
    <mobile> 9444193010 </mobile>
    <landl> 20403030 </landl>
  • The annotations in angle brackets are called tags.
  • Markup Languages were primarily developed for document and database publishing.
  • XML is only a set of rules that specifies what a markup language should look like.
  • The advantage is that the same XML parser can be used for all XML-based markup languages.
  • XML itself does not define any tags. Users define a set of tags for an application and use them in their XML documents.
  • This is why XML is called "Extensible" Markup Language.
  • Examples of XML applications are:
    • XML-RPC
    • RSS
    • XHTML
    • DocBook
  • Each application typically has a Document Type Definition (DTD) that describes:
    • What tags are available
    • How these tags can be nested
    • What attributes are available for each tag
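
As a sketch of what such a DTD can look like, the phone list from earlier can be wrapped in a root element and given a small DTD. The element names phonebook and entry and the attribute group are assumptions added for illustration (the fragment above has no root element and no attributes); name, mobile and landl are the tags already used above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The DTD is embedded here as an internal subset; real applications
     usually keep it in a separate file. -->
<!DOCTYPE phonebook [
  <!-- What tags are available, and how they can be nested -->
  <!ELEMENT phonebook (entry+)>
  <!ELEMENT entry (name, mobile, landl?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT mobile (#PCDATA)>
  <!ELEMENT landl (#PCDATA)>
  <!-- What attributes are available for each tag (hypothetical attribute) -->
  <!ATTLIST entry group (family | work | other) "other">
]>
<phonebook>
  <entry group="work">
    <name>Abin</name>
    <mobile>9444107528</mobile>
    <landl>23783120</landl>
  </entry>
  <entry>
    <name>Sarma</name>
    <mobile>9444101103</mobile>
    <landl>23718291</landl>
  </entry>
  <entry>
    <name>Sameer</name>
    <mobile>9444193010</mobile>
    <landl>20403030</landl>
  </entry>
</phonebook>
```

A validating parser can check the document against these rules. Assuming the file is saved as phonebook.xml and xmllint from libxml2 is installed, xmllint --valid --noout phonebook.xml should report nothing when the document is valid.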

DocBook Overview

  • DocBook was originally intended for technical documentation related to computer hardware and software.
  • DocBook is used in several key open source projects, including GNOME, KDE, FreeBSD, and the Linux kernel.
  • DocBook does NOT describe what the content should look like.
  • DocBook describes the meaning of the content. For example, rather than specifying how a source code listing is to be visually formatted, DocBook simply says that a particular block of text is a source code listing. It is up to an external processing tool to decide which font to use, whether the code is syntax highlighted, etc.
  • A set of tools is used to convert DocBook to a presentation format such as HTML, RTF, PDF, man pages, or voice.
  • The transformation is done using XSLT, a language that describes how one XML document is to be converted into another XML document or into a human-readable format.
  • An XSLT processor reads an XSLT stylesheet and uses it to transform the input XML file. Examples of XSLT processors are:
    • xsltproc — from the GNOME project
    • xalan — from the Apache XML project
    • saxon — by Michael Kay
  • A collection of stylesheets is made available by the DocBook XSL project, maintained by Norman Walsh.
                                 DocBook XSL
                                      |
                                      v
                     DocBook XML --> xsltproc --> HTML Output
  • DocBook advantages:
    • Content is separated from presentation. Technical writers can focus on the content, and the appearance is taken care of by the stylesheets.
    • Organization-wide uniformity of document appearance: title pages, headers, footers, typography, and so on.
    • Old documents can easily be regenerated to reflect changes in the stylesheet.
    • Multiple output formats for print, web, voice, etc.
  • DocBook disadvantages:
    • The learning curve is steep
    • Experience is needed to write it quickly
      • There are hundreds of tags
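
To make the XSLT step described above concrete, here is a minimal stylesheet sketch, far simpler than the DocBook XSL stylesheets. It assumes a hypothetical input document whose root element phonebook holds entry elements with name and mobile children. These names are illustrative and not part of DocBook. The stylesheet turns that input into an HTML list.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit HTML rather than XML -->
  <xsl:output method="html" indent="yes"/>

  <!-- Template for the (hypothetical) root element: build the page skeleton -->
  <xsl:template match="/phonebook">
    <html>
      <head><title>Phone book</title></head>
      <body>
        <ul>
          <xsl:apply-templates select="entry"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- One list item per entry: "name: mobile" -->
  <xsl:template match="entry">
    <li>
      <xsl:value-of select="normalize-space(name)"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="normalize-space(mobile)"/>
    </li>
  </xsl:template>
</xsl:stylesheet>
```

If the stylesheet is saved as phonebook.xsl and the input as phonebook.xml (both hypothetical file names), it can be applied with xsltproc -o phonebook.html phonebook.xsl phonebook.xml. The DocBook XSL stylesheets work on the same principle, with templates for each DocBook tag.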

Packages to install

  • docbook-xml - standard XML documentation system, for software and systems
  • docbook-xsl - stylesheets for processing DocBook XML files to various output formats
  • xsltproc - XSLT command line processor
  • fop - XML to PDF Translator

DocBook XML

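<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
    "/usr/share/xml/docbook/schema/dtd/4.4/docbookx.dtd">
<article>
  <articleinfo><title>Docbook Demo</title></articleinfo>
  <section><title>Hello, world</title>
    <para>This is my first DocBook file.</para>
  </section>
</article>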
  • To transform using xsltproc
xsltproc -o <output-file> <stylesheet> <docbook-xml>
  • Example invocation to transform to HTML.
$ STYLESHEET=/usr/share/xml/docbook/stylesheet/nwalsh/html/docbook.xsl
$ xsltproc -o hello.html $STYLESHEET hello.xml
  • The Document Type Declaration is an instruction that associates the XML file with a DTD.
<!DOCTYPE [1]
article [2]
PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" [3]
"/usr/share/xml/docbook/schema/dtd/4.4/docbookx.dtd"> [4]
    • [1] - Tells the processor that a DTD is about to be specified.
    • [2] - Specifies the root element, for example book or article.
    • [3] - Specifies the DTD to use, by its Formal Public Identifier.
    • [4] - Path of the DTD on the local system.
  • The <articleinfo> tag provides meta information about the document, such as the title, author, revision, and date.
  • <section> starts a new section; the section title is given by the <title> tag it contains.
  • Paragraphs within a section are enclosed within <para>.
  • Itemized list
<itemizedlist>
 <listitem><para> Fruits </para></listitem>
 <listitem><para> Vegetables </para></listitem>
 <listitem><para> Spices </para></listitem>
</itemizedlist>
  • 3 column table
<table>
  <title>Mouse Mileage</title>
  <tgroup cols="3">
    ... <entry>Feet Traveled</entry> ...
  </tgroup>
</table>
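
The elements described in this section can be combined into one article. The following is a sketch only: the author, the date, the section title, the column headings other than "Feet Traveled", and all cell values are made up for illustration, since the fragments above do not give them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
    "/usr/share/xml/docbook/schema/dtd/4.4/docbookx.dtd">
<article>
  <!-- Meta information about the document -->
  <articleinfo>
    <title>Docbook Demo</title>
    <author>
      <firstname>Jane</firstname>  <!-- hypothetical author -->
      <surname>Doe</surname>
    </author>
    <date>2008-01-01</date>        <!-- hypothetical date -->
  </articleinfo>

  <section>
    <title>Lists and tables</title>
    <para>An itemized list:</para>
    <itemizedlist>
      <listitem><para>Fruits</para></listitem>
      <listitem><para>Vegetables</para></listitem>
      <listitem><para>Spices</para></listitem>
    </itemizedlist>

    <!-- A 3 column table: the header row goes in thead, data rows in tbody -->
    <table>
      <title>Mouse Mileage</title>
      <tgroup cols="3">
        <thead>
          <row>
            <entry>Month</entry>          <!-- illustrative heading -->
            <entry>Week</entry>           <!-- illustrative heading -->
            <entry>Feet Traveled</entry>
          </row>
        </thead>
        <tbody>
          <row>                           <!-- illustrative values -->
            <entry>August</entry>
            <entry>1</entry>
            <entry>100</entry>
          </row>
          <row>
            <entry>August</entry>
            <entry>2</entry>
            <entry>150</entry>
          </row>
        </tbody>
      </tgroup>
    </table>
  </section>
</article>
```

The value of cols on tgroup has to match the number of entry elements in each row.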


  • XSLT is capable of transforming one XML tree into another tree or into a simple text format.
  • XSLT cannot directly generate PDFs, because PDFs are not XML files.
  • Enter XSL-FO, an XML application that describes how data will be presented to the reader: font size, colors, line spacing, page margins, headers, and footers.
  • XSL-FO can later be converted to PDF using an FO processor. One such FO processor is Apache FOP.
$ fop -fo <fo-file> -pdf out.pdf
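
For a sense of what XSL-FO looks like, here is a minimal hand-written sketch. In practice the DocBook FO stylesheet generates a much larger document of this kind. The page size, margins, fonts, and the master name "page" are arbitrary choices made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Page geometry: size and margins -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
                           page-width="21cm" page-height="29.7cm"
                           margin-top="2cm" margin-bottom="2cm"
                           margin-left="2.5cm" margin-right="2.5cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Content flowed onto pages that use the master defined above -->
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <!-- Font size, weight, spacing and color are set per block -->
      <fo:block font-family="serif" font-size="18pt" font-weight="bold"
                space-after="12pt">Hello, world</fo:block>
      <fo:block font-family="serif" font-size="11pt" line-height="14pt"
                color="#333333">This is my first DocBook file.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Saved under a hypothetical name such as hello.fo, this can be rendered with fop -fo hello.fo -pdf hello.pdf.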


  • Save the content below as mscfoss.xml
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "/usr/share/xml/docbook/schema/dtd/4.4/docbookx.dtd">
<article>
  <title>M Sc (CS-FOSS) Online Program</title>
  <para>It is the only degree-oriented program of a leading Indian technological university focused sharply on Free/Open Source Software
(FOSS) that has revolutionised the field of computing the world over. In the process of becoming a CS Professional through this program,
one also acquires mastery over a wide range of products, tools, technologies and approaches thrown up by the FOSS movement that dominate
the global SW/IT Industry today.</para>
</article>
  • Transform using xsltproc

xsltproc -o <output-file> <stylesheet> <docbook-xml>

$ STYLESHEET=/usr/share/xml/docbook/stylesheet/nwalsh/html/docbook.xsl
$ xsltproc -o mscfoss.html $STYLESHEET mscfoss.xml
  • Convert to PDF
$ fop -c /usr/share/doc/fop/fop.xconf -xml mscfoss.xml -xsl /usr/share/xml/docbook/stylesheet/docbook-xsl/fo/docbook.xsl -param Times-Roman -param Times-Roman -pdf mscfoss.pdf