Documentation for txt2docbook
Thomas Weber
* Purpose
This program reads an ASCII document and converts it into a valid DocBook XML file.
** Why would one need this?
DocBook is a really cool format for writing complex technical documents. On the other hand, it is too
complicated for rather simple papers, because one has to write too much !overhead! to get a nicely
formatted result.
With this small tool, you can write an ordinary _README.TXT_ like file (following certain simple rules),
convert it to XML and send it through one or more stylesheets to publish it.
Additionally, it is possible to use a different backend module to generate other output formats.
This is a bit odd, because one of the strengths of DocBook, and of XML in general, is that it can
easily be converted to various formats by applying an XSL stylesheet. For special purposes, however,
it is feasible to take the !shortcut! through the Perl backend.
** But it does not support a special feature I need!
By using this converter, you can also add any valid docbook tag into the 'source' file.
This way you are not limited to the elements it supports. Instead, you can use the full
power of docbook, freed of the nasty routine work (tagging sections, paragraphs, lists).
You can also extend/customize this program very easily to your personal needs.
** It's such a simple idea, is there no other tool like this?
In my search for a way to write well-formatted papers, I found only one program.
It is named 'APT-Convert' and it is available under the GPL from http://www.xmlmind.com/aptconvert.html.
However, it did not satisfy all my needs, so I started to write my own converter. The syntax of
the ASCII files is slightly inspired by the APT ("Almost Plain Text") format, though.
* Usage
** Requirements
All you need to run this tool is _perl_. No special modules are used.
However, to get your final document, you need to install and configure the _docbook.dtd_ , certain
stylesheets and an XSLT processor. Installing these tools is beyond the scope of this guide. Please refer
to http://www.docbook.org for more information.
** Configuration
Before you begin to use _txt2docbook_, you have to set the system identifier of the DocBook DTD
in the file _output.pl_:
$SYSTEMIDENTIFIER="/your/path/to/docbookx.dtd";
or:
$SYSTEMIDENTIFIER="http://your.host/dtds/docbookx.dtd";
It is possible to omit the identifier by commenting it out (not recommended).
An XML validator can't check the XML file if there is no identifier available!
Some XSLT processors will also refuse to transform the file.
** Converting
txt2docbook [inputfile]
The program parses the input file and sends the resulting XML to _STDOUT_ . Hence you can redirect it into a file
or pipe it right into your XML processor.
By setting the -o switch, you can use an alternative output module. The module must reside in the same directory as the main
script and its name must follow this rule: output_FORMATNAME.pl (e.g. output_html.pl, output_foo.pl).
Thus
txt2docbook -o html [inputfile]
will generate an HTML file from your source document.
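The naming rule for output modules can be sketched in a few lines. This is an illustrative Python sketch, not the tool's own Perl code: the function name 'backend_module_path' is hypothetical, and the idea that running without -o selects _output.pl_ is an assumption based on the list of program parts in the Customization section.

```python
import os

def backend_module_path(script_dir, fmt=None):
    """Return the path of the output module selected by the -o switch.

    Assumption: without -o the default module output.pl is used; with
    -o FORMATNAME the module is output_FORMATNAME.pl. Either way the
    module is expected next to the main script.
    """
    name = "output.pl" if fmt is None else "output_%s.pl" % fmt
    return os.path.join(script_dir, name)

# Hypothetical installation directory, for illustration only.
print(backend_module_path("/opt/txt2docbook", "html"))
```

A real implementation would also have to report a missing module file, which this sketch leaves out.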
* Syntax
As mentioned before, the ASCII format has to follow some syntax rules in order to be converted correctly to
an XML file. The current implementation supports the following elements:
[paragraphs] To write the text
[sections] Split your document into several parts
[item lists] Simple list with bullets.
[variable lists] Term + explanation of the term.
This block uses a _varlist_ , for example.
[markup] Markup words
[tags] Usage of docbook tags
In the next sections, you can learn more about using each of these features.
** The document head
This converter always generates a DocBook !article! document. An article needs a title. To set it, the converter
uses !the very first line of text! of the input file. So the first line is not a section but the title of the
document. In addition, the second line of the source file is converted to the author's first name and surname tags. If you
don't want to set the author name, simply leave the second line of your document empty.
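The handling of the first two lines can be pictured as follows. This is a minimal Python sketch under stated assumptions, not the converter's Perl code: the function name 'parse_head' is hypothetical, and splitting the author line at the last space into first name and surname is a guess at how the two tags are filled.

```python
def parse_head(lines):
    """Split the first two lines of a source file into title and author.

    Assumption: the author line reads "Firstname Surname" and the last
    word is taken as the surname. An empty second line means no author.
    """
    title = lines[0].strip()
    author = lines[1].strip() if len(lines) > 1 else ""
    if not author:
        return title, None
    parts = author.split()
    firstname = " ".join(parts[:-1])
    surname = parts[-1]
    return title, (firstname, surname)

print(parse_head(["Documentation for txt2docbook", "Thomas Weber"]))
```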
** Paragraphs
The basic building blocks of a text are paragraphs. You start a new paragraph by closing the previous one
with an empty line after the last line of text.
First paragraph
multilined
still goes on here
next paragraph
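The paragraph rule amounts to splitting the text at empty lines. The following Python sketch illustrates the idea only; the function name 'split_paragraphs' is hypothetical, and joining the lines of a paragraph with single spaces is an assumption about the output.

```python
def split_paragraphs(text):
    """Group consecutive non-empty lines into paragraphs."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # An empty line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs

sample = "First paragraph\nmultilined\nstill goes on here\n\nnext paragraph\n"
print(split_paragraphs(sample))
```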
** Sections
You can use two different methods to
split your text into several sections.
*** Asterisk (*) marker
* section level 1
** section level 2
*** section level 3
* next level 1 section
** not indented level 2 section
If you choose this method, you are freed from counting the section numbers. Moving
sections is effortless. An appropriate stylesheet will generate the section numbers later.
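Recognizing an asterisk heading comes down to counting the leading asterisks. This Python sketch is an illustration, not the parser used by the tool: the names 'ASTERISK_HEADING' and 'asterisk_section' are hypothetical, and tolerating leading whitespace is an assumption.

```python
import re

# Optional indentation, one or more asterisks, whitespace, then the title.
ASTERISK_HEADING = re.compile(r"^\s*(\*+)\s+(.*\S)\s*$")

def asterisk_section(line):
    """Return (level, title) for an asterisk heading, else None."""
    m = ASTERISK_HEADING.match(line)
    if m is None:
        return None
    # The number of asterisks is the section level.
    return len(m.group(1)), m.group(2)

print(asterisk_section("** section level 2"))
```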
*** Dotted numbers
1. section level 1
1.2 section level 2
1.3. also section level 2
1.3.1 section level 3
2. next level 1 section
2.1. indenting is not needed
You mark a new section by putting numbers in the first columns of a line. By using dots between the
numbers, you can denote subsections. It makes no difference whether you use a trailing dot or not.
Using numbers to mark sections is an advantage for small _README_ like documents: both the ASCII and
the DocBook version will have section numbers.
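With dotted numbers, the section level is the count of number groups. The Python sketch below illustrates this under stated assumptions: the names 'DOTTED_HEADING' and 'dotted_section' are hypothetical, leading whitespace is tolerated as an assumption, and the actual number values are ignored because only the depth matters here.

```python
import re

# Number groups separated by dots, an optional trailing dot, then the title.
DOTTED_HEADING = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(.*\S)\s*$")

def dotted_section(line):
    """Return (level, title) for a dotted-number heading, else None.

    Note: a bare number followed by text also matches, because the
    trailing dot is optional.
    """
    m = DOTTED_HEADING.match(line)
    if m is None:
        return None
    # "1" has level 1, "1.3" has level 2, "1.3.1" has level 3.
    level = m.group(1).count(".") + 1
    return level, m.group(2)

print(dotted_section("1.3.1 section level 3"))
```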
** Lists
You begin a list by using one of several list item markers. See the following example:
This is text
Valid list markers are -, o, +, =
- the first list item
- another list item
- 3rd list item
goes on in this line
This is text (=list end)
o also a list item
- you can even mix the markers
+ item
= item
A list ends at the first empty line that is not followed by another item.
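The list rules (four markers, continuation lines, and the end-of-list condition) can be sketched like this. It is a simplified Python illustration, not the tool's implementation: the names 'LIST_ITEM' and 'parse_list' are hypothetical, and the input is assumed to start at the first list item.

```python
import re

# One of the markers -, o, +, = followed by whitespace and the item text.
LIST_ITEM = re.compile(r"^\s*[-o+=]\s+(.*)$")

def parse_list(lines):
    """Return (items, remaining_lines) for a list starting at lines[0]."""
    items = []
    i = 0
    while i < len(lines):
        line = lines[i]
        m = LIST_ITEM.match(line)
        if m:
            items.append(m.group(1).strip())
        elif line.strip():
            if not items:
                break  # the input does not start with a list item
            # A non-item line continues the previous item.
            items[-1] += " " + line.strip()
        else:
            # Empty line: the list goes on only if an item follows.
            following = lines[i + 1] if i + 1 < len(lines) else ""
            if not LIST_ITEM.match(following):
                break
        i += 1
    return items, lines[i:]

source = [
    "- the first list item",
    "- another list item",
    "- 3rd list item",
    "  goes on in this line",
    "",
    "This is text (=list end)",
]
print(parse_list(source)[0])
```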
** Varlists
DocBook uses an element called _variablelist_ (a _varlist_ for short) to mark up term-description lists. The basic usage is similar to
a list block: you begin the _varlist_ with the first _varlist item_ and end it with an
empty line followed by another block.
This is text
[term] Description of the term,
can have multiple lines
[next term]
Description can start also in the next line
Text continues.
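A term-description list can be collected in much the same way as an item list. The Python sketch below is a simplified illustration: the names 'VAR_ITEM' and 'parse_varlist' are hypothetical, and for brevity it ends the list at the first empty line instead of checking what follows.

```python
import re

# A term in square brackets, optionally followed by description text.
VAR_ITEM = re.compile(r"^\s*\[([^\]]+)\]\s*(.*)$")

def parse_varlist(lines):
    """Return a list of (term, description) pairs."""
    entries = []
    for line in lines:
        m = VAR_ITEM.match(line)
        if m:
            entries.append([m.group(1), m.group(2).strip()])
        elif line.strip() and entries:
            # Further lines extend the description of the last term.
            entries[-1][1] = (entries[-1][1] + " " + line.strip()).strip()
        else:
            break  # simplification: stop at the first empty line
    return [tuple(entry) for entry in entries]

source = [
    "[term] Description of the term,",
    "can have multiple lines",
    "[next term]",
    "Description can start also in the next line",
    "",
    "Text continues.",
]
print(parse_varlist(source))
```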
** Text markup
Two basic tags of DocBook are supported by this version:
- emphasis (!....!)
- filename (_...._)
A text with a !very important! message in it.
Some text with a filename like _/usr/bin/perl_ in it.
If you use URLs (xxxx://xxx.xxx.xxx) in your text, you'll find corresponding _ULINK_ tags in
the converted document.
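The three inline conversions can be sketched with regular expressions. This Python sketch is an illustration under stated assumptions, not the code in _output.pl_ : the names 'URL' and 'inline_markup' are hypothetical, the exact patterns are guesses, and trailing punctuation after a URL is assumed not to belong to it. Text outside of URLs is passed through unescaped so that literal DocBook tags survive.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# scheme://rest, ending on a character that is not sentence punctuation.
URL = re.compile(r'([A-Za-z][A-Za-z0-9+.-]*://[^\s<>"]*[^\s<>".,;:!?)])')

def inline_markup(text):
    """Convert !emphasis!, _filename_ and URLs to DocBook tags."""
    out = []
    # re.split with one capture group yields text, URL, text, URL, ...
    for i, part in enumerate(URL.split(text)):
        if i % 2:
            # URLs are wrapped as a whole, so underscores or exclamation
            # marks inside them are never treated as markup.
            out.append("<ulink url=%s>%s</ulink>" % (quoteattr(part), escape(part)))
        else:
            part = re.sub(r"!([^!]+)!", r"<emphasis>\1</emphasis>", part)
            part = re.sub(r"_([^_\s][^_]*)_", r"<filename>\1</filename>", part)
            out.append(part)
    return "".join(out)

print(inline_markup("A text with a !very important! message in it."))
```

A simple pattern like this one would also catch underscores inside ordinary words, so a real implementation needs stricter rules for where markup may start and end.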
** Tags
As mentioned before, you can use arbitrary DocBook tags in your text. There are
two ways to do this.
The first way is to simply write the tags into your text (easy, huh?).
The second way is to write a !single! tag on a line of its own. This turns off the converter for
all lines until the corresponding closing tag (</...>) is read.
This way, you can make use of _programlisting_ and other "do-not-format" blocks.
** Includes
You can include any other ascii file into the current one.
To do so, use the following syntax:
_txt2docbook_ will continue by converting the new file into the output
document and return to the parent file when finished.
The include depth is not limited.
* Customization
As soon as you know some basic stuff about the architecture of this program, it should be very easy for you to
extend/customize it.
The tool consists of only 3 parts:
- _txt2docbook.pl_
- _blocks.pm_
- _output.pl_
All 3 parts have to reside in the same directory.
*** _txt2docbook.pl_
This is the !main! part of the program. It reads the source file, parses it line by line and starts
new !blocks! as needed.
*** _blocks.pm_
For each supported DocBook element, you'll find one Perl class in here. Most of the hard work (e.g. deciding
whether a block can be closed by a certain other block) is done here. Touching this file is only needed if you
want to write a completely new feature.
*** _output.pl_
When a block is started or closed, it calls the appropriate function in this source file. All the
formatting of the output file is done here (including URLs and markup abbreviations).
Thus it will be your primary playground if you want to change the resulting tags of a parsed file.
By writing a new _output.pl_ , you can even change the output format
from docbook-xml to whatever you want.
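The separation between blocks and output can be pictured as a stream of start, end and text events that a backend turns into tags. The Python sketch below only illustrates this idea: the event format and the names 'DOCBOOK_TAGS', 'HTML_TAGS' and 'render' are hypothetical and do not describe the real interface between _blocks.pm_ and _output.pl_ .

```python
# Each backend maps a block type to its opening and closing tag.
DOCBOOK_TAGS = {
    "paragraph": ("<para>", "</para>"),
    "list": ("<itemizedlist>", "</itemizedlist>"),
    "item": ("<listitem><para>", "</para></listitem>"),
}
HTML_TAGS = {
    "paragraph": ("<p>", "</p>"),
    "list": ("<ul>", "</ul>"),
    "item": ("<li>", "</li>"),
}

def render(events, tags):
    """Turn (kind, value) events into output using one backend table."""
    out = []
    for kind, value in events:
        if kind == "start":
            out.append(tags[value][0])
        elif kind == "end":
            out.append(tags[value][1])
        else:  # "text": the value is passed through as it is
            out.append(value)
    return "".join(out)

EVENTS = [
    ("start", "list"),
    ("start", "item"),
    ("text", "the first list item"),
    ("end", "item"),
    ("end", "list"),
]
print(render(EVENTS, DOCBOOK_TAGS))
print(render(EVENTS, HTML_TAGS))
```

The same events produce DocBook or HTML depending only on the table passed in, which is the property that makes a replaceable output module possible.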
*** _output_html.pl_
Use this alternative backend to generate HTML instead of XML. Don't expect a fancy design of the output file at
this time.
* License
&LICENSE.txt;