Document Conversion

Documents are submitted to us in many different formats. As a result, we have extensive experience in converting between formats, and we have built a wide-ranging library of conversion scripts. This section gives a short introduction to our conversion method and an overview of the formats that are most common in our work.

Perl scripts

For document conversion, we use self-written scripts in the scripting language Perl. Perl is a programming language that was developed by Larry Wall in the late eighties and is known as the Swiss army knife of programming languages due to its versatility. Because Perl offers extensive facilities for pattern matching and replacement (by means of so-called regular expressions), the language is very well suited to document conversion. Another advantage of Perl is that it supports Unicode.
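As an illustration only, the kind of pattern matching and replacement described above can be sketched in a few lines. The sketch is written in Python rather than Perl and is not taken from the script library itself; the function name to_tex_quotes is hypothetical. It rewrites straight typewriter quotation marks into the paired quotes that TeX expects.

```python
import re


def to_tex_quotes(text: str) -> str:
    """Replace "quoted" passages with TeX-style ``quoted'' passages."""
    # The group ([^"]*) captures everything between two straight quotes;
    # \1 in the replacement puts the captured text back.
    return re.sub(r'"([^"]*)"', r"``\1''", text)
```

Applied to the string He said "hello" twice. the function returns He said ``hello'' twice. Real input needs more care (nested or unbalanced quotes, for instance), so this should be read as a sketch of the technique, not as a complete rule.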

RTF and Word

Modern word processors can save files as Rich Text Format (RTF). This format was developed by Microsoft to facilitate file exchange. A file that is saved as RTF contains the complete text of the original file in plain ASCII text, together with so-called control codes that mark layout and structural information such as footnotes, pictures, cross-references and index entries. RTF is a well-structured and well-documented format, which makes it very suitable for conversion to other formats.

First of all, we have extensive experience with the conversion of RTF to TeX, since most of the books we typeset are submitted in MS Word format. MS Word itself can save files as RTF. We then convert these files with our Perl scripts into documents that TeX can read. Aside from the Word format, we can also convert less common formats, such as WordPerfect, by way of RTF. In this conversion process, we manually determine which layout information is relevant for the final result, in other words, which control codes need to be preserved. For example, information about the type area and baseline skip is generally useless, while italics and the settings of non-Latin fonts are relevant. Using this method, the conversion to TeX results in ‘clean’ files: only the relevant information is left.
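The selection of control codes described above can be sketched as follows. This is a simplified illustration in Python, not one of the Perl scripts themselves, and the function name rtf_fragment_to_tex is hypothetical. It assumes a small RTF fragment without font tables, pictures or escaped characters such as \'e9, and it does not escape characters that are special in TeX. It keeps italics (\i, \i0) and paragraph breaks (\par) and drops every other control word, such as the line-spacing code \sl.

```python
import re

# One token is a control word (with an optional numeric parameter and one
# optional delimiting space), a group brace, or a run of plain text.
TOKEN = re.compile(r"\\([a-z]+)(-?\d+)? ?|([{}])|([^\\{}]+)")


def rtf_fragment_to_tex(rtf: str) -> str:
    out = []
    saved = []      # italic state saved at each "{" and restored at "}"
    italic = False

    def set_italic(on: bool) -> None:
        nonlocal italic
        if on and not italic:
            out.append(r"\textit{")
        elif italic and not on:
            out.append("}")
        italic = on

    for word, param, brace, text in TOKEN.findall(rtf):
        if brace == "{":
            saved.append(italic)
        elif brace == "}":
            set_italic(saved.pop() if saved else False)
        elif word == "i":
            set_italic(param != "0")   # \i switches on, \i0 switches off
        elif word == "par":
            out.append("\n\n")         # paragraph break
        elif text:
            out.append(text)
        # Any other control word (\sl, \margl, ...) is silently dropped.
    set_italic(False)                  # close italics left open at the end
    return "".join(out)
```

For the fragment {\rtf1\ansi\sl360 Plain {\i slanted} text\par} the function yields Plain \textit{slanted} text followed by a blank line: the italics survive, while the header and line-spacing codes disappear. The list of control words to keep is the part that, in practice, has to be decided per project.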

Apart from conversion from RTF, we also have experience with conversion to RTF. After we have typeset your text in TeX, we can usually convert the TeX file back to RTF. Since modern word processors can not only save files as RTF but also read them, the text becomes available again in the format of the word processor in which it was written. This means that your text is available in your own word processor exactly as it was printed. This service is especially valuable when there have been multiple rounds of corrections to the proofs.

XML

Another format that we count among our fields of expertise is XML (eXtensible Markup Language). XML is a markup language for encoding data and the relations between them. XML resembles HTML to some extent, but it has a more general design. XML can therefore describe many kinds of information, ranging from library catalogues to transcriptions of manuscripts. We are highly familiar with converting XML to TeX and vice versa. For different kinds of documents, XML uses different recipes that describe the structure of the documents, the so-called DTDs (Document Type Definitions). One such DTD that is of great importance to the academic world is TEI (developed by the Text Encoding Initiative). For instance, we use TEI-XML in the Digital Locke Project.
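A conversion from XML to TeX essentially maps each element to a piece of TeX markup. The sketch below illustrates this in Python with the standard library module xml.etree.ElementTree; it is not the actual conversion code. It assumes a small TEI-like fragment without namespaces (real TEI documents use the TEI namespace) and handles only three elements: head, p, and hi with rend="italic". The function name element_to_tex and the mapping to \section and \textit are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def element_to_tex(elem: ET.Element) -> str:
    # Text directly inside the element, then each child followed by the
    # text that trails it (its "tail").
    inner = (elem.text or "") + "".join(
        element_to_tex(child) + (child.tail or "") for child in elem
    )
    if elem.tag == "hi" and elem.get("rend") == "italic":
        return r"\textit{" + inner + "}"
    if elem.tag == "head":
        return r"\section{" + inner.strip() + "}\n\n"
    if elem.tag == "p":
        return inner.strip() + "\n\n"
    return inner  # unknown elements: keep the content, drop the tag


def xml_to_tex(source: str) -> str:
    return element_to_tex(ET.fromstring(source))
```

Calling xml_to_tex on <div><head>A Title</head><p>This is <hi rend="italic">emphasised</hi> text.</p></div> gives a \section{A Title} line followed by the paragraph This is \textit{emphasised} text. A full converter would also have to escape TeX special characters and cover the many other TEI elements.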

Databases

Sometimes the structure of the information is so complex (for instance, when it has a large number of cross-references) that it is hard to manage in a text file, since text files generally have a linear structure. Complex information is therefore better kept in a non-linear database with mutually related tables. We usually use FileMaker Pro for such projects. FileMaker distinguishes itself from more advanced database management systems, such as MySQL, by the speed with which complex database structures can be developed and by its comfortable user interface. We are experienced in converting FileMaker data to TeX format, so that the information in the database can be printed as a catalogue, a bibliography or a critical edition. Examples of such conversions include the Stoa project and A Cellist’s Companion.
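The step from related tables to printable TeX can be sketched as follows. The example is in Python and uses an in-memory SQLite database purely as a stand-in, since FileMaker itself is not scriptable this way; the table names (authors, works), the sample rows and both function names are hypothetical. It joins the two tables and writes each record as an item in a TeX list.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Create a tiny two-table database with placeholder data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE works (
            id INTEGER PRIMARY KEY,
            author_id INTEGER REFERENCES authors(id),
            title TEXT,
            year INTEGER
        );
        INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
        INSERT INTO works VALUES
            (1, 2, 'Second Title', 1700),
            (2, 1, 'First Title', 1690);
        """
    )
    return conn


def catalogue_to_tex(conn: sqlite3.Connection) -> str:
    """Join works to their authors and emit one TeX list item per work."""
    rows = conn.execute(
        "SELECT a.name, w.title, w.year "
        "FROM works AS w JOIN authors AS a ON a.id = w.author_id "
        "ORDER BY a.name, w.year"
    )
    lines = [r"\begin{itemize}"]
    for name, title, year in rows:
        lines.append(rf"\item {name}, \textit{{{title}}} ({year})")
    lines.append(r"\end{itemize}")
    return "\n".join(lines)
```

Running catalogue_to_tex(build_sample_db()) produces an itemize environment with the two works sorted by author. Because the order and layout are decided in the query and the template, not in the data, the same database can feed a catalogue, a bibliography or another printed form.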