The source of this document is available on gitlab.
Last version: 2019-03-28

From text files to lightweight markup languages

Text files and text editors

A more technical, and less circular (!), definition of a text file can be found in the dedicated Wikipedia article. For more details on text editors, you can also check the corresponding Wikipedia page.

Word processing software is more sophisticated than a simple text editor: it can open and edit text files, and it can do more. By "do more" we mean working on the final layout of the document. Some text editors, such as Emacs or Vim, provide programming aids and interaction with other software installed on the machine (e.g. Python, R), which gives them a quasi "Swiss army knife" status; their users can spend days or weeks without needing a word processor (some never use one at all).

Be careful: the native format of the files generated by word processors is rarely a text format. doc, docx and odt files are not text files.
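
One rough way to see the difference is to test whether the bytes of a file decode as text. The sketch below is a heuristic that assumes UTF-8 text, not a definition of a text file; `looks_like_text` is a hypothetical helper name and the sample bytes are illustrative. It relies on the fact that docx and odt files are ZIP containers, which start with the bytes `PK` followed by binary data.

```python
def looks_like_text(data: bytes) -> bool:
    """Heuristic: True if the bytes decode as UTF-8 and contain no NUL byte."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


plain = "Hello, ∀ world\n".encode("utf-8")
zip_start = b"PK\x03\x04\x14\x00\x06\x00"  # illustrative first bytes of a ZIP container

print(looks_like_text(plain))      # True
print(looks_like_text(zip_start))  # False
```

For a real file, the same function could be applied to the first few kilobytes read in binary mode, for example with `open(path, "rb")`.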

The case of the PDF file opened with a text editor

In the sequence, a PDF file is opened with a text editor to demonstrate that such a file cannot be properly displayed by this kind of software; a PDF-specific program such as Adobe Reader, Evince, MuPDF or Aperçu is required. You can nevertheless see that the beginning of the file contains characters that a text editor can read: the first line tells us that the file uses version 1.3 of the PDF format. This early part in text format contains metadata that a viewer like Adobe Reader does not show. These metadata (partly) follow the XMP (Extensible Metadata Platform) format; we will come back to it in the fifth sequence of this module.
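
The readable first line can also be extracted by a program. Here is a minimal sketch, assuming the file starts with a header of the form `%PDF-x.y`; `pdf_version` is a hypothetical helper name and the sample bytes are made up for illustration.

```python
import re


def pdf_version(data: bytes):
    """Return the version announced in a PDF header such as b'%PDF-1.3', or None."""
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    return match.group(1).decode("ascii") if match else None


sample = b"%PDF-1.3\n%\xc4\xe5\xf2\xe5\n1 0 obj\n"  # made-up start of a PDF file
print(pdf_version(sample))  # 1.3
```

For a real file, reading the first bytes in binary mode (for example `open(path, "rb").read(16)`) should be enough to feed this function.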

On UTF-8

A table of UTF-8 symbols can be found at: http://www.utf8-chartable.de/. It is handy for inserting uncommon symbols such as the Cherokee letter "TLO", Ꮰ, or the mathematical symbol ∀ ("for all").
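
To see what such a table encodes, the following sketch prints the code point, the Unicode name and the UTF-8 bytes of the two symbols mentioned above. `describe` is a hypothetical helper name; the characters are written as escapes (U+13E0 and U+2200) so that the example does not depend on the encoding of the file it is pasted into.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the code point, Unicode name and UTF-8 bytes (hex) of one character."""
    code_point = f"U+{ord(char):04X}"
    name = unicodedata.name(char, "<unnamed>")
    utf8_bytes = char.encode("utf-8").hex(" ")  # separator argument needs Python >= 3.8
    return f"{code_point} {name}: {utf8_bytes}"


for char in ("\u13e0", "\u2200"):  # the Cherokee letter and the "for all" symbol
    print(char, describe(char))
```

Both symbols take three bytes in UTF-8, whereas plain ASCII characters take a single byte.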

If you often use Greek letters (in equations, for instance), Linux lets you redefine some key combinations to type these letters quickly. These key combinations are defined in the .XCompose file; the beginning of my .XCompose file contains:

# Load the most complete UTF-8 Compose database
 include "/usr/share/X11/locale/en_US.UTF-8/Compose"
 # narrow no-break space
 <Multi_key> <Multi_key> <Space> : " " U202F
 # Greek letters
 <Multi_key> <space> <a> : "α"  Greek_alpha
 <Multi_key> <space> <A> : "Α"  Greek_ALPHA
 <Multi_key> <space> <b> : "β"  Greek_beta
 <Multi_key> <space> <B> : "Β"  Greek_BETA
 <Multi_key> <space> <g> : "γ"  Greek_gamma
 <Multi_key> <space> <G> : "Γ"  Greek_GAMMA
 <Multi_key> <space> <d> : "δ"  Greek_delta
 <Multi_key> <space> <D> : "Δ"  Greek_DELTA
 <Multi_key> <space> <e> : "ε"  Greek_epsilon
 <Multi_key> <space> <E> : "Ε"  Greek_EPSILON
 <Multi_key> <space> <z> : "ζ"  Greek_zeta
 <Multi_key> <space> <Z> : "Ζ"  Greek_ZETA
 <Multi_key> <space> <h> : "η"  Greek_eta
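
Each Greek-letter rule in this excerpt has the same shape: a key sequence, a colon, the resulting string in quotes, then a keysym name. As an illustration only, the sketch below rebuilds a few of these rules from (key, character, keysym) triples copied from the excerpt; `compose_line` is a hypothetical helper and not part of any X11 tool.

```python
def compose_line(key: str, char: str, keysym: str) -> str:
    """Format one Compose rule: key sequence, result string, keysym name."""
    return f'<Multi_key> <space> <{key}> : "{char}"  {keysym}'


# (key, character, keysym) triples copied from the excerpt above
greek = [
    ("a", "α", "Greek_alpha"),
    ("b", "β", "Greek_beta"),
    ("g", "γ", "Greek_gamma"),
]

for key, char, keysym in greek:
    print(compose_line(key, char, keysym))
```

Generating the rules from a table like this makes it easier to keep the upper-case and lower-case entries consistent when the list grows.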

On TinyTeX

Yihui Xie, author of the great "bookdown" R package, has developed a "light LaTeX" distribution: TinyTeX ("A lightweight, cross-platform, portable, and easy-to-maintain LaTeX distribution based on TeX Live").