Typesetting Markdown – Part 8: Annotations

This part in the series explores pandoc’s annotation Markdown extension, as well as creating themes using ConTeXt.

Preamble

What’s in store (take them all):

Tools you’ll need:

What to do:

  1. Rename $HOME/dev/writing/book to something else, if it exists.
  2. Unzip book.zip inside $HOME/dev/writing.
  3. Move $HOME/dev/writing/book/build-template into $HOME/bin.
  4. Unzip themes.zip inside $HOME/dev/writing/book.
  5. Run $HOME/dev/writing/book/build to generate books.
  6. Open novel.pdf and enjoy!

Confirm the book directory hierarchy under $HOME/dev/writing matches:

book
├── standard-ebooks
│   ├── jekyll-hyde
│   ├── les-miserables
│   ├── pride-prejudice
│   └── wuthering-heights

Be sure to review any shell scripts you download before running them.

Example run:

cd $HOME/dev/writing/book
./build -V -c -y -i jekyll-hyde -d standard-ebooks/jekyll-hyde \
-b robert-louis-stevenson_the-strange-case-of-dr-jekyll-and-mr-hyde

Read on to learn more about typesetting poems using annotations, or skip to Novel Content for an in-depth review of how to typeset Standard Ebooks novels using annotations.

Introduction

Part 7 described typesetting math and using variables to change parameters to equations. This part visits the interplay between an extended Markdown syntax, parsing text using Lua, and defining ConTeXt macros that can be formatted thematically. We’ll review professionally produced book interiors, then craft our own themes for various documents, drawing inspiration from existing layouts.

Separating what you want to communicate (the content, or data model) from how you want it shown (the presentation) is a powerful abstraction.

The Treachery of Images

René Magritte’s iconic painting, Ceci n’est pas une pipe, embodies what humans have known since the dawn of civilization: words and pictures are not themselves the items to which they refer. Rather, they are symbolic representations of the item. We can think of the word pipe as a general model to describe a broad category of possible forms.

Doing with Images makes Symbols

Dr. Alan Kay’s Doing with Images makes Symbols depicts numerous systems that hide complex models behind simple, clever graphical user interfaces. In the first graphical user interface Dr. Kay reviews, the mathematical model defines the content and the transformations applied to each instance define how they are displayed.

CSS Zen Garden

Side-stepping the functional CSS versus semantic CSS debate, Dave Shea’s CSS Zen Garden opened developers’ eyes to scores of possible themes crafted by changing only the presentation layer: the CSS. No changes to the model—the HTML source—are necessary.

Impacts Project

For my Impacts Project, different photobook publishers had different technical requirements. Some publishers required background images that spanned two pages, while others needed separate pages altogether. By writing the content in pandoc-flavoured Markdown and then applying different themes using ConTeXt, I produced multiple layout variations without having to maintain two copies of the prose. The separation was possible because parts of the text were annotated semantically.

Annotations

Annotations are labels that authors can specify to classify blocks of text.

Supercentenarians may recall how Frost’s Fire and Ice was presented; that presentation will suffice as the output we want to achieve:

Fire and Ice by Robert Frost

To demonstrate separating content from presentation, our goal is to take an unformatted, annotated Markdown document and generate a document that resembles the original formatting from 1920.

Here’s the Markdown for Fire and Ice, classified with a poem annotation:

::: poem
Some say the world will end in fire,
Some say in ice.
From what I’ve tasted of desire
I hold with those who favor fire.
But if it had to perish twice,
I think I know enough of hate
To say that for destruction ice
Is also great,
And would suffice.
:::

Notice how the first and last lines have three full colons (:::). The colons instruct pandoc to mark everything in between as a document fragment annotated with a poem class. Save the above as poem.md, then run pandoc on the file, preserving the line breaks:

pandoc -t html -f markdown --wrap=preserve poem.md

The output begins with an HTML element that includes the poem class:

<div class="poem">

For ConTeXt, we’d expect the output to resemble:

\startpoem

or the LaTeX output to be:

\begin{poem}

But, at time of writing, pandoc does not honour the classes for TeX output because it would result in code that cannot be interpreted by a TeX engine. (There’s no technical reason that prevents pandoc from offering a well-documented option that permits generating “broken” TeX documents. By the time an author understands TeX macros, they probably know how to fix them. I digress.)

Instead, the ConTeXt and LaTeX output starts directly with the poem’s first line, with no environment around it.

LaTeX users can install the pandoc-latex-environment filter and its dependencies to side-step the issue; however, ConTeXt users must write a script to resolve the problem. Suddenly, a rabbit hole named Lua appears.

Lua

Lua is a scripting language that pandoc can use when processing documents. To see how it works, first create a new book directory:

cd $HOME/dev/writing
mv book book.old.01
mkdir book
cd book

Move or copy poem.md into the book directory. Create a new file named classify.lua, in the same directory, having the following contents:

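function Div( element )
  -- First class whose name begins with a letter, or nil.
  local annotation = element.classes:find_if( matches )

  if annotation then
    -- Keep only letters and digits (e.g., "my-poem" becomes "mypoem").
    annotation = annotation:gsub( "[^%w]*", "" )

    -- Surround the div with \start<annotation> and \stop<annotation>.
    return {
      context( "\\start", annotation ),
      element,
      context( "\\stop", annotation )
    }
  end
end

-- Predicate: does the class name start with at least one letter?
function matches( s )
  return s:match( "^%a+" )
end

-- Emit text that pandoc copies verbatim into ConTeXt output.
function context( macro, annotation )
  return pandoc.RawBlock( "context", macro .. annotation )
end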

In structured document parsing lingo, an element is a named container that may hold content, including other containers. Breaking down the above Lua code, the first line declares a function that is called whenever pandoc encounters a div element in the source document:

function Div( element )

Pandoc interprets triple colons (:::) as div elements. Every div may have associated classes, also called annotations. Continuing, the following line finds the first class of each div that matches a particular pattern (find_if returns nil when no class matches):

local annotation = element.classes:find_if( matches )

The pattern is defined by the return line of the matches function, which checks whether the annotation begins (^) with at least one (+) upper- or lowercase alphabetic ASCII character (%a; apologies to anyone reading who would like to use UTF-8 characters in their annotation names):

return s:match( "^%a+" )

If an annotation matches the given criteria, then every run (*) of characters from the set ([, ]) that are not (^) alphanumeric (%w) is replaced with the empty string (""). Only letters and digits remain:

annotation = annotation:gsub( "[^%w]*", "" )

The script then wraps the \start and \stop ConTeXt macros around the matching element’s entire content using the annotation’s name:

context( "\\start", annotation ),
element,
context( "\\stop", annotation )

Calling our context function requests that pandoc generate a raw text block using the macro prefix and annotation name. Providing both the macro prefix and the extracted annotation name tells pandoc to write \startpoem (or \stoppoem) to the generated document:

return pandoc.RawBlock( "context", macro .. annotation )

Invoke pandoc to verify that the filter produces the expected macros:

pandoc --lua-filter classify.lua -t context \
  -f markdown --wrap=preserve -o poem.tex poem.md
cat poem.tex

At last, we’ve reached the rabbit hole’s den:

\startpoem
  % ... Removed for brevity.
\stoppoem

We can now export custom Markdown annotations as ConTeXt macros.
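You can preview roughly what the filter emits without invoking pandoc by approximating the transformation in a few lines of shell and awk. The classify_sketch function below is hypothetical and deliberately limited. It only understands bare ::: name openings and bare ::: closings. It tracks nesting with a stack, which matters for letters that contain signatures. It also strips non-alphanumeric characters the same way the filter’s gsub call does. It is not a Markdown parser, so pandoc with classify.lua remains the real tool:

```shell
# classify_sketch: hypothetical, rough approximation of classify.lua.
# Handles only bare "::: name" openings and bare ":::" closings; it does
# not parse Markdown, attribute syntax such as {.poem}, or code blocks.
classify_sketch() {
  awk '
    /^:::+[ \t]*$/ {                      # closing fence
      if (depth > 0) {
        name = stack[depth]
        depth--
        print "\\stop" name
      }
      next
    }
    /^:::+[ \t]+[A-Za-z]/ {               # opening fence with a class
      name = $2
      gsub(/[^A-Za-z0-9]/, "", name)      # mirror the Lua cleanup
      depth++
      stack[depth] = name
      print "\\start" name
      next
    }
    { print }                             # everything else verbatim
  ' "$@"
}
```

For example, classify_sketch poem.md prints \startpoem, the poem’s lines, and \stoppoem.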

Style

The poem has a few problems to solve:

Uppercasing

Using an ornate first character at the start of text is sometimes called a drop cap or lettrine. ConTeXt has a lettrine module, but using it means marking up the poem text with the \lettrine macro: mixing presentation logic with the plain source document.

ConTeXt, also at time of writing, did not have a font feature available to change words to uppercase. Graciously, folks from the mailing list offered a solution.

Create a new file named uppercase.tex in the book directory having the following contents:

\startluacode
  fonts.handlers.otf.addfeature {
    name    = "uppercasing",
    type    = "substitution",
    prepend = true,
    data    = characters.uccodes
  }
\stopluacode

\definefontfeature[uppercasing][uppercasing=yes]

\unexpanded\def\uppercasing{\feature[+]{uppercasing}}

The code adds a new font feature called uppercasing to OpenType Fonts (OTF). At some point in the future the above code (along with lowercasing) may be added to the ConTeXt distribution.

Line Counting

Frost’s poem has lines with a variety of indentation levels. ConTeXt does not appear to count individual lines within \startlines environments. We therefore need a counter that tracks which line of the poem is about to be printed, and we can use it to adjust the depth of indentation. Start by defining a counter:

\definecounter[PoemLineCount][way=bytext]

Using way=bytext prevents the counter from resetting to zero on new pages. The Poem prefix combined with TitleCase helps avoid inadvertent name collisions.

Counters can be set (e.g., initialized to zero), incremented, and retrieved as follows, respectively:

\setnumber[PoemLineCount][0]
\incrementcounter[PoemLineCount]
\rawcountervalue[PoemLineCount]

Presenting

With counters in mind, create a new file named style.tex in the same book directory containing the following text:

\setupinitial[n=2, distance=-2pt]

\definealternativestyle[poem:Uppercase][{\tfa\bf\uppercasing}][]

\definefirstline[poem:FirstLine][%
  alternative=word,
  style=poem:Uppercase,
  n=1,
]

\definestartstop[poem][
  before={\startlines \setfirstline[poem:FirstLine] \placeinitial},
  after={\stoplines},
]

\definecounter[PoemLineCount][way=bytext]

\definedataset[PoemIndent]
\setdataset[PoemIndent][2][option={yes, 4.00em}]
\setdataset[PoemIndent][3][option={no,  0.00em}]
\setdataset[PoemIndent][5][option={yes, 0.75em}]
\def\PoemIndentOpt{%
  \datasetvariable
    {PoemIndent}{\rawcountervalue[PoemLineCount]}{option}%
}

\setuplines[
  indenting=odd,
  before={\setnumber[PoemLineCount][0]},
  command={%
    \incrementcounter[PoemLineCount]
    \setupindenting[\PoemIndentOpt]
  },
]

From the top, the first line configures how drop caps will be applied to the first letter of each text block that’s annotated with poem:

\setupinitial[n=2, distance=-2pt]

Passing n=2 sets the number of lines the initial letter will span, which effectively controls its height. The distance option fine-tunes the horizontal space between the drop cap and the text that follows it. Here, -2pt pulls the text slightly closer.

Next up is registering an alternative style that we’ll use to configure how the first word of the first line will appear:

\definealternativestyle[poem:Uppercase][{\tfa\bf\uppercasing}][]

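The alternative style is named poem:Uppercase. While it is in effect, the following macros change the font: \tfa selects a size one step larger than the body font, \bf switches to bold, and \uppercasing applies the font feature from uppercase.tex.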

Finally, the cog of the lettrine’s wheel:

\definefirstline[poem:FirstLine][%
  alternative=word,
  style=poem:Uppercase,
  n=1,
]

As its name implies, the \definefirstline macro changes how the first line of text appears. The alternative option can be either line, which affects the entire line, or word, which affects only a given number of words. The style option references the predefined alternative style for switching the font. Passing n=1 restricts the change to the first word.

The pandoc filter (the Lua script) converted the leading ::: poem to \startpoem and the trailing ::: to \stoppoem. In ConTeXt, such a start/stop pairing is an environment that can contain other commands. Recall that pandoc refuses to generate “broken” documents because it has no assurance that a start/stop environment for a poem exists. Our Lua script has created exactly such a “broken” document, so the poem environment must be defined somewhere. The following definition does that:

\definestartstop[poem][
  before={\startlines \setfirstline[poem:FirstLine] \placeinitial},
  after={\stoplines},
]

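Providing poem as the name creates both \startpoem and \stoppoem as macros that ConTeXt will now recognize. The before option defines the commands to run when \startpoem is encountered: \startlines preserves the poem’s line breaks, \setfirstline[poem:FirstLine] applies the first-line style, and \placeinitial places the drop cap. Likewise, the after option runs \stoplines when \stoppoem is encountered.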

The counter line before the data set should look familiar:

\definecounter[PoemLineCount][way=bytext]

\definedataset[PoemIndent]
\setdataset[PoemIndent][2][option={yes, 4.00em}]
\setdataset[PoemIndent][3][option={no,  0.00em}]
\setdataset[PoemIndent][5][option={yes, 0.75em}]

To avoid duplicating conditional statements, we define a data set that associates keys with values. In this case, the keys are the line numbers—2, 3, and 5—and the values are the special indentation rules. We’ll cross-reference the keys using the PoemLineCount counter. As PoemLineCount changes value, ConTeXt will retrieve the rules for typesetting the indentation.

Next up, a macro is defined that looks up the indenting rules based on the current line count:

\def\PoemIndentOpt{%
  \datasetvariable
    {PoemIndent}{\rawcountervalue[PoemLineCount]}{option}%
}

The reason for using \def—not \define—is explained elsewhere.

Mimicking the original indentation from the 1920s magazine takes some doing, but with the foundation in place, the code can be expressed tersely:

\setuplines[
  indenting=odd,
  before={\setnumber[PoemLineCount][0]},
  command={%
    \incrementcounter[PoemLineCount]
    \setupindenting[\PoemIndentOpt]
  },
]

We use \setuplines to configure the \startlines and \stoplines from the poem’s environment definition. By default, we want most of the odd lines to be indented by a small amount (0.75em). Before each poem is typeset, its line counter is reset to 0, which allows the same formatting to be reused for every annotated poem. Without the reset, subsequent poems would not be indented as desired, because the cross-referenced key (PoemLineCount) would keep incrementing.

Lastly, the command option sets up the macros to run for each line in the poem. Even though odd lines are indented by default (indenting=odd), the original poem’s indentation varies uniquely. These variations are captured by changing the indentation setup when certain line numbers are encountered. The 2nd line is indented fairly far, the 3rd line is not indented at all, the 4th line isn’t indented because it is even, and the remaining odd lines (5, 7, and 9) are all indented the same amount.

Importing

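At this point we have the following files in the book directory: poem.md, classify.lua, poem.tex (generated by pandoc), uppercase.tex, and style.tex.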

Create one more file named main.tex in the book directory. Paste the following commands into it:

\input uppercase
\input style

\starttext
  \input poem
\stoptext

When we get to theming, we’ll see how including files this way does not allow changing the document appearance easily.

Until then, run ConTeXt to generate main.pdf:

context main.tex

Original font face and minor microtypography issues notwithstanding, we have faithfully reproduced the poem as it was first published:

Poem Reproduction

In the next sections we’ll take a brief look at interior page design, review professionally crafted page layouts, and then find a Markdown book to style a few different ways.

Interior Book Design

Interior book design is a deep topic by itself (see my notes for a brief summary). A classic page layout—such as Tschichold’s page format that’s based on the golden ratio—resembles:

Classic Interior Layout

Fortunately, ConTeXt provides an excellent default page layout:

ConTeXt Default Output

ConTeXt applies the same general principles that publications from professional book designers follow. We’ll draw inspiration from samples crafted by a random selection of designers:

Examples from Sean Ford:

Less, Andrew Sean Greer
Eugene V. Debs, Paul Buhle, et al.

Examples from Maureen Cutajar:

Flight from Katama, D. S. Cooper
Cinderella in Therapy, LeeAnna Neumeyer

Examples from Marraii Designs:

After all these Years, Mary J. Williams
Dying To Really Live, Duane Smith

Examples from David Moratto:

Food Fight Inc, Bruno J. Codispoti
The Goat in the Tree, Lorne Elliott

Novel Content

To make stylish themes for novels, it would be convenient to have novels to stylize. Although Project Gutenberg has an extensive library of free books, they are quite unsuitable for automatic typesetting. Standard Ebooks, on the other hand, offers novels that are far more suitable for typesetting in different themes.

The people at Standard Ebooks have carefully and painstakingly format-shifted plain text novels offered by Project Gutenberg into epub files. Using epub is a splendid choice because it allows editors to impart semantic meaning to the prose in a machine-readable format.

Converting epub files to Markdown involved some yak shaving, described in Project Gutenberg Projects. We’ll borrow heavily from that effort.

Requirements

Install the software in the following sections to follow along.

System Applications

Download and install the following tools, if you haven’t already:

Set an environment variable named SAXON_JAR to the fully qualified path (directory plus file name) for saxon-he-10.0.jar. Substitute the version of the software that was downloaded, if different.

Ensure that both the XSLT processor and Git are available:

java -jar $SAXON_JAR
git --version

If the first command prints information about Saxon and the second prints Git’s version, then Git, Java, and Saxon-HE are installed.

Scripts and Stylesheets

Download the following file:

The archive contains the following files in a book directory:

Install the archive file as follows, making sure to substitute the appropriate path to where book.zip was downloaded:

cd $HOME/dev/writing
mv book book.old.02
unzip $HOME/book.zip
cd book
mkdir -p $HOME/bin
mv build-template $HOME/bin
chmod +x build

If you put build-template in a different directory, change the source line in the build script accordingly.

Typeset Novel

With files in place, typeset a novel as follows:

cd $HOME/dev/writing/book
./build -V

You should see:

[20:34:51.4627] Verify requirements
[20:34:51.4668] Change directory
[20:34:51.4686] Markdown: contents.md
[20:34:51.4701] Compile : ./main.tex
[20:34:51.4713] Contents: ./contents.tex
[20:34:51.4726] Output  : novel.pdf
[20:34:51.4738] Transform jane-austen_pride-and-prejudice into contents.md
[20:34:52.5015] Convert contents.md to ./contents.tex
[20:34:54.2729] Convert ./main.tex to novel.pdf
[20:35:05.8983] Postprocess

Open novel.pdf to reveal:

Pride and Prejudice, Page 1, Jane Austen

Pretty decent default settings.

If you don’t already have your own novel in Markdown to typeset, peruse the Standard Ebooks collection. For faster builds, use a shorter book such as Robert Louis Stevenson’s The Strange Case of Dr. Jekyll and Mr. Hyde:

./build -V -b \
  robert-louis-stevenson_the-strange-case-of-dr-jekyll-and-mr-hyde

Note:

Project Structure

Before defining styles for annotations or themes, it helps to have a structure to organize them. Our objective is to define a global style for all documents that can be tweaked for specific documents. For example, we could style postal letters in a cursive font for older novels, while modern books could style them like electronic messages. Because most public domain novels are rather old, all books would use the cursive font by default, and newer books would have to override that setting.

To accomplish this, keep the following definitions in mind:

Environment
Setups that control an aspect of a presentation’s style.
Component
Content to be rendered, such as a chapter.
Product
A combination of environments and components.
Project
A collection of environments applied to many products.

Another way to think of the relationships is depicted in the following figure:

Generic Project Structure

To make this concrete, consider the following example where all Standard Ebooks use the same table of contents (ToC) formatting, except for Jekyll and Hyde, which will format the ToC differently:

Specific Project Structure

Often a component represents a single chapter, title page, index, or other book matter, which allows fine-grained control over the product output. Our novel’s contents are kept together in a single file, so we can simplify the actual local project structure by including that component in the product directly, as shown:

Actual Local Project Structure

Let’s see how this works.

Setup Generic Project Structure

Create new locations for project and product files:

mkdir -p $HOME/dev/writing/book/project/product

Inside the project directory create project.tex:

\startproject *
  \environment prj-para
\stopproject

The * instructs ConTeXt to use the file name—without the .tex extension—as the project name. This is a convenient short-cut, but take care to update references when renaming files.

Later we’ll have to resolve a paradoxical situation with respect to typesetting the title and author, but for now we can put on blinders.

Continue by creating prj-para.tex in the project directory:

\startenvironment *
  \setupalign[flushleft]
  \setupinterlinespace[small]
\stopenvironment

Again, the * means to substitute the file name. The name provides the linkage to prj-para within project.tex, so the file name and the environment reference must match.

Next, change to the product directory and then create product.tex:

\startproduct *
  \project project

  \environment prd-para

  \component contents
\stopproduct

Also create prd-para.tex in the same directory:

\startenvironment *
  \setupalign[normal]
  \setupinterlinespace[medium]
\stopenvironment

The product refers to the contents component. For now, either copy or link to the contents from Jekyll and Hyde into the product directory:

cd $HOME/dev/writing/book
./build -V -b \
  robert-louis-stevenson_the-strange-case-of-dr-jekyll-and-mr-hyde
cd project/product
ln -s ../../contents.tex .

That trailing period (.) at the end of the ln command instructs ln to create a symbolic link in the current working directory. The file will have the same name as the linked file.

Compile the product (projects cannot generate PDF files):

cd $HOME/dev/writing/book/project
context product/product.tex --path=product

Passing --path tells ConTeXt where to find the resources referenced by product.tex, namely prd-para.tex and contents.tex.

The ConTeXt-generated product.pdf file resembles:

Product with Aligned Text

The justified text shows that the product environment overrides the settings from the project environment. To prove that the project-wide settings are applied by default, remove the setups from prd-para.tex by changing the file to the following:

\startenvironment *
\stopenvironment

Re-run:

context product/product.tex --path=product

Open the file to see that the project-wide styles are applied by default:

Product with Ragged Right Text

Both the paragraph justification and interline space have changed.

We have a directory and file structure for organizing both global styles and styles that are applied to specific documents.

Break the Build

Recall that pandoc staunchly refuses to create “broken” TeX output, which meant writing a Lua filter named classify.lua to transcode the ::: annotations into TeX macros. The build script has an option to run that Lua filter. Try the following:

./build -V -c

ConTeXt can no longer generate the PDF: the build is broken. Breaking the typesetting engine is precisely the problem pandoc tries to avoid:

[20:38:12.3324] Convert ./main.tex to novel.pdf
ConTeXt failed with exit code 1.
return {
 ["filename"]="contents.tex",
 ["lastcontext"]="\nl.1 \\startbooktitle\n                  ",
 ["lastluaerror"]="?",
 ["lasttexerror"]="! Undefined control sequence",
 ["linenumber"]=1,
 ["offset"]=10,
}[20:38:12.9635] Postprocess

At first glance, the error message looks inscrutable. Take note of the values for Last TeX Error (lasttexerror) and Last Context (lastcontext). They tell the reader that \startbooktitle is not a recognized macro (control sequence). That makes sense, because the revised Lua filter has the following line:

return pandoc.RawBlock( "context", macro .. "book" .. annotation )

The "book" string was added to help avoid name collisions with existing macros, such as \starttitle and \stoptitle. Also, we don’t really know the entire list of possible annotations sprinkled throughout the epubs; someone could have used “text” as a classification, which would result in \starttext and \stoptext—and we definitely don’t want to redefine them! We just don’t know, at least not until the Markdown is generated.

Once a novel has been transcoded from XHTML into Markdown, the XSL templates export all known and unknown classes as annotations prefixed using :::. A short command chain lists the unique annotations:

grep -h "^::: " *.md | cut -d' ' -f2 | sort -u

Jekyll and Hyde’s annotations include:

author
footer
letter
postscript
signature
title
valediction
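Knowing how often each annotation appears can help you decide which ones deserve careful styling. The hypothetical count_annotations helper below extends the command chain above with uniq -c to tally occurrences, most frequent first:

```shell
# count_annotations: hypothetical helper that tallies how often each
# ::: annotation appears in the given Markdown files, most frequent
# first. Bare closing fences (":::") are skipped because the pattern
# requires a space after the colons.
count_annotations() {
  grep -h '^::: ' "$@" |
    cut -d' ' -f2 |
    sort |
    uniq -c |
    sort -rn
}
```

Run it against whichever contents.md the build produced, for example count_annotations contents.md from the book directory.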

The annotations must have corresponding start/stop definitions for ConTeXt to recognize them (e.g., \startbookauthor and \stopbookauthor). We’ll come back to these when defining postal letter themes in prj-letter.tex.

Fix the Build

Ensuring that the annotated texts don’t break the build entails the following high-level steps: renaming files, updating references, creating the contents, and defining macros.

Rename Files

Using project as a directory name doesn’t convey its purpose, nor does product spark much joy. Give them both new names that relate to their intention. For example, most of these books are coming from Standard Ebooks and the novel we’re typesetting is The Strange Case of Dr. Jekyll and Mr. Hyde, so the following names are more on-point:

standard-ebooks
├── standard-ebooks/jekyll-hyde
│   ├── standard-ebooks/jekyll-hyde/jekyll-hyde.tex
│   └── standard-ebooks/jekyll-hyde/prd-para.tex
├── standard-ebooks/prj-para.tex
└── standard-ebooks/standard-ebooks.tex

Update References

Once renamed, change standard-ebooks.tex and jekyll-hyde.tex to reference the appropriate project and product. That is, standard-ebooks.tex will use:

\product jekyll-hyde

And jekyll-hyde.tex must now reference the new project name:

\project standard-ebooks

Rebuild to make sure everything still works:

cd $HOME/dev/writing/book/standard-ebooks
context jekyll-hyde/jekyll-hyde.tex --path=jekyll-hyde

If all went well, a new PDF file is produced.
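The same layout can be repeated for other novels. The hypothetical new_product function below sketches this, assuming you run it from the standard-ebooks directory. It creates a product directory with a product file that points at the standard-ebooks project and an empty prd-para.tex, mirroring jekyll-hyde. It does not modify standard-ebooks.tex, so add a matching \product line there yourself:

```shell
# new_product: hypothetical scaffold for another product in the
# standard-ebooks project. Run from the standard-ebooks directory.
new_product() (
  name=$1
  [ -n "$name" ] || { echo "usage: new_product NAME" >&2; exit 1; }
  [ -e "$name" ] && { echo "$name already exists" >&2; exit 1; }

  mkdir -p "$name"

  # Product file: inherits the project, then applies its own overrides.
  cat > "$name/$name.tex" <<'EOF'
\startproduct *
  \project standard-ebooks

  \environment prd-para

  \component contents
\stopproduct
EOF

  # Empty product environment: project-wide styles apply until
  # product-specific setups are added here.
  cat > "$name/prd-para.tex" <<'EOF'
\startenvironment *
\stopenvironment
EOF
)
```

For instance, new_product frankenstein would create frankenstein/frankenstein.tex and frankenstein/prd-para.tex. The novel name is only an example, and the contents component still has to be provided as before.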

Create Contents

Previously, we copied (or linked) the contents.tex file into the product directory. The build script has options to automate this task:

By default, the main build file is searched for in the current working directory (.) and its file name is main—the .tex extension is added by the script. To use the standard-ebooks styling for Jekyll and Hyde, run the following:

./build -V -i jekyll-hyde -d standard-ebooks/jekyll-hyde -b \
  robert-louis-stevenson_the-strange-case-of-dr-jekyll-and-mr-hyde

Remember that the -c argument instructs pandoc to run the classifier Lua filter, so the following will also break the build:

./build -V -c -i jekyll-hyde -d standard-ebooks/jekyll-hyde -b \
  robert-louis-stevenson_the-strange-case-of-dr-jekyll-and-mr-hyde

Although the build is still broken, we can organize the setups.

Define Macros

Temporarily change standard-ebooks/prj-para.tex to fix the build:

\startenvironment *
  \definestartstop[bookauthor][]
  \definestartstop[bookfooter][]
  \definestartstop[bookletter][]
  \definestartstop[bookpostscript][]
  \definestartstop[booksignature][]
  \definestartstop[booktitle][style=italic, color=darkblue]
  \definestartstop[bookvalediction][]
\stopenvironment
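Typing each definition by hand gets tedious, and we don’t know every annotation until the Markdown exists. One option is to generate placeholder definitions from the Markdown itself. The define_stubs function below is a hypothetical helper. It collects the unique annotation names, cleans them the way the filter does, adds the book prefix, and prints an environment with an empty \definestartstop for each name:

```shell
# define_stubs: hypothetical helper that prints a ConTeXt environment
# containing an empty \definestartstop for every ::: annotation in the
# given Markdown files. Names lose non-alphanumeric characters (as in
# the Lua filter) and gain the "book" prefix used by the revised filter.
define_stubs() {
  printf '%s\n' '\startenvironment *'
  grep -h '^::: ' "$@" |
    cut -d' ' -f2 |
    tr -cd '[:alnum:]\n' |
    sort -u |
    while IFS= read -r name; do
      [ -n "$name" ] && printf '  \\definestartstop[book%s][]\n' "$name"
    done
  printf '%s\n' '\stopenvironment'
}
```

From the book directory, a command such as define_stubs contents.md > standard-ebooks/prj-stubs.tex, followed by adding \environment prj-stubs to standard-ebooks.tex, should keep unknown annotations from breaking the build. The prj-stubs name is an assumption and is not part of the downloaded theme.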

Re-run:

./build -V -c -i jekyll-hyde -d standard-ebooks/jekyll-hyde -b \
  robert-louis-stevenson_the-strange-case-of-dr-jekyll-and-mr-hyde 

Review novel.pdf:

Successful Build

The build is fixed. At last, we can move on to crafting a project theme.

Project Theme

Defining all the styles in a single file would be boorish. Instead, consider isolating different presentation aspects by their responsibilities:

body
Complete novel contents
colours
Constants and setups for colours
counters
Values for tallying
dimensions
Constants for common sizes
document
Meta information about the document
figures
How figures are loaded and presented
fonts
Font families and font features
front matter
Adds title page and table of contents
headers
Page headers and footers
headings
Volume, part, chapter, section, and subsection headings
layouts
Margins and other page layout settings
letters
Hand-written postal letters
lists
Enumerated and itemized lists
modules
Modules to enhance functionality
pages
Page numbering
paper
Document page and trim sizes
paragraphs
Interline space, indentation, alignment, drop-caps, etc.
penalties
Restrictions regarding widows and orphans
strings
Constants for common words, nouns, and phrases
tables
Tabular data
title
Title page (author name and book title)
toc
Table of contents
uppercase
Font feature to employ uppercase text

More document components can be added as needs arise. The above list covers elements common to many books. We’ll explore a few key points of each in turn; download the theme source files in advance to follow along.

Body

Books often use Roman numerals for the front matter and Arabic numerals for the prose. To make this happen in ConTeXt, the front matter must be distinguished from the body matter. In the XSL file, the book’s body is annotated with ::: body, which is later translated to a \startbookbody macro. Inside prj-body.tex, \startbookbody and its associated \stopbookbody macro are mapped to macros that ConTeXt can parse, namely \startbodymatter and \stopbodymatter:

\let\startbookbody\startbodymatter
\let\stopbookbody\stopbodymatter

Colours

Create a consistent colour experience by defining colour constants. For demonstration purposes, we’ll use a two-colour and one-shade system comprised of a main colour, an accent colour, and neutrals. A couple of sites to help pick out colours are Colour Lovers (most loved) and Colormind (lock-in colours and let the computer generate the rest). After settling on a palette, update the prj-colour.tex file.

\definecolor[BookColourMain][h=0081C2]
\def\BookColourDk{0.5}
\definespotcolor[BookColourMainDk][BookColourMain][p=\BookColourDk]

\setupcolors[
  state=start,
  rgb=yes,
  textcolor=BookColourNeutralDk,
  pagecolormodel=auto,
]

The first line defines a new colour named BookColourMain from its hexadecimal colour code. ConTeXt has many ways to assign colour values. These include definitions relative to other colours using \definespotcolor, where the p option can be used to lighten or darken the new colour.

Providing state=start ensures that colours show up in the document; setting textcolor=BookColourNeutralDk adjusts the book body text colour; and using pagecolormodel=auto (versus none) can sometimes help with transparency issues in PDF files.

My convention uses Dk or Lt as suffixes to denote darker or lighter colour variants defined for the theme.
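If you would rather pin a darker variant to an exact hex value instead of deriving it with \definespotcolor, a small helper can scale each RGB channel toward black. The darken_hex function below is hypothetical. Scaling channels is a different technique from ConTeXt’s spot colour derivation, so the results won’t necessarily match:

```shell
# darken_hex: hypothetical helper that scales each channel of a
# six-digit hex colour by a percentage (0-100) and prints the new hex
# code, suitable for a fixed \definecolor[...][h=...] variant.
darken_hex() (
  hex=${1#\#}                  # allow an optional leading '#'
  percent=$2
  r=${hex%????}                # first two digits
  g=${hex#??}; g=${g%??}       # middle two digits
  b=${hex#????}                # last two digits
  printf '%02X%02X%02X\n' \
    $(( 0x$r * percent / 100 )) \
    $(( 0x$g * percent / 100 )) \
    $(( 0x$b * percent / 100 ))
)
```

For example, darken_hex 0081C2 50 prints 004061, which could then be used as \definecolor[BookColourMainDk][h=004061].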

Counters

This file has no setups because poems have been covered extensively. Refer to the poem counter described in the previous sections for details.

Dimensions

Constants in the prj-dimensions.tex file can control various measures that relate to the output document. General fiction, for example, has a common paper trim size of 5½ x 8½ inches—see my interior book design notes for additional standard sizes. Such measurements can be defined in a single location for later use via the \measure{} macro:

\definemeasure[BookPageTrimWidth][5.5in]
\definemeasure[BookPageTrimHeight][8.5in]
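Printers sometimes specify sizes in other units, and converting is simple arithmetic. TeX defines a point as 1/72.27 inch and an inch is 25.4 mm, so the general fiction trim size works out as:

```latex
\begin{aligned}
5.5\,\text{in} \times 72.27\,\tfrac{\text{pt}}{\text{in}} &= 397.485\,\text{pt}, &
5.5\,\text{in} \times 25.4\,\tfrac{\text{mm}}{\text{in}} &= 139.7\,\text{mm},\\
8.5\,\text{in} \times 72.27\,\tfrac{\text{pt}}{\text{in}} &= 614.295\,\text{pt}, &
8.5\,\text{in} \times 25.4\,\tfrac{\text{mm}}{\text{in}} &= 215.9\,\text{mm}.
\end{aligned}
```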

Document

Setting up the document gives us the opportunity to enable hyperlinks, configure colours for internal and external links, and assign both the title and author metadata. Accomplish these tasks as follows:

\enabledirectives[interaction.identity.preroll]
\enabledirectives[interaction.bookmarks.preroll]

\setupinteraction[
  state=start,
  color=BookColourNeutralDk,
  contrastcolor=BookColourAccent,
  title={\BookTitle},
  author={\BookAuthor},
]

Enabling those directives ensures that TeX symbols (such as ~, a non-breaking space) are replaced with their appropriate counterparts within the PDF file’s properties and bookmarks.

Using \BookTitle and \BookAuthor as interaction options is one source of a circular dependency, which we’ll resolve later.

Figures

Many books, especially technical ones, will have a variety of image types. Inside prj-figures.tex we can set the file extension order for image precedence. Here we see that a vector graphics file is preferred over any other file type that has the same name (but different extension):

\setupexternalfigures[
  order={svg,pdf,png,tif,jpg},
  location={default},
]

This means, for example, that if illustration.svg exists, it will be embedded in the document even if illustration.jpg also exists.
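
To check which file would win when several versions of an image share a name, the following sketch mimics the precedence from the order list above. It is a simplification: the function name is hypothetical, and it only searches one directory, whereas ConTeXt also consults its configured locations and paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which file would win under the
# order={svg,pdf,png,tif,jpg} precedence set in prj-figures.tex.
# Only the given directory (default: current) is searched.
pick_figure() {
  local base="$1" dir="${2:-.}" ext
  for ext in svg pdf png tif jpg; do
    if [ -f "${dir}/${base}.${ext}" ]; then
      echo "${base}.${ext}"
      return 0
    fi
  done
  echo "no match for ${base}" >&2
  return 1
}
```

For example, pick_figure illustration images would print illustration.svg when both illustration.svg and illustration.jpg sit in an images directory.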

Fonts

Before beginning with fonts, download some to use in your book(s), like the following free fonts that pair well together:

Download and install them as follows:

  1. Click one of the above font links.

  2. Click Select this style on any available style (e.g., Thin 100).

  3. Click Download family.

  4. Save to a known directory, such as $HOME/archives.

  5. Repeat for the remaining font links.

  6. Install the files into a directory, such as $HOME/.fonts/ttf.

    mkdir -p $HOME/.fonts/ttf
    cd $HOME/.fonts/ttf
    unzip -o $HOME/archives/Montserrat.zip
    unzip -o $HOME/archives/Vollkorn_SC.zip
    unzip -o $HOME/archives/Vollkorn.zip
    unzip -o $HOME/archives/Rochester.zip

  7. Set the OSFONTDIR environment variable. Mind the double slash (//) at the end, which tells the typesetting engine to also scan subdirectories for fonts:

    export OSFONTDIR="$OSFONTDIR;$HOME/.fonts//"

  8. Reload the fonts so that ConTeXt will find them:

    mtxrun --script fonts --reload

  9. Make sure the fonts can be found by listing them:

    mtxrun --script fonts --list --all | grep -i vollkorn
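
If the export from step 7 goes into a shell startup file such as ~/.bashrc, every re-read appends the same directory again. A minimal sketch of a guard follows; add_osfontdir is a hypothetical name, and it keeps the ; separator used in step 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a font directory to OSFONTDIR only once.
# The trailing // asks for subdirectories to be scanned, as in step 7,
# and entries are separated with ; to match that step.
add_osfontdir() {
  local dir="$1//"
  case ";${OSFONTDIR};" in
    *";${dir};"*) ;;   # already listed: leave OSFONTDIR unchanged
    *) OSFONTDIR="${OSFONTDIR:+${OSFONTDIR};}${dir}" ;;
  esac
  export OSFONTDIR
}

add_osfontdir "$HOME/.fonts"
add_osfontdir "$HOME/.fonts"   # a repeated call changes nothing
echo "$OSFONTDIR"
```

After changing the startup file, reload the fonts again (step 8) so that ConTeXt picks up the new search path.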

The fonts are installed. To learn more about how ConTeXt loads and uses fonts, readers are encouraged to browse the following documents:

Open prj-fonts.tex to review the key lines:

\definefontfamily[BookFontFace] [rm] [Vollkorn] [bf=name:vollkornscregular]
\definefontfamily[BookFontFace] [ss] [Montserrat]
\definefontfamily[BookFontFace] [hw] [Rochester]

\setupbodyfont[BookFontFace]

The first three lines map the core font styles, namely serif (rm), sans serif (ss), and handwritten (hw), to specific fonts. Official font names from the font files, such as Vollkorn, can be found by using the following command:

fc-scan $(fc-list | cut -d':' -f1 | grep -i vollk) | grep "family:"

where vollk could be replaced with monts or roch to look up the other fonts. Notice that bf typically means boldface, but its value has been deliberately set to the small caps font, Vollkorn SC Regular. Although mtxrun does not show the official font name, we can list and use font identifiers as follows:

mtxrun --script fonts --list --all | grep -i vollk | cut -d' ' -f1

where vollk matches part of the font name. The returned list of font identifiers includes vollkornscregular; most small caps font identifiers (or names) include sc. Assigning the small caps font to the bold font face adheres to the Standard Ebooks specification.

Defining fonts this way is far simpler than described previously.

Front Matter

Front matter consists of a table of contents placed in prj-frontmatter.tex, to be styled in prj-toc.tex:

\definestructureconversionset [frontpart:pagenumber] [] [romannumerals]

\startfrontmatter
  \setuppagenumbering[location=bottom]

  \completecontent
\stopfrontmatter

For the front matter only, page numbers are placed at the bottom of the page. This demonstrates a way to differentiate front matter page numbers from prose page numbers, beyond the usual Roman versus Arabic. The conversion set on the first line also switches the front matter page numbers to Roman numerals, to stay classy.

Typically, colophon and copyright pages would also be included here.

Headers

Review the interior book design page to see common combinations for running headers. From the sample designs given previously, The Goat in the Tree had a nice touch: page numbers in square brackets, a little lighter than the rest of the text. That looks as follows:

\def\BookPageNumber{[~\userpagenumber~]}

\setupheader[color=BookColourNeutralLt]

\setupheadertexts
  [\BookTitle]
  [\BookPageNumber]
  [\BookPageNumber]
  [{\getmarking[section]}]

\setuppagenumbering[location=]

The tildes insert non-breaking spaces between the square brackets and the page number. Note the special \userpagenumber macro, which will be defined later. Calling \setupheader configures features of the headers as a whole in a single place; in this case, it changes the colour to something lighter than the prose.

Use \setupheadertexts to change the text supplied in the running header. You can think of the call as follows:

\setupheadertexts
  [Left Page, Left Side]
  [Left Page, Right Side]
  [Right Page, Left Side]
  [Right Page, Right Side]

These settings won’t be visible until double-sided mode is enabled.

Lastly, we clear the default page number placement (location=) because we’ve customized how the numbers look in the book; otherwise, ConTeXt would also typeset its default page numbers.

Headings

For me, one of the more rewarding parts of interior book design with ConTeXt is styling the section headings, possibly because the \setuphead command has so many features! By default, novel chapters start on a right-hand page and the prose starts about a third of the way down the page. Take a look at prj-headings.tex:

\setuphead[section][
  page=BookSectionPageBreak,
  style=\tfa\ss,
  color=BookColourNeutral,
  header=empty,
  footer=BookFooterPageNumber,
  alternative=middle,
  numbercommand=\groupedcommand{}{\blank[\measure{BookSectionTitleDepth}]},
  after={\blank[\measure{BookSectionProseDepth}]},
  sectionconversionset=BookSectionNumerals,
]

Here’s how most of these options work:

Keep in mind that BookSectionNumerals is not yet defined. Any product that requires custom numerals must define them explicitly. Steps to do so are captured in the table of contents section.

Layouts

Layouts provide a lot of control over typesetting pages: margins, running header/footer height, amount of space for typesetting text, and more. The following figure names the major typesetting areas of a page in ConTeXt:

ConTeXt Page Layout

Defining layouts can prove useful when debugging page regions to determine how ConTeXt is laying out the page. Even without custom layouts, ConTeXt has a \showframe macro that will draw an outline for all the typeset frames on a page. We’ll limit our usage of layout setups to enforcing grid alignment, as follows:

\setuplayout[grid=yes]

When typesetting in a column environment, a grid layout can help ensure that the text across all the columns is aligned.

Letters

Earlier the following setups were added to fix the broken build:

\definestartstop[bookfooter][]
\definestartstop[bookletter][]
\definestartstop[bookpostscript][]
\definestartstop[booksignature][][]
\definestartstop[bookvalediction][]

These move into prj-letters.tex so that we can simulate a handwritten letter by applying the following styles:

In code, this resembles:

\definestartstop[bookletter][
  style=\hw,
  before={%
    \setupnarrower[
      left=\measure{BookLetterIndent},
      right=\measure{BookLetterIndent}
    ]
    \startnarrower[left, right]
      \setupalign[
        hz,
        hanging,
        nothyphenated,
        flushleft,
      ]
      \blank[big]
  },
  after={%
      \blank[big]
    \stopnarrower
  }%
]

Most of these options are self-explanatory. Sometimes nothyphenated alone is insufficient to remove hyphens from the right-hand side; so hz tells the typesetting engine that slight horizontal stretching of characters is acceptable, and hanging permits punctuation to protrude (hang) a bit beyond the text’s width.

What’s important about these lines is that the \setupalign takes place inside the \startnarrower block so that it applies only to the letter. Moving \setupalign outside the narrower environment will affect all the text after the letter as well. (Yes, it is possible to add another \setupalign in the after= option, but that leads to maintaining the code in two places.)

Due to the way I prefer to format code, the % at the end of the after={ line is necessary; otherwise, ConTeXt cannot parse the \definestartstop.

Lists

Here we find a simple setup for bullet lists, which won’t be belaboured:

\setupitemgroup[itemize][1][packed, autointro]

The individual items are packed together, meaning no vertical whitespace between them; using autointro attempts to prevent any widowed items.

Modules

ConTeXt has numerous modules that make configuring certain aspects of a book much easier. When installing or updating ConTeXt, all the modules can be installed by providing the --modules=all command line argument:

sh ./first-setup.sh --modules=all

Depending on where ConTeXt is installed, the module documentation will be installed within a subdirectory containing third-party files:

$CONTEXT_HOME/tex/texmf-modules/doc/context/third/...

For example, the fancy break module is documented under:

.../fancybreak/fancybreak-doc.pdf
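
Rather than browsing the directory by hand, a short sketch can list every module manual at once. The function name is hypothetical, and CONTEXT_HOME is the same placeholder for the ConTeXt installation directory used above; pass the real directory as an argument or set the variable to match your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list third-party module manuals (PDF files) under
# the ConTeXt installation directory given as $1 (default: $CONTEXT_HOME).
list_module_docs() {
  local home="${1:-$CONTEXT_HOME}"
  local docs="${home}/tex/texmf-modules/doc/context/third"
  if [ ! -d "$docs" ]; then
    echo "No module documentation found under ${docs}" >&2
    return 1
  fi
  # Print paths relative to the documentation root, sorted by module.
  (cd "$docs" && find . -type f -name '*.pdf' | sed 's|^\./||' | sort)
}
```

On an installation with all modules, list_module_docs | grep fancybreak should include fancybreak/fancybreak-doc.pdf.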

Pages

Distinguishing the body matter from the front matter provides fine-grained control over the page numbers written into the document as well as the page numbers displayed in the PDF reader itself. In the following figure, the page number in the top-left shows i, and the page number typeset into the document’s header also shows i:

Page Numbers

Two conceptual page numbers are applied to the document. First, the front matter uses the default page counter, whose appearance we changed using the \definestructureconversionset macro. Second, a “user page” counter is initialized only for the body matter. Arguably, the counter could be defined in the prj-counter.tex file.

\startsectionblockenvironment [bodypart]
  \setcounter [userpage] [1]
\stopsectionblockenvironment

The value is retrieved using \userpagenumber, which we saw earlier in prj-headers.tex.

Paper

Use prj-paper.tex to control the final PDF file’s page dimensions, which can strongly influence the resulting printed paper size. There are many ways to control the paper size, add crop marks, and more. Here is a simple way to set the page size to the predefined trim size:

\definepapersize[BookTrimSize][
  width={\measure{BookPageTrimWidth}},
  height={\measure{BookPageTrimHeight}},
]

\setuppapersize[BookTrimSize]

Paragraphs

By default, ConTeXt will not indent the first paragraph immediately after a new section. To indent the paragraphs that follow it, tell the typesetter how much indentation to use:

\setupindenting[
  yes,
  \measure{BookParagraphIndent}
]

Because \setupindenting takes plain values rather than a key-value option list, refrain from adding a trailing comma after the last item.

Penalties

My preference is to make the widows and orphans extremely expensive and therefore extraordinarily rare, which the following lines accomplish:

\setpenalties\widowpenalties{1}{10000}
\setpenalties\clubpenalties {1}{10000}

Strings

If there are any common strings, they would be defined as macros in this file. Some examples include:

Anything, really, that relates to all the books being typeset.

Tables

ConTeXt has many table mechanisms, including Extreme Tables, which is what pandoc generates. Reimagine the default setups as follows:

\setupxtable[
  frame=off,
  topframe=on,
  bottomframe=on,
  framecolor=BookColourMainLt,
  toffset=2pt,
  boffset=2pt,
  option={stretch, width},
  split=yes,
]

\setupxtable[foot][
  rulethickness=1pt,
]

Table cells in Extreme Tables can be treated similarly to frames, thus:

The second half of a split table follows:

Table from Les Misérables

Title

Back to that paradox: the novel’s contents are embedded into the product; the product uses styling from the project; the project attempts to typeset a title page; and the title page needs the title, which is in the product’s contents. This circular dependency is depicted in the following figure:

Circular Dependency

Another way to think about this is that by the time the typesetting engine sees \BookTitle, the macro must have already been defined.

There are many ways to resolve the paradox, but most of them involve obscure TeX magicros, Lua filters, or more XSL code. An easy solution is to extract the author name and book title from contents.tex so that they can be imported before the title page references them, a task for the build script. (See the -y option to “yank” the author and title from the novel’s contents.)

Note that the ordering within prj-title.tex is important: setups must precede the parts of the document they style.

A makeup is a separate page, such as a title page, colophon, or copyright page. By default, standard makeups in ConTeXt have the following convenient properties:

To center the title page text horizontally, in addition to the default vertical centering, we codify the following:

\setupmakeup[standard][
  align=middle,
]

\startstandardmakeup
  \BookTitle

  \BookAuthor
\stopstandardmakeup

Toc

Jekyll and Hyde’s table of contents is pretty svelte out of the gate, but let’s use Roman numerals for the chapter numbers. First, edit jekyll-hyde.tex (in the standard-ebooks/jekyll-hyde directory) to include a product-specific table of contents. When finished, the file will resemble:

\environment prd-toc
\environment prd-paragraphs

\project standard-ebooks

\component c_jekyll-hyde

Including the local environments before the project environments is imperative. Changing the order means that project-level definitions are declared first. Once declared, most of them cannot be changed.

Next, create prd-toc.tex in the same directory:

\startenvironment *
  \definestructureconversionset[BookSectionNumerals][,,R][n]
\stopenvironment

Deferring creation of the BookSectionNumerals to a product environment allows product-specific changes to the numerals used for sections (i.e., chapters) in the table of contents.

Standard Ebooks

When finished, the style environments inside of standard-ebooks.tex will be imported in dependency order.

Resolving the circular dependency meant extracting the title and author into a separate file, named prd-strings.tex. That file, logically, belongs with the other product-specific files. If the file isn’t present, we don’t want the book build process to fail. At the top of standard-ebooks.tex are definitions for the only macros absolutely required by the build:

\def\BookTitle{}
\def\BookAuthor{}

Any value can be placed inside the braces, such as a warning to the editor that neither book title nor author name could be found. Once we have a guarantee that the macros exist, we can try to override the values by loading the product-specific values:

\doifdocumentargument{book}{%
  \environment \getdocumentargument{book}/prd-strings
}

Basically, the snippet imports the prd-strings.tex file from the product that’s being built. The name of the product being built is passed in on the command line when building a document, which can be seen in the call to ConTeXt inside the build script:

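context --nonstopmode --batchmode --purgeall \
  --book="${OPT_CONTEXT_BOOK_NAME}" \
  --path="${PATH_PROJECT},${ARG_PATH_CONTEXT_TEX}" \
  --result="${ARG_FILE_CONTEXT_PDF}" \
  "${ARG_FILE_CONTEXT_MAIN}" > /dev/null 2>&1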

Here, the book in --book is what gets passed as a parameter into the documentargument macros shown above. The OPT_CONTEXT_BOOK_NAME variable value is set using the -i command line argument to the script. When ConTeXt runs through standard-ebooks.tex, the snippet that is interpreted for Jekyll and Hyde effectively becomes:

\environment jekyll-hyde/prd-strings

We need to make sure that prd-strings exists, which—as mentioned—the build script creates after pandoc produces contents.tex. That pesky circular dependency is resolved.
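
The build script’s own -y extraction isn’t reproduced here, because it depends on how the title and author appear in contents.tex. Assuming both values have already been extracted, a sketch of writing prd-strings.tex in the same \startenvironment form as prd-toc.tex might look like the following; write_prd_strings is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write prd-strings.tex into a product directory so that
# \BookTitle and \BookAuthor exist before the title page refers to them.
# The title and author are assumed to be extracted already; TeX special
# characters (such as &, %, or #) are not escaped by this sketch.
write_prd_strings() {
  local dir="$1" title="$2" author="$3"
  mkdir -p "$dir"
  printf '%s\n' \
    '\startenvironment *' \
    "  \\def\\BookTitle{${title}}" \
    "  \\def\\BookAuthor{${author}}" \
    '\stopenvironment' > "${dir}/prd-strings.tex"
}
```

For example, write_prd_strings standard-ebooks/jekyll-hyde "The Strange Case of Dr. Jekyll and Mr. Hyde" "Robert Louis Stevenson" would redefine the empty defaults from standard-ebooks.tex once the file is loaded as an environment.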

Modes

Novels generated using the technique described above are created with physical printing in mind. We all want to save trees, so a mode to generate a version for on-screen viewing is desirable. One way to control the output format is through modes.

ConTeXt controls page margins for left/right pages through the not-so-obviously named \setuppagenumbering macro. In prj-pages.tex we see:

\setuppagenumbering[alternative=doublesided]

Add a mode for screen by using a \startmode block after that line, like so:

\setuppagenumbering[alternative=doublesided]

\startmode[screen]
  \setuppagenumbering[alternative=singlesided]
\stopmode

ConTeXt will use double-sided pages unless screen mode is enabled, whereby the page type is changed to single-sided. Our illustrious build script does not currently pass in the mode to ConTeXt, so you’ll have to add that yourself. Here’s a hint:

context --nonstopmode --batchmode --purgeall \
  --book="${OPT_CONTEXT_BOOK_NAME}" \
  --path="${PATH_PROJECT},${ARG_PATH_CONTEXT_TEX}" \
  --result="${ARG_FILE_CONTEXT_PDF}" \
  --modes="${ARG_CONTEXT_MODES}" \
  "${ARG_FILE_CONTEXT_MAIN}" > /dev/null 2>&1

If you get stuck, let me know.
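
One possible way to supply ARG_CONTEXT_MODES is a new command line option. The sketch below assumes a hypothetical -s (screen) flag alongside the existing -i option and uses getopts; the real build script’s option parsing may differ, and this standalone version understands only those two options.

```shell
#!/usr/bin/env bash
# Sketch: parse -i (book name, as in the build script) and a hypothetical
# -s flag requesting screen mode. Only these two options are understood.
parse_options() {
  OPT_CONTEXT_BOOK_NAME=""
  ARG_CONTEXT_MODES=""
  local opt OPTIND=1
  while getopts ":i:s" opt; do
    case "$opt" in
      i) OPT_CONTEXT_BOOK_NAME="$OPTARG" ;;
      s) ARG_CONTEXT_MODES="screen" ;;
      :) echo "Option -${OPTARG} needs a value" >&2 ;;
      *) echo "Unsupported option: -${OPTARG}" >&2 ;;
    esac
  done
}

# Print the ConTeXt arguments, one per line, adding --modes only on request.
context_args() {
  local args=(--nonstopmode --batchmode --purgeall
    "--book=${OPT_CONTEXT_BOOK_NAME}")
  if [ -n "$ARG_CONTEXT_MODES" ]; then
    args+=("--modes=${ARG_CONTEXT_MODES}")
  fi
  printf '%s\n' "${args[@]}"
}
```

Adding --modes only when requested keeps the default call unchanged. If these arguments were passed to context, -s would enable the \startmode[screen] block and switch to single-sided pages.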

Download

Download the themes, distributed under the MIT license.

Summary

This part covered many topics:

The last part in this series will explore drawing with MetaPost.

Contact

About the Author

My career has spanned tele- and radio communications, enterprise-level e-commerce solutions, finance, transportation, modernization projects in both health and education, and much more.

Delighted to discuss opportunities to work with revolutionary companies combatting climate change.