Typesetting Markdown – Part 5: Interpolation

by Dave Jarvis, July 6, 2019

This part of the series describes how to reference interpolated strings inside Markdown documents.

Introduction

Part 4 described creating a reusable build script template and introduced controlling a document’s page size, layout, and thematic elements. This part describes a way to define, organise, and embed document variables. For simplicity, a variable described in this document can also be thought of as a constant or key-value pair.

Variables

Ancient Egyptians used hieroglyphic signs to represent numbers, such as those depicted in the following table:

Unicode	Meaning	Value
𓏺	Wooden dowel, stroke	1
𓎆	Hobble for cattle	10
𓍢	Coil of rope	100
𓆼	Lotus plant	1,000
𓂭	Finger	10,000
𓆐	Tadpole	100,000
𓁨	Ḥeḥ with arms supporting the sky	1,000,000

Symbolic representation of numbers has its roots in Sumerian cuneiform, one of the earliest writing systems invented. Using symbols back then was reasonably straightforward:

Create (or borrow) a wedge-tipped reed stylus.
Make a wet clay tablet.
Use the stylus to write symbols in the clay.
Leave the clay in direct sunlight to harden.

Thousands of years later, symbolic representations of numbers, text strings, and other data types are commonplace in systems created by software developers; however, using variables—the lifeblood of programming languages—within documentation remains fairly arduous for the vast majority of people. Consider the following Microsoft Word document:

Perhaps the phone number is used in multiple places throughout the text. When the phone number changes, it’d be convenient to change it once and be sure that all occurrences of the number are also updated. To make and insert a document variable, the author must know the following labyrinthine incantation:

Click File.
Click Properties.
Click Advanced Properties.
Click the Custom tab.
Set Name to the variable name (e.g., PhoneNumber).
Set Value to the variable value.
Click OK.
Press Esc to resume document editing.
Click Insert.
Click Quick Parts.
Click Field.
Set Categories to: DocProperty.
Scroll to find PhoneNumber under Property.
Click OK to insert the variable.

The variable is inserted into the document, shown highlighted in the following screen capture:

When that number changes, anyone can update the variable—assuming they know the value was from a variable and they know how or care enough to reassign it. Practically, the deeper problem of inserting information from a single source of truth into documentation is not addressed. A Microsoft Word document is an unsuitable source of truth because (1) multiple applications cannot reuse its variables; (2) its variables cannot be assigned a category (i.e., they cannot easily be organised into namespaces); and (3) its document file format promotes vendor lock-in. Sourcing variables from Microsoft Word is akin to telling your relatives where to find clay tablets whenever they need to look up their ancestors’ names. With respect to editing efficiency, flexibility, and maintainability… that phone number might as well have been carved into clay.

Document variables would do well to meet the following criteria:

Creation – Make variables using four steps, or fewer.
Injection – Insert variables using three steps, or fewer.
Open – Variable definition formats must not be proprietary.
Unified – Variables can be retrieved from a single source of truth.
Orderly – Variables can be grouped into categories (namespaces) that give their values context.
Interpolated – Let variables reference other variables, recursively.

The last four items are addressed hereinafter.

Open

Free, open file formats for associating variable names with values abound:

JSON – JavaScript Object Notation is well-known to web developers.
TOML – Tom’s Obvious, Minimal Language is a simple configuration file format meant to be read easily.
XML – Extensible Markup Language is a file format originally designed for large-scale electronic publishing.
YAML – YAML Ain’t Markup Language is designed to be a human-readable file format for describing structured data.

Despite their intentions, human-readable data formats are developer-readable at best. Non-developers balk at learning hierarchical file format syntaxes. Providing a simple user interface would make learning the underlying file format largely irrelevant. Even though some people dislike editing and navigating hierarchies, having the ability to categorise data through a simple user interface has practical value for developers and non-developers alike.

A common visualisation is a tree interface, such as:

Miller Columns (links to an implementation that I developed) are another way to visualise hierarchical data. A mock-up with filtering resembles:

Having limited screen real estate, iPods use a drill-down menu hierarchy. The effect achieved is similar to the following:

The D3 data visualisation library provides yet another way to view deeply nested hierarchies:

No matter how the information is presented, a way to associate a document with the variables referenced within it is essential.

YAML is the only variable definition format that pandoc supports directly, at the time of writing. A TOML integration may be implemented in the future. Either way, many tools of varying accuracy can convert between file formats, so using YAML does not force the documents to depend on any particular data input format.

Unified

Ideally, document data is requested from a central location, such as a data warehouse. The data warehouse can be a façade, exposing a single source of truth for separate information sources necessary to operate a business. Upon retrieval, the data is transformed into the required format (e.g., YAML), so that the document can reference the values.

For most writing needs, a flat file is sufficient.

Orderly

As soon as a document of substantial length is drafted, the need to organise variables becomes apparent. Initially, for example, direct, fax, tollfree, support, and afterhours may suffice to capture various phone numbers. As a company expands into multiple locations, each of those variable names will conflict across the different locations. Similarly, novels need ways to assign values to character sheets for a variety of characters. To avoid collisions, file formats must support separate spaces for variable names. Aptly, these are known as namespaces, and they help categorise information.

For example, a source code repository and a web server both have names and ports, which could be defined as per the following YAML file:

network:
  domain:
    name: librerie.com
    ip: 192.168.1.1
  servers:
    repository:
      name: svn.librerie.com
      port: 3690
    web:
      name: www.librerie.com
      port: 80

Even though name appears multiple times, the fully qualified variable names can be referenced without conflict. Clearly, network.domain.name, network.servers.repository.name, and network.servers.web.name have different values because they are in different namespaces, even though all end with name.
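Fully qualified names can be pictured as flat keys whose segments are joined by a path token, which here is a period. The following bash sketch assumes bash 4 or later and uses a hypothetical associative array named `vars`. It does not show how pandoc or yamlp store variables internally. It only illustrates why the three `name` entries never collide:

```bash
#!/usr/bin/env bash
# Sketch only: fully qualified names stored as flat keys (bash 4+).
# "vars" is a hypothetical name; the values mirror the YAML example above.
declare -A vars=(
  [network.domain.name]="librerie.com"
  [network.domain.ip]="192.168.1.1"
  [network.servers.repository.name]="svn.librerie.com"
  [network.servers.repository.port]="3690"
  [network.servers.web.name]="www.librerie.com"
  [network.servers.web.port]="80"
)

# All three keys end in "name", but the full path keeps them distinct.
for key in network.domain.name network.servers.repository.name network.servers.web.name; do
  printf '%s = %s\n' "$key" "${vars[$key]}"
done
```

Each key prints its own value: `librerie.com`, `svn.librerie.com`, and `www.librerie.com`.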

There is a little redundancy in the YAML file that will be addressed using interpolated strings. Hard-coding text that will probably change later—like transitioning from Subversion to Git—inevitably results in inaccurate documentation. (Arguably, repository.librerie.com may have been a more future-friendly host name, but that misses the point.)

Interpolated

String interpolation replaces placeholders with corresponding values. For example, consider the following YAML metadata block, enclosed by three hyphens (---), atop a Markdown file:

---
protagonist:
  name:
    given: &given May
    surname: &surname Blood
    personal: *given *surname
---

Hello $protagonist.name.personal$.

It would be convenient if the value for protagonist.name.personal became May Blood in the output document. While anchors (e.g., &given) and references (e.g., *given) are part of the YAML specification, for the purposes of simple variables inside of documents, the syntax has the following issues:

Redundant – Variable names have uniquely defined namespaces, which makes the additional reference redundant. (YAML variables needn’t be uniquely named, so the notation can be useful.)
Unsupported – As of pandoc version 2.7.2, anchors and references cannot be used.
Pointers – C-style pointer syntax is abstruse for many people.
Recursion – Even if pandoc supported the syntax, the implementation probably would not allow references within references.

Pandoc uses $ symbols to delimit variable names within documents. Create a file named 01.md having the following contents:

---
title: Book
protagonist:
  name:
    given: May
    surname: Blood
    personal: May Blood
---

Hello $protagonist.name.personal$.

Save the file then run pandoc as follows:

pandoc 01.md --template 01.md 2>/dev/null | pandoc

Using 01.md as both a source of variables (i.e., a template) and a document allows pandoc to interpret the variables and apply their values to the document. Pandoc produces the following output:

<p>Hello May Blood.</p>

Short of writing a Lua filter to parse metadata blocks, pandoc cannot replace strings within the YAML metadata block. As a result, the following document will not produce the same HTML fragment as above:

---
protagonist:
  name:
    given: May
    surname: Blood
    personal: $protagonist.name.given$ $protagonist.name.surname$
---

Hello $protagonist.name.personal$.

Writing a Lua filter would unnecessarily bind a possible solution to pandoc. Working around the lack of support for recursive string interpolation entails the following actions:

Put variables in a separate file, external to the Markdown.
Run a YAML preprocessor to perform string interpolation.
Integrate interpolated variables with the Markdown document.
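The second step does the real work. The following bash sketch shows one way recursive interpolation can operate; it is not yamlp's algorithm. It rests on several assumptions:

- The YAML has already been flattened into a hypothetical associative array named `defs`, keyed by fully qualified names.
- Placeholders use pandoc's `$...$` delimiters.
- bash 4 or later is available.

The sample values mirror the definitions.yaml file created in the next section.

```bash
#!/usr/bin/env bash
# Sketch only, not yamlp's implementation. Assumes flattened keys in a
# hypothetical associative array "defs" and $...$ delimiters (bash 4+).
declare -A defs=(
  [hero.origin]='$hero.city$, $hero.region$, $hero.country$'
  [hero.city]='Corvallis'
  [hero.region]='Oregon'
  [hero.country]='$countries.primary$'
  [vacation.city]='Redwood National Park'
  [vacation.region]='California'
  [vacation.country]='$countries.primary$'
  [countries.primary]='USA'
)

# Replaces every $key$ placeholder in $1 with its value, expanding nested
# references depth-first. $2 tracks recursion depth so that cycles fail.
interpolate() {
  local text="$1" depth="${2:-0}" d='$' result="" key value
  if (( depth > 16 )); then
    printf 'interpolate: nesting too deep (cycle?)\n' >&2
    return 1
  fi
  while [[ "$text" == *"$d"*"$d"* ]]; do
    result+="${text%%"$d"*}"   # keep text before the opening delimiter
    text="${text#*"$d"}"       # drop through the opening delimiter
    key="${text%%"$d"*}"       # variable name runs up to the closing delimiter
    text="${text#*"$d"}"       # drop through the closing delimiter
    if [[ -n "$key" && -n "${defs[$key]+set}" ]]; then
      # Resolve the referenced value first, so references within references expand.
      value="$(interpolate "${defs[$key]}" $(( depth + 1 )))" || return 1
      result+="$value"
    else
      result+="${d}${key}${d}"  # leave unknown placeholders untouched
    fi
  done
  printf '%s' "${result}${text}"
}

for key in hero.origin vacation.country; do
  printf '%s: %s\n' "$key" "$(interpolate "${defs[$key]}")"
done
```

Running the script prints `hero.origin: Corvallis, Oregon, USA` and `vacation.country: USA`. Because expansion is depth-first, `$hero.country$` is fully resolved to `USA` before it is spliced into `hero.origin`. The depth cap turns a cycle, such as two variables referencing each other, into an error instead of endless recursion.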

Let’s see how preprocessing can work.

YAML and Markdown Separation

Create a file named definitions.yaml, representing locations in a novel:

hero:
  origin: $hero.city$, $hero.region$, $hero.country$
  city: Corvallis
  region: Oregon
  country: $countries.primary$
vacation:
  city: Redwood National Park
  region: California
  country: $countries.primary$
countries:
  primary: USA

Note the lack of metablock hyphens (---), which will be added later.

Create a file named 01.md having the following contents:

# Velocitas Formidabilis

"From $hero.city$ to $vacation.city$, $vacation.country$?" he asked.

The files are ready for preprocessing and merging.

YAML Preprocessor

Although a few YAML preprocessors exist, only yamlp can perform self-referential string interpolation on a standalone YAML file. YAML-specific preprocessors are listed in the following table:

Software	Issues
yamlp	Requires Java
yamp	Requires predefined variables
emrichen	Requires predefined variables
pandoc-moustache	Variables cannot reference variables

Full disclosure: I wrote the yamlp software.

Download

Download yamlp as follows:

Visit the download page.
Click yamlp.jar to download the pre-built Java archive file.

Install Java

Running yamlp requires a working Java installation:

Visit the OpenJDK page.
Download the applicable build (Linux, MacOS, or Windows).
Install the JDK as per its instructions.

Java is installed and can be run from the command-line.

Install yamlp

See the documentation for detailed yamlp installation and usage instructions. Note that Maven is only required for building the project and that downloading the pre-built Java archive file is sufficient.

Issue Tracking

Rather than report issues against yamlp, consider helping to migrate the software to a new programming language.

Help Wanted

Now that commercial use of Oracle’s Java is no longer free, having a native build that can be cross-compiled to multiple platforms using Rust or Haxe would be beneficial. Minimally, the ported version would:

be distributed under a permissive license;
read any aforementioned file format (JSON, YAML, TOML, etc.);
write to any of those file formats;
read from standard input and write to standard output;
perform recursive string interpolation on all variables;
have configurable variable delimiter start and end tokens; and
have a configurable variable path token (e.g., . or /).

If this seems like a challenging weekend project, take up the torch and then let me know. As a starting point, see the recursive interpolated strings algorithm in yamlp’s source code.

Delimiter Dilemma

On a side note, yamlp uses a regular expression to match variable delimiter tokens. Many programs hard-code delimiters without necessity. Apache Camel, in contrast, provides separate settings for the prefix and suffix tokens. An improvement to yamlp would be to replace its regular expression (regex) with delimiter tokens, similar to Apache Camel. This would simplify using delimiters like those listed in the following table:

Delimiter	Used by
`$...$`	pandoc
`$(...)`	Julia
`${...}`	bash, Apache Camel, and others.
`#{...}`	Aaron Parecki
`%{...}`	Puppet
`[%...]`	MultiMarkdown
`{{...}}`	Assemble, Handlebars, and others.
`((...))`	BOSH

Most delimiter tokens contain characters that are special in regular expressions. As such, those characters must be escaped, which complicates the expression.
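The following bash sketch illustrates the Apache Camel style of separate prefix and suffix tokens. It extracts placeholder names using literal tokens, and the function name `placeholders` is hypothetical. Bash matches a quoted variable inside a parameter expansion pattern literally. As a result, none of the delimiters from the table need escaping:

```bash
#!/usr/bin/env bash
# Sketch only: list placeholder names between literal prefix and suffix
# tokens. Quoting "$prefix" and "$suffix" inside the expansions makes bash
# treat characters such as $ { } ( ) [ ] as plain text, not as patterns.
placeholders() {
  local text="$1" prefix="$2" suffix="$3"
  while [[ "$text" == *"$prefix"*"$suffix"* ]]; do
    text="${text#*"$prefix"}"            # drop everything through the prefix
    printf '%s\n' "${text%%"$suffix"*}"  # name runs up to the first suffix
    text="${text#*"$suffix"}"            # continue after that suffix
  done
}

placeholders 'Hi ${hero.city} and ${vacation.city}' '${' '}'
placeholders 'Hi {{hero.city}}' '{{' '}}'
placeholders 'Hi [%hero.city]' '[%' ']'
placeholders 'Hi $hero.city$' '$' '$'
```

The first call prints `hero.city` and `vacation.city`, and each remaining call prints `hero.city`. The sketch does not handle nested or escaped delimiters.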

Integration

This section describes how to interpolate strings in Markdown.

Requirements

Ensure the following files exist inside $HOME/dev/writing/book:

ci script (from Part 4)
definitions.yaml (above)
01.md (above)

The requirements are met.

Update Script

Edit the ci script then make the changes that follow.

Update the DEPENDENCIES list to include Java:

"java,https://jdk.java.net"

Update the ARGUMENTS list to include YAML:

"-y,--yaml,YAML definitions file name"

Update arguments() to parse the YAML option:

-y|--yaml)
  ARG_FILE_YAML="$2"
  consume=2
;;

Provide a default file name for YAML definitions:

ARG_FILE_YAML="definitions.yaml"

Change the filter function to include monitoring of YAML files:

filter() {
  [[ "${1,,}" =~ \.(.*md|tex|y.?ml)$ ]]

  return $?
}

The following table explains the filter’s terse, conditional syntax:

Token	Meaning
`[[`	Begin evaluation of a Boolean expression
`"${1,,}"`	Convert the `$1` filename parameter to lower case
`=~`	Compare filename against a regular expression
`\.`	Starting from a period in the filename …
`(`	Find any pattern up until the closing parenthesis …
`.*md`	… that matches a string ending in `md`, such as Rmd
`|tex`	… or matches the string `tex`
`|y.?ml`	… or matches `y`, an optional character, then `ml`, such as yml or yaml
`)`	Stop scanning for patterns to match
`$`	Ensure the match happens at the end of the string
`]]`	End of Boolean expression to evaluate

As before, this will match more than what’s expected, including .cmd.
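To check what the filter accepts, call it with a few sample file names. The names below are illustrative only, and the `${1,,}` lowercase expansion requires bash 4 or later:

```bash
#!/usr/bin/env bash
# The filter() function from the ci script, exercised against sample names
# (illustrative only) to show which files would trigger a rebuild.
filter() {
  [[ "${1,,}" =~ \.(.*md|tex|y.?ml)$ ]]

  return $?
}

for name in 01.md notes.Rmd main.tex definitions.yaml config.yml \
            script.cmd body.pdf README.txt; do
  if filter "$name"; then
    printf 'rebuild: %s\n' "$name"
  else
    printf 'ignore:  %s\n' "$name"
  fi
done
```

Every name except `body.pdf` and `README.txt` is reported as `rebuild`. That includes `script.cmd`, which is the over-match noted above.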

Replace build_document() with the following snippet:

build_document() {
  local -r DIR_BUILD="artefacts"
  mkdir -p "${DIR_BUILD}"

  local -r FILE_MAIN_PREFIX="main"
  local -r FILE_BODY_PREFIX="${DIR_BUILD}/body"

  local -r FILE_CAT="${FILE_BODY_PREFIX}.md"
  local -r FILE_TEX="${FILE_BODY_PREFIX}.tex"
  local -r FILE_PDF="${FILE_BODY_PREFIX}.pdf"
  local -r FILE_DST="$(basename "${ARG_FILE_OUTPUT}" .pdf).pdf"

  $log "Preprocess YAML into ${FILE_CAT}"
  java -jar "${HOME}/bin/yamlp.jar" < "${ARG_FILE_YAML}" > "${FILE_CAT}"
  printf "%s\n" "---" >> "${FILE_CAT}"

  $log "Concatenate into ${FILE_CAT}"
  cat ./??.md >> "${FILE_CAT}"

  $log "Generate ${FILE_TEX}"
  pandoc "${FILE_CAT}" --template "${FILE_CAT}" 2>/dev/null | \
    pandoc --to context > "${FILE_TEX}"

  $log "Generate ${FILE_PDF}"
  context --nonstopmode --batchmode --purgeall \
    --path="${DIR_BUILD},styles" \
    "${FILE_MAIN_PREFIX}.tex" > /dev/null 2>&1

  $log "Rename ${FILE_MAIN_PREFIX}.pdf to ${FILE_DST}"
  mv "${FILE_MAIN_PREFIX}.pdf" "${FILE_DST}"
}

The following lines run the preprocessor:

$log "Preprocess YAML into ${FILE_CAT}"
java -jar "${HOME}/bin/yamlp.jar" < "${ARG_FILE_YAML}" > "${FILE_CAT}"
printf "%s\n" "---" >> "${FILE_CAT}"

The first line informs users what is happening. The second line uses Java to run yamlp against the YAML definitions file (definitions.yaml by default). The third line places the closing metablock separator ahead of the Markdown content; yamlp writes the opening separator automatically.
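The resulting artefacts/body.md contains four parts, in order: an opening separator, the definitions, a closing separator, and the chapters. The bash sketch below reproduces that layout without Java. It substitutes a hypothetical `preprocess` function for yamlp. The stand-in copies the YAML verbatim, so it shows only the file structure, not interpolation:

```bash
#!/usr/bin/env bash
# Sketch only: "preprocess" is a hypothetical stand-in for yamlp. It emits
# the opening "---" separator and copies the YAML unchanged (no interpolation).
preprocess() {
  printf '%s\n' '---'
  cat
}

# Assembles an output file from a YAML file followed by chapter files.
assemble() {
  local yaml="$1" out="$2"
  shift 2
  preprocess < "$yaml" > "$out"   # opening separator plus definitions
  printf '%s\n' '---' >> "$out"   # closing separator
  cat "$@" >> "$out"              # chapters, in the order given
}

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
printf 'countries:\n  primary: USA\n' > "$workdir/definitions.yaml"
printf '# Chapter\n\nHello $countries.primary$.\n' > "$workdir/01.md"
assemble "$workdir/definitions.yaml" "$workdir/body.md" "$workdir/01.md"
cat "$workdir/body.md"
```

The printed file starts with `---` and closes the metadata block with a second `---` on line 4. The chapter text follows, with its placeholder still intact for pandoc to resolve.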

Pandoc is instructed to interpret the newly interpolated template:

$log "Generate ${FILE_TEX}"
pandoc "${FILE_CAT}" --template "${FILE_CAT}" 2>/dev/null | \
  pandoc --to context > "${FILE_TEX}"

The changes are ready to run.

Run Continuous Integration Script

Restart the continuous integration script as follows:

Stop the ci script if it is running (e.g., using Ctrl+C).
Run the ci script again to ensure the changes are loaded.

Update Style

This section describes a few superficial changes to the document.

Change main.tex to include an override for table of contents styling:

\input toc

Add a file styles/toc.tex with the following contents, to eliminate the table of contents altogether:

\def\completecontent{}

Change styles/headings.tex to capitalise the chapter title by updating the setups for section to use the uppercase WORD macro as follows:

\setuphead[section][
  style=\ss\tfd\WORD,
  textcolor=ColourPrimary,
  numbercolor=ColourPrimary,
]

Revise the document colours by editing styles/colours.tex:

\definecolor[ColourPrimary][h=545454]
% ...
\definecolor[ColourPrimaryDk][h=333333]

Lastly, clear the contents from both layouts.tex and paper.tex to reset the paper size and page layout to their defaults. Make sure the files exist but are zero bytes in size.

Preview

Open output.pdf to see the output, which resembles:

Notice that $vacation.country$ resolves from $countries.primary$ to "USA" using yamlp. The YAML metablock in artefacts/body.md follows:

---
hero:
  origin: "Corvallis, Oregon, USA"
  city: "Corvallis"
  region: "Oregon"
  country: "USA"
vacation:
  city: "Redwood National Park"
  region: "California"
  country: "USA"
countries:
  primary: "USA"
---

All strings are interpolated correctly.

Download

Download book.zip to get the updated continuous integration script, book styles, YAML definition file, and Markdown example; all files are distributed under the MIT license.

Summary

This part explained recursive string interpolation, lamented the difficulty of using variables in documentation, provided example user interfaces for editing hierarchical data, and described how to embed interpolated strings in Markdown documents. Incidentally, by placing the variable definitions in a separate file, creating new variables has been reduced to fewer than four steps. Using variables is still tedious, for now. Part 6 describes how to use R to perform calculations that reuse the same YAML variable definitions.


About the Author

My career has spanned tele- and radio communications, enterprise-level e-commerce solutions, finance, transportation, modernization projects in both health and education, and much more.

Delighted to discuss opportunities to work with revolutionary companies combatting climate change.