Typesetting Markdown – Part 5: Interpolation
This part of the series describes how to reference interpolated strings inside Markdown documents.
Introduction
Part 4 described creating a reusable build script template and introduced controlling a document’s page size, layout, and thematic elements. This part describes a way to define, organise, and embed document variables. For simplicity, a variable in this document can also be thought of as a constant or key-value pair.
Variables
Ancient Egyptians used hieroglyphic signs to represent numbers, such as those depicted in the following table:
Unicode | Meaning | Value |
---|---|---|
𓏺 | Wooden dowel, stroke | 1 |
𓎆 | Hobble for cattle | 10 |
𓍢 | Coil of rope | 100 |
𓆼 | Lotus plant | 1,000 |
𓂭 | Finger | 10,000 |
𓆐 | Tadpole | 100,000 |
𓁨 | Ḥeḥ with arms supporting the sky | 1,000,000 |
Symbolic representation of numbers has its roots in Sumerian cuneiform, one of the earliest writing systems invented. Using symbols back then was reasonably straightforward:
- Create (or borrow) a wedge-tipped reed stylus.
- Make a wet clay tablet.
- Use the stylus to write symbols in the clay.
- Leave the clay in direct sunlight to harden.
Thousands of years later, symbolic representations of numbers, text strings, and other data types are commonplace in systems created by software developers; however, using variables—the lifeblood of programming languages—within documentation remains fairly arduous for the vast majority of people. Consider the following Microsoft Word document:

Perhaps the phone number is used in multiple places throughout the text. When the phone number changes, it’d be convenient to change it once and be sure that all occurrences of the number are also updated. To make and insert a document variable, the author must know the following labyrinthine incantation:
- Click File.
- Click Properties.
- Click Advanced Properties.
- Click Custom tab.
- Set Name to the variable name (e.g., PhoneNumber).
- Set Value to the variable value.
- Click OK.
- Press Esc to resume document editing.
- Click Insert.
- Click Quick Parts.
- Click Field.
- Set Categories to: DocProperty.
- Scroll to find PhoneNumber under Property.
- Click OK to insert the variable.
The variable is inserted into the document, shown highlighted in the following screen capture:

When that number changes, anyone can update the variable—assuming they know the value was from a variable and they know how or care enough to reassign it. Practically, the deeper problem of inserting information from a single source of truth into documentation is not addressed. A Microsoft Word document is an unsuitable source of truth because (1) multiple applications cannot reuse its variables; (2) its variables cannot be assigned a category (i.e., they cannot easily be organised into namespaces); and (3) its document file format promotes vendor lock-in. Sourcing variables from Microsoft Word is akin to telling your relatives where to find clay tablets whenever they need to look up their ancestors’ names. With respect to editing efficiency, flexibility, and maintainability… that phone number might as well have been carved into clay.
Document variables would do well to meet the following criteria:
- Creation – Make variables using four steps, or fewer.
- Injection – Insert variables using three steps, or fewer.
- Open – Variable definition formats must not be proprietary.
- Unified – Variables can be retrieved from a single source of truth.
- Orderly – Organise variable names into categories that give their values context.
- Interpolated – Let variables reference other variables, recursively.
The last four items are addressed hereinafter.
Open
Free, open file formats for associating variable names with values abound:
- JSON – JavaScript Object Notation is well-known to web developers.
- TOML – Tom’s Obvious, Minimal Language is a simple configuration file format meant to be read easily.
- XML – Extensible Markup Language is a file format originally designed for large-scale electronic publishing.
- YAML – YAML Ain’t Markup Language is designed to be a human-readable file format for describing structured data.
Despite their intentions, human-readable data formats are developer-readable at best. Non-developers balk at learning hierarchical file format syntaxes. Providing a simple user interface would make learning the underlying file format largely irrelevant. Even though some people dislike editing and navigating hierarchies, having the ability to categorise data through a simple user interface has practical value for developers and non-developers alike.
A common visualisation is a tree interface, such as:

Miller Columns (links to an implementation that I developed) are another way to visualise hierarchical data. A mock-up with filtering resembles:

Having limited screen real estate, iPods use a drill-down menu hierarchy. The effect achieved is similar to the following:

The D3 data visualisation library provides yet another way to view deeply nested hierarchies:

No matter how the information is presented, a way to associate a document with the variables referenced within it is essential.
YAML is the only format pandoc supports directly, at time of writing. A TOML integration may be implemented in the future. Either way, since there are many tools—of varying accuracy—that can convert file formats, using YAML does not force the documents to depend on any particular data input format.
Unified
Ideally, document data is requested from a central location, such as a data warehouse. The data warehouse can be a façade, exposing a single source of truth for separate information sources necessary to operate a business. Upon retrieval, the data is transformed into the required format (e.g., YAML), so that the document can reference the values.
For most writing needs, a flat file is sufficient.
Orderly
As soon as a document of substantial length is drafted, the need to organise variables becomes apparent. Initially, for example, direct, fax, tollfree, support, and afterhours may suffice to capture various phone numbers. As a company expands into multiple locations, each of those variable names will conflict across the different locations. Similarly, novels need ways to assign values to character sheets for a variety of characters. To avoid collisions, file formats must support spaces for variable names. Aptly, these are known as namespaces, and can help categorise information.
For example, a source code repository and a web server both have names and ports, which could be defined as per the following YAML file:
network:
  domain:
    name: librerie.com
    ip: 192.168.1.1
  servers:
    repository:
      name: svn.librerie.com
      port: 3690
    web:
      name: www.librerie.com
      port: 80
Even though name appears multiple times, the fully qualified variable names can be referenced without conflict. Clearly, network.domain.name, network.servers.repository.name, and network.servers.web.name have different values because they are in different namespaces, even though all end with name.
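One way to picture fully qualified names is to flatten the hierarchy into dotted keys. The following Python sketch is illustrative only: flatten is a hypothetical helper (not part of pandoc or yamlp), and the dictionary simply mirrors the YAML file above as already-parsed data.

```python
def flatten(tree, prefix=""):
    """Map every leaf of a nested dict to its dot-separated, fully qualified name."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the namespace, carrying the path so far.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hand-written equivalent of the YAML file (assumed already parsed).
definitions = {
    "network": {
        "domain": {"name": "librerie.com", "ip": "192.168.1.1"},
        "servers": {
            "repository": {"name": "svn.librerie.com", "port": 3690},
            "web": {"name": "www.librerie.com", "port": 80},
        },
    }
}

names = flatten(definitions)
for qualified_name in sorted(names):
    print(qualified_name, "=", names[qualified_name])
```

Each of the three name entries ends up under a distinct key, which is why they cannot collide.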
There is a little redundancy in the YAML file that will be addressed using interpolated strings. Hard-coding text that will probably change later, like transitioning from Subversion to Git, inevitably results in inaccurate documentation. (Arguably, repository.librerie.com may have been a more future-friendly host name, but that misses the point.)
Interpolated
String interpolation replaces placeholders with corresponding values. For example, consider the following metadata block, enclosed by three hyphens (---), of YAML variables atop a Markdown file:
---
protagonist:
  name:
    given: &given May
    surname: &surname Blood
    personal: *given *surname
---
Hello $protagonist.name.personal$.
It would be convenient if the value for protagonist.name.personal became May Blood in the output document. While anchors (e.g., &given) and references (e.g., *given) are part of the YAML specification, for the purposes of simple variables inside of documents, the syntax has the following issues:
- Redundant – Variable names have uniquely defined namespaces, which makes the additional reference redundant. (YAML variables needn’t be uniquely named, so the notation can be useful.)
- Unsupported – As of pandoc version 2.7.2, anchors and references cannot be used.
- Pointers – C-style pointer syntax is abstruse for many people.
- Recursion – Even if pandoc supported the syntax, the implementation probably would not allow references within references.
Pandoc uses $ symbols to delimit variable names within documents. Create a file named 01.md having the following contents:
---
title: Book
protagonist:
  name:
    given: May
    surname: Blood
    personal: May Blood
---
Hello $protagonist.name.personal$.
Save the file then run pandoc as follows:
pandoc 01.md --template 01.md 2>/dev/null | pandoc
Using 01.md as both a source of variables (i.e., a template) and a document allows pandoc to interpret the variables and apply their values to the document. Pandoc produces the following output:
<p>Hello May Blood.</p>
Short of writing a Lua filter to parse metadata blocks, pandoc cannot replace strings within the YAML metadata block, meaning the following document will not produce the same HTML fragment as above:
---
protagonist:
  name:
    given: May
    surname: Blood
    personal: $protagonist.name.given$ $protagonist.name.surname$
---
Hello $protagonist.name.personal$.
Writing a Lua filter would unnecessarily bind a possible solution to pandoc. Working around the lack of support for recursive string interpolation entails the following actions:
- Put variables in a separate file, external to the Markdown.
- Run a YAML preprocessor to perform string interpolation.
- Integrate interpolated variables with the Markdown document.
Let’s see how preprocessing can work.
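To make the limitation concrete, here is a minimal Python sketch contrasting a single substitution pass with repeated passes. It is a simplified model, not pandoc's or yamlp's implementation: the function names, the flat dictionary of dotted names, and the pass limit are all assumptions made for illustration.

```python
import re

# A placeholder is a dotted name wrapped in dollar signs, e.g. $a.b.c$.
PLACEHOLDER = re.compile(r"\$([\w.]+)\$")


def substitute_once(text, variables):
    """Replace each $name$ with its value; unknown names are left untouched."""
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), text)


def substitute_until_stable(text, variables, limit=10):
    """Repeat single passes until nothing changes or the pass limit is reached."""
    for _ in range(limit):
        replaced = substitute_once(text, variables)
        if replaced == text:
            break
        text = replaced
    return text


# Flat, hand-written equivalent of the metadata block above.
variables = {
    "protagonist.name.given": "May",
    "protagonist.name.surname": "Blood",
    "protagonist.name.personal": "$protagonist.name.given$ $protagonist.name.surname$",
}
line = "Hello $protagonist.name.personal$."

print(substitute_once(line, variables))
# Hello $protagonist.name.given$ $protagonist.name.surname$.
print(substitute_until_stable(line, variables))
# Hello May Blood.
```

A single pass leaves the inner placeholders behind; only repeated (or recursive) substitution reaches the final text, which is the job handed to a preprocessor in the sections that follow.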
YAML and Markdown Separation
Create a file named definitions.yaml, representing locations in a novel:
hero:
  origin: $hero.city$, $hero.region$, $hero.country$
  city: Corvallis
  region: Oregon
  country: $countries.primary$
vacation:
  city: Redwood National Park
  region: California
  country: $countries.primary$
countries:
  primary: USA
Note the lack of metablock hyphens (---), which will be added later.
Create a file named 01.md having the following contents:
# Velocitas Formidabilis
"From $hero.city$ to $vacation.city$, $vacation.country$?" he asked.
The files are ready for preprocessing and merging.
YAML Preprocessor
Although a few YAML preprocessors exist, only yamlp can perform self-referential string interpolation on a standalone YAML file. YAML-specific preprocessors are listed in the following table:
Software | Issues |
---|---|
yamlp | Requires Java |
yamp | Requires predefined variables |
emrichen | Requires predefined variables |
pandoc-moustache | Variables cannot reference variables |
Full disclosure: I wrote the yamlp software.
Download
Download yamlp as follows:
- Visit the download page.
- Click yamlp.jar to download the pre-built Java archive file.
Install Java
Running yamlp requires a working Java installation:
- Visit the OpenJDK page.
- Download the applicable build (Linux, macOS, or Windows).
- Install the JDK as per its instructions.
Java is installed and can be run from the command line.
Install yamlp
See the documentation for detailed yamlp installation and usage instructions. Note that Maven is only required for building the project and that downloading the pre-built Java archive file is sufficient.
Issue Tracking
Rather than report issues against yamlp, consider helping to migrate the software to a new programming language.
Help Wanted
Now that commercial use of Oracle’s Java is no longer free, having a native build that can be cross-compiled to multiple platforms using Rust or Haxe would be beneficial. Minimally, the ported version would:
- be distributed under a permissive license;
- read any aforementioned file format (JSON, YAML, TOML, etc.);
- write to any of those file formats;
- read from standard input and write to standard output;
- perform recursive string interpolation on all variables;
- have configurable variable delimiter start and end tokens; and
- have a configurable variable path token (e.g., . or /).
If this seems like a challenging weekend project, take up the torch and then let me know. As a starting point, see the recursive interpolated strings algorithm in yamlp’s source code.
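For readers weighing up a port, the following Python sketch shows one possible shape for recursive interpolation with configurable delimiter and path tokens. It is not yamlp's algorithm: resolve_tree, its parameters, and the circular-reference check are assumptions for illustration, and the input is a hand-written dictionary because parsing YAML, JSON, or TOML is left out.

```python
import re


def resolve_tree(tree, start="$", end="$", separator="."):
    """Return a copy of a nested dict with placeholders replaced recursively."""
    # Names consist of word characters joined by the path token.
    name = r"([\w" + re.escape(separator) + r"]+)"
    pattern = re.compile(re.escape(start) + name + re.escape(end))

    def lookup(path):
        node = tree
        for key in path.split(separator):
            node = node[key]  # KeyError signals an undefined variable
        return node

    def expand(text, seen):
        def replace(match):
            path = match.group(1)
            if path in seen:
                raise ValueError(f"circular reference: {path}")
            # The referenced value may itself contain placeholders.
            return expand(str(lookup(path)), seen | {path})

        return pattern.sub(replace, text)

    def walk(node):
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        return expand(node, frozenset()) if isinstance(node, str) else node

    return walk(tree)


# Hand-written equivalent of definitions.yaml.
definitions = {
    "hero": {
        "origin": "$hero.city$, $hero.region$, $hero.country$",
        "city": "Corvallis",
        "region": "Oregon",
        "country": "$countries.primary$",
    },
    "vacation": {
        "city": "Redwood National Park",
        "region": "California",
        "country": "$countries.primary$",
    },
    "countries": {"primary": "USA"},
}

resolved = resolve_tree(definitions)
print(resolved["hero"]["origin"])
```

Switching tokens is then a matter of arguments, for example resolve_tree(data, start="${", end="}", separator="/"). Tracking the names already being expanded is one simple way to stop a definition that refers to itself from recursing forever.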
Delimiter Dilemma
On a side note, yamlp uses a regular expression to match variable delimiter tokens. Many programs hard-code delimiters without necessity. Apache Camel, in contrast, provides separate settings for the prefix and suffix tokens. An improvement to yamlp would be to replace its regular expression (regex) with delimiter tokens, similar to Apache Camel. This would simplify using delimiters like those listed in the following table:
Delimiter | Used by |
---|---|
$...$ | pandoc |
$(...) | Julia |
${...} | bash, Apache Camel, and others. |
#{...} | Aaron Parecki |
%{...} | Puppet |
[%...] | MultiMarkdown |
{{...}} | Assemble, Handlebars, and others. |
((...)) | BOSH |
Most delimiter tokens contain characters that are special in regular expressions, so they must be escaped, which complicates the expression.
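As a rough illustration of accepting plain prefix and suffix tokens, the Python sketch below escapes them automatically so that callers never write a regular expression by hand. The placeholder_pattern helper is hypothetical and unrelated to yamlp or Apache Camel; the token pairs come from the table above.

```python
import re


def placeholder_pattern(prefix, suffix):
    """Compile a pattern capturing the name between literal prefix and suffix tokens."""
    # re.escape neutralises characters such as $, (, { and [ in the tokens.
    return re.compile(re.escape(prefix) + r"([\w.]+)" + re.escape(suffix))


# A few prefix/suffix pairs from the table of delimiters.
DELIMITERS = {
    "pandoc": ("$", "$"),
    "Julia": ("$(", ")"),
    "bash": ("${", "}"),
    "MultiMarkdown": ("[%", "]"),
    "Handlebars": ("{{", "}}"),
    "BOSH": ("((", "))"),
}

for tool, (prefix, suffix) in DELIMITERS.items():
    sample = f"Call {prefix}phone.direct{suffix} today."
    print(tool, placeholder_pattern(prefix, suffix).findall(sample))
```

Every pair extracts the same name, phone.direct, from its sample sentence without any manual escaping.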
Integration
This section describes how to interpolate strings in Markdown.
Requirements
Ensure the following files exist inside $HOME/dev/writing/book:
- ci script from Part 4.
- definitions.yaml (above)
- 01.md (above)
The requirements are met.
Update Script
Edit the ci script then make the changes that follow.
Update the DEPENDENCIES list to include Java:
"java,https://jdk.java.net"
Update the ARGUMENTS list to include YAML:
"-y,--yaml,YAML definitions file name"
Update arguments() to parse the YAML option:
-y|--yaml)
  ARG_FILE_YAML="$2"
  consume=2
;;
Provide a default file name for YAML definitions:
ARG_FILE_YAML="definitions.yaml"
Change the filter function to include monitoring of YAML files:
filter() {
  [[ "${1,,}" =~ \.(.*md|tex|y.?ml)$ ]]
  return $?
}
The following table explains the filter’s terse, conditional syntax:
Token | Meaning |
---|---|
[[ | Begin evaluation of a Boolean expression |
"${1,,}" | Convert the $1 filename parameter to lower case |
=~ | Compare filename against a regular expression |
\. | Starting from a period in the filename … |
( | Find any pattern up until the closing parenthesis … |
.*md | … that matches a string with md , such as Rmd |
\|tex | … or matches a string with tex |
\|y.?ml | … or matches a string with y and ml , such as yaml |
) | Stop scanning for patterns to match |
$ | Ensure the match happens at the end of the string |
]] | End of Boolean expression to evaluate |
As before, this will match more than what’s expected, including .cmd.
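To see which file names the expression accepts, the following Python sketch applies the same pattern to a handful of lower-cased names. It is a stand-in for experimentation rather than part of the ci script; is_watched and the sample file names are made up for the example.

```python
import re

# Same expression as the script's filter(), applied to a lower-cased name.
WATCHED = re.compile(r"\.(.*md|tex|y.?ml)$")


def is_watched(filename):
    """Mimic filter(): lower-case the name, then search for the extension pattern."""
    return WATCHED.search(filename.lower()) is not None


for candidate in ["01.md", "notes.Rmd", "main.tex", "definitions.yaml",
                  "config.yml", "build.cmd", "cover.png"]:
    print(candidate, is_watched(candidate))
```

Only cover.png is rejected. The unintended match on build.cmd comes from the leading .* in the first alternative, so a stricter, hypothetical variant such as \.(r?md|tex|ya?ml)$ would exclude it.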
Replace build_document() with the following snippet:
build_document() {
  local -r DIR_BUILD="artefacts"
  mkdir -p "${DIR_BUILD}"
  local -r FILE_MAIN_PREFIX="main"
  local -r FILE_BODY_PREFIX="${DIR_BUILD}/body"
  local -r FILE_CAT="${FILE_BODY_PREFIX}.md"
  local -r FILE_TEX="${FILE_BODY_PREFIX}.tex"
  local -r FILE_PDF="${FILE_BODY_PREFIX}.pdf"
  local -r FILE_DST="$(basename "${ARG_FILE_OUTPUT}" .pdf).pdf"
  $log "Preprocess YAML into ${FILE_CAT}"
  java -jar "$HOME/bin/yamlp.jar" < "${ARG_FILE_YAML}" > "${FILE_CAT}"
  printf "%s\n" "---" >> "${FILE_CAT}"
  $log "Concatenate into ${FILE_CAT}"
  cat ./??.md >> "${FILE_CAT}"
  $log "Generate ${FILE_TEX}"
  pandoc "${FILE_CAT}" --template "${FILE_CAT}" 2>/dev/null | \
    pandoc --to context > "${FILE_TEX}"
  $log "Generate ${FILE_PDF}"
  context --nonstopmode --batchmode --purgeall \
    --path=artefacts,styles \
    "${FILE_MAIN_PREFIX}.tex" > /dev/null 2>&1
  $log "Rename ${FILE_MAIN_PREFIX}.pdf to ${FILE_DST}"
  mv "${FILE_MAIN_PREFIX}.pdf" "${FILE_DST}"
}
The following lines run the preprocessor:
$log "Preprocess YAML into ${FILE_CAT}"
java -jar "$HOME/bin/yamlp.jar" < "${ARG_FILE_YAML}" > "${FILE_CAT}"
printf "%s\n" "---" >> "${FILE_CAT}"
The first line informs users what is happening. The second line runs yamlp using Java against the definitions.yaml file. The third line places the closing metablock separator ahead of the Markdown content; yamlp writes the opening separator automatically.
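The layout of the combined file can be sketched in a few lines of Python. This only models the text being assembled and assumes, as described above, that the preprocessed YAML already begins with the opening separator; assemble and the sample strings are hypothetical.

```python
def assemble(front_matter, chapters):
    """Join preprocessed YAML, a closing separator, and chapters into one document."""
    parts = [front_matter.rstrip("\n"), "---"]
    parts.extend(chapter.rstrip("\n") for chapter in chapters)
    return "\n".join(parts) + "\n"


# Assumed preprocessor output: opening separator followed by resolved values.
front_matter = '---\nhero:\n  city: "Corvallis"\n'
chapters = ["# Velocitas Formidabilis\nFrom $hero.city$.\n"]

print(assemble(front_matter, chapters), end="")
```

The result is a single Markdown document whose metadata block is closed before the first chapter begins, which is the form pandoc needs to treat the file as both template and content.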
Pandoc is instructed to interpret the newly interpolated template:
$log "Generate ${FILE_TEX}"
pandoc "${FILE_CAT}" --template "${FILE_CAT}" 2>/dev/null | \
  pandoc --to context > "${FILE_TEX}"
The changes are ready to run.
Run Continuous Integration Script
Restart the continuous integration script as follows:
- Stop the ci script if it is running (e.g., using Ctrl+c).
- Run the ci script again to ensure the changes are loaded.
Update Style
This section describes a few superficial changes to the document.
Change main.tex to include an override for table of contents styling:
\input toc
Add a file styles/toc.tex with the following contents, to eliminate the table of contents altogether:
\def\completecontent{}
Change styles/headings.tex to capitalise the chapter title by updating the setups for section to use the uppercase WORD macro as follows:
\setuphead[section][
  style=\ss\tfd\WORD,
  textcolor=ColourPrimary,
  numbercolor=ColourPrimary,
]
Revise the document colours by editing styles/colours.tex:
\definecolor[ColourPrimary][h=545454]
% ...
\definecolor[ColourPrimaryDk][h=333333]
Lastly, clear the contents from both layouts.tex and paper.tex to reset the paper size and page layout to their defaults. Make sure the files exist but are zero bytes in size.
Preview
Open output.pdf to see the output, which resembles:

Notice that $vacation.country$ resolves from $countries.primary$ to "USA" using yamlp. The YAML metablock in artefacts/body.md follows:
---
hero:
  origin: "Corvallis, Oregon, USA"
  city: "Corvallis"
  region: "Oregon"
  country: "USA"
vacation:
  city: "Redwood National Park"
  region: "California"
  country: "USA"
countries:
  primary: "USA"
---
All strings are interpolated correctly.
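A quick way to confirm that nothing was missed is to scan the generated metablock for leftover placeholders. The Python sketch below is one hedged approach; unresolved is a hypothetical helper and the sample strings are abbreviated from the files above.

```python
import re


def unresolved(text):
    """List names that still appear as $name$ placeholders in the text."""
    return re.findall(r"\$([\w.]+)\$", text)


# Abbreviated samples: one from the generated metablock, one from the source file.
after = 'vacation:\n  country: "USA"\n'
before = "vacation:\n  country: $countries.primary$\n"

print(unresolved(after))
print(unresolved(before))
```

An empty list for the generated metablock suggests every reference was interpolated, whereas the source file still reports countries.primary.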
Download
Download book.zip to get the updated continuous integration script, book styles, YAML definition file, and Markdown example; all files are distributed under the MIT license.
Summary
This part explained recursive string interpolation, lamented the difficulty of using variables in documentation, provided example user interfaces for editing hierarchical data, and described how to embed interpolated strings in Markdown documents. Incidentally, by placing the variable definitions in a separate file, creating new variables has been reduced to fewer than four steps. Using variables is still tedious, for now. Part 6 describes how to use R to perform calculations that reuse the same YAML variable definitions.