Typesetting Markdown – Part 3: Automagicify
This part of the series describes how to create a build script that performs continuous integration when writing documents.
Introduction
Typesetting beautiful documentation can entail complex page layouts, stunning math, footnotes, citations, bibliographic references, lists of tables, river removal, microtypography and much more. There are many ways to generate PDF files from Markdown, but few of them provide the fine-grained controls that the typesetting engine ConTeXt offers.
Part 2 showed how to create a minimal PDF from a Markdown document using pandoc and ConTeXt.
Let’s address a few issues with the shell script created in Part 1 before exploring how to apply continuous integration concepts for regenerating documentation upon changes to the source files. Or skip the preparatory work by jumping to the new build script template section and continue reading about continuous integration from there.
Robust Scripting
Once a shell script takes on a life of its own, subtle effects and logic changes can cause the script to fail in unexpected ways. Robust scripts can compensate for some commonly encountered issues. For details beyond what is covered here, see:
- Writing Robust Bash Shell Scripts
- Best Practices for Writing Bash Scripts
- Bash scripting quirks & safety tips
Terminate on Error
When a bash script is run, a safe practice is to ensure that the script terminates if any of the commands it executes fail. Accomplish this by using the set command at the top of the script as follows:
set -o errexit
The disadvantage is that $? cannot be used to determine whether a command failed because bash will exit immediately. A way to work around the issue is to follow the command with a Boolean expression that returns true. Consider the following command:
$ ls /UNICORN; echo $?
ls: cannot access '/UNICORN': No such file or directory
2
The 2 is the exit level from running the ls command on a non-existent directory, which is displayed using echo $?. By appending a Boolean expression (|| true), the value of $? is displayed as 0 instead:
$ ls /UNICORN || true; echo $?
ls: cannot access '/UNICORN': No such file or directory
0
Thus the bash script will not terminate, even if the command fails, despite having enabled errexit; however, the $? variable is then always 0, which isn’t very helpful. One way to address this side-effect is to include the statements to execute directly, such as the following:
ls /UNICORN || { echo "Try /narwhal"; }
Or, in a slightly cleaner fashion, by calling your own function:
ls /UNICORN || missing_unicorn;
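Another option, when the exit level itself matters, is to capture $? on the right-hand side of the ||. The following is a minimal sketch rather than part of the original script: the status and suggestion variables are assumptions, missing_unicorn is a hypothetical handler named after the call above, and /UNICORN is assumed not to exist.

```shell
#!/usr/bin/env bash
set -o errexit

# Hypothetical handler; the name follows the call shown in the text.
missing_unicorn() {
  echo "Try /narwhal"
}

# Capture the exit level instead of discarding it with "|| true".
status=0
ls /UNICORN 2>/dev/null || status=$?
echo "ls exit level: ${status}"

# Delegate the failure to the handler; errexit does not fire because
# the failing command is on the left-hand side of the || list.
suggestion="$(ls /UNICORN 2>/dev/null || missing_unicorn)"
echo "${suggestion}"
```

The script keeps running after both failures, and the non-zero exit level from ls remains available in status for later decisions.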
Uninitialised Variables
Instruct bash to prevent referencing uninitialised variables by using the set command near the top of the script as follows:
set -o nounset
Since bash will terminate the shell script whenever an uninitialised variable is used, the unset variables now cause the following error:
$ ./build -d
./build: line 38: ARG_HELP: unbound variable
One way to resolve this is to change the corresponding unset lines to:
ARG_HELP=
ARG_DEBUG=
REQUIRED_MISSING=
The variables are initialised to empty strings, allowing the script to use them as expected. But making these changes does not address a deeper issue: the original script contains duplicated logic. Let’s fix the deeper issue.
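As an aside, bash also offers a default-value expansion that tolerates unset variables while nounset is enabled. The sketch below is illustrative only: help_state and debug_state are assumed names, and ARG_DEBUG is deliberately left unassigned to show the difference.

```shell
#!/usr/bin/env bash
set -o nounset

# Initialised to an empty string, so referencing it is safe.
ARG_HELP=

# ARG_DEBUG is never assigned here; the ":-" expansion substitutes a
# default for unset (or empty) variables instead of raising an error.
help_state="${ARG_HELP:-off}"
debug_state="${ARG_DEBUG:-off}"

echo "help=${help_state} debug=${debug_state}"
```

Both values come out as off: ARG_HELP because it is empty, ARG_DEBUG because it is unset. Referencing ${ARG_DEBUG} without the default would instead abort with the unbound variable error shown earlier.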
Eliminate Code Duplication
A principle of software development is that every piece of knowledge must have a single, unambiguous, authoritative representation.
Although subtle, this principle is broken in the script from Part 1. First, consider the following snippet from main():
if [ -n "${ARG_HELP}" ]; then
show_usage
exit 3
fi
Next, parse_commandline() also has logic regarding the help argument:
-h|-\?|--help)
ARG_HELP="true"
;;
These two snippets are conditional evaluations regarding the “help” functionality. There is no need to twice evaluate whether help must be displayed. The duplication becomes apparent when the if statement is re-written as follows:
if [ "${ARG_HELP}" = "true" ]; then
Using delegate functions eliminates the duplication. This entails the following changes:
- Introduce an empty function that performs no operations.
- Create a terminate function that exits the script with an exit level.
- Rename the existing function with a utile prefix.
- Assign the default help function to the empty function.
- Change the command-line argument parsing code.
- Update all function calls to use the delegating function variable.
Non-functional Function
A function having no functionality (that is, no operations of consequence) resembles:
noop() {
return 1
}
Returning 1 indicates that the function call failed, which ensures that any commands issued after a Boolean and (&&) conditional expression are not executed. This return value helps to simplify the main function.
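The short-circuit behaviour can be checked in isolation. In this sketch, succeed is a hypothetical counterpart to noop, and the two after_ variables exist only to record which branch ran.

```shell
#!/usr/bin/env bash
set -o errexit

noop() {
  return 1
}

# Hypothetical counterpart that reports success.
succeed() {
  return 0
}

after_noop="skipped"
after_succeed="skipped"

# noop fails, so the right-hand side is skipped; errexit ignores the
# failure because noop is not the last command in the && list.
noop && after_noop="ran"

# A function returning 0 lets the chain continue.
succeed && after_succeed="ran"

echo "after noop: ${after_noop}, after succeed: ${after_succeed}"
```

Note that the script survives the failing noop call even with errexit enabled, which is what allows a delegate to default to noop safely inside an && list.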
Terminate Function
To eliminate some minor duplication (the exit command being issued in multiple locations), create a function that’s called to terminate the script:
terminate() {
exit "$1"
}
This is useful for isolating any clean-up operations to a single location. For example, trap an interrupt signal to turn off ANSI colour sequences or delete temporary directories.
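A rough sketch of such a clean-up follows. It registers the handler on the shell's EXIT condition rather than on the interrupt signal, which is a swapped-in variation; WORK_DIR, cleanup, and status are hypothetical names that do not appear in the template.

```shell
#!/usr/bin/env bash

# Hypothetical scratch directory standing in for real build artefacts.
WORK_DIR="$(mktemp -d)"

cleanup() {
  rm -rf "${WORK_DIR}"
}

terminate() {
  exit "$1"
}

# Run in a subshell so that the example itself keeps running after
# terminate is called.
status=0
(
  trap cleanup EXIT
  terminate 3
) || status=$?

echo "exit level: ${status}"
[ -d "${WORK_DIR}" ] || echo "removed ${WORK_DIR}"
```

The exit level passed to terminate is preserved, and the directory is gone by the time the subshell has finished.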
Rename Functions
Rename the functions performing actual work with a prefix that suggests their usage, such as:
utile_show_usage() {
printf "Usage: %s [OPTION...]\n" "${SCRIPT_NAME}" >&2
printf " -d, --debug\t\tLog messages while processing\n" >&2
printf " -h, --help\t\tShow this help message then exit\n" >&2
return 0
}
It is important that the help utile_ function returns 0, meaning the function call succeeded, and therefore the script must terminate. The function is treated similarly to a command in that its return value is like an exit level. Consequently, a function that returns a non-zero value when called on its own will trigger the errexit setting, causing the script to terminate; a call that is part of an && or || list, or that is tested by an if statement, is exempt.
Function Assignment
In bash, delegate functions can be assigned like any other variable. At the bottom of the script, define delegate function variables for help and logging as follows:
show_usage=noop
log=noop
By default, the show_usage and log variables reference the noop function. Different languages have ways to support such functionality. For example, C has function pointers. When parsing command-line arguments, be sure to set variables to the appropriate utile_ function.
Command-line Parsing
Eliminate the first instance of duplication by changing the debug and help cases in parse_commandline() as follows:
-d|--debug)
log=utile_log
;;
-h|-\?|--help)
show_usage=utile_show_usage
;;
Rather than set a variable to record whether help or logging was requested, the noop function is replaced with the utile_ function associated with the command-line argument.
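The swap can be seen end to end in a reduced form. This is a simplified sketch, not the template itself: utile_log is cut down to a plain printf, parse_commandline handles only the debug option, and the quiet and verbose variables are assumptions used to capture output.

```shell
#!/usr/bin/env bash

noop() {
  return 1
}

# Simplified stand-in for the script's logging function.
utile_log() {
  printf "[log] %s\n" "$1"
}

log=noop

# Reduced version of parse_commandline() handling a single option.
parse_commandline() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -d|--debug) log=utile_log ;;
    esac
    shift
  done
}

# Before parsing, the delegate is noop and prints nothing.
quiet="$($log "before parsing" || true)"

parse_commandline --debug

# After parsing, the same call site reaches utile_log.
verbose="$($log "after parsing")"

echo "quiet='${quiet}' verbose='${verbose}'"
```

The call site never changes; only the function name stored in the log variable does.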
Call Delegates
After these changes main() simplifies to:
main() {
parse_commandline "$@"
$show_usage && terminate 3
validate_requirements && terminate 4
cd "${SCRIPT_DIR}" && execute_tasks && terminate 5
terminate 0
}
The show_usage function call must be replaced by the $show_usage delegate function variable. Similarly, all calls to log must become $log, such as:
execute_tasks() {
$log "Execute tasks"
return 1
}
For completeness, the last simplification is to eliminate the conditional from the original log function:
utile_log() {
printf "[%s] " "$(date +%H:%M:%S.%4N)"
coloured_text "$1" "${COLOUR_LOGGING}"
}
The new build script template is ready for continuous integration.
New Build Script Template
Download the new build script template, distributed under the MIT license. This template is the starting point for the continuous integration script.
Continuous Integration
Continuous integration is useful for more than rebuilding documents when source files have changed. Another possible scenario is interweaving Markdown with R code to generate sophisticated living documents. Let’s walk through changes to the build script to see how continuous integration can work in practice.
Setup
Type the following commands at a new command prompt (hereafter referred to as the first terminal) to create a new sandbox:
mkdir -p $HOME/dev/writing/book # make directory (fail silently)
cd $HOME/dev/writing/book # change directory
rm * # remove previous sandbox
Copy the build script template into $HOME/dev/writing/book/, then rename the script and make it executable as follows:
mv build ci
chmod +x ci
The continuous integration script is ready to be modified.
Main Loop
Open the ci script in any text editor. Change execute_tasks() as follows:
execute_tasks() {
$log "Execute tasks"
local await=close_write
$log "Await file modifications"
inotifywait -q -e "${await}" -m . | \
while read -r directory event filename; do
echo "${directory}${filename} changed (${event})"
done
return 1
}
Open another command prompt (hereafter the second terminal), then type the following commands:
cd $HOME/dev/writing/book
./ci -d
The following messages appear:
[14:02:12.3965] Check missing software requirements
[14:02:12.3976] Execute tasks
[14:02:12.3984] Await file modifications
The script is listening for file modification events (in particular, files closed after writing) within $HOME/dev/writing/book/. Subdirectories are not watched because inotifywait is invoked without its recursive (-r) option.
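To see how the read loop splits each event line without having inotifywait installed, the events can be simulated. In the sketch below, simulated_events is a hypothetical function that prints lines in the same three-column layout as the output shown in this article (directory, event names, filename), and report is an assumed variable for collecting results.

```shell
#!/usr/bin/env bash

# Simulated inotifywait output: watched directory, event names, file.
simulated_events() {
  printf '%s\n' \
    './ CLOSE_WRITE,CLOSE 01.md' \
    './ CLOSE_WRITE,CLOSE notes.txt'
}

report=""

# read assigns the first two words to directory and event; whatever
# remains on the line, spaces included, lands in filename.
while read -r directory event filename; do
  report+="${directory}${filename} changed (${event})"$'\n'
done < <(simulated_events)

printf '%s' "${report}"
```

Process substitution is used here instead of a pipe so that report is still set after the loop; in a pipeline the while loop runs in a subshell and its variables are lost, which does not matter in the ci script because the loop only logs.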
Change the echo line above to use the delegated log function:
$log "${directory}${filename} changed (${event})"
Save the file. The second terminal shows:
./ci changed (CLOSE_WRITE,CLOSE)
This is expected because bash loads a script when it is first invoked, but never reloads that script, even if modified. To see the changes, complete the following steps:
- Stop the ci script.
- Re-run the ci script.
- Return to the first terminal.
- Type: touch 01.md
- Press Enter.
The second terminal shows:
[14:02:19.1366] ./01.md changed (CLOSE_WRITE,CLOSE)
Remember to update validate_requirements() to include the new software requirement:
required inotifywait "https://github.com/rvoicilas/inotify-tools/wiki"
Next, change the script to execute a build whenever a Markdown file changes. Update execute_tasks() as follows:
execute_tasks() {
$log "Execute tasks"
local -r await=close_write,delete
$log "Await file modifications"
inotifywait -q -e "${await}" -m . | \
while read -r directory event filename; do
# Act on Markdown file events; ignore directory delete events.
if [[ "${filename,,}" == *\.*md && ! "${event}" == *ISDIR* ]]; then
$log "${directory}${filename} (${event})"
execute_build
fi
done
return 1
}
The changes include:
- assign a local, read-only event variable (local -r await=);
- listen for file write and delete events (close_write,delete);
- filter on Markdown files ("${filename,,}" == *\.*md); and
- ignore directory deletion events (! "${event}" == *ISDIR*).
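The combined condition can be exercised against a handful of names to see what passes. In this sketch, should_build is a hypothetical helper that wraps the same test used in execute_tasks(), and verdicts is an assumed variable; the lowercase expansion requires bash 4 or newer.

```shell
#!/usr/bin/env bash

# Hypothetical helper wrapping the condition from execute_tasks().
should_build() {
  local -r filename="$1"
  local -r event="$2"

  [[ "${filename,,}" == *\.*md && ! "${event}" == *ISDIR* ]]
}

verdicts=""
for candidate in 01.md 01.MD 01.Rmd 01.cmd 01.txt; do
  if should_build "${candidate}" "CLOSE_WRITE,CLOSE"; then
    verdicts+="${candidate}=build "
  else
    verdicts+="${candidate}=skip "
  fi
done

# A deleted directory whose name ends in .md is ignored.
if should_build "drafts.md" "DELETE,ISDIR"; then
  verdicts+="drafts.md=build"
else
  verdicts+="drafts.md=skip"
fi

echo "${verdicts}"
```

Only 01.txt and the deleted drafts.md directory are skipped; every other name, including 01.cmd, triggers a build.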
Technically, the filter will pass desirable filenames such as 01.md, 01.MD, and 01.Rmd; however, it will also pass less desirable names like 01.cmd. Be aware that there are more possible extensions for Markdown files than the ci script recognises. A more complex regular expression pattern will match more Markdown extensions, such as the following:
.*\.(m(?:d(?:te?xt|o?wn)?|arkdown|kdn?)|text)$
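The pattern above uses (?:...) non-capturing groups, a Perl-style syntax that bash's =~ operator does not accept because it relies on POSIX extended regular expressions. One possible adaptation, offered as a sketch, replaces them with plain groups; is_markdown and matches are hypothetical names.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when the filename has one of the
# Markdown extensions listed in the pattern.
is_markdown() {
  # Same alternatives as the pattern in the text, using plain groups.
  local -r pattern='\.(m(d(te?xt|o?wn)?|arkdown|kdn?)|text)$'

  [[ "${1,,}" =~ ${pattern} ]]
}

matches=""
for candidate in 01.md 01.MD 01.markdown 01.mkd 01.cmd 01.Rmd; do
  if is_markdown "${candidate}"; then
    matches+="${candidate} "
  fi
done

echo "matched: ${matches}"
```

This version rejects 01.cmd, but it also rejects 01.Rmd, so the pattern would need another alternative if R Markdown files should trigger a build.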
Define an empty execute_build function as follows:
execute_build() {
:
}
Save the changes. In the second terminal restart the ci script as follows:
- Press Ctrl+c to terminate the script.
- Run ./ci -d to restart the script.
In the first terminal, try creating and deleting both files and directories with various filename extensions. Watch the second terminal to see how the script behaves.
The main function is complete.
Continuous Build Process
Create a continuous build process by changing execute_build() to the following:
execute_build() {
local -r DIR_TEMP="$(mktemp tmp.XXXXXXXXXX -ut)"
local -r FILE_PREFIX="body"
local -r FILE_SRC="${DIR_TEMP}/${FILE_PREFIX}.md"
local -r FILE_TEX="${FILE_PREFIX}.tex"
local -r FILE_PDF="${FILE_PREFIX}.pdf"
local -r FILE_DST="output.pdf"
$log "Create ${DIR_TEMP}"
mkdir -p "${DIR_TEMP}"
$log "Concatenate files to ${FILE_SRC}"
cat ./??.md > "${FILE_SRC}"
$log "Generate ${FILE_TEX}"
pandoc --standalone --to context "${FILE_SRC}" \
> "${FILE_TEX}"
$log "Generate ${FILE_PDF}"
context --nonstopmode --batchmode --purgeall "${FILE_TEX}" \
> /dev/null 2>&1
$log "Rename ${FILE_PDF} to ${FILE_DST}"
mv "${FILE_PDF}" "${FILE_DST}"
$log "Remove ${DIR_TEMP}"
rm -rf "${DIR_TEMP}"
}
The first line in the function provides, but does not create, a unique path and assigns the value to a constant:
local -r DIR_TEMP="$(mktemp tmp.XXXXXXXXXX -ut)"
Creating intermediary Markdown files in a temporary directory avoids triggering an infinite loop. The subsequent lines create additional constants that are used by the commands that follow.
Another line of note is:
cat ./??.md > "${FILE_SRC}"
Using the ./ prefix to the globbing filename pattern of ??.md ensures that filenames starting with a hyphen (-) will not be interpreted as an option to the concatenate (cat) command. (This is also useful when deleting files that begin with a hyphen, such as rm ./-filename.txt.)
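The effect of the prefix is easy to reproduce in a throwaway directory. The following sketch assumes GNU cat; sandbox, line_count, and bare_glob are hypothetical names, and -2.md is a deliberately awkward filename that still matches ??.md.

```shell
#!/usr/bin/env bash

sandbox="$(mktemp -d)"
printf 'chapter\n' > "${sandbox}/01.md"
printf 'hyphen\n' > "${sandbox}/-2.md"

# With the ./ prefix the glob expands to ./-2.md and ./01.md, so cat
# treats both as filenames rather than options.
line_count="$(cd "${sandbox}" && cat ./??.md | grep -c '')"

# Without the prefix, -2.md is parsed as a set of options and cat
# rejects it.
bare_glob="succeeded"
(cd "${sandbox}" && cat ??.md > /dev/null 2>&1) || bare_glob="failed"

echo "lines concatenated: ${line_count}, bare glob: ${bare_glob}"
rm -rf "${sandbox}"
```

Both files are read when the prefix is present, whereas the unprefixed form fails before reading anything.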
Invoking ConTeXt in the following fashion will suppress useful error logs, an issue that will be addressed later:
context --nonstopmode --batchmode --purgeall "${FILE_TEX}" \
> /dev/null 2>&1
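Until then, one way to keep the terminal quiet without losing diagnostics is to send the output to a file and show it only on failure. This is a speculative sketch and not necessarily how the series resolves the issue; run_logged, logfile, and the level variables are all hypothetical, and ls stands in for the context command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command quietly, keeping its output in a
# log file that is shown only when the command fails.
run_logged() {
  local -r logfile="$1"
  shift

  local level=0
  "$@" > "${logfile}" 2>&1 || level=$?

  if [ "${level}" -ne 0 ]; then
    printf "Command failed (exit level %d):\n" "${level}" >&2
    cat "${logfile}" >&2
  fi

  return "${level}"
}

logfile="$(mktemp)"

# A succeeding command prints nothing.
ok_level=0
run_logged "${logfile}" echo "typeset" || ok_level=$?

# A failing command (ls stands in for context) reveals its log.
bad_level=0
failure_report="$(run_logged "${logfile}" ls /UNICORN 2>&1)" || bad_level=$?

echo "ok=${ok_level} bad=${bad_level}"
echo "${failure_report}"
rm -f "${logfile}"
```

The exit level is captured with || before the if statement because $? read after a completed if block would no longer hold the command's result.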
Renaming body.pdf to output.pdf towards the end of execute_build() is an atomic operation that stops Evince from reloading the PDF file while ConTeXt is creating it, which helps prevent Evince from crashing.
Algorithmically, the function performs the following steps:
- Create a temporary directory to store Markdown files.
- Create a new Markdown source file in that temporary directory.
- Run pandoc to create a standalone ConTeXt document.
- Run ConTeXt to convert pandoc’s output to a PDF file.
- Rename the PDF file to a destination filename.
- Clean up by removing the temporary directory.
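The steps above can be sketched in a form that runs without pandoc or ConTeXt installed. Everything here is an assumption made for illustration: render is a hypothetical stand-in that upper-cases text in place of the two conversion steps, build and its parameters are invented names, and mktemp -d is used as a variation that creates the directory in a single step instead of reserving a name first.

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset

# Hypothetical stand-in for the pandoc and ConTeXt steps so that the
# sketch runs without either tool installed.
render() {
  tr '[:lower:]' '[:upper:]' < "$1" > "$2"
}

build() {
  local -r dir_src="$1"
  local -r file_dst="$2"
  local dir_temp

  # Create the temporary directory in one step.
  dir_temp="$(mktemp -d)"

  # Concatenate the chapters outside of the watched directory.
  cat "${dir_src}"/??.md > "${dir_temp}/body.md"

  # Write to a partial file, then rename it into place.
  render "${dir_temp}/body.md" "${file_dst}.partial"
  mv "${file_dst}.partial" "${file_dst}"

  rm -rf "${dir_temp}"
}

book="$(mktemp -d)"
printf 'one\n' > "${book}/01.md"
printf 'two\n' > "${book}/02.md"

build "${book}" "${book}/output.txt"
result="$(cat "${book}/output.txt")"

echo "${result}"
rm -rf "${book}"
```

The destination file only ever appears under its final name once rendering has finished, which mirrors the rename step in the list.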
Run Script
Stop and restart the ci script as before, then create 01.md within $HOME/dev/writing/book as follows:
# Typesetting Markdown -- Part 1: Build Script
This series describes a way to typeset Markdown content using the powerful typesetting engine ConTeXt.
Save the file. Check the output in the second terminal window, which will resemble:
[17:05:54.4291] Check missing software requirements
[17:05:54.4308] Execute tasks
[17:05:54.4322] Await file modifications
[17:05:56.1383] ./01.md (CLOSE_WRITE,CLOSE)
[17:05:56.1415] Create /tmp/tmp.w553vGErp6
[17:05:56.1440] Concatenate files to /tmp/tmp.w553vGErp6/body.md
[17:05:56.1468] Generate body.tex
[17:05:56.2058] Generate body.pdf
[17:05:57.8291] Rename body.pdf to output.pdf
[17:05:57.8321] Remove /tmp/tmp.w553vGErp6
Next, open output.pdf with Evince. The PDF resembles:
Preview of continuously generated document
With Evince still open, change the word ConTeXt in 01.md to \ConTeXt. The PDF file contents shown in the PDF reader change to the following:
Notice how the letter e in ConTeXt changes from lowercase to uppercase and drops below the baseline. This happened because pandoc exported the \ConTeXt macro verbatim to the .tex file, which ConTeXt then typeset as shown.
The ci script now rebuilds the document upon any changes to Markdown files, an effective continuous integration.
Download
Download the continuous integration script, distributed under the MIT license.
Summary
This part explained how to create a shell script that performs continuous integration of modified Markdown files for PDF generation. Part 4 describes a way to create a document theme applied to various Markdown document chapters.