Typesetting Markdown – Part 3: Automagicify

This part of the series describes how to create a build script that performs continuous integration when writing documents.

Introduction

Typesetting beautiful documentation can entail complex page layouts, stunning math, footnotes, citations, bibliographic references, lists of tables, river removal, microtypography and much more. There are many ways to generate PDF files from Markdown, but few of them provide the fine-grained controls that the typesetting engine ConTeXt offers.

Part 2 showed how to create a minimal PDF from a Markdown document using pandoc and ConTeXt.

Let’s address a few issues with the shell script created in Part 1 before exploring how to apply continuous integration concepts for regenerating documentation upon changes to the source files. Or skip the preparatory work by jumping to the new build script template section and continue reading about continuous integration from there.

Robust Scripting

Once a shell script takes on a life of its own, subtle effects and logic changes can cause the script to fail in unexpected ways. Robust scripts can compensate for some commonly encountered issues. For details beyond what is covered here, see:

Terminate on Error

When a bash script is run, a safe practice is to ensure that the script terminates if any of the commands it executes fail. Accomplish this by using the set command at the top of the script as follows:

set -o errexit

The disadvantage is that $? cannot be used to determine whether a command failed because bash will exit immediately. A way to work around the issue is to follow the command with a Boolean expression that returns true. Consider the following command:

$ ls /UNICORN; echo $?
ls: cannot access '/UNICORN': No such file or directory
2

The 2 is the exit level from running the ls command on a non-existent directory, which is displayed using echo $?. By appending a Boolean expression (|| true), the value of $? is displayed as 0 instead:

$ ls /UNICORN || true; echo $?
ls: cannot access '/UNICORN': No such file or directory
0

Thus the bash script will not terminate, even if the command fails, despite having enabled errexit; however, the $? variable is then always 0, which isn’t very helpful. One way to address this side-effect is to include the statements to execute directly, such as the following:

ls /UNICORN || { echo "Try /narwhal"; }

Or, in a slightly cleaner fashion, by calling your own function:

ls /UNICORN || missing_unicorn;
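
The fallback-function approach can be sketched as a complete script. The body of missing_unicorn is an assumption made for illustration (the original does not define it), and the sketch presumes that /UNICORN does not exist:

```shell
#!/usr/bin/env bash
set -o errexit

# Hypothetical handler: runs only when the preceding command fails.
missing_unicorn() {
  echo "Try /narwhal"
}

# The || list keeps errexit from ending the script when ls fails.
ls /UNICORN 2>/dev/null || missing_unicorn

echo "Still running"
```

Because the handler returns 0, the script carries on to the final echo. A handler that itself returned non-zero would end the script at that point.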

Uninitialised Variables

Instruct bash to prevent referencing uninitialised variables by using the set command near the top of the script as follows:

set -o nounset

Since bash will terminate the shell script whenever an uninitialised variable is used, the unset variables now cause the following error:

$ ./build -d
./build: line 38: ARG_HELP: unbound variable

One way to resolve this is to change the corresponding unset lines to:

ARG_HELP=
ARG_DEBUG=
REQUIRED_MISSING=

The variables are initialised to empty strings, allowing the script to use them as expected. But making these changes does not address a deeper issue: the original script contains duplicated logic. Let’s fix the deeper issue.

Eliminate Code Duplication

A principle of software development is that every piece of knowledge must have a single, unambiguous, authoritative representation.

Although subtle, this principle is broken in the script from Part 1. First, consider the following snippet from main():

if [ -n "${ARG_HELP}" ]; then
  show_usage
  exit 3
fi

Next, parse_commandline() also has logic regarding the help argument:

-h|-\?|--help)
  ARG_HELP="true"
;;

These two snippets are conditional evaluations regarding the “help” functionality. There is no need to twice evaluate whether help must be displayed. The duplication becomes apparent when the if statement is re-written as follows:

if [ "${ARG_HELP}" = "true" ]; then

Using delegate functions eliminates the duplication. This entails the following changes:

Non-functional Function

A function having no functionality (that is, no operations of consequence) resembles:

noop() {
  return 1
}

Returning 1 indicates that the function call failed, which ensures that any commands issued after a Boolean and (&&) conditional expression are not executed. This return value helps to simplify the main function.
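
A minimal sketch of that behaviour, with errexit enabled to show that a failure on the left-hand side of && does not end the script. The echo messages are illustrative only:

```shell
#!/usr/bin/env bash
set -o errexit

noop() {
  return 1
}

# noop fails, so the command after && is skipped. The script keeps
# running because errexit ignores failures on the left-hand side of &&.
noop && echo "never printed"

echo "still running"
```

Running the sketch prints only the second message.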

Terminate Function

To eliminate some minor duplication (the exit command being issued in multiple locations), create a function that’s called to terminate the script:

terminate() {
  exit "$1"
}

This is useful for isolating any clean-up operations to a single location. For example, trap an interrupt signal to turn off ANSI colour sequences or delete temporary directories.
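
One possible shape for such clean-up, offered as a sketch rather than as the template's own code. The cleanup function name, the colour codes, and the exit status of 130 are assumptions for illustration:

```shell
#!/usr/bin/env bash

# Single place for clean-up work; runs whenever the script exits.
cleanup() {
  # Reset any ANSI colour left active by an interrupted message.
  printf '\033[0m'
}

terminate() {
  exit "$1"
}

trap cleanup EXIT
# Route Ctrl+c through terminate so that the EXIT trap still fires.
# 130 is the conventional status for an interrupted script (128 + 2).
trap 'terminate 130' INT

printf '\033[1;34m%s\n' "Working"
```

Since every exit path goes through terminate, the EXIT trap is the only place that needs to know what must be cleaned up.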

Rename Functions

Rename the functions performing actual work with a prefix that suggests their usage, such as:

utile_show_usage() {
  printf "Usage: %s [OPTION...]\n" "${SCRIPT_NAME}" >&2
  printf "  -d, --debug\t\tLog messages while processing\n" >&2
  printf "  -h, --help\t\tShow this help message then exit\n" >&2

  return 0
}

It is important that the utile_ help function returns 0, meaning the function call succeeded, and therefore the script must terminate. The function is treated similarly to a command in that its return value acts like an exit level. Consequently, a function that returns a non-zero value will trigger the errexit setting and terminate the script, unless the call appears in a conditional context such as the left-hand side of a Boolean && or || expression.

Function Assignment

In bash, delegate functions can be assigned like any other variable. At the bottom of the script, define delegate function variables for help and logging as follows:

show_usage=noop
log=noop

By default, the show_usage and log variables reference the noop function. Different languages have ways to support such functionality. For example, C has function pointers. When parsing command-line arguments, be sure to set variables to the appropriate utile_ function.
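
The swap can be tried in isolation with the following sketch. Here utile_log is a simplified stand-in for the script's logging function, not the version from the template:

```shell
#!/usr/bin/env bash

noop() {
  return 1
}

# Simplified stand-in for the script's logging function.
utile_log() {
  printf "[log] %s\n" "$1"
}

# Default: the variable holds the name of the do-nothing function.
log=noop
if ! $log "not shown"; then
  echo "logging disabled"
fi

# What a --debug argument would do: swap in the working function.
log=utile_log
$log "shown"
```

The variable holds a function name, and bash expands it to that name before running the command, so the call site never changes.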

Command-line Parsing

Eliminate the first instance of duplication by changing the debug and help cases in parse_commandline() as follows:

      -d|--debug)
        log=utile_log
      ;;
      -h|-\?|--help)
        show_usage=utile_show_usage
      ;;

Rather than set a variable to record whether help or logging was requested, the noop function is replaced with the utile_ function associated with the command-line argument.

Call Delegates

After these changes main() simplifies to:

main() {
  parse_commandline "$@"

  $show_usage && terminate 3
  validate_requirements && terminate 4

  cd "${SCRIPT_DIR}" && execute_tasks && terminate 5

  terminate 0
}

The show_usage function call must be replaced by the $show_usage delegate function variable. Similarly, all calls to log must become $log, such as:

execute_tasks() {
  $log "Execute tasks"

  return 1
}

For completeness, the last simplification is to eliminate the conditional from the original log function:

utile_log() {
  printf "[%s] " "$(date +%H:%M:%S.%4N)"
  coloured_text "$1" "${COLOUR_LOGGING}"
}

The new build script template is ready for continuous integration.

New Build Script Template

Download the new build script template, distributed under the MIT license. This template is the starting point for the continuous integration script.

Continuous Integration

Continuous integration is useful for more than rebuilding documents when source files have changed. Another possible scenario is interweaving Markdown with R code to generate sophisticated living documents. Let’s walk through changes to the build script to see how continuous integration can work in practice.

Setup

Type the following commands at a new command prompt (hereafter referred to as the first terminal) to create a new sandbox:

mkdir -p "$HOME/dev/writing/book"  # make directory (no error if it exists)
cd "$HOME/dev/writing/book"        # change directory
rm -f ./*                          # remove previous sandbox files

Copy the build script template into $HOME/dev/writing/book/, then rename the script and make it executable as follows:

mv build ci
chmod +x ci

The continuous integration script is ready to be modified.

Main Loop

Open the ci script in any text editor. Change execute_tasks() as follows:

execute_tasks() {
  $log "Execute tasks"

  local await=close_write

  $log "Await file modifications"
  inotifywait -q -e "${await}" -m . | \
  while read -r directory event filename; do
    echo "${directory}${filename} changed (${event})"
  done

  return 1
}

Open another command prompt (hereafter the second terminal), then type the following commands:

cd $HOME/dev/writing/book
./ci -d

The following messages appear:

[14:02:12.3965] Check missing software requirements
[14:02:12.3976] Execute tasks
[14:02:12.3984] Await file modifications

The script is listening for file modification events (in particular, files closed after writing) within the $HOME/dev/writing/book/ directory. Because inotifywait is invoked without its recursive option, subdirectories are not watched.

Change the echo line above to use the delegated log function:

    $log "${directory}${filename} changed (${event})"

Save the file. The second terminal shows:

./ci changed (CLOSE_WRITE,CLOSE)

This is expected: the message was printed by the old echo line, without a timestamp, because the running bash process has already parsed the function it is executing and does not pick up later edits to the script file. To see the changes, complete the following steps:

  1. Stop the ci script.
  2. Re-run the ci script.
  3. Return to the first terminal.
  4. Type: touch 01.md
  5. Press Enter.

The second terminal shows:

[14:02:19.1366] ./01.md changed (CLOSE_WRITE,CLOSE)
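
To see how the read loop splits each line, the loop can be fed simulated input instead of live events. The sample lines below are assumptions modelled on the output shown above, and describe_events is a hypothetical wrapper name:

```shell
#!/usr/bin/env bash

# Split each line into three fields. The last variable (filename)
# receives the rest of the line, so filenames with spaces stay intact.
describe_events() {
  while read -r directory event filename; do
    echo "${directory}${filename} changed (${event})"
  done
}

# Simulated input in place of: inotifywait -q -e close_write -m .
printf '%s\n' \
  "./ CLOSE_WRITE,CLOSE 01.md" \
  "./ CLOSE_WRITE,CLOSE my notes.md" | describe_events
```

A watched directory whose own name contains spaces would still be split incorrectly; handling that case would need a custom output format, which is beyond this sketch.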

Remember to update validate_requirements() to include the new software requirement:

  required inotifywait "https://github.com/rvoicilas/inotify-tools/wiki"

Next, change the script to execute a build whenever a Markdown file changes. Update execute_tasks() as follows:

execute_tasks() {
  $log "Execute tasks"

  local -r await=close_write,delete

  $log "Await file modifications"
  inotifywait -q -e "${await}" -m . | \
  while read -r directory event filename; do

    # Act on Markdown file events; ignore directory delete events.
    if [[ "${filename,,}" == *\.*md && ! "${event}" == *ISDIR* ]]; then
      $log "${directory}${filename} (${event})"

      execute_build
    fi
  done

  return 1
}

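The changes include declaring await as a read-only variable, listening for delete events in addition to close_write events, filtering events so that only those for Markdown files (and not directories) are acted upon, and calling execute_build for each such event.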

Technically, the filter will pass desirable filenames such as 01.md, 01.MD, and 01.Rmd; however, it will also pass less desirable names like 01.cmd. Be aware that there are more possible extensions for Markdown files than the ci script recognises. A more complex regular expression pattern will match more Markdown extensions, such as the following:

.*\.(m(?:d(?:te?xt|o?wn)?|arkdown|kdn?)|text)$
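
The pattern above uses non-capturing groups, written (?:...), which the extended regular expressions behind the bash =~ operator do not support. A possible translation for use inside the ci script follows; the is_markdown helper and the REGEX_MARKDOWN name are hypothetical:

```shell
#!/usr/bin/env bash

# Same alternatives as the pattern above, with (?:...) written as (...).
readonly REGEX_MARKDOWN='\.(m(d(te?xt|o?wn)?|arkdown|kdn?)|text)$'

# Hypothetical helper: succeeds when the filename looks like Markdown.
is_markdown() {
  local -r name="${1,,}"
  [[ "${name}" =~ ${REGEX_MARKDOWN} ]]
}

for f in 01.md 01.MD notes.markdown 01.cmd 01.Rmd; do
  if is_markdown "${f}"; then
    echo "${f}: yes"
  else
    echo "${f}: no"
  fi
done
```

The loop reports yes for the first three names and no for 01.cmd and 01.Rmd. Note the trade-off: the stricter pattern rejects 01.cmd but also stops matching R Markdown files, so an rmd alternative would have to be added if those are wanted.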

Define an empty execute_build function as follows:

execute_build() {
  :
}

Save the changes. In the second terminal restart the ci script as follows:

  1. Press Ctrl+c to terminate the script.
  2. Run ./ci -d to restart the script.

In the first terminal, try creating and deleting both files and directories with various filename extensions. Watch the second terminal to see how the script behaves.

The main function is complete.

Continuous Build Process

Create a continuous build process by changing execute_build() to the following:

execute_build() {
  local -r DIR_TEMP="$(mktemp tmp.XXXXXXXXXX -ut)"
  local -r FILE_PREFIX="body"

  local -r FILE_SRC="${DIR_TEMP}/${FILE_PREFIX}.md"
  local -r FILE_TEX="${FILE_PREFIX}.tex"
  local -r FILE_PDF="${FILE_PREFIX}.pdf"
  local -r FILE_DST="output.pdf"
  
  $log "Create ${DIR_TEMP}"
  mkdir -p "${DIR_TEMP}"

  $log "Concatenate files to ${FILE_SRC}"
  cat ./??.md > "${FILE_SRC}"

  $log "Generate ${FILE_TEX}"
  pandoc --standalone --to context "${FILE_SRC}" \
    > "${FILE_TEX}"

  $log "Generate ${FILE_PDF}"
  context --nonstopmode --batchmode --purgeall "${FILE_TEX}" \
    > /dev/null 2>&1 
  
  $log "Rename ${FILE_PDF} to ${FILE_DST}"
  mv "${FILE_PDF}" "${FILE_DST}"

  $log "Remove ${DIR_TEMP}"
  rm -rf "${DIR_TEMP}"
}

The first line in the function provides, but does not create, a unique path and assigns the value to a constant:

  local -r DIR_TEMP="$(mktemp tmp.XXXXXXXXXX -ut)"

Creating intermediary Markdown files in a temporary directory avoids triggering an infinite loop. The subsequent lines create additional constants that are used by the commands that follow.
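
Because the -u option only prints a name, another process could in principle claim the same path before mkdir runs. An alternative, sketched here under the assumption that mktemp supports the -d option (GNU and BSD versions do), is to let mktemp create the directory itself. The make_temp_dir name is hypothetical:

```shell
#!/usr/bin/env bash

# Create the directory and print its path in a single step.
make_temp_dir() {
  mktemp -d "${TMPDIR:-/tmp}/tmp.XXXXXXXXXX"
}

DIR_TEMP="$(make_temp_dir)"
echo "Created ${DIR_TEMP}"

# Intermediary files would be written into ${DIR_TEMP} here.

rm -rf "${DIR_TEMP}"
```

With this variant the separate mkdir call becomes unnecessary.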

Another line of note is:

  cat ./??.md > "${FILE_SRC}"

Using the ./ prefix to the globbing filename pattern of ??.md ensures that filenames starting with a hyphen (-) will not be interpreted as an argument to the concatenate command. (This is also useful when deleting files that begin with a hyphen, such as rm ./-filename.txt.)
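
The difference can be observed safely in a scratch directory. This sketch assumes a cat implementation that rejects unknown options, and the file name -n.md is chosen only for illustration:

```shell
#!/usr/bin/env bash

# Work in a scratch directory so that no real files are touched.
cd "$(mktemp -d "${TMPDIR:-/tmp}/glob.XXXXXXXXXX")" || exit 1

# A two-character basename that starts with a hyphen matches ??.md.
printf 'chapter text\n' > ./-n.md

# Without the prefix the glob expands to "-n.md", which cat parses
# as a cluster of options and rejects.
cat ??.md 2>/dev/null || echo "cat rejected -n.md as options"

# With the prefix the glob expands to "./-n.md", an ordinary path.
cat ./??.md
```

The scratch directory is left in place so the result can be inspected; delete it afterwards.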

Invoking ConTeXt in the following fashion will suppress useful error logs, an issue that will be addressed later:

  context --nonstopmode --batchmode --purgeall "${FILE_TEX}" \
    > /dev/null 2>&1 

Renaming body.pdf to output.pdf towards the end of execute_build() is an atomic operation that stops Evince from reloading the PDF file while ConTeXt is creating it, which helps prevent Evince from crashing.

Algorithmically, the function performs the following steps:

  1. Create a temporary directory to store Markdown files.
  2. Create a new Markdown source file in that temporary directory.
  3. Run pandoc to create a standalone ConTeXt document.
  4. Run ConTeXt to convert pandoc’s output to a PDF file.
  5. Rename the PDF file to a destination filename.
  6. Clean up by removing the temporary directory.

Run Script

Stop and restart the ci script as before, then create 01.md within $HOME/dev/writing/book as follows:

# Typesetting Markdown -- Part 1: Build Script

This series describes a way to typeset Markdown content using the powerful typesetting engine ConTeXt.

Save the file. Check the output in the second terminal window, which will resemble:

[17:05:54.4291] Check missing software requirements
[17:05:54.4308] Execute tasks
[17:05:54.4322] Await file modifications
[17:05:56.1383] ./01.md (CLOSE_WRITE,CLOSE)
[17:05:56.1415] Create /tmp/tmp.w553vGErp6
[17:05:56.1440] Concatenate files to /tmp/tmp.w553vGErp6/body.md
[17:05:56.1468] Generate body.tex
[17:05:56.2058] Generate body.pdf
[17:05:57.8291] Rename body.pdf to output.pdf
[17:05:57.8321] Remove /tmp/tmp.w553vGErp6

Next, open output.pdf with Evince. The PDF resembles:

[Figure: Preview of continuously generated document]

With Evince still open, change the word ConTeXt in 01.md to \ConTeXt. Once the script rebuilds the document, the contents shown in the PDF reader change.

Notice how the letter e in ConTeXt changes from lowercase to uppercase and drops below the baseline. This happened because pandoc exported the \ConTeXt macro verbatim to the .tex file, which ConTeXt then typeset as shown.

The ci script now rebuilds the document upon any changes to Markdown files, which is an effective form of continuous integration.

Download

Download the continuous integration script, distributed under the MIT license.

Summary

This part explained how to create a shell script that performs continuous integration of modified Markdown files for PDF generation. Part 4 describes a way to create a document theme applied to various Markdown document chapters.
