Test codeblocks in markdown documents

I enjoy a quick start when I try out new software. The sooner I see something blinking, the better. Straightforward documentation and clear examples improve the experience of trying out new things. The simplest and most fun examples typically find their way into the readme of my project, as that’s the place new users are most likely to stumble upon first. In one project, I realised that I wanted to test the code I wrote in the readme.md too. This post is about how I did just that.

Markdown is strong in its simplicity. Text markup finds its way into the document via semantic textual decorations.

Hello world, it's a glorious day
====
Less is more.

Lorem Ipsum
---
Dolor C++:

```cpp
std::min(1.0, 3);
```

To distil code blocks - or more generally: elements - from the document, I use pandoc to convert the md document into an Abstract Syntax Tree (AST) in JSON format. An AST is nothing more than a tree-like structure that records how each symbol (token) belongs to its parent.

A simple AST could look like this:

1-doc
├── 1-heading
│   ├── 1-Hello
│   └── 2-World
├── 2-paragraph
│   ├── 1-Less
│   ├── 2-is
│   └── 3-more
├── 3-newline
├── 4-subheading
│   ├── 1-Lorem
│   └── 2-Ipsum
├── 5-paragraph
│   ├── 1-Dolor
│   └── 2-C++
├── 6-newline
└── 7-codeblock
    ├── 1-std__min_1_0__3__
    └── _type-cpp
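
Pandoc's actual JSON is more verbose than this sketch: every node is an object with a type tag t and, usually, a content field c. As a rough illustration, the snippet below feeds jq a hand-written, shortened document in that shape and lists the type of each top-level block. The helper names sample_ast and block_types are made up for this example, and real pandoc output carries more detail (generated heading ids, inline formatting, metadata).

```shell
# Hand-written sample in the shape of pandoc's JSON output.
# Assumption: shortened by hand, not produced by running pandoc.
sample_ast() {
  cat <<'EOF'
{"pandoc-api-version":[1,23],"meta":{},"blocks":[
  {"t":"Header","c":[1,["hello",[],[]],[{"t":"Str","c":"Hello"}]]},
  {"t":"Para","c":[{"t":"Str","c":"Less"},{"t":"Space"},{"t":"Str","c":"is"},{"t":"Space"},{"t":"Str","c":"more."}]},
  {"t":"CodeBlock","c":[["",["cpp"],[]],"std::min(1.0, 3);"]}
]}
EOF
}

# Print the type tag of every top-level block, one per line.
block_types() {
  jq -r '.blocks[].t'
}

# Usage: sample_ast | block_types
```

A code block's content is a pair: the attributes (id, classes, key-value pairs) followed by the code text. The language tag of the fence ends up as the first class.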

Pandoc can generate a valid tree, which is extremely useful. Then, I can use jq to parse the AST and retrieve the right elements.

1. Instruct pandoc to generate the AST from an input file: pandoc readme.md -t json. This writes the AST to stdout.
2. Use jq to select all blocks.
3. Keep the blocks whose type t is "CodeBlock".
4. Keep the code blocks whose language class is cpp (or anything else).
5. Output the content of each remaining code block on new lines.

Use jq’s -r option to output actual newlines.

pandoc readme.md -t json \
| jq -r -c '.blocks[] | select(.t | contains("CodeBlock"))? | .c | select(.[0][1][0] | contains("cpp"))? | .[1]' > readme.cpp
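
Read stage by stage, the filter does the following. This is a minimal sketch rather than the exact one-liner: the function name extract_code_blocks is made up for illustration, and it compares with == instead of contains, so a block tagged cpp17 would not match. Like the one-liner, it only looks at top-level blocks.

```shell
# Reads pandoc JSON on stdin and prints the text of every top-level code
# block whose first class equals the given language (default: cpp).
extract_code_blocks() {
  jq -r --arg lang "${1:-cpp}" '
    .blocks[]                        # every top-level block
    | select(.t == "CodeBlock")      # keep code blocks only
    | .c                             # [[id, classes, attributes], text]
    | select(.[0][1][0] == $lang)    # first class is the language tag
    | .[1]                           # the code itself
  '
}

# Possible use:
#   pandoc readme.md -t json | extract_code_blocks cpp > readme.cpp
```

Code blocks without a language tag have an empty class list, so the comparison is simply false for them and they are skipped.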

Now, the plain dump of code from the readme.md typically won’t compile on its own: it is not enclosed in a function, nor does it have access to the required dependencies. I use a simple bash script to generate the required extras.

#!/bin/bash

if [ "$#" -ne 4 ]; then
  echo "Usage: $0 {pandoc path} {jq path} {./path/readme.md} {./output/test.cpp}" >&2
  exit 1
fi

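# $1 = pandoc, $2 = jq, $3 = input markdown, $4 = output cpp (quoted so paths with spaces work)
CPPGEN=$("$1" "$3" -t json | "$2" -r -c '.blocks[] | select(.t | contains("CodeBlock"))? | .c | select(.[0][1][0] | contains("cpp"))? | .[1]')

cat << EOF > "$4"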
#include <gtest/gtest.h>

#include "mylib/mylib.hpp"

using namespace mylib;

TEST(readme, run) {
// generated by cmake via pandoc AST
$CPPGEN
}
EOF

Call the script with the four paths as arguments:

./generate.sh {pandoc path} {jq path} {input md} {output cpp}
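
One thing to watch out for: if the filter matches nothing (say, the fences are tagged c++ instead of cpp), the generated test body is empty and the test passes anyway. A small guard can catch that. The helper below is hypothetical and not part of the script above, just a sketch of the idea:

```shell
# Hypothetical guard (not part of the original script): succeed only when
# the extracted text contains at least one non-whitespace character.
require_snippets() {
  if [ -n "$(printf '%s' "$1" | tr -d '[:space:]')" ]; then
    return 0
  fi
  echo "no matching code blocks found in the readme" >&2
  return 1
}

# Possible use right after CPPGEN is assigned:
#   require_snippets "$CPPGEN" || exit 1
```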

My build system and task runner for C++ projects is CMake. I let CMake generate the test at configure time in the build folder. In the ./tests directory I put the regular test source files and a CMakeLists.txt to configure the executables, adding:

# Use CMake to find the programs at configure time. 
find_program(PANDOC pandoc)
find_program(JQ jq)
if ("${PANDOC}" STREQUAL "PANDOC-NOTFOUND" OR "${JQ}" STREQUAL "JQ-NOTFOUND")
    message(WARNING "Pandoc or jq not found, will not generate test for code in readme.md")
else()
    message(STATUS "Generating test for code in readme.md with ${PANDOC}")
    execute_process(COMMAND bash "${CMAKE_SOURCE_DIR}/test/generate_readme_test.sh" "${PANDOC}" "${JQ}" "${CMAKE_SOURCE_DIR}/readme.md" "${CMAKE_CURRENT_BINARY_DIR}/test_readme.cpp")
endif()

file(GLOB TEST_SRCS test_*.cpp ${CMAKE_CURRENT_BINARY_DIR}/test_*.cpp)

Figure: Success testing the readme code.

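Note that, because the snippets end up inside a test body, you can even use the in-line assertions of your testing framework (EXPECT_EQ and friends in GoogleTest) directly in the readme's code blocks.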

Feel free to adjust to your needs!