
soupault Configuration

In a nutshell, the purpose of soupault is to post-process the HTML files generated by the other generation processes of cleopatra. It is parameterized by two settings: the <<build-dir>> directory where soupault generates its output, and an optional <<prefix>> subdirectory wherein the website contents live. The latter allows generating only one part of a larger website.

For the present website, these two settings are initialized as follows.

<<build-dir>> :=
<<prefix>> :=

The rest of this document proceeds as follows. We first describe the general settings of soupault. Then, we enumerate the widgets enabled for this website. Finally, we give a proper definition of the soupault cleopatra generation process.

Table of Contents

1 soupault General Settings

The general settings section of soupault.conf is fairly basic, and there is little to say that the “Getting Started” guide does not already discuss at length.

We emphasize three things:

  • The build_dir is set to <<build-dir>>/<<prefix>> in place of simply <<build-dir>>.
  • The ignore_extensions list shall be updated to take into account artifacts produced by the other cleopatra generation processes.
  • We disable the “clean URLs” feature of soupault. This option renames an HTML file foo/bar.html into foo/bar/index.html, which means that, when served by an HTTP server, the foo/bar URL will work. The issue we have with this feature is that the internal links within the website need to target the final URLs, rather than the actual file names. If one day soupault starts rewriting internal URLs when clean_urls is enabled, we might reconsider using it.
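The renaming performed by clean_urls can be sketched with a tiny shell function (the function name is ours, for illustration only; it is not part of soupault):

```shell
# clean_url_path: sketch of the renaming the clean_urls option performs,
# mapping foo/bar.html to foo/bar/index.html (illustrative helper)
clean_url_path () {
  printf '%s/index.html' "${1%.html}"
}

clean_url_path "foo/bar.html"   # foo/bar/index.html
```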
strict = true
verbose = false
debug = false
site_dir = "site"
build_dir = "<<build-dir>>/<<prefix>>"

page_file_extensions = ["html"]
ignore_extensions = [
  "draft", "vo", "vok", "vos", "glob",
  "html~", "org", "aux", "sass",
]

generator_mode = true
complete_page_selector = "html"
default_template_file = "templates/main.html"
default_content_selector = "main"
doctype = "<!DOCTYPE html>"
clean_urls = false

The list of ignored extensions should be programmatically generated with the help of cleopatra.

2 Widgets

2.1 Setting Page Title

We use the “page title” widget to set the title of the webpage based on the first (and hopefully the only) <h1> tag of the page.

widget = "title"
selector = "h1"
default = "~lthms"
prepend = "~lthms: "

2.2 Acknowledging soupault

When creating a new soupault project (using soupault --init), the default configuration file suggests advertising the use of soupault. Rather than hard-coding the soupault version in use (which is error-prone), we determine it with the following script.

<<soupault-version>> :=
soupault --version | head -n 1 | tr -d '\n'
soupault 2.1.0
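The head and tr invocations are what make the result usable as an inline noweb value: head keeps only the first line of the version banner, and tr strips the trailing newline. The pipeline can be checked against canned output (the second banner line below is hypothetical):

```shell
# Simulate `soupault --version` with a hypothetical two-line banner, then
# apply the same pipeline: keep the first line, drop the trailing newline.
printf 'soupault 2.1.0\ngit commit: deadbeef\n' | head -n 1 | tr -d '\n'
# prints: soupault 2.1.0
```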

The configuration of the widget (initially provided by soupault) thus becomes less subject to obsolescence.

widget = "insert_html"
html = """
  <meta name="generator" content="<<soupault-version()>>">
"""
selector = "head"

2.3 Generating Table of Contents

The toc widget generates a table of contents for HTML files which contain a node matching a given selector (in the case of this document, #generate-toc).

widget = "toc"
selector = "#generate-toc"
action = "replace_element"
valid_html = true
min_level = 2
numbered_list = true

We could propose a patch to soupault's upstream to add numbering in titles.

2.4 Fixing Org Internal Links

For some reason, Org prefixes internal links to other Org documents with file://. To avoid that, we provide a simple plugin which removes file:// from the beginning of a URL.

This plugin definition should be part of the org generation process, but that would require aggregating “subconfigs” into a larger one.

The key component of this plugin is the fix_org_urls function.

fix_org_urls(LIST, ATTR)
Enumerate the DOM elements of LIST, and check their ATTR attribute.
function fix_org_urls(list, attr)
  index, link = next(list)

  while index do
    href = HTML.get_attribute(link, attr)

    if href then
      href = Regex.replace(href, "^file://", "")
      HTML.set_attribute(link, attr, href)
    end

    index, link = next(list, index)
  end
end

We use this function to fix the URLs of the tags known to be affected by this Org behavior: for now, <a> (via href) and <img> (via src).

fix_org_urls(HTML.select(page, "a"), "href")
fix_org_urls(HTML.select(page, "img"), "src")

The configuration of this plugin, and the associated widget, is straightforward.

widget = "fix-org-urls"
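The rewrite itself is easy to reproduce outside of soupault; here is a shell analogue of the plugin's substitution (the function is illustrative, not part of the widget):

```shell
# strip_file_scheme: remove a leading file:// from a URL, as the
# fix-org-urls plugin does (illustrative shell analogue)
strip_file_scheme () {
  printf '%s' "${1}" | sed -e 's|^file://||'
}

strip_file_scheme "file:///posts/index.html"   # /posts/index.html
```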

2.5 Prefixing Internal URLs

On the one hand, internal links can be absolute, meaning they start with a leading /, and are therefore relative to the website root. On the other hand, a website (especially a static one) can be placed in a larger context. For instance, my personal website lives inside the ~lthms directory of its host domain.

The purpose of this plugin is to rewrite internal URLs which are relative to the root, in order to properly prefix them.

From a high-level perspective, the plugin proceeds in three steps.

  1. We validate the widget configuration.
  2. We propose a generic function to enumerate and rewrite the tags which can have internal URLs as attributes.
  3. We apply this generic function to the relevant tags.

prefix_url = config["prefix_url"]

<<validate_prefix>>
<<prefix_func>>
<<prefix_calls>>
<<validate_prefix>> :=
if not prefix_url then
  Plugin.fail("Missing mandatory field: `prefix_url'")
end

if not Regex.match(prefix_url, "^/(.*)") then
  prefix_url = "/" .. prefix_url
end

if not Regex.match(prefix_url, "(.*)/$") then
  prefix_url = prefix_url .. "/"
end
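The normalization this chunk enforces (a prefix always starts and ends with a slash) can be sketched in shell; normalize_prefix is our name for this illustration:

```shell
# normalize_prefix: ensure a leading and a trailing slash, mirroring the
# <<validate_prefix>> logic (illustrative shell sketch)
normalize_prefix () {
  p="${1}"
  case "${p}" in
    /*) ;;
    *) p="/${p}" ;;
  esac
  case "${p}" in
    */) ;;
    *) p="${p}/" ;;
  esac
  printf '%s' "${p}"
}

normalize_prefix "~lthms"   # /~lthms/
```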
<<prefix_func>> :=
function prefix_urls (links, attr, prefix_url)
  index, link = next(links)

  while index do
    href = HTML.get_attribute(link, attr)

    if href then
      if Regex.match(href, "^/") then
        href = Regex.replace(href, "^/*", "")
        href = prefix_url .. href
      end

      HTML.set_attribute(link, attr, href)
    end

    index, link = next(links, index)
  end
end
<<prefix_calls>> :=
prefix_urls(HTML.select(page, "a"), "href", prefix_url)
prefix_urls(HTML.select(page, "link"), "href", prefix_url)
prefix_urls(HTML.select(page, "img"), "src", prefix_url)
prefix_urls(HTML.select(page, "script"), "src", prefix_url)
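The overall effect on one attribute value can be mimicked in shell (an illustrative analogue of prefix_urls, assuming an already-normalized prefix):

```shell
# rewrite_href: prefix root-relative URLs, leave the others untouched
# (illustrative analogue of the prefix_urls Lua function)
rewrite_href () {
  prefix_url="${1}"
  href="${2}"
  case "${href}" in
    /*)
      # strip every leading slash, then prepend the prefix
      while [ "${href#/}" != "${href}" ]; do
        href="${href#/}"
      done
      printf '%s%s' "${prefix_url}" "${href}"
      ;;
    *)
      printf '%s' "${href}"
      ;;
  esac
}

rewrite_href "/~lthms/" "/posts/index.html"   # /~lthms/posts/index.html
```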

Again, configuring soupault to use this plugin is relatively straightforward. The only important thing to notice is the use of the after field, to ensure this plugin is run after the plugin responsible for fixing Org documents URLs.

widget = "urls-rewriting"
prefix_url = "<<prefix>>"
after = "fix-org-urls"

2.6 Marking External Links

This plugin appends a FontAwesome icon to external links: a GitHub icon when the link points to GitHub, and a generic “external link” icon otherwise.

function mark(name)
  return '<i class="url-mark fa fa-' .. name ..
         '" aria-hidden="true"></i>'
end

links = HTML.select(page, "a")

index, link = next(links)

while index do
  href = HTML.get_attribute(link, "href")

  if href then
    -- the host pattern of the first branch was lost in extraction;
    -- github.com restores the intent suggested by the icon names
    if Regex.match(href, "^https?://github.com") then
      icon = HTML.parse(mark('github'))
      HTML.append_child(link, icon)
    elseif Regex.match(href, "^https?://") then
      icon = HTML.parse(mark('external-link'))
      HTML.append_child(link, icon)
    end
  end

  index, link = next(links, index)
end
// the selectors of these rules were lost in extraction and are
// reconstructed; the last two attach the FontAwesome glyphs
// (\f09b: GitHub, \f08e: external link) to the marks
.url-mark
    display: inline
    font-size: 90%
    width: 1em

.url-mark.fa-github::before
    content: "\00a0\f09b"

.url-mark.fa-external-link::before
    content: "\00a0\f08e"

widget = "external-urls"
after = "generate-history"

2.7 Generating Per-File Revisions Tables

2.7.1 Users Instructions

This widget generates a so-called “revisions table” for the file whose name is contained in a DOM element with the id history, based on its git history. Paths should be relative to the directory from which the build process is started (typically, the root of your repository). The revisions table notably provides hyperlinks to a git webview for each commit.

For instance, considering the following HTML snippet

<div id="history">site/posts/</div>

This plugin will replace the content of this <div> with the revisions table of site/posts/

2.7.2 Customization

The base URL of the webview for the document you are currently reading (afterwards abstracted by the <<repo>> noweb reference) is

<<repo>> :=
<details class="history">
  <p>
    This revisions table has been automatically generated
    from <a href="<<repo>>">the <code>git</code> history
    of this website repository</a>, and the change
    descriptions may not always be as useful as they
    should.
  </p>

  <p>
    You can consult the source of this file in its current
    version <a href="<<repo>>/tree/{{file}}">here</a>.
  </p>

  <table class="history">
    {{#history}}
    <tr>
      <td class="date">{{date}}</td>
      <td class="subject">{{subject}}</td>
      <td class="commit">
        <a href="<<repo>>/commit/{{filename}}/?id={{hash}}">{{abbr_hash}}</a>
      </td>
    </tr>
    {{/history}}
  </table>
</details>
// the selectors of the first two rules were lost in extraction;
// #history table and #history td are the natural candidates
#history table
    border-top : 2px solid black
    border-bottom : 2px solid black
    border-collapse : collapse
    width : 35rem

#history td
    border-bottom : 1px solid black
    padding : .5em

#history .commit
    font-size : smaller
    font-family : 'Fira Code', monospace
    width : 7em
    text-align : center

2.7.3 Implementation

We use the built-in preprocess_element widget to implement it, which means we need a script which gets its input from the standard input, and echoes its output to the standard output.

widget = "preprocess_element"
selector = "#history"
command = 'scripts/ templates/history.html'
action = "replace_content"

This plugin should be reimplemented using libgit2 or other git libraries, in a language more suitable than bash.

This plugin proceeds as follows:

  1. Using an ad-hoc script, it generates a JSON containing for each revision
    • The subject, date, hash, and abbreviated hash of the related commit
    • The name of the file at the time of this commit
  2. This JSON is passed to a mustache engine (haskell-mustache) with a proper template
  3. The content of the selected DOM element is replaced with the output of haskell-mustache

This translates into Bash as follows.

function main () {
  local file="${1}"
  local template="${2}"
  # reconstructed: the definition of tmp_file was lost in extraction,
  # and mktemp is the natural candidate
  local tmp_file="$(mktemp)"

  generate_json ${file} > ${tmp_file}
  haskell-mustache ${template} ${tmp_file}
  rm ${tmp_file}
}

The difficult part of this script is the definition of the generate_json function. From a high-level perspective, this function is divided into three steps.

  1. We get an initial (but partial) set of data about the git history of ${file}, from the most recent commit to the oldest
  2. For each commit, we check whether ${file} was renamed
  3. Finally, we echo the result (because we are writing a bash script)

function generate_json () {
  local file="${1}"
  local logs=`<<git-log>>`

  if [ ! $? -eq 0 ]; then
      exit 1
  fi

  <<remane-tracking>>

  <<result-echoing>>
}
We will use git to get the information we need. By default, git subcommands use a pager when their output is likely to be long; this typically includes git-log. To disable this behavior, git exposes the --no-pager option. We introduce _git, a wrapper around git with the proper option set.

function _git () {
  git --no-pager "$@"
}
Afterwards, we use _git in place of git.

Using the git-log --pretty command-line argument, we can generate one JSON object per commit which contains most of the information we need, using the following format string.

<<pretty-format>> :=
{ "subject" : "%s", "abbr_hash" : "%h", "hash" : "%H", "date" : "%cs" }

Besides, we also need --follow to deal with file renaming. Without this option, git-log stops when the file first appears in the repository, even if this “creation” is actually a renaming. Therefore, the git command line we use to collect our initial history is

<<git-log>> :=
_git log --follow --pretty=format:'<<pretty-format>>' "${file}"

To manipulate JSON, we rely on three operators (yet to be defined):

  • jget: in an OBJECT, get the value of a given FIELD
  • jset: in an OBJECT, set the VALUE of a given FIELD
  • jappend: append a VALUE at the end of an ARRAY
<<remane-tracking>> :=
local name="${file}"
local revisions='[]'
local first=0

while read -r rev; do
  rev=$(jset "${rev}" "filename" "\"${name}\"")

  if [ ${first} -eq 0 ]; then
      rev=$(jset "${rev}" "modified" "true")
      first=1
  fi

  revisions=$(jappend "${revisions}" "${rev}")

  local hash=$(jget "${rev}" "hash")
  local rename=$(previous_name "${name}" "${hash}")

  if [[ ! -z "${rename}" ]]; then
      name="${rename}"
  fi
done < <(echo "${logs}")

revisions=$(_jq "${revisions}" "length as \$l | .[\$l - 1].created |= true")
The previous_name function looks for a rename of ${name} in the stats of a given commit, and echoes the previous name if it finds one.

function previous_name () {
  local name=${1}
  local hash=${2}

  local unfold='s/ *\(.*\){\(.*\) => \(.*\)}/\1\2 => \1\3/'

  _git show --stat=10000 ${hash} \
      | sed -e "${unfold}" \
      | grep "=> ${name}" \
      | xargs \
      | cut -d' ' -f1
}
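To make the unfold substitution concrete, here is the same pipeline applied to a hand-written --stat line (the file names are made up for illustration):

```shell
# A hypothetical --stat line describing a rename, as git-show prints it
line=' site/posts/{OldName.org => NewName.org} | 2 +-'
unfold='s/ *\(.*\){\(.*\) => \(.*\)}/\1\2 => \1\3/'

# Unfold the braces, keep the line renaming the current file,
# then extract the previous name (the first whitespace-separated field)
echo "${line}" \
  | sed -e "${unfold}" \
  | grep "=> site/posts/NewName.org" \
  | xargs \
  | cut -d' ' -f1
# prints: site/posts/OldName.org
```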
<<result-echoing>> :=
jset "$(jset "{}" "file" "\"${file}\"")" \
     "history" \
     "${revisions}"

The last missing pieces are the definitions of the three JSON operators. We use jq to manipulate JSON data. Since jq processes JSON from its standard input, we first define a helper (similar to _git) to deal with JSON from variables seamlessly.

function _jq () {
  local input="${1}"
  local filter="${2}"

  echo "${input}" | jq -jcM "${filter}"
}

where:

  • -j tells jq not to print a newline at the end of its output
  • -c tells jq to print JSON in a compact format (rather than prettified)
  • -M tells jq to output monochrome text

Internally, jget, jset, and jappend are implemented with jq basic filters.

function jget () {
  local obj="${1}"
  local field="${2}"

  _jq "${obj}" ".${field}"
}

function jset () {
  local obj="${1}"
  local field="${2}"
  local val="${3}"

  _jq "${obj}" "setpath([\"${field}\"]; ${val})"
}

function jappend () {
  local arr="${1}"
  local val="${2}"

  _jq "${arr}" ". + [ ${val} ]"
}

Everything is defined. We can call main now.

main "$(cat)" "${1}"

2.8 Rendering Equations Offline

2.8.1 Users instructions

Inline equations written in the DOM under the imath class and using the LaTeX syntax can be rendered once and for all by soupault. For instance, <span class="imath">\LaTeX</span> is rendered as \LaTeX, as expected.

Using this widget requires being able to inject raw HTML into input files.

2.8.2 Implementation

We will use KaTeX to render equations offline. KaTeX is unlikely to be available on most systems, but it is distributed via npm, so we can define a minimal package.json file to fetch it automatically.

{
  "private": true,
  "devDependencies": {
    "katex": "^0.11.1"
  }
}

We introduce a Makefile recipe to call npm install. This command produces a file called package-lock.json, which we add to GENFILES to ensure KaTeX will be available when soupault is called.

If the source file of this document has been modified since the last generation, Babel will generate package.json again. However, if the modifications do not concern package.json, then npm install will not modify package-lock.json, and its “last modified” time will not be updated. This means that, the next time make is used, it will replay this recipe. As a consequence, we systematically touch package-lock.json to satisfy make.

package-lock.json : package.json
        @echo "    init  npm packages"
        @npm install &>> build.log
        @touch $@

CONFIGURE += package-lock.json node_modules/

Once installed and available, KaTeX is really simple to use. The following script reads (synchronously!) the standard input, renders it using KaTeX, and outputs the result to the standard output.

var katex = require("katex");
var fs = require("fs");
var input = fs.readFileSync(0);
var displayMode = process.env.DISPLAY != undefined;

var html = katex.renderToString(String.raw`${input}`, {
    throwOnError : false,
    displayMode : displayMode
});

console.log(html);

We once again reuse the preprocess_element widget. The selector is .imath (i stands for inline in this context), and we replace the previous content with the result of our script. A second, similar widget handles display-mode equations under the .dmath class (d for display), setting the DISPLAY environment variable for our script.

widget = "preprocess_element"
selector = ".imath"
command = "node scripts/katex.js"
action = "replace_content"

widget = "preprocess_element"
selector = ".dmath"
command = "DISPLAY=1 node scripts/katex.js"
action = "replace_content"

The KaTeX font is bigger than the serif font used for this website, so we reduce it a bit with a dedicated SASS rule.

.imath, .dmath
  font-size : smaller

// the selector of the centering rule was lost; .dmath (display-mode
// equations) is the natural candidate
.dmath
  text-align : center

3 cleopatra Generation Process Definition

We introduce the soupault generation process, obviously based on the soupault HTML processor. The structure of a cleopatra generation process is always the same: a set of build stages, a list of dependencies towards the other generation processes, and eventually some ad-hoc commands.

In the rest of this section, we define these three components.

3.1 Build Stages

From the perspective of cleopatra, soupault is a rather simple component, since its build stage is simply a call to soupault, whose outputs are located in a single (configurable) directory.

<<stages>> :=
soupault-build :
        @cleopatra echo Running  soupault
        @soupault

ARTIFACTS += <<build-dir>>/

3.2 Dependencies

Most of the generation processes (if not all of them) need to declare themselves as a prerequisite for soupault-build. If they do not, they will likely be executed after soupault is called.

This file defines an auxiliary SASS sheet that needs to be declared as a dependency of the build stage of the theme generation process.

Finally, the offline rendering of equations requires KaTeX \KaTeX to be available, so we include the file, and make package-lock.json (the proof that npm install has been executed) a prerequisite of soupault-build.

<<dependencies>> :=
theme-build : site/style/plugins.sass
soupault-build : package-lock.json

3.3 Ad-hoc Commands

Finally, this generation process introduces a dedicated (PHONY) command to start an HTTP server, in order to navigate the generated website from a browser.

<<ad-hoc-cmds>> :=
serve :
        @echo "   start  a python server"
        @cd <<build-dir>>; python -m http.server 2>/dev/null

.PHONY : serve

This command does not assume anything about the current state of the project generation. In particular, it does not check whether the <<build-dir>> directory exists. The responsibility of running make serve in an appropriate setting lies with end users.