
# soupault Configuration

In a nutshell, the purpose of soupault is to post-process the HTML files generated by the other generation processes of cleopatra. It is parameterized by two settings: the <<build-dir>> directory, where soupault generates its output, and an optional <<prefix>> subdirectory, wherein the website contents live. The latter makes it possible to generate only a subpart of a larger website.

For the present website, these two settings are initialized as follows.

<<build-dir>> :=
build

<<prefix>> :=
~lthms


The rest of this document proceeds as follows. We first describe the general settings of soupault. Then, we enumerate the widgets enabled for this website. Finally, we provide a proper definition of the soupault cleopatra generation process.

## 1 soupault General Settings

The general settings section of soupault.conf is fairly basic, and there is little to say that the “Getting Started” guide does not already discuss at length.

We emphasize three things:

• The build_dir is set to <<build-dir>>/<<prefix>> in place of simply <<build-dir>>.
• The ignore_extensions list shall be updated to take into account artifacts produced by other cleopatra generation processes.
• We disable the “clean URLs” feature of soupault. This option renames an HTML file foo/bar.html into foo/bar/index.html, which means that, when served by an HTTP server, the foo/bar URL will work. The issue we have with this feature is that internal links within the website need to take the final URLs into account, rather than the actual file names. If one day soupault starts rewriting internal URLs when clean_urls is enabled, we might reconsider using it.
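As an aside, the renaming performed by clean_urls can be sketched in plain shell (the foo/bar.html path is only an example):

```shell
# sketch of the renaming performed by clean_urls = true
src="foo/bar.html"
dst="${src%.html}/index.html"
echo "${dst}"   # foo/bar/index.html
```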
[settings]
strict = true
verbose = false
debug = false
site_dir = "site"
build_dir = "<<build-dir>>/<<prefix>>"

page_file_extensions = ["html"]
ignore_extensions = [
"draft", "vo", "vok", "vos", "glob",
"html~", "org", "aux", "sass",
]

generator_mode = true
complete_page_selector = "html"
default_template_file = "templates/main.html"
default_content_selector = "main"
doctype = "<!DOCTYPE html>"
clean_urls = false

soupault.conf

The list of ignored extensions should be programmatically generated with the help of cleopatra.
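Until then, such a generation could be sketched as follows, assuming each generation process exposes its artifact extensions as a whitespace-separated list (the list below is simply copied from the configuration above):

```shell
# hypothetical sketch: build the ignore_extensions TOML line from a
# whitespace-separated list of artifact extensions
extensions='draft vo vok vos glob html~ org aux sass'

list=''
for ext in ${extensions}; do
  list="${list}\"${ext}\", "
done

# strip the trailing ", " before closing the bracket
echo "ignore_extensions = [${list%, }]"
```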

## 2 Widgets

### 2.1 Setting Page Title

We use the “page title” widget to set the title of the webpage based on the first (and hopefully the only) <h1> tag of the page.

[widgets.page-title]
widget = "title"
selector = "h1"
default = "~lthms"
prepend = "~lthms: "

soupault.conf
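The effect of the prepend field can be sketched in shell (the <h1> text below is hypothetical):

```shell
# what the title widget computes: the prepend string, then the <h1> text
h1='soupault Configuration'
prepend='~lthms: '
echo "${prepend}${h1}"   # ~lthms: soupault Configuration
```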

### 2.2 Acknowledging soupault

When creating a new soupault project (using soupault --init), the default configuration file suggests advertising the use of soupault. Rather than hard-coding the version of soupault being used (which is error-prone), we determine it with the following script.

<<soupault-version>> :=
soupault --version | head -n 1 | tr -d '\n'

soupault 2.1.0
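To see why head and tr are needed, we can feed the pipeline a multi-line sample (the sample mimics a soupault --version output; its exact wording is an assumption):

```shell
# head keeps only the first line; tr strips the trailing newline,
# so the result can be inlined in an HTML attribute
printf 'soupault 2.1.0\nbuilt with OCaml\n' | head -n 1 | tr -d '\n'
```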


The configuration of the widget (initially provided by soupault) becomes less subject to obsolescence.

[widgets.generator-meta]
widget = "insert_html"
html = """
<meta name="generator" content="<<soupault-version()>>">
"""

soupault.conf

### 2.3 Generating a Table of Contents

The toc widget allows for generating a table of contents for HTML files which contain a node matching a given selector (in the case of this document, #generate-toc).

[widgets.table-of-contents]
widget = "toc"
selector = "#generate-toc"
action = "replace_element"
valid_html = true
min_level = 2
numbered_list = true

soupault.conf

We could propose a patch to soupault's upstream to add numbering in titles.

### 2.4 Fixing Org Internal Links

For some reason, Org prefixes internal links to other Org documents with file://. To avoid that, we provide a simple plugin which removes file:// from the beginning of a URL.
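The rewrite itself boils down to stripping a fixed prefix. In shell, with sed standing in for the Lua Regex.replace call, it would look like this (the URL is an example):

```shell
# strip the spurious file:// prefix Org adds to internal links
echo 'file://posts/Foo.html' | sed -e 's|^file://||'
```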

This plugin definition should be part of the org generation process, but that would require aggregating “subconfigs” into a larger one.

The key component of this plugin is the fix_org_urls function.

fix_org_urls(LIST, ATTR)
Enumerate the DOM elements of LIST, and check their ATTR attribute.
function fix_org_urls(list, attr)
  index, link = next(list)
  while index do
    href = HTML.get_attribute(link, attr)

    if href then
      href = Regex.replace(href, "^file://", "")
      HTML.set_attribute(link, attr, href)
    end

    index, link = next(list, index)
  end
end

plugins/fix-org-urls.lua

We use this function to fix the URLs of tags known to be subject to this strange Org behavior. For now, <a> and <img> tags are concerned.

fix_org_urls(HTML.select(page, "a"), "href")
fix_org_urls(HTML.select(page, "img"), "src")

plugins/fix-org-urls.lua

The configuration of this plugin, and the associated widget, is straightforward.

[widgets.fix-org-urls]
widget = "fix-org-urls"

soupault.conf

### 2.5 Prefixing Internal URLs

On the one hand, internal links can be absolute, meaning they start with a leading /, and are therefore relative to the website root. On the other hand, a website (especially a static one) can be placed in a larger context. For instance, my personal website lives inside the ~lthms directory of the soap.coffee domain.

The purpose of this plugin is to rewrite internal URLs which are relative to the root, in order to properly prefix them.

From a high-level perspective, the plugin structure is the following.

prefix_url = config["prefix_url"]
<<validate_prefix>>

<<prefix_func>>
<<prefix_calls>>

plugins/urls-rewriting.lua
1. We validate the widget configuration.
2. We propose a generic function to enumerate and rewrite tags which can have internal URLs as attribute argument.
3. We use this generic function for relevant tags.
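The intended rewriting can be sketched in shell (the prefix and link values are examples), mirroring the plugin logic: strip the leading slashes of an absolute URL, then prepend the prefix:

```shell
# mirror of the plugin logic: absolute URLs get the prefix prepended
prefix_url='/~lthms/'
href='/posts/Foo.html'

case "${href}" in
  /*)
    # strip all leading slashes, then prepend the prefix
    href="${prefix_url}$(echo "${href}" | sed -e 's|^/*||')"
    ;;
esac

echo "${href}"   # /~lthms/posts/Foo.html
```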
<<validate_prefix>> :=
if not prefix_url then
Plugin.fail("Missing mandatory field: prefix_url")
end

if not Regex.match(prefix_url, "^/(.*)") then
prefix_url = "/" .. prefix_url
end

if not Regex.match(prefix_url, "(.*)/") then
prefix_url = prefix_url .. "/"
end

<<prefix_func>> :=
function prefix_urls (links, attr, prefix_url)
  index, link = next(links)
  while index do
    href = HTML.get_attribute(link, attr)

    if href then
      if Regex.match(href, "^/") then
        href = Regex.replace(href, "^/*", "")
        href = prefix_url .. href
      end
      HTML.set_attribute(link, attr, href)
    end

    index, link = next(links, index)
  end
end

<<prefix_calls>> :=
prefix_urls(HTML.select(page, "a"), "href", prefix_url)
prefix_urls(HTML.select(page, "link"), "href", prefix_url)
prefix_urls(HTML.select(page, "img"), "src", prefix_url)
prefix_urls(HTML.select(page, "script"), "src", prefix_url)

Again, configuring soupault to use this plugin is relatively straightforward. The only important thing to notice is the use of the after field, to ensure this plugin is run after the plugin responsible for fixing Org document URLs.

[widgets.urls-rewriting]
widget = "urls-rewriting"
prefix_url = "<<prefix>>"
after = "fix-org-urls"

soupault.conf

### 2.6 Marking External Links

function mark(name)
  return '<i class="url-mark fa fa-' .. name ..
    '" aria-hidden="true"></i>'
end

links = HTML.select(page, "a")
index, link = next(links)

while index do
  href = HTML.get_attribute(link, "href")

  if href then
    if Regex.match(href, "^https?://github.com") then
      icon = HTML.parse(mark('github'))
      HTML.append_child(link, icon)
    elseif Regex.match(href, "^https?://") then
      icon = HTML.parse(mark('external-link'))
      HTML.append_child(link, icon)
    end
  end

  index, link = next(links, index)
end

plugins/external-urls.lua

.url-mark.fa
  display: inline
  font-size: 90%
  width: 1em

.url-mark.fa-github::before
  content: "\00a0\f09b"

.url-mark.fa-external-link::before
  content: "\00a0\f08e"

site/style/plugins.sass

[widgets.mark-external-urls]
after = "generate-history"
widget = "external-urls"

soupault.conf

### 2.7 Generating Per-File Revisions Tables

#### 2.7.1 Users Instructions

This widget allows for generating a so-called “revisions table” for the file whose name is contained in a DOM element of id history, based on its git history. Paths should be relative to the directory from which you start the build process (typically, the root of your repository). The revisions table notably provides hyperlinks to a git webview for each commit.

For instance, considering the following HTML snippet

<div id="history">
site/posts/FooBar.org
</div>

This plugin will replace the content of this <div> with the revisions table of site/posts/FooBar.org.

#### 2.7.2 Customization

The base URL of the webview for the document you are currently reading (afterwards abstracted with the <<repo>> noweb reference) is

<<repo>> :=
https://code.soap.coffee/writing/lthms.git

<details class="history">
<summary>Revisions</summary>
<p>
This revisions table has been automatically generated from
<a href="<<repo>>">the <code>git</code> history of this website
repository</a>, and the change descriptions may not always be as
useful as they should.
</p>
<p>
You can consult the source of this file in its current version
<a href="<<repo>>/tree/{{file}}">here</a>.
</p>
<table>
{{#history}}
<tr>
<td class="date"
    {{#created}} id="created-at" {{/created}}
    {{#modified}} id="modified-at" {{/modified}}
>
{{date}}
</td>
<td class="subject">{{subject}}</td>
<td class="commit">
<a href="<<repo>>/commit/{(unknown)}/?id={{hash}}">
{{abbr_hash}}
</a>
</td>
</tr>
{{/history}}
</table>
</details>

templates/history.html

table
  border-top : 2px solid black
  border-bottom : 2px solid black
  border-collapse : collapse
  width : 35rem

  td
    border-bottom : 1px solid black
    padding : .5em

#history .commit
  font-size : smaller
  font-family : 'Fira Code', monospace
  width : 7em
  text-align : center

site/style/plugins.sass

#### 2.7.3 Implementation

We use the built-in preprocess_element widget to implement it, which means we need a script which gets its input from the standard input, and echoes its output to the standard output.

[widgets.generate-history]
widget = "preprocess_element"
selector = "#history"
command = 'scripts/history.sh templates/history.html'
action = "replace_content"

soupault.conf

This plugin should be reimplemented using libgit2 or another git library, in a language more suitable than Bash.

This plugin proceeds as follows:

1. Using an ad-hoc script, it generates JSON containing, for each revision
• The subject, date, hash, and abbreviated hash of the related commit
• The name of the file at the time of this commit
2. This JSON is passed to a mustache engine (haskell-mustache) with a proper template
3. The content of the selected DOM element is replaced with the output of haskell-mustache

This translates into Bash like this.

function main () {
  local file="${1}"
  local template="${2}"

  tmp_file=$(mktemp)

  generate_json ${file} > ${tmp_file}

  haskell-mustache ${template} ${tmp_file}

  rm ${tmp_file}
}

scripts/history.sh

The difficult part of this script is the definition of the generate_json function. From a high-level perspective, this function is divided into three steps:

1. We get an initial (but partial) set of data about the git commits of ${file}, from the most recent to the oldest
2. For each commit, we check whether ${file} was renamed
3. Finally, we echo the result (because we are writing a Bash script)

function generate_json () {
  local file="${1}"
  local logs="$(<<git-log>>)"

  if [ ! $? -eq 0 ]; then
    exit 1
  fi

  <<remane-tracking>>

  <<result-echoing>>
}

scripts/history.sh

We will use git to get the information we need. By default, git subcommands use a pager when their output is likely to be long; this typically includes git-log. To disable this behavior, git exposes the --no-pager option. We introduce _git, a wrapper around git with the proper option.

function _git () {
  git --no-pager "$@"
}

scripts/history.sh

Afterwards, we use _git in place of git.

Using the git-log --pretty command-line argument, we can generate one JSON object per commit which contains most of the information we need, using the following format string.

<<pretty-format>> :=
{ "subject" : "%s", "abbr_hash" : "%h", "hash" : "%H", "date" : "%cs" }
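Each line git log emits with this format string is a standalone JSON object. As a quick sanity check, a field can be extracted from such a line with sed (the commit data below is hypothetical):

```shell
# one line of git log output, with hypothetical commit data
line='{ "subject" : "publish a new post", "abbr_hash" : "0abc123", "hash" : "0abc123d", "date" : "2020-12-01" }'

# extract the value of the "date" field
echo "${line}" | sed -e 's/.*"date" : "\([^"]*\)".*/\1/'   # 2020-12-01
```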


Besides, we also need --follow to deal with file renaming. Without this option, git-log stops when the file first appears in the repository, even if this “creation” is actually a renaming. Therefore, the git command line we use to collect our initial history is

<<git-log>> :=
_git log --follow --pretty=format:'<<pretty-format>>' "${file}"

To manipulate JSON, we rely on three operators (yet to be defined):

jget OBJECT FIELD
In an OBJECT, get the value of a given FIELD

jset OBJECT FIELD VALUE
In an OBJECT, set the VALUE of a given FIELD

jappend ARRAY VALUE
Append a VALUE at the end of an ARRAY

<<remane-tracking>> :=
local name="${file}"
local revisions='[]'
local first=0

while read -r rev; do
  rev=$(jset "${rev}" "filename" "\"${name}\"")

  if [ ${first} -eq 0 ]; then
    rev=$(jset "${rev}" "modified" "true")
    first=1
  fi

  revisions=$(jappend "${revisions}" "${rev}")

  local hash=$(jget "${rev}" "hash")
  local rename=$(previous_name "${name}" "${hash}")

  if [[ ! -z "${rename}" ]]; then
    name=${rename}
  fi
done < <(echo "${logs}")

revisions=$(_jq "${revisions}" "length as \$l | .[\$l - 1].created |= true")

function previous_name () {
  local name=${1}
  local hash=${2}

  local unfold='s/ *\(.*\){\(.*\) => \(.*\)}/\1\2 => \1\3/'

  _git show --stat=10000 "${hash}" \
    | sed -e "${unfold}" \
    | grep "=> ${name}" \
    | xargs \
    | cut -d' ' -f1
}

scripts/history.sh
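The unfold substitution can be exercised on a sample rename line as printed by git show --stat (the path is hypothetical):

```shell
# expand git's abbreviated rename notation dir/{old => new}/file
echo ' site/{posts => articles}/Foo.org | 2 +-' \
  | sed -e 's/ *\(.*\){\(.*\) => \(.*\)}/\1\2 => \1\3/'
```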
<<result-echoing>> :=
jset "$(jset "{}" "file" "\"${file}\"")" \
"history" \
"${revisions}"

The last missing pieces are the definitions of the three JSON operators. We use jq to manipulate JSON data. Since jq processes JSON from its standard input, we first define a helper (similar to _git) to deal with JSON from variables seamlessly.

function _jq () {
  local input="${1}"
  local filter="${2}"

  echo "${input}" | jq -jcM "${filter}"
}

scripts/history.sh

• -j tells jq not to print a newline at the end of its output
• -c tells jq to print JSON in a compact format (rather than prettified)
• -M tells jq to output monochrome results

Internally, jget, jset, and jappend are implemented with basic jq filters.

function jget () {
  local obj="${1}"
  local field="${2}"

  _jq "${obj}" ".${field}"
}

function jset () {
  local obj="${1}"
  local field="${2}"
  local val="${3}"

  _jq "${obj}" "setpath([\"${field}\"]; ${val})"
}

function jappend () {
  local arr="${1}"
  local val="${2}"

  _jq "${arr}" ". + [ ${val} ]"
}

scripts/history.sh

Everything is defined. We can call main now.

main "$(cat)" "${1}"

scripts/history.sh

### 2.8 Rendering Equations Offline

#### 2.8.1 Users instructions

Inline equations written in the DOM under the imath class and using the $\LaTeX$ syntax can be rendered once and for all by soupault. For instance, <span class="imath">\LaTeX</span> is rendered $\LaTeX$ as expected.

Using this widget requires being able to inject raw HTML in input files.

#### 2.8.2 Implementation

We will use $\KaTeX$ to render equations offline. $\KaTeX$ availability on most systems is unlikely, but it is part of npm, so we can define a minimal package.json file to fetch it automatically.

{
  "private": true,
  "devDependencies": {
    "katex": "^0.11.1"
  }
}

package.json

We introduce a Makefile recipe to call npm install. This command produces a file called package-lock.json that we add to GENFILES to ensure $\KaTeX$ will be available when soupault is called. If Soupault.org has been modified since the last generation, Babel will generate package.json again. However, if the modifications of Soupault.org do not concern package.json, then npm install will not modify package-lock.json, and its “last modified” time will not be updated. This means that the next time make is used, it will replay this recipe again. As a consequence, we systematically touch package-lock.json to satisfy make.

package-lock.json : package.json
	@echo "  init npm packages"
	@npm install &>> build.log
	@touch $@

CONFIGURE += package-lock.json node_modules/

katex.mk

Once installed and available, $\KaTeX$ is really simple to use. The following script reads (synchronously!) the standard input, renders it using $\KaTeX$, and outputs the result to the standard output.

var katex = require("katex");
var fs = require("fs");

// read the whole standard input synchronously (fd 0)
var input = fs.readFileSync(0);
var displayMode = process.env.DISPLAY != undefined;

var html = katex.renderToString(String.raw`${input}`, {
    throwOnError : false,
    displayMode : displayMode
});

console.log(html)

scripts/katex.js

We reuse once again the preprocess_element widget. The selector is .imath (i stands for inline in this context), and we replace the previous content with the result of our script.

[widgets.inline-math]
widget = "preprocess_element"
selector = ".imath"
command = "node scripts/katex.js"
action = "replace_content"

[widgets.display-math]
widget = "preprocess_element"
selector = ".dmath"
command = "DISPLAY=1 node scripts/katex.js"
action = "replace_content"

soupault.conf

The $\KaTeX$ font is bigger than the serif font used for this website, so we reduce it a bit with a dedicated SASS rule.

.imath, .dmath
font-size : smaller

.dmath
text-align : center

site/style/plugins.sass

## 3 cleopatra Generation Process Definition

We introduce the soupault generation process, obviously based on the soupault HTML processor. The structure of a cleopatra generation process is always the same.

<<stages>>
<<dependencies>>

soupault.mk

In the rest of this section, we define these two components.

### 3.1 Build Stages

From the perspective of cleopatra, it is a rather simple component, since the build stage is simply a call to soupault, whose outputs are located in a single (configurable) directory.

<<stages>> :=
soupault-build :
@cleopatra echo Running  soupault
@soupault

ARTIFACTS += <<build-dir>>/


### 3.2 Dependencies

Most of the generation processes (if not all of them) need to declare themselves as a prerequisite for soupault-build. If they do not, they will likely be executed after soupault is called.

This file defines an auxiliary SASS sheet that needs to be declared as a dependency of the build stage of the theme generation process.

Finally, the offline rendering of equations requires $\KaTeX$ to be available, so we include the katex.mk file, and make package-lock.json (the proof that npm install has been executed) a prerequisite of soupault-build.

<<dependencies>> :=
theme-build : site/style/plugins.sass
include katex.mk
soupault-build : package-lock.json


Finally, this generation process introduces a dedicated (PHONY) command to start an HTTP server in order to navigate the generated website from a browser.
serve :

This command does not assume anything about the current state of the generation of the project. In particular, it does not check whether or not the <<build-dir>> directory exists. The responsibility to use make serve in a proper setting lies with the final user.