soupault Configuration

In a nutshell, the purpose of soupault is to post-process the HTML files generated by the other generation processes of cleopatra.

The rest of this document proceeds as follows. We first describe the general settings of soupault. Then, we enumerate the widgets enabled for this website. Finally, we provide a proper definition of the soupault cleopatra generation process.

1 soupault General Settings

The general settings section of soupault.conf is fairly basic, and there is little to say that the “Getting Started” guide does not already discuss at length.

We emphasize three things:

  • The build_dir is set to build/~lthms in place of simply build.
  • The ignore_extensions shall be updated to take into account artifacts produced by other cleopatra generation processes.
  • We disable the “clean URLs” feature of soupault. This option renames an HTML file foo/bar.html into foo/bar/index.html, which means that, when served by an HTTP server, the foo/bar URL will work. The issue we have with this feature is that the internal links within your website need to take their final URL into account, rather than their actual name. If one day soupault starts rewriting internal URLs when clean_urls is enabled, we might reconsider using it.
[settings]
strict = true
verbose = false
debug = false
site_dir = "site"
build_dir = "build/~lthms"

page_file_extensions = ["html"]
ignore_extensions = [
  "draft", "vo", "vok", "vos", "glob",
  "html~", "org", "aux", "sass",
]

generator_mode = true
complete_page_selector = "html"
default_template_file = "templates/main.html"
default_content_selector = "main"
doctype = "<!DOCTYPE html>"
clean_urls = false

The list of ignored extensions should be programmatically generated with the help of cleopatra.
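
To make the effect of clean URLs concrete, here is a minimal Bash sketch of the renaming described above. The function name clean_url_target is hypothetical, and leaving index.html pages in place is an assumption about soupault's behavior rather than something stated in this document.

```shell
# Hypothetical helper: compute the path a page would get if clean URLs
# were enabled. Assumption: pages already named index.html do not move.
clean_url_target () {
  local page="${1}"

  case "${page}" in
    index.html|*/index.html) printf '%s\n' "${page}" ;;
    *.html) printf '%s\n' "${page%.html}/index.html" ;;
    *) printf '%s\n' "${page}" ;;  # not a page, left untouched
  esac
}
```

With this renaming in place, a link authored as foo/bar.html would have to be written foo/bar/ instead, which is precisely the mismatch between file names and final URLs that we want to avoid.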

2 Widgets

2.1 Setting Page Title

We use the “page title” widget to set the title of the webpage based on the first (and hopefully the only) <h1> tag of the page.

[widgets.page-title]
widget = "title"
selector = "h1"
default = "~lthms"
prepend = "~lthms: "

2.2 Acknowledging soupault

When creating a new soupault project (using soupault --init), the default configuration file suggests advertising the use of soupault. Rather than hard-coding the version of soupault in use (which is error-prone), we determine it with the following script.

soupault --version | head -n 1 | tr -d '\n'

The configuration of the widget, initially provided by soupault, becomes less prone to obsolescence.

[widgets.generator-meta]
widget = "insert_html"
html = """<meta name="generator" content="soupault 2.4.0">"""
selector = "head"
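
As an illustration of the substitution at play, the following Bash sketch builds the generator tag from a version string. The helper generator_meta is hypothetical; it takes the version as an argument (rather than calling soupault itself) so that the sketch stays runnable on a machine where soupault is not installed.

```shell
# Hypothetical helper: build the generator tag from a version string,
# typically the output of `soupault --version | head -n 1 | tr -d '\n'`.
generator_meta () {
  local version="${1}"

  printf '<meta name="generator" content="%s">\n' "${version}"
}
```

Calling generator_meta "soupault 2.4.0" prints the value of the html field used in the widget configuration above.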

2.3 Generating Table of Contents

The toc widget generates a table of contents for HTML files which contain a node matching a given selector (in the case of this document, #generate-toc).

[widgets.table-of-contents]
widget = "toc"
selector = "#generate-toc"
action = "replace_element"
valid_html = true
min_level = 2
numbered_list = true

We could propose a patch to soupault's upstream to add numbering in titles.

2.4 Fixing Org Internal Links

For some reason, Org prefixes internal links to other Org documents with file://. To avoid that, we provide a simple plugin which removes file:// from the beginning of a URL.

This plugin definition should be part of the org generation process, but that would require aggregating “subconfigs” into a larger one.

This plugin's key component is the fix_org_urls function.

fix_org_urls(LIST, ATTR)
Enumerate the DOM elements of LIST, and check their ATTR attribute.
function fix_org_urls(list, attr)
  index, link = next(list)

  while index do
    href = HTML.get_attribute(link, attr)

    if href then
      href = Regex.replace(href, "^file://", "")
      HTML.set_attribute(link, attr, href)
    end

    index, link = next(list, index)
  end
end

We use this function to fix the URLs of tags known to be subject to Org's strange behavior. For now, only <a> has been observed to be affected, but we process <img> as well.

fix_org_urls(HTML.select(page, "a"), "href")
fix_org_urls(HTML.select(page, "img"), "src")
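
For readers less familiar with Lua, the rewriting applied to each attribute can be sketched in Bash as follows. The function strip_file_scheme is hypothetical and only mirrors the "^file://" replacement on a single URL; it does not manipulate the DOM.

```shell
# Hypothetical helper mirroring Regex.replace(href, "^file://", ""):
# remove a leading file:// from a URL, and leave anything else as is.
strip_file_scheme () {
  local url="${1}"

  printf '%s\n' "${url#file://}"
}
```

For instance, file://./Foo.html becomes ./Foo.html, while a URL such as https://soap.coffee is returned unchanged.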

The configuration of this plugin, and the associated widget, is straightforward.

[widgets.fix-org-urls]
widget = "fix-org-urls"

2.5 Prefixing Internal URLs

On the one hand, internal links can be absolute, meaning they start with a leading /, and therefore are relative to the website root. On the other hand, websites (especially static websites) can be placed in a larger context. For instance, my personal website lives inside the ~lthms directory of the soap.coffee domain.

The purpose of this plugin is to rewrite internal URLs which are relative to the root, in order to properly prefix them.

From a high-level perspective, the plugin structure is the following.

prefix_url = config["prefix_url"]
<<validate_prefix>>

<<prefix_func>>
<<prefix_calls>>
  1. We validate the widget configuration.
  2. We propose a generic function to enumerate and rewrite tags which can have internal URLs as attribute argument.
  3. We use this generic function for relevant tags.
if not prefix_url then
  Plugin.fail("Missing mandatory field: `prefix_url'")
end

if not Regex.match(prefix_url, "^/(.*)") then
  prefix_url = "/" .. prefix_url
end

if not Regex.match(prefix_url, "(.*)/$") then
  prefix_url = prefix_url .. "/"
end
function prefix_urls (links, attr, prefix_url)
  index, link = next(links)

  while index do
    href = HTML.get_attribute(link, attr)

    if href then
      if Regex.match(href, "^/") then
        href = Regex.replace(href, "^/*", "")
        href = prefix_url .. href
      end

      HTML.set_attribute(link, attr, href)
    end
    index, link = next(links, index)
  end
end
prefix_urls(HTML.select(page, "a"), "href", prefix_url)
prefix_urls(HTML.select(page, "link"), "href", prefix_url)
prefix_urls(HTML.select(page, "img"), "src", prefix_url)
prefix_urls(HTML.select(page, "script"), "src", prefix_url)

Again, configuring soupault to use this plugin is relatively straightforward. The only important thing to notice is the use of the after field, to ensure this plugin is run after the plugin responsible for fixing the URLs of Org documents.

[widgets.urls-rewriting]
widget = "urls-rewriting"
prefix_url = "<<prefix>>"
after = "fix-org-urls"
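
The two rules implemented by this plugin (normalizing the prefix, then rewriting root-relative URLs) can be sketched in Bash. The names normalize_prefix and prefix_href are hypothetical, and the sketch works on plain strings rather than on DOM elements.

```shell
# Hypothetical helper: ensure the prefix starts and ends with a slash.
normalize_prefix () {
  local prefix="${1}"

  [[ "${prefix}" == /* ]] || prefix="/${prefix}"
  [[ "${prefix}" == */ ]] || prefix="${prefix}/"

  printf '%s\n' "${prefix}"
}

# Hypothetical helper: prefix a URL only if it is relative to the root.
prefix_href () {
  local href="${1}"
  local prefix="${2}"

  if [[ "${href}" == /* ]]; then
    # Strip every leading slash, as the "^/*" replacement does.
    while [[ "${href}" == /* ]]; do
      href="${href#/}"
    done
    href="${prefix}${href}"
  fi

  printf '%s\n' "${href}"
}
```

With the prefix ~lthms, normalize_prefix yields /~lthms/, and prefix_href turns /posts/Foo.html into /~lthms/posts/Foo.html. Relative links and links to other websites are left untouched.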

2.6 Marking External Links

function mark(name)
  return '<i class="url-mark fa fa-' .. name ..
         '" aria-hidden="true"></i>'
end

links = HTML.select(page, "a")

index, link = next(links)

while index do
  href = HTML.get_attribute(link, "href")

  if href then
    if Regex.match(href, "^https?://github.com") then
      icon = HTML.parse(mark('github'))
      HTML.append_child(link, icon)
    elseif Regex.match(href, "^https?://") then
      icon = HTML.parse(mark('external-link'))
      HTML.append_child(link, icon)
    end
  end

  index, link = next(links, index)
end
.url-mark.fa
    display: inline
    font-size: 90%
    width: 1em

.url-mark.fa-github::before
    content: "\00a0\f09b"

.url-mark.fa-external-link::before
    content: "\00a0\f08e"
[widgets.mark-external-urls]
after = "generate-history"
widget = "external-urls"
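
The decision taken by this plugin for each link can be summarized with the following Bash sketch. The function link_mark is hypothetical; it returns the name of the icon to append, or nothing for internal links. Unlike the Lua pattern above, the sketch escapes the dot of github.com, which is a small deviation we introduce here.

```shell
# Hypothetical helper: name of the icon to append to a link, if any.
link_mark () {
  local href="${1}"

  if [[ "${href}" =~ ^https?://github\.com ]]; then
    echo "github"
  elif [[ "${href}" =~ ^https?:// ]]; then
    echo "external-link"
  else
    echo ""  # internal link, no icon
  fi
}
```

The order of the tests matters: a GitHub URL also matches the generic pattern, so it has to be checked first.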

2.7 Generating Per-File Revisions Tables

2.7.1 Users Instructions

This widget generates a so-called “revisions table” for the file whose name is contained in the DOM element of id history, based on its git history. Paths should be relative to the directory from which you start the build process (typically, the root of your repository). The revisions table notably provides hyperlinks to a git webview for each commit.

For instance, consider the following HTML snippet:

<div id="history">
  site/posts/FooBar.org
</div>

This plugin will replace the content of this <div> with the revisions table of site/posts/FooBar.org.

2.7.2 Customization

The base URL of the webview for the document you are currently reading (afterwards abstracted with the <<repo>> noweb reference) is

https://code.soap.coffee/writing/lthms.git
<details class="history">
  <summary>Revisions</summary>
  <p>
    This revisions table has been automatically generated
    from <a href="<<repo>>">the <code>git</code> history
    of this website repository</a>, and the change
    descriptions may not always be as useful as they
    should.
  </p>

  <p>
    You can consult the source of this file in its current
    version <a href="<<repo>>/tree/{{file}}">here</a>.
  </p>

  <table>
  {{#history}}
  <tr>
    <td class="date"
{{#created}}
        id="created-at"
{{/created}}
{{#modified}}
        id="modified-at"
{{/modified}}
        >
      {{date}}
    </td>
    <td class="subject">{{subject}}</td>
    <td class="commit">
      <a href="<<repo>>/commit/{{filename}}/?id={{hash}}">
        {{abbr_hash}}
      </a>
    </td>
  </tr>
  {{/history}}
  </table>
</details>
table
    border-top : 2px solid black
    border-bottom : 2px solid black
    border-collapse : collapse
    width : 35rem

td
    border-bottom : 1px solid black
    padding : .5em

#history .commit
    font-size : smaller
    font-family : 'Fira Code', monospace
    width : 7em
    text-align : center

2.7.3 Implementation

We use the built-in preprocess_element widget to implement this feature, which means we need a script which gets its input from the standard input, and echoes its output to the standard output.

[widgets.generate-history]
widget = "preprocess_element"
selector = "#history"
command = 'scripts/history.sh templates/history.html'
action = "replace_content"

This plugin proceeds as follows:

  1. Using an ad-hoc script, it generates a JSON document containing, for each revision:
    • The subject, date, hash, and abbreviated hash of the related commit
    • The name of the file at the time of this commit
  2. This JSON is passed to a mustache engine (haskell-mustache) with a proper template
  3. The content of the selected DOM element is replaced with the output of haskell-mustache

In Bash, this translates as follows.

function main () {
  local file="${1}"
  local template="${2}"

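  local tmp_file
  tmp_file="$(mktemp)"

  # ${file} is left unquoted: word splitting trims the whitespace which
  # may surround the path in the DOM element.
  generate_json ${file} > "${tmp_file}"
  haskell-mustache "${template}" "${tmp_file}"
  rm "${tmp_file}"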
}

Generating the expected JSON is therefore as simple as:

  • Fetching the logs
  • Reading 8 lines from the logs, and parsing the filename from the 6th line
  • Outputting the JSON

We will use git to get the information we need. By default, git subcommands use a pager when their output is likely to be long. This typically includes git-log. To disable this behavior, git exposes the --no-pager option. Besides, we also need --follow and --stat to deal with file renaming. Without --follow, git-log stops when the file first appears in the repository, even if this “creation” is actually a renaming, and --stat gives us the name of the file at the time of each commit. Therefore, the git command line we use to collect our history is

function gitlog () {
  local file="${1}"
  git --no-pager log \
      --follow \
      --stat=10000 \
      --pretty=format:'%s%n%h%n%H%n%cs%n' \
      "${file}"
}

This function will generate a sequence of 8 lines containing all the relevant information we are looking for, for each commit, namely:

  • Subject
  • Abbreviated hash
  • Full hash
  • Date
  • Empty line
  • Change summary
  • Shortlog
  • Empty line

For instance, the gitlog function will output the following lines for the last commit of this very file:

Avoid to display too much noweb variables
d41cedb
d41cedb13ee19505dd54d67f4a57083a665804f3
2020-12-14

 site/cleopatra/soupault.org | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

Among other things, the 6th line contains the filename. We need to extract it, and we do that with sed. In case of file renaming, we need to parse something of the form path/to/{old => new}.

function parse_filename () {
  local line="${1}"
  local shrink='s/ *\(.*\) \+|.*/\1/'
  local unfold='s/\(.*\){\(.*\) => \(.*\)}/\1\3/'

  # Quote the line so that the shell neither splits nor globs it.
  printf '%s\n' "${line}" | sed -e "${shrink}" -e "${unfold}"
}
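
As far as we can tell, git prints a renaming without braces (old => new) when the old and new paths share no common component, a form the unfold expression above leaves untouched. The following sketch is a hypothetical variant, named parse_stat_filename, which handles both forms. It is not the function used by this website, and it assumes file names contain neither "|" nor " => ".

```shell
# Hypothetical variant of parse_filename, also handling brace-free renames.
parse_stat_filename () {
  local line="${1}"
  local name

  # Keep what precedes the "|" column separator, without padding.
  name="$(printf '%s\n' "${line}" | sed -e 's/^ *//' -e 's/ *|.*$//')"

  # First unfold prefix/{old => new}/suffix, then a bare old => new.
  printf '%s\n' "${name}" \
    | sed -e 's/\(.*\){\(.*\) => \(.*\)}/\1\3/' -e 's/^.* => //'
}
```

For instance, site/{posts => articles}/Foo.org is resolved to site/articles/Foo.org, and old.org => new.org to new.org.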

The next step is to process the logs to generate the expected JSON. We have to deal with the fact that JSON does not allow the last item of an array to be followed by ",". Besides, we also want to indicate which commit is responsible for the creation of the file. To do that, we use two variables: idx and last_entry. When idx is equal to 0, we know we are looking at the latest commit. When idx is equal to last_entry, we know we are looking at the oldest commit for that file.

function generate_json () {
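  local input="${1}"
  local logs

  # Assign separately from `local`: otherwise, the exit status we test
  # is the one of `local` itself, which is always 0.
  if ! logs="$(gitlog "${input}")"; then
      exit 1
  fi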

  let "idx=0"
  let "last_entry=$(echo "${logs}" | wc -l) / 8"

  local subject=""
  local abbr_hash=""
  local hash=""
  local date=""
  local file=""
  local created="true"
  local modified="false"

  echo -n "{"
  echo -n "\"file\": \"${input}\""
  echo -n ",\"history\": ["

  while read -r subject; do
    read -r abbr_hash
    read -r hash
    read -r date
    read -r # empty line
    read -r file
    read -r # short log
    read -r # empty line

    if [ ${idx} -ne 0 ]; then
      echo -n ","
    fi

    if [ ${idx} -eq ${last_entry} ]; then
      created="true"
      modified="false"
    else
      created="false"
      modified="true"
    fi

    output_json_entry "${subject}" \
                      "${abbr_hash}" \
                      "${hash}" \
                      "${date}" \
                      "$(parse_filename "${file}")" \
                      "${created}" \
                      "${modified}"

    let idx++
  done < <(echo "${logs}")

  echo -n "]}"
}
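
The computation of last_entry relies on integer division, which may deserve a short illustration. Since the command substitution strips the trailing empty line, a history of N commits spans 8N - 1 lines, and dividing by 8 yields N - 1, that is, the index of the oldest commit. The helper last_entry_index below is hypothetical and isolates this computation.

```shell
# Hypothetical helper: index of the oldest commit in logs made of
# 8-line records whose very last empty line has been stripped.
last_entry_index () {
  local logs="${1}"
  local lines

  lines="$(printf '%s\n' "${logs}" | wc -l)"

  echo $(( lines / 8 ))
}
```

With a single commit, the logs span 7 lines and the result is 0: the first entry processed by the loop is then correctly marked as the creation of the file.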

Generating the JSON object for a given commit is as simple as

function output_json_entry () {
  local subject="${1}"
  local abbr_hash="${2}"
  local hash="${3}"
  local date="${4}"
  local file="${5}"
  local created="${6}"
  local modified="${7}"

  echo -n "{\"subject\": \"${subject}\""
  echo -n ",\"created\":${created}"
  echo -n ",\"modified\":${modified}"
  echo -n ",\"abbr_hash\":\"${abbr_hash}\""
  echo -n ",\"hash\":\"${hash}\""
  echo -n ",\"date\":\"${date}\""
  echo -n ",\"filename\":\"${file}\""
  echo -n "}"
}
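
One caveat of this function is that the subject is interpolated verbatim, so a commit message containing a double quote or a backslash yields invalid JSON. A minimal sketch of an escaping helper follows. The function json_escape is hypothetical and is not part of the script used by this website; it only deals with backslashes and double quotes, not with control characters.

```shell
# Hypothetical helper: escape a string for use inside a JSON string.
json_escape () {
  local s="${1}"

  s=${s//\\/\\\\}  # backslashes first, so that we do not escape twice
  s=${s//\"/\\\"}  # then double quotes

  printf '%s' "${s}"
}
```

It could be applied to the subject before printing it, for instance with "$(json_escape "${subject}")".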

And we are done! We can safely call the main function to generate our revisions table.

main "$(cat)" "${1}"

2.8 Rendering Equations Offline

2.8.1 Users instructions

Inline equations written in the DOM under the class imath and using the LaTeX syntax can be rendered once and for all by soupault. For instance, <span class="imath">\LaTeX</span> is rendered as the LaTeX logo, as expected.

Using this widget requires being able to inject raw HTML in input files.

2.8.2 Implementation

We will use KaTeX to render equations offline. KaTeX is unlikely to be available on most systems, but it is distributed through npm, so we can define a minimal package.json file to fetch it automatically.

{
  "private": true,
  "devDependencies": {
    "katex": "^0.11.1"
  }
}

We introduce a Makefile recipe to call npm install. This command produces a file called package-lock.json that we add to CONFIGURE to ensure KaTeX will be available when soupault is called.

If Soupault.org has been modified since the last generation, Babel will generate package.json again. However, if the modifications of Soupault.org do not concern package.json, then npm install will not modify package-lock.json and its “last modified” time will not be updated. This means that the next time make is run, it will replay this recipe again. As a consequence, we systematically touch package-lock.json to satisfy make.

package-lock.json : package.json
	@cleopatra echo "Fetching" "npm packages"
	@npm install &>> build.log
	@touch $@

CONFIGURE += package-lock.json node_modules/
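
The role of touch in the recipe above can be illustrated with a small Bash sketch of the timestamp comparison make relies on. The function needs_rebuild is hypothetical, and it simplifies the logic of make to a single target and a single prerequisite.

```shell
# Hypothetical helper: succeed if the target is missing, or older than
# its prerequisite (a simplification of what make checks).
needs_rebuild () {
  local target="${1}"
  local prerequisite="${2}"

  [ ! -e "${target}" ] || [ "${prerequisite}" -nt "${target}" ]
}
```

As long as package.json is more recent than package-lock.json, needs_rebuild package-lock.json package.json succeeds and the recipe is replayed. Touching package-lock.json at the end of the recipe makes it the most recent of the two, so the next run of make skips it.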

Once installed and available, KaTeX is really simple to use. The following script reads (synchronously!) the standard input, renders it using KaTeX and outputs the result to the standard output.

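var katex = require("katex");
var fs = require("fs");

// Read the whole standard input (file descriptor 0) synchronously.
var input = fs.readFileSync(0, "utf8");
// Display mode is requested by defining the DISPLAY environment
// variable (beware: X11 sessions define DISPLAY too).
var displayMode = process.env.DISPLAY !== undefined;

var html = katex.renderToString(input, {
    throwOnError : false,
    displayMode : displayMode
});

console.log(html);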

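We reuse once again the preprocess_element widget. For inline equations, the selector is .imath (i stands for inline in this context), and we replace the previous content with the result of our script. Display equations use the .dmath selector, and the same script is called with the DISPLAY environment variable set.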

[widgets.inline-math]
widget = "preprocess_element"
selector = ".imath"
command = "node scripts/katex.js"
action = "replace_content"

[widgets.display-math]
widget = "preprocess_element"
selector = ".dmath"
command = "DISPLAY=1 node scripts/katex.js"
action = "replace_content"

The KaTeX font is bigger than the serif font used for this website, so we reduce it a bit with a dedicated SASS rule.

.imath, .dmath
  font-size : smaller

.dmath
  text-align : center

3 cleopatra Generation Process Definition

We introduce the soupault generation process, obviously based on the soupault HTML processor. The structure of a cleopatra generation process is always the same.

<<stages>>
<<dependencies>>
<<ad-hoc-cmds>>

In the rest of this section, we define these three components.

3.1 Build Stages

From the perspective of cleopatra, it is a rather simple component, since the build stage is simply a call to soupault, whose outputs are located in a single (configurable) directory.

soupault-build :
	@cleopatra echo Running  soupault
	@soupault

ARTIFACTS += build/

3.2 Dependencies

Most of the generation processes (if not all of them) need to declare themselves as a prerequisite for soupault-build. If they do not, they will likely be executed after soupault is called.

This file defines an auxiliary SASS sheet that needs to be declared as a dependency of the build stage of the theme generation process.

Finally, the offline rendering of equations requires KaTeX to be available, so we include the katex.mk file, and make package-lock.json (the proof that npm install has been executed) a prerequisite of soupault-build.

theme-build : site/style/plugins.sass
include katex.mk
soupault-build : package-lock.json

3.3 Ad-hoc Commands

Finally, this generation process introduces a dedicated (PHONY) command to start an HTTP server in order to navigate the generated website from a browser.

serve :
	@echo "   start  a python server"
	@cd build; python -m http.server 2>/dev/null

.PHONY : serve

This command does not assume anything about the current state of generation of the project. In particular, it does not check whether or not the <<build-dir>> directory exists. The responsibility to use make serve in a good setting lies with final users.