soupault

We use soupault to build this website. soupault is an awesome free software project, with a unique approach to static website generation. You should definitely check out their website!

Installation §

We install soupault in a local opam switch. We use a witness file _opam/init to determine whether or not our switch has already been created during a previous invocation of cleopatra.

OCAML_VERSION := 4.11.2
OCAML := ocaml-base-compiler.${OCAML_VERSION}

CONFIGURE += _opam rss.json
ARTIFACTS += out

soupault-prebuild : _opam/init
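The witness-file idiom can be sketched in shell as follows. This only illustrates the idea and is not the actual cleopatra rule: the ensure_initialized helper and the temporary directory are hypothetical, and the expensive step (creating the switch) is reduced to a comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: run a one-time setup
# unless the witness file already exists.
ensure_initialized () {
  local witness="${1}"

  if [ -f "${witness}" ]; then
    echo "skipped"
    return 0
  fi

  # The expensive, one-time setup (e.g. creating the switch) would go here.
  mkdir -p "$(dirname "${witness}")"
  touch "${witness}"
  echo "initialized"
}

workdir="$(mktemp -d)"
ensure_initialized "${workdir}/_opam/init"   # first call performs the setup
ensure_initialized "${workdir}/_opam/init"   # second call finds the witness
rm -r "${workdir}"
```

The witness file is touched only after the setup has run, so an interrupted setup is retried on the next invocation.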

Using soupault is as simple as calling it, without any particular command-line arguments.

soupault-build : package-lock.json style.min.css
	@cleopatra echo "Executing" "soupault"
	@soupault

We now describe our configuration file for soupault.

Configuration §

Global Settings §

The options of the [settings] section of a soupault configuration are often self-explanatory, and we do not spend much time detailing them.

[settings]
strict = true
site_dir = "site"
build_dir = "out/~lthms"
doctype = "<!DOCTYPE html>"
clean_urls = false
generator_mode = true
complete_page_selector = "html"
default_content_selector = "main"
page_file_extensions = ["html"]
ignore_extensions = [
  "v", "vo", "vok", "vos", "glob",
  "html~", "org"
]
default_template_file = "templates/main.html"
pretty_print_html = false

Setting Page Title §

We use the “page title” widget to set the title of the webpage based on the first (and hopefully the only) <h1> tag of the page.

[widgets.page-title]
widget = "title"
selector = "h1"
default = "~lthms"
prepend = "~lthms: "

Acknowledging soupault §

When creating a new soupault project (using soupault --init), the default configuration file suggests advertising the use of soupault. Rather than hard-coding the version of soupault in use (which is error-prone), we determine it with the following script.

soupault --version | head -n 1 | tr -d '\n'

The configuration of the widget, initially provided by soupault, becomes less subject to obsolescence (that is, as long as soupault does not change the output of its --version option).

[widgets.generator-meta]
widget = "insert_html"
html = """<meta name="generator" content="soupault 2.5.0">"""
selector = "head"
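To see what the pipeline keeps, it can be run on a stand-in for the output of soupault --version. The sample text is an assumption: its first line is modelled on the generated widget above, and the second line is a placeholder, so this is a sketch rather than a transcript.

```shell
#!/usr/bin/env bash
# Stand-in for `soupault --version`; the exact output is assumed.
fake_version_output () {
  printf 'soupault 2.5.0\n(further lines, ignored)\n'
}

# Same pipeline as above: keep the first line, drop the trailing newline.
version="$(fake_version_output | head -n 1 | tr -d '\n')"
echo "content=\"${version}\""
```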

Prefixing Internal URLs §

On the one hand, internal links can be absolute, meaning they start with a leading /, and are therefore relative to the website root. On the other hand, websites (especially static websites) can be placed in a larger context. For instance, my personal website lives inside the ~lthms directory of the soap.coffee domain. (In my experience hosting webapps and websites, this set-up is way harder to get right than I initially expect.)

The purpose of this plugin is to rewrite internal URLs which are relative to the root, in order to properly prefix them.

From a high-level perspective, the plugin structure is the following.

First, we validate the widget configuration.

prefix_url = config["prefix_url"]

if not prefix_url then
  Plugin.fail("Missing mandatory field: `prefix_url'")
end

if not Regex.match(prefix_url, "^/(.*)") then
  prefix_url = "/" .. prefix_url
end

if not Regex.match(prefix_url, "(.*)/$") then
  prefix_url = prefix_url .. "/"
end

Then, we propose a generic function to enumerate tags and rewrite a given attribute of theirs when it holds a URL relative to the website root.

function prefix_urls (links, attr, prefix_url)
  index, link = next(links)

  while index do
    href = HTML.get_attribute(link, attr)

    if href then
      if Regex.match(href, "^/") then
        href = Regex.replace(href, "^/*", "")
        href = prefix_url .. href
      end

      HTML.set_attribute(link, attr, href)
    end
    index, link = next(links, index)
  end
end

Finally, we use this generic function for relevant tags.

prefix_urls(HTML.select(page, "a"), "href", prefix_url)
prefix_urls(HTML.select(page, "link"), "href", prefix_url)
prefix_urls(HTML.select(page, "img"), "src", prefix_url)
prefix_urls(HTML.select(page, "script"), "src", prefix_url)
prefix_urls(HTML.select(page, "use"), "href", prefix_url)
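The effect of the plugin on individual values can be checked with a small shell sketch that mirrors the Lua logic above. The normalize_prefix and rewrite_url helpers are hypothetical names introduced for this illustration; they are not part of the plugin.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only.

# Ensure the prefix starts and ends with a slash.
normalize_prefix () {
  local prefix="${1}"
  [[ "${prefix}" == /* ]] || prefix="/${prefix}"
  [[ "${prefix}" == */ ]] || prefix="${prefix}/"
  printf '%s\n' "${prefix}"
}

# Prefix URLs relative to the website root, leave the others untouched.
rewrite_url () {
  local prefix="${1}"
  local url="${2}"

  if [[ "${url}" == /* ]]; then
    # Strip every leading slash, then prepend the prefix.
    url="${prefix}$(printf '%s' "${url}" | sed -e 's|^/*||')"
  fi
  printf '%s\n' "${url}"
}

prefix="$(normalize_prefix '~lthms')"
rewrite_url "${prefix}" '/img/icons.svg'
rewrite_url "${prefix}" 'https://soap.coffee'
```

With the prefix used on this website, the first call yields /~lthms/img/icons.svg while the second URL is left as is.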

Again, configuring soupault to use this plugin is relatively straightforward.

[widgets.urls-rewriting]
widget = "urls-rewriting"
prefix_url = "~lthms"
after = "mark-external-urls"

Marking External Links §

function mark(name)
  return '<span class="icon"><svg><use href="/img/icons.svg#'
         .. name ..
         '"></use></svg></span>'
end

links = HTML.select(page, "a")

index, link = next(links)

while index do
  href = HTML.get_attribute(link, "href")

  if href then
    if Regex.match(href, "^https?://github.com") then
      icon = HTML.parse(mark("github"))
      HTML.append_child(link, icon)
    elseif Regex.match(href, "^https?://") then
      icon = HTML.parse(mark("external-link"))
      HTML.append_child(link, icon)
    end
  end

  index, link = next(links, index)
end

[widgets.mark-external-urls]
after = "generate-history"
widget = "external-urls"

Generating a Table of Contents §

The toc widget generates a table of contents for HTML files which contain a node matching a given selector (in the case of this document, #generate-toc).

[widgets.table-of-contents]
widget = "toc"
selector = "#generate-toc"
action = "replace_content"
valid_html = true
min_level = 2
max_level = 3
numbered_list = false
heading_links = true
heading_link_text = " §"
heading_links_append = true
heading_link_class = "anchor-link"

[widgets.append-toc-title]
widget = "insert_html"
selector = "#generate-toc"
action = "prepend_child"
html = '<h2>Table of Contents</h2>'
after = "table-of-contents"

Generating Per-File Revisions Tables §

User Instructions

This widget generates a so-called “revisions table” for the file whose name is contained in the DOM element of id history, based on its git history. Paths should be relative to the directory from which you start the build process (typically, the root of your repository). The revisions table notably provides hyperlinks to a git webview for each commit.

For instance, consider the following HTML snippet.

<div id="history">
  site/posts/FooBar.org
</div>

This plugin will replace this <div> with the revisions table of site/posts/FooBar.org.

Customization

The base URL of the git webview for the document you are currently reading is https://labs.soap.coffee/soap.coffee/lthms.git.

The template used to generate the revision table is the following.

<details id="history">
  <summary>Revisions</summary>
  <p>
    This revisions table has been automatically generated
    from <a href="https://labs.soap.coffee/soap.coffee/lthms.git">the
    <code>git</code> history of this website repository</a>, and the
    change descriptions may not always be as useful as they should.
  </p>

  <p>
    You can consult the source of this file in its current version
    <a href="https://labs.soap.coffee/soap.coffee/lthms.git/tree/{{file}}">here</a>.
  </p>

  <table class="fullwidth">
  {{#history}}
  <tr>
    <td class="date"
{{#created}}
        id="created-at"
{{/created}}
{{#modified}}
        id="modified-at"
{{/modified}}
        >{{date}}</td>
    <td class="subject">{{subject}}</td>
    <td class="commit">
      <a href="https://labs.soap.coffee/soap.coffee/lthms.git/commit/{{filename}}/?id={{hash}}">{{abbr_hash}}</a>
    </td>
  </tr>
  {{/history}}
  </table>
</details>

Implementation

We use the built-in preprocess_element widget to implement this plugin, which means we need a script which gets its input from the standard input, and echoes its output to the standard output.

[widgets.generate-history]
widget = "preprocess_element"
selector = "#history"
command = 'scripts/history.sh templates/history.html'
action = "replace_element"

This plugin proceeds as follows:

  1. Using an ad-hoc script, it generates a JSON document containing, for each revision:
    • The subject, date, hash, and abbreviated hash of the related commit
    • The name of the file at the time of this commit
  2. This JSON is passed to a mustache engine (haskell-mustache) with a proper template
  3. The content of the selected DOM element is replaced with the output of haskell-mustache

This translates into Bash as follows.

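function main () {
  local file="${1}"
  local template="${2}"
  local tmp_file

  tmp_file="$(mktemp)"
  # ${file} is deliberately left unquoted: word splitting strips the
  # whitespace surrounding the path read from the DOM element.
  generate_json ${file} > "${tmp_file}"
  haskell-mustache "${template}" "${tmp_file}"
  rm "${tmp_file}"
}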

Generating the expected JSON is therefore as simple as:

  • Fetching the logs
  • Reading the logs 8 lines at a time, and parsing the filename from the 6th line
  • Outputting the JSON

We will use git to get the information we need. By default, git subcommands use a pager when their output is likely to be long. This typically includes git-log. To disable this behavior, git exposes the --no-pager option. Besides, we also need --follow and --stat to deal with file renaming. Without --follow, git-log stops when the file first appears in the repository, even if this “creation” is actually a renaming; --stat makes the name of the file at the time of each commit appear in the output. Therefore, the git command line we use to collect our history is the following.

function gitlog () {
  local file="${1}"
  git --no-pager log \
      --follow \
      --stat=10000 \
      --pretty=format:'%s%n%h%n%H%n%cs%n' \
      "${file}"
}

This function will generate a sequence of 8 lines containing all the relevant information we are looking for, for each commit, namely:

  • Subject
  • Abbreviated hash
  • Full hash
  • Date
  • Empty line
  • Change summary
  • Shortlog
  • Empty line

For instance, the gitlog function will output the following lines for the last commit of this very file:

Use the built-in plugin to insert the “Table of Contents” title
e372272
e372272dc738d8079e6ed61ec37f9abb58a6a45f
2021-03-28

 site/cleopatra/soupault.org | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
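Reading such records boils down to a fixed sequence of read calls. The sketch below is not the script used for this website: summarize_records is a hypothetical helper, the first record is the sample above, and the second record is made up for illustration.

```shell
#!/usr/bin/env bash
# Print "<date> <abbreviated hash> <file>" for each 8-line record on stdin.
summarize_records () {
  local subject abbr_hash hash date stat

  while read -r subject; do
    read -r abbr_hash
    read -r hash
    read -r date
    read -r _           # empty line
    read -r stat        # "<file> | <changes>"; read strips the leading space
    read -r _           # "N file changed, ..."
    read -r _ || true   # empty line, missing after the last record
    echo "${date} ${abbr_hash} ${stat%% |*}"
  done
}

# First record: the sample above. Second record: made up for illustration.
summarize_records <<'EOF'
Use the built-in plugin to insert the “Table of Contents” title
e372272
e372272dc738d8079e6ed61ec37f9abb58a6a45f
2021-03-28

 site/cleopatra/soupault.org | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Hypothetical initial commit
0000000
0000000000000000000000000000000000000000
2021-01-01

 site/cleopatra/soupault.org | 10 ++++++++++
 1 file changed, 10 insertions(+)
EOF
```

The last record is one line shorter than the others, which is why the final read is allowed to fail.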

Among other things, the 6th line contains the filename. We need to extract it, and we do that with sed. In case of file renaming, we need to parse something of the form path/to/{old => new}.

function parse_filename () {
  local line="${1}"
  local shrink='s/ *\(.*\) \+|.*/\1/'
  local unfold='s/\(.*\){\(.*\) => \(.*\)}/\1\3/'

  echo ${line} | sed -e "${shrink}" | sed -e "${unfold}"
}
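As a quick check, the same two sed expressions can be applied to a rename line and to a regular one. The paths below are made up, and demo_parse is a hypothetical wrapper written so that the sketch runs on its own; note that \+ is a GNU sed extension.

```shell
#!/usr/bin/env bash
# Same two sed expressions as in parse_filename.
shrink='s/ *\(.*\) \+|.*/\1/'
unfold='s/\(.*\){\(.*\) => \(.*\)}/\1\3/'

# Hypothetical wrapper, for illustration only.
demo_parse () {
  printf '%s\n' "${1}" | sed -e "${shrink}" -e "${unfold}"
}

# A renaming (hypothetical paths), then a regular change.
demo_parse ' site/{posts => cleopatra}/soupault.org | 5 +++--'
demo_parse ' site/cleopatra/soupault.org | 5 +++--'
```

With GNU sed, both calls should print site/cleopatra/soupault.org: the first expression drops the change summary, the second keeps the new name of a renamed path component.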

The next step is to process the logs to generate the expected JSON. We have to deal with the fact that JSON does not allow the last item of an array to be concluded by ",". Besides, we also want to indicate which commit is responsible for the creation of the file. To do that, we use two variables: idx and last_entry. When idx is equal to 0, we know it is the latest commit. When idx is equal to last_entry, we know we are looking at the oldest commit for that file.

function generate_json () {
  local input="${1}"
  local logs

  # Declare and assign separately: `local` would otherwise mask
  # the exit status of gitlog.
  if ! logs="$(gitlog "${input}")"; then
      exit 1
  fi

  let "idx=0"
  let "last_entry=$(echo "${logs}" | wc -l) / 8"

  local subject=""
  local abbr_hash=""
  local hash=""
  local date=""
  local file=""
  local created="true"
  local modified="false"

  echo -n "{"
  echo -n "\"file\": \"${input}\""
  echo -n ",\"history\": ["

  while read -r subject; do
    read -r abbr_hash
    read -r hash
    read -r date
    read -r # empty line
    read -r file
    read -r # short log
    read -r # empty line

    if [ ${idx} -ne 0 ]; then
      echo -n ","
    fi

    if [ ${idx} -eq ${last_entry} ]; then
      created="true"
      modified="false"
    else
      created="false"
      modified="true"
    fi

    output_json_entry "${subject}" \
                      "${abbr_hash}" \
                      "${hash}" \
                      "${date}" \
                      "$(parse_filename "${file}")" \
                      "${created}" \
                      "${modified}"

    let idx++
  done < <(echo "${logs}")

  echo -n "]}"
}
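Two details of this function can be isolated in a short sketch: the integer division that yields the index of the oldest commit, and the comma written before every entry but the first. Both helpers are hypothetical, and the line count assumes that a log of N commits holds 8N - 1 lines, since the last record has no trailing empty line.

```shell
#!/usr/bin/env bash
# Index of the oldest record in a log of `lines` lines, as computed
# by generate_json (integer division).
last_entry_index () {
  local lines="${1}"
  echo $(( lines / 8 ))
}

# Emit a JSON array of strings, writing "," before every item but the first.
json_string_array () {
  local idx=0
  local item

  printf '['
  for item in "$@"; do
    if [ "${idx}" -ne 0 ]; then
      printf ','
    fi
    printf '"%s"' "${item}"
    idx=$(( idx + 1 ))
  done
  printf ']\n'
}

last_entry_index 23        # 3 commits: 2 * 8 + 7 lines
json_string_array a b c
```

For three commits the index is 2, which is the value idx holds when the loop reaches the third and oldest record.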

Generating the JSON object for a given commit is as simple as

function output_json_entry () {
  local subject="${1}"
  local abbr_hash="${2}"
  local hash="${3}"
  local date="${4}"
  local file="${5}"
  local created="${6}"
  local modified="${7}"

  echo -n "{\"subject\": \"${subject}\""
  echo -n ",\"created\":${created}"
  echo -n ",\"modified\":${modified}"
  echo -n ",\"abbr_hash\":\"${abbr_hash}\""
  echo -n ",\"hash\":\"${hash}\""
  echo -n ",\"date\":\"${date}\""
  echo -n ",\"filename\":\"${file}\""
  echo -n "}"
}
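One caveat: the commit subject is written verbatim, so a subject containing a double quote or a backslash would produce invalid JSON. A possible remedy, sketched here under the assumption that only these two characters need care (control characters are not handled), is a hypothetical json_escape helper applied to the subject before printing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): escape
# backslashes and double quotes so that a string fits in a JSON
# string literal.
json_escape () {
  printf '%s\n' "${1}" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

json_escape 'Rename "foo" to "bar"'
```

Backslashes are escaped first, so that the backslashes introduced for the quotes are not escaped a second time.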

And we are done! We can safely call the main function to generate our revisions table.

main "$(cat)" "${1}"
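soupault hands the content of the selected element to the script on its standard input, so with the HTML snippet shown earlier the path presumably arrives surrounded by a newline and some indentation. main copes with this because it expands ${file} unquoted. A more explicit alternative, sketched below with a hypothetical trim helper, is to strip the surrounding whitespace before using the path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove leading and trailing whitespace.
trim () {
  local s="${1}"

  s="${s#"${s%%[![:space:]]*}"}"   # drop the leading whitespace
  s="${s%"${s##*[![:space:]]}"}"   # drop the trailing whitespace
  printf '%s\n' "${s}"
}

# Content of the <div> of the earlier snippet, as the script would read it.
content=$'\n  site/posts/FooBar.org\n'
trim "${content}"
```

Unlike an unquoted expansion, this keeps paths containing spaces in one piece.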

Rendering Equations Offline §

User Instructions

Inline equations written in the DOM under the class imath and using the LaTeX syntax can be rendered once and for all by soupault. For instance, <span class="imath">\LaTeX</span> is rendered as expected.

Using this widget requires being able to inject raw HTML in input files.

Implementation

var katex = require("katex");
var fs = require("fs");
var input = fs.readFileSync(0);
var displayMode = process.env.DISPLAY != undefined;

var html = katex.renderToString(String.raw`${input}`, {
    throwOnError : false,
    displayMode : displayMode
});

console.log(html)

We reuse once again the preprocess_element widget. The selector is .imath (i stands for inline in this context), and we replace the previous content with the result of our script.

[widgets.inline-math]
widget = "preprocess_element"
selector = ".imath"
command = "node scripts/render-equations.js"
action = "replace_content"

[widgets.display-math]
widget = "preprocess_element"
selector = ".dmath"
command = "DISPLAY=1 node scripts/render-equations.js"
action = "replace_content"
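The script tells the two widgets apart by checking whether the DISPLAY environment variable is set, whatever its value. The shell sketch below mirrors that test with a hypothetical render_mode function; it does not call KaTeX.

```shell
#!/usr/bin/env bash
# Mirrors `process.env.DISPLAY != undefined`: the mode depends on
# whether DISPLAY is set at all, not on its value.
render_mode () {
  if [ -n "${DISPLAY+set}" ]; then
    echo "display"
  else
    echo "inline"
  fi
}

( unset DISPLAY; render_mode )   # variable absent
DISPLAY=1 render_mode            # variable set for this call only
```

Graphical sessions commonly export DISPLAY for X11, so on such a machine the inline widget may also see the variable; a less common variable name would avoid the ambiguity.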

RSS Feed §

[index]
index = true
dump_json = "rss.json"
extract_after_widgets = ["urls-rewriting"]

[index.fields]
title = {
  selector = ["h1"]
}

modified-at = {
  selector = ["#modified-at"]
}

created-at = {
  selector = ["#created-at"]
}

Series Navigation §

function get_title_from_path (path)
   if Sys.is_file(path) then
      local content_raw = Sys.read_file(path)
      local content_dom = HTML.parse(content_raw)
      local title = HTML.select_one(content_dom, "h1")

      if title then
         return String.trim(HTML.inner_html(title))
      else
         Plugin.fail(path .. ' has no <h1> tag')
      end
   else
      Plugin.fail(path .. ' is not a file')
   end
end

function generate_nav_item_from_title (title, url, template)
    local env = {}
    env["url"] = url
    env["title"] = title
    local new_content = String.render_template(template, env)
    return HTML.parse(new_content)
end

function generate_nav_items (cwd, cls, template)
  local elements = HTML.select(page, cls)

  local i = 1
  while elements[i] do
    local element = elements[i]
    local url = HTML.strip_tags(element)
    local path = Sys.join_path(cwd, url)
    local title_str = get_title_from_path(path)

    HTML.replace_content(
      element,
      generate_nav_item_from_title(title_str, url, template)
    )

    i = i + 1
  end
end

cwd = Sys.dirname(page_file)

home_template = 'This article is part of the series “<a href="{{ url }}">{{ title }}</a>.”'
nav_template = '<a href="{{ url }}">{{ title }}</a>'

generate_nav_items(cwd, ".series", home_template)
generate_nav_items(cwd, ".series-prev", nav_template)
generate_nav_items(cwd, ".series-next", nav_template)

[widgets.series]
widget = "series"

Injecting Minified CSS §

style = HTML.select_one(page, "style")

if style then
  css = HTML.create_text(Sys.read_file("style.min.css"))
  HTML.replace_content(style, css)
end

[widgets.css]
widget = "css"

Cleaning-up §

function remove_if_empty(html)
   if String.trim(HTML.inner_html(html)) == "" then
      HTML.delete(html)
   end
end

function remove_all_if_empty(cls)
   local elements = HTML.select(page, cls)

   local i = 1
   while elements[i] do
      local element = elements[i]
      remove_if_empty(element)
      i = i + 1
   end
end

remove_all_if_empty("p") -- introduced by org-mode
remove_all_if_empty("div.code") -- introduced by coqdoc

[widgets.clean-up]
widget = "clean-up"