Extensible Records in OCaml Using dmap
dmap
is a library to create and
manipulate heterogeneous maps. It features a very straightforward API which
leverages the common trick of tagging the type of the keys with a parameter
specifying the types of the associated valuesThis article assumes readers are familiar with GADTs.
. That is, given 'a key
the type of keys of a heterogeneous map t
, then the
value associated to a key k
of type bool key
is expected to
be of type bool
.
In this write-up, we show how it is possible to use dmap
to implement
extensible records (that is, records which can be extended with new fields
after they have been defined) in OCaml. I have also published on GitHub a
repository containing the implementation presented in this
article . While dmap
is
far from being the only available solution for this use caseFor heterogeneous maps, gmap
is
apparently a popular alternative, and so is
hmap
. For extensible records,
orec
looks like a very interesting
library I should probably take a long look at.
, it is the
one I happened to use for implementing a “plugin system” in a library where
each plugin can manipulate a shared state in a type-safe way.
Encoding Extensible Records
The (simplified) module type of a heterogeneous map built using dmap
is as
follows.
module type S = sig
type t
type 'a key
type binding = Binding : 'a key * 'a -> binding
val empty : t
val add : 'a key -> 'a -> t -> t
val find_opt : 'a key -> t -> 'a option
val fold_left : ('a -> binding -> a) -> 'a -> t -> 'a
end
The use case for heterogeneous maps I’ve been interested in recently is
encoding records. More precisely, constructors of keys encode fields containing
collections of values. In such approach, a constructor Foo : int key
encodes a field of type int option
, while a constructor Bar : string -> bool key
encodes a field of type bool StringMap.t
. That is, the
heterogeneous map HMap.t
, such as
type _ key = Foo : int key | Bar : string -> bool key
module HMap : S with type 'a key = 'a key
can be used to encode values of a record type
type t = {
foo : int option;
bar : bool StringMap.t;
}
as the following conversion functions demonstrates.
let of_record {foo; bar} =
let res = HMap.empty in
let res = HMap.add Foo foo res in
StringMap.fold_left
(fun res (key, value) -> HMap.add (Bar key) value res)
res
bar
let of_hmap map =
{
foo = HMap.find_opt Foo map;
bar = HMap.fold_left
(fun smap (Binding binding) ->
match binding with
| (Bar k, v) -> StringMap.add k v smap
| _ -> smap)
StringMap.empty
map;
}
As a consequence, to construct a heterogeneous map whose type of keys is extensible is a way to get extensible records in OCaml.
To that end, dmap
only asks for a function to compare two arbitrary keys.
More precisely, given the type
type _ key = ..
the challenge is to write the function compare
of type
val compare : 'a key -> 'b key -> ('a, 'b) Dmap.cmp
with Dmap.cmp
being definedThis definition of cmp
enforces that two keys can only be
equal if and only if the values associated to them are of the same type.
as
type (_, _) cmp =
| Lt : ('a, 'b) cmp
| Eq : ('a, 'a) cmp
| Gt : ('a, 'b) cmp
The Solution
To use extensible types in this kind of setting, the trick I am aware of is to
use a registration mechanismFor instance, this is what the tezos-error-monad
package is
using for being able to print the values of its extensible error
type .
.
The compare
function will work as follows.
- We assign an integer to each constructor of the type
key
. - We compare these integers when the arguments of
compare
are constructed from different constructors. - We implement one comparison function per constructors, to be called when the
two arguments of
compare
are the same.
The exercise is being made a bit more complicated than what the previous
paragraph suggests due to the fact that both key
and cmp
are GADTs. As a
consequence, the rest of the article will proceed as follows. First, we will
implement the overall logic of the compare
function implemented with
a registration mechanism, and only then will we tackle the OCaml compiler
errors GADTs so often bring.
Step 1: Forget About GADTs
To implement our registration mechanism, we rely on a good old reference to a
listIt is a good idea to hide this implementation “details”
behind a nice API, using a mli
file.
. The position of a registered object within the list will
determine the integer attributed to it.
let registered_keys = ref []
let register_key key =
registered_keys := key :: !registered_keys
In this list, we store a collection of first-class modules which provide
us everything we need to compare two 'a key
values obtained from the
same constructors.
module type KEY = sig
type t
val proj : 'a key -> t option
val compare : t -> t -> int
end
The type t
of the module type KEY
is the type of one
constructor’s arguments. The simplest definition of t
is
unit
, which means the field encoded with the related constructor
points to a singleton value (as foo
did in the previous section).
However, it is also possible to have keys’ constructors taking arguments, which
would translate into a field pointing to collections of value. This was the case
for bar
, which was pointing to a mapping of strings to booleans.
The compare
function will consist in iterating over the first-class modules
registered in registered_keys
, in order to determine to which ones the two
arguments belong to.
This means that considering a call compare left right
, for each first-class module
of registered_keys
, either
- both
left
andright
have not yet been associated to a first-class module, - or
left
has, but `right hasn’t - or
right
has, butleft
hasn’t - or both
left
and `right have been associated to a first-class module.
Only when the fourth stage has been reached can we provide the expected result
of compare
, whose type we have simplified in this section to return a int
instead of the GADT Dmap.cmp
.
To encode these stages, we introduce a dedicated accumulator type.
type ('a, 'b) acc =
| Init : 'a key * 'b key -> ('a, 'b) acc
| Compare_left_with : 'a key * int -> ('a, 'b) acc
| Compare_right_with : int * 'b key -> ('a, 'b) acc
| Res : int -> ('a, 'b) acc
This allows us to implement compare
Note the use of the Seq
module here. It is motivated by the fact that
List.fold_lefti
does not exist.
We need to know the position of the current first-class module we are
testing the keys against, hence the need for a fold_lefti
. Of course, we
could have added the current position to the acc
type, but the function
is already cumbersome enough, and using Seq.t
is mostly free.
.
let compare : type a b. a key -> b key -> int =
fun left right ->
List.to_seq !registered_keys
|> Seq.fold_lefti
(fun (acc : (a, b) acc) i (module K : KEY) ->
match acc with
| Init (left, right) -> (
match (K.proj left, K.proj right) with
| Some left, Some right -> Res (K.compare left right)
| Some _, None -> Compare_right_with (i, right)
| None, Some _ -> Compare_left_with (left, i)
| None, None -> acc)
| Compare_right_with (j, right) -> (
match K.proj right with
| Some _ -> Res (Int.compare j i)
| None -> acc)
| Compare_left_with (left, j) -> (
match K.proj left with
| Some _ -> Res (Int.compare i j)
| None -> acc)
| _ -> acc)
(Init (left, right))
|> function
| Res x -> x
| _ ->
raise
(Invalid_argument
"comparison with at least one unregistered key variant")
This function gives us the overall structure and logic of compare
, but of
course we are not done yet.
Step 2: Deal with GADTs
In practice and as hinted in the introduction of this section, dmap
is not
expected a comparison function of type 'a key -> 'b key -> ('a, 'b) Dmap.cmp
, not 'a key -> 'b key -> int
.
The first step is to update acc
accordingly.
type ('a, 'b) acc =
| Init : 'a key * 'b key -> ('a, 'b) acc
| Compare_left_with : 'a key * int -> ('a, 'b) acc
| Compare_right_with : int * 'b key -> ('a, 'b) acc
- | Res : int -> ('a, 'b) acc
+ | Res : ('a, 'b) Dmap.cmp -> ('a, 'b) acc
Then, we focus our attention to modifying compare
, by modifying its
type signature, and by updating its definition to make use of the new
acc
type.
-let compare : type a b. a key -> b key -> int =
+let compare : type a b. a key -> b key -> (a, b) Dmap.cmp =
fun left right ->
List.to_seq !registered_keys
|> Seq.fold_lefti
(fun (acc : (a, b) acc) i (module K : KEY) ->
match acc with
| Init (left, right) -> (
match (K.proj left, K.proj right) with
- | Some left, Some right -> Res (K.compare left right)
+ | Some left, Some right ->
+ let x = K.compare left right in
+ Res (if x = 0 then Eq else if x < 0 then Lt else Gt)
| Some _, None -> Compare_right_with (i, right)
| None, Some _ -> Compare_left_with (left, i)
| None, None -> acc)
| Compare_right_with (j, right) -> (
match K.proj right with
- | Some _ -> Res (Int.compare j i)
+ | Some _ ->
+ let x = Int.compare j i in
+ Res (if x < 0 then Lt else Gt)
| None -> acc)
| Compare_left_with (left, j) -> (
match K.proj left with
- | Some _ -> Res (Int.compare i j)
+ | Some _ ->
+ let x = Int.compare j i in
+ Res (if x < 0 then Lt else Gt)
| None -> acc)
| _ -> acc)
(Init (left, right))
At this point, we are remembered that the OCaml compiler is not that easy to
please when GADTs are involved. In particular, it complains for our use
of Eq
.
46 | Res (if x = 0 then Eq else if x < 0 then Lt else Gt)
^^
Error: This expression has type (a, a) Dmap.cmp
but an expression was expected of type (a, b) Dmap.cmp
Type a is not compatible with type b
This error message is the result of a tragedy in two acts:
- The
Eq
constructor ofcmp
is of type('a, 'a) cmp
. That is, only two keys associated with values of the same type can be equal. compare
takes two keys whose types can be different, and even ifproj
is expected to return aSome
only when both keys are constructed with the same constructor, nothing in its type enforces that invariant.
As such, OCaml is not convinced with our code alone that a = b
, and rejects
it.
One possible trick here is to rely on an equality witness typeThis is the trick to keep in mind when playing with GADTs, to a
point where eq
could arguably be added to the OCaml’s standard
library.
, and
refine proj
in order to convince OCaml our code is type-safe.
type (_, _) eq = Refl : ('a, 'a) eq
We then modify KEY
to have proj
not only return the
arguments of the constructor, but also a type equality witness to unify the
polymorphic argument of key
to the expected value type of said constructor.
module type KEY = sig
type t
+ type r
- val proj : 'a key -> t option
+ val proj : 'a key -> (t * ('a, r) eq) option
val compare : t -> t -> int
end
This is enough to convince OCaml that a = b
, because it gets a proof that a = K.r
and b = K.r
But only if you actually match Refl
explicitly, meaning the
pattern Some (left, _), Some (right, _)
would still lead to the same
error as not returning Refl
at all.
.
match (K.proj left, K.proj right) with
- | Some left, Some right ->
+ | Some (left, Refl), Some (right, Refl) ->
let x = K.compare left right in
Res (if x = 0 then Eq else if x < 0 then Lt else Gt)
| Some _, None -> Compare_right_with (i, right)
| None, Some _ -> Compare_left_with (left, i)
That’s it. compare
is done, compiles, and satisfies the expectations of
dmap
. We can instantiate its functor with our key
extensible
typeThis approach will lead every value of type Extensible_record.t
to share the same fields, which might not be you need.
One can imagine embedding the registered_keys
list of known keys inside
the extensible records values, along with a heterogeneous map. This
would allow each value to have its own dynamic set of fields.
.
module Extensible_record = Dmap.Make (struct
type 'a t = 'a key
let compare = compare
end)
Step 3: Profit
To wrap up this article, we can come back to our initial example of a record
with two fields foo
and bar
.
Here is how we can extend key
to have a Foo
constructor.
type _ key += Foo : int key
let () =
register_key
(module struct
type t = unit
type r = int
let proj : type a. a key -> (t * (a, r) eq) option = function
| Foo -> Some ((), Refl)
| _ -> None
let compare () () = 0
end)
Similarly, here is how we can register Bar
.
type _ key += Bar : string -> bool key
let () =
register_key
(module struct
type t = string
type r = bool
let proj : type a. a key -> (t * (a, r) eq) option = function
| Bar arg -> Some (arg, Refl)
| _ -> None
let compare = String.compare
end)
Once registered, these two fields/keys can be used to populate an
Extensible_record.t
value with values of the expected types.
let () =
let record = Extensible_record.empty in
let record = Extensible_record.add Foo 3 record in
let record = Extensible_record.add (Bar "foobar") true record in
assert (Extensible_record.find Foo record = 3) ;
assert (Extensible_record.find (Bar "foobar") record)
A nice happy consequence of this approach is that it allows us to register private fields only a given module can manipulate. Indeed, if this module does not expose the constructor of the key it relies on, then only it can interact with this part of the extensible record.