lthms' avatar, a hand drawing looking person, wearing a headset, close to a window on a raining night
Thomas Letan
lthms · he/him

Did you come across something which caught your attention? Don’t hesitate to shoot me an email in my public inbox.

 Published on

Patch Dependencies for Stacked Git

Every time I catch myself thinking about dependencies between changeset of a software project, the fascinating field of patch theories comes to my mind.

A “patch theory” usually refers to the mathematical foundation behind the data model of so-called Patch-based DVCS like Darcs and Pijul. More precisely, a patch theory is an encoding of the state of a repository, equipped with operations (gathered in so-called patches, not to be confused with GNU diff patches) one can do to this state. For instance, my rough understanding of Pijul’s patch theory is that a repository is an oriented graph of lines, and a patch is a set of operations onto this graph.

An interesting aspect of patch theory is that it requires a partial order for its patches, from which a Patch-based DVCS derives a dependency graph. In a nutshell, a patch PP depends on the patches which are responsible for the presence of the lines that PP modifies.

I have always found this idea of a dependency graph for the patches of a repository fascinating, because I though it would be a very valuable tool in the context of software development.

I wanted to slightly change the definition of what a patch dependency is, though. See, the partial order of Darcs and Pijul focus on syntactic dependencies: the relation between lines in text files. They need that to reconstruct these text files in the file system of their users. As a software developers writing these text files, I quickly realized these dependencies were of little interest to me, though. What I wanted to be able to express was that a feature introduced by a patch PP relied on a fix introduced by a patch PP'.

I have experimented with Darcs and Pijul quite a bit, with this idea stuck in the back of my mind. At the end of this journey, I convinced myselfI am not trying to convince you, per say. This is a very personal and subjective feedback, it does not mean someone else couldn't reach a different conclusion. (1) this beautiful idea I had simply could not scale, and (2) neither I nor our industry is ready to give up on the extensive ecosystem that has been built on top of git just yet. As a consequence, my interest in Patch-based DVCS decreased sensibly.

Until very recently, that is. I got reminded of the appeal of a dependency graph for changesets when I started adopted a Gitlab workflow centered around Stacked Git and smaller, sometimes interdependent MRs.

A manually curated graph dependency for a whole repository is not practical, but what about my local queue of patches, patiently waiting to be integrated into the upstream repository I am contributing too? Not only does it look like a more approachable task, it could make synchronizing my stacked MRs a lot easier.

The workflow I have in mind would proceed as follows.

Because what we want is semantic dependencies, not syntactic dependencies between patches, I really think it makes a lot of sense to completely delegate the dependencies declaration to the developerFurther versions of Stacked Git could explore computing the dependency graph automatically, similarly to what Git does. But I think that if Darcs and Pijul told us anything, it's that this computation is far from being trivial. . The very mundane example that convinced me is the CHANGELOG file any mature software project ends up maintaining. If the contribution guidelines require to modify the CHANGELOG file in the same commit as a feature is introduced, then the patches to two independent features will systematically conflict. This does not mean, from my patch queue perspective, I should be forced to pop the first commit before starting to work on the second one. It just means that when I call stg prepare, I can have to fix a conflict, but fixing Git conflicts is part of the job after allAnd we have tools to help us. I wonder to which extends git rerere could save the day in some cases, for instance. . If for some reasons solving a conflict proves to be too cumbersome, I can always acknowledge that, and declare a new dependency to the appropriate patch. It only means I and my reviewers will be constrained a bit more than expected when dealing with my stack of MRs.

I am under the impression that this model extends quite nicely the current way Stacked Git is working. To its core, it extends its data model to constraint a bit push and pop, and empowers developers to organize a bit its local mess.