What’s missing in Git v2.0.0

I recently blogged about the Git v2.0.0 release: what changed, and why you should care. Unfortunately the conclusion was that not much changed (other than the usual new features and bug fixes). In this post I will discuss what should have changed, and why.

What is needed

Fortunately, Git has had the Git User’s Survey in the past, so we know what users want.

  1. user-interface: 3.25
  2. documentation: 3.22
  3. tools (e.g. GUI): 3.01
  4. more features: 2.41
  5. portability: 2.34
  6. performance: 2.28
  7. community (mailing list): 1.70
  8. localization (translation): 1.65
  9. community (IRC): 1.65

Obviously, since user-interface and documentation are the areas that need the most improvement, that’s what Git v2.0.0 should have focused on, right?

History

I already mentioned this in the other post, but I’ll do it again.

First of all, Git has a long history of never breaking user expectations (other than the Git v1.6.0 fiasco, which replaced all the git-foo commands with ‘git foo’), and as such a lot of thought is devoted to ways to minimize changes in behavior, or even to avoid them completely. Perhaps too much care is devoted to this.

The preparation for Git v2.0.0 started more than three years ago with a mail from Junio C Hamano, asking developers to submit ideas for changes that normally would not happen because they break backwards compatibility; he invited us to think as if “we were writing Git from scratch”. This big release that would break backwards compatibility was going to be named “1.8.0”, and people started to submit ideas for this important release. Eventually too much time passed, the versioning scheme changed, v1.8.0 was released, and the changes proposed for v1.8.0 slipped into what is now v2.0.

Since no substantial changes in behavior had happened since v1.0, it followed that v2.0 was an important release, and a good opportunity to gather all the ideas about what needs to change in Git. However, seemingly out of nowhere, without any discussion or even a warning, the maintainer tagged v2.0.0-rc0, and therefore all the features that were not already merged couldn’t be merged for v2.0.0.

Thus v2.0.0 was destined to have a small list of changes, and that’s how it remained.

What could have changed

The following is a list of things that I argued should be part of Git v2.0.0.

git update

I wrote a whole post about the issue, but basically, ‘git pull‘ is broken for the most common use-case: updating the current branch.

This is a known issue that has been discussed over and over, and everyone agrees that it is indeed an issue, and something needs to be done to fix it.

There have been different proposals, but by far the most comprehensive and simple is to add a new ‘git update‘ command.

This way when you want to merge a pull request, you do ‘git pull‘, and when you just want to update the current branch, you do ‘git update‘, which by default would barf if there’s divergence between your local branch (e.g. ‘master’) and the remote one (e.g. ‘origin/master’), instead of doing a merge by default. This should substantially decrease the amount of “evil merges”: merges that happen by mistake, usually made by somebody who is not familiar with Git.

The patches are relatively new, but the command is simple, so there isn’t much danger of screwing things up.
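
To give a rough idea, here is how the proposed command would be used; this is a sketch based on the patches described above, not documented behavior:

% git update            # fast-forward only; barfs if 'master' and 'origin/master' have diverged
% git update --rebase   # rebase your local commits on top of 'origin/master'
% git update --merge    # merge, with the parents in the order of a topic-branch merge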

The publish tracking branch

I also wrote a blog post about this; basically Git’s support for triangular workflows is not the best.

A triangular workflow is when you pull from one location (e.g. the central repo), and push to another (e.g. your personal GitHub fork). If you are using upstream tracking branches (you should), you have to decide where to set your upstream: the central repo, or your personal one. Each choice gives you a different set of advantages, but you cannot have them all.

But with the publish tracking branch you can have all the advantages.

I’ve been cooking these patches for a long, long time. This is an essential feature for me, and the patches work perfectly.

Support for Mercurial and Bazaar

Support for Mercurial and Bazaar repositories has been cooking for a long time in the “contrib” area (you can both pull and push). At this point in time the code is production-ready, and it was already graduated and merged to be released in Git v2.1.

However, the maintainer suddenly changed his mind and decided it would be better to distribute them as third party tools. He didn’t give any valid reason and clearly didn’t think it through, but they are now separate.

The code is already widely used (git-remote-hg, git-remote-bzr), and could easily be merged.
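
For example, with git-remote-hg installed, cloning a Mercurial repository is just a matter of prefixing the URL with ‘hg::’ (the URL below is only an illustration):

% git clone hg::https://www.mercurial-scm.org/repo/hello
% cd hello
% git log --oneline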

Use “stage” instead of “index”

Everybody agrees that “index” is a horrible name for Git’s “staging area”, however, nobody has done much to fix the problem.

A first step is to replace the --cached and --index options with --staged and --no-work, which are much simpler to understand.

Another step is to add a ‘git stage‘ command that acts as a helper to work with the staging area: ‘git stage add‘, ‘git stage diff‘, ‘git stage reset‘, ‘git stage rm‘, ‘git stage edit‘, and so on.

The patches are very straight-forward.
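
As a sketch of how that would look in practice (mirroring the existing ‘git add’, ‘git diff --cached’, and ‘git reset’ behaviors):

% git stage add README       # same as 'git add README'
% git stage diff             # same as 'git diff --cached'
% git stage reset README     # same as 'git reset README'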

Default aliases

Virtually every version control system has default aliases (e.g. hg co, cvs ci, svn di, etc.), except Git.

Adding default aliases is very simple to do and only brings advantages. If you don’t like the default alias, you can override it.

Patches here.
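
In the meantime, you can define those aliases yourself; for example:

% git config --global alias.co checkout
% git config --global alias.ci commit
% git config --global alias.st status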

Shoulda coulda woulda

It would have been great if you could just do ‘git clone hg::mercurial-repo‘ without installing anything extra, if everybody could start using ‘git update‘ instead of ‘git pull‘, if you could do ‘git stage diff‘, or ‘git reset --stage‘. Also, if triangular workflows were properly supported.

Unfortunately that’s not the case, and Git v2.0.0 is already released, and there isn’t much to be excited about.

You might think “perhaps for Git v3.0” (which could happen in two years, or ten, who knows), but if the past is any indication of the future, it won’t happen, especially since I’ve given up on all these patches.

The fact of the matter is that in every release of Git, there is only one focus: performance. Despite the fact that it’s #6 in the list of concerns of users, Git developers work on this because that’s their area of expertise, because it’s fun for them, and because they get paid to do so. There are occasional new features, and a bit of portability now and then, but for the most part Windows support is neglected in Git, which is why the msysgit project was born.

The documentation will always remain cryptic, because for the developers, it’s not cryptic, it’s very clear. And the user-interface will never change, because the developers don’t like change.

If you don’t believe me, look at the backwards-incompatible changes in Git v2.0.0, or in fact, try to think back to the last time Git changed anything. Personally, other than the git-foo -> ‘git foo’ change in v1.6.0 (which was horribly handled), I can’t think of anything but minor changes.

Anyway, you can use all these features I listed today (and more) if you use git-fc instead of Git. It is my own fork of Git that has all the features of Git, plus more.

Is there anything in that list that I missed? Do you think Git v2.0.0 has enough changes as it is?

Git v2.0.0, what changed, and why should you care

Git v2.0.0 is a backward-incompatible release, which means you should expect differences since the v1.x series.

Unless you’ve been following the Git mailing list closely, you probably don’t know the history behind the v2.0 release, which started a long time ago (more than three years). It all started with a mail from Junio C Hamano, asking developers to submit ideas for changes that normally would not happen because they break backwards compatibility; he invited us to think as if “we were writing Git from scratch”. This big release that would break backwards compatibility was going to be named “1.8.0”, and people started to submit ideas for this important release. Eventually too much time passed, the versioning scheme changed, v1.8.0 was released, and the changes proposed for v1.8.0 slipped into what is now v2.0.

Parts of v2.0 have already been deployed one way or the other (for example if you have configured ‘push.default = simple’), but finally today we have v2.0 final. Here are the big changes that we got.

‘git push’ default has changed

Here’s what the release notes say:

When "git push [$there]" does not say what to push, we have used the
traditional "matching" semantics so far (all your branches were sent
to the remote as long as there already are branches of the same name
over there).  In Git 2.0, the default is now the "simple" semantics,
which pushes:

 - only the current branch to the branch with the same name, and only
   when the current branch is set to integrate with that remote
   branch, if you are pushing to the same remote as you fetch from; or

 - only the current branch to the branch with the same name, if you
   are pushing to a remote that is not where you usually fetch from.

You can use the configuration variable "push.default" to change
this.  If you are an old-timer who wants to keep using the
"matching" semantics, you can set the variable to "matching", for
example.  Read the documentation for other possibilities.

Is that clear? Given the bad track record of Git documentation it wouldn’t surprise me if you didn’t get what this chunk of text is trying to say at all. Personally I find it much easier to read the code to figure out what is happening.

So let me try to explain. When you type ‘git push’ (without any arguments), Git uses the configuration ‘push.default’ in order to find out what to push. Before, ‘push.default’ defaulted to ‘matching’; now it defaults to ‘simple’.

The ‘matching’ configuration essentially converts ‘git push‘ into ‘git push origin :‘, which means push all the matching branches. So if you have a local ‘master’, and there’s a remote ‘master’, ‘master’ is pushed; if you have a local and a remote ‘fix-1’, ‘fix-1’ is pushed; if you have a local ‘ext-feature-1’, but there’s no matching remote branch, it’s not pushed; and so on.

The ‘simple’ configuration pushes a single branch instead, and it uses your configured upstream branch (see this post for a full explanation of the upstream branch), so if your current branch is ‘master’, and if ‘origin/master’ is the upstream of your ‘master’ branch, ‘git push’ will basically be the same as ‘git push origin master‘, or to be more specific ‘git push origin master:master‘ (the upstream branch can have a different name).
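
In other words, with ‘simple’, on ‘master’ with ‘origin/master’ as its upstream, these three commands end up doing the same thing:

% git push
% git push origin master
% git push origin master:master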

Note: If you are not familiar with the src:dst syntax: you can push a local branch ‘src’ and have it named ‘dst’ on the server, so you don’t need to rename a local branch; you can do ‘git push origin foobar:feature-a’, and your local branch “foobar” will be named “feature-a” on the server. This has nothing to do with v2.0.

However, if the current branch is ‘fix-1’ and the upstream is ‘origin/master’, ‘git push’ will complain that the name of the destination branch is not the same, because it doesn’t know whether to do ‘git push origin fix-1:master‘ or ‘git push origin fix-1:fix-1‘.

Additionally, if you do ‘git push github‘ (not the remote of your upstream branch), Git will simply use the name of the current branch, essentially ‘git push github fix-1‘ (‘fix-1’ being the name of the current branch).

This mode is anything but simple to describe. But perhaps the name is OK, because you can expect it to “simply work”.

Would I care?

If you don’t type ‘git push’, but instead specify what and where to push… you don’t care.

If you have already configured ‘push.default’, which you most likely have, because otherwise you would have been getting the following annoying message all the time for the last two years… you don’t care.

warning: push.default is unset; its implicit value is changing in
Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the current behavior after the default changes, use:

  git config --global push.default matching

To squelch this message and adopt the new behavior now, use:

  git config --global push.default simple

When push.default is set to 'matching', git will push local branches
to the remote branches that already exist with the same name.

In Git 2.0, Git will default to the more conservative 'simple'
behavior, which only pushes the current branch to the corresponding
remote branch that 'git pull' uses to update the current branch.

See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)

So, most likely you don’t care.

‘git add’ in directory

Here’s what the release notes say:

When "git add -u" and "git add -A" are run inside a subdirectory
without specifying which paths to add on the command line, they
operate on the entire tree for consistency with "git commit -a" and
other commands (these commands used to operate only on the current
subdirectory).  Say "git add -u ." or "git add -A ." if you want to
limit the operation to the current directory.

Although this is a clearer explanation, it’s still not obvious what is changing, so let me give you an example.

Say you have modified two files, ‘README’ and ‘test/basic.t’. Then you go to the ‘test’ directory and run ‘git add -u‘: pre-v2.0 only ‘test/basic.t’ will be staged; post-v2.0 both files will be staged. If you run the command in the top-level directory, nothing changes.
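
In commands, assuming the two modified files from the example:

% cd test
% git add -u       # pre-2.0: stages only test/basic.t; 2.0: stages both files
% git add -u .     # explicitly limits the operation to the current directory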

Would I care?

If you haven’t seen the following warning while doing ‘git add -u‘ or ‘git add -A‘, or if you don’t even use those options, you are fine.

warning: The behavior of 'git add --update (or -u)' with no path argument from a
subdirectory of the tree will change in Git 2.0 and should not be used anymore.
To add content for the whole tree, run:

  git add --update :/
  (or git add -u :/)

To restrict the command to the current directory, run:

  git add --update .
  (or git add -u .)

With the current Git version, the command is restricted to the current directory.

‘git add’ adds removals

Here’s what the release notes say:

"git add " is the same as "git add -A " now, so that
"git add dir/" will notice paths you removed from the directory and
record the removal.  In older versions of Git, "git add " used
to ignore removals.  You can say "git add --ignore-removal " to
add only added or modified paths in , if you really want to.

Again, it should be clearer with an example. Say you removed the file ‘test/basic.t’ and added a new file ‘test/main.t’, and those changes are not yet staged, so you stage them with ‘git add test/’: pre-v2.0 the removal of ‘test/basic.t’ would be ignored (the file remains in the stage); post-v2.0 the removal is staged as well.
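
In commands:

% git add test/
# pre-2.0: stages the new test/main.t, but ignores the removal of test/basic.t
# 2.0: stages both the addition and the removal
% git add --ignore-removal test/    # 2.0: keeps the old behavior, if you really want it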

Would I care?

If you haven’t seen the following warning while doing ‘git add‘, you are fine.

warning: You ran 'git add' with neither '-A (--all)' or '--ignore-removal',
whose behaviour will change in Git 2.0 with respect to paths you removed.
Paths like 'test/basic.t' that are
removed from your working tree are ignored with this version of Git.

* 'git add --ignore-removal <pathspec>', which is the current default,
  ignores paths you removed from your working tree.

* 'git add --all <pathspec>' will let you also record the removals.

Run 'git status' to check the paths you removed from your working tree.

The rest

The "-q" option to "git diff-files", which does *NOT* mean "quiet",
has been removed (it told Git to ignore deletion, which you can do
with "git diff-files --diff-filter=d").

Most people don’t use this command, thus don’t care.

"git request-pull" lost a few "heuristics" that often led to mistakes.

Again, most people don’t use this command, which is mostly broken anyway.

The default prefix for "git svn" has changed in Git 2.0.  For a long
time, "git svn" created its remote-tracking branches directly under
refs/remotes, but it now places them under refs/remotes/origin/ unless
it is told otherwise with its "--prefix" option.

If you don’t use ‘git svn’, you don’t care. If you don’t see a difference between ‘trunk’ and ‘origin/trunk’, you don’t care.

tl;dr

You probably don’t care about these backward-incompatible changes. Sure, Git v2.0.0 received a good dosage of new features and bug-fixes, but so did v1.9.0, and all the versions before.

Given the fact that Git v2.0.0 has been cooking for three years, I think it’s a big missed opportunity that nothing really changed, especially given that in previous user surveys people have said the user-interface and documentation need to improve, and there have been patches that try to do so. In a separate post I discuss what I think Git v2.0.0 should have included.

Is ‘git pull’ broken? If so, what’s the fix?

Is ‘git pull’ really broken? I know what you are thinking; such a pervasive and basic command cannot possibly be broken. Unfortunately, it is.

It is not some marginal issue: many experienced Git users avoid ‘git pull’ and even urge newcomers to avoid the command, there are many sites that encourage you not to use it, and there have been a lot of threads on the mailing list about the issue (Pull is mostly evil, A failing attempt to use Git in a centralized environment). The maintainer, Junio C Hamano, has accepted there’s a big problem, and even Linus Torvalds agreed something needs to change.

In order to identify the problem we first need to define the two main ways ‘git pull’ is used.

Pull requests

One way ‘git pull’ is used, is to integrate pull requests into the mainline. For example in the Linux kernel, the DRM maintainer sends a pull request to Linus Torvalds, saying basically:

The following changes are available in the git repository at:

git://people.freedesktop.org/~airlied/linux drm-next

So Linus can just do:

git pull git://people.freedesktop.org/~airlied/linux drm-next

In this mode ‘git pull’ actually works fine, which is not too surprising, since it’s the main thing Linus Torvalds does.

However, this is not the way most people use ‘git pull’.

Update branch

What most people do is, for example, update their local ‘master’ branch to the remote ‘origin/master’ branch, essentially doing ‘git fetch origin’ followed by ‘git merge origin/master’.

However, that’s not exactly what most people actually want to do.

If you don’t have any changes of your own in ‘master’, then yes, ‘git pull’ does what you want, but if you do have changes, and thus the branches have diverged, then ‘git pull’ will create a new merge commit. This might or might not be what you want, but the majority of Git newbies do not want that, or rather, the team they contribute to doesn’t want those “evil merges”. Unfortunately these newbies don’t know what they are doing, and Git is not making it easier.

So you end up with something like this:

[figure: git-pull]

Most likely what the team wants is that the local changes are rebased on top of the remote ones, but if they do want a merge, they want it the other way around; that is: merge the local changes into the remote ones, as if a topic branch were being merged.

[figure: git-pull-fix]

A merge with this order of parents has many advantages, including a clearer history; however, it’s not possible to do that with ‘git pull’, so you have to do ‘git fetch’, create a new branch, switch to the master branch, merge the other branch, and finally remove the other branch. It’s not straightforward at all.
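
For the record, one way to get that kind of merge by hand looks something like this (a sketch; the ‘topic’ branch name is just illustrative):

% git fetch origin
% git branch topic                  # keep the local commits on a temporary branch
% git reset --hard origin/master    # move 'master' to the updated remote position
% git merge topic                   # merge the local changes as if they were a topic branch
% git branch -d topic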

It is this mode that is broken, and that’s the reason many people try to avoid ‘git pull’; it rarely does what you want by default.

The solution

There have been many solutions proposed, however, there are many many use-cases to consider, and a solution that takes them all into consideration for the future is not easy to find.

The best solution that seems to accommodate all present use-cases and future ones is the introduction of a new command: ‘git update‘.

By default this command will complain if the branches have diverged, so you have to do either ‘git update --rebase‘ or ‘git update --merge‘; this ensures that newbies aren’t going to do “evil merges” by mistake.

Also, when you do a ‘git update --merge‘ the order of the parents is reversed, so it appears you are merging ‘master’ into ‘origin/master’, and not the other way around as happens with ‘git pull’; in other words, it appears as if you are merging a topic branch, which is what most people want.

[figure: git-update]

There are many, many more advantages to this new command, but they are probably too subtle to mention in this post.

When will this be ready?

Probably never. I sent a summary of the issues and the solution to the mailing list, which addresses all the use-cases that were discussed. I have the required patches with tests and documentation on my personal branch, and I’ve been using this new command for a while now.

Why haven’t these patches been picked up? Maybe it’s because none of the core developers experience these issues. Maybe because they don’t use ‘git pull’ in the second form. Who knows.

The fact is that there is no interest to get this fixed, even though the issue has been acknowledged, so it’s not likely to be fixed any time soon.

So what can you do about it? The best thing you can do right now is simply avoid using ‘git pull’. Additionally, you might want to instruct your fellow coworkers to avoid using it as well, especially the ones that are not very familiar with Git.

Also, you might want to use my fork, git-fc, which does have the ‘git update‘ command, which works better than ‘git pull‘ even when there’s no branch divergence, and when there is, ‘git update --merge‘ is also superior, because the order of the parents is right.

Using Git with triangular workflows; tips, tricks, and more

Chances are you are using a triangular workflow, even if you don’t know it. A triangular workflow simply means that you pull from one repository, and push to another. This is what the vast majority of Git users do, unfortunately most of the good stuff is buried in the nearly incomprehensible official manpages.

In this blog post I’ll try to shine some light into triangular workflows, how to make use of the upstream tracking branch for them, and explain the new publish tracking branch.

The basics

Say you clone a repository:

% git clone https://github.com/tiimgreen/github-cheat-sheet
% cd github-cheat-sheet

Then you do some changes and want to share them back.

What most people would do is create a fork in GitHub and push their changes there.

% git remote add mine https://github.com/felipec/github-cheat-sheet
% git push mine

After doing that they do a pull request so their changes can be merged to the original repository.

This workflow is not specific to GitHub by any means, for example the Linux kernel developers have the main repository in git.kernel.org, and they send pull requests by mail using repositories all over the map (example).

The help

If you do this over and over it becomes clear that a little help from Git would be nice.

The first thing you can do is set the configuration ‘remote.pushdefault’ to the repository you usually push to (in the above case ‘mine’). Now you can type `git push` instead of `git push mine` every time.
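
For example:

% git config remote.pushdefault mine
% git push    # now pushes to 'mine' instead of 'origin'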

The next thing would be to setup an upstream tracking branch (read my blog post about it if you are not familiar with it).

% git branch --set-upstream-to mine/fix-typos

Then Git would greet you with the following help:

Your branch is ahead of 'mine/fix-typos' by 1 commit.

This is telling you that you probably want to push your branch again, since it’s not up-to-date in the remote. It shows you that each time you switch to that branch, or when you do `git status`.

Moreover, `git branch -vv` would show you this help:

* fix-typos ... [mine/fix-typos: ahead 1] Fix a bunch of typos

So it seems Git already has tons of help for this workflow, doesn’t it? Not so fast.

The real upstream

The upstream tracking branch is useful for other purposes, but for that we need to set a different upstream:

% git branch --set-upstream-to origin/master

Now that the upstream is ‘master’ in the ‘origin’ remote, when you run `git status` you get:

Your branch and 'origin/master' have diverged,
and have 2 and 10 different commits each, respectively.

What that message is telling you is that ‘origin/master’ has moved, so there are 10 commits in ‘origin/master’ that your branch doesn’t have (and your branch has 2 commits ‘origin/master’ doesn’t have). In those cases you probably would want to rebase on top of ‘origin/master’ so that it’s easier for upstream maintainers to merge your branch, although you can merge ‘origin/master’ too, or simply do nothing and hope there are no conflicts. Either way the information is useful so you can decide what to do.

In addition, if you want to rebase, the command is easier; instead of `git rebase origin/master` you can just type `git rebase`, since `git rebase` by default uses the upstream tracking branch.

Moreover, if you always stay up-to-date, you can do `git pull --rebase`, which will fetch all the remote branches, and then rebase your current branch (e.g. ‘fix-typos’) on top of the upstream (e.g. ‘origin/master’). You can also configure ‘pull.rebase = true’ to always do this when you type `git pull`.

Not to mention that `git branch -vv` gives much more useful information:

* fix-typos ... [master: ahead 2, behind 10] Fix a bunch of typos

Check how it looks in my real repository:

[screenshot: git branch -vv with upstream]

You get other additional benefits, like for example you get warned if you try to delete a branch that hasn’t been merged to its upstream:

warning: not deleting branch 'fix-typos' that is not yet merged to
'origin/master', even though it is merged to HEAD.
error: The branch 'fix-typos' is not fully merged.
If you are sure you want to delete it, run 'git branch -D fix-typos'.

This is actually what the upstream tracking branch is meant for: to track the upstream, that is, the target branch where all the commits of the source branch should eventually end up. All the commits of ‘fix-typos’ should end up in ‘origin/master’, therefore ‘origin/master’ is the upstream of ‘fix-typos’.

We want to have all the goodies of tracking ‘origin/master’ as our upstream, but we also want to track ‘mine/fix-typos’ so we know when we need to push. Unfortunately we can’t set them both as upstream, so we must choose one set of benefits over the other. Or should we?

The solution

The solution is not that hard to figure out: we need another upstream! Or rather; we need some concept that is similar to the upstream tracking branch, but instead of tracking the final destination, we track the location we push our commits to.

This is the publish tracking branch.

When you set it up, you get all the information:

Your branch and 'origin/master' have diverged,
and have 2 and 10 different commits each, respectively.
Some commits haven't been published to 'mine/fix-typos'.

* fix-typos ... [origin/master, mine/fix-typos *: ahead 2, behind 10]

Notice the extra ‘*’ next to the publish branch, which hints that it needs to be published.

Also, you can type `git pull` and `git rebase`, which will use the upstream branch as you would expect, and `git push` which will use the publish branch.

In other words; everything just works perfectly.

You set up the publish branch just like you set up the upstream branch:

% git branch --set-publish-to mine/fix-typos

Or:

% git push --set-publish mine

But wait, there’s more: you are not tied to push to a single remote; you can set different branches in different remotes as publish tracking. For example ‘fix-typos’ to ‘github/fix-typos’, ‘bug-fix’ to ‘client/bug-fix’, and so on. You can even choose a different branch name in the remote: ‘client-b-bug-fix’ to ‘client-b/bug-fix’.

Nice, isn’t it?
[screenshot: git branch -vv with publish]

The problem

There is only one problem with the publish branch: it’s not in upstream git :(

It is part of my fork, git-fc. If you use my fork, you will get this and other features, and you won’t lose any feature from official Git. Or you can use the specific branch, ‘fc/publish‘.

I’ve been using this code for more than half a year, and it has been reviewed in the Git mailing list, so you can trust it won’t eat your babies :)

Why isn’t it in official Git?

WARNING: if you don’t like conflicts, or you know my “adversarial” style (and don’t like it), skip this section

That’s a very good question. If the maintainer (Junio C Hamano) has accepted that support for triangular workflows is lacking, and that a separate ‘upstream’-like tracking branch is needed, why isn’t it there?

The short answer is that they have an ad hominem thing against me, so even if my patches are correct and they solve a long-standing problem, they are not applied. They are only picked if they are trivial, or not controversial, or obvious fixes. Which is why I started a fork.

I sent the original version of the patches in September 2013, with virtually no comments. Then in January 2014 people started discussing (once again) the issues with triangular workflows, and even complaining about the lack of @{publish}. Eventually they started writing preparatory patches. But I had already written the whole thing several months before!

It can’t be that the patches simply went unnoticed, because I re-sent the series once, and because I wrote about the support for @{publish} when I announced the git-fc fork.

Then I returned to the project after a long hiatus, noticed they were working on something I had already done, so I let them know and sent the patches again. This time they received more feedback, and even made it into Junio’s “pu” (proposed updates) branch. Patches are often dropped from “pu”, sometimes for no reason at all, so this is no guarantee they will get in.

This is the message Junio attached to the patch series:

 Add branch@{publish}; it seems that this is somewhat different from
 Ram and Peff started working on.  There were many discussion
 messages going back and forth but it does not appear that the
 design issues have been worked out among participants yet.

The “design issues” have not been worked out because “Ram” is not actively working on Git anymore (possibly thanks to the fact that nothing ever changes), and “Peff” said he wasn’t interested in the @{publish} concept, but more like a @{push} concept which will only benefit him and his weird bare-bones mode of interacting with Git. The fact that the @{publish} concept is what would benefit a vast majority of the user base is of no consequence to “Peff”.

So will it ever get into Git’s mainline? Who knows.

Get the goodies

If you want to use the publish tracking branch feature, get git-fc and follow the installation instructions. In addition you will get a ton of other features, and will lose none :)

If you use ArchLinux, you can get the package from AUR.

Enjoy :)

Demystifying the init system (PID 1)

With all the talk about Debian choosing a default init system (link, link), I’ve decided to share with the world a little project I’ve been working on to help me understand /sbin/init, a.k.a. PID 1.

In this blog post I will go step by step showing what an init system must do to be functional. I will ignore all the legacy SysVinit stuff, and technical nuances, and just concentrate on what’s really important.

Introduction

First of all, what is ‘init‘? In its essence, it’s a process that must be running at all times; if this process ends, the kernel enters a panic mode, after which you cannot do anything else except reboot.

This process doesn’t need to do anything special, you can use /bin/sh as your init, or even /bin/yes (although the latter wouldn’t be very useful).

So let’s write our very first init.

#!/usr/bin/ruby
# Spawn a login prompt on the first virtual terminal, then sleep forever
Process.spawn('agetty', 'tty1')
sleep

Believe it or not, this is actually a rather useful init. How useful it is depends on how your kernel was compiled, your partitioning scheme, and if your root file-system is mounted rw or not. But either way, it covers the basics: rule #1; always keep running no matter what.

This is almost true, except that we need to be listening for SIGCHLD, otherwise some processes wouldn’t be cleaned up properly, so:

# Reap terminated children so they don't linger as zombies
Signal.trap(:SIGCHLD) do
  loop do
    begin
      status = Process.wait(-1, Process::WNOHANG)
      break if status == nil
    rescue Errno::ECHILD
      break
    end
  end
end

Reboot

Now that we have “keep running no matter what” under control, it’s time to stop running (but only when requested). In order to do that we need some kind of IPC with the running process. There are many ways to achieve this, but I chose UNIX sockets.

So instead of sleeping forever, we listen for commands issued to /run/initctl:

require 'socket'

begin
  server = UNIXServer.open('/run/initctl')
rescue Errno::EADDRINUSE
  File.delete('/run/initctl')
  retry
end

loop do
  ctl = server.accept
  cmd = ctl.readline.chomp.to_sym
  # do stuff
end

And when the user is calling us with arguments, we pass those commands through /run/initctl.

def do_cmd(*cmd)
  ctl = UNIXSocket.open('/run/initctl')
  ctl.puts(cmd.join(' '))
  puts(ctl.readline.chomp)
  exit
end

case ARGV[0]
when 'poweroff', 'restart', 'halt'
  do_cmd(ARGV[0].to_sym)
end

So we can issue the command ‘init poweroff’ to turn off the machine, but in order to do that we need to tell the kernel:

def sys_reboot(cmd)
  # Magic values the kernel's reboot syscall expects (169 is the syscall number on x86_64)
  map = { poweroff: 0x4321fedc, restart: 0x01234567, halt: 0xcdef0123 }
  syscall(169, 0xfee1dead, 537993216, map[cmd])
end

These numbers are not important, what is important is that the kernel understands them, and with this we actually turn off the machine (or halt, or reboot).

Tread carefully

Obviously it would be tedious to type a bunch of commands each time the machine starts, so we need to actually do stuff after booting. However, if we do something wrong, we might render the system unusable. A simple way to solve this is to use scripts, fork a shell, and let it run them; if there’s something wrong with the scripts, the shell dies, but not PID 1, so the system remains usable, which again, is rule #1.

Fortunately Ruby has exceptions, so we can run code with a safety net that catches all exceptions, and there’s no need to fork, which would waste precious booting time.

def action(name)
  print(name)
  begin
    yield
  rescue => e
    print(' (error: %s)' % e)
  end
  puts
end

With this helper, we can safely run chunks of code, and if they fail, the error is reported to the user.

Initialization

This is the bulk of the code; the instructions you don’t want to type every time. This is mostly tedious stuff, you can skim or skip this section safely.

def mount(type, device, dir, opts)
  Dir.mkdir(dir) unless File.directory?(dir)
  system('mount', '-t', type, device, dir, '-o', opts)
end

action 'Mounting virtual file-systems' do
  mount('proc', 'proc', '/proc', 'nosuid,noexec,nodev')
  mount('sysfs', 'sys', '/sys', 'nosuid,noexec,nodev')
  mount('tmpfs', 'run', '/run', 'mode=0755,nosuid,nodev')
  mount('devtmpfs', 'dev', '/dev', 'mode=0755,nosuid')
  mount('devpts', 'devpts', '/dev/pts', 'mode=0620,gid=5,nosuid,noexec')
  mount('tmpfs', 'shm', '/dev/shm', 'mode=1777,nosuid,nodev')
end

And set the hostname.

action 'Setting hostname' do
  hostname = File.read('/etc/hostname').chomp
  File.write('/proc/sys/kernel/hostname', hostname)
end

Notice that many things can go wrong, for example the file ‘/etc/hostname’ might not exist, however, that would cause an exception, and our init would continue just fine.

Another thing we would want to do is kill all the processes, otherwise we might not be able to unmount the file-systems. We could do killall5, but we wouldn’t have much control over the processes, and that would require a fork. Instead we can rely on the kernel to do the right thing, and all we have to do is wait for the results.

def killall

  def allgone?()
    Dir.glob('/proc/*').each do |e|
      pid = File.basename(e).to_i
      begin
        next if pid < 2
        # Is it a kernel process?
        next if File.read('/proc/%i/cmdline' % pid).empty?
      rescue Errno::ENOENT
      end
      return false
    end
    return true
  end

  def wait_until(timeout = 2, interval = 0.25)
    start = Time.now
    begin
      break true if yield
      sleep(interval)
    end while (Time.now - start) < timeout
  end

  ok = false

  action 'Sending SIGTERM to processes' do
    Process.kill(:SIGTERM, -1)
    ok = wait_until(10) { allgone? }
    raise 'Failed' unless ok
  end

  return if ok

  action 'Sending SIGKILL to processes' do
    Process.kill(:SIGKILL, -1)
    ok = wait_until(15) { allgone? }
    raise 'Failed' unless ok
  end

end

Time to mount real file-systems:

NETFS = %w[nfs nfs4 smbfs cifs codafs ncpfs shfs fuse fuseblk glusterfs davfs fuse.glusterfs]
VIRTFS = %w[proc sysfs tmpfs devtmpfs devpts]

action 'Mounting local filesystems' do
  except = NETFS.map { |e| 'no' + e }.join(',')
  system('mount', '-a', '-t', except, '-O', 'no_netdev')
end

# On shutdown

action 'Unmounting real filesystems' do
  except = (NETFS + VIRTFS).map { |e| 'no' + e }.join(',')
  system('umount', '-a', '-t', except, '-O', 'no_netdev')
end

If you are using a modern distribution, chances are your /run and /tmp directories are cleared up on every boot, so many files and directories need to be re-created. We could do this by hand, but we could also use the systemd-tmpfiles utility which uses the configuration already provided by your distribution in tmpfiles.d directories.

action 'Manage temporary files' do
  system('systemd-tmpfiles', '--create', '--remove', '--clean')
end

begin
  File.delete('/run/nologin')
rescue Errno::ENOENT
end

Unless you are using a custom kernel with modules built-in, chances are you are going to need udev, so fire it up:

action 'Starting udev daemon' do
  system('/usr/lib/systemd/systemd-udevd', '--daemon')
end

action 'Triggering udev uevents' do
  system('udevadm', 'trigger', '--action=add', '--type=subsystems')
  system('udevadm', 'trigger', '--action=add', '--type=devices')
end

action 'Waiting for udev uevents to be processed' do
  system('udevadm', 'settle')
end

# On shutdown

action 'Shutting down udev' do
  system('udevadm', 'control', '--exit')
end

Finally

After all this initialization stuff, your system is most likely very usable already, and in fact I was able to start a display manager (SLiM) at this point, which was my main goal while writing this. But we are just getting started.

In control

Another thing init should do is keep track of launched daemons. Each time we launch one we store the PID, and when the child exits, we remove it from the list.

# Keep track of launched daemons by id
$daemons = {}

def start(id, cmd)
  $daemons[id] = Process.spawn(*cmd)
end

start('agetty1', %w[agetty tty1])

# On SIGCHLD
key = $daemons.key(status)
$daemons.delete(key) if key

Once we have this, it’s trivial to report their status (e.g. ‘init status agetty1’).

ctl.puts($daemons[args.first] ? 'ok' : 'dead')

At this point we actually have a feature that SysVinit doesn’t have. Not bad for 200 lines of code!

cgroups

cgroups is a feature that is often misunderstood, probably because there are no good tools to make use of them, but they are not that hard. Lennart Poettering went to a lot of trouble trying to explain exactly what systemd does and does not do with them, but I don’t think he did a very good job of clarifying anything. Basically, systemd is normally not doing anything with them (by default), other than labeling processes so you can see how they are grouped using visualization tools like systemd-cgls, but that’s it.

The single most important way you can take advantage of cgroups is for scheduling purposes: for example, your web browser is in one control group, and your heavy compilation is in another; the Linux scheduler then isolates the two groups so they don’t steal resources from each other, without the need to adjust nice levels. Basically, with cgroups there’s no need for nice (although you can use it alongside them).

But you don’t have to move a finger to get this benefit; the kernel already does it if you have CONFIG_SCHED_AUTOGROUP, which you should. Then cgroups are created for each session in the system; if you don’t know what sessions are, you can run ‘ps f -eo pid,sid,cmd‘ to find out which session id each process belongs to.

To prove this I wrote a little script that finds out the auto-grouping as reported by the Linux kernel, and you can find groups like:

------------------------------------------------------------------------------
503	slim -nodaemon
895	/bin/sh /etc/xdg/xfce4/xinitrc -- /etc/X11/xinit/xserverrc
901	dbus-launch --sh-syntax --exit-with-session
938	xfce4-session
948	xfwm4
952	xfce4-panel
954	Thunar --daemon
956	xfdesktop
958	conky -q
964	nm-applet
------------------------------------------------------------------------------

This is exactly what you would expect, the session leader (SLiM) starts a bunch of processes, and all of them belong to the same session, and if I compile a Linux kernel, I get:

------------------------------------------------------------------------------
14584	zsh
17920	make
20610	make -f scripts/Makefile.build obj=arch/x86
20661	make -f scripts/Makefile.build obj=kernel
20715	make -f scripts/Makefile.build obj=mm
20734	make -f scripts/Makefile.build obj=arch/x86/kernel
20736	make -f scripts/Makefile.build obj=fs
20750	make -f scripts/Makefile.build obj=arch/x86/kvm
20758	make -f scripts/Makefile.build obj=arch/x86/mm
21245	make -f scripts/Makefile.build obj=ipc
21274	make -f scripts/Makefile.build obj=security
21281	make -f scripts/Makefile.build obj=security/keys
21376	/bin/sh -c set -e; 	   echo '  CC      mm/mmu_context.o'; ...
21378	gcc -Wp,-MD,mm/.mmu_context.o.d ...
21387	/bin/sh -c set -e; 	   echo '  CC      ipc/msg.o'; ...
21390	gcc -Wp,-MD,ipc/.msg.o.d ...
21395	/bin/sh -c set -e; 	   echo '  CC      kernel/extable.o'; ...
21399	/bin/sh -c set -e; 	   echo '  CC [M]  arch/x86/kvm/pmu.o'; ...
21400	gcc -Wp,-MD,kernel/.extable.o.d ...
21403	gcc -Wp,-MD,arch/x86/kvm/.pmu.o.d .
21405	/bin/sh -c set -e; 	   echo '  CC      arch/x86/kernel/probe_roms.o'; ...
21407	gcc -Wp,-MD,arch/x86/kernel/.probe_roms.o.d ...
21413	/bin/sh -c set -e; 	   echo '  CC      fs/inode.o'; ...
21415	/bin/sh -c set -e; 	   echo '  CC      arch/x86/mm/srat.o'; ...
21418	/bin/sh -c set -e; 	   echo '  CC      security/keys/keyctl.o'; ...
------------------------------------------------------------------------------

This group will contain a lot of processes that take a lot of resources, but the scheduler knows they belong to the same group. If somebody logs in to my machine and starts running folding@home we would have two cgroups trying to use 100% of the CPU, so the scheduler would assign 50% to one, and 50% to the other, even though the first one has many more processes. Without the grouping, the scheduler would be unfair against folding@home, giving it as much time as it gives each one of the compilation processes.

All this without you moving a finger. Well, almost.

def start(id, cmd)
  pid = fork do
    # Give each daemon its own session, so autogrouping puts it in its own cgroup
    Process.setsid()
    exec(*cmd)
  end
  $daemons[id] = pid
end

Socket activation

systemd has made a lot of fuss about socket activation, and how it’s the next best thing after sliced bread. I agree it’s a great idea, but the idea didn’t come from systemd, AFAIK it came from OSX. But, do we need systemd to get the same in Linux?

def start_with_socket(id, stream, cmd)

  server = TCPServer.new(stream)

  Thread.new do
    loop do
      socket = server.accept
      system(*cmd, :in => socket, :out => socket)
    end
  end

end

start_with_socket('sshd', 22, %w[/usr/bin/sshd -i])

Believe it or not, this simple code achieves socket activation. We create a socket and a new thread that waits for connections; if nobody connects, nothing happens, we just have an idle thread. Each time somebody connects, we launch ‘sshd -i’, which as far as I can tell is the same thing xinetd does, and systemd too.

But hey, this is the simple socket activation, it’s not the really fancy one.

Thread.new do
  if managed
    IO.select([server])
    pid = fork do
      env = {}
      env['LISTEN_PID'] = $$.to_s
      env['LISTEN_FDS'] = 1.to_s
      Process.setsid()
      exec(env, *cmd, 3 => server)
    end
    $daemons[id] = pid
  else
    loop do
      socket = server.accept
      system(*cmd, :in => socket, :out => socket)
    end
  end
end

There, this does exactly the same thing as systemd (at least for one socket, multiple ones are easy too), so yeah, we have socket activation.

But wait, there’s more

Hopefully this covers the basics of what an init system should do, and shows how it’s not rocket science, nor voodoo. It is actually something very straightforward: start the system, keep it running. Simple. Of course there are many other things an operating system should do, but those things don’t belong in the init system; don’t let anyone tell you otherwise.

I have more changes on top of this that bring my little toy init system almost up to par with Arch Linux’s initscripts, which is what they used before moving to systemd, so chances are that if you use my init, you would have little to no problems on your own system.

Unlike systemd and others, this code is actually very readable, so you can add and remove code as you like very easily, and of course, the less code you have, the faster you boot.

Personally when I hear somebody saying “Oh! but OpenRC doesn’t have socket activation, we need systemd!”, I just roll my eyes.

If you want to give it a try, get the code from GitHub:

https://github.com/felipec/finit

Cheers.


Announcing git-fc; a friendly fork of Git

I’ll start with the obvious question: why a fork? Well, the short answer is: my patches are not being applied. The long answer is convoluted and would require a long explanation of how Git development works, its principles and guidelines, but more importantly the culture of the core developers, and I’m not going to get into that; maybe in the comments section if somebody is interested.

So what is git-fc? It is a friendly fork, and by that I mean that it’s a fork that won’t deviate from the mainline, it is more like a branch in Git terms. This branch will move forward close to Git’s mainline, and it could be merged at any point in time, if the maintainer wished to do so.

git-fc doesn’t include experimental code, or half-assed features, so you can expect the same level of stability as Git’s mainline. Also, it doesn’t remove any feature, or do any backwards incompatible changes, so you can replace git with git-fc and you wouldn’t notice the difference. The delta comes in the extra features that I’ll describe in detail below, that is all.

Who am I? I’ve contributed many patches to Git, mainly the git-remote-hg/bzr two-way bridges, but many many other things. Here’s a list of the top 10 contributors to Git since last year by number of patches:

% git shortlog --since='1 year ago' --no-merges -n -s | head -n 10
   388	Junio C Hamano
   308	Felipe Contreras
   230	Jeff King
   161	Nguyễn Thái Ngọc Duy
   122	Michael Haggerty
   103	Ramkumar Ramachandra
    96	John Keeping
    69	Eric Sunshine
    59	Thomas Rast
    51	René Scharfe

More info in ohloh.

As you see, I’ve done a lot of work for Git’s mainline, so chances are you have already benefited from my code one way or the other.

However, the most interesting patches are not merged. I wrote a summary of my 160 patches, explaining their status, so Git developers would prioritize them, but I think it’s fair to say they are just not going to apply them.

So, what do you get if you use git-fc?

@ shortcut

Many people have suggested a shortcut for the non-particularly-intuitive “HEAD”, but none of these suggestions seemed very appealing, or feasible.

Because Git already has a ref@{…} revision syntax, where if you remove the ref, HEAD is implied, I thought @ could be thought of as HEAD.

This change was welcomed and accepted by the Git mainline, and it was even on track for v1.8.4, but it was dropped at the last minute because of some issues that are fixed now, so you will probably see it in v1.8.5. But why wait? :)
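
With git-fc you can already type things like this (the second example assumes the current branch has an upstream configured):

% git show @          # same as 'git show HEAD'
% git log @{u}..@     # what you have on top of the upstream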

Nice ‘branch -v’

If you have configured the upstream tracking branch for your branches (I wrote a blog post about them), when you do ‘git branch -v’ you see something like this:

  fc/branch/fast      177dcad [ahead 2] branch: reorganize verbose options
  fc/stage            abb6ad5 [ahead 14] completion: update 'git reset' ...
  fc/transport/improv eb4d3c7 [ahead 10] transport-helper: don't update ...

While that provides useful information, it doesn’t show the upstream tracking branch; it just says “ahead 2”, but “ahead 2” compared to what?

If you do ‘git branch -vv’, then you see the answer:

  fc/branch/fast      177dcad [master: ahead 2] branch: reorganize ...
  fc/stage            abb6ad5 [master: ahead 14] completion: update ...
  fc/transport/improv eb4d3c7 [master: ahead 10] transport-helper: don't ...

Unfortunately both options take a lot of time (relative to most Git commands, which are instantaneous), because computing the “ahead 2” takes a lot of time. So I decided to switch things around, so that ‘git branch -v’ gives you:

  fc/branch/fast      177dcad [master] branch: reorganize verbose options
  fc/stage            abb6ad5 [master] completion: update 'git reset' new ...
  fc/transport/improv eb4d3c7 [master] transport-helper: don't update refs ...

And it does so instantaneously.

Default aliases

Many (if not all) version control system tools have shortcuts for their most common operations; hg ci, svn co, cvs st. But not Git. You can configure your own aliases manually, but you might have some trouble if you use somebody else’s machine.

Adding default aliases is trivial, it helps everyone, and it doesn’t hurt anyone, yet the patch to do so was rejected.

For now, there are only four aliases, but more can be added later if they are requested.

co = checkout
ci = commit
rb = rebase
st = status

If you already have these aliases, or have them mapped to something else, your aliases will take precedence over the default ones, so you won’t have any problems.

Streamlined remote helpers

I have spent a lot of time working on git-remote-hg and git-remote-bzr, and although they are relatively new, they have proven to be quite stable and solid, yet they are only part of the “contrib” area side by side with much simpler and way less solid scripts.

In order to use these from Git mainline you need a bit of tinkering, and it’s not straightforward to package them for distributions.

With git-fc they are installed by default, and in the right way, making things easier for distributions.

Improvements to the transport helper

The two-way bridges between Git and Mercurial/Bazaar already work quite well, but they lack some features; specifically, you cannot use --force or --dry-run, or use an old:new refspec. If you are not familiar with the old:new refspec: you can push a local branch under a different name on the remote, for example ‘git push origin master:my-master’ pushes your ‘master’ branch as if it were named ‘my-master’ in the remote repository.

This is extremely useful if you are really serious about using Git as a transparent client to access a Mercurial repository.

New core.mode configuration

Git is already preparing users for the v2.0 release, which will bring minor backward-compatibility breakage, but some people would rather get rid of the warnings (which are probably going to stay for many more releases) and just move to the new behavior already.

Testing Git v2.0 behavior today would not only help git-fc, but also the Git mainline, and you can do that by setting core.mode = next. If you do this and provide feedback about any issues, that would be greatly appreciated. Unfortunately you cannot test the v2.0 behavior in Git mainline because they rejected the patches, but you can in git-fc.

Please note that the v2.0 behavior might change in the future, before v2.0 is released, so if you enable this mode you need to be aware of that. Chances are you are not going to notice any difference anyway.

In addition to the “next” (v2.0) mode, there’s the “progress” mode. This mode enables “next” plus other configurations that have been proposed as defaults for v2.0, but haven’t yet been agreed on.

In particular, you get these:

merge.defaulttoupstream = true
branch.autosetupmerge = always
mergetool.prompt = false

There might be more in the future, and suggestions are welcome.

It is recommended that you set up this mode for git-fc:

git config --global core.mode progress

Non-ff pulls rejected by default

Even in the Git project everybody has agreed this is the way to go in order to avoid the typical Git newbie making the mistake of doing a merge, when perhaps (s)he wanted to do a git reset, or a git rebase. With this change git complains that a non-fast-forward branch is being pulled, so the user has to decide what to do.

The user would have to do either ‘git pull --merge‘ or ‘git pull --rebase‘, the former being what Git mainline currently does.

The user can of course choose the old behavior, which is easy to configure:

git config --global pull.mode merge

Official staging area

Everybody already uses the term “staging area”, and Git developers have also agreed it’s the best term for what is officially referred to as “the index”. So git-fc has new options for all the commands that modify the staging area (e.g. git grep --staged, git rm --staged), and also adds a new git stage command that makes it easier to work with the staging area.

'git stage' [options] [--] [<paths>...]
'git stage add' [options] [--] [<paths>...]
'git stage reset' [-q|--patch] [--] [<paths>...]
'git stage diff' [options] [<commit>] [--] [<paths>...]
'git stage rm' [options] [--] [<paths>...]
'git stage apply' [options] [--] [<patch>...]
'git stage edit'

Without any command, git stage adds files to the stage, same as git add, same as in Git mainline.

New fetch.default configuration

When you have configured the upstream tracking branch for all your branches, you will probably have tracking branches that point to a local branch, for example feature-a pointing to master, in which case you would get something like:

% git fetch
From .
 * branch            master     -> FETCH_HEAD

Which makes absolutely no sense, since the ‘.’ repository is not even documented, and FETCH_HEAD is a marginally known concept. In this case git fetch is basically doing nothing from the user’s point of view.

So the user can configure fetch.default = simple to get a simple sensible default; ‘git fetch‘ will always use origin by default, which is not ideal for everyone, but it’s better than the current alternative.
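
Setting it up is just another configuration (note this is a git-fc-only option):

% git config --global fetch.default simple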

If you use the “progress” mode, this option is also enabled.

Publish tracking branch

Git mainline doesn’t have the greatest support for triangular workflows, a good solution for that is to introduce a second “upstream” tracking branch which is for the reverse; the branch you normally push to.

Say you clone a repository (libgit2) from GitHub, then create a branch (feature-a) and push it to your personal repository; you would want to track two branches, origin/master and mine/feature-a, but Git mainline only provides support for a single upstream tracking branch.

If you set up your upstream tracking branch to origin/master, then you can just do git rebase without arguments and git will pick the right branch (origin/master) to rebase onto. However, git push by default will also try to push to origin/master, which is not what you want. Plus, git branch -v will show how far ahead/behind your branch is compared to origin/master, not mine/feature-a.

If you set up your upstream to mine/feature-a, then git push will work, but git rebase won’t.

With this option, git rebase uses the upstream branch, and git push uses the publish branch.

Setting the publish tracking branch is easy:

git push --set-publish mine feature-a

Or:

git branch --set-publish mine/feature-a

And git branch -v will show it as well:

fc/branch/fast      177dcad [master, gh/fc/branch/fast] branch: ...
fc/stage            abb6ad5 [master, gh/fc/stage] completion: ...
fc/transport/improv eb4d3c7 [master, gh/fc/transport/improv] ...

Support for Ruby

By far the most complex and interesting feature, but unfortunately also the one that is not yet 100% complete.

There is partial optional support for Ruby. Git already has tooling so any language can use its plumbing and achieve plenty of tasks:

IO.popen(%w[git for-each-ref]) do |io|
  io.each do |line|
    sha1, kind, name = line.split()
    # stuff
  end
end

However, this a) requires a process fork, and b) requires I/O communication to get the desired data. While this is not a big deal on many systems, it is in Windows systems where forks are slow, and many Git core programs don’t work as well as they do in Linux.

Git has a goal of replacing all the core scripts with native C versions, but it’s a goal in name only; it is not actually being pursued. In addition, that still leaves out any third-party tools, since Git doesn’t provide a shared libgit library, which is why an independent libgit2 was needed in the first place.

Ruby bindings solve these problems:

for_each_ref() do |name, sha1, flags|
  # stuff
end

The command ‘git ruby‘ can run a script like this by providing bindings for many of Git’s internal C functions (though not all of them), which makes it easier to write Ruby programs that take full advantage of Git without any need for forks or I/O communication.
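For instance, assuming the snippet above is saved as list-refs.rb, running it would presumably look like this (the exact calling convention is my assumption):

git ruby list-refs.rb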

Conclusion

As you might guess, I’ve spent a lot of time working on all these features, plus all the ones that are already merged in Git’s mainline. Hopefully they are useful to some people.

It’s easy to compile and install:

make install

By default git will be installed in your home directory, but you can also do what I do: ‘make prefix=/opt/git install‘, and add ‘/opt/git/bin’ to your $PATH. All you need is a few development packages: zlib, curl, expat, and openssl.
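That is, something along these lines:

make prefix=/opt/git install
export PATH=/opt/git/bin:$PATH    # e.g. in your ~/.bashrc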

The code is on GitHub, the home page is on Google Code, and the mailing list is on Google Groups. All comments and patches are welcome.

You can find future comments and releases in this blog, under the git-fc tag.


The Linux way; never ever break user experience

Through the years it has become more and more obvious to me that there are two camps in open source development, and one camp is not even aware of how the other camp works (or rather, succeeds), often to its own detriment. This was blatantly obvious in Miguel de Icaza’s blog post What Killed the Linux Desktop, in which he accused Linus Torvalds of setting an attitude of breaking APIs to the developers’ heart’s content, without even realizing that they (the Linux kernel developers) do the exact opposite of what he claimed: Linux never ever breaks the user-space API. This triggered a classic example of the many thorny discussions between the two camps, which illustrate how one side doesn’t have a clue about how the other side operates, or even that there is an entirely different way of doing things. I will name the camps “the Linux guys” (even though they don’t work strictly on the Linux kernel) and “the user-space guys”, the people who work on user space, GNOME being one of the prime examples.

This is not an attempt to put people into two black-and-white categories; there’s a spectrum of behaviors, and surely you can find people in one group with the mentality of the other camp. Ultimately, you are the one who decides whether there’s a divide in attitudes or not.

The point of this post is to explain the number one rule of Linux kernel development (never ever break user experience), why that is important, and how far off the user-space camp is.

When I say “Linux” I’m referring to the Linux kernel, because that’s the name of the kernel project.

The Linux way

There are many details of how Linux kernel development works on a day-to-day basis, and many reasons why it is the way it is and why it’s good and desirable to work that way. But for now I will simply concentrate on what it is.

Never ever break user experience. This point cannot be stressed enough. Forget about development practices, forget about netiquette, forget about technical competence… what is important is the users, and to never break their expectations. Let me quote Linus Torvalds:

The biggest thing any program can do is not the technical details of the program itself; it’s how useful the program is to users.

So any time any program (like the kernel or any other project), breaks the user experience, to me, that’s the absolute worst failure that a software project can make.

This is a point that is so obvious, and yet many projects (especially big ones) often forget: they are nothing without their user base. If you start a small project all by yourself, you are painfully aware that in order for your project to succeed you need users. Without users you cannot get more developers, and your project could very well disappear from the face of the Earth and nobody would notice. But once your project is big enough, one or two users complaining about something start being less of an issue; in fact, hundreds of them might be insignificant, and at some point you lose any measure of what percentage of users are complaining. This problem might grow to the point that developers say “users don’t know what they want” in order to ignore the importance of users and their needs.

But that’s not the Linux way; it doesn’t matter if you have one user, or ten, or millions, your project still succeeds (or fails) for the same reason: it’s useful to some people (or not). And if you break user experience, you risk that usefulness, and you risk your project becoming irrelevant to your users. That is not good.

Of course, there are compromises. Sometimes you can do a bit of risk analysis: OK, this change might affect 1% of our current users, and the change would be kind of annoying, but it would make the code so much more maintainable; let’s go for it. It’s all about where you draw the line. Sometimes it might be OK to break user experience, if you have good reasons for it, but you should really try to avoid it, and if you go forward, provide an adjustment period, a configuration option for the old behavior, and even involve your users in the whole process to make sure the change is indeed needed and their discomfort is minimized.

At the end of the day it’s all about trust. I use project X not only because it works for me, but because I trust that it will keep working for me in the years to come. If for some reason I expected it to break next year, I might be better off looking right now for something else that I trust I can keep relying on indefinitely, rather than having project X break on me while I’m in the middle of a deadline and don’t have time for their shenanigans.

Obvious stuff, yet many projects don’t realize it. One example is when the udisks2 project decided to change the location of the mount directories from `/media/foo` to `/run/media/$user/foo`. What?! I’m in the middle of something important, and all of a sudden I can’t find my disks’ content in /media? I had to spend a considerable amount of time until I found the reason: no, udisks2 didn’t have a bug; they introduced this change willingly and knowingly. They didn’t give any deprecation warning while they moved to the new location, they didn’t have an option to keep the old behavior, they just moved it, with no explanation, in one single commit (here), from one version to the next. Am I going to keep using their project? No. Why would I? Who knows when they will next decide to break some user experience unilaterally, without deprecation warnings or anything? The trust is broken, and many others agree.

How about the Linux kernel? When was the last time your Linux kernel failed you in some way that was not a bug, but the developers knowingly and willingly breaking things for you? Can’t think of any? Me neither. In fact, people often forget about the Linux kernel, because it just works. The external drivers (like NVIDIA’s or AMD’s) are not a problem of the kernel but of the drivers themselves, as I will explain later on. You have people bitching about all kinds of projects, and threatening forks, and complaining about the leadership, and whatnot. None of that happens with the Linux kernel. Why? Because it just works. Not for me, not for 90% of the users; for everybody (or 99.99% of everybody).

Because they never ever break user experience. Ever. Period.

The deniers

Miguel de Icaza, after accusing Linus of not maintaining a stable ABI for drivers, went on to argue that it was the kernel developers’ fault for spreading attitudes like:

We deprecated APIs, because there was a better way. We removed functionality because “that approach is broken”, for degrees of broken from “it is a security hole” all the way to “it does not conform to the new style we are using”.

What part of “never ever break user experience” didn’t Icaza understand? It seems he only considers the internal API, which does change all the time in the Linux kernel and which has never had any semblance of a promise that it wouldn’t (thus the “internal” part), while ignoring the public user-space API, which indeed never breaks, which is why you, as a user, don’t have to worry about your user space not working on Linux v3.0, or Linux v4.0. How can he not see that? Is Icaza blind?

Torvalds:

The gnome people claiming that I set the “attitude” that causes them problems is laughable.

One of the core kernel rules has always been that we never ever break any external interfaces. That rule has been there since day one, although it’s gotten much more explicit only in the last few years. The fact that we break internal interfaces that are not visible to userland is totally irrelevant, and a total red herring.

I wish the gnome people had understood the real rules inside the kernel. Like “you never break external interfaces” – and “we need to do that to improve things” is not an excuse.

Even after Linus Torvalds and Alan Cox explained to him in a Google+ thread how the Linux kernel actually works, he didn’t accept any of it.

Lennart Poettering, face to face with both of them (Torvalds and Cox), argued that this mantra (never break user experience) wasn’t actually followed (video here). Yet at the same time his software (the systemd+udev beast) had recently been criticized for knowingly and willingly breaking user experience by making the boot hang for 30 seconds per device that needed firmware. Linus’ reply was priceless (link):

Kay, you are so full of sh*t that it’s not funny. You’re refusing to
acknowledge your bugs, you refuse to fix them even when a patch is
sent to you, and then you make excuses for the fact that we have to
work around *your* bugs, and say that we should have done so from the
very beginning.

Yes, doing it in the kernel is “more robust”. But don’t play games,
and stop the lying. It’s more robust because we have maintainers that
care, and because we know that regressions are not something we can
play fast and loose with. If something breaks, and we don’t know what
the right fix for that breakage is, we *revert* the thing that broke.

So yes, we’re clearly better off doing it in the kernel.

Not because firmware loading cannot be done in user space. But simply
because udev maintenance since Greg gave it up has gone downhill.

So you see, it’s not that GNOME developers understand the Linux way and simply disagree with it; it’s that they don’t even understand it, even when it’s explained to them directly, clearly, face to face. This behavior is not exclusive to GNOME developers; udisks2 is another example, and there are many more, though probably not as extreme.

More examples

Linus Torvalds gave Kay a pretty hard time for knowingly and willingly introducing regressions, but does Linux fare better? As an example, I can think of a regression I found with Wine: after realizing the problem was in the kernel, I bisected the commit that introduced the problem and notified Linux developers. If this had been udev, or GNOME, or any other crappy user-space software, I know what their answer would have been: Wine is doing something wrong, Wine needs to be fixed, it’s Wine’s problem, not ours. But that’s not Linux. Linux has a contract with user space and they never break user experience, so what they did was revert the change. Even though that made things less than ideal on the kernel side, it’s what was required so that you, the user, don’t experience any breakage. The LKML thread is here.

Another example is what happened when Linux moved to 3.0: some programs expected a 2.x version, or even 2.6.x. These programs were clearly buggy, as they should have checked that the version was greater than 2.x; however, the bugs were already there, people didn’t want to recompile their binaries, and they might not even have been able to do so. It would be stupid for Linux to report 2.6.x when in fact it’s 3.x, but that’s exactly what they did: they added an option so the kernel would report a 2.6.x version, so users would have the option to keep running those old buggy binaries. Link here.

Now compare the switch to Linux 3.0, which was transparent and as painless as possible, to the move to GNOME 3. There couldn’t be a more perfect example of blatant disregard for the current user experience. If your workflow doesn’t work correctly in GNOME 3… you have to change your workflow. If GNOME 3 behaves almost as you would expect, and you only need a tiny configuration change… too bad. If you want to use GNOME 3 technology, but you would like a grace period during which you can use the old interface while you adjust to the new one… sucks to be you. In fact, it’s really hard to think of any way in which they could have increased the pain of moving to GNOME 3. And when users reported their user experience broken, the talking points were not surprising: “users don’t know what they want”, “users hate change”, “they will stop whining in a couple of months”. Boy, they sure value their users. And now they are going after middle-click copy.

If you have more examples of projects breaking user experience, or preserving it, feel free to mention them in the comments.

No, seriously, no regressions

Sometimes even Linux maintainers don’t realize how important this rule is, and in such cases, Linus doesn’t shy away from explaining it to them (link):

Mauro, SHUT THE FUCK UP!

It’s a bug alright – in the kernel. How long have you been a
maintainer? And you *still* haven’t learnt the first rule of kernel
maintenance?

If a change results in user programs breaking, it’s a bug in the
kernel. We never EVER blame the user programs. How hard can this be to
understand?

> So, on a first glance, this doesn’t sound like a regression,
> but, instead, it looks tha pulseaudio/tumbleweed has some serious
> bugs and/or regressions.

Shut up, Mauro. And I don’t _ever_ want to hear that kind of obvious
garbage and idiocy from a kernel maintainer again. Seriously.

I’d wait for Rafael’s patch to go through you, but I have another
error report in my mailbox of all KDE media applications being broken
by v3.8-rc1, and I bet it’s the same kernel bug. And you’ve shown
yourself to not be competent in this issue, so I’ll apply it directly
and immediately myself.

WE DO NOT BREAK USERSPACE!

The fact that you then try to make *excuses* for breaking user space,
and blaming some external program that *used* to work, is just
shameful. It’s not how we work.

Fix your f*cking “compliance tool”, because it is obviously broken.
And fix your approach to kernel programming.

And if you think that was an isolated incident (link):

Rafael, please don’t *ever* write that crap again.

We revert stuff whether it “fixed” something else or not. The rule is
“NO REGRESSIONS”. It doesn’t matter one whit if something “fixes”
something else or not – if it breaks an old case, it gets reverted.

Seriously. Why do I even have to mention this? Why do I have to
explain this to somebody pretty much *every* f*cking merge window?

This is not a new rule.

There is no excuse for regressions, and “it is a fix” is actually the
_least_ valid of all reasons.

A commit that causes a regression is – by definition – not a “fix”. So
please don’t *ever* say something that stupid again.

Things that used to work are simply a million times more important
than things that historically didn’t work.

So this had better get fixed asap, and I need to feel like people are
working on it. Otherwise we start reverting.

And no amount “but it’s a fix” matters one whit. In fact, it just
makes me feel like I need to start reverting early, because the
maintainer doesn’t seem to understand how serious a regression is.

Compare and contrast

Now that we have a good dose of examples, it should be clear that the attitudes of the two camps couldn’t be more different.

In the GNOME/PulseAudio/udev/etc. camp, if a change in an API causes a regression on the receiving end of that API, the problem is in the client. The “fix” is not reverted, it stays, and the application needs to change; if the user suffers as a result, too bad, the client application is to blame.

In the Linux camp, if a change in an API causes a regression, Linux has a problem. The change is not a “fix”, it’s a regression, and it must be reverted (or otherwise fixed), so the client application doesn’t need to change (even though it probably should), and the user never suffers as a result. To even hint otherwise is cause for harsh public shaming.

Do you see the difference? Which of the two approaches do you think is better?

What about the external API?

Linux doesn’t support external modules; if you use an external module, you are on your own. They have good reasons for this: all modules can and should be part of the kernel, and this makes maintenance easier for everybody.

Each time an internal API needs to be changed, the person making the change can do it for all the modules that use that API. So if you are a company, let’s say Texas Instruments, and you manage to get your module into the Linux mainline, you don’t have to worry about API changes, because they (the Linux developers) will do the updates for you. This allows the internal API to always be clean, consistent, relevant, and useful. As an example of a recent change, Russell King (the ARM maintainer) introduced a new API to set the DMA mask, and in the process updated all users of dma_set_mask() to use the new function dma_set_mask_and_coherent(), and by doing that found potential bugs in many instances. So companies like Intel, NVIDIA, and Texas Instruments benefit from cleaner and more robust code without lifting a finger; Russell did it all in his 51-patch series.

In addition, by having all the modules in the same source tree, when a generic API is to be added it’s easy to consider all possible use-cases, because the code is readily available. An example of this is the preliminary Common Display Framework, which takes into consideration drivers from Renesas, NVIDIA, Samsung, Texas Instruments, and other Linaro companies. Once this framework is done, all existing display drivers will benefit, and things will be especially easier for future generations of drivers. It’s only because of this refactoring that the number of drivers supported by Linux can grow without the amount of code exploding uncontrollably, which is one of the advantages Linux has over Windows, OS X, and other operating systems’ kernels.

If companies don’t play along in this collaborative effort, as is the case with NVIDIA’s and AMD’s proprietary drivers, it is to their own detriment, and there’s nobody to blame but those companies. Whenever you load one of these drivers, Linux immediately goes into a tainted mode, which means that if you find problems with the Linux kernel, Linux developers cannot help you. It’s not that they don’t want to help, it’s that they might be physically incapable of doing so. If a closed-source module has a bug and corrupts memory on the kernel side, there is no way to find that out, and it might show up as some other module, or even the core itself, crashing. So if a Linux developer sees a crash, say, in a wireless driver, but the kernel is tainted, there is only so much he can do before deciding it’s not worth his time to investigate an issue that has a good chance of being caused by a proprietary driver.

Thus if a Linux update broke your NVIDIA driver, blame NVIDIA. Or even better, don’t use the proprietary driver; use nouveau.

Conclusion

Hopefully after reading this article it is clear to you what the number one rule of Linux kernel development is, why it is a good rule, and why other projects should follow it.

Unfortunately it should also be clear that other projects, particularly those related to GNOME, don’t follow it, and why that causes such backlash, controversy, and forks.

In my opinion there’s no hope of GNOME, or any other user-space project, being nearly as successful as Linux if they don’t follow this simplest, most important rule. Linux will keep growing in importance and development power, while the others are forever doomed to forks, nearly identical alternatives, and developers jumping ship after their trust gets broken. If only they would follow this simple rule, or at least understand it.

Bonus video
