Demystifying the init system (PID 1)

With all the talk about Debian choosing a default init system (link, link), I’ve decided to share with the world a little project I’ve been working on to help me understand /sbin/init, a.k.a. PID 1.

In this blog post I will go step by step through what an init system must do to be functional. I will ignore all the legacy SysVinit stuff and the technical nuances, and just concentrate on what's really important.

Introduction

First of all, what is ‘init‘? In its essence, it's a process that must be running at all times; if this process ends, the kernel panics, and after that you cannot do anything else except reboot.

This process doesn’t need to do anything special, you can use /bin/sh as your init, or even /bin/yes (although the latter wouldn’t be very useful).

So let’s write our very first init.

#!/usr/bin/ruby
Process.spawn('agetty', 'tty1')
sleep

Believe it or not, this is actually a rather useful init. How useful it is depends on how your kernel was compiled, your partitioning scheme, and whether your root file-system is mounted read-write or not. But either way, it covers the basics: rule #1; always keep running, no matter what.

This is almost true, except that we need to listen for SIGCHLD; otherwise terminated children would never be reaped and would linger as zombies, so:

Signal.trap(:SIGCHLD) do
  loop do
    begin
      status = Process.wait(-1, Process::WNOHANG)
      break if status == nil
    rescue Errno::ECHILD
      break
    end
  end
end

Reboot

Now that we have the running-indefinitely part under control, it's time to stop running (only when requested), but in order to do that we need some kind of IPC with the running process. There are many ways to achieve this; I chose UNIX sockets.

So instead of sleeping forever, we listen for commands issued to /run/initctl:

require 'socket'

begin
  server = UNIXServer.open('/run/initctl')
rescue Errno::EADDRINUSE
  # A stale socket file from a previous run is in the way; remove it and try again.
  File.delete('/run/initctl')
  retry
end

loop do
  ctl = server.accept
  cmd = ctl.readline.chomp.to_sym
  # do stuff
end

And when the user calls us with arguments, we pass those commands through /run/initctl.

def do_cmd(*cmd)
  ctl = UNIXSocket.open('/run/initctl')
  ctl.puts(cmd.join(' '))
  puts(ctl.readline.chomp)
  exit
end

case ARGV[0]
when 'poweroff', 'restart', 'halt'
  do_cmd(ARGV[0].to_sym)
end

So we can issue the command init poweroff to turn off the machine, but in order to do that we need to tell the kernel:

def sys_reboot(cmd)
  # Values of the LINUX_REBOOT_CMD_* constants from <linux/reboot.h>
  map = { poweroff: 0x4321fedc, restart: 0x01234567, halt: 0xcdef0123 }
  # 169 is the reboot(2) syscall number on x86_64; the other two arguments
  # are the magic values the kernel expects (LINUX_REBOOT_MAGIC1 and MAGIC2C)
  syscall(169, 0xfee1dead, 537993216, map[cmd])
end

These numbers are not important; what is important is that the kernel understands them, and with this we actually turn off the machine (or halt, or reboot).
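To tie this together, here is a sketch (my own guess at the wiring, not code taken verbatim from the project) of what the '# do stuff' part of the control loop could look like for these commands:

loop do
  ctl = server.accept
  cmd, *args = ctl.readline.chomp.split
  case cmd.to_sym
  when :poweroff, :restart, :halt
    ctl.puts('ok')        # reply before taking the system down
    killall               # defined further down; unmounting omitted here
    sys_reboot(cmd.to_sym)
  end
end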

Tread carefully

Obviously it would be tedious to type a bunch of commands each time the machine starts, so we need to actually do stuff after booting. However, if we do something wrong, we might render the system unusable. A simple way to solve this is to use scripts: fork a shell and let it run them, so if there's something wrong with a script, the shell dies, but not PID 1, and the system remains usable, which again, is rule #1.

Fortunately Ruby has exceptions, so we can run code with a safety net that catches all exceptions, and there’s no need to fork, which would waste precious booting time.

def action(name)
  print(name)
  begin
    yield
  rescue => e
    print(' (error: %s)' % e)
  end
  puts
end

With this helper, we can safely run chunks of code, and if they fail, the error is reported to the user.

Initialization

This is the bulk of the code; the instructions you don’t want to type every time. This is mostly tedious stuff, you can skim or skip this section safely.

def mount(type, device, dir, opts)
  Dir.mkdir(dir) unless File.directory?(dir)
  system('mount', '-t', type, device, dir, '-o', opts)
end

action 'Mounting virtual file-systems' do
  mount('proc', 'proc', '/proc', 'nosuid,noexec,nodev')
  mount('sysfs', 'sys', '/sys', 'nosuid,noexec,nodev')
  mount('tmpfs', 'run', '/run', 'mode=0755,nosuid,nodev')
  mount('devtmpfs', 'dev', '/dev', 'mode=0755,nosuid')
  mount('devpts', 'devpts', '/dev/pts', 'mode=0620,gid=5,nosuid,noexec')
  mount('tmpfs', 'shm', '/dev/shm', 'mode=1777,nosuid,nodev')
end

And set the hostname.

action 'Setting hostname' do
  hostname = File.read('/etc/hostname').chomp
  File.write('/proc/sys/kernel/hostname', hostname)
end

Notice that many things can go wrong; for example, the file '/etc/hostname' might not exist. However, that would only cause an exception, and our init would continue just fine.

Another thing we want to do at shutdown is kill all the processes, otherwise we might not be able to unmount the file-systems. We could run killall5, but we wouldn't have much control over the processes, and that would require a fork. Instead we can send the signals ourselves (to every process at once), rely on the kernel to deliver them, and all we have to do is wait for the results.

def killall

  def allgone?()
    Dir.glob('/proc/*').each do |e|
      pid = File.basename(e).to_i
      begin
        next if pid < 2
        # Is it a kernel process?
        next if File.read('/proc/%i/cmdline' % pid).empty?
      rescue Errno::ENOENT
      end
      return false
    end
    return true
  end

  def wait_until(timeout = 2, interval = 0.25)
    start = Time.now
    begin
      break true if yield
      sleep(interval)
    end while (Time.now - start) < timeout
  end

  ok = false

  action 'Sending SIGTERM to processes' do
    Process.kill(:SIGTERM, -1)
    ok = wait_until(10) { allgone? }
    raise 'Failed' unless ok
  end

  return if ok

  action 'Sending SIGKILL to processes' do
    Process.kill(:SIGKILL, -1)
    ok = wait_until(15) { allgone? }
    raise 'Failed' unless ok
  end

end

Time to mount real file-systems:

NETFS = %w[nfs nfs4 smbfs cifs codafs ncpfs shfs fuse fuseblk glusterfs davfs fuse.glusterfs]
VIRTFS = %w[proc sysfs tmpfs devtmpfs devpts]

action 'Mounting local filesystems' do
  except = NETFS.map { |e| 'no' + e }.join(',')
  system('mount', '-a', '-t', except, '-O', 'no_netdev')
end

# On shutdown

action 'Unmounting real filesystems' do
  except = (NETFS + VIRTFS).map { |e| 'no' + e }.join(',')
  system('umount', '-a', '-t', except, '-O', 'no_netdev')
end

If you are using a modern distribution, chances are your /run and /tmp directories are cleared up on every boot, so many files and directories need to be re-created. We could do this by hand, but we could also use the systemd-tmpfiles utility which uses the configuration already provided by your distribution in tmpfiles.d directories.

action 'Manage temporary files' do
  system('systemd-tmpfiles', '--create', '--remove', '--clean')
end

begin
  File.delete('/run/nologin')
rescue Errno::ENOENT
end

Unless you are using a custom kernel with modules built-in, chances are you are going to need udev, so fire it up:

action 'Starting udev daemon' do
  system('/usr/lib/systemd/systemd-udevd', '--daemon')
end

action 'Triggering udev uevents' do
  system('udevadm', 'trigger', '--action=add', '--type=subsystems')
  system('udevadm', 'trigger', '--action=add', '--type=devices')
end

action 'Waiting for udev uevents to be processed' do
  system('udevadm', 'settle')
end

# On shutdown

action 'Shutting down udev' do
  system('udevadm', 'control', '--exit')
end

Finally

After all this initialization stuff, your system is most likely very usable already, and in fact I was able to start a display manager (SLiM) at this point, which was my main goal while writing this. But we are just getting started.

In control

Another thing init should do is keep track of launched daemons. Each time we launch one we store its PID, and when the child exits, we remove it from the list.

def start(id, cmd)
  $daemons[id] = Process.spawn(*cmd)
end

start('agetty1', %w[agetty tty1])

# On SIGCHLD
key = $daemons.key(status)
$daemons.delete(key) if key

Once we have this, it's trivial to report their status (e.g. init status agetty1).

ctl.puts($daemons[args.first] ? 'ok' : 'dead')
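And on the client side it's just one more case to forward through the socket (a sketch, reusing the do_cmd helper from before):

case ARGV[0]
when 'status'
  do_cmd('status', ARGV[1])
end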

At this point we actually have a feature that SysVinit doesn’t have. Not bad for 200 lines of code!

cgroups

cgroups is a feature that is often misunderstood, probably because there are no good tools to make use of them, but they are not that hard. Lennart Poettering went to a lot of trouble trying to explain exactly what systemd does with them and what it does not, but I don't think he did a very good job of clarifying anything. Normally systemd is not doing anything with them (by default), simply labeling processes so you can see how they are grouped by using visualization tools like systemd-cgls, but that's it.

The single most important way you can take advantage of cgroups is for scheduling purposes: if, for example, your web browser is in one control group and your heavy compilation is in another, the Linux scheduler will keep the two groups from stealing resources from each other, without the need to adjust nice levels. Basically, with cgroups there's no need for nice (although you can still use it alongside them).

But you don't have to move a finger to get this benefit; the kernel already does it if you have CONFIG_SCHED_AUTOGROUP enabled, which you should. Then, a cgroup is created for each session in the system. If you don't know what sessions are, you can run 'ps f -eo pid,sid,cmd' to find out which session ID each process belongs to.

To prove this I wrote a little script that finds out the auto-grouping as reported by the Linux kernel (a sketch of it appears further down), and you can find groups like:

------------------------------------------------------------------------------
503	slim -nodaemon
895	/bin/sh /etc/xdg/xfce4/xinitrc -- /etc/X11/xinit/xserverrc
901	dbus-launch --sh-syntax --exit-with-session
938	xfce4-session
948	xfwm4
952	xfce4-panel
954	Thunar --daemon
956	xfdesktop
958	conky -q
964	nm-applet
------------------------------------------------------------------------------

This is exactly what you would expect: the session leader (SLiM) starts a bunch of processes, and all of them belong to the same session. And if I compile a Linux kernel, I get:

------------------------------------------------------------------------------
14584	zsh
17920	make
20610	make -f scripts/Makefile.build obj=arch/x86
20661	make -f scripts/Makefile.build obj=kernel
20715	make -f scripts/Makefile.build obj=mm
20734	make -f scripts/Makefile.build obj=arch/x86/kernel
20736	make -f scripts/Makefile.build obj=fs
20750	make -f scripts/Makefile.build obj=arch/x86/kvm
20758	make -f scripts/Makefile.build obj=arch/x86/mm
21245	make -f scripts/Makefile.build obj=ipc
21274	make -f scripts/Makefile.build obj=security
21281	make -f scripts/Makefile.build obj=security/keys
21376	/bin/sh -c set -e; 	   echo '  CC      mm/mmu_context.o'; ...
21378	gcc -Wp,-MD,mm/.mmu_context.o.d ...
21387	/bin/sh -c set -e; 	   echo '  CC      ipc/msg.o'; ...
21390	gcc -Wp,-MD,ipc/.msg.o.d ...
21395	/bin/sh -c set -e; 	   echo '  CC      kernel/extable.o'; ...
21399	/bin/sh -c set -e; 	   echo '  CC [M]  arch/x86/kvm/pmu.o'; ...
21400	gcc -Wp,-MD,kernel/.extable.o.d ...
21403	gcc -Wp,-MD,arch/x86/kvm/.pmu.o.d .
21405	/bin/sh -c set -e; 	   echo '  CC      arch/x86/kernel/probe_roms.o'; ...
21407	gcc -Wp,-MD,arch/x86/kernel/.probe_roms.o.d ...
21413	/bin/sh -c set -e; 	   echo '  CC      fs/inode.o'; ...
21415	/bin/sh -c set -e; 	   echo '  CC      arch/x86/mm/srat.o'; ...
21418	/bin/sh -c set -e; 	   echo '  CC      security/keys/keyctl.o'; ...
------------------------------------------------------------------------------

This group will contain a lot of processes that take a lot of resources, but the scheduler knows they belong to the same group. If somebody logs in to my machine and starts running folding@home we would have two cgroups trying to use 100% of the CPU, so the scheduler would assign 50% to one, and 50% to the other, even though the first one has many more processes. Without the grouping, the scheduler would be unfair against folding@home, giving it as much time as it gives each one of the compilation processes.
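For the curious, a minimal sketch of that kind of script could look like this (an approximation, not the exact script I used; it reads each process' /proc/<pid>/autogroup, which exists when CONFIG_SCHED_AUTOGROUP is enabled):

groups = Hash.new { |h, k| h[k] = [] }

Dir.glob('/proc/[0-9]*').each do |dir|
  pid = File.basename(dir).to_i
  begin
    autogroup = File.read('%s/autogroup' % dir).split.first
    cmdline = File.read('%s/cmdline' % dir).tr("\0", ' ').strip
    next if cmdline.empty? # skip kernel threads
    groups[autogroup] << [pid, cmdline]
  rescue Errno::ENOENT, Errno::EACCES
    # the process went away, or we are not allowed to look
  end
end

groups.each_value do |procs|
  puts '-' * 78
  procs.each { |pid, cmd| puts "#{pid}\t#{cmd}" }
end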

All this without you moving a finger. Well, almost: since the autogrouping is per session, our init should put each daemon it launches into its own session, which is just a setsid() away.

def start(id, cmd)
  pid = fork do
    Process.setsid()
    exec(*cmd)
  end
  $daemons[id] = pid
end

Socket activation

systemd has made a lot of fuss about socket activation, and how it's the next best thing since sliced bread. I agree it's a great idea, but the idea didn't come from systemd; AFAIK it came from OS X's launchd. But do we need systemd to get the same on Linux?

def start_with_socket(id, stream, cmd)

  server = TCPServer.new(stream)

  Thread.new do
    loop do
      socket = server.accept
      system(*cmd, :in => socket, :out => socket)
    end
  end

end

start_with_socket('sshd', 22, %w[/usr/bin/sshd -i])

Believe it or not, this simple code achieves socket activation. We create a socket and a new thread that waits for connections. If nobody connects, nothing happens and we just have an idle thread; each time somebody connects, we launch sshd -i, which as far as I can tell is the same thing xinetd does, and systemd too.

But hey, this is the simple socket activation, it’s not the really fancy one.

Thread.new do
  if managed
    IO.select([server])
    pid = fork do
      env = {}
      env['LISTEN_PID'] = $$.to_s
      env['LISTEN_FDS'] = 1.to_s
      Process.setsid()
      exec(env, *cmd, 3 => server)
    end
    $daemons[id] = pid
  else
    loop do
      socket = server.accept
      system(*cmd, :in => socket, :out => socket)
    end
  end
end

There, this does exactly the same thing as systemd (at least for one socket, multiple ones are easy too), so yeah, we have socket activation.
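Passing more than one socket is mostly bookkeeping. A sketch (an assumption on my part, following the same LISTEN_FDS convention, where the inherited file descriptors start at 3):

def start_with_sockets(id, ports, cmd)
  servers = ports.map { |p| TCPServer.new(p) }
  Thread.new do
    IO.select(servers)
    pid = fork do
      env = { 'LISTEN_PID' => $$.to_s, 'LISTEN_FDS' => servers.size.to_s }
      fds = {}
      servers.each_with_index { |s, i| fds[3 + i] = s }
      Process.setsid()
      exec(env, *cmd, fds)
    end
    $daemons[id] = pid
  end
end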

But wait, there’s more

Hopefully this covers the basics of what an init system should do, and shows that it's not rocket science, nor voodoo. It is actually something very straightforward: start the system, keep it running; simple. Of course there are many other things an operating system should do, but those things don't belong in the init system, don't let anyone tell you otherwise.

I have more changes on top of this that bring my little toy init system almost up to par with Arch Linux's initscripts, which is what they used before moving to systemd, so chances are that if you use my init, you would have little to no problems on your own system.

Unlike systemd and others, this code is actually very readable, so you can add and remove code as you like very easily, and of course, the less code you have, the faster you boot.

Personally when I hear somebody saying “Oh! but OpenRC doesn’t have socket activation, we need systemd!”, I just roll my eyes.

If you want to give it a try, get the code from GitHub:

https://github.com/felipec/finit

Cheers.


Announcing git-fc; a friendly fork of Git

I'll start with the obvious question: why a fork? Well, the short answer is: my patches are not being applied. The long answer is convoluted and would require a long explanation of how Git development works, its principles and guidelines, but more importantly the culture of the core developers, and I'm not going to get into that; maybe in the comments section if somebody is interested.

So what is git-fc? It is a friendly fork, and by that I mean that it’s a fork that won’t deviate from the mainline, it is more like a branch in Git terms. This branch will move forward close to Git’s mainline, and it could be merged at any point in time, if the maintainer wished to do so.

git-fc doesn't include experimental code, or half-assed features, so you can expect the same level of stability as Git's mainline. Also, it doesn't remove any features, or make any backwards-incompatible changes, so you can replace git with git-fc and you wouldn't notice the difference. The delta comes in the extra features that I'll describe in detail below, that is all.

Who am I? I’ve contributed many patches to Git, mainly the git-remote-hg/bzr two-way bridges, but many many other things. Here’s a list of the top 10 contributors to Git since last year by number of patches:

% git shortlog --since='1 year ago' --no-merges -n -s | head -n 10
   388	Junio C Hamano
   308	Felipe Contreras
   230	Jeff King
   161	Nguyễn Thái Ngọc Duy
   122	Michael Haggerty
   103	Ramkumar Ramachandra
    96	John Keeping
    69	Eric Sunshine
    59	Thomas Rast
    51	René Scharfe

More info in ohloh.

As you see, I’ve done a lot of work for Git’s mainline, so chances are you have already benefited from my code one way or the other.

However, the most interesting patches are not merged. I wrote a summary of my 160 patches, explaining their status, so Git developers would prioritize them, but I think it’s fair to say they are just not going to apply them.

So, what do you get if you use git-fc?

@ shortcut

Many people have suggested a shortcut for the not-particularly-intuitive "HEAD", but none of these suggestions seemed very appealing, or feasible.

Because Git already has a ref@op revision syntax, where if you remove the ref, HEAD is implied, I thought @ could be thought of as HEAD.

This change was welcomed and accepted by the Git mainline, and it was even on track for v1.8.4, but it was dropped at the last minute because of some issues that are fixed now, so you will probably see it in v1.8.5. But why wait? :)
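In practice, anywhere you would type HEAD you can now simply type @:

git show @      # same as 'git show HEAD'
git log @{u}..@ # what I have that the upstream doesn't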

Nice ‘branch -v’

If you have configured the upstream tracking branch for your branches (I wrote a blog post about them), when you do ‘git branch -v’ you see something like this:

  fc/branch/fast      177dcad [ahead 2] branch: reorganize verbose options
  fc/stage            abb6ad5 [ahead 14] completion: update 'git reset' ...
  fc/transport/improv eb4d3c7 [ahead 10] transport-helper: don't update ...

While that provides useful information, it doesn't show the upstream tracking branch; it just says "ahead 2", but "ahead 2" compared to what?

If you do ‘git branch -vv’, then you see the answer:

  fc/branch/fast      177dcad [master: ahead 2] branch: reorganize ...
  fc/stage            abb6ad5 [master: ahead 14] completion: update ...
  fc/transport/improv eb4d3c7 [master: ahead 10] transport-helper: don't ...

Unfortunately both options take a lot of time (relative to most Git commands, which are instantaneous), because computing the "ahead 2" part is what takes so long. So I decided to switch things around, so that 'git branch -v' gives you:

  fc/branch/fast      177dcad [master] branch: reorganize verbose options
  fc/stage            abb6ad5 [master] completion: update 'git reset' new ...
  fc/transport/improv eb4d3c7 [master] transport-helper: don't update refs ...

And it does so instantaneously.

Default aliases

Many (if not all) version control system tools have shortcuts for their most common operations; hg ci, svn co, cvs st. But not Git. You can configure your own aliases manually, but you might have some trouble if you use somebody else’s machine.

Adding default aliases is trivial, it helps everyone, and it doesn’t hurt anyone, yet the patch to do so was rejected.

For now, there are only four aliases, but more can be added later if they are requested.

co = checkout
ci = commit
rb = rebase
st = status

If you already have these aliases, or have them mapped to something else, your aliases will take precedence over the default ones, so you won't have any problems.

Streamlined remote helpers

I have spent a lot of time working on git-remote-hg and git-remote-bzr, and although they are relatively new, they have proven to be quite stable and solid, yet they are only part of the “contrib” area side by side with much simpler and way less solid scripts.

In order to use these with Git mainline you might need a bit of tinkering, and it's not straightforward to package them for distributions.

With git-fc they are installed by default, and in the right way, making things easier for distributions.

Improvements to the transport helper

The two-way bridges between Git and Mercurial/Bazaar already work quite well, but they lack some features; specifically, you cannot use --force or --dry-run, or an old:new refspec. If you are not familiar with the old:new refspec: you can do 'git push origin master:my-master', which would push your 'master' branch as if it were named 'my-master' in the remote repository.

This is extremely useful if you are really serious about using Git as a transparent client to access a Mercurial repository.

New core.mode configuration

Git is already preparing users for the v2.0 release, which will bring minor backward-compatibility breakage, but some people would rather get rid of the warnings (which will probably stay around for many more releases) and just move to the new behavior already.

Testing Git v2.0 behavior today would not only help git-fc, but also the Git mainline, and you can do that by setting core.mode = next, so if you do this and provide feedback about any issues, that would be greatly appreciated. Unfortunately you cannot test the v2.0 behavior in Git mainline because they rejected the patches, but you can in git-fc.
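That is:

git config --global core.mode next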

Please note that the v2.0 behavior might change in the future, before v2.0 is released, so if you enable this mode you need to be aware of that. Chances are you are not going to notice any difference anyway.

In addition to the "next" (v2.0) mode, there's the "progress" mode. This mode enables "next" plus other configurations that have been proposed as defaults for v2.0, but haven't been agreed on yet.

In particular, you get these:

merge.defaulttoupstream = true
branch.autosetupmerge = always
mergetool.prompt = false

There might be more in the future, and suggestions are welcome.

It is recommended that you set up this mode for git-fc:

git config --global core.mode progress

Non-ff pulls rejected by default

Even in the Git project everybody has agreed this is the way to go, in order to avoid the typical Git newbie mistake of doing a merge when perhaps (s)he wanted to do a git reset, or a git rebase. With this change git complains that a non-fast-forward branch is being pulled, so the user has to decide what to do.

The user would have to do either 'git pull --merge' or 'git pull --rebase', the former being what Git mainline currently does.

The user can of course choose the old behavior, which is easy to configure:

git config --global pull.mode merge

Official staging area

Everybody already uses the term "staging area", and Git developers have also agreed it is the best term for what is officially referred to as "the index". So git-fc has new options for all the commands that modify the staging area (e.g. git grep --staged, git rm --staged), and also adds a new git stage command that makes it easier to work with the staging area.

'git stage' [options] [--] [<paths>...]
'git stage add' [options] [--] [<paths>...]
'git stage reset' [-q|--patch] [--] [<paths>...]
'git stage diff' [options] [<commit>] [--] [<paths>...]
'git stage rm' [options] [--] [<paths>...]
'git stage apply' [options] [--] [<paths>...]
'git stage edit'

Without any command, git stage adds files to the stage, same as git add, same as in Git mainline.

New fetch.default configuration

When you have configured the upstream tracking branch for all your branches, you will probably have tracking branches that point to a local branch, for example feature-a pointing to master, in which case you would get something like:

% git fetch
From .
 * branch            master     -> FETCH_HEAD

Which makes absolutely no sense, since the ‘.’ repository is not even documented, and FETCH_HEAD is a marginally known concept. In this case git fetch is basically doing nothing from the user’s point of view.

So the user can configure fetch.default = simple to get a simple sensible default; ‘git fetch‘ will always use origin by default, which is not ideal for everyone, but it’s better than the current alternative.
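In other words:

git config --global fetch.default simple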

If you use the “progress” mode, this option is also enabled.

Publish tracking branch

Git mainline doesn't have the greatest support for triangular workflows; a good solution for that is to introduce a second "upstream" tracking branch for the reverse direction: the branch you normally push to.

Say you clone a repository (libgit2) from GitHub, then create a branch (feature-a) and push it to your personal repository; you would want to track two branches (origin/master and mine/feature-a), but Git mainline only provides support for a single upstream tracking branch.

If you set up your upstream tracking branch to origin/master, then you can just do git rebase without arguments and git will pick the right branch (origin/master) to rebase to. However, git push by default will also try to push to origin/master, which is not what you want. Plus git branch -v will show how ahead/behind your branch is compared to origin/master, not mine/feature-a.

If you set up your upstream to mine/feature-a, then git push will work, but git rebase won’t.

With this option, git rebase uses the upstream branch, and git push uses the publish branch.

Setting the publish tracking branch is easy:

git push --set-publish mine feature-a

Or:

git branch --set-publish mine/feature-a

And git branch -v will show it as well:

fc/branch/fast      177dcad [master, gh/fc/branch/fast] branch: ...
fc/stage            abb6ad5 [master, gh/fc/stage] completion: ...
fc/transport/improv eb4d3c7 [master, gh/fc/transport/improv] ...

Support for Ruby

By far the most complex and interesting feature, but unfortunately also the one that is not yet 100% complete.

There is partial optional support for Ruby. Git already has tooling so that any language can use its plumbing and achieve plenty of tasks:

IO.popen(%w[git for-each-ref]) do |io|
  io.each do |line|
    sha1, kind, name = line.split()
    # stuff
  end
end

However, this a) requires a process fork, and b) requires I/O communication to get the desired data. While this is not a big deal on many systems, it is on Windows, where forks are slow and many Git core programs don't work as well as they do on Linux.

Git has a goal to replace all the core scripts with native C versions, but it’s a goal only in name that is not actually pursued. In addition, that still leaves out any third party tools since Git doesn’t provide a shared libgit library, which is why an independent libgit2 was needed in the first place.

Ruby bindings solve these problems:

for_each_ref() do |name, sha1, flags|
  # stuff
end

The command 'git ruby' can run scripts like this one by providing bindings for many of Git's internal C functions (though not all), which makes it easier to write Ruby programs that take full advantage of Git without any need for forks, or I/O communication.
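The exact invocation may differ (take this as an assumption rather than documentation), but the idea is that you hand your script to the new command:

git ruby my-script.rb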

Conclusion

As you might guess, I’ve spent a lot of time working on all these features, plus all the ones that are already merged in Git’s mainline. Hopefully they are useful to some people.

It’s easy to compile and install:

make install

By default git will be installed in your home directory, but you can also do what I do: ‘make prefix=/opt/git install‘, and add ‘/opt/git/bin’ to your $PATH. All you need is a few development packages; zlib, curl, expat, openssl.

The code is on GitHub, the home page is on Google Code, and the mailing list on Google Groups. All comments and patches are welcome.

You can find future comments and releases in this blog, under the git-fc tag.


The Linux way; never ever break user experience

Through the years it has become more and more obvious to me that there are two camps in open source development, and one camp is not even aware of how the other camp works (or succeeds, rather), often to their own detriment. This was blatantly obvious in Miguel de Icaza's blog post What Killed the Linux Desktop, in which he accused Linus Torvalds of setting the attitude of breaking APIs to the developers' heart's content, without even realizing that they (Linux kernel developers) do the exact opposite of what he claimed; Linux never ever breaks user-space API. This triggered a classic example of the many thorny discussions between the two camps, which illustrates how one side doesn't have a clue of how the other side operates, or even that there's an entirely different way of doing things. I will name the camps "The Linux guys" (even though they don't work strictly on the Linux kernel), and "The user-space guys", which is the people that work on user-space, GNOME being one of the peak examples.

This is not an attempt to put people in two black-and-white categories, there’s a spectrum of behaviors, and surely you can find people in a group with the mentality of the other camp. Ultimately it’s you the one that decides if there’s a divide in attitudes or not.

The point of this post is to explain the number one rule of Linux kernel development; never ever break user experience, why that is important, and how far off the user-space camp is.

When I say “Linux” I’m referring to the Linux kernel, because that’s the name of the kernel project.

The Linux way

There are many exhaustive details of what makes the Linux kernel development work on a day-to-day basis, and many reasons for why it is the way it is, and why it’s good and desirable to work in that way. But for now I will simply concentrate on what it is.

Never ever break user experience. This point cannot be stressed enough. Forget about development practices, forget about netiquette, forget about technical competence… what is important is the users, and to never break their expectations. Let me quote Linus Torvalds:

The biggest thing any program can do is not the technical details of the program itself; it’s how useful the program is to users.

So any time any program (like the kernel or any other project), breaks the user experience, to me, that’s the absolute worst failure that a software project can make.

This is a point that is so obvious, and yet many projects (especially big ones) often forget: they are nothing without their user-base. If you start a small project all by yourself, you are painfully aware that in order for your project to succeed, you need users. Without users you cannot get more developers, and your project could very well disappear from the face of the Earth and nobody would notice. But once your project is big enough, one or two users complaining about something start being less of an issue; in fact, hundreds of them might be insignificant, and at some point you lose any measure of what percentage of users are complaining. This problem might grow to the point that developers say "users don't know what they want" in order to ignore the importance of users, and their needs.

But that’s not the Linux way; it doesn’t matter if you have one user, or ten, or millions, your project still succeeds (or fails) because of the same reason; it’s useful to some people (or not). And if you break user experience, you risk that usefulness, and you risk your project being irrelevant for your users. That is not good.

Of course, there are compromises; sometimes you can do a bit of risk analysis: OK, this change might affect 1% of our current users, and the change would be kind of annoying, but it would make the code so much more maintainable; let's go for it. And it's all about where you draw the line. Sometimes it might be OK to break user experience, if you have good reasons for it, but you should really try to avoid it, and if you go forward, provide an adjustment period, a configuration option for the old behavior, and even involve your users in the whole process to make sure the change is indeed needed, and their discomfort is minimized.

At the end of the day it's all about trust. I use project X not only because it works for me, but because I trust that it will keep working for me in the years to come. If for some reason I expected it to break next year, I might be better off looking right now for something else that I trust I could keep relying on indefinitely, rather than having project X break on me while I'm in the middle of a deadline and I don't have time for their shenanigans.

Obvious stuff, yet many projects don't realize it. One example is when the udisks2 project felt they should change the mount directories from `/media/foo` to `/run/media/$user/foo`. What?! I'm in the middle of something important, and all of a sudden I can't find my disks' content in /media? I had to spend a considerable amount of time until I found the reason: no, udisks2 didn't have a bug; they introduced this change willingly and knowingly. They didn't give any deprecation warning while they moved to the new location, they didn't have an option to keep the old behavior, they just moved it, with no explanation, in one single commit (here), from one version to the next. Am I going to keep using their project? No. Why would I? Who knows when the next time will be that they decide to break some user experience unilaterally, without deprecation warnings or anything? The trust is broken, and many others agree.

How about the Linux kernel? When was the last time your Linux kernel failed you in some way that was not a bug, but where the developers knowingly and willingly broke things for you? Can't think of any? Me neither. In fact, people often forget about the Linux kernel, because it just works. The external drivers (like NVIDIA's or AMD's) are not a problem of the kernel, but of the drivers themselves, as I will explain later on. You have people bitching about all kinds of projects, and threatening forks, and complaining about the leadership, and whatnot. None of that happens with the Linux kernel. Why? Because it just works. Not for me, not for 90% of the users; for everybody (or 99.99% of everybody).

Because they never ever break user experience. Ever. Period.

The deniers

Miguel de Icaza, after accusing Linus of not maintaining a stable ABI for drivers, went on to argue that it was the kernel developers' fault for spreading attitudes like:

We deprecated APIs, because there was a better way. We removed functionality because “that approach is broken”, for degrees of broken from “it is a security hole” all the way to “it does not conform to the new style we are using”.

What part of "never ever break user experience" didn't Icaza understand? He seems to be talking only about the internal API, which does change all the time in the Linux kernel, and which has never had any resemblance of a promise that it wouldn't (thus the "internal" part), while ignoring the public user-space API, which indeed never breaks, which is why you, as a user, don't have to worry about your user-space not working on Linux v3.0, or Linux v4.0. How can he not see that? Is Icaza blind?

Torvalds:

The gnome people claiming that I set the “attitude” that causes them problems is laughable.

One of the core kernel rules has always been that we never ever break any external interfaces. That rule has been there since day one, although it’s gotten much more explicit only in the last few years. The fact that we break internal interfaces that are not visible to userland is totally irrelevant, and a total red herring.

I wish the gnome people had understood the real rules inside the kernel. Like “you never break external interfaces” – and “we need to do that to improve things” is not an excuse.

Even after Linus Torvalds and Alan Cox explained to him how the Linux kernel actually works in a Google+ thread, he didn’t accept anything.

Lennart Poettering, being face to face with both (Torvalds and Cox), argued that this mantra (never break user experience) wasn't actually followed (video here). Yet at the same time his software (the systemd+udev beast) was recently criticized for knowingly and willingly breaking user experience by making the boot hang for 30 seconds per device that needed firmware. Linus' reply was priceless (link):

Kay, you are so full of sh*t that it’s not funny. You’re refusing to
acknowledge your bugs, you refuse to fix them even when a patch is
sent to you, and then you make excuses for the fact that we have to
work around *your* bugs, and say that we should have done so from the
very beginning.

Yes, doing it in the kernel is “more robust”. But don’t play games,
and stop the lying. It’s more robust because we have maintainers that
care, and because we know that regressions are not something we can
play fast and loose with. If something breaks, and we don’t know what
the right fix for that breakage is, we *revert* the thing that broke.

So yes, we’re clearly better off doing it in the kernel.

Not because firmware loading cannot be done in user space. But simply
because udev maintenance since Greg gave it up has gone downhill.

So you see, it's not that GNOME developers understand the Linux way and simply disagree that it's the way they want to go; it's that they don't even understand it, even when it's explained to them directly, clearly, face to face. This behavior is not exclusive to GNOME developers; udisks2 is another example, and there are many more, though probably not as extreme.

More examples

Linus Torvalds gave Kay a pretty hard time for knowingly and willingly introducing regressions, but does Linux fare better? As an example I can think of a regression I found with Wine; after realizing the problem was in the kernel, I bisected the commit that introduced the problem and notified the Linux developers. If this was udev, or GNOME, or any other crappy user-space software, I know what their answer would be: Wine is doing something wrong, Wine needs to be fixed, it's Wine's problem, not ours. But that's not Linux; Linux has a contract with user-space and they never break user experience, so what they did was revert the change, even though it made things less than ideal on the kernel side; that's what was required so that you, the user, don't experience any breakage. The LKML thread is here.

Another example is what happened when Linux moved to 3.0: some programs expected a 2.x version, or even 2.6.x. These programs were clearly buggy, as they should have checked that the version was greater than 2.x, but the bugs were already there, people didn't want to recompile their binaries, and some might not even be able to. It would be stupid for Linux to report 2.6.x when in fact it's 3.x, but that's exactly what they did: they added an option so the kernel would report a 2.6.x version, so users would have the option to keep running these old buggy binaries. Link here.

Now compare the switch to Linux 3.0, which was transparent and as painless as possible, to the move to GNOME 3. There couldn't be a more perfect example of blatant disregard for the current user experience. If your workflow doesn't work correctly in GNOME 3… you have to change your workflow. If GNOME 3 behaves almost as you would expect, but you only need a tiny configuration change… too bad. If you want to use GNOME 3 technology, but you would like a grace period during which you can still use the old interface while you adjust to the new one… sucks to be you. In fact, it's really hard to think of any way in which they could have increased the pain of moving to GNOME 3. And when users reported their user experience broken, the talking points were not surprising: "users don't know what they want", "users hate change", "they will stop whining in a couple of months". Boy, they sure value their users. And now they are going after middle-click copy.

If you have more examples of projects breaking user experience, or preserving it, feel free to mention them in the comments.

No, seriously, no regressions

Sometimes even Linux maintainers don’t realize how important this rule is, and in such cases, Linus doesn’t shy away from explaining it to them (link):

Mauro, SHUT THE FUCK UP!

It’s a bug alright – in the kernel. How long have you been a
maintainer? And you *still* haven’t learnt the first rule of kernel
maintenance?

If a change results in user programs breaking, it’s a bug in the
kernel. We never EVER blame the user programs. How hard can this be to
understand?

> So, on a first glance, this doesn’t sound like a regression,
> but, instead, it looks tha pulseaudio/tumbleweed has some serious
> bugs and/or regressions.

Shut up, Mauro. And I don’t _ever_ want to hear that kind of obvious
garbage and idiocy from a kernel maintainer again. Seriously.

I’d wait for Rafael’s patch to go through you, but I have another
error report in my mailbox of all KDE media applications being broken
by v3.8-rc1, and I bet it’s the same kernel bug. And you’ve shown
yourself to not be competent in this issue, so I’ll apply it directly
and immediately myself.

WE DO NOT BREAK USERSPACE!

The fact that you then try to make *excuses* for breaking user space,
and blaming some external program that *used* to work, is just
shameful. It’s not how we work.

Fix your f*cking “compliance tool”, because it is obviously broken.
And fix your approach to kernel programming.

And if you think that was an isolated incident (link):

Rafael, please don’t *ever* write that crap again.

We revert stuff whether it “fixed” something else or not. The rule is
“NO REGRESSIONS”. It doesn’t matter one whit if something “fixes”
something else or not – if it breaks an old case, it gets reverted.

Seriously. Why do I even have to mention this? Why do I have to
explain this to somebody pretty much *every* f*cking merge window?

This is not a new rule.

There is no excuse for regressions, and “it is a fix” is actually the
_least_ valid of all reasons.

A commit that causes a regression is – by definition – not a “fix”. So
please don’t *ever* say something that stupid again.

Things that used to work are simply a million times more important
than things that historically didn’t work.

So this had better get fixed asap, and I need to feel like people are
working on it. Otherwise we start reverting.

And no amount “but it’s a fix” matters one whit. In fact, it just
makes me feel like I need to start reverting early, because the
maintainer doesn’t seem to understand how serious a regression is.

Compare and contrast

Now that we have a good dose of examples, it should be clear that the attitudes of the two camps couldn't be more different.

In the GNOME/PulseAudio/udev/etc. camp, if a change in API causes a regression on the receiving end of that API, the problem is in the client, and the “fix” is not reverted, it stays, and the application needs to change, if the user suffers as a result of this, too bad, the client application is to blame.

In the Linux camp, if a change in API causes a regression, Linux has a problem, the change is not a “fix”, it’s a regression and it must be reverted (or otherwise fixed), so the client application doesn’t need to change (even though it probably should), and the user never suffers as a result. To even hint otherwise is cause for harsh public shaming.

Do you see the difference? Which of the two approaches do you think is better?

What about the external API?

Linux doesn't support external modules; if you use an external module, you are on your own. They have good reasons for this: all modules can and should be part of the kernel; this makes maintenance easier for everybody.

Each time an internal API needs to be changed, the person that does the change can do it for all the modules that are using that API. So if you are a company, let's say Texas Instruments, and you manage to get your module into the Linux mainline, you don't have to worry about API changes, because they (Linux developers) will do the updates for you. This allows the internal API to always be clean, consistent, relevant, and useful. As an example of a recent change, Russell King (the ARM maintainer) introduced a new API to set the DMA mask, and in the process updated all users of dma_set_mask() to use the new function dma_set_mask_and_coherent(), and by doing that found potential bugs in many instances. So companies like Intel, NVIDIA, and Texas Instruments benefit from cleaner and more robust code without moving a finger; Russell did it all in his 51-patch series.

In addition, by having all the modules in the same source tree, when a generic API is to be added, it's easy to consider all possible use-cases, because the code is readily available. An example of this is the preliminary Common Display Framework, which takes into consideration drivers from Renesas, NVIDIA, Samsung, Texas Instruments, and other Linaro companies. After this framework is done, all existing display drivers will benefit, and things will be especially easier for future generations of drivers. It's only because of this refactoring that the number of drivers supported by Linux can grow without the amount of code exploding uncontrollably, which is one of the advantages Linux has over Windows, OS X, and other operating systems' kernels.

If companies don't play along in this collaborative effort, as is the case with NVIDIA's and AMD's proprietary drivers, it is to their own detriment, and there's nobody to blame but those companies. Whenever you load one of these drivers, Linux goes immediately into a tainted mode, which means that if you find problems with the Linux kernel, Linux developers cannot help you. It's not that they don't want to help, it's that they might be physically incapable. If a closed-source module has a bug and corrupts memory on the kernel side, there is no way to find that out, and it might show up as some other module, or even the core itself, crashing. So if a Linux developer sees a crash, say, in a wireless driver, but the kernel is tainted, there is only so much he can do before deciding it's not worth his time to investigate an issue that has a good chance of being caused by a proprietary driver.

Thus if a Linux update broke your NVIDIA driver, blame NVIDIA. Or even better, don't use the proprietary driver; use nouveau.

Conclusion

Hopefully after reading this article it is clear to you what the number one rule of Linux kernel development is, why it is a good rule, and why other projects should follow it.

Unfortunately it should also be clear that other projects, particularly those related to GNOME, don’t follow it, and why that causes such backlash, controversy, and forks.

In my opinion there's no hope of GNOME, or any other user-space project, being nearly as successful as Linux if they don't follow this simplest, most important rule. Linux will always keep growing in importance and development power, and these others are forever doomed to forks, nearly identical alternatives, and their developers jumping ship after their trust gets broken. If only they would follow this simple rule, or at least understand it.


Advanced Git concepts; the upstream tracking branch

Probably one of the most powerful and under-utilized concepts in Git is the upstream tracking branch. To be honest, it probably was too difficult to use properly in the past, but not so much any more.

Here I’ll try to explain what it is, and how you can take the most advantage out of it.

Remote tracking branches

Before trying to understand what the upstream tracking branch is, you need to be familiar with remote branches (e.g. origin/master). If you are not, you probably want to read the section about them in the Pro Git book here.

To see all your remote tracking branches, you can use 'git branch --remotes'.

The upstream tracking branch

Even if you have never heard of the concept, you probably already have at least one upstream tracking branch: master -> origin/master. When you clone a repository, the current HEAD (usually 'master') is checked out for you, but it is also set up to track 'origin/master', and thus 'origin/master' is the "upstream" of 'master'.

This has some implications on some Git tools, for example, when you run ‘git status‘ you might see a message like this:

# Your branch is behind 'origin/master' by 1 commit.

Also, if you run ‘git branch -vv‘:

* master 549ca22 [origin/master: behind 1] Add bash_profile

This is useful in order to keep your local branches synchronized with the remote ones, but it’s only scratching the surface.

Once you have realized that your local branch has diverged from the remote one, you will probably want to either rebase or merge, so you might want to do something like:

git rebase origin/master

However, ‘origin/master’ is already configured as the upstream tracking branch of ‘master’, so you can do:

git rebase master@{upstream}

Maybe you think @{upstream} is too much to type, so you can do @{u} instead, and since we are already on ‘master’ we can do HEAD@{u}, or even simpler:

git rebase @{u}

But Git is smarter than that, by default both ‘git merge’ and ‘git rebase’ will use the upstream tracking branch, so:

git rebase

Configuring the upstream branch

So now you know that upstream tracking branches are incredibly useful, but how do you configure them? There are many ways.

By default, when you check out a new branch using a remote branch as the starting point, the upstream tracking branch will be set up automatically.

git checkout -b dev origin/dev

Or:

git checkout dev

If the starting point is a local branch, you can force the tracking by specifying the --track option:

git checkout --track -b dev master

If you already created the branch, you can update only the tracking info:

git branch --set-upstream-to master dev

There's a very similar option called --set-upstream; however, it's not intuitive, and it's now deprecated in favor of --set-upstream-to. To be sure and avoid confusion, simply use -u.

You can also set it up at the same time as you are pushing:

git push --set-upstream origin dev

Finally, you can configure Git so upstream tracking branches are always created, even if you don't specify the --track option:

git config --global branch.autosetupmerge always

Conclusion

So there you have it, go nuts and configure the upstream branch for all your branches ;)


An in-depth analysis of Mercurial and Git branches

I've discussed the advantages of Git over Mercurial many times (e.g. here, and here), and I even created a challenge for Mercurial supporters, but in this blog post I'll try to refrain from passing judgment and concentrate on the actual facts (the key word being try).

Continuing with the full disclosure: I've never actually used Mercurial, at least not on a day-to-day basis where I actually had to get something done. But I've used it plenty of times, testing many different things, precisely to find out how to do things that I can do easily in Git. In addition, I've looked deep into the code to figure out how to overcome some of what I considered limitations of the design. And finally, I wrote Git's official Git<->Mercurial bridge, git-remote-hg (more here).

So, because I’ve spent months figuring out how to achieve certain things in Mercurial, and after talking with the best and the brightest (Git, gitifyhg, hg-git, and Mercurial developers), and exploring the code myself, I can say with a good degree of confidence that if I claim something cannot be done in Mercurial, that’s probably the case. In fact, I invited people from the #mercurial IRC channel in Freenode to review this article, and I invite everyone to comment down below if you think there’s any mistake (comments are welcome).

Git vs. Mercurial branches

Now, I've explained before why I think the only real difference between Git and Mercurial is how they handle branches. Basically: Git branches are all-purpose, all-terrain, while Mercurial has different tools for different purposes, which can almost do as much as Git branches, but not quite.

I thought the only real limitation was that Mercurial branches (or rather bookmarks) didn't have a per-repository namespace. For example: in Git the branch "development" can exist in different repositories and point to different commits, and to tell them apart you can refer to "max/development" (Max's development branch), "sarah/development" (Sarah's), "origin/development" (the central repository's version), and "development" (your own version). In Mercurial you only have "development", and that's it. I consider that a limitation of Mercurial, but feel free to consider it a "difference". But it turns out there's more.

In Git, it's easy to add, remove, rename, and move branches. In Mercurial, bookmarks are supposed to work like Git branches; however, they don't change the basics of how Mercurial works, and in Mercurial it doesn't matter whether a bookmark points to a commit or not, the commit is still there, and completely visible; in Mercurial, each branch can have multiple "heads", with or without a bookmark pointing to them. So in order to remove a bookmark (and its commits), you need to use the "hg strip" command, and to use that command you need to enable the MqExtension. However, that's for local repositories; for remote ones you need to cross your fingers and hope your server has a way to do that (Bitbucket does, through its web UI), but it's possible that there is just no way.
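For reference, locally that dance looks more or less like this (a sketch; the 'mq =' line in your hgrc is what enables the extension that provides strip):

[extensions]
mq =

hg bookmark -d feature-a # delete the bookmark itself
hg strip -r <rev>        # delete the commits it pointed to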

Mercurial advocates often repeat the mantra "history is sacred", and Mercurial's documentation attempts to explain why changing history is hard, which shows why it's hard to remove bookmarks (and their commits); it's just Mercurial's design.

On the other hand, if you want to remove a branch in Git, you can just do "git push origin :feature-a". Whether "history is sacred" or not is left for each project to decide.

Solving divergence

In any version control system, divergence is bound to happen, and in distributed ones even more so. Mercurial and Git solve this problem in very different ways; let's see how by looking at a very simple divergent repository:

Diverged

As you can see, we have a "Fix" in our local branch, but somebody already did an "Update" to this branch in the remote repository. Both Mercurial and Git would barf when you try to push this "Fix" commit, but let's see how to solve it in each.

In Git this problem is called a "non-fast-forward" push, which means that the tip of the remote branch ("Update") is not an ancestor of "Fix", so the branch cannot be fast-forwarded to "Fix". There are three options: 1) force the push (git push --force), which basically means overriding "origin/master" to point to "master", which effectively dumps "Update" 2) merge "Update" and "Fix" and then push 3) rebase "Fix" on top of "Update" and then push. Obviously dropping commits is not a good idea, so either a merge or a rebase is recommended, and both create a new commit that the branch can be fast-forwarded to from "Update".
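In commands, the recommended options boil down to:

git fetch origin
git rebase origin/master   # or: git merge origin/master
git push origin master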

In Mercurial, the problem is called "multiple heads". In Git, "origin/master" and "master" are two different branches, but in Mercurial they are two heads of the same branch. To solve the problem, you can start by running "hg heads", which will show you all the heads of all the branches; in this case "Fix" and "Update" would be the heads of the "default" branch (a.k.a. "master"). Then you also have three options: 1) force the push (hg push --force); although in appearance it looks the same as the Git command, it does something completely different: it pushes the new head to the remote 2) merge and push 3) rebase and push (you need the rebase extension). Once again, the first option is not recommended, because it shifts the burden from one developer to multiple ones. In theory, the developer that is pushing the new commit knows how to resolve the conflicts in case they arise, so (s)he is the one that should resolve them, and not take the lazy way out and shift the burden to other developers.
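The Mercurial counterpart would be along these lines (with the rebase extension you can use 'hg pull --rebase' instead of the merge and commit):

hg pull
hg merge
hg commit -m 'Merged remote head'
hg push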

Either way solves the problem, but Git uses remote namespaces, which I have already shown are useful regardless, while the other approach requires the concept of multiple heads. That is one reason why the concept of "anonymous heads", often cited as a feature Mercurial has over Git, is not really needed.

Mercurial bookmarks and the forced push problem

The biggest issue (IMO) I found with Mercurial bookmarks is how to create them in the first place. The issue is subtle, but it affects Git-like workflows, and especially Git<->Mercurial bridges; either way it's useful for understanding Mercurial's design and behavior.

Suppose you have a very simple repository:

Simple repository

In Git, "feature-a" is a branch, and you can just push it without problems. In Mercurial, if "feature-a" is a bookmark, you can't just push it, because if you do, the "default" branch would have two heads. To push this new bookmark, you need to do "hg push --force". However, this only happens because the commit "Update" has been made; you can push "feature-a" just fine if it points to "Init", and after pushing the bookmark, you can update it to include the "Feature A" commit. The end result is the same, but Mercurial barfs if you try to push the bookmark and the commits at the same time while there's an update on the branch.

There's no real reason why this happens; it's probably baggage from the fact that Mercurial bookmarks are not an integral part of the design, and in fact began as an extension that was merged into the core in v1.8.

To work around this problem in git-remote-hg, I wrote my own simplified version of the push() method that skips the checks for new heads, because in Git there cannot be more than one head per branch. The code still checks that the remote commit of the branch is an ancestor of the new one; if not, you need to do “git push --force”, just like in Git. Essentially, you get exactly the same behavior as with Git branches, but with Mercurial bookmarks.

Fixing Git

All right, I'm done trying to avoid judgement, but to try to be fair, I'll start by mentioning the one (and only one) feature that Git lacks in comparison to Mercurial: finding the branch point of a branch, that is, the point where the branch was created (or rebased onto). It is trivial to figure that out visually, and there are scripts that do a pretty good job of finding it from the topology of the repository, but there are always corner cases where this doesn't work. For more details on the problem and the proposed solutions, check the Stack Overflow question.

Personally I've never needed this, but if you absolutely need it, it's easy to patch Git; I wrote a few patches that implement it:

https://github.com/felipec/git/commits/fc/base

This implements the @{tail} notation, which is similar to the official @{upstream} notation, so you can do something like “development@{tail}”, which points to the commit the “development” branch was created on.
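
For example, with those patches applied you could do something like this (hypothetical usage; the @{tail} notation only exists in that branch):

# show only the commits made on "development" since it was branched off
git log development@{tail}..development

# or rebase exactly those commits onto the updated "master"
git rebase --onto master development@{tail} development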

If this was really needed, the patches could be merged to upstream Git, but really, it’s not.

Fixing Mercurial

On the other hand, fixing Mercurial wouldn't be that easy:

  1. Support a remote ‘hg strip’. Just like Git can easily delete remote commits, Mercurial should be able to.
  2. Support remote namespaces for bookmarks. Being able to see where “sarah/development” points to is an invaluable feature.
  3. Improve bookmark creation, so the user doesn't need to force the push depending on the circumstances.

Thanks to git-remote-hg, you can get 2) and 3) by using Git to work with Mercurial repositories; unfortunately, there's nothing anybody can do about 1): it's something that has to be fixed in Mercurial's core.
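
To make the contrast of 1) concrete, here is a rough sketch (the branch names are made up): Git can rewrite or delete remote history, while Mercurial's strip only ever touches the local clone.

# Git: rewrite or delete remote history
git push --force origin master    # replace the remote branch with our version
git push origin :topic            # delete the remote "topic" branch entirely

# Mercurial: strip is local only (and requires the strip/mq extension)
hg strip -r tip                   # removes the commit from this clone, not from the remote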

Conclusion

I often hear people say that what you can achieve with Git, you can achieve with Mercurial, and vice versa, and that at the end of the day it's a matter of preference, but that's not true. Hopefully after reading this blog post you are able to distinguish what can and cannot be done in each tool.

And again, as usual, all comments are welcome, so if you see a mistake in the article, by all means point it out.

Cheers.

What’s new in Git v1.8.4 Mercurial bridge

Git v1.8.4 has been released, and git-remote-hg received a lot of good updates; here's a summary of them:

Precise branch tracking

Git is able to find out and report if a branch is new, if it needs to be updated, if it can be fast-forwarded or not, etc.

   b3f6f3a..c0d1c89  master -> master
 * [new branch]      new -> new
 ! [rejected]        bad -> bad (non-fast-forward)
 ! [rejected]        updated -> updated (fetch first)

Unfortunately, Mercurial’s code doesn’t make this easy (you can’t just push new bookmarks), but it has been worked around by writing a custom push() method.

In addition, if you use my patched version of Git (here), you can also use --force and --dry-run when pushing.

+ 51c1c5f...0faf0ed bad -> bad (forced update)

In short, git-remote-hg now makes interacting with Mercurial repositories exactly the same as interacting with Git ones, except for deleting remote branches (Mercurial just cannot do that).

Shared repository

One of the most useful features of Git (and one Mercurial doesn't have) is remote namespaces, so you can easily track “max/development”, “sarah/development”, etc. However, to properly track multiple Mercurial repositories, git-remote-hg needs to create a clone of the Mercurial repository, and if the repository is a big one, having multiple unrelated clones wastes a lot of space.

The solution is to use the Mercurial share extension, which is not really an extension, as it's part of the core (although it can only be used by activating it as an extension), so you can add as many Mercurial remotes as you want, and they will all share the same object store.
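
In practice that means you can do something like the following (the URLs are made up; git-remote-hg remotes use the hg:: prefix):

git remote add max   hg::https://hg.example.org/max/project
git remote add sarah hg::https://hg.example.org/sarah/project
git fetch --all      # both remotes share a single underlying Mercurial store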

Use SHA-1s to identify revisions

Previously, Mercurial revisions were stored as revision numbers (e.g. the tenth commit is stored as 10), which means that if history is rewritten there's no way to tell that the revision changed, so the Git commit wouldn't change either (as it's cached).

By using SHA-1s, Mercurial revisions are always tracked properly.
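
A contrived example of why revision numbers make poor cache keys (the hashes are made up; “hg strip” requires the strip/mq extension):

hg log -r 10 --template '{node|short}\n'
# 0123abcd4567
hg strip -r 10                             # rewrite history
hg commit -m 'new work'
hg log -r 10 --template '{node|short}\n'
# 89ef0123cdab  <- same revision number, different commit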

Properly update bookmarks

Previously, Mercurial bookmarks were only fetched once; this is now fixed so they are always updated.

All extensions are loaded

This way, any extensions the user has configured will affect git-remote-hg; for example, the keyring extension.

Make sure history rewrites get updated

Before, Git would complain that a non-fast-forward update happened; not any more.

Always point HEAD to “default”

Mercurial properly reports which is the current branch and bookmark, but only for local repositories. To get rid of the mismatch, we always point HEAD to “default”.

Don’t force bookmark updates

We were inadvertently forcing the update of bookmarks, effectively overriding the previous one even if the update was not a fast-forward.

Use Git author for lightweight tags

Unannotated tags don't have an author in Git, but one is needed for Mercurial, so instead of providing an empty author, we use the one configured for Git.

Fix replacing a file with a directory

What’s next

There are a few features missing, and they might not land in upstream Git any more, but:

Support for revision notes

This feature allows showing the Mercurial revision as Git notes:

commit 6c88a31540012991de3add247a958fd83531256f
Author: Felipe Contreras 
Date:   Fri Aug 23 13:00:30 2013 -0500

    Test

Notes (hg):
    e392886b34c2498185eab4301fd0e30a888b5335

If you want to have the latest fixes and features, you need to use my personal repository:

https://github.com/felipec/git

Unfortunately, you not only need the Python script, but also to build Git itself to get all the benefits (like push --force and --dry-run).

Wiki

Also, there's more information and detailed instructions on how to install and configure this remote helper in the wiki:

https://github.com/felipec/git/wiki/git-remote-hg

I'm now quite confident git-remote-hg is by far the best bridge between Git and Mercurial; here's a comparison between it and other projects.

Enjoy :)

What it takes to improve Git or: How I fixed zsh completion

I've used Git since pretty much day one, and I use it all the time, so it's important to me that typing Git commands is quick and efficient. I use zsh, which I believe is way superior to bash; unfortunately, I found many issues with its Git completion.

In this blog post I will try to guide you through the ordeal, from how I identified the problem to how I ended up fixing it years later, for everyone's benefit.

The issue

I work on the Linux (kernel) source tree from time to time, and I noticed that sometimes completion took a long, looong time. Specifically, I found that typing ‘git show v’ took several seconds to complete.

I decided to bring that issue up to the zsh developers, and it caused a lot of fuss. I won't go into every detail of the discussion, but long story short: they were not going to fix the issue because of their uncompromising principles; correctness over functionality, even if very few people use that correctness, and the functionality is almost completely broken, to the point that the completion is not usable in certain cases. I argued that completion is meant to make typing commands more efficient, and if completing a command takes longer than typing it manually would, the completion is failing its purpose. I thought any sane person would see the problem with that, but apparently I was wrong (or was I?).

Fortunately zsh has bash completion emulation, so it's possible to use Git's official bash completion in zsh. You lose some of the features of zsh completion, but it works very efficiently (‘git show v’ was instantaneous).
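
The usual recipe looks something like this (the path to the script is an assumption; it ships as contrib/completion/git-completion.bash in Git's source tree):

# in ~/.zshrc
autoload -U +X compinit && compinit
autoload -U +X bashcompinit && bashcompinit
source ~/git-completion.bash     # Git's official bash completion script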

Unfortunately, zsh's bash emulation and zsh's bash completion emulation (two different things) are not perfect, so some workarounds were needed in Git's bash completion script, and those workarounds were not working properly by the time I started to use that completion, so that's when my involvement began.

Fixing the bridge

Each time I found a bug, I tried to fix it in Git (patch), and made sure the zsh folks fixed it on their side too (commit), so that eventually no workarounds would be needed and everything would work correctly.

The completion worked for the most part, but with workarounds, and not quite as well as bash's. So I decided to fix zsh's bash completion emulation once and for all. After my patches were applied by the zsh developers, Git's official completion worked much closer to how it did in bash, but there were still minor issues.

Moreover, Git's bash completion was constantly changing, and it was only a matter of time before some change broke zsh's completion, so I decided to get involved, understand the code, and simplify it to minimize that possibility (e.g. d79f81a, 583e4d5). I saw a lot of areas of improvement, but in order to make sure nothing got broken in the process of simplification, I thought it would make sense to have some tests (5c293a6). Git's testing framework is as powerful as it is simple, so it was a pleasure to write those tests. Eventually the completion tests were good enough that I became confident in changing a lot of the completion code.

At the same time I realized most of zsh’s bash completion emulation code was not needed at all, so I wrote a very small version of it that only worked with Git’s completion. The result was very simple, and it worked perfectly, yet it could be even simpler, if only I could simplify Git’s completion even more.

The culmination of that work was the creation of __git_complete (6b179ad), a helper that has nothing to do with zsh, but that solved a long-standing problem with Git completion and aliases. It's not worth going into details about what the problem was, and why it received so much push-back from Git developers (mostly because of naming issues); what is important is that I implemented it with a wrapper function, a wrapper function that was *exactly* what my simple zsh completion wrapper needed.
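
As a side note, the helper is also what you use to give completion to plain shell aliases; something along these lines (the function names come from git-completion.bash):

alias gco='git checkout'
__git_complete gco _git_checkout   # make the alias complete like "git checkout"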

Now that everything was in place, the final wrapper script ended up very small and simple (c940786); it didn't have any of the bugs zsh's bash completion emulation had, and it was under the full control of the Git project, so it could be improved later on.

Finally, I had Git completion in zsh that worked *perfectly*; it worked exactly the same as it did on bash. But that was not enough.

Now that Git completion worked just like in bash, it was time to implement some extras. zsh completion is extremely powerful, and does things bash cannot even dream of doing, and with my custom wrapper, it was possible to have the best of both worlds, and that’s exactly what I decided to do (4911589).

(Screenshot: the new zsh completion in action)

Finally

So there it is: after years of work, several hundreds of mails, and tons of patches through different iterations, Git now has nice zsh completion that not only works as efficiently as in bash, without any difference, but in fact has even more features.

If you want to give it a try, just follow the instructions in contrib/completion/git-completion.zsh.
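
Roughly, the setup boils down to copying the script where zsh looks for completion functions (the exact paths are an assumption):

mkdir -p ~/.zsh
cp contrib/completion/git-completion.zsh ~/.zsh/_git
# and in ~/.zshrc, before compinit runs:
#   fpath=(~/.zsh $fpath)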

;)

For the record, here's the full list of patches:

Felipe Contreras (54):
      git-completion: fix regression in zsh support
      git-completion: workaround zsh COMPREPLY bug
      completion: work around zsh option propagation bug
      completion: use ls -1 instead of rolling a loop to do that ourselves
      completion: simplify __gitcomp and __gitcomp_nl implementations
      tests: add initial bash completion tests
      completion: simplify __gitcomp_1
      completion: simplify by using $prev
      completion: add missing general options
      completion: simplify __git_complete_revlist_file
      completion: add new __git_complete helper
      completion: rename internal helpers _git and _gitk
      completion: add support for backwards compatibility
      completion: remove executable mode
      completion: split __git_ps1 into a separate script
      completion: fix shell expansion of items
      completion: add format-patch options to send-email
      completion: add comment for test_completion()
      completion: standardize final space marker in tests
      completion: simplify tests using test_completion_long()
      completion: consolidate test_completion*() tests
      completion: refactor __gitcomp related tests
      completion: simplify __gitcomp() test helper
      completion: add new zsh completion
      completion: start moving to the new zsh completion
      completion: fix warning for zsh
      completion: add more cherry-pick options
      completion: trivial test improvement
      completion: get rid of empty COMPREPLY assignments
      completion: add new __gitcompadd helper
      completion: add __gitcomp_nl tests
      completion: get rid of compgen
      completion: inline __gitcomp_1 to its sole callsite
      completion: small optimization
      prompt: fix untracked files for zsh
      completion: add file completion tests
      completion: document tilde expansion failure in tests
      completion; remove unuseful comments
      completion: use __gitcompadd for __gitcomp_file
      completion: refactor diff_index wrappers
      completion: refactor __git_complete_index_file()
      completion: add hack to enable file mode in bash < 4
      completion: add space after completed filename
      completion: remove __git_index_file_list_filter()
      completion: add missing format-patch options
      complete: zsh: trivial simplification
      complete: zsh: use zsh completion for the main cmd
      completion: zsh: don't override suffix on _detault
      completion: cleanup zsh wrapper
      completion: synchronize zsh wrapper
      completion: regression fix for zsh
      prompt: fix for simple rebase
      completion: zsh: improve bash script loading
      completion: avoid ls-remote in certain scenarios