An in-depth analysis of Mercurial and Git branches

I’ve discussed the advantages of Git over Mercurial many times (e.g. here, and here), and I even created a challenge for Mercurial supporters, but in this blog post I’ll try to refrain from doing judgments and concentrate on the actual facts (the key-word being try).

Continuing this full disclosure; I’ve never actually used Mercurial, at least on a day-to-day basis, where I actually had to get something done. But I’ve used it plenty of times testing many different things, precisely to find out how to do things that I can do easily in Git. In addition, I’ve looked deep into the code to figure out how to overcome some of what I considered limitations of the design. And finally, I wrote Git’s official GitMercurial bridge; git-remote-hg (more here).

So, because I’ve spent months figuring out how to achieve certain things in Mercurial, and after talking with the best and the brightest (Git, gitifyhg, hg-git, and Mercurial developers), and exploring the code myself, I can say with a good degree of confidence that if I claim something cannot be done in Mercurial, that’s probably the case. In fact, I invited people from the #mercurial IRC channel in Freenode to review this article, and I invite everyone to comment down below if you think there’s any mistake (comments are welcome).

Git vs. Mercurial branches

Now, I’ve explained before why I think the only real difference between Git and Mercurial is how they handle branches. Basically; Git branches are all-purpose, all-terrain, and Mercurial have different tools for different purposes, and can almost do as much as Git branches, but not quite.

I thought the only real limitation was that Mercurial branches (or rather bookmarks), didn’t nave a per-repository namespace. For example: in Git the branch “development” can be in different repositories, and point to different commits, and to visualize them, you can refer to “max/development” (Max’s development branch), “sarah/development” (Sarah’s), “origin/development” (The central repository version), “development” (your own version). In Mercurial you only have “development”, and that’s it. I consider that a limitation of Mercurial, but feel free to consider it a “difference”. But it turns out there’s more.

In Git, it’s easy to add, remove, rename, and move branches. In Mercurial, bookmarks are supposed to work like Git branches, however, they don’t change the basics of how Mercurial works, and in Mercurial it doesn’t matter if you have a bookmark or not pointing to a commit, it’s still there, and completely visible; in Mercurial, each branch can have multiple “heads”, it doesn’t matter if there’s a bookmark pointing to it or not. So in order to remove a bookmark (and its commits), you need to use “hg strip” command, and to use that command, you need to enable the MqExtension, however, that’s for local repositories, for remote ones you need to cross your fingers, and hope your server has a way to do that — Bitbucket does through its web UI, but it’s possible that there is just no way.

Mercurial advocates often repeat the mantra “history is sacred”, and Mercurial’s documentation attempts to explain why changing history is hard, that shows why it’s hard to remove bookmarks (and it’s commits); it’s just Mercurial’s design.

On the other hand, if you want to remove a branch in git; you can just do “git push :feature-a“. Whether “history is sacred” or not is left for each project to decide.

Solving divergence

In any version control system, divergence is bound to happen, and in distributed ones, even more. Mercurial and Git solve this problem in very different ways, lets see how by looking at a very simple divergent repository:

Diverged

As you can see we have a “Fix” in our local branch, but somebody already did an “Update” to this branch in the remote repository. Both Mercurial and Git would barf when you try to push this “Fix” commit, but lets see how to solve it in each.

In Git this problem is called a “non fast-forward” push, which means that “Fix” is not an ancestor of the tip of the branch (“Update”), so the branch cannot be fast-forwarded to “Fix”. There are three options: 1) force the push (git push --force), which basically means override “origin/master” to point to “master”, which effectively dumps “Update” 2) merge “Update” and “Fix” and then push 3) rebase “Fix” on top of “Update” and then push. Obviously dropping commits is not a good idea, so either a merge or a rebase are recommended, and both would create a new commit that can be fast-forwarded from “Update”.

In Mercurial, the problem is called “multiple heads”. In Git “origin/master” and “master” are two different branches, but in Mercurial, they are two heads of the same branch. To solve the problem, you can start by running “hg heads“, which will show you all the heads of all the branches, in this case “Fix” and “Update” would be the heads of the “default” branch (aka. “master”). Then you have also three options: 1) force the push (hg push --force), although in appearance it looks the same as the Git command, it does something completely different; it pushes the new head to the remote 2) merge and push 3) rebase and push (you need the rebase extension). Once again, the first option is not recommended, because it shifts the burden from one developer to multiple ones. In theory, the developer that is pushing the new commit would know how to resolve the conflicts in case they arise, so (s)he is the one that should resolve them, and not take the lazy way out and shift the burden to other developers.

Either way solves the problem, but Git uses remote namespaces, which I already shown are useful regardless, and the other requires the concept of multiple heads. That is one reason why the concept of “anonymous heads”, that is used as an example of a feature Mercurial has over Git, is not really needed.

Mercurial bookmarks and the forced push problem

The biggest issue (IMO) I found with Mercurial bookmarks is how to create them in the first place. The issue is subtle, but it affects Git-like workflows, and specially Git<->Mercurial bridges, either way it’s useful to understand Mercurial’s design and behavior.

Suppose you have a very simple repository:

Simple repository

In Git, “feature-a” is a branch, and you can just push it without problems. In Mercurial, if “feature-a” is a bookmark, you can’t just push it, because if you do, the “default” branch would have two heads. To push this new bookmark, you need to do “hg push --force“. However, this only happens if the commit “Update” is made, also, you can push “feature-a” if it points to “Init”, and after pushing the bookmark, you can update it to include the “Feature A” commit. The end result is the same, but Mercurial barfs if you try to push the bookmarks and the commits at the same time, and there’s an update on the branch.

There’s no real reason why this happens, it’s probably baggage from the fact that Mercurial bookmarks are not an integral part of the design, and in fact began as an extension that was merged to the core in v1.8.

To workaround this problem in git-remote-hg, I wrote my own simplified version of the push() method that ignores checks for new heads, because in Git there cannot be more than one head per branch. The code still checks that the remote commit of this branch is an ancestor of the new one, if not, you would need to do ‘git push –force’, just like in Git. Essentially, you get exactly the same behavior of Git branches, with Mercurial bookmarks.

Fixing Git

All right, I’m done trying to avoid judgement, but to try to be fair, I’ll start by mentioning the one (and only one) feature that Git lacks in comparison to Mercurial; find the branch-point of a branch, that is; the point where a branch was created (or rebased onto). It is trivial to figure that out visually, and there are scripts that do a pretty good job of finding that out from the topology of the repository, but there are always corner-cases where this doesn’t work. For more details on the problem and proposed solutions check the stackoverflow question.

Personally I’ve never needed this, but if you absolutely need this, it’s easy to patch Git, I wrote a few patches that implement this:

https://github.com/felipec/git/commits/fc/base

This implements the @{tail} notation, which is similar to the official @{upstream} notation, so you can do something like “development@{tail}”, which will point to the first commit the “development” branch was created on.

If this was really needed, the patches could be merged to upstream Git, but really, it’s not.

Fixing Mercurial

On the other hand fixing Mercurial wouldn’t be that easy:

  1. Support remote ‘hg strip’. Just like Git can easily delete remote commits, Mercurial should be able to.
  2. Support remote namespaces for bookmarks. Begin able to see where “sarah/development” points to, is an invaluable feature.
  3. Improve bookmark creation. So the user doesn’t need to force the push depending on the circumstances

Thanks to git-remote-hg, you can resolve 2) and 3) by using Git to work with Mercurial repositories, unfortunately, there’s nothing anybody can do for 1), it’s something that has to be fixed in Mercurial’s core.

Conclusion

I often hear people say that what you can achieve with Git, you can achieve with Mercurial, and vice versa, and at the end of the day it’s a matter of preference, but that’s not true. Hopefully after reading this blog post, you are able to distinguish what can and cannot be done in each tool.

And again, as usual, all comments are welcome, so if you see a mistake in the article, by all means point it out.

Cheers.

Advertisements

What’s new in Git v1.8.4 Mercurial bridge

Git v1.8.4 has been released, and git-remote-hg received a lot of good updates, here’s a summary of them:

Precise branch tracking

Git is able to find out and report if a branch is new, if it needs to be updated, if it can be fast-forward or not, etc.

   b3f6f3a..c0d1c89  master -> master
 * [new branch]      new -> new
 ! [rejected]        bad -> bad (non-fast-forward)
 ! [rejected]        updated -> updated (fetch first)

Unfortunately, Mercurial’s code doesn’t make this easy (you can’t just push new bookmarks), but it has been worked around by writing a custom push() method.

In addition, if you use my patched version of Git (here), you can also use –force and –dry-run when pushing.

+ 51c1c5f...0faf0ed bad -> bad (forced update)

In short, git-remote-hg now makes interacting with Mercuerial repositories exactly the same as with Git ones, except for deleting remote branches (Mercurial just cannot do that).

Shared repository

One of the most useful features of Git (and that Mercurial doesn’t have), is remote name-spaces. So you can easily track “max/development”, “sarah/development”, etc. however, to properly track multiple Mercurial repositories, git-remote-hg needs to create a clone of the Mercurial repo, and if the repository is a big one, having multiple unrelated clones wastes a lot of space.

The solution is to use the Mercurial share extension, which is not really an extension, as it’s part of the core (but can only be used by activating the extension), so you can add as many Mercurial remotes as you want, and they would all share the same object store.

Use SHA-1’s to identify revisions

Previously, Mercurial revisions were stored as revision numbers (e.g. the tenth commit is stored as 10), which means if history is rewritten there’s no way to tell that the revision changed, so the Git commit wouldn’t change either (as it’s cached).

By using SHA-1’s, Mercurial revisions are always tracked properly.

Properly update bookmarks

Previously, Mercurial bookmarks were only fetched once, this is now fixed to always update them.

All extensions are loaded

This way all kinds of extensions the user has configured will affect git-remote-hg, for example the keyring extension.

Make sure history rewrites get updated

Before, Git would complain that a non-fast-forward updated happened–not any more.

Always point HEAD to “default”

Mercurial properly reports which is the current branch and bookmark, but only for local repositories. To get rid of the mismatch we always track “default”

Don’t force bookmark updates

We were inadvertently forcing the update of bookmarks, effectively overriding the previous one even if the update was not fast-forward.

Use Git author for lightweight tags

Unannotated tags don’t have an author in Git, but it’s needed for Mercurial, so instead of providing an empty author, use the one configured for Git.

Fix replacing a file with a directory

What’s next

There are few features missing, and they might not land in upstream Git any more, but:

Support for revision notes

This feature allows showing the Mercurial revision as Git notes:

commit 6c88a31540012991de3add247a958fd83531256f
Author: Felipe Contreras 
Date:   Fri Aug 23 13:00:30 2013 -0500

    Test

Notes (hg):
    e392886b34c2498185eab4301fd0e30a888b5335

If you want to have the latest fixes and features, you need to use my personal repository:

https://github.com/felipec/git

Unfortunately, you not only need the python script, but to compile Git itself to have all the benefits (like push –force and –dry-run).

Wiki

Also, there’s more information and detailed instructions about how to install and configure this remote-helper.

https://github.com/felipec/git/wiki/git-remote-hg

I’m now quite confident git-remote-hg is by far the best bridge between Git and Mercurial, and here’s a comparison between this and other projects.

Enjoy ūüôā

What it takes to improve Git or: How I fixed zsh completion

I’ve used Git since pretty much day one, and I use it all the time, so it’s important to me that it’s easy to type Git commands quickly and efficiently. I use zsh, which I believe is way superior to bash, unfortunately I found many issues with its Git completion.

In this blog post I will try to guide you through the ordeal from how I identified a problem, and how I ended up fixing it years after, for everyone’s benefit.

The issue

I work on the Linux (kernel) source tree from time to time, and I noticed that sometimes completion took a long, looong time. Specifically, I found that typing ‘git show v’ took several seconds to complete.

I decided to bring that issue up to the zsh developers, and it caused a lot of fuzz. I won’t go to every detail of the discussion, but long story short; they were not going to fix the issue because of their uncompromising principles; correctness over functionality, even if very few people use that correctness, and the functionality is almost completely broken to the point the completion is not usable in certain cases. I argued that completion is meant to make typing commands more efficient, and if completing a command takes longer than what it would have taken me to type it manually, the completion is failing its purpose. I thought any sane person would see the problem with that, but apparently I was wrong (or was I?).

Fortunately zsh has bash completion emulation, so it’s possible to use Git’s official bash completion in zsh. You loose some of the features of zsh completion, but it works very efficiently (‘git show v’ was instantaneous).

Unfortunately, zsh’s bash emulation, and zsh’ bash completion emulation (two different things), are not perfect, so some workarounds were needed in Git’s bash completion script, and those workarounds were not working properly by the time I started to use such completion, so that’s when my involvement begin.

Fixing the bridge

Each time I found a bug, I tried to fix it in Git (patch), and made sure that zsh folks fixed in their side too (commit), so eventually no workarounds would be needed, and everything would work correctly.

The completion worked for the most part, but with workarounds, and not exactly as good as bash’s. So I decided to fix zsh’s bash completion emulation once and for all. After my patches were applied by zsh developers, Git’s official completion worked much closer to how it did in bash, but there were still minor issues.

Moreover, Git’s bash completion was constantly changing, and it was only a matter of time before one change broke zsh’s completion, so I decided to get involved, understand the code and simplify it to minimize the possibility (e.g. d79f81a, 583e4d5). I saw a lot of areas of improvement, but in order to make sure nothing got broken in the process of simplification, I thought it would make sense to have some tests (5c293a6). Git’s testing framework is one of the most powerful and simple there is, so it was a pleasure to write those tests. Eventually the completion tests were good enough that I became confident in changing a lot of the completion code.

At the same time I realized most of zsh’s bash completion emulation code was not needed at all, so I wrote a very small version of it that only worked with Git’s completion. The result was very simple, and it worked perfectly, yet it could be even simpler, if only I could simplify Git’s completion even more.

The culmination of that work was the creation of __git_complete (6b179ad), a helper that has nothing to do with zsh, but it solved a long standing problem with Git completion and aliases. It’s not worth going into details about what was the problem, and why it received so much push-back from Git developers (mostly because of naming issues), what is important is that I implemented it with a wrapper function, a wrapper function that was *exactly* what my zsh simple completion wrapper needed.

Now that everything was in place, the final wrapper script ended up very small and simple (c940786), it didn’t have any of the bugs zsh’s bash completion emulation had, and was under full control of the Git project, so it could be improved later on.

Finally. I had Git completion in zsh that worked *perfectly*; it worked exactly the same as it did on bash. But that was not enough.

Now that Git completion worked just like in bash, it was time to implement some extras. zsh completion is extremely powerful, and does things bash cannot even dream of doing, and with my custom wrapper, it was possible to have the best of both worlds, and that’s exactly what I decided to do (4911589).

git-zsh-shot

Finally

So there it is, after years of work, several hundreds of mails, tons of patches through different iterations… Git now has nice zsh completion that not only works as efficiently as in bash without any difference, but in fact it even has more features.

If you want to give it a try, just follow the instructions: contrib/completion/git-completion.zsh

ūüėČ

Felipe Contreras (54):
      git-completion: fix regression in zsh support
      git-completion: workaround zsh COMPREPLY bug
      completion: work around zsh option propagation bug
      completion: use ls -1 instead of rolling a loop to do that ourselves
      completion: simplify __gitcomp and __gitcomp_nl implementations
      tests: add initial bash completion tests
      completion: simplify __gitcomp_1
      completion: simplify by using $prev
      completion: add missing general options
      completion: simplify __git_complete_revlist_file
      completion: add new __git_complete helper
      completion: rename internal helpers _git and _gitk
      completion: add support for backwards compatibility
      completion: remove executable mode
      completion: split __git_ps1 into a separate script
      completion: fix shell expansion of items
      completion: add format-patch options to send-email
      completion: add comment for test_completion()
      completion: standardize final space marker in tests
      completion: simplify tests using test_completion_long()
      completion: consolidate test_completion*() tests
      completion: refactor __gitcomp related tests
      completion: simplify __gitcomp() test helper
      completion: add new zsh completion
      completion: start moving to the new zsh completion
      completion: fix warning for zsh
      completion: add more cherry-pick options
      completion: trivial test improvement
      completion: get rid of empty COMPREPLY assignments
      completion: add new __gitcompadd helper
      completion: add __gitcomp_nl tests
      completion: get rid of compgen
      completion: inline __gitcomp_1 to its sole callsite
      completion: small optimization
      prompt: fix untracked files for zsh
      completion: add file completion tests
      completion: document tilde expansion failure in tests
      completion; remove unuseful comments
      completion: use __gitcompadd for __gitcomp_file
      completion: refactor diff_index wrappers
      completion: refactor __git_complete_index_file()
      completion: add hack to enable file mode in bash < 4
      completion: add space after completed filename
      completion: remove __git_index_file_list_filter()
      completion: add missing format-patch options
      complete: zsh: trivial simplification
      complete: zsh: use zsh completion for the main cmd
      completion: zsh: don't override suffix on _detault
      completion: cleanup zsh wrapper
      completion: synchronize zsh wrapper
      completion: regression fix for zsh
      prompt: fix for simple rebase
      completion: zsh: improve bash script loading
      completion: avoid ls-remote in certain scenarios

Bridge support in git for mercurial and bazaar

I’ve been involved in the git project for some time now, and through the years I’ve had to work with other DSCMs, like mercurial and monotone. Not really as a user, but investigating the inner workings of them, mostly for conversion purposes, and I’ve written tools that nobody used, like mtn2git, or pidgin-git-import, but eventually I decided; why not write something that everybody can use?

So, I present to you git-remote-hg, and git-remote-bzr. What do they do?

git clone "hg::http://selenic.com/repo/hello"
git clone "bzr::lp:gnuhello"

That’s right, you can clone both mercurial and bazaar repositories now with little to no effort.

The original repository will be tracked like any git repository:

% git remote show origin
* remote origin
  Fetch URL: hg::http://selenic.com/repo/hello
  Push  URL: hg::http://selenic.com/repo/hello
  HEAD branch: master
  Remote branches:hg and mercurial. 
    branches/default tracked
    master           tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (create)

You can pull, and in fact push as well. Actually, you wouldn’t even notice that this remote is not a git repository. You can create and push new branches… everything.

You might be thinking that this code being so new might have many issues, and you might screw up a mercurial repository. While that is true to some extent, there’s already a project that tries to act as a bridge between mercurial and git: hg-git. This project has a long history, and a lot of people use it successfully. So, I took their test framework, cleaned it up, and used to test my git-remote-hg code. Eventually all the tests passed, and I cloned big mercurial repositories without any issue, and the result were exact replicas, and I know that because the SHA-1’s were the same ūüôā

So you can even keep working on top of clones you have created with hg-git, and switch back and forth between git-remote-hg and hg-git as you wish. The advantage over hg-git, is that you don’t need to use the mercurial interface at all, you can use git ūüôā

To enable the hg-git compatibility mode you would need this:

% git config --global remote-hg.hg-git-compat true

What about the others?

Surely I must not be the first one try something this cool, right? I don’t want to dwell into all the details why none of the other tools suited my needs, but let’s give a quick look:

hg-git

It’s for mercurial, not git. So you need to use it through the mercurial UI.

fast-export

This is a very nice tool, but only allows you to export stuff (fetch an hg repo), and you have to manually run the ‘git fast-import’ command, setup the marks, and so on. Also, it’s not very well maintained.

Another git-remote-hg

It needs hg-git.

Yet another git-remote-hg

You need quite a lot of patches on top of git, so your vanilla version of git is not going to cut it.¬†In addition to that it doesn’t support as many features; no tags, no bookmarks. Plus it fails all my extensive tests for git-remote-hg.

git-hg

It needs separate tools, such as ‘hg convert’, and fast-export, and it doesn’t use remote helper infrastructure, so it’s not transparent: you have to run git-hg clone, git-hg fetch, and so on.

Another git-hg

It needs hg-git, and doesn’t use git’s remote helper infrastructure either: you have to do git hg-clone, git hg-pull, etc.

There might be others, but you get the point.

The situation in bazaar is not much better, but I’m not going to bother listing the tools.

More coolness

In fact, you can push and pull from and to mercurial and bazaar ūüôā

% git remote -v
bzr	bzr::lp:~felipec/+junk/test (fetch)
bzr	bzr::lp:~felipec/+junk/test (push)
gh	gh:felipec/test.git (fetch)
gh	gh:felipec/test.git (push)
hg	hg::bb+ssh://felipec/test (fetch)
hg	hg::bb+ssh://felipec/test (push)

Isn’t that cool?

So, are there any drawbacks? Surely some information must be lost.

Nope, not really. At least not when using the hg-git compat mode, because all the extra information is saved in the commit messages.

Update: I forgot to mention the only bidirectionality problem: octopus merges. In git a merge can have any number of parent commits, but in mercurial only two are allowed. So you might have problem converting from a git repository to a mercurial one. It’s the only feature that hg-git has, that git-remote-hg doesn’t. As for bazaar, it also allows multiple parents, so it should work fine, but this hasn’t been thoroughly tested.

The only consideration is that in the case of mercurial branches are quite different from git branches, so you need to be really careful to get the behavior you want, which is why it’s best to just ignore mercurial branches, and use bookmarks instead. When you create git branches and push them to mercurial, bookmarks will be created on whatever branch is current.

Trying it out

Just copy them files (git-remote-hg,¬†git-remote-bzr) to any location available in your $PATH (.e.g ~/bin), and start cloning ūüôā

Soon they might be merged into upstream git, so they would be distributed as other contrib scripts (e.g. /usr/share/git), and eventually possibly enabled by default. They were distributed as other contrib scripts, and they were going to be enabled by default, until the maintainer decided not to, for no reason at all.

Enjoy!

Progress through the years, and Ruby awesomeness

We’ve all have noticed that when looking back at code you wrote years ago, you realize how much you have progressed–or how much you sucked in the past, depending on your perspective–so it’s always a fair assumption to say, that in the future you will look back at the code you are working now, and feel ashamed. Perhaps this becomes logarithmically less of an issue as time goes on, but at least I still experience this.

I want to share what I think is a good example of this. Ruby code is particularly special in this respect since there’s always tons of ways to write code that do exactly the same.

Back in 2007 when I started msn-pecan, I wanted a script to create a ChangeLog, not because I thought it was needed, but mainly to shut up the people that said the git log wasn’t good enough, and that you still needed to update the ChangeLog in every commit. Few of those people remain today, but my script is still here.

The objective was to output something like this:

== Release 0.0.2 ==

2007-11-30	Felipe Contreras 
	* Add readme.
	* Update copyright stuff.
	* Fix to allow aliases with special characters.

== Release 0.0.1 ==

2007-11-29	Felipe Contreras 
	* Fix warnings.
	* Add display name support.

2007-11-28	Felipe Contreras 
	* Add .gitignore file.
	* Update info.
	* Fix compilation.
	* Stuff required to compile.
	* Initial import.

For some reason I was forced to look at it:

#!/usr/bin/env ruby

require "stringio"

revs=`git rev-list master`
refs=`git for-each-ref --format="%(objectname) %(refname)" refs/tags`

commits = revs.map { |e| e.chomp }

tags = {}

list = {}
list_order = []
$tag = ""

refs.each do |l|
  tag, name = l.chomp.split(" ")
  name = name.slice(/refs\/tags\/v(.*)/, 1)
  tags[tag] = name
end

commits.each do |e|
  commit = StringIO.new(`git show --pretty="format:%an ::%ai::%s\n%b" --quiet #{e}`)
  author, date, subject = commit.readline.chomp.split("::")
  date = date.split(" ")[0]
  if tags[e]
    $tag = tags[e]
    list_order << {:id => $tag}
  end
  id = "#{date}\t#{author}"
  uid = "#{date}\t#{author}\t#{$tag}"
  if not list[uid]
    list[uid] = []
    list_order << {:id => id, :value => list[uid]}
  end
  list[uid] << subject
end

list_order.each do |i|
  id = i[:id]
  value = i[:value]

  if value
    puts "#{id}"
    puts value.map { |e| "\t* #{e}" }.join("\n")
    puts "\n"
  else
    puts "== Release #{id} ==\n\n"
  end
end

If you are familiar with Ruby, you would see that there’s tons of things that don’t make sense there, unnecessary complexity, simpler idioms, etc. So I decided to clean it up:

#!/usr/bin/env ruby

require 'date'

tags = {}
tag = nil
commits = []

class Commit
  attr_reader :author, :date, :tag

  def initialize(id, subject, author, date, tag)
    @id = id
    @author = author
    @date = Date.parse(date)
    @subject = subject
    @tag = tag[1..-1] if tag
  end

  def to_s
    return @subject
  end

end

open("|git for-each-ref --format='%(objectname) %(refname:short)' refs/tags").each do |l|
  commit, id = l.chomp.split(" ")
  tags[commit] = id
end

open("|git log --oneline --format='%H%x00%an %x00%ai%x00%s'").each do |l|
  commit, author, date, subject = l.chomp.split("")
  tag = tags[commit] if tags[commit]
  commits << Commit.new(commit, subject, author, date, tag)
end

commits.group_by(&:tag).each do |tag, commits|
    puts "== Release #{tag} ==\n\n"
    commits.group_by(&:date).sort.reverse.each do |date, commits|
      commits.group_by(&:author).each do |author, commits|
        puts "#{date}\t#{author}\n"
        puts commits.map { |e| "\t* #{e}" }.join("\n")
        puts "\n"
      end
    end
end

So we went from code that took more than 10s to complete, to one that barely took 100ms, it’s much easier to read, smaller, and it’s actually more correct (there was a bug in sorted dates). Not bad!

I was particularly surprised by the method group_by, which takes any Enumeratable and does what the name says:

[
  {:id => 0, :name => 'foo'},
  {:id => 0, :name => 'bar'},
  {:id => 1, :name => 'roo'} ].group_by { |e| e[:id] }
=> {
  0=>[{:id=>0, :name=>"foo"}, {:id=>0, :name=>"bar"}],
  1=>[{:id=>1, :name=>"roo"}] }

And then, in the functional tradition, you can sort, reverse, etc. the result.

Moreover, instead of getting the full output of a command with `git command`, it’s much better to pipe it with open("|git command"), something obvious, but a bit tedious to do in other languages.

The rest is git magick–I’ve been progressing on that front as well ūüôā

Did I mention I love ruby?

msn-pecan 0.1.4 released

Hi,

Long time no see! I’ve been busy with many other things and haven’t had time to work on msn-pecan, but I’m back.

This a minor bugfix release, mostly to fix issues while reconnecting. These will be useful for HTTP non-persistent connections which is a work in progress. There’s plenty many other patches pending in the queue, some you can try on the master branch.

Felipe Contreras (6):
      node: avoid writes when connection is closed
      ssl-conn: cleanup properly on errors
      io: properly cancel connect operations
      build: fix platform detection
      win32: update instructions for Arch Linux
      win32: version bump

Download from the (usual place).

Analysis of the 2011 GNOME user survey results

Last year Phoronix launched the 2011 GNOME user survey, an effort I initiated to try to give a voice to GNOME users, a voice I though was¬†sorely¬†lacked. I knew the effort wasn’t going to be accepted by GNOME developers, because through the years they have always rejected anything that resembles user feedback: surveys, polls, brainstorm, or something as easy as enabling voting in bugzilla. I gave it a try anyway because I made a bet with Zeeshan Ali that this would happen, and it did (although he never accepted that).

However, in the process the effort received a lot of positive feedback and the survey improved quite a lot, it would have been a shame to throw it to the trash, so, as Alan Cox suggested, I tried to make this survey happen anyway and thanks to Michael Larabel from Phoronix, it did.

According to GNOME people, such a survey, or in fact, any survey will always give bad results, because it’s impossible to get rid of all the biases. This was discussed extensively in the GNOME desktop-devel mailing list, and despite all my efforts to convince them otherwise, I didn’t succeed (which wasn’t a¬†surprise¬†at all). To give an idea of the sort of resistance we are talking about, I made the argument that even president elections across the globe would have similar problems, and they argued that even those are flawed. One would have to wonder what method in their minds would give proper statistically significant user feedback, but the answer is not surprising: nothing. Yet, they claim they listen to users, somehow. Although their method of “listening to users” remains mysterious, it probably involves personal experience: talking with their friends and family, and close circles (which are more biased than any survey could).

But in this discussion there was valid criticism, mainly the non-response bias. It was argued that only people that hate GNOME would answer the survey, and thus the survey would be biased towards negative comments, and the results worthless. There is some truth to that claim, and the survey tried to identify and mitigate that factor, with success I think. I will start by explained how this was done, because otherwise the survey would indeed be worthless for many purposes, which is not the case. Other arguments against the results of the survey are invalid in my opinion, but unlike other blogs from GNOME developers, this blog has comments open, so if you disagree and find other valid criticism to this survey, you are free to comment below.

Nevertheless, the most important point that GNOME people missed, is that this was a survey v1.0, and like all open source software, it can (and will) improve. Even if they are correct (they aren’t), and the results of v1.0 is totally worthless, that doesn’t mean that v2.0, or later versions, will. Unfortunately in their mind neither v1.0, nor v100.0 can provide any helpful results at all, or for that matter any other survey in the history of humanity.

I will present the rationale of why I think these results are good and valid, and I think any rational person would agree, but you would be the judge of that.

If you want to see the straight results of the survey, go to this Phoronix page.

Non-response bias

Let’s invent a scenario where we have a population of 1000 people, and we want to figure out how many people like shopping. To do that we sample 100 people and it turns out around 25% of them say they like shopping. That’s it, right? Job done. Well, what happens if most of people interviewed where males, say 2/3 males, 1/3 females. We know from other sources that we should expect the male/female ratio to be around 50/50, so we know there was a disproportion, but is that a problem? If it turns out that say, females have a bias towards liking shopping, then yes, there is a problem.

If we find results like that, we know there is bias, but we can use our knowledge about the male/female ratio expected in the total population to calculate what would be the real amount of people that like to shop in the total population. But as GNOME people were quick to mention, we don’t have a census of Linux desktop users, so we don’t have any idea of the proportions we might expect for different biases. In the example above, if we don’t know the male/female ratio of the total population, then we have no idea if 33/66 was disproportionate or not, all we know is that there is bias. And that’s the end of the game, the 25% our survey returned is worthless. We can say how many females and males like shopping, but nothing about the total population. I wouldn’t call this totally worthless, but GNOME people would.

But that is only¬†if¬†there is a bias. If it turns out that males and females like shopping at roughly the same proportion, then it doesn’t matter what is the ratio of the total population, and it doesn’t matter if we are being disproportionate or not. All we can do is identify the bias, if we don’t even ask the question “What is your sex?”, then we are truly lost, because we can’t even know if there was bias or not.¬†So all we can do is try to identify all possible sources of bias, and then check if this bias is indeed happening or not.

Even worst is what happens if no females participate in the survey at all. That is highly unlikely in this case, but consider online surveys; people without Internet would be unlikely to participate, and if they have a bias then the results would be skewed. This was also used by GNOME people as an argument against any kind of online survey.

However, I had an idea; what if we ask people to print the survey and give it to these kind of people without Internet. This in effect would be no different from a survey that is not online, the only difference is that instead of paying people to go around finding GNOME users, we ask the community to do so —¬†crowd-sourcing¬†the problem. We didn’t ask this directly, perhaps for v2, but I added a question to identify how people are filling the survey: on their own, on behalf of somebody else, and so on.

We did spend a considerable amount of time trying to identify all these possible biases, and then add questions in the survey to see if there’s bias or not, and that’s all anybody can do.

So all the criticism GNOME people threw was taken care one way or the other, even if they didn’t agree so. The only way the results of this survey would be worthless is if somebody comes up with a possible bias that no question in the survey would identify (we need a new survey), or if it turns out there’s a real bias (we need a census).

Was there bias?

Let’s examine the main claim of GNOME people; people that answer the survey would have a bias towards hating GNOME.

Overall, how satisfied are you with GNOME?

Barely, Completely, Halfway, Mostly, Not At All

How are you taking this survey?

I am acting on behalf of somebody else, Completely on my own, Other, Somebody is pushing for me to do it

As you can see irrespective of the way people answered the survey, there’s a clear tendency: most people avoided the extremes and answered primarily “Mostly”, and then “Halfway”. The people that answered “I am acting on behalf of somebody else” went more for the extremes, but the difference is not significantly different from the people that answered “Completely on my own”. So, fortunately it seems GNOME people were wrong; there’s no bias.

Just to be sure, lets look at another, similar question:

How do you compare your current GNOME version with the version from one year ago?

Better, No Changes, Cannot Say, Worse

Again we see no significant differences depending on how people answered the survey; most people said either better, or worse at about the same rate. The people that answered on behalf of somebody else had a tendency to answer “Better”, and the people that got pushed, answered “Worse” a bit more, but in general no big difference.

Clearly, if there is a bias, it’s not dependent on how people are answering the survey. There’s another possibility; we didn’t get enough proxy responses. Statistically speaking, we would expect the results of “I am acting on behalf of somebody else” to be¬†¬Ī18% off target, but “Somebody is pushing for me to do it” only¬†¬Ī8, based on the amount of responses with a 95% certainty. So it would be nice if we get more of these responses on the next survey, but we can be relatively certain that if there is a bias, it’s not that great. Certainly not enough to¬†declare¬†the whole survey worthless.

I did similar analyses on the different questions of the survey, and I couldn’t find any significant biases from groups that could be underrepresented, therefore, the results of the survey are valid.

However, there are certain interesting results.

Interesting results

This is the same improvement question, but depending on whether or not people used GNOME 3.

How do you compare your current GNOME version with the version from one year ago?

Better, No Changes, Cannot Say, WorsePeople that haven’t used GNOME 3 mostly say that it hasn’t changed, but the ones that have used it have very different opinions, which matches what we have seen in online discussions: people either hate it or love it, but it’s interesting to see that the split is 50/50.

Another theory that popped in the discussions was that people that use terminals would hate GNOME 3, but “normal people” would love it.

Overall, how satisfied are you with GNOME?

Barely, Completely, Halfway, Mostly, Not At All

How often do you use a terminal/console?

Is there anything else?,¬†When I have no other option,¬†I can’t live without them,¬†What is that?

However, it seems to be the opposite: people that don’t even know what a terminal is certainly don’t like GNOME, it’s the rest that do: from people that use it when there’s no other option, from people that don’t use anything else.

One of my theories was that contributors to the project would have a bias toward liking it.

Overall, how satisfied are you with GNOME?

Barely, Completely, Halfway, Mostly, Not At All

Have you contributed to the GNOME project?

No, Yes

But it appears I was also wrong; contributors to the project have roughly the same opinion as the non-contributors.

There is a much more clear differentiator tough: people that have tried to contact the project, unsuccessfully.

Overall, how satisfied are you with GNOME?

Barely, Completely, Halfway, Mostly, Not At All

Have you ever contacted the GNOME team?

No, I don’t know how;¬†No, never had the need;¬†Yes, successfully; Yes, unsuccessfully

The people that have tried to contact the project unsuccessfully tend to say either they are barely satisfied, or not at all with GNOME. Of all the people that have contacted the project 2/3 of them say it was unsatisfactory. This should be worrying.

And then, people that like to use another window manager.

Overall, how satisfied are you with GNOME?

Barely, Completely, Halfway, Mostly, Not At All

Have you ever contacted the GNOME team?

Yes, successfully; Yes, unsuccessfully; No, I don’t know how; No, never had the need

Clearly GNOME has problems interacting with other window managers. Or at least people seem to think so.

What needs to change?

Unfortunately there was no easy way to ask GNOME users this question so it was more or less open form, and that’s very difficult to analyze, however, I’ve taken the time to read them one by one, and count them, and so far I have 20% of the 10000 responses, but I doubt reading the rest will change the results much. I will try to do so in the future, as time permits, but I don’t promise much.

Better customization (397)

This is by far the most requested, people don’t want to manually fiddle with gconf/dconf, extensions, or gnome-tweak, they want a whole lot more options integrated. One suggestion was to have an advanced mode, which is something I have suggested in the past, but GNOME developers are adamant against it for no rational reasons.

Not only do people want more options, but they think some of the current ones are useless, and that defaults are all wrong. They also want more options for power management, like deciding what happens when you close the lid. Also more options to change the appearance: font, icons, keyboard bindings, screensaver, etc. Also, disable the accessibility stuff.

GNOME 2 (113)

People love their GNOME 2. A lot of them suggested to have two interfaces, others requested to improve the fallback mode, but most of them demanded to get rid of GNOME shell directly. A lot of people also asked to bring back the GNOME 2 panel, and taskbar.

Improve performance and footprint (89)

A lot people think GNOME is too bloated and asked for better performance, less CPU usage, less memory usage, smoother animations, faster start-up times, etc. Specially the ones that have used GNOME shell, which apparently is a resource hog.

Nautilus (57)

People are definitely not happy with nautilus. They complain it’s too slow, takes too much time to start, and lacks a lot functionality. Many suggested to get rid of it and use some of the already existing alternatives.

Notifications (46)

The new notifications are annoying; one shouldn’t need to move the mouse cursor the corner to see them; that makes it easy to miss them, which defeats the purpose of a notification.

Shutdown / Restart / Suspend (44)

This is a no-brainer; just add this option. Yes, you can see them by pressing “alt” but people don’t want that, there’s no excuse to remove basic functionality, and users are complaining hard for this particular feature.

Improve theming (32)

Users want better themes and icons, they don’t like the default theme, also want a dark theme, and also think the amount of customization a theme can achieve is not enough.

Better multi-monitor support (32)

Another very requested feature was to improve multi-monitor support.

Others (in order)

  • Improve Evolution
  • Listen to users
  • Reduce dependencies (specially¬†PulseAudio)
  • Improved reliability / stability
  • Minimize / Mazimize
  • Tiling
  • Get rid of Evolution (or split the calendar component
  • alt+tab
  • Faster shell search
  • Collaborate more with other communities
  • Fix ATI fglrx issues
  • Integrate Zeitgeist
  • Render KDE apps seamlessly
  • Reduce dead space
  • Compiz compatibility
  • Developer attitude

Conclusion

What is clear is that most people want more configuration options, a lot liked GNOME 2 much better than GNOME 3, and they want an option to have an interface similar to GNOME 2 with GNOME 3 technology. For every user that likes GNOME 3, there is one that hates it. There’s plenty of people that think they are ignored by GNOME developers, and that users are¬†treated¬†like idiots (and they don’t like that). Users want a better attitude from the developers, not only toward users, but towards other FOSS projects like KDE (GTK+ apps like fine under KDE, Qt apps don’t under GNOME), and less of the not-invented-here syndrome.

Hopefully it has become clear to rational people that there is some value in the results of this survey, even if GNOME developers reject it. All the known biases were identified, and fortunately the ones that could cause non-response bias didn’t do so, so the results are valid and nobody was underrepresented ūüôā

The next survey¬†will incorporate some of the findings here, and hopefully more people would be aware of the need to reach people that normally wouldn’t answer this survey. Perhaps at some point GNOME developers would accept that there is no significative non-response bias because of that and start listening to the results.