Pidgin picking the wrong DVCS again; Mercurial

I have long criticized Pidgin’s move to Monotone. I have tried to analyse this rather marginal DVCS tool, I wrote my own mtn->git conversion tool, and I helped validate and improve Monotone’s official tool afterwards. I have spent countless hours identifying Pidgin’s contributors and finding their real names, email addresses, etc. I have also manually dug through old commits to properly identify the real author of each patch (as opposed to the committer). Finally, I have also written scripts that automatically create a nice, clean git repo.

pidgin-git-import

Now that Pidgin is thinking of switching away from Monotone (which is what I recommended a long time ago), they are considering Mercurial, and I’ve also helped with their conversion scripts.

pidgin-mtn-conv-files

However, I feel they are making yet another mistake, because they haven’t actually analysed their decision. In order to demonstrate the double standards, cognitive dissonance, and lack of follow-up, I’m quoting excerpts of the recorded discussions from the mailing list, with links.

On switching

Back then, I argued that Monotone wasn’t a good choice.

If monotone becomes not good enough, or if there’s a clear winner in the DVCS space, then ya, maybe it’s silly to keep using monotone then.

Ka-Hing Cheung

Interesting how at this point in time khc considers switching only if there’s a clear winner in the DVCS space, which is now obviously git. But even more interesting is how they came to the conclusion that Monotone was the clear winner back in 2007.

> The more popular DSCMs right now are git and bzr, with hg not so far away.
> Choosing anything else doesn’t seem to be a wise choice.

Right. I think this is also a potentially compelling reason to switch.

If we decide to switch, I’d like to wait just a little longer before we
do it to see how git vs. bzr is going to turn out so we don’t end up
switching twice.

Richard Laager

Richard also agrees that it makes sense to wait to see which DVCS ends up being a clear winner.

But some people didn’t see a reason to switch:

I don’t see any reason to switch at all (hmm, deja vu–I recall saying this
before the switch to svn). Monotone does everything we need it to and then
some.

John Bailey

I personally don’t care why “nobody” (according to you) is using it. It works
for us. End of story.

Monotone meets our needs and is easily extended for future needs.

John Bailey

The conclusion was that there was no reason to switch, and that when they did switch, they would make sure to pick a clear winner in the DVCS space so they wouldn’t have to switch yet again soon.

Bad choices

Since picking a DVCS is a big decision, you certainly want to avoid making wrong assumptions while picking one. Yet that is exactly what they did in the case of Monotone.

I found a working copy of a git repo I created back when we were evaluating
which dvcs to use. The git repo from then is 649M, my much much newer
monotone database is only 244M.

We spent considerable time and effort evaluating our options when we made
this switch. I am a little tired of the idea that we made it relatively
arbitrarily.

Of them, git, at the time, as you claim this is solved, required almost a
full CDs worth of space just for the repository. Unlike monotone, this
repository has to be stored _per working copy_. That means if I am working on
head in pidgin.my/ and gobjectification in pidgin.gobject/ and 2.4.x in
pidgin.2.4/ then I incur this cost 3 times, whereas with montone I have one
database, and so incur the cost once. Monotone _could_ be larger than a git
repo and still be a clear winner (space wise) on this alone. In fact, this
was one of the deciding factors in picking monotone over git. Several
developers expressed a very strong dis-interest in re-learning their work
flows to be able to effecitently work on multiple projects in one working
copy.

I wouldn’t mind switching now if 1)there were a clear benefit to doing so and
2)a degree of certainty that we won’t be switching again in another couple
years.

Luke Schierer

Luke says they did their due diligence while evaluating options, and that one of the deciding factors was that git was taking “too much space”. Well, duh!… because the repository was not compressed. Luke tried to defend himself by stating that compression was not available in the version he used.

I am unsure when it was introduced, but I’ve also an old source for git
laying around, which indicates I was using 0.99.9h or 0.99.9i, not sure
which. Neither have a git-gc available.

Luke Schierer

But I mentioned that it was clearly explained in the tutorial:

git-repack did that.

It’s explained in tutorial.txt on 0.99.9h.

Felipe Contreras

I went back to 0.99.9h to make sure:
% git --version
git version 0.99.9h
% git repack
% git prune-packed
% du -h .git --max-depth=0
82M .git

So no, they didn’t do their homework when analysing git. If they wanted to reduce the amount of space, they could have looked at the tutorial, or asked on the mailing list or on IRC. Thus they made the wrong choice.

Update: I forgot to mention the thread on the Git mailing list about Git vs Monotone, in which Git developers, including Linus Torvalds, explain why the 3x argument is nonsense:

Don’t even bother. The guy is apparently not even trying to work with his
tools, he just has an agenda to push.

Quite frankly, anybody who wants to stay with monotone, we should
_encourage_ them. They add nothing to any possible project, because they
are clearly not very intelligent.

Linus Torvalds

Ad hominem means we don’t discuss with you

Unfortunately, I was quickly relegated to the status of “monotone illiterate”. So everything I say is by definition wrong, and not worth discussing:

The above are all examples of where you simply don’t understand Monotone.

Richard Laager

Which is funny, because while I was explaining how the same thing can be achieved in git, I was attacked on the grounds that it can also be done in mtn, and therefore I supposedly don’t understand. I asked precisely how it is that I “don’t understand Monotone”, and of course, I got no answer.

Same here:

It’s very clear from this response that you do not understand Monotone’s
design. As the time spent correcting your misunderstandings is not useful to
you or me (because you’re fine with git anyway and because it has no bearing on
the project’s choices), I’m not going to bother.

Richard Laager

Again, no response.

It’s hard to argue with people who cut the discussion short the moment you prove them wrong. This happened again and again.

Reconsideration

A few years later, they are now ready to switch:

For almost four years now, we’ve been using Monotone as our source
repository. Over that time, it’s proven itself to work very well for us, I
think. However, there have always been some annoyances with it.

Notable complaints we’ve had are the time it takes to pull our history into a
fresh database and the lack of compatibility with tools. For the pull issue,
we hacked around this by using a cron job to generate a bootstrap database
and make it available for download, as this was generally faster than a full
netsync of our entire history. Tools, on the other hand, have been hit and
miss. I remember Mark having trouble with meld after a new monotone release
changed some of the ‘automate’ functionality, breaking meld’s use of it.
Other useful tools, like monotone-viz, seem to take forever to be updated for
even the tiniest change in monotone.

It’s been discussed a number of times over the last couple years that we
should consider moving away from Monotone. At this point, I’m now
inclined to agree. To that end, I propose that we move from Monotone to
Mercurial. I’ll come back to this in a bit.

John Bailey

So John now realizes that I was right: Monotone has many deficiencies, and the fact that it’s not widely used does have consequences.

But notice how he jumps straight to Mercurial without any analysis at all. In fact, he makes his bias against git clear.

Git may offer some or all of these as well, but I’m not particularly fond of
git, as we all know…

And then people start voting for their favourite one.

This debate has dried up. I think it’s time to end it. To update everyone, the
final votes (with two votes communicated to me off-list) are:

Total votes in favor of hg: 11
* Developers: 7
* CPW’s: 2
* Downstream Projects: 2

Total votes in favor of git: 5
* Developers: 3
* Downstream Projects: 1
* Translators: 1 (this vote was communicated on the translators@ list)

I think it’s safe to say hg won. Does anyone disagree with this assessment or
object to declaring hg the winner at this point?

John Bailey

Wait a second. Was this supposed to be a popularity contest? You can see a clear tendency here: they use whatever they are comfortable with. They picked Monotone because some developers knew it, and in fact some had vested interests, being involved in the Monotone project. Now they are picking Mercurial because many of them already use it on another project.

What about git?

Mercurial is much better than Monotone, but nobody had talked about the benefits of Git, so I did:

Mercurial vs Git

Some good points are discussed, but there’s one that I found particularly interesting.

Richard Laager wrote:
>> And you would see from which branch the commits came from:
>> Merge branch ‘rlaager-foo’
>> https://github.com/ecoffey/pidgin-illustration/commit/6fc76c5bc9ac0929f7fd1e2e2d2fcb2840d394e1
>
> I asked how you would tell for revision B1, B2, or B3, not for the A4
> merge.

All you need is the merge. The merge tells you the name of the branch,
the actual branch (the commits) being merged, and the branch being
merged to.

Graphically, you can see that all the information is there:

% git log --oneline --decorate --graph
* 6fc76c5 (HEAD, origin/master, origin/HEAD, master) Merge branch 'rlaager-foo'
|\
| * ecf1201 B3
| * 0d87796 B2
| * 4174bf8 B1
* | e9e3659 A3
* | 6411aea A2
|/
* c4c7b1d A1 commit

And you can make a simple query: what are the commits on the merged
branch of ‘master’ (A4), that were not part of master before the
merge?

% git log --oneline --decorate --graph master^1..master^2
* ecf1201 B3
* 0d87796 B2
* 4174bf8 B1

Isn’t that exactly what you wanted? master^1 is the original branch
(A3), master^2 is the merged branch (B3), and master^1..master^2 are
the commits of master^2 that are not in master^1.

Isn’t git cool? 🙂

Felipe Contreras

Richard is thinking in Mercurial/Monotone terms, where B1, B2 and B3 are permanently labelled as ‘rlaager-foo’, and he asks how Git could distinguish them. I answer him, showing that there’s no difference, and as usual, I get no response.
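To spell out what `master^1..master^2` asks for: git’s `X..Y` notation selects the commits reachable from Y that are not reachable from X, and `commit^n` names the n-th parent of a merge. The following is a toy Python model of that set difference over the commit graph from the illustration above (using Richard’s label A4 for the merge commit); it is a sketch for illustration, not how git actually implements revision walking.

```python
# Toy model of the illustrated commit graph: each commit maps to its
# parents, first parent first, as git records them for a merge.
PARENTS = {
    "A1": [],
    "A2": ["A1"],
    "A3": ["A2"],
    "B1": ["A1"],
    "B2": ["B1"],
    "B3": ["B2"],
    "A4": ["A3", "B3"],  # merge of 'rlaager-foo' into master
}


def reachable(commit):
    """Return every commit reachable from `commit`, including itself."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def nth_parent(commit, n):
    """Model of git's `commit^n` suffix: the n-th parent (1-based)."""
    return PARENTS[commit][n - 1]


def range_commits(exclude, include):
    """Model of `git log exclude..include`: commits reachable from
    `include` but not from `exclude`."""
    return reachable(include) - reachable(exclude)


master = "A4"
merged = range_commits(nth_parent(master, 1), nth_parent(master, 2))
print(sorted(merged))  # ['B1', 'B2', 'B3']
```

The merge alone is enough: its first parent marks where master was, its second parent marks the tip of the merged branch, and the difference between the two reachable sets is exactly the branch’s own commits.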

Another interesting point is that John likes my idea of split repositories:

== graft points ==
I saw a proposal from John Bailey to have a structure like:

libgnt
libpurple
finch
|-- libgnt
|-- libpurple
`-- po
pidgin
|-- libpurple
`-- po
po

The disadvantage, he claimed, is that the history would have to start from zero
since otherwise each one of these repos would have to be 215 MB.

Not with git.

My proposal would be the following. Convert the whole history to git, then,
make a new release, say 2.8 where the directories are split (libpurple, pidgin,
etc.). That repository would be the legacy one, then, start new separate
repositories that start from scratch.

Then, for the people that want to have the full history, they can setup a
graft-point[3] and voilá; the full history would be available on each one of the
repos.

As a concept, this particular proposal has merit. A lot of merit, in my
opinion. I’d propose that if we want to move away from the single-repo model
we make 2.8.x the end of the line for both one-repo and 2.x. Then for 3.0.0
we create whatever arrangement of repositories we feel is appropriate (I’d
say libpurple, libgnt, finch, pidgin, and po if we don’t split the po’s into
each repo) with all the subrepo glue we want/need and start anew, referring
to the old repo if we need history.

This is an interesting feature that some may find useful. That said, the
Adium guys have done pretty well without such functionality in their new
“each branch is a repo” model. It does make me wonder how difficult writing
such an extension for mercurial would be, though.

John Bailey

However, once it becomes clearer that Mercurial doesn’t have such an option, suddenly he doesn’t like the idea anymore.
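The graft-point idea in the proposal above boils down to telling git to pretend that a commit has parents other than the ones recorded in it. Older git versions read such overrides from `.git/info/grafts` (one line per commit: the commit ID followed by its parent IDs); newer ones offer `git replace --graft`. Below is a toy Python model with hypothetical commit names, showing how a fresh repository plus a single graft yields the full history. It mimics the idea only, not git’s code.

```python
# Legacy repository: the converted old history (hypothetical names).
LEGACY = {"old1": [], "old2": ["old1"], "v2.8": ["old2"]}

# New repository started from scratch: its root commit has no parents.
NEW = {"root": [], "new1": ["root"], "new2": ["new1"]}

# Everything the local clone knows about once both repos are fetched.
RECORDED = {**LEGACY, **NEW}

# Graft point: pretend "root" has "v2.8" as its parent. This plays the
# role of one line in .git/info/grafts and exists only in this clone.
GRAFTS = {"root": ["v2.8"]}


def parents(commit, grafts):
    """Parents of `commit`; a graft overrides the recorded parents."""
    return grafts.get(commit, RECORDED[commit])


def history(tip, grafts=None):
    """First-parent walk from `tip`, a much simplified `git log`."""
    grafts = grafts or {}
    log = []
    commit = tip
    while commit is not None:
        log.append(commit)
        ps = parents(commit, grafts)
        commit = ps[0] if ps else None
    return log


print(history("new2"))          # ['new2', 'new1', 'root']
print(history("new2", GRAFTS))  # ['new2', 'new1', 'root', 'v2.8', 'old2', 'old1']
```

People who don’t care about the old history never see it; those who do add one graft and get the whole thing.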

We don’t need no stinking analysis

Notice at this point there’s still no analysis at all. No pros vs cons, no nothing. Just simple comments such as “I’m also a +1 on Hg, FWIW”.

Great. This time they are not even pretending, so if Mercurial turns out to be a bad choice, they cannot say “We spent considerable time and effort evaluating our options when we made this switch”. I pointed that out to them:

Moving to Hg without any analysis at all

I thought that was crystal clear, but apparently not:

Ethan Blanton wrote:
> For posterity … you’re wrong. We *did* discuss pros and cons, at length.
> We discussed the impact on migration, the loss of information in moving
> from monotone to hg or git, developers’ experiences with hg, git, and other
> DVCSes which did not make the final cut. We looked at repository sizes and
> talked about speed. We discussed the differences in branches, tags,
> naming, etc. in the various systems. We talked about bug tracker
> integration, both with trac and moving trackers. We even talked about how
> hilariously wrong your explanations of non-git VCSes are, and how they’ve
> not improved since the times when you didn’t understand monotone.

Easy to say. The only recorded “discussion” was between me and John Bailey,
which he abandoned after I provided strong arguments against his notion of
“branch”. And whatever you said about me not understanding monotone was
probably behind doors, and wrong, as nobody has every provided evidence of me
not understanding something about monotone… not that it matters. And what I
said about non-git VCSes still stands.

Now, the issue is not that you discussed (without hardly any record), the
issue is that there is no conclusion. There’s no analysis anywhere. There’s
no list of pros and cons. There’s no list reasons. All those are probably on
your mind, and it’s hard to counter argue with those.

Why don’t you provide an analysis? Some official explanation?

http://code.google.com/p/support/wiki/DVCSAnalysis

Felipe Contreras

As per usual. No response.

Moreover:

> These discussions took place on mailing lists, on IRC, in the XMPP
> MUC, in public, in private, and all over the place. You simply
> weren’t present, which was your own choice. Your arguments were
> heard, and weighed, despite the fact that you interjected them
> one-sided, misinformed, and late in the process.

_My_ choice? You have banned me from the IRC channel. And I am looking
at the online mail archive. There’s hardly anything there.

Analysis

Since the Pidgin guys don’t want to do it, I will try. I already wrote a post on “Mercurial vs Git” in which I compare both in a generic way.

My conclusion in that post was that Mercurial and Git are very similar, except in the way branches are handled, and that makes a huge difference. I already explained to the Pidgin guys that everything you can do with Mercurial branches you can do with Git branches, but not the other way around; for example, Mercurial doesn’t have namespaces for its bookmarks.
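For a concrete picture of what “namespaces” means on the git side: when fetching, git’s default refspec `+refs/heads/*:refs/remotes/<remote>/*` copies each remote branch into a per-remote namespace, so branches with the same name from different people clash neither with each other nor with local branches. The following is a simplified Python model of that mapping with hypothetical remotes `alice` and `bob`; it handles only a single `*` per side and does not model the `+` (force) flag.

```python
def apply_refspec(refspec, remote_refs):
    """Map remote ref names to local names using a fetch refspec.

    Simplified model: exactly one '*' per side, and the leading '+'
    (allow non-fast-forward updates) is simply stripped.
    """
    src, dst = refspec.lstrip("+").split(":")
    src_prefix, src_suffix = src.split("*")
    dst_prefix, dst_suffix = dst.split("*")
    local = {}
    for name, commit in remote_refs.items():
        if name.startswith(src_prefix) and name.endswith(src_suffix):
            middle = name[len(src_prefix):len(name) - len(src_suffix)]
            local[dst_prefix + middle + dst_suffix] = commit
    return local


# Hypothetical remotes that both publish a branch called "foo".
alice = {"refs/heads/foo": "a1b2", "refs/heads/master": "c3d4"}
bob = {"refs/heads/foo": "e5f6"}

refs = {"refs/heads/foo": "0000"}  # my own local branch "foo"
for remote, remote_refs in (("alice", alice), ("bob", bob)):
    spec = f"+refs/heads/*:refs/remotes/{remote}/*"
    refs.update(apply_refspec(spec, remote_refs))

for name in sorted(refs):
    print(name, refs[name])
```

Running it lists four refs: the local `refs/heads/foo` plus `refs/remotes/alice/foo`, `refs/remotes/alice/master` and `refs/remotes/bob/foo`, each still pointing at its own commit.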

However, the biggest advantage of Git for the Pidgin guys is that it allows them to switch today. As I explained above, it’s possible to split into multiple repositories and start them from scratch, so people can start committing right now. Meanwhile, the project to convert the old history can continue as usual, and if people want to take a look at the old history, they only need to set up a graft point. That’s what was done for the Linux kernel. This also has the advantage that even two or more years from now, the scripts that convert the old history can still receive fixes without affecting the new history.
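Why would fixes to the conversion scripts leave the new history untouched? A git commit ID is a hash that covers its parents’ IDs, so commits made on top of one conversion run are tied to that run, while a repository started from scratch never embeds any legacy IDs; only the local graft has to be repointed. The Python sketch below uses a simplified stand-in for commit IDs (SHA-1 over the parent IDs and the message only; real git also hashes the tree, author and committer) and hypothetical commit messages.

```python
import hashlib


def commit_id(message, parent_ids):
    """Simplified stand-in for a git commit ID: SHA-1 over the parent IDs
    and the message (real git also hashes the tree, author and committer)."""
    data = "".join(f"parent {p}\n" for p in parent_ids) + "\n" + message
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


def build_chain(messages, base=()):
    """Commit `messages` in order on top of `base`; return the new IDs."""
    ids = []
    parents = list(base)
    for message in messages:
        cid = commit_id(message, parents)
        ids.append(cid)
        parents = [cid]
    return ids


# Two runs of the (hypothetical) conversion scripts; the second one fixes
# an author attribution, so the legacy tip ID changes.
legacy_v1 = build_chain(["initial import", "patch by unknown author"])
legacy_v2 = build_chain(["initial import", "patch by its real author"])

NEW_WORK = ["split out libpurple", "new feature"]

# Option 1: keep committing on top of the converted history.
on_top_v1 = build_chain(NEW_WORK, base=[legacy_v1[-1]])
on_top_v2 = build_chain(NEW_WORK, base=[legacy_v2[-1]])

# Option 2: start from scratch and attach the old history with a graft.
fresh = build_chain(NEW_WORK)
graft_v1 = {fresh[0]: legacy_v1[-1]}
graft_v2 = {fresh[0]: legacy_v2[-1]}

print(on_top_v1 == on_top_v2)          # False: new commits depend on the run
print(fresh == build_chain(NEW_WORK))  # True: fresh history is unaffected
print(graft_v1 != graft_v2)            # True: only the local graft changes
```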

Since Mercurial doesn’t have that, the Pidgin guys have to wait until the scripts to convert the history are “perfect”. And they can’t split their repository. This is a big disadvantage, especially for people like me who are only interested in libpurple, and not the whole enchilada.

Plus there’s the obvious advantage of git: incredible performance.

Action     git        hg
Cloning    2m 13s     6m 19s
Size       106 MB     213 MB
Pull       0.367 s    1.714 s
Commit     0.137 s    0.239 s
Diff       0.018 s    0.275 s
Show       0.007 s    0.153 s
Note: git repo, hg repo.
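To put the table in perspective, here is a small Python calculation of the hg/git ratio for each row. It uses only the figures above (single measurements), converted to seconds and megabytes; no new benchmark is involved.

```python
# Figures copied from the table above: (git, hg) pairs, with times in
# seconds (clone times converted from minutes) and sizes in megabytes.
RESULTS = {
    "Cloning": (2 * 60 + 13, 6 * 60 + 19),
    "Size": (106, 213),
    "Pull": (0.367, 1.714),
    "Commit": (0.137, 0.239),
    "Diff": (0.018, 0.275),
    "Show": (0.007, 0.153),
}

# Ratio > 1 means hg took longer (or used more space) than git.
ratios = {action: hg / git for action, (git, hg) in RESULTS.items()}

for action, ratio in ratios.items():
    print(f"{action:8} hg/git = {ratio:5.1f}x")
```

By these numbers hg is about 1.7x slower for commit and close to 22x slower for show, and its repository is about twice as large.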

Plus there’s the added advantage that the mtn->git conversion takes 1h (30m if it’s incremental), while the mtn->hg conversion takes about 40h (yes, hours). This means multiple people can test the conversion scripts much faster, and the conversion can even be run on a daily basis!

Of course, I am biased towards git, but since the Pidgin guys have not published any analysis at all, this is as good as it gets.

With this analysis in mind, it’s clear to me that Git is the right choice, and the Pidgin guys are making yet another mistake. The only reason they are picking Mercurial is comfort; they are used to hg, so hg it is. I don’t think it’s as bad as their decision to use Monotone (so it will take longer for them to realize it would be better to use Git), which is good, but still, it’s not ideal.

As for me, I can only try to keep maintaining my scripts, if only to make sure they are doing the conversion right.

BTW, if you are interested in the full discussions, I have put them in mbox format here.

3 thoughts on “Pidgin picking the wrong DVCS again; Mercurial”

  1. What’s the status of this? I’ve been looking for a git repo of libpurple specifically (I started looking on GitHub, and noted your pidgin-git-import), but it looks like they’re still on Monotone. Have their plans to move to a DVCS that’s actually useful to others moved forward at all?


  2. @Andy No update at all. I haven’t heard any comments regarding this analysis, no comments on the mailing list, no changes in 4 months to their conversion scripts. I sent Richard Laager some patches for their scripts, and got no response.

    Maybe you should ask on their mailing list.

