Pidgin; how not to choose the right SCM

August 24, 2008 (updated August 26, 2008) / FelipeC

After developing a mtn to git conversion tool I understood much better how monotone works and why it does the seemingly crazy things it does. Once you understand the internals everything makes much more sense.

With the hope of shedding some light for the core developers of Pidgin on the DSCM tool (monotone) they use, I wrote an email explaining all the little details of monotone and how other tools do exactly the same thing in a simpler way. Here is the mail.

The results were mixed: some people discussed valid points, while others didn’t really try to understand and just attacked git, and myself. It seems some devs are quite fond of mtn and wouldn’t allow any attacks on it.

Anyway, at some point it was accepted that they didn’t do a good job of evaluating git; they discarded it because of the size of the repository (700M). Of course, they didn’t bother to read the manual, or ask on IRC or on the mailing list. Don’t ask me why.

Apparently that discussion generated a post on John Bailey’s blog in which he explained why git is not a good option for him. That in turn generated a discussion on the git mailing list explaining that he was wrong, was making an unfair comparison, and was obviously pushing his own agenda.

Then out of nowhere another discussion started in the ohloh forums. It was basically an exchange between Gary Kramlich and me, in which I found out about his blog post with a diagram explaining his typical workflow with monotone.

So I answered with this diagram, which clearly shows that exactly the same workflow is much simpler in git. I think in the end he realized that too.

So I’ve answered all their questions, I’ve proved their arguments wrong, and I’ve offered a hands-on session to clarify any misconceptions they might have. As a result I get John Bailey calling me a zealot that is not worth listening to.

Anyway, returning to the topic, here are some tips:

Don’t use an obscure DSCM

Even a popular SCM is better; your developers can use their decent DSCM (git-svn) to interact with it. If you choose an obscure DSCM you’ll have issues because of the lack of support: trac, web ui, desktop ui, free service provider (github.com), etc.

Also you make it more difficult for new contributors; they have to learn a new DSCM just to contribute.

Pidgin is the last major project that uses mtn; OpenEmbedded was using it but they decided to switch to git.

Don’t choose a DSCM that imposes itself

It doesn’t matter which DSCM you choose, not everyone is going to be happy with it. That’s why it’s a good idea to choose one that makes contributions as patches a part of the tool. Both bzr and git provide this functionality: people can send patches, for which they can use their favorite tool (git over bzr), and the maintainers can integrate the patches quite easily.
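As a rough illustration of the patch-based flow in git, here is a minimal sketch using `git format-patch` and `git am`. The repository names, identities and file contents are hypothetical, and everything happens in a scratch directory.

```shell
#!/bin/sh
# Sketch: contribute through a patch file instead of a shared account.
# All names (upstream, contrib, Maintainer, Contributor) are made up.
set -e
work=$(mktemp -d) && cd "$work"

# The maintainer's repository.
git init -q upstream
cd upstream
git config user.name "Maintainer" && git config user.email "maintainer@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
cd ..

# The contributor clones, commits, and exports the commit as a patch.
git clone -q upstream contrib
cd contrib
git config user.name "Contributor" && git config user.email "contributor@example.com"
echo "more docs" >> README
git commit -q -a -m "Extend README"
git format-patch -q -1 -o "$work/patches"
cd ..

# The maintainer applies the patch; author and committer are recorded separately.
cd upstream
git am -q "$work"/patches/*.patch
git log -1 --format='%an / %cn: %s'
```

If it runs as intended, the last command prints `Contributor / Maintainer: Extend README`, so the contributor keeps the credit even though the maintainer created the commit.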

Not only that, but patches quite often need to be changed during review, and that’s where rebasing comes in. Both bzr and git allow rebasing, which is essential if you want to create nice patches, quilt style.

Moreover, both bzr and git allow shallow clones so you don’t have to download the whole repo just to contribute.
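For git, a shallow clone looks roughly like the sketch below. It builds a throwaway three-commit repository (a stand-in for a real project) and clones only the newest commit with `--depth 1`.

```shell
#!/bin/sh
# Sketch: compare a full history with a shallow clone of it.
set -e
work=$(mktemp -d) && cd "$work"

git init -q origin
cd origin
git config user.name "Dev" && git config user.email "dev@example.com"
for i in 1 2 3; do
    echo "change $i" >> file.txt
    git add file.txt
    git commit -q -m "Commit $i"
done
cd ..

# file:// forces the regular transport; a plain local path would ignore --depth.
git clone -q --depth 1 "file://$work/origin" shallow
full=$(git -C origin rev-list --count HEAD)
shallow=$(git -C shallow rev-list --count HEAD)
echo "full history: $full commits, shallow clone: $shallow commit"
```

Against a real server you would pass its URL instead of the `file://` path; the saving obviously grows with the length of the history.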

Pidgin has a relatively big repository, so you have to download a tarball of about 200MB, then run a couple of commands just to get the latest code. That’s because the initial fetch sucks in mtn.

Keep an open mind, but a closed mouth

DSCM flame wars can become pretty nasty pretty quickly; that’s why it’s a good idea not to participate in them. However, it’s good to be well informed about all the different DSCMs. There are other ways to be informed: private mails or messages, or discussing in the respective IRC channel of the tool you are interested in (#git, #bzr). That way you get the best facts pretty quickly instead of useless discussions.

~~Pidgin devs~~ Some Pidgin devs, on the other hand, praise their good understanding of other DSCMs, and are quick to point out why others are not a good choice for them, all based on misconceptions. After lengthy flamewars and embarrassing blog posts their stand is still: git is not good for us. All the reasons they’ve provided have been flawed, so now they hang on to the reasons they are not disclosing.

Well, that’s it. If you are a Pidgin developer, or a monotone user, who wants to get to know git better, just let me know: send me a message or an email.

Cheers.

32 thoughts on “Pidgin; how not to choose the right SCM”

Gary Kramlich

August 24, 2008 at 14:43

I’m *REALLY* getting tired of talking about this…

As I mentioned in one of my posts on ohloh, when we evaluated git, the best front end was cogito. Which as far as I’m aware, was created because the main git front end at the time sucked, but once it got better, cogito died a horrible death.

I can not honestly remember exactly what we all looked at during the evaluation since it was something like 2 years ago now. So I for one can’t say we did a good job of evaluating it, and since you were on hiatus from pidgin development, nor can you.

What I can remember from the evaluation period, is that monotone met our needs. It was much slower back then, but anything that fit our needs and was easy to understand was better than CVS (yes we were still using CVS at the time of the evaluation). Aside from that, one of the Pidgin developers was already contributing to monotone. That was enough for us to choose monotone.

I thought I already mentioned this, but your diagram is not as detailed as mine. I show every step along the way, you have not. I’m not saying yours is incomplete, I’m saying it’s not as detailed and thus misleading since you’re not showing everything involved (stuff like changing to the working copy directory and so on).

I really wish you would heed your own advice about DSCM flame wars. For the record, these are the places where Felipe has brought up that we should switch DSCMs (mind you, to the best of my knowledge he is the only one that has done this…): the pidgin devel mailing list, #pidgin, the ohloh forums, my blog, and now his blog.

As I stated earlier, I am *VERY* tired of discussing this. After all of these rambling circular discussions, reiterating over the same stuff time and time again, I’m really to the point that I don’t care if GIT could make me a sandwich, do my laundry, etc; Your actions, your statements (as well as others) and your persistence have really turned me personally off from the tool.

I need a tool that works for my projects. It doesn’t need to integrate into everything under the sun, it doesn’t need a kitchen sink, it just needs to work and do the things I need. Monotone fits this bill. I am happy with it. I don’t move away from things that I am happy using.

FelipeC

August 24, 2008 at 16:12

You are, again, presenting wrong facts.

a) It was accepted already that the main reason git was rejected was that the repository size was too big. Maybe it would have been rejected for other reasons, but those reasons were not explored in depth as they should have been. You didn’t even read the manual, that’s not a good job.

The fact that ‘a’ was good enough doesn’t mean you did a good job evaluating ‘b’.

b) There’s no need for a diagram for the git workflow. If you want details for each of the actions:

start a new project/branch

For a project:
git init

If you want to make modifications, then go to ‘work on an existing branch’

For a branch, go to ‘branch a branch’

work on a new (to you) project/branch

For a project:
git clone $url

If you want to make modifications, then go to ‘work on an existing branch’

For a branch, go to ‘branch a branch’ (will be remote)

branch a branch

git checkout -b $branch $orig_branch

If the original branch (foo) is local then the name is ‘foo’ if it’s remote, then ‘$remote/foo’.
If you want to make modifications, then go to ‘work on an existing branch’
If you want to publish it, then go to ‘serve a branch’

work on an existing branch

If you want to pull upstream changes, then:
git pull

Do some modifications, then commit them:
git commit -a

If you want to publish it, then go to ‘serve a branch’

serve a branch

git push $server $branch

merging branches

If you are not on the branch you want to merge to:
git checkout $branch_a

Then:
git merge $branch_b

If there are conflicts, solve them and re-commit.
If you want to make modifications, then go to ‘work on an existing branch’
If you want to publish it, then go to ‘serve a branch’

As you can see you just need to ask yourself ‘what do I want to do?’ then just do it, and then ask the same question again, and just do it. No need for a diagram.
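To make the steps above concrete, here is one possible end-to-end run in a scratch directory. It is only a sketch: the bare repository `server.git` stands in for `$server`, and the branch and file names are invented for the example.

```shell
#!/bin/sh
# Sketch: new project -> branch -> work -> merge -> publish.
set -e
work=$(mktemp -d) && cd "$work"
git init -q --bare server.git              # hypothetical stand-in for $server

# start a new project
git init -q project
cd project
git config user.name "Dev" && git config user.email "dev@example.com"
echo "v1" > main.c
git add main.c
git commit -q -m "Initial commit"
trunk=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on configuration

# branch a branch, then work on it
git checkout -q -b topic "$trunk"
echo "feature" > feature.c
git add feature.c
git commit -q -m "Add feature"

# merging branches
git checkout -q "$trunk"
git merge -q topic

# serve a branch (note the order: remote first, then branch)
git remote add origin "$work/server.git"
git push -q origin "$trunk"
git --git-dir="$work/server.git" log --format=%s "$trunk"
```

The final `git log` reads the history back from the server side, so it should list "Add feature" followed by "Initial commit".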

Anyway. I’m not saying directly that you should switch DSCMs, I’m just saying that monotone is a bad choice.

What I’m interested in here is, do you think it is a good choice now? Forget about Pidgin, forget about your projects in mtn. If you were to start a project from scratch, why not consider git, or bzr, or mercurial?

This has become a circular discussion because you keep giving the same unrelated answers, which I have already argued are not valid.

If you want to get out of the circular discussion, give a valid reason why you don’t want to try it. We could go step by step with your typical workflow in mtn and practice how you achieve the same thing in git.

In any case, you sound like: I just need to travel from LA to NY, that’s all I need. My horse fits this bill. I am happy with it. I don’t move away from things that I am happy using. I’m sick of this guy trying to sell me a car for that, he is too persistent.

Gary Kramlich

August 24, 2008 at 18:07

I said I’m done… i mean it…

Sleek

August 25, 2008 at 09:08

So why are you still here ?

FelipeC

August 25, 2008 at 10:08

This is typical, when I demonstrate you are presenting twisted facts that’s when you suddenly are done with it.

You don’t have any valid reason not to try it, your only reason is ‘I’m too persistent’. Even if that were true, if you are confident mtn is a good choice why not take some time to let me guide you through your typical workflow in git (hands-on) so this discussion gets finalized for good?

And also, don’t take this the wrong way, the way I see it: friends don’t let friends use crappy SCMs.

I’m working on a mtn2git converter that will make things even more interesting for you. Why can’t you assume good faith and spend a little bit of your time on this since I’ve spent a considerable amount myself?

Patrick Georgi

August 26, 2008 at 14:49

I can only concur with Gary’s advice that you should take your own advice: The git advocacy squad is about as obnoxious as the gentoo advocacy squad was several years ago, and the ubuntu advocacy squad one or two years ago.

If Gary is happy with monotone, why should he “let him guide you” through git? Why should he bother? (other than to try to silence a trouble maker on the list – who won’t be quiet until his object of fanboydom is implemented anyway?)

I prefer a truthful history myself, too – so rebase is out of the question, and with that, git lost most of its appeal vs. monotone. and don’t come talking about how git stores all the real history: the next “git-gc” kills it.
Anyway whether git is better or not (in this light or in general), being annoying until they finally give in is a surprisingly successful strategy, but don’t complain if people react harshly (or if they’re nice, by staying silent at some point).

Patrick Georgi

August 26, 2008 at 15:50

ermm.. “let you guide him”, of course 😉

FelipeC

August 26, 2008 at 17:29

Patrick: I meant that the project maintainers should not encourage flamewars in the mailing list. They should evaluate the different SCMs in a more private way.

Although it might be advisable for everyone not to participate in flamewars, or start them, sometimes you just need to do what you think is right. I felt I had the duty to share my findings when I analyzed monotone. I was hoping for something productive to come out of it, and I got it.

Don’t let Gary fool you, although some developers embrace mtn as if it was their child, some others took a different stance. So it was productive to a certain extent.

Are you seriously asking why try new things? Well, because you might discover something you like, or even love.

Now you are criticizing git, so I must answer.

First of all, you have a very poor understanding of git; git-gc doesn’t modify the history at all, it’s just packing it, it’s a lossless compression, at any point in time you can unpack the objects and you’ll have them exactly as you had them before. Just like bzip2.
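A quick way to check the claim for reachable history is to list every commit ID before and after `git gc` and compare. The sketch below does that in a throwaway repository; it says nothing about unreachable objects, which are a separate question.

```shell
#!/bin/sh
# Sketch: packing with git gc leaves the reachable history byte-for-byte identical.
set -e
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git config user.name "Dev" && git config user.email "dev@example.com"
for i in 1 2 3; do
    echo "line $i" >> notes.txt
    git add notes.txt
    git commit -q -m "Commit $i"
done

before=$(git rev-list HEAD)        # every commit ID reachable from HEAD
git gc -q                          # pack loose objects
after=$(git rev-list HEAD)
git fsck --full >/dev/null         # verify object checksums and connectivity

if [ "$before" = "$after" ]; then echo "history unchanged by gc"; fi
```

Since a commit ID is a hash over its content and its parents, identical IDs mean identical content.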

Regarding rebase, I really wonder if people are closing their minds to it. It’s really simple. Traditionally people work for some period of time, let’s say a day, and at the end of the day, you commit. You expect that commit to be “truthful” and remain in the history forever.

However. What happens if at the end of the day you have a bunch of changes, but the code is still not working, shall you commit it? Maybe you can create a new branch, and commit the changes there, so the master branch stays stable.

But there’s a lot of silly comments, and debugging, and you don’t want that to go into the history forever, right? So you wait until the next day when hopefully you have something nice and clean to commit.

Well, that’s where tools for patch management like quilt come in handy. At the end of the day, you just save your work into a quilt patch, the next day you keep working, and add another patch. When everything is ready, you can clean up the patches, and commit the changes one by one. That way you can have a clean history, and you can keep working without worrying too much about the way you are separating from the original branch.

Now, at this point you might see that quilt is actually a pseudo SCM, a personal SCM you can say, for dynamic history.

So git rebase is just like that… it doesn’t make sense to modify the permanent history (commits already pushed), but it does make sense to modify the local history (commits that haven’t been pushed yet). If it helps, picture those as pseudo-commits, not real commits, they become real when you say so (push).
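A small sketch of that idea: two local commits (a feature and a later typo fix) are folded into one before anything is pushed. It uses `git commit --fixup` and `git rebase -i --autosquash`, which are newer than this discussion; the file and commit names are hypothetical.

```shell
#!/bin/sh
# Sketch: tidy up unpushed commits with rebase.
set -e
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git config user.name "Dev" && git config user.email "dev@example.com"

echo "base" > code.txt
git add code.txt
git commit -q -m "Base (already pushed)"

# Local, unpushed work: a feature and a follow-up fix for it.
echo "feature" >> code.txt
git commit -q -a -m "Add feature"
echo "typo fix" >> code.txt
git commit -q -a --fixup=HEAD

# "true" as the sequence editor accepts the generated todo list unchanged,
# so the fixup commit is folded into "Add feature" without prompting.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2
git log --format=%s
```

Only the two local commits are rewritten; the base commit, which stands for the already published history, keeps its ID.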

I can only tell you that git is not just another DSCM; once I understood the subtleties of its internals and compared them with others, like monotone, it’s amazing the level of things they ‘just got right’.

Just yesterday I found out about a new strategy to find file renames more effectively. As it turns out, as soon as you use this new code you can see the results in previous commits. Is it changing the history? Of course not! The history is not storing file renames; file renames are just a presentation of the data.

One more point to git’s good design. None of the other DSCMs do that, in fact, they criticize git for not storing file renames.
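The following sketch shows the same commit presented two ways: once with rename detection switched off and once with it on. The file names are made up, and the file is moved with a plain `mv` so git is never told about a rename.

```shell
#!/bin/sh
# Sketch: renames are computed when viewing history, not stored in it.
set -e
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git config user.name "Dev" && git config user.email "dev@example.com"

printf 'int main(void)\n{\n    return 0;\n}\n' > old.c
git add old.c
git commit -q -m "Add old.c"

mv old.c new.c                 # plain mv, no "git mv"
git add -A
git commit -q -m "Move old.c to new.c"
commit=$(git rev-parse HEAD)

plain=$(git diff --no-renames --name-status HEAD~1 HEAD)
detected=$(git diff -M --name-status HEAD~1 HEAD)
echo "without rename detection:"; echo "$plain"
echo "with rename detection:";    echo "$detected"
```

The first view reports a deletion and an addition, the second a rename; the commit ID in `$commit` is the same in both cases because only the presentation differs.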

The way I see it, you and Gary would surely like git… if only you could understand it. Maybe I’m not explaining it right. But in any case, you shouldn’t take a stance based on false arguments (git-gc modifying the history). You can say git supporters annoy you, but you shouldn’t say git isn’t good if you really don’t know.

And I don’t complain about people reacting harshly, I just wonder what’s the point of doing that? Anyway, they are the ones missing it.

design

August 26, 2008 at 18:12

> Keep an open mind, but a closed mouth says the man who wrote an email to a group of developers telling them how their scm is inferior and how they are closed-minded and should wake up

FelipeC

August 26, 2008 at 19:11

design: No, I wrote an analysis of their *current* SCM, and its deficiencies compared to other solutions. They have switched 2 times already, and some of them already saw deficiencies in monotone.

After that thread I saw other projects move to git, and I realized how they did it: they had an open mind, and didn’t discuss things very publicly. I guess you are assuming that because I suggest to do that I’m implying pidgin developers didn’t do that… that’s not true.

I haven’t said anything regarding what I think about their stance, other than they have mixed opinions. How some particular developers handled the discussion is another subject entirely.

FelipeC

August 26, 2008 at 19:16

Hmm, after re-reading what I wrote I agree it can be interpreted the wrong way (it was too generic), I’ve changed the text, hopefully now it conveys what I wanted to say.

Ryan Tomayko

August 27, 2008 at 05:28

Wow! Are you two married? I haven’t seen such a pissing match since The Honeymooners.

FelipeC

August 27, 2008 at 10:17

Lol, I’m not pissed 🙂

James

August 27, 2008 at 10:44

Anyone who is so attached to a DVCS – in both camps (mtn and git) – needs to let go and go outside for once.

Pidgin’s choice for using mtn was just a choice. It works. Leave them be. On the other hand, having to download a changeset database because AFAIK mtn does not support shallow checkouts is a bit retarded.

FelipeC

August 27, 2008 at 11:19

Yeah, mtn doesn’t support shallow checkouts. Mtn users might say: not yet. But if you look at the details you understand why that’s not so easy: mtn stores file deltas, so if you want to check out the latest revision you need to go back possibly through every revision in the history, and that would be too much for the database server. I guess that’s one of the reasons mercurial doesn’t support that either (it also uses deltas).

I also agree with detachment to the SCM a project uses. It’s not healthy to be so attached.

James

August 28, 2008 at 01:50

@FelipeC: I’m of the opinion that using both a centralized and DVCS at the same time wins – users who are not comfortable with DVCS can use the centralized VCS and those that are comfortable can use their tool of choice (eg bzr, git, etc).

Forcing a huge changeset database down someone’s throat when they first check out your project to contribute (fixing a small bug or implementing a feature they want) discourages contribution – because it takes so long to get the development environment set up and that potential new contributor loses interest.

FelipeC

August 28, 2008 at 02:33

James: I partially agree.

I can contribute to svn projects quite easily with git-svn, it works so well that in fact it seems like a perfect couple. However, you cannot say the same of CVS; it’s a nightmare from any point of view.

On the other hand, you can use DSCMs as centralized SCMs, you don’t need a separate tool for that. One clear advantage is the separation between committer and author; that encourages patches as the contributor name appears in the log (it’s important for some people).

But still, I believe a centralized strategy is not the best. When you have distributed development it’s easier to collaborate, get new contributors easily into the project, and there are other advantages.

Regarding the shallow clone I couldn’t agree more.

Patrick Georgi

August 30, 2008 at 16:21

from the git-gc manpage (as taken from http://www.kernel.org/pub/software/scm/git/docs/git-gc.html):
“Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance) and removing unreachable objects which may have been created from prior invocations of git-add.”
so it’s _not_ lossless. And as I understand it, git-rebase makes objects unreachable (by creating new history starting from some point, basically splicing new changes in between).
Otherwise you’d end up with as many head revisions as there were rebase operations.

So yes, with git-gc you actually lose data (just that most git users would probably say that they don’t care about _this_ data, but I prefer to have a truthful history of what happened, not a cleaned up result).

As for “a new way to find renamed files” – monotone has the most stable one: you explicitly say what happens and, unlike the other tools, it records it and copes with it (including all the trouble that comes from that). monotone tells you what changed between revisions. git only tells you how the code looked at a point in time (not counting revisionist actions by rebase)

Tracking such stuff explicitly has a real use case: if you have a couple of files with rather similar content, named a, b, c, and you do a rename a->d, and c->a (ending up with files a, b, d), git will likely mess up, as the heuristic for file renames won’t be very happy about the similar content. my guess is that most likely it will attempt to rename+patch c->d.
Not a problem, until you have (in a checkout, or in another branch) changes in those files that should travel to the right locations.

as for non-working trees, I commit them all the time – I just don’t push them usually (unless I’m on a side-branch, where no “normal” user is harmed). so I have no need to resort to quilt just for that. Still, there’s an intent and a date of the change recorded.

As long as git can’t properly handle file renames (oh, what happens if you rename a directory, and someone else added a file in his branch – where does it end up? And: will it still do the same reasonable thing in 12 months with a newer version?), it definitely doesn’t belong in my list of “things done right”.
The git designers chose to go a different route (probably because they don’t require “interesting” rename scenarios very often), and I’m okay with that – I don’t have to use that tool, after all. I merely take offense with people who pretend to know better about my requirements than I do.

That’s why I don’t need someone trying to sell me git (or anything else, for that matter), and persisting that I “at least try it”. I tried git, I threw git away, case closed – your assumption that anybody who doesn’t use git a) never tried, or b) didn’t understand it, is annoying at best.
And somehow, it’s mostly git zealots who exhibit this behaviour.

FelipeC

August 30, 2008 at 19:17

So you have all the elements but still don’t understand? As long as you don’t rebase or remove branches, git gc will not remove any objects. So you are in control of when and how objects are removed, if any.
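Even after a rewrite, the old commits do not vanish immediately. The sketch below amends a commit, runs `git gc`, and then checks that the pre-amend commit is still in the object store. It relies on default settings, where the reflog keeps rewritten commits alive for a while (30 days for entries no longer reachable from the branch, as far as I know).

```shell
#!/bin/sh
# Sketch: a rewritten commit survives git gc while the reflog still references it.
set -e
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git config user.name "Dev" && git config user.email "dev@example.com"

echo "draft" > file.txt
git add file.txt
git commit -q -m "Draft"
old=$(git rev-parse HEAD)

# Rewrite the commit; the branch no longer points at $old.
echo "final" > file.txt
git commit -q -a --amend -m "Final"
new=$(git rev-parse HEAD)

git gc -q
git cat-file -t "$old"         # still resolvable after gc
git reflog                     # lists both the old and the new commit
```

So the data is only dropped once the reflog entry expires or is expired explicitly, which is the "you are in control" part.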

Git tells you whatever you want to know. It can tell you what changed between revisions. In fact it’s much more efficient than mtn at finding changes between two revisions that are far away from each other; it happens instantaneously.

Detecting the rename of a->d and c->a is no problem in git, I just tried, it just works. Why wouldn’t it?

git status
# renamed: a.c -> d.c
# renamed: c.c -> a.c

Remember that git doesn’t track files per-se, it tracks blobs. So it tracks the content moving from one place to another, a “rename”, is just a representation of content moving from one place to another to help the human observer see what happened.

Regarding quilt, don’t you ever wish you had made a certain fix at a certain point? Maybe you made a typo, or added an extra parenthesis. Do you really want to keep that history even if no one is going to see it but you? Or are you going to send the whole branch as a series of patches?

Regarding that directory rename, and then merge a branch with an added file. Well, you are right, git is not going to handle that properly. It’s not a big deal, that doesn’t happen often.

Some people require “interesting” rename scenarios, that’s why the rename detection keeps improving, and you get the benefits for the whole repository.

Well, you are the first one I’ve discussed this with who does seem to understand git, but I don’t think you have given it a fair chance.

If your only objection is the rename handling, try to think about these more complicated rename scenarios:

Convert from a SCM that has renaming issues (no renames, or bad ones) (tarballs?)
Move a file, change the contents, copy it to another file, make more changes
Forgot to do ‘git mv/cp’ (did it directly).
Manually remove 99% of the contents of a file into another file

In all of these cases you don’t need to care about how git does the renaming/copy stuff, you just do what seems sensible at the time, commit, and keep working. Nobody has to deal with the laziness of somebody else while handling renames, etc.

It just works.

Maybe you are right, maybe you a) did try it, and b) did understand it. So this discussion is fruitless, because there’s c) is unable to grasp common sense.

Patrick Georgi

August 31, 2008 at 08:36

“So you have all the elements but still don’t understand? As long as you don’t rebase or remove branches, git gc will not remove any objects. So you are in control of when and how objects are removed, if any.”

With these constraints, what’s the advantage of git over monotone? You get just the same “nasty”, unadulterated history, except for the minor bit that git doesn’t store enough data for the faithful representation (so it has to rely on heuristics that are “improved all the time” and are expected to change over time – so your merge today, and your merge in a year might end up with different results, because the heuristics came to different conclusions as to how the trees came into existence)

your example with “git status” – does git still know about that, once you committed it? how, when it only stores snapshots (and infers file movements using those heuristics)?

If I make a typo, I commit it to the branch (at the top), when I merge the branch, all the revisions in-between are sent with it (why should I send series of patches?). I’m not ashamed of my typos. If someone wants the big picture, it’s as easy as “mtn diff -r oldrev -r newrev”, as with any other SCM tool.

Consider this: I committed and pushed the typo-version (because the typo only affects unusual corner cases). Someone else files a bug. I fix the bug, then commit – now I have the “real history” and can’t clean up (because the code was already pushed, and the “bad” history is out there, and probably already being worked on by others). So with rebase, we have only partially cleaned up history, so why bother at all?

As for “uncommon scenarios”, how about you let me decide what’s a common scenario in my work flow, okay? (eg. java has quite a stringent mapping between file content and file name, while the code is largely boilerplate at times)

Git might be good for you or the git devs (and I think I stated that explicitly before), and it might not be good for me at the same time (without having to resort to theories that I’m not right in my mind, kthx?)
It’s just as well not a one-size-fits-all solution as any other thing in this universe.

FelipeC

August 31, 2008 at 11:56

The merge of tomorrow is exactly the same as the merge of yesterday, the only thing that changes is how you see it.

When you do a backup of an old spreadsheet do you also store how the application was rendering it? No, you should only be concerned about the raw data. Do you store the CSS style in the database of a website? No, you should only be concerned about the raw data.

You are not storing a ‘completely exact copy’ of what you see, you store what makes sense to store; data.

You can always specify if you want rename detection or not, and now perhaps the rename strategy.

The problem with committing typos is that you might leave the repository in a non-working state. Once you do something like git bisect you want all the points in time to be working, otherwise chasing a bug through the history is more difficult because of these useless commits.
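For readers who have not used it, here is a minimal `git bisect` sketch. The five-commit history and the `grep` check are invented for the example; in a real project the check would be a build or a test run, which is why every commit needs to be in a working state.

```shell
#!/bin/sh
# Sketch: let git bisect find the commit that introduced a "bug".
set -e
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git config user.name "Dev" && git config user.email "dev@example.com"

# Five commits; the third one introduces the (hypothetical) bug marker.
for i in 1 2 3 4 5; do
    if [ "$i" -eq 3 ]; then echo "BUG" >> app.txt; else echo "ok $i" >> app.txt; fi
    git add app.txt
    git commit -q -m "Commit $i"
done
good=$(git rev-parse HEAD~4)       # Commit 1, known good
culprit=$(git rev-parse HEAD~2)    # Commit 3, what bisect should find

git bisect start HEAD "$good" >/dev/null
# The check exits 0 (good) when the marker is absent, 1 (bad) when present.
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --format='first bad commit: %s' "$found"
```

If a commit in the middle of the range cannot be built or tested at all, the check gives a misleading answer for it, which is the difficulty described above.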

“So with rebase, we have only partially cleaned up history, so why bother at all?”

Why bother cleaning up your mess for the benefit of others? Well, I guess at some point in time this will be part of common SCM etiquette. It just makes sense.

Point taken regarding the different rename scenarios in Java. But you have only shown one scenario where git is not behaving optimally, and it’s not a big deal. In the rest of the scenarios there are no issues.

Sure, it might take time to make git a one-size-fits-all solution, it’s probably just 90% there. I agree that depending on your needs bzr or hg might make more sense. But mtn? That’s something I just don’t see.

Patrick Georgi

August 31, 2008 at 18:35

bzr and hg have the disadvantage that they fail at the same places where git fails, namely their reliance on snapshots.
monotone was that way, too – until the authors got sick of adding more heuristics to figure out what happened between two states that they just made it explicit, which I consider one of the best decisions in the DSCM space so far.

As for “The merge of tomorrow is exactly the same as the merge of yesterday, the only thing that changes is how you see it.”, not quite: if the heuristic that decides how files moved in the tree was improved in the mean time, a merge might come to different conclusions to what happened on both sides.

That’s a side-effect of that “feature” you described some comments before: “Some people require “interesting” rename scenarios, that’s why the rename detection keeps improving, and you get the benefits for the whole repository.”
If the detection “improves”, it has to change in the results (otherwise there’s no improvement). But if the result changes, you get different results depending on when you rely on that output.

Stefan Reinauer

August 31, 2008 at 18:47

This is awesome. I was just about to move a major project of mine over to git. It’s a bit stiff, but hey, a lot of overly committed advocates almost convinced me. Now I found this page seeing where this non-problem-oriented university course git-militarism will eventually lead.

Thank you FelipeC, you finally convinced me I will never ever move any of my projects to git.

FelipeC

August 31, 2008 at 18:57

“if the heuristic that decides how files moved in the tree was improved in the mean time, a merge might come to different conclusions to what happened on both sides.”

In a merge you have parent commits, and a resulting commit. That’s it. If the rename heuristics change, you see the merge differently, but the resulting commit is, and will always be, exactly the same.

“If the detection “improves”, it has to change in the results (otherwise there’s no improvement). But if the result changes, you get different results depending on when you rely on that output.”

The result that changes is how you see it, there’s no physical impact on previous commits.

FelipeC

August 31, 2008 at 19:15

“This is awesome. I was just about to move a major project of mine over to git. It’s a bit stiff, but hey, a lot of overly committed advocates almost convinced me. Now I found this page seeing where this non-problem-oriented university course git-militarism will eventually lead.”

Where will it lead? Too much emphasis on the issues you might have?

“Thank you FelipeC, you finally convinced me I will never ever move any of my projects to git.”

So you are basing your decision on the fact that one guy in his blog is explaining why some supposed git deficiencies are really not, in a tone that you consider militant?

I fail to see how your project might be affected by what I’m trying to explain in this blog. I doubt the things I say in my blog have any effect on the git community at all, and even if they did, even if I was a zealot that sees git as perfect and is not willing to discuss any improvements (which I’m not), how can you generalize that to the whole git community?

Seriously, if there’s anything wrong in git, it will be improved. I’m trying to explain why the renaming issue is not a problem because many git haters say it’s wrong, and that this issue alone is reason enough not to use git.

Patrick Georgi

September 6, 2008 at 12:37

“So with rebase, we have only partially cleaned up history, so why bother at all?”

“Why bother cleaning up your mess for the benefit of others? Well, I guess at some point in time this will be part of common SCM etiquette. It just makes sense.”

—
Another issue with that is that you break the “other favorite feature” of git, bisecting.
Consider the following:
There’s a repository, with revs 1, 2, 3.
You locally create revs 4, 5, 6. In the meantime, rev 7 happens on the repository.

Now you want to push your revs, so to “clean up”, you rebase.
You now have:
1 – 2 – 3 – 7 – 4′ – 5′ – 6′

6′ is the merge of 6 and 7, and you will likely test that merge. 4′ and 5′ are 4 and 5 applied on top of 7. You will probably not test them completely.

Later on, someone wants to bisect a part of the repository for a bug, including your rebased area. 4′ and 5′ are merged without conflicts – but they might fail semantically, and thus might fail the bisecting test.

If you had committed diverged history (i.e. what really happened), you’d have two lines of development: 1-2-3-7-8 and 1-2-3-4-5-6-8, where 8 is the merge of 6 and 7 (properly tested).

All of them are validated by their author in exactly that form.
As for my unclean commits, if I were to commit broken trees to a monotone repository where such testing occurs commonly, I’d simply add a certificate that states “broken” – the bisect tool would have to know about it, but then could ignore that.

Of course, rebased revisions could be specially tagged, too, but in the scenario above, two revisions would be marked as probably-broken which should be perfectly fine (at least they are in monotone)

So “cleaning up the repository” is actually harmful if you’re intending to use the history later on – simply because the history is _forged_.

FelipeC

September 6, 2008 at 13:15

Again, you are describing an unlikely scenario. How many times have you found that rebasing commits like 4 and 5 systematically breaks them? My bet is never, because you don’t rebase.

I will tell you how many times I’ve found that: never.

If you use rebase the way you are supposed to, that is, for your own personal non-pushed branches, then you should make sure that each commit you apply doesn’t have those ‘semantic’ issues. You can see that when you apply them, but you will also have to test at least the result of your rebase, right? At that point you would notice those ‘semantic’ issues, and you can fix them in the right commits (4′ and 5′), and rebase again.

You are describing a theoretical situation that is highly unlikely. While it is not desirable, the point is: are you willing to give up something that’s hugely beneficial (a clean history) because of a theoretical, highly unlikely scenario?

In any case, you don’t have to rebase if you don’t want to, and you can still use git. But more importantly, rebasing is unavoidable when you do patch reviews.

BTW, I’ve never seen those “broken” certificates, is that also theoretical or are those real?

Patrick Georgi

September 7, 2008 at 10:15

You simply present your opinion (“clean history is beneficial”) as fact and go from there. That’s no basis for discussion.

I prefer a true account of “what really happened, thank you very much”. Git won’t give me that, so even if I don’t use git, I couldn’t trust anybody to exercise the care that rebase seems to require (I’d assume that most people only try the HEAD of a rebase, just as I will only try the result of a merge – just that in the case of a merge, everything before was already tested).

My point is: You assume too much. That people take care when modifying history (my experience is, they generally won’t). That people don’t care for renames (except for “corner cases” such as java – hint: look at the stats, how popular java – or c#, which has the same issue – is).
For you (let’s say “revisionist”; not coding in java often), git is probably the right choice (you’d use something different otherwise). For me (let’s call me “historian”; doing java and all kinds of other crazy stuff), git is not the right choice, even if _I_ don’t use rebase or any of the other “cute” features of git (otherwise I’d probably use it).

So, use git. But please, pretty please, don’t try to coerce anybody into your git cult. That’s obnoxious, rude, and pretentious – stuff like “friends don’t let friends use crappy SCMs” only shows how blind you are to the concerns of others. Make your case for git, sure. But insulting people (“unable to grasp common sense”) neither helps the people you call “friends”, nor does it promote git in any usable way.

As for you using the “network effect” cop-out (The “Don’t use an obscure DSCM” section): For many projects, I simply don’t care whether people contribute. If they do, I’m happy. If they don’t, I’m happy, too – and then, I don’t care if “my tool” is popular, or not. Otherwise, I’d have had to convert my scripting stuff from perl, to python, to ruby, just to follow the popularity trail. Gee, thanks.
My use of monotone predates git, so it’s really the same thing.

FelipeC

September 7, 2008 at 11:11

I didn’t say that people test each revision they rebase, I said they test the HEAD. When you test the HEAD you see the ‘semantic’ issues and you fix them in the previous commits (4′ and 5′). Maybe it’s hard for you to understand because you’ve never tried git rebase --interactive.

So far you’ve shown one highly unlikely scenario where git renames would do the wrong thing for you, and now you are saying that even if you don’t use rebase, git is not the right choice? Why? Because of that directory rename thing that almost never happens?

If you prohibit rebasing in your project, which you can’t really do in collaborative development, what else is there that makes git not be the right choice?

By “friends don’t let friends use crappy SCMs” I meant Grim, not you.

Maybe I have insulted you, but I don’t see that you have grasped common sense:

a) How can you avoid rebasing when you do patch reviewing? You’ve not answered that question, hence I assume my position still holds true: rebasing is unavoidable.

b) If you prohibit rebasing in your project, what else is there that makes git the wrong choice?

c) Is such a highly unlikely scenario, a merge of a renamed directory, enough to avoid git completely? If so, have you tried to bring that issue to the git mailing list? I’m sure that if it were a valid use case, a fix would get implemented.

Yeah, I’m condescending to you because you’ve clearly not explained a valid reason why git simply doesn’t work for you, and could not work for you. Other people who avoid git might have valid reasons, I accept that possibility, but from this discussion I think it very unlikely that you have one. That’s my opinion.

Heh, if you don’t care if people contribute to your projects that means that you don’t care about your projects yourself, because admit it, at some point in time you will stop maintaining them and they will die.

FelipeC

September 26, 2008 at 21:30

Patrick: surprise oh surprise!

Git is on its way to detecting directory renames:
http://marc.info/?l=git&m=122238477415620&w=2

So now there’s no excuse.

Patrick Georgi

October 11, 2008 at 13:03

It’s still a philosophical difference:
Git tells you what history looks like (“with 99.99% certainty, someone renamed a directory there”). I want my tool to tell me what really happened (incl. all the nasty bits).

That they add a new heuristic that changes historical perception is just another example of that difference.

It’s all about choice, isn’t it? My “excuse” is my right to choose. So is yours, and everyone else’s.

FelipeC

October 11, 2008 at 14:45

Patrick: That’s a fallacy.

Your tool can’t tell you what really happened. Will your tool tell you what happened if you did ‘mv’ instead of ‘tool mv’? Will your tool tell you that you did ‘cat foo > bar’ instead of ‘tool cp’? Will it tell you that you copied 70% of the code to a new file?

Would you want your tool to tell you: you moved this function 100 lines below?

If you want your tool to store all the nasty bits, then you imply that you want your tool to handle function renames, perhaps refactorings too.

It’s useful for the tools to tell you as much as possible, but that kind of information should be kept separate from what is stored: data != presentation.

That way you can have more efficient algorithms that tell you what really happened without updating the data storage all the time.
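As a rough illustration of presentation computed from plain snapshots, here is a Python sketch that reports a block of lines that disappeared in one place and reappeared in another. It is only a sketch under stated assumptions: `moved_blocks`, the `min_lines` parameter and the sample file contents are hypothetical, and this is not how any particular SCM implements move detection.

```python
from difflib import SequenceMatcher


def moved_blocks(old: list[str], new: list[str],
                 min_lines: int = 2) -> list[tuple[int, int, int]]:
    """Return (old_start, new_start, length) for removed runs that reappear
    unchanged as added runs somewhere else in the file."""
    ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    removed = [(i1, old[i1:i2]) for tag, i1, i2, _, _ in ops
               if tag in ("delete", "replace")]
    added = [(j1, new[j1:j2]) for tag, _, _, j1, j2 in ops
             if tag in ("insert", "replace")]
    moves = []
    for old_start, gone in removed:
        for new_start, came in added:
            if len(gone) >= min_lines and gone == came:
                moves.append((old_start, new_start, len(gone)))
    return moves


# Two stored snapshots of the same file; nothing about the move was recorded.
before = [
    "def a():", "    return 1",
    "def b():", "    return 2",
    "def c():", "    return 3",
]
after = [
    "def b():", "    return 2",
    "def c():", "    return 3",
    "def a():", "    return 1",
]

for old_start, new_start, length in moved_blocks(before, after):
    print(f"{length} lines moved from index {old_start} to index {new_start}")
```

The stored data here is just the two snapshots; a better detection algorithm could replace `moved_blocks` later without touching them.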


Felipe Contreras

32 thoughts on “Pidgin; how not to choose the right SCM”
