How not to ban a prolific git developer

On the 28th of July 2021 I received an email from Git’s Project Leadership Committee saying that my recent behavior on the mailing list was found to be in violation of the Code of Conduct. This was a complete surprise to me.

What was particularly surprising to me is the fact that I’m very familiar with the Code of Conduct, not only have I reported CoC violations in the past, but I’m the only person who has sent a patch to try to improve it, not only to the git mailing list, but the upstream project Contributor Covenant as well. It is because of this that I know the document doesn’t demand the examples of positive behavior in the section titled “Our Standards”, so the only objective reason why a project leader could argue that I violated the CoC would be if I had committed an action on the list of examples of unacceptable behavior:

  • The use of sexualized language or imagery, and sexual attention or advances of any kind
  • Trolling, insulting or derogatory comments, and personal or political attacks
  • Public or private harassment
  • Publishing others’ private information, such as a physical or email address, without their explicit permission
  • Other conduct which could reasonably be considered inappropriate in a professional setting

Did I do any of those things? You will be the judge.

If you don’t like drama then don’t read this post. Although there’s some technical discussion, it’s mostly an analysis of my alleged CoC violations and their context. According to the Project Leadership Committee the following ten emails–which I will list as exhibits–violated the CoC.

Exhibit 1

This is a common debate tactic known as “shifting the burden of proof”.

Ævar does not need to prove that your patch is undesirable, you have to prove that it is desirable.

You have the burden of proof, so you should answer the question.

https://www.logicallyfallacious.com/logicalfallacies/Shifting-of-the-Burden-of-Proof

Felipe Contreras

In this mail I am telling Johannes Schindelin my opinion about what is the default position regarding any patch: the patch is not needed. This is not even contentious. The first thing Junio C Hamano (the maintainer) does is ask: why is this patch needed? If this question is not answered to his satisfaction the patch is rejected. And I’m not the only one that has argued precisely this point:

I don’t need that data. You are proposing a change so it is your duty to support your claim that the change is worthwhile.
Otherwise it’s a change just for the sake of change.

Michal Suchánek

How is my comment remotely close to any of the examples of unacceptable behavior? For that matter, isn’t Michal’s behavior worse? Not to mention that it’s a response to Johannes’s comment, which is objectively way more aggressive:

> Why put this in an ifdef?

Why not? What benefit does this question bring to improving this patch series?

Johannes Schindelin

Johanness is avoiding a well-intentioned question by doubting the good faith of Ævar, and it became even more clear later on in the thread:

You still misunderstand. This is not about any “opinion” of yours, it is about your delay tactics to make it deliberately difficult to finish this patch series, by raising the bar beyond what is reasonable for a single patch series.

And you keep doing it. I would appreciate if you just stopped with all those tangents and long and many replies that do not seem designed to help the patch series stabilize, but do the opposite.

Johannes Schindelin

How am I the bad guy here? And for the record, Ævar is part of the Project Leadership Committee.

Exhibit 2

This is loaded language. You are inserting your opinion into the text.

Don’t. The guidelines are not a place to win arguments.

Note that this sounds ungrammatical and unnatural to some people.

And it sounds ungrammatical because it is ungramatical, not only to native English speakers, but professional linguists.

Felipe Contreras

This is part of a long discussion in which Derrick Stolee attempted to change the guidelines to explicitly avoid any and all gendered language (he/she) for ungendered (they).

One argument that Derrick kept repeating is that only non-native English speakers find the singlar “they” ungrammatical, but I kept repeating that’s not true, and as example I used the test the American Heritage Dictionary used in their usage note regarding “they”:

We thank the anonymous reviewer for their helpful comments.

58% of the panel found the sentence unacceptable. The Usage Panel is comprised of writers, professors, linguists, editors, and multiple Pulitzer Prize winners.

Derrick ignored all my feedback, and for the record he also ignored all the feedback from Ævar, and in fact native speakers in the mailing list stated that they found these sentences ungrammatical too.

Regardless of on which side you are on, Derrick tried to win the argument by adding this to the guidelines:

Note that this sounds ungrammatical and unnatural to readers who learned English in a way that dictated “they” as always plural, especially those who learned English as a second language.

All I said is that he shouldn’t insert his opinion into the guideline, and instead should use something neutral he knows everyone can agree with:

Note that this sounds ungrammatical and unnatural to some people.

Once again, how is stating my opinion “unacceptable”?

Exhibit 3

Yeah, now you are starting to see the problem.

How many more failed attempts do you need to go through before accepting that the approach you thought was feasible is in fact not feasible?

The solution is simple and self-documenting:

pull.mode={fast-forward,merge,rebase}

Felipe Contreras

In my 8,200-word article git update: the odyssey for a sensible git pull I explored around 13 years of discussions in the mailing list regarding problems with git pull, and the first and more obvious solution is my proposed pull.mode configuration which was a direct competitor to different solutions proposed by Junio C Hamano.

For some reason Junio didn’t accept this proposal in 2013 even though other people were in favor of it. I tried again in 2020 multiple times, and after trying all the approaches other people proposed I became convinced it was the only feasible solution.

Elijah Newren–who we will later see is an important actor in this story–decided to try to fix the problem by himself, but every patch series he sent struck a dead end. It became clear to me that Elijah started to see the problem:

However, even if the above table is filled out, it may be complicated enough that I’m at a bit of a loss about how to update the documentation to explain it short of including the table in the documentation.

Elijah Newren

Yes, that’s precisely the reason why I opted for a solution that was a) easy to understand, b) easy to document, c) easy to test, and d) easy to program: pull.mode.

Elijah ignored all my feedback to all his patch series, which is why he still hasn’t realized his patches will break behavior current users rely on. I ran a poll on r/git and 19% of users responded that they rely on the behavior Elijah plans to break, and 15% said even though they don’t use it, they know what it should do.

All Elijah is achieving by ignoring me is at best wasting time, and at worst hurting users. Either way it’s not my fault.

How is my comment unacceptable? I’m just stating facts and my opinion. Even if my opinion is wrong that doesn’t make my comment “unacceptable”.

Exhibit 4

That is a problem specific for your shop.

The defaults are meant for the majority of users. If a minority of users (who happen to be working under the same umbrella) have a problem with the defaults, they can change the defaults.

Felipe Contreras

I understand why some people might think this is an attack on Randall S. Becker, but it’s really not.

Randall is a contributor that often sends reports when the latest git version breaks something in his platform: NonStop OS. I think it’s fair to say this is a relatively obscure OS. Even a smaller subset are the people that work with git inside his company.

Now, of course git should consider the NonStop platform, and of course it should consider all the users inside Randall’s company, but they are probably not even 0.01% of all git users, so why should 99.99% of users suffer at their expense?

No. The defaults are for 99.99% of users, if people in Randall’s company have a problem with the defaults and want to avoid git rebase like the plague, they can configure git anyway they want. That’s what the configurations are for: for the minority.

My opinion is that the defaults are for the majority.

If anyone has a problem with my opinion, we can debate it, but how is stating my opinion “unacceptable behavior”?

Exhibit 5

I’m sending this stub series because 1. it’s still in ‘seen’ [1], 2. my conflicting version is not in ‘seen’ [2], and 3. brian has not responded to my constructive criticism of his version [3].

Felipe Contreras

In the cover letter of my patch series: doc: asciidoctor: direct man page creation and fixes (brian’s version), I explained the reasons why I was sending it, but mainly it’s because Junio continued to refuse to drop brian’s version and include mine, even though brian himself told Junio to drop his version.

Now, if you look at my comments on the patches you could say I trashed them, but this is not my fault: brian stated plenty of things are just not true. Just on the first patch:

  • We generally require Asciidoctor 1.5, but versions before 1.5.3 didn’t contain proper handling of the apostrophe… Not true
  • [GNU_ROFF] for the DocBook toolchain, as well as newer versions of Asciidoctor, makes groff output an ASCII apostrophe instead of a Unicode apostrophe in text… Not true
  • These newer versions of Asciidoctor (1.5.3 and above) detect groff and do the right thing in all cases… Not true
  • Because Asciidoctor versions before 2.0 had a few problems with man page output… Not true

The changes were correct, but I was the one that originally wrote the patch, brian merely changed the commit message and took authorship because I objected to his text.

I understand that some people have trouble hearing that they are wrong, but this is not my problem. If you send a patch to the mailing list you should be prepared to hear all the ways in which it’s wrong. And brian m. carlson is not some rookie, he is the top #5 contributor to the Git project in the past five years, he shouldn’t need training wheels.

I do not blame brian for all these inaccuracies tough, there’s way too many details in the documentation toolchain, and the only reason why I know he was wrong is that I spent several man-days investigating these details. I explained part of the story in my post: Adventures with man color.

Moreover, if you think I’m lacking tact, remember that this is the second time I’m bringing these issues, the first time brian argued back all my suggestions and did not implement a single one. Eventually he just stopped responding to me.

Now, regarding integration: the “seen” branch is an integration branch which is maintained by Junio and contains all the topic branches that are currently being discussed and Junio is considering merging. Junio picked brian’s branch even though it was full of inaccuracies and in my opinion the commits were not properly split (each commit was doing four to five different things at once).

On the other hand my commit message did not contain inaccuracies, I wrote the original patch, my patches were properly split, contained patches from other people–including Jeff King and Martin Ågren–and in addition contained plenty of cleanups and more fixes to the output of the documentation.

If that was not enough, just the commit message of first patch (doc: remove GNU troff workaround) took me several hours of investigation to write.

It’s not my fault that brian decided to stop arguing any further, it’s not my fault that Junio decided to carry brian’s version for two months and ignore my version which was objectively superior. Those are the facts, and I was merely stating them.

Suppose that I’m wrong, let’s suppose that brian’s version is superior, in that case my statement of fact is incorrect. OK, but how is that “unacceptable behavior”?

Exhibit 6

I meant that I meant what he said I meant.

Felipe Contreras

I understand how this clarification I sent to Junio might look like to some people, but it’s simply a convoluted way of saying “what he said”. SZEDER Gábor said “I think you meant X”, I replied “yes, I meant Y, which in practical terms means what you said (X)”, and Junio (who has a habit of rewriting what I write) asked if my previous statement said “yes, yours is better and I’ll use it in an update, thanks”, but I don’t agree with Junio’s restatement.

I meant Y, which in practical terms means what SZEDER said I meant (X). So I meant either of these:

  • Y: Otherwise commands like ‘for-each-ref’ are not completed correctly by __gitcomp_builtin.
  • X: Otherwise options of commands like ‘for-each-ref’ are not completed.

Which one of these is really “better”? I don’t know, and to be honest I don’t care. I’ve been sending this patch for nine months now and in truth the original “Otherwise commands like ‘for-each-ref’ are not completed” is good enough for me.

The important thing is the fix, and the code continues to be broken to this day.

Additionally, it’s not uncommon for Junio to update the commit messages himself, in fact he did so for my last patch (doc: pull: fix rebase=false documentation), even though I sent an updated commit message he ignored my suggested modification and used his own. So why can’t he simply do the same here with a simple “s/commands/options of commands/” as I suggested?

To me this simply looks like an excuse not to merge the series, which I’ve sent eleven times already.

To avoid problems I re-sent the patch series nine minutes after Junio sent his message, and I used SZEDER’s suggestion. That was on June 8, and to this day Junio keeps repeating this on his status mails:

* fc/completion-updates (2021-06-07) 4 commits
 - completion: bash: add correct suffix in variables
 - completion: bash: fix for multiple dash commands
 - completion: bash: fix for suboptions with value
 - completion: bash: fix prefix detection in branch.*

 Command line completion updates.

 Expecting a reroll.
 cf. <60be6f7fa4435_db80d208f2@natae.notmuch>

I’ve already replied multiple times that I did the reroll immediately after, and after that I’ve re-sent the series four times since then.

Now, I understand if Junio took my reply in a way that I did not intend, but why should git users suffer? The problems these patches fix are real. SZEDER Gábor has already reviewed part of the series, and David Aguilar has tested it and he confirmed the problems exist, and the fixes work.

I have plenty more fixes on top of these (41), the reason why I kept this series small was to maximize the possibility of them getting merged, and now it turns out the reason Junio hasn’t merged them is because he didn’t like one comment I said?

To me this seems like pettiness. We are adults, if he found one of my replies objectionable, he could have simply stated so on the mailing list, or he could have sent me a personal reply (he has never done so). He could have said “I don’t like your tone, so I’m going to drop this series”, but instead he made me waste my time resending patches he was never going to merge. He kept his disapproval for himself, and only used it for ammunition to justify a future ban.

The slowness from Junio to accept these and other patches is why I chose to start the git-completion project, which is a fork of the bash and zsh completion stuff. For the record I was the one that started the zsh completion, and I started the bash completion tests in order for the completion stuff to be first-class citizens of the git project, which Junio has refused to accept as well.

Choosing to knowingly hurt git users because he didn’t find one comment palatable does not seem to me to be a behavior fit for a project leader.

I foresee that some people will conclude that I’m being petty too, but I don’t think that’s the case. I found the problems, I wrote the patches, I sent the patches, I addressed the feedback, and I updated the patches with the feedback. I’ve been trying to get them merged for nine months. What more do you want? If Junio has any further problems with the patches, he can just let me know and I’ll address them. But instead he says nothing.

To exemplify even more how Junio’s pettiness is hurting users, Harrison McCullough reported a regression with the __git_complete helper (which I wrote) on June 16, and it was caused by a change from Denton Liu which introduced a variable __git_cmd_idx, but he forgot to initialize it in __git_complete. Initially I fixed the problem by initializing __git_cmd_idx to 1, and Harrison reported that my fix indeed got rid of the issue. Two days later Fabian Wermelinger reported the same issue but sent a patch that initialized __git_cmd_idx to 0. Initially I thought he made a mistake, but upon further reflection I realized that 0 was more correct, so I updated my patch.

My patch is superior because it fixes the regression not only for bash, but for zsh too. In addition it mentions Harrison McCullough reported the issue, and it’s also simpler and more maintainable too. Junio picked Fabian’s patch and ignored my patch, and my feedback, therefore the regression is still present for zsh users. This is a fact.

Ignoring me is objectively hurting users. I’ve resent my fix on top of Fabian’s patch, so all Junio has to do to fix the regression is pick it.

How does the fact that Junio is primed against me make my convoluted statement “unacceptable behavior”?

Exhibit 7

That makes me think we might want a converter that translates (local)main -> (remote)master, and (remote)master -> (local)mail everywhere, so if your eyes have trouble seeing one, you can configure git to simply see the other… Without bothering the rest of the word.

Felipe Contreras

Once again I can see how people might misinterpret what I said here, but it’s nothing nefarious.

The rename of the “master” branch has been one of the most hotly debated topics of late (this has nothing to do with me as I didn’t even participate in the discussion). Even today it’s not entirely clear what’s the future of this proposal. If you are not familiar with the debate, you can read my blog post: Why renaming Git’s master branch is a terrible idea.

But what I attempted to do was achieve what the original poster–Antoine Beaupré–wanted to achieve but in another way. I listened to Antoine, and I proposed a different solution, that’s all.

What Antoine wanted (as I understand it) was to change all his “master” branches to “main” so that he didn’t have to see “master” everywhere. But he wanted to do this properly, so he wrote a python script to address as many renaming issues as he could.

I see some value in what Antoine wanted to do, but he literally said: “I am tired of seeing the name “master” everywhere“, if that was literally the problem, then a mapping of branch names would fix it. And this is not even that foreign to me.

When I wrote git-remote-hg and git-remote-bzr one of the main features people wanted was a way to map branch names. So for example a branch named “default” in Mercurial could be mapped to “master” in Git (this was done by default, but you get the point).

Even if this didn’t help Antoine, I thought this would help other people, say for example that some Linux developers couldn’t manage to convince Linus Torvalds to rename the master branch to “main”, but for some reason they found the name “master” offensive. Well, with my patch they didn’t have to convince Linus, they could simply configure a branch mapping so they “never had to see the name “master”“.

After listening to my reply Antoine did a more adult approach than Junio, and actually replied his discontent to my comment:

I guess that I’ll take that as a “no, not welcome here” and move on…

Antoine Beaupré

This saddened me. It was never my intention to shit on Antoine’s idea, and in fact I never intended to act a representative of the Git community, so I explained very clearly to Antoine that he shouldn’t just give up:

Do not take my response as representative of the views community.

I do believe there’s value in your patch, I’m just not personally interested in exploring it. I don’t see much value in renaming branches, especially on a distributed SCM where the names are replicated in dozens, potentially thousands of repositories. But that’s just me.

Felipe Contreras

However, brian m. carlson either didn’t read my response, or chose to not factor it in, because he replied:

There is a difference between being firm and steadfast, such as when responding to someone who repeatedly advocates an inadvisable technical approach, and being rude and sarcastic, especially to someone who is genuinely trying to improve things, and I think this crosses the line.

brian m. carlson

I wasn’t trying to be “rude and sarcastic”, I was simply trying to suggest a different approach. It did’t even need to be a competing approach, because both approaches could be implemented at the same time. I explained that to brian, did he respond back? No.

Now, even if you remove me from the picture nobody else responded to Antoine, so how exactly did my responses hinder in any way the community?

If you think my response was “rude and sarcastic” I would love to debate that, but the fact of the matter is that only I know what my intentions were, and they were definitely not that.

Moreover, I don’t believe in being offended by proxy. If Antoine Beaupré had a problem with my comment, I would listen to his objection, and even though I think I have already clarified what I meant, I would even consider apologizing to him. But I will not apologize to brian m. carlson who I’m pretty sure got offended by proxy.

Also, for the record, reductio ad absurdum arguments are not uncommon in the Git mailing list, and they don’t necessarily imply anything nefarious. Here’s one recent example from Junio:

I am somewhat puzzled. What does “can imagine” exactly mean and justify this change? A script author may imagine “git cat-file” can be expected to meow, but the command actually does not meow and end up disappointing the author, but that wouldn’t justify a rename of “cat-file” to something else.

Junio C Hamano

Can Junio’s response be considered “rude and sarcastic”? Yes. But you can also assume good faith and presume he didn’t intend to offend anyone and simply tried to prove a point using a ridiculous example.

Can my comment be considered “unacceptable behavior”? In this case I’d say yes, but not necessarily so. You can also give me the benefit of the doubt and simply not assume bad faith (and I can tell you no bad faith was intended).

Elijah Newren

So far the incidents have been pretty sporadic and could easily be reduced to misunderstandings, but the following are not. Elijah Newren decided to wage a personal vendetta against me (literally), and that context is necessary to understand the rest of the evidence.

Spring cleanup challenge

It all started when I sent my git spring cleanup challenge in which I invited git developers to get rid of their carefully crafted git configuration for a month. The idea was to force ourselves to experience git as a newcomer does, and figure out which are the most important configurations we rely on.

This challenge was a success and very quickly we figured out at least two configurations virtually every experienced git developer enables: merge.conflictstyle=diff3 and rerere.enabled=true. Additionally my contention is that git should also have default aliases (e.g br => branch), but there was no clear consensus on that.

Take for example the configuration merge.defaulttoupstream. Back in 2010 while exploring issues with git pull I proposed that git merge should by default merge the upstream branch. I later sent the patch in 2011 which received pretty universal support, and there were interesting comments, like:

I totally agree — this would be a good change[*] — but this suggestion has been made before, and always gets shot down for vaguely silly reasons…

Miles Bader

Just a few days later Jared Hance sent his own version of the patch by introducing merge.defaultupstream to activate the new behavior. Jared sent in total five versions and implemented all the suggestions from everyone, including Junio, but in the end Junio decided to rewrite the whole thing, take authorship, not mention that Jared wrote the original version of the patch (see the final commit), nor the fact that originally the idea came from me.

I personally do not care too much. I had an idea and the idea got implemented, that’s mostly all I cared about. However, it wouldn’t have costed anything to Junio to add “Original-idea-by: Felipe Contreras” as it’s customary (here’s an example).

Then in 2014 I realized that before git 2.0 (which was meant to break backwards compatibility) it was the perfect time to enable merge.defaulttoupstream by default (literally no one disabled it anyway), so I sent a patch for that and other defaults. Junio disagreed and excluded these from 2.0, but in the end he ended up merging them.

So finally in git 2.1 git merge ended up merging the upstream branch by default.

What I find interesting is that in 2021 (7 years later), Ævar–a prominent git developer–didn’t even know that merge.defaulttoupstream changed it’s default value, so he still had merge.defaulttoupstream=true in his configuration. Thanks to my challenge he argued it should be true by default, and thus realized that was already the case. Additionally I noticed the default value was not mentioned in the documentation, so I sent a fix for that.

At that point my relationship with Elijah was amicable, and he decided to join the challenge in his words “to join in on the fun“.

zdiff3

The problems started when I took it upon myself to try to enable merge.conflictstyle=diff3 by default. This task turned out to be much more complicated than I initially thought. Flipping the switch is extremely easy, but once you do that a ton of tests that expect the diff2 format start to fail. No problem, I thought by simply changing merge.conflictstyle to the old value at the beginning of each test file that fails would solve it. Later on each test file can be updated to expect the diff3 format.

It turned out that didn’t work. Many commands completely ignored the configuration merge.conflictstyle, and those are clearly bugs. So even before attempting to change merge.conflictstyle we have to fix those bugs first.

This is what I attempted to do on my patch series: Make diff3 the default conflict style.

Immediately Johannes Sixt pointed out that such change resulted in a very convoluted output for him. This however I think is just bad luck, since never in all my years of using diff3 have I ever seen an output similar to that, and that’s because recursive merges are involved.

Jeff King provided a link to a discussion from 2013 regarding the output of diff3, and that discussion itself lead to other discussions from 2008. It took me a while to read all those discussions, but essentially it boils down to a disagreement between Junio C Hamano and Jeff King (both part of the leadership committee) about whether or not a level higher than XDL_MERGE_EAGER (1) made sense for diff3. Junio argued that the output wasn’t proper, but Jeff argued that even though it wasn’t proper, it was still useful. That lead to Uwe Kleine-König–a Linux kernel developer–to implement zdiff3, which is basically diff3 but without artificially capping the level as Junio did in order for the output to be proper.

When I said maybe we should consider adding this zdiff3 mode, Jeff King mentioned his experience with it:

I had that patch in my daily build for several years, and I would occasionally trigger it when seeing an ugly conflict. IIRC, it segfaulted on me a few times, but I never tracked down the bug. Just a caution in case anybody wants to resurrect it.

Jeff King

My interpretation of that message is extremely crucial. I am very precise with language, both when writing, and reading. So I read what Jeff said exactly how he said it. First, he said in “several years” (1+ years) he occasionally would try this. Then he said of of the times he tried it, it crashed a few times, but crucially he said “if I recall correctly”, so this means he is not sure. Maybe it crashed more than a few times, or maybe it crashed less than a few times, he is not sure.

Whatever the issue was, Jeff did not find it serious enough to track the bug, since he never tracked down the bug in the several years he was using the patch.

Jeff sent his message on a Thursday, on Friday Elijah Newren said he might investigate the zdiff3 stuff, and on Sunday I re-sent the 2013 patch from Uwe. I added a note to the patch stating why I was sending it, and what I did to test it. Essentially I ran the entire test suite using zdiff3 instead of diff3 and everything passed. This implies that if there’s any issue with zdiff3 it probably is not that serious.

I sent the patch at 9:30. One hour later Jeff King replied “I take it you didn’t investigate the segfault I mentioned”. I don’t know how I was supposed to investigate that, other than what I already did: run the whole test suite with zdiff3. Jeff had the idea to recreate all the merges of the git repository using zdiff3 and after 2500 merges you can find one that crashes. This is a good idea, but it never occurred to me to do that.

By 13:00 I replied with a command to replicate the issue very simply using a git merge-file command. By 16:24 I had found the issue and sent a fix.

The next Monday Elijah Newren started to complain:

This is going to sound harsh, but people shouldn’t waste (any more) time reviewing the patches in this thread or the “merge: cleanups and fix” series submitted elsewhere. They should all just be rejected.

Elijah Newren

Elijah provided a list of reasons, none of which were true–like “no attempt was made to test” (I did attempt to test, as I explained in the note of the patch). Additionally he stated that in his opinion my submissions were “egregiously cavalier”.

Egregiously cavalier? I spent several hours on a Sunday trying to find the fix for a patch one of the most prolific git developers didn’t bother to fix for several years, and I actually did it… the very same day.

This was the start of Elijah’s personal vendetta against me. He urged Junio not only to drop Uwe’s patch that I resent, but to drop all my patches:

If I were in charge, at this point I would drop all of Felipe’s patches on the floor (not just the ones from these threads), and not accept any further ones. I am not in charge, though, and you have more patience than me. But I will not be reviewing or responding to Felipe’s patches further. Please do not trust future submissions from him that have an “Acked-by” or “Reviewed-by” from me, including resubmissions of past patches.

Elijah Newren

If that wasn’t enough, Elijah accused me of fabricating his endorsement of my patches:

Troubled enough that I do not want my name used to endorse your changes, particularly when I already pointed out that you have used my name in a false endorsement of your patch and you have now responded to but not corrected that problem (including again just now in this thread), making it appear to me that this was no mistake.

Elijah Newren

This is blatantly false. Elijah very clearly said “Yes, very nice! Thanks.” to one of my patches, and that can be considered endorsement by many people. I do not take those kinds of accusations lightly, so that prompted me to start an investigation into how many “reviewed-by” commit trailers are explicitly given as opposed to inferred, and I found that 38% of them are not explicit.

Not only that, but I found one example a month earlier in which Derrick Stolee took a “Looks good to me” comment from Elijah and used it as “reviewed-by”. Did Elijah accuse Derrick of “false endorsement”? No.

I don’t know what happened to Elijah, nor why from one day to the next he decided to make me his enemy, but clearly he is not being objective. If he was he would be objecting to Derrick’s behavior as well, since he did exactly the same thing as I. Elijah’s responses are clearly emotional, as can be seen from this reply in which he cites nineteen references, some going back to 2014–a dubious debating technique called Gish gallop, but writing this response he didn’t even pause to see that the links matched what he was referring to. Moreover, the first thing he said: “attacking a random bystander like Alex is rather uncalled for”, was completely off-base because my reply wasn’t even addressed to Alex.

I have absolutely nothing against Elijah, but clearly he does have something against me, and the rest of the reports (and probably all the previous ones) are entirely the result of that antagonism.

Exhibit 8

If you didn’t mean this patch to be applied then perhaps add the RFC prefix.

Felipe Contreras

When Alex Henrie took me up on the challenge to try to fix git pull, he created a patch that broke 11 test files, each one with many unit tests broken. This is not a huge deal, some people send patches that break the test suite, but generally when they do that they add the “RFC” (request for comments) prefix to make sure this patch is not picked by Junio.

So, assuming good faith there’s two options a) Alex didn’t see the tests breaking, or b) he knew the tests were breaking, but didn’t know he had to add RFC in that case. I simply said if this was case b), RFC should have been added.

Only a person already primed to presume malice would have seen a problem with my comment.

Exhibit 9

Wouldn’t you consider sending a patch without running ‘make test’ “cavalier”?

Felipe Contreras

This response was directed to Elijah Newren, not Alex Henrie, and I wrote it because I was genuinely curious to see if Elijah was capable of seeing the obvious discrepancy between his response to Alex, and his response to me.

The patch I sent (which wasn’t even my patch):

  • Did not break any tests by itself
  • Did not break any tests by forcing diff3 to be zdiff3
  • The patch was not going to land on the integration branch “seen”
  • The patch doesn’t change anything, so everyone using diff3 wouldn’t see the crashes
  • If the patch was applied, and a person manually enables zdiff3, out of the 16,000 merges in git.git it crashes on 13 of them (0.08%)
  • Has already been tested by multiple people for many years
  • Was fixed hours later

So the patch was relatively safe by any standard.

On the other hand Alex’s patch:

  • Broke a ton of tests by default
  • Broke the existing user interface
  • Hasn’t been used by anyone
  • Was intended to be integrated

Elijah’s response to my patch is that it was “egregiously cavalier”, and his response to Alex’s patch was “thanks for working on this”.

If this is not double standards I don’t know what is.

I did not criticize Alex, I was asking a question to Elijah.

Exhibit 10

When Alex responded to my response, he thought it was directed to him. I explained to him that it wasn’t and I made sure he knew I didn’t blame him for anything:

I do appreciate all contributions, especially if they were done pro bono.

Felipe Contreras

I explained that Elijah’s personal attacks towards me are a different subject that he should probably not attempt to touch, because I didn’t think anything productive could come from that (it’s a matter between Elijah and me).

And just to be crystal clear, I thanked him again:

This has absolutely nothing to do with you, again… I appreciate you used some of your free time to try to improve git pull.

Felipe Contreras

In my opinion all the reports that have anything to do with Elijah Newren are primed and could only be judged “unacceptable” if you assume bad faith (and probably not even in that case).

Judgement

I have been a moderator on pretty big online communities, therefore I’m familiar on how justice is supposed to be dispensed outside of the justice system.

The single most important thing is the presumption of innocence. The accused should not be considered guilty until evidence against him has been presented and he has had a chance to defend himself. Even in something as silly as Twitch chat, the people that get banned have the opportunity to appeal those decisions.

The leadership committee not only did not allow me to present my case, they weren’t interested in the least, when I asked them about presenting my case their response was “this isn’t a trial”. I was presumed guilty and that’s that.

My punishment was to avoid all interaction with 17 people for three months. That included Junio C Hamano and the entire leadership committee. If I engage in any “unwanted interaction” with any of these people or if I say something similar to the alleged violations above, then I’ll get banned.

Ævar Arnfjörð Bjarmason offered to be my “point of contact”, that means he was the only person I was allowed to ask questions to. While initially Ævar did a good job of keeping an open dialog with me, eventually he stopped responding because he went on vacation.

Other than the initial email I only received one response from the leadership committee.

However Ævar did manage to respond to a few questions for this blog post (as himself):

  1. Do you believe there’s a possibility the PLC might have made a mistake?

Sure, but I think that’s more in the area of how we’d deal with clashes like this generally than that nothing should have been done in this case. Did we do the optimal thing? I don’t know.

  1. Do you believe the PLC could have done more to address this particular situation?

Yes, I think it’s the first case (at least that I’m aware of) under the CoC framework that’s gone this far.

That’s a learning process, one particular thing in this case I think could have gone better is more timely responses to everyone involved. That’s collectively on us (the PLC), harder when there’s little or no experience in dealing with these cases, everyone volunteering their time across different time zones etc.

  1. Do you believe the identify of the person(s) who reported the complaints had an effect on the final verdict?

I wouldn’t mind honestly answering this for my part at least (and not for other PLC members), but feel I can’t currently due to our promise that CoC reports and identities of reporters etc. not be made public unless on their request.

While I’m thankful for Ævar’s responses these did not answer the particulars of what I asked, especially the last one, which he could very well be answered without revealing the identity of the people who reported the “violations”.

At the end of the day I’m still fairly certain that the person who did the vast majority of the reports (if not all of them) was Elijah Newren, and it’s only because he is a big name (#7 contributor in the past 5 years) that those reports were taken seriously. Additionally he sent as many as he could as an eristic technique–Gish gallop–because he knew that way nobody in the leadership team would investigate any single one at depth.

Ultimately I do not blame Elijah: if he thinks my comments violated the code of conduct, he is entitled to believe so and report them. But I do blame the leadership team because they didn’t do their job. They didn’t investigate any of the reports carefully enough, and they did not do the minimum job any judge should do (inside or outside a real courtroom); hear the defendant.

Not only did they not do their job, but they didn’t even want to attempt to do it, and even more… it seems they don’t know what their minimum job is.

Conclusion

Linus Torvalds once said in a TED interview:

What I’m trying to say is we are different. I’m not a people person; it’s not something I’m particularly proud of, but it’s part of me. And one of the things I really like about open source is it really allows different people to work together. We don’t have to like each other — and sometimes we really don’t like each other. Really — I mean, there are very, very heated arguments. But you can, actually, you can find things that — you don’t even agree to disagree, it’s just that you’re interested in really different things.

Linus Torvalds

You may not like my style of communication, but “being abrasive” is not a crime. The question is not “can Felipe be nicer?”, the question is “did Felipe violate the code of conduct?”. I believe any objective observer who carefully investigates any of the reports would have to conclude that no violation actually took place. You would have to believe that I was trolling, insulted people, threw personal attacks, or harassed people; none of that actually took place.

I debate with people all the time, and this particular issue comes up very often, but it’s a fallacy called tone policing. Any time anybody focuses on the way somebody states an argument instead if what the argument itself is, that person is being unproductive.

Paul Graham’s hierarchy of disagreement

I often bring Paul Graham’s hierarchy of disagreement, where tone policing is the third lowest level of “argumentation”, right after ad hominem attacks.

Nobody benefits by my tone being policed, especially the Git community. My patches not only provide much needed improvements to the user interface, but fixes clear bugs that have already been verified and tested, not to mention regressions. Ignoring me only hurts git users.

And even if in your opinion the tone I used in the reports above is not acceptable, it’s worth nothing that these are merely the worst ten instances out of two thousand emails, that’s just 0.5%.

And let’s remember that unlike the vast majority of git developers: I’m doing this work entirely for free.

Moreover, if anyone is violating the code of conduct I would venture to say it’s other members of the community:

  • Being respectful of differing opinions, viewpoints, and experiences

It is very clear other members of the community do not respect my opinions. I have suggested that this particular point should be tolerance, not respect (see Cambridge University votes to safeguard free speech), but even if you lower the requirement from respect to tolerance, not even that is happening: other members do not tolerate my opinions.

Why am I being punished because other members (who happen to be big names) can’t tolerate my opinions?

The Git community claims to be inclusive, but as this incident shows that’s not truly the case since the most important diversity is not welcomed: diversity of thought.

How a proper git commit is made (by a git developer)

Throughout my career of 20 years as an open source software developer I’ve seen all kinds of projects with all kinds of standards, but none with higher standards as the Git project (although the Linux kernel comes pretty close).

Being able to land a commit on git.git is a badge of honor that not all developers would be able to get, since it entails a variety of rare skills, like for example the ability to receive criticism after criticism, and the persistence to try many times–sometimes reaching double-digit attempts, not to mention the skill necessary to implement the solution to a problem nobody else saw before–or managed to implement properly–in C, and sometimes the creativity to come up with alternative solutions that address all the criticism received previously.

The purpose of this post is not to say “this is why Git is better than your project” (the Git project has many issues), the purpose is to showcase parts of a fine-tuned development process so you as a developer might consider adopting some yourself.

Discussion

The first part of the process (if done correctly) is inevitably discussion. The git mailing list is open, anyone can comment on it, and you don’t even need to be subscribed, just send a mail to the address and you will be Cc’ed in all the replies.

This particular story starts with a mail by Mathias Kunter in which he asks why git push fails with certain branches, but only if you don’t specify them. So for example if you are in a branch called “topic”, this would fail:

git push

But not this:

git push origin topic

Why? His question is a valid one.

The first reply comes from me, and I explain to him how the code views the two commands, for which a basic understanding of refspecs is needed. Briefly, a refspec has the form of src:dst, so “test:topic” would take the “test” branch and push it to the remote as “topic”. In the first command above, the refspec would be the equivalent of “topic:” (no destination), while the second command would be “topic:topic”. In other words: git does’t assume what name you want as destination.

Did I know this from the top of my head? No. I had to look at the code to understand what it’s doing before synthesizing my understanding in as succinct as a reply as I possibly could write. I often don’t look at the official documentation because I find it very hard to understand (even for experts), and it’s often inaccurate, or ambiguous.

Notice that I simply answered his question of why the first command fails, and in addition I offered him a solution (with push.default=current git would assume the name of the destination to be the same as the source), but at no point did I express any value judgement as of what was my opinion of what git ought to actually do.

Mathias thanked me for my reply, and pushed back on the solution to use the “current” mode because he thought the “simple” mode (which is the default), should behave the same way in this particular case. For his argument he used the documentation about the simple mode:

When pushing to a remote that is different from the remote you normally pull from, work as current.

This is where the nuance starts to kick in. If the “topic” branch has an upstream branch configured (e.g. “origin/topic”), then git push would behave the same in both the “simple” and “current” modes, however, if no upstream branch is configured (which is usually the case), then it depends on the remote. According to Mathias, if he has no upstream configured, then there’s no “remote that is different from the remote you normally pull from” (the remote you normally pull from in this case is “origin”, because the upstream branch is “origin/topic”), so “simple” should work like “current”.

In my opinion he is 100% right.

Additionally, I have to say we need more users like Mathias, who even though he knows how to fix the issue for himself, he is arguing that this should be fixed for everyone.

Elijah Newren suggested that perhaps the documentation could be changed to explain that this only happens when “you have a remote that you normally pull from”, in other words: when you have configured an upstream branch. But this doesn’t make sense for two reasons. If you don’t have configured an upstream branch, then the other mode that would be used is “upstream”, but “upstream” fails if you don’t have configured an upstream branch, so it would always fail. Secondly, the reason why it doesn’t always fail is that the former is not possible: when you don’t have configured an upstream branch, “origin” is used, therefore you always have a “remote that you normally pull from”.

I explained to Elijah that the remote is never null, since by default it’s “origin”, and suggested a trivially small modification to the documentation to mention that (since even experts like him miss that), but he suggested an even bigger one:

If you have a default remote configured for the current branch and are pushing to a remote other than that one (or if you have no default remote configured and are pushing to a remote other than ‘origin’), then work as ‘current’.

Oh boy! I cannot even being to explain why I find this explanation so wrong on so many levels, but let me start by saying I find this completely unparseable. And this is the biggest problem the official git documentation has: it’s simply translating what the code is doing, but if the code is convoluted, then the documentation is convoluted as well. I shred this paragraph piece by piece and showed why changing the logic would make it much more understandable.

But at this point I got tired of reading the same spaghetti code over and over again. It’s not just that I found the code hard to follow, it’s that I wasn’t actually sure of what it was doing. And so my reorganization patch series began.

The series

I am of the philosophy of code early, and code often. I don’t believe in design as separate from code, I believe the design should evolve as the code evolves. So I just went ahead and wrote the thing. The first version of my patch series reorganized the code so that the “simple” mode was not defined in terms of “current” and “upstream”, but as a separate mode, and then once it became clear what “simple” actually did, redefine “current” in terms of “simple”, rather than the other way around. It consisted of 11 patches:

  1. push: hedge code of default=simple
  2. push: move code to setup_push_simple()
  3. push: reorganize setup_push_simple()
  4. push: simplify setup_push_simple()
  5. push: remove unused code in setup_push_upstream()
  6. push: merge current and simple
  7. push: remove redundant check
  8. push: fix Yoda condition
  9. push: remove trivial function
  10. push: flip !triangular for centralized
  11. doc: push: explain default=simple correctly

I could describe each one of these patches in great detail, and in fact I did: in the commit messages of each of these patches, but just to show an example of the refactorization these patches do, let’s look at patch #7, #8, and #9.

push: remove redundant check

 static int is_workflow_triangular(struct remote *remote)
 {
-	struct remote *fetch_remote = remote_get(NULL);
-	return (fetch_remote && fetch_remote != remote);
+	return remote_get(NULL) != remote;
 }

There is no need to do two checks: A && A != B, because if A is NULL, then NULL != B (B is never NULL), so A != B suffices.

push: fix Yoda condition

 static int is_workflow_triangular(struct remote *remote)
 {
-	return remote_get(NULL) != remote;
+	return remote != remote_get(NULL);
 }

There’s a lot of Yoda conditions in the git code, and it’s very hard for many people to parse what that code is supposed to do in those situations. Here we want to check that the remote we are pushing to is not the same as the remote of the branch, so the order is the opposite of what’s written.

push: remove trivial function

-static int is_workflow_triangular(struct remote *remote)
-{
-	return remote != remote_get(NULL);
-}
-
 static void setup_default_push_refspecs(struct remote *remote)
 {
 	struct branch *branch = branch_get(NULL);
-	int triangular = is_workflow_triangular(remote);
+	int triangular = remote != remote_get(NULL);

Now that the code is very simple, there’s no need to have a separate function for it.

The rest

Notice that there’s absolutely no functional changes in the three patches above: the code after the patches ends up doing the exactly same as before. The same applies for all the other patches.

There’s three things that I think are important of the overall result: 1. the new function setup_push_simple is a standalone function, so to understand the behavior of the “simple” mode, all you have to do is read that function, 2. the existing function is_workflow_triangular is unnecessary, has the wrong focus, and is inaccurate, and 3. now that it’s easy to follow setup_push_simple, the documentation becomes extremely clear:

pushes the current branch with the same name on the remote.

If you are working on a centralized workflow (pushing to the same repository you pull from, which is typically origin), then you need to configure an upstream branch with the same name.

Now we don’t need to argue about what the “simple” mode does, there’s no confusion, and the documentation while still describing what the code does is easy to understand by anyone, including the original reporter: Mathias.

Great. Our work is done…

Not so fast. This is just the first step. Even though this patch series is perfectly good enough, in the Git project the first version is very rarely accepted; there’s always somebody with a comment.

The first comment comes from Elijah, he mentioned that in patch #3 I mentioned that I merely moved code around, but that wasn’t strictly true, since I did remove some dead code too, and that made it harder to review the patch for him. That wasn’t his only comment, additionally he stated that in patch #4 I should probably add a qualification to explain why branch->refname cannot be different from branch->merge[0]->src, and in patch #10 he said the commit message seemed “slightly funny to [him]” but he agreed it made the code easier to read, and finally in the last patch #11 which updates the documentation:

Much clearer. Well done.

I replied to all of Elijah’s feedback mostly stating that I’m fine with implementing his suggestions, but additionally I explained the reasoning behind changing “!triangular” to “centralized”, and that’s because I heard Linus Torvalds state that the whole purpose of git was to have a decentralized version control system so it makes no sense to have a conditional if (triangular) when that conditional is supposed to be true most of the time.

Additionally Bagas Sanjaya stated that we was fine with the changes to the documentation, since the grammar was fine.

v2

OK, so the first version seemed to be a success in the sense that everyone who reviewed it didn’t find any fatal flaws in the approach, but that doesn’t mean the series is done. In fact, most of the work is yet ahead.

For the next version I decided to split the patch series into two parts. The first part includes the patches necessary to reach the documentation update which was the main goal, and the second part would be everything that makes the code more readable but which is not necessary for the documentation update. For example changing !triangular to centralized is a nice change, but not strictly necessary.

Now, v2 requires more work than v1 because not only do I have to integrate all the suggestions from Elijah, but I have to do it in a way that it’s still a standalone patch series, so anyone who chooses to review v2 but hasn’t seen v1, can understand it. So essentially v2 has to be recreated.

This is why git rebase is so essential, because it allows me to choose which commits to update, and the end result would look like all the suggestions from Elijah had always been there.

But not only that, I have to update the description of the series (the cover letter in git lingo), to explain what the patch series does for the people that haven’t reviewed it before, and in addition explain what changed from v1 to v2 for the people that have already reviewed the first version.

This is where one tool which I think is mostly unknown comes into play: git rangediff. This tool compares two versions of a patch series (e.g. v1 and v2), and then generates a format similar to a diff delineating what changed. It takes into consideration all kinds of changes, including changes to the commit message of every commit, and if there are no changes either in the code or the commit message, that’s shown too.

So essentially people who have already reviewed v1 can simply look at the rangediff for v2 and based on that figure out if they are happy with the new version. The reason why it’s called rangediff is because it receives two ranges as arguments (e.g. master..topic-v1 master..topic-v2).

This time the result was 6 patches. If you check the rangediff you can see that I made no changes in the code, whoever, I changed some of the commit messages, and 5 patches were dropped.

Let’s look at one change from the rangediff for patch #3.

3: de1b621b7e ! 3: d66a442fba push: reorganize setup_push_simple()
    @@ Metadata
      ## Commit message ##
         push: reorganize setup_push_simple()
     
    -    Simply move the code around.
    +    Simply move the code around and remove dead code. In particular the
    +    'trivial' conditional is a no-op since that part of the code is the
    +    !trivial leg of the conditional beforehand.
     
         No functional changes.
     
    +    Suggestions-by: Elijah Newren <newren@gmail.com>
         Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
     
      ## builtin/push.c ##

It should be pretty straightforward to see what I changed to the commit message.

The second part is what is most interesting. Not only does it includes better versions of the 5 patches that were dropped from the previous series, but it includes 10 patches more. So this is something that has to be reviewed from scratch, since it’s completely new.

  1. push: create new get_upstream_ref() helper
  2. push: return immediately in trivial switch case
  3. push: reorder switch cases
  4. push: factor out null branch check
  5. push: only get the branch when needed
  6. push: make setup_push_* return the dst
  7. push: trivial simplifications
  8. push: get rid of all the setup_push_* functions
  9. push: factor out the typical case
  10. push: remove redundant check
  11. push: fix Yoda condition
  12. push: remove trivial function
  13. push: only get triangular when needed
  14. push: don’t get a full remote object
  15. push: rename !triangular to same_remote

I’m not going go through all the different attempts I made locally to arrive to this series, but it was way more than one. I used git rebase over and over to create a sequence of commits to arrive to a clean result, where each commit implemented a logically-independent change that I thought would be easy to review, would not break previous functionality, and would make the code easier to read from the previous state.

The highlights from this series is that I added a new function get_upstream_ref that would be used by both the “simple” and “upstream” modes, but additionally by reorganizing the code I was able to remove all the setup_push_* functions, including the one I introduced: setup_push_simple. By moving everything into the same parent function now it’s easy to see what all the modes do, not only the “simple” mode. Additionally, instead of changing !triangular to centralized, I decided to change it to same_remote. Upon further analysis I realized that you could be in a triangular workflow and yet be pushing into the same repository (that you pull from), and the latter is what the code cared about, not triangular versus centralized. More on that later.

This time much more people commented: Mathias Kunter, Philip Oakley, Ævar Arnfjörð Bjarmason, and Junio C Hamano himself (the maintainer).

Philip Oakley pointed out that the documentation currently doesn’t explain what a triangular workflow is, but that isn’t a problem of this patch series. Ævar Arnfjörð Bjarmason provided suggestions for the first part of the series, but those were not feasible, and overridden by the second part of the series. Mathias Kunter said he actually preferred the explanation of the “simple” mode provided in another part of the documentation, but he incorrectly thought my patches changed the behavior to what he suggested, but that wasn’t the case.

Junio provided comments like:

  • Much cleaner. Nice.
  • Simpler and nicer. Good.
  • Nice.
  • Nicely done.
  • Well done.

If you want to take a look at the change from the most significant patch, check: push: get rid of all the setup_push_* functions.

His only complaints were my use of the word “hedge” (which he wasn’t familiar with), of “move” (when in fact it’s duplicated), “reorder” (when in his opinion “split” is better), and that he prefers adding an implicit break even if the last case of a switch is empty.

I explained to Junio that the word “hedge” has other meanings:

* to enclose or protect with or as if with a dense row of shrubs or low trees: ENCIRCLE

* to confine so as to prevent freedom of movement or action

Some synonyms: block, border, cage, confine, coop, corral, edge, fence, restrict, ring, surround.

So my use of it fits.

For most of his comments I saw no issue implementing them as most didn’t even involve changing the code, but additionally he suggested changing the order so the rename of triangular is done earlier in the series. While I have no problem doing this and in fact I already tried in an interim version of the series, I knew it would entail resolving a ton of conflicts, and worse than that, if later on somebody decides they don’t like the name same_remote I would have to resolve the same amount of conflicts yet again. I said I would would consider this.

Moreover, I recalled the reason why I chose same_remote instead of centralized, and I explained that in detail. Essentially it’s possible to have a triangular workflow that is centralized, if you pull and push to different branches but of the same repository. The opposite of triangular is actually two-way, not centralized.

centralized = ~decentralized
triangular = ~two-way

This triggered a discussion about what actually is a triangular workflow, and that’s one of the benefits of reviewing patches by email: discussions turn into patches, and patches turn into discussions. In total there were 36 comments in this thread (not counting the patches).

v3

For the next version I decided not just to move the same_remote rename sooner in the series as Junio suggested, but actually in the very first patch. Although a little risky, I thought there was a good chance nobody would have a problem with the name same_remote, since my rationale of triangular versus two-way seemed to be solid.

Part 1:

  1. push: rename !triangular to same_remote
  2. push: hedge code of default=simple
  3. push: copy code to setup_push_simple()
  4. push: reorganize setup_push_simple()
  5. push: simplify setup_push_simple()
  6. push: remove unused code in setup_push_upstream()
  7. doc: push: explain default=simple correctly

Part 2:

  1. push: create new get_upstream_ref() helper
  2. push: return immediately in trivial switch case
  3. push: split switch cases
  4. push: factor out null branch check
  5. push: only get the branch when needed
  6. push: make setup_push_* return the dst
  7. push: trivial simplifications
  8. push: get rid of all the setup_push_* functions
  9. push: factor out the typical case
  10. push: remove redundant check
  11. push: remove trivial function
  12. push: only check same_remote when needed
  13. push: don’t get a full remote object

Moving the same_remote patch to the top changed basically the whole series, as can be seen from the rangediff, but the end result is exactly the same, except for one little break.

Junio mentioned that now the end result matches what he had prepared in his repository and soon included then in his “what’s cooking” mails.

* fc/push-simple-updates (2021-06-02) 7 commits

 Some code and doc clarification around "git push".

 Will merge to 'next'.

* fc/push-simple-updates-cleanup (2021-06-02) 13 commits

 Some more code and doc clarification around "git push".

 Will merge to 'next'.

So there it is, unless somebody finds an issue, they will be merged.

To be honest this is not representative of a typical patch series. Usually it takes more than 3 tries to got a series merged, many more. And it’s also not common to find so many low-hanging fruit.

switch (push_default) {
default:
case PUSH_DEFAULT_UNSPECIFIED:
case PUSH_DEFAULT_SIMPLE:
    if (!same_remote)
        break;
    if (strcmp(branch->refname, get_upstream_ref(branch, remote->name)))
        die_push_simple(branch, remote);
    break;

case PUSH_DEFAULT_UPSTREAM:
    if (!same_remote)
        die(_("You are pushing to remote '%s', which is not the upstream of\n"
              "your current branch '%s', without telling me what to push\n"
              "to update which remote branch."),
            remote->name, branch->name);
    dst = get_upstream_ref(branch, remote->name);
    break;

case PUSH_DEFAULT_CURRENT:
    break;
}

refspec_appendf(&rs, "%s:%s", branch->refname, dst);
Documentation/config/push.txt | 13 +++----
builtin/push.c | 91 +++++++++++++++++++------------------------
2 files changed, 47 insertions(+), 57 deletions(-)

The commit

These two patch series cleaned up the code and improved the documentation, but they didn’t actually change any functionality. Now that it’s clear what the code is actually doing, Mathias Kunter’s question is easily answered: why is git push failing with “simple”? Because get_upstream_ref always fails if there’s no configured upstream branch, but should it?

All we have to do is make it not fail. In other words: if there’s no upstream branch configured, just go ahead, and thus it would behave as “current”.

Then this:

If you are working on a centralized workflow (pushing to the same repository you pull from, which is typically origin), then you need to configure an upstream branch with the same name.

Becomes this:

If you are working on a centralized workflow (pushing to the same repository you pull from, which is typically origin), and you have configured an upstream branch, then the name must be the same as the current branch, otherwise this action will fail as a precaution.

Which makes much more sense.

Right now pushing to master works by default:

git clone $central .

git push

But not pushing to a new topic branch:

git clone $central .
git checkout -b topic

git push

It makes no sense to allow pushing “master” which is much more dangerous, but fail pushing “topic”.

This is the patch that changes the behavior to make it sensible, where I explain all of the above, and the change in the code is simple, basically don’t die().

It took 47 days from the moment Mathias sent his question to the point where Junio merged my patches to master, but we are close the finish line, right?

No, we are not.

Convincing Junio of some code refactoring is relatively easy, because it’s simply a matter of showing that 2 + 2 = 4; it’s a technical matter. But convincing him of what git ought to do is much more difficult because it requires changing his opinion using arguments, this part is not technical, but rhetorical.

For reference, even though several years ago I managed to convince everyone that @ is a good shortcut for HEAD, Junio still complains that it “looks ugly both at the UI level and at the implementation level“, so for some reason my arguments failed to convince him.

So can Junio be convinced of this obvious fix? Well, everything is possible, but I wouldn’t be holding my breath, especially since he has decided to ignore me and all my patches.

Either way the patch is good. It’s simple thanks to the many hours I spent cleaning up the code, and benefited from all the input from many reviewers. And of course if anyone still has any comments on it they are free to state them on the mailing list and I’d be happy to address them.

git update: the odyssey for a sensible git pull

While git pull kind of works, over the years a slew of people have pointed out many flaws, and proposed several fixes. Some have even suggested that newcomers should be discouraged from using the command (as they often use it wrongly), and others to remove it entirely.

I spent several days digging through the entire history of the Git mailing list in order to document all the discussions related to git pull and its default mode. It’s a long story, so grab some coffee (a whole pot of it).

Update: I wrote a shorter article that only talks about why git pull is broken.

Preamble

The entire post is going to be talking about fast-forward merges so I’ll attempt to explain them briefly.

Basically when your history and the remote history have diverged, like:

non-fast-forward

a fast-forward merge is not possible: you need to create a real merge that will have two parents: C and B.

The fast-forward case is much simpler:

fast-forward

While you can create a merge, in this case it’s not necessary: the “master” branch pointer can simply move forward to B.

The whole story is about what happens when a novice does git pull when a fast-forward is not possible.

--ff-only

The story begins in 2008 with a request from Sverre Hvammen Johansen to add the --ff-only option to git merge (then called git-merge), since in his workflow a certain branch was only allowed to fast-forward.

A few weeks later we get the first patch to implement the --ff=only merge strategy. Unfortunately the patch was more complicated that it needed to be, and it wasn’t merged.

Sverre gave it another go a few months later. Junio C. Hamano (the maintainer) said “with vastly improved documentation and justification compared to the previous rounds, I am beginning to actually like this series”. Unfortunately even though Sverre sent several rounds, they were never merged. Good attempt by Sverre.

Another attempt was made at the beginning of 2009 by Yuval Kogman, this time only adding the --ff-only option. What it did was simple: if a fast-forward is possible then do it, if not then fail. Everyone was in favor. Initial versions of the patch had some issues, those were fixed, but unfortunately the last iteration received no feedback. Thanks for trying though.

A few months later, the question was asked “how do you fast-forward your local dev to sync up with origin/dev?“. If the previous patch was merged, the answer would have been simple: git merge --ff-only, but since that wasn’t the case, Junio replied with:

git push . origin/dev dev

This is not user-friendly if you ask me, and later on turned out it wasn’t even correct.

Months later Randal L. Schwartz brings up the same issue and demands a simple command that can be told to users in the #git IRC channel who constantly asked this question. It is pointed out such thing has been implemented twice already, but never merged.

To which Junio replied:

Do you mean twice they were both found lacking, substandard, useless, uninteresting, buggy, incorrect, or all of the above?

Geez! Eyvind was just trying to help. They clearly were not useless nor uninteresting, since many people had rallied in favor of such a feature, and yet more joined saying +1, even in this very thread.

The first big one

At the end of the year appears the first big discussion about the asymmetry of pull and push. Thomas Rast makes very bold suggestions, like deprecating git fetch completely. These drastic changes would have worked using the suggestion of Wesley J. Landaker to add a configuration, and deprecation warnings.

This proposal was not taken seriously; Junio considered the suggestion to turn git pull into git fetchmindless mental masturbation“, and did not see how git pull and git push could ever be symmetric.

On the other hand an interesting piece of data was provided by Björn Steinbrink. He investigated 10 days of IRC activity in the #git channel and found plenty of users confused by git pull, many expected not only there to be a --ff-only option, but this behavior to be the default.

Aside from --ff-only, other main ideas were:

  1. Add a --merge option to git pull
  2. Add a pull.merge configuration

Björn Gustavsson used this big thread as inspiration to implement --ff-only for both git merge and git pull.

Finally. After three implementations, Junio accepted the idea and merged the patches in October 2009.

Begging for a configuration

Not long after the introduction of git pull --ff-only (6 months later), the first request for a configuration to make it the default appears in 2010. Peter Kjellerstedt argues that pull.options and merge.options configurations would fit him well. Not only that, but he thinks --ff-only should be the default. Nothing comes of this.

In 2013 people in the Linux Kernel Mailing List found issues with git pull and Linus Torvalds brought them to the Git mailing list. Linus argued that depending on whether you are a maintainer or a submaintainer you might want to do either --ff-only, or a true merge. An alias could do the trick (for pull --ff-only), but asked the Git community for ideas.

One of the interesting ideas came from Theodore Ts’o; have a per-repository configuration. Junio did follow this thread closely, but nothing happened.

Only days later, Ramkumar Ramachandra introduces the idea of a pull.default configuration which ideally could be specified on a repository-specific basis. Ramkumar doesn’t seem to have pursued his idea further.

Second big one

A couple of months later Andreas Krey triggered a huge discussion when he pointed out that the order of the parents when doing a merge was wrong. What does that mean?

So this is what git pull does:

The new merge commit D merges B into C, this is fine if you are a maintainer, and instead of “origin/master” you are pulling something like “john/topic” into your “master” branch, in that case your “master” (C) is the first parent. But it’s not OK for normal developers (i.e. non-maintainers).

A normal developer would want to do the opposite: to merge his “master” (C) to the upstream “origin/master” (B), so the first parent is “origin/master” (B). This is not a trivial thing. The order of the parents completely changes how history is visualized, and even traversed, plus it’s important when resolving conflicts, along many other things.

Junio’s response:

Don’t do that, then.

For the record: there’s no way to tell git pull not to reverse the parents, and the four people that expressed an opinion said they would rather have the order reversed (John Szakmeister, Jeremy Rosen, John Keeping, and of course Andreas Krey).

Junio said he would not be opposed to an option to reverse the order of the parents, but as you’ll see later on, when I sent a proposal to do precisely that he ignored it.

Yet again another suggestion to make --ff-only the default, this time from John Keeping. This round I chimed in, and so did many others. Junio was not convinced this was the way to go, because integrators would be affected:

It is not about passing me, but about not hurting users like kernel folks we accumulated over 7-8 years.

Since I have worked on the Linux kernel, I explained to Junio that most lieutenants (submaintainers) don’t do git pull; they send pull requests to Linus, so essentially Linus is the only person that does git pull as an integrator (as Junio does). Linus is certainly capable of configuring git pull to something other than --ff-only (which I don’t know why Junio switched for --rebase in his responses).

That is not something I can agree or disagree without looping somebody whose judgement I can trust from the kernel circle ;-).

That somebody was none other than Linus Torvalds, who agreed with us:

It would be a horrible mistake to make “rebase” the default, because it’s so much easier to screw things up that way.

That said, making “[--ff-only]” the default, and then if that fails, saying [a big warning message], [would make sense].

Linus suggested it would be even better to make this configuration per-branch, which is similar to the suggestion of Theodore Ts’o’s idea of making it per-repository.

Eventually, using Mercurial as inspiration, I suggested the following error message:

The pull was not fast-forward, please either merge or rebase.

This seemed to land on Junio, who finally saw why --ff-only made sense:

Initially I said limiting “git pull” to “--ff-only” by default did not make sense, but when we view it that way, I now see how such a default makes sense.

John Keeping came with another good idea by using a hypothetical command git update as an example. This command, unlike git pull would have the correct order of the parents, and would be similar to svn update, and others. While John didn’t propose such command, eventually I did.

After Junio realized the value of --ff-only he event sent a patch integrating Linus’ suggestions and mine. However, I came up with a simpler approach that in my opinion was better, and that triggered another big discussion.

Third big one

My response to Junio’s patch triggered yet another big discussion.

Richard Hansen argued that git pull was straight up dangerous, and he recommended this instead:

git remote update -p
git merge --ff-only @{u}'

He suggested people have that as a git up alias.

In September of 2013 I sent a patch series to add a pull.mode configuration which can be set to merge-ff-only. It was completely ignored, however Junio soon after made a similar proposal on the other thread: pull.integrate with an option of fail back. I argued that my proposal of pull.mode was better, and other people agreed.

Another point I argued is that we needed a deprecation warning before changing the default mode, something like “in the future you would have to choose a merge or a rebase, merging automatically for now”.

Jeff King provided some useful information from GitHub trainers, essentially they didn’t like the rebase option by default because most people don’t know what a rebase is. This is what prompted me to add the following line to the warning:

If unsure, run “git pull --merge“.

v3 of my patch series did receive some feedback, but it got stuck with trivial details. I tried to fix them for v4, but didn’t receive any feedback at all.

Moving on

At the beginning of 2014 David Aguilar sent a couple of patches to add the pull.ff configuration, and Junio simply merged them with no comment.

This ignored all the previous feedback, and doesn’t add a configuration per-repository, or per-branch, but well… At least there’s a configuration now (sort of).

Pull is Evil (first huge one)

In April of 2014 Marat Radchenko pointed out a list of problems 20 newcomers in his organization faced after trying everyday tasks. Most of the problems came from the order of the parents in git pull. Marc Branchaud pretty early on renamed the thread to “Pull is Evil” since in his opinion git pull is pretty much broken, due to the fact that different people use it in different ways.

I think the pull command should be deprecated and quietly retired as a failed experiment.

Junio brought up again the idea of a git update command, which would demarcate the behavior meant for integrators, and normal developers.

Another interesting idea came from W. Trevor King: add an --interactive option (like git add or git rebase) so the user can choose to a) merge, b) rebase, or c) fast-forward.

I pointed that this has been discussed before, everyone agrees the default should change, and I had already sent the patches. Matthieu Moy rejected the notion that everyone was in favor, since he himself wasn’t, but I pointed that he was the only person not in favor:

But in fact, he was in favor, not with changing the default, but with the configuration I proposed. So he wasn’t against all my patches, just one.

Either way I still think Matthieu Moy is wrong. Yes he is right in questioning why ask if to merge or rebase if most newcomers would not want to know at that point in time, but the answer is simple: they don’t have to know, they can simply do git pull --merge and find out later why that --merge is necessary. Exactly the same thing happens with git commit --all.

Pull is Mostly Evil (Pull is Evil part 2)

Marc Branchaud din’t find a good subthread to reply to, so he started yet another thread. In it he suggested that since git pull can only do the right thing if configured properly, it should be an advanced command that must be configured to be used, and when it’s not configured it does nothing by default.

Junio once more pondered about a git update command that would separate the two clearly distinct ways of using git pull.

Being frustrated with the lack of follow-up I made the prediction that the consensus and the proposals will likely be ignored and nothing would change:

And it has been discussed before. If history is any indication, it will be discussed again.

Spoiler: I was right.

Jeff King did not like my comment, and also stated that he wasn’t sure the majority of users would want the parents on a merge done with git pull to be the other way. This despite many big threads started precisely because the order of the parents was wrong, and many developers stating that this was the only reason why they didn’t use git pull, nor did they recommend newcomers to use it. I offered to point him to the many endless discussions in the mailing list.

Richard Hansen wrote a comprehensive description of the issues, and how they could be solved with different new commands: git update and git integrate, as well as other options.

The rest of the discussion was pretty much Richard Hansen and me hashing out different use-cases and ways to deal with them.

Noise

The discussion did not go without hiccups. In particular David Kastrup and James Denholm threw some personal attacks at me that had absolutely nothing to do with the discussion. Back then I was much more abrasive that I am today, in many instances throughout the entire history of the mailing list (including these discussions) the present me would have reacted differently, but that doesn’t change the fact that I’ve always focused on the code, not so much the feelings of participants of the discussion.

I quickly ignored them and focused on the code itself.

One more try

Since this issue popped yet again, I sent v5, and a v6 of my pull.mode approach. They did receive some amount of feedback, but in particular there was a comment from Richard Hansen that showed yet another issue with git pull.

Supposing we are currently on the branch “master”, and the upstream branch is “origin/master”.

  1. git pull --rebase (rebase master onto origin/master)
  2. git pull --merge (merge origin/master into master)
  3. git pull --rebase github john (rebase master onto github/john)
  4. git pull --merge github john (merge github/john into master)

Half of these are wrong. #1 is correct, since when we do git pull --rebase we want to update our “master” to “origin/master”, so we rebase on top of, but #2 is wrong, because when we use --merge we want a similar thing: we want to merge “master” to “origin/master”, not the other way around. This is what Andreas Krey pointed out, which started the second big thread. But in addition #3 is wrong too, because when we specify a repository and a branch to pull from, we want that branch to rebase on top of master, not the other way around.

So basically it’s all wrong.

Summary

After listening and responding to everyone, all the issues with git pull were clear on my mind, but I realized not everyone had the patience, the stamina, or the time to go through all the responses, so I wrote a summary.

I reduced the main problems with git pull to two:

  1. It does unwanted merges
  2. It does merges with reversed parents

The deeper issue is that git pull is in fact used in two very different ways: to integrate and to update. So when you try to fix the defaults for one, you stumble into unintended consequences for the other. While I do see clearly in my mind how git pull can be disentangled in a way that the defaults would work correctly for everyone, it is pointless if I’m the only one who sees that, because I can’t convince anyone (except perhaps Richard Hansen).

Instead I chose an escape hatch, a completely new command: git update, which would integrate the input from everyone in order to do what git pull can’t do cleanly.

By default git update would be essentially the same as git pull --ff-only (or more like my pull.mode=ff-only), which does solve the main issue: newcomers won’t do real merges by mistake. But it also solves the issue about the order of the parents:

  • git update --merge (merge “master” into “origin/master”)
  • git pull --merge github john (merge “github/john” into “master)

Not only that, but we could fix the order of the parents for rebase too:

  • git update --rebase (rebase “master” onto “origin/master”)
  • git pull --rebase github john (rebase “github/john” onto “master”)

Now everything is fixed. Additionally the default mode for update could be configured with update.mode, and the mode for pull with pull.mode (e.g. update.mode=rebase, pull.mode=merge). Not only that, but this could be configured per-branch (as Linus Torvalds and Theodore Ts’o suggested) with branch.master.updatemode and branch.master.pullmode. This was not only a proposal, I wrote the patches for all of that.

All users agreed my proposed git update was good, as Philippe Vaucher stated:

I know patches from Felipe are sensitive subject and that a lot of people probably simply ignore him entirely, however I think this patch deserves more attention.

No developer commented on it.

Moving on

Nothing resulted from the big Pull is Evil thread, except after it was finished, Jim Garrison asked how would pull do the wrong thing? Junio sent a comprehensive reply and Stephen P. Smith’s proposed to pretty much copy-paste the email from Junio into the list of howtos, and was accepted without any pushback. Not only was the fist version accepted, but Junio himself fixed the patch so no v2 was necessary from Stephen.

Unnecessary merge

At the beginning of 2018 Linus Torvalds once again contacted the mailing list due to an “unnecessary merge” in the Linux kernel. He explained the difference between a merge and a “back-merge”, and why in one you want a fast-forward, but not the other. Unfortunately git doesn’t know which situation we are in.

He even suggested the following configuration:

   git config --global alias.update pull --ff-only

This is precisely what my proposed git update did by default. Linus Torvalds even used the exact same name.

Junio then proceeds to suggest a per-repository configuration pullFF=only, when in fact Linus had already proposed a per-branch configuration back in 2013, and I had already implemented it. Nothing happens.

Why merge?

In 2020 Robert Dailey asks why does git pull use merge by default, he argues most people prefer rebases. Once again Elijah Newren argues that it would be desirable to consider a transition to ff-only by default. Once again Konstantin Tokarev tells Junio that in reality what newbies often do is wrong merges, and that ff-only would be better. Robert Dailey also gets convinced that ff-only is better. Alex Henrie mentions a patch he wrote but was reluctant to send it in order to avoid a big discussion.

The warning

Alex Henrie finally decided to send his patch which adds a warning telling users that running git pull without specifying to either merge or rebase is discouraged. Initially Alex wrote the warning to state that the default behavior would change in a future version of git, but Junio objected to that because there was “no consensus”:

Sorry for not catching this in the earlier round, but I do not think anybody argued, let alone the community came to a concensus on doing, such a strong move. Am I mistaken?

After five rounds of comments from Junio, he accepted the patch.

Interestingly enough this change immediately caused people to ask what it was:

Also, we have this lovely comment on GitHub:

Thanks for confusing everyone instead of just doing the default strategy when no parameter is given.

Imagine when every (CLI) application warns the user about not having each default value within the configuration file or given as parameter, despite the user is OK with (or does not care about) the default behaviour.

And we have a bug report from Wolfgang Fahl completely in German. Roughly translated: the bug is the warning.

Clearly people did not like this warning.

Pick the right default

On November of 2020 Vít Ondruch complained about the warning and stated that the Git project should choose a sensible default and not bother users with a warning.

I’d like to keep my trust in Git upstream, but this is was not trustworthy decision IMO.

This started because of a bug report on Fedora.

The main argument from Junio is that there was no perfect default:

There is no clear “default that would suit everybody”.

I pointed out that this is a false dichotomy because the options are not 1) implement the perfect solution, and 2) do nothing; we have another option 3) implement a better solution than the current one, and we all know what that solution is: ff-only. Plus we had the patches for that.

Jeff King pointed out that now there’s a pull.ff=only configuration, and he didn’t see how my pull.mode=ff-only was different. However, I explained that my version provides a more friendly error message, also works correctly with pull.rebase=true, and in addition allows a default mode that throws a more correct warning than the current one.

Junio proposed to make git pull --rebase honor pull.ff=only and fail if the pull is not fast-forward:

And then making an unconfigured pull.ff to default to pull.ff=only may give a proper failure whether you merge or rebase. I dunno.

This is obviously the wrong thing to do. As I explained to both Junio and Jeff, if you make git pull --rebase respect that, and you make pull.ff=only the default, then git pull --rebase will always fail (unless it’s a fast-forward case).

Once again I stated what should be the default.

  • git pull (fail by default unless it’s a fast-forward)
  • git pull --merge (force a merge [unless it’s a fast-forward,
    depending on pull.ff])
  • git pull --rebase (force a rebase [unless it’s a fast-forward,
    depending on pull.ff])

Vít Ondruch agreed.

Both Jeff and Alex Henrie argued that what my pull.mode does could be achieved with pull.rebase and pull.ff, and that’s the direction Junio wanted to take. I wasn’t convinced, and at this point I started to convert my patches from 2014 which was no easy task because before git pull was written in shell, but now it had been converted to C, so essentially I had to rewrite everything from scratch. Alex started to write his approach too.

Junio welcomed the competition:

Let’s see how well the comparison between two approaches play out.

Pretty quickly I realized what they suggested was not possible. If I removed my pull.mode and made pull.ff=only the default, then git pull --merge would fail. We want git pull fail by default, but not git pull --merge. To make this approach possible we would need to change the semantics of pull.ff=only, and that didn’t seem a clean solution to me.

pull.rebase=ff-only

In order to avoid all the hurdles from 2013 I decided to write the simplest patch possible that would achieve the desired result. Instead of creating a new pull.mode=ff-only, I reused an existing configuration, and named the mode pull.rebase=ff-only. This obviously was not ideal–since it left the interface in an inconsistent state (didn’t fix all the issues my original patch did)–but it was better than the current situation.

Junio responded that the name looked funny:

It looks quite funny for that single variable that controls the way how pull works, whether rebase or merge is used, is pull.REBASE.

Yes Junio, that’s why I suggested pull.mode instead, but that patch was never merged.

Raymond E. Pasco also agreed ff-only was the way to go.

pull.mode revampled

My first attempt to revive my old series recreated most of the same old behavior, but not all. It only received one comment for a tiny correction on the documentation.

Since my patches for pull.mode were ignored again, on my v2 I sent a bunch of improvements that were not 100% related to my approach, and instead of using my proposed pull.mode, I changed the semantics of --ff-only, which was not my preferred solution, and I clearly demarcated that particular patch with “tentative“. Elijah Newren provided a ton of valuable feedback, and I incorporated most of it, but there was one point where I did not agree:

If unsure, run “git pull --merge“.

I believe that escape hatch is useful, even if it undermines the value of the warning because 1) the warning must be temporary anyway 2) people are complaining about the warning already, 3) input from GitHub trainers already showed many people don’t even know what a rebase is, and 4) the merge is happening already anyway.

I explained my point of view to Elijah using an analogy. If a passenger in a plane is ignoring the safety demonstration he is doing something wrong, however, he should be completely free to do it, it’s his volition. The crew should allow him to ignore the demonstration. Similarly git users should be able to ignore the warning, and they can skip it by simply doing git pull --merge.

I argued that even if he didn’t agree on that particular line of that particular patch, the rest of the changes in the series don’t suddenly make the situation worse. He said he didn’t agree.

Junio used my analogy to liken users doing git pull --merge when they shouldn’t to be the same as passengers that should be removed from a plane:

A team member who runs “pull --[no-]rebase” pushes the result, which may be a new history in a wrong shape, back to the shared repository probably falls into a different category. … Or perhaps in the same public hazard category?

I don’t see how one line in a temporary warning can potentially create a “public hazard”. It’s users doing wrong merges what creates problems, and removing one line from the warning wouldn’t magically fix that. The warning–which if we take Stack Overflow and reddit as indication–isn’t even properly understood by many users, and doesn’t stop the code from creating a merge anyway. If these merges did indeed create a “public hazard”, shouldn’t git simply not do them by default? In other words: make ff-only the default.

Sadly I did not think of that argument at that time, I merely responded that what constitutes a public hazard is up to each individual project to determine, not git. Which is true.

On the other hand he said something that was completely untrue:

I hope “his comment” does not refer to what I said. Redefining the semantics of --ff-only (but not --ff=no and friends) to make it mutually incompatible with merge/rebase was what Felipe wanted to do, not me.

That’s not true. I wanted to introduce pull.mode=ff-only, not change the semantics of --ff-only. But as I had already explained before, --ff-only cannot be made the default without changing its semantics. The patch that introduced pull.mode was ignored (again), so I took a shot at making --ff-only work, and the only way is changing its semantics, which I did in a patch clearly labeled as “tentative” because I was not advocating for that.

And to make it crystal clear I repeated it yet again:

I don’t want to change the semantics of --ff-only; I want to introduce a new “pull.mode=ff-only“.

Moreover, I explained once again that this patch series was merely part 1. In part 2 I would introduce pull.mode. And there’s more parts after that.

For some reason Junio didn’t seem to understand what I was proposing. I explained that with pull.mode=ff-only I wanted to introduce a simple error message, but with pull.mode=default a warning to give users a heads up.

I quite don’t get it. They say the same thing. At that point the user needs to choose between merge and rebase to move forward and ff-only would not be a sensible choice to offer the user.

Other people like Jacob Keller did not have any issue understanding my proposal.

After multiple rounds eventually Junio gets my proposal, but now he doesn’t see the point of taking it slow instead of flipping the default to ff-only right away. Junio said that he assumed everyone has already configured pull.rebase therefore nobody would notice the change. I disagreed and pointed why this whole thread started: because a user refused to configure git and expected git to do the right thing–choosing a useful default.

Junio responded as he did at the start:

Which is basically asking for impossible, and I do not think it is a good idea to focus our effort to satisfy such a request in general. There is no useful default that suits everybody in this particular case.

This is once again the Nirvana fallacy. Nobody asked for a perfect solution, merely a better default.

Additionally he argued the pull.mode=ff-only would only be useful for users that don’t use git “for real”:

But for anybody who uses git for real (read: produces his or her own history), it would be pretty much a useless default that forces the user to say rebase or merge every time ‘git pull’ is run.

I mean… didn’t he just argue that he wanted to make ff-only the default right away? Literally in his last reply. Very confusing.

I showed to him a real example of me using git pull --ff-only “for real”. Additionally I asked him in the most polite way I could a question that should be very valid at this point:

How often do you send pull requests?

He didn’t answer.

Since once again Junio stated that he didn’t see a good reason why certain classes of users would want to configure pull.mode=ff-only, I offered to do some mail archaeology which lead to the creation of this blog post. It is obvious this is needed.

Once again Junio misinterpreted what I was proposing, and once again I explained in great detail. This time however Junio correctly identified that without pull.mode the user cannot explicitly ask for the default behavior, and that in my opinion is a crucial thing the user must be able to do in order to test this proposed new default. Moreover, I explained that the name is irrelevant, it could be pull.mode=ff-only, or pull.rebase=ff-only, or pull.ff=only, but one option should turn on the new behavior we are proposing to users, and currently no option does.

To cut to the chase I explained that we want:

  1. git pull to eventually fail in non-fast-forward cases
  2. A grace period before that switch is flipped

And so far no other proposal but mine achieved that.

Since Junio insisted pull.mode was not necessary (which I had already demonstrated multiple times it is), I asked him a poignant question to unequivocally demonstrate that:

So, after your hypothetical patch, there would be no difference between:

git -c pull.rebase=no -c pull.ff=only pull

and:

git -c advice.pullnonff=false pull

?

Junio answered that both would error out, which is correct. But then he has to concede my point, because:

  • git -c pull.ff=only pull (fail)
  • git -c pull.ff=only pull --merge (fail)

If both fail we cannot tell the user:

Not a fast-forward, either merge or rebase.

Because the user would do git pull --merge and that command would fail as well because it will try to do git merge --ff-only and it’s not a fast-forward. That’s it. pull.ff=only can’t be used for this. Period.

Junio did not reply.

Part 2 was completely ignored.

When Junio sent his regular mail with the status of all patch series he didn’t include any of my proposals. Since Junio himself was the one that asked for Alex and me to provide our competing proposals I replied to him asking for the status of the ff-only topic. He didn’t reply.

Alex’s approach

Junio asked for both my approach and Alex’s approach which he sent in two patches. One changing the warning to state that pull.ff=only will be the new default, and the other actually changing the default.

Junio did not think the approach was correct, and neither did I. When I explained why, Junio mis-read my response, but this time he himself caught the mistake, and then said the following:

If we instead introduced a separate command, say “git update“, that is “fetch followed by rebase” (just like “git pull” is “fetch followed by merge”), to rebase your work on top of updated upstream, there wouldn’t be a need for us to be having this discussion.

Yes, if my suggestion to add git update was heard we wouldn’t be having this discussion (yet again).

But to move forward I argued that there’s three things to do:

  1. Improve the annoying warning
  2. Consider changing the semantics of --ff-only, or implement pull.mode=ff-only
  3. Consider a new “git update” command

Since my patches for pull.mode were not being reviewed, I suggested to focus only on #1 for now.

Jacob Keller chimed in and also agreed the order of the parents with git pull was a significant problem.

I also reminded Junio of the purpose of the patch, to which he replied:

Sorry, I know you keep repeating that “keep in mind”, but I do not quite see why anybody needs to keep that in mind. Has a concensus that the repurposed --ff-only should be the default been established?

This is the title of the patch from Alex we were supposed to be reviewing “pull: default pull.ff to "only" when pull.rebase is not set either“. The whole point is to make pull.ff=only the default, and in that case git pull --merge would fail, which is not what we want. Unless the semantics of pull.ff=only are changed, which is not my proposal, my proposal (pull.mode=ff-only) is better because it doesn’t need that change.

He did not reply again.

Alex stated that he had trouble following the discussion and simply left it at the table.

A fast-forward

In v3 I focused on fixing all the issues without committing to any particular approach. However, Junio objected to my use of the word “fast-forward”:

… I find the phrase “in a fast-forward way” a bit awkward. Perhaps use the ‘fast-forward’ as a verb

I argued that if fast-forward was a verb, we would have git fast-forward command (which I actually think is a good idea), but currently we don’t have it, so fast-forward is a modifier. In particular if git merge is a verb, then --ff is an adverb.

Elijah disagreed:

A square is a special type of a rectangle, but that doesn’t make “square” an adjective; both square and rectangle are nouns.

Since I’m deeply interested in language, this is a discussion I could really weigh in:

Words have multiple definitions. The word “square” is both a noun, and an adjective. It’s perfectly fine to say “square corner”.

Moreover, I found plenty of instances in the documentation which use fast-forward as an adverb. Even the git-merge man page uses “fast-forward merge”.

  • non-fast-forward update
  • fast-forward merges
  • fast-forward check
  • “fast-forward-ness”
  • fast-forward situation
  • fast-forward push
  • non-fast-forward pushes
  • non-fast-forward reference
  • fast-forward cases
  • “fast-forward” merge

This time I managed to convince him.

However it didn’t escape to me the fact that the documentation already talks of “fast-forward update”, so this suggests that the git update command could fill a whole already present in the documentation.

Everything at once

Since Junio had trouble figuring out my proposal in full, I decided to send the 19 patches together at once, and in addition I wrote a detailed explanation of what I was proposing to happen at every step of the way.

Is it more clear what is my proposal?

No reply.

Improve the default warning

Since I didn’t see much hope in fixing git pull‘s default, for v5 I decided to drop everything except what fixes the most urgent issue: the warning on every single pull.

Junio objected to the name of a variable I used: default_mode. He said the word “mode” was overused, and he proposed a “more focused” word, he came with rebase_unspecified. Additionally he didn’t want this variable to be global, even though there were already 38 global variables, so why did he feel that 39 was too much? I have no idea.

Moreover he made some comments on the test changes:

We are merely allowing fast-forward merges without unnecessary merge commits, but we are faced to merge c1 into c2, which is not ff. The command goes ahead and merges anyway, but why shouldn’t we be seeing the message? I am puzzled.

Now, for the most part I haven’t gone into code in this post, but I think it’s important you feel a little bit of my pain here.

git reset --hard c2 &&
test_config pull.ff true &&
git pull . c1 2>err &&
test_i18ngrep ! "Pulling without specifying how to reconcile" err

The name of the test gives a hint to what it’s trying to do “pull.rebase not set and pull.ff=true“. As you can see there’s no pull.rebase configuration (or --rebase), but there is pull.ff=true. So clearly it has something to do with that pull.ff.

What this test is doing is trying to pull c1 into c2, and since the history has diverged the merge cannot be a fast-forward, therefore the code should throw a warning…

except that only happens if the user has not specified any --ff* option (--ff, --no-ff, or --ff-only), either through to command line, or from the pull.ff configuration, and the test does set pull.ff=true. Therefore there’s no warning.

This is behavior Alex Henrie added and Junio accepted with no comment. In my opinion this behavior is wrong, but that’s what the code does at this point, and even Junio did not expect this, that’s why he was puzzled.

I explained to him that the code only checks for opt_ff, and I tried to fix that behavior in v1, v2, v3, and v4 of my patch series, but he did not comment on those patches.

So the tests used to do a fast-forward from c0 to c1, and that would normally throw a warning (as all pulls did), but because pull.ff=true is set, it doesn’t. This test written by Alex Henrie ensures that pull.ff=true doesn’t generate a warning. But after my patch doing a fast-forward from c0 to c1 would not generate a warning anyway, so the pull.ff=true doesn’t do anything, and therefore the test doesn’t do anything. That’s why I changed the test to start from c2, because then the merge is not fast-forward, and a warning is thrown unless pull.ff=true is set, which is what the test was supposed to be testing in the first place.

To show Junio that the tests are doing what they originally intended to do, I modified all the tests to remove the pull.ff configurations:

git reset --hard c2 &&
git pull . c1 2>err &&
test_i18ngrep ! "Pulling without specifying how to reconcile" err

Now the test fails.

Junio was not convinced. He stated that the original test was checking that the fast-forward would work fine, but that’s not true since from the test we can see that the only thing it’s doing is checking there’s no error message. The code could very well have done a real merge instead of a fast-forward, and the test would still pass. I explained to him that the tests that actually check for the fast-forward are in another file. This particular test is only about the warning message.

Junio claimed to know what the original author of the test intended to do and what he cared about, and that was to test that pulling c1 into c0 issued a message, except none of the tests actually did that, they checked the opposite: that the message is not there. So now that my patch skips the message, the tests pass, but for the wrong reason.

git reset --hard c0 &&
git pull . c1 2>err &&
test_i18ngrep ! "Pulling without specifying how to reconcile" err

If instead of starting from c2 we start from c0 as the original code, then the pull is fast forward, and there’s no message, so this test pass. But this test was supposed to be checking the output with pull.ff=true, that’s why it was called “pull.rebase not set and pull.ff=true“, if you remove the configuration it still passes. So in the fast-forward case what is this test supposed to be doing if it cannot possibly fail now? Nothing.

I explained that and in addition I said that I’m not in the habit of attempting to read minds, what the test are trying to do is very clear from the code.

Junio replied.

You do not have to read mind. What is written in the test is clear enough: run “git pull --rebase . c1” into c0 and see what happens.

Yes, that was precisely my point: no need to read minds (although my reading of the code is different).

Do you feel my pain? I don’t mind going as many rounds as necessary to improve a patch series, even ten or more, but these rounds are improving nothing. The name of the variable doesn’t matter, because in a few patches later I remove it, and even these tests for the warning don’t matter because eventually we want to remove the warning completely.

And this is only one test, of one the patches, of one of simplest patch series.

But fine, if Junio wants to keep a bunch of tests that literally test nothing, I’ll oblige. For v6 I left the useless tests untouched and added the real ones separately.

Junio isn’t entirely happy

Junio sent v7 himself, stating he wasn’t happy with the way the --ff* options were handled. He suggested that these options should imply a merge, but I showed situations in which that would become tricky.

Additionally he wrote a patch to remove my global variable, and seized the opportunity to state:

There is no reason to make the code deliberately worse by ignoring improvement offered on the list.

Really? I did not ignore the comments, I responded to all of them, but I disagreed with some. And so if I disagree with any of Junio’s suggestions for improvement I’m making the code “deliberately worse”? If I have to do what he says, then those are not suggestions, those are orders. Not to mention the v7 he is sending contains the patches I sent in v6, which have the useless tests–a change he suggested–and even though I was 100% against, I included it.

Anyway, since Junio didn’t want to discuss his proposal to imply merges any further, finally he merged this patch series.

After this patch the warning becomes much less annoying. So that’s progress.

What happened to the rest of the patches? Nothing.

Today

After this latest attempt to fix git pull, I distanced myself from the project for a few months, and then came back. I decided to resend some of my patches, but now separately, to minimize the possibility of them being rejected.

  1. Cleanups
  2. Documentation
  3. --merge

Of these only #1 was merged (though not yet in “master”). #2 generated yet another discussion of the semantics of “fast-forward” but nobody was against the changes themselves. #3 everyone is in favor, but no comment from Junio.

If you think any of these will eventually be merged you are more hopeful than I am. Also, there’s this comment from Junio: (emphasis mine):

If some contributors get too annoying to be worth our time interacting with, it is OK to ignore them.

He seems to be referring to me.

So where does that leave us? The order of the parents is still wrong. And the default mode still isn’t fast-forward-only. Regardless of what you may think of my communication style, I’m the one that has actually sent patches to fix all these issues. And unlike Junio and many other git developers–I’m not being paid to work on this, I’m doing this altruistically (I don’t even use git pull myself). If I’m not the person to fix this, who is? I left the project for about 6 years and nobody fixed it.

Recently I challenged Alex Henrie to send a patch to try to fix the situation. He took me up on the challenge, and when I reviewed his patch he immediately realized that more work is needed (as I predicted he would find).

What to do now?

The final solution

While writing this post I realized just how many suggestions were proposed, but not implemented, so I decided to try once again to fix the situation, but now consider absolutely all the feedback in a holistic way.

Here’s there result.

git fast-forward

The documentation about what a fast-forward is, and how to attempt one, is scattered all over. Additionally the simplest way to attempt one is with git merge --ff-only which isn’t user-friendly, and some people don’t even consider a fast-forward to be a modifier of a merge (i.e. adverb).

By having git fast-forward command we solve both problems and fast-forward in the git context now becomes a verb: something independent of git merge.

Additionally there’s an advice for newcomers:

hint: Diverging branches can't be fast-forwarded, you need to either:
hint:
hint:   git merge
hint:
hint: or:
hint:
hint:   git rebase
hint:
hint: For more information check "git help fast-forward".
hint:
hint: Disable this message with "git config advice.diverging false"

Advanced users can disable this specific advice (or all advice).

As the advice says, anyone who wants to learn more about fast-forwards can simply do git help fast-forward.

git update

I converted my old tool from shell to C, and although it doesn’t have yet all the features, it does have plenty of functionality.

git update without arguments simply does git fetch + git fast-forward. If the fast-forward fails, we get the above advice. This solves the first problem that newcomers often do merges by mistake.

Also, it’s possible to do git update --merge (or configure update.mode=merge), and on those cases the order of the parents is reversed properly, as everyone wants (everyone that is not an maintainer).

There’s also git update --rebase, and also a per-branch a configuration branch.<name>.updateMode.

pull.mode

And of course all the other patches that improve git pull, including pull.mode=fast-forward, a better warning that shows that in the future git pull will do fast-forwards by default, along with a per-branch configuration branch.<name>.pullMode.

Also, thanks to git fast-forward we also get the same divergence advice, which will be permanent, but easy to turn off.

In total it’s 33 patches, and one additional to flip the default in the future (after users have had a chance to configure pull.mode to whatever they prefer).

The patches have been sent, but I wouldn’t hold my breath.

Sadly this is where the story ends, even though everyone has agreed on what is the path forward “for some reason” the patches will not be merged, and new users will be forever doomed to keep making mistakes because git pull was not designed for them.

Wait a second…

Git is a distributed version control system. You are not forced to use Junio’s repository, you can use mine. Everything I’ve talked about is fixed there. If you are interested why not give it a try?

Note: Some people are thinking that the purpose of this article is to point fingers, but that could not farther from the truth. The only reason names are named is to be able to follow the discussions (or at least attempt to). I am not seeking to assign any blame to any one person for any action or inaction. In any discussion disagreements are to be expected, and it doesn’t have to mean anything more than that.

Adventures with man color

As it’s usually the case with me, a simple fix sends me to an unending rabbit hole of complex issues. And this was no exception.

It all started when I tried to help the Git project to move towards asciidoctor, a program that generates documentation from text files using a markdown language. The initial project was asciidoc, but it’s in a bit of a rot. The original asciidoc is written in Python (a language I detest), and the new asciidoctor in Ruby (a language I love), so I clearly saw an edge.

The problem starts with a feature asciidoctor has, but not asciidoc: generate man pages. Of course asciidoc can generate man pages, but it does so by first generating a docbook XML, and then docbook stylesheets can be used to convert those to man pages. The same can be done with asciidoctor, but additionally there’s an option to generate man pages directly.

Using docbook is very slow; generating man pages directly is way faster.

When I sent the initial patch (very trivial), a Git developer mentioned some “groff issues”. Apparently if your system uses groff (GNU troff), you are supposed to build git with GNU_ROFF=1:

Define GNU_ROFF if your target system uses GNU groff. This forces apostrophes to be ASCII so that cut&pasting examples to the shell will work.

It’s actually not “GNU groff”, but “GNU troff”, so this option is wrongly named, but regardless; virtually nobody is using it.

My Arch Linux system doesn’t build git with GNU_ROFF, it does use groff, and yet I don’t see any issues with apostrophes… So what’s going on?

The context

The original commit “Quote ‘ as \(aq in manpages” comes from 2009, and mentions the problem comes from “docbook/xmlto”, and apparently only affects groff. In the original thread “quote in help code example” you can see they mention the output of the git filter-branch command specifically, but when I look at that man page, it looks fine.

So I started to dig in, and for starters I don’t even know what groff is.

I tried different versions of asciidoc, they worked fine. Then I tried different versions of docbook stylesheets, they worked fine. I even tried different versions of asciidoctor, to see when the fixed the issue; the all worked fine. Weird.

So I decided to compile groff myself and find the point where it started to fail. First I tried a version two years ago, and it did not compile correctly, so I had to do some hacking to make it compile, after I did, everything worked fine. I continued going more and more into the past, fixing the compilation issues, and not finding any problem. I reached 2006, and still did not see any change.

This was strange. Clearly at some point in time, with some combination of tools there was a problem, but I couldn’t find either of those.

Then I decided to manually modify a man page, and put quotes directly… Bingo. I could see that ' was actually rendered as `, but what caused it?

I then moved forward in versions and to my surprise they all had this issue, even the most recent version of groff.

What on Earth is going on here?

While doing this I noticed something different from my compiled version of groff, and Arch Linux’s version. In the compiled version of groff I saw links rendered as blue. When I saw the generated man pages I saw \m[blue], but I assumed that was for some other kind of troff program or something totally unrelated. But no, here I was seeing blue, but not with all the groff binaries.

So I tried to build groff with the same options as Arch Linux… Still blue. After trying a few things I eventually found a difference: Arch Linux installs a file /usr/share/groff/site-tmac/man.local, if I remove that file the blue color returns. Inside the file there’s:

\" Shut off SGR by default (groff colors)
\" Require GROFF_SGR envvar defined to turn it on
if '\V[GROFF_SGR]'' \
  output x X tty: sgr 0

That’s it! If you export GROFF_SGR=1 on Arch Linux, you see man pages with colors, just like my compiled version. The reason my compiled version does this by default is that it doesn’t have Arch Linux’s man.local file.

GROFF_SGR

If you google GROFF_SGR you find that it’s not properly documented. Some distributions such as Debian and Arch Linux do disable groff’s colors, but they don’t document doing so. Debian “fixed it”. However, I don’t think most people are going to read the entirety of grotty’s man page, not even a little bit, so that doesn’t help, even if you are running Debian–where it’s documented.

However, if you read the man page, you will find another variable: GROFF_NO_SGR. Unlike GROFF_SGR, this one is standard, and it’s respected in all distributions.

This reminded me of trick I learned while reading Arch Linux’s installation guide:

man() {
    LESS_TERMCAP_md=$'\e[01;31m' \
    LESS_TERMCAP_me=$'\e[0m' \
    LESS_TERMCAP_so=$'\e[01;44;33m' \
    LESS_TERMCAP_se=$'\e[0m' \
    LESS_TERMCAP_us=$'\e[01;32m' \
    LESS_TERMCAP_ue=$'\e[0m' \
    command man "$@"
}

This code automatically converts parts of man pages to color (e.g. bold and underline), which looks much better than normal man pages, but it turns out it only works if you have groff’s SGR disabled, so… In other distributions you need to do GROFF_NO_SGR=1, for the above to work.

Cool. We found something.

Back to apostrophes

There’s something else in Arch Linux’s man.local:

char \' \N'39'

This converts \' to ', instead of groff’s default: \(aa (acute accent: ´). This is the reason why I could not reproduce the problem: Arch Linux was hiding it. However, this is the wrong way of fixing it. There is a reason groff developers decided to pick \(aa; they know better. Distribution packagers should not be overriding this.

Why did they do this? The change came due to task FS#9643 – man PKGBUILD shows slanted single quotes. This happened in 2008, which suggests there was indeed an issue around that time, and pacman documentation was built with asciidoc too.

Arch Linux fixed the issue in the wrong way, though. Debian chose a different path. In their bug report #507673 Shouldn’t parse ‘ to \’ they discussed the issue at length, and they correctly identified that the issue was in docbook-xsl (not groff), and if you are going to convert ' it should be to \(aq, but that would only work in groff. They also found that Pod::man had a portable solution:

.ie \n(.g .ds Aq (aq
.el .ds Aq '

This creates an alias: from Aq to (aq, but only when the program is groff, in all other programs it gets converted to '.

This is the correct solution in the correct layer, and generates the proper output everywhere.

But to check that it is the correct solution it would behoove us to understand what groff actually is. groff (or GNU troff) is a document formatting system; it receives text mixed with commands, and generates documents. A man page is just one of the many types of documents it can generate. It can for example generate a PDF.

So, let’s write a simple groff document:

.nf
single quotes: 'text'
single quoted quotes: \'text\'
apostrophe quote: \(aqtext\(aq
.fi

We can generate a PDF document using groff -T pdf test.groff > test.pdf. But this of course is not what we ultimately want, we want to generate a man page, in the same way as man does. To do that we need to specify the output device as utf8: groff -T utf8 test.groff > test.txt. This generates the following:

single quotes ': ’text’
single quoted quotes \': ´text´
apostrophe quote \(aq: 'text'

As you can see the output text is quite different from the input text; that’s what groff does. But this is only the case on a utf-8 system, if you specify the ascii output, then all the quotes above get translated to simply '.

The output with \(aq is correct in both utf-8 and ascii. And if we add the Aq alias:

apostrophe quote alias \(Aq: 'text'

That is indeed correct. The Debian fix seems to work. To make sure we would need a non-GNU troff, like in Solaris, but alas, I don’t have access to something like that, so I’m just going to assume it works in other systems (as other people reported it did).

This proper fix was eventually picked by docbook in 2010: Fixed bug #2412738 (apostrophe escaping) by applying the submitted patch. If you take a look at the code of git-filter-branch.1 you can see the fixed code in action:

git filter-branch --tree-filter *\(Aqrm filename*\(Aq HEAD

Therefore both Arch Linux’s and git’s workarounds are not necessary anymore. Yet they remain there ten years later.

Digging deeper

OK, so we found out what the issue was, and how it got fixed: in docbook, and also unnecessarily in git and Arch Linux (three different levels). But what caused it? Going back to groff from 2006 didn’t cause the issue, so what happened?

By looking back at docbook stylesheet’s history with git blame, I found out the commit that caused the issue: Reverted necessary escaping of backslash, dot, and dash. This commit happened in 2007, and it was made due to an internal limitation of docbook’s architecture.

So from 2007 to 2010 docbook stylesheets were generating wrong man pages, different projects worked-around the issue in different ways, but today–in 2021–these workarounds are not needed, and yet they remain in place.

Back to Git

After all this investigation I sent a patch to the Git project (doc: remove GNU troff workaround) to remove the GNU_ROFF option which clearly was not needed since at least ten years. But I also sent a comment about what I found regarding GROFF_SGR, and the trick to colorize man pages. In reply I received a suggestion to implement the LESS_TERMCAP trick into git help (which is basically an alias for man).

So I sent a patch (help: colorize man pages), and a big discussion propped up (typical due to the bike-shed effect). In that thread it was mentioned: “why not let the user configure man to do this?”. The problem is that you have too many moving parts; groff, man, git, less, distribution configurations, environment variables, aliases, workarounds, docbook and asciidoc bugs… And of course the thing that started it all: asciidoctor.

But it made me think: what is indeed the best way to configure man to do this?

After several days of investigation, and several days of trying options I arrived to what I think is the actual solution.

Solution

export MANPAGER="less -R --use-color -Dd+r -Du+b"
export MANROFFOPT="-c"

Unlike Arch Linux’s hack, the -D arguments to less are much more succinct, and they allow adding color (in addition to the style (e.g. underlined)), not removing information. So --color=d+r (long option for -D) converts d (Bold text) to r (red), and the + signifies add color (i.e. don’t remove the bold attribute). Moreover --use-color adds other colors to the less interface; the prompt is cyan, searches are in green, and warnings in yellow.

And instead of the the ugly GROFF_SGR=1, we can tell man to pass -c to groff.

So the full command is:

groff -T utf8 -m man -c git-filter-branch.1 | less -R --use-color -Dd+r -Du+b

No man involved. This is way simpler… Why is nobody using this?

After I updated my patch and other people tested it, it became clear it didn’t always work. In particular older versions of less did not have the -D options (at least not for Linux). So I checked the history of less and I found out that they enabled -D for Linux systems in 2021.

No wonder everyone is still using the LESS_TERMCAP_* variables. Nobody knows of the new option, because it’s too new!

So the patch to remove the GNU_ROFF option in Git (totally necessary in 2021) is there. And so are the updated Arch Linux instructions to use the new -D flags of less.

If you want to properly colorize man pages, you do this:

export MANPAGER="less -R --use-color -Dd+r -Du+b"
export MANROFFOPT="-c"

If you want to colorize other similar documentation (like Ruby’s documentation):

export RI_PAGER="less -R --use-color -Dd+r -Du+b"

And so on. And if you want less to format everthing:

export LESS='-RXF --use-color -Dd+r$Du+b'

That’s it. If you are running a recent enough version of less, everything works perfectly with a simple configuration.

Oh, also, I realized asciidoctor didn’t have the portable fix, so I sent them a patch that is now merged. I found an issue with less colors and searches that is fixed now. And I reported their unnecessary workaround to Arch Linux too.

Why renaming Git’s master branch is a terrible idea

Back in May (in the inauspicious year of 2020) a thread in the Git mailing list with the tile of “rename offensive terminology (master)” was started, it lasted for more than a month, and after hundreds of replies, no clear ground was gained. The project took the path of least resistance (as you do), and the final patch to do the actual rename was sent today (November).

First things first. I’ve been a user of Git since 2005 (before 1.0), and a contributor since 2009, but I stopped being active, and only recently started to follow the mailing list again, which is why I missed the big discussion, but just today read the whole enchilada, and now I’m up-to-date.

The discussion revolved around five subjects:

  1. Adding a new configuration (init.defaultbranch)
  2. Should the name of the master branch be changed?
  3. Best alternative name for the master branch
  4. Culture war
  5. The impact to users

I already sent my objection, and my rationale as to why I think the most important point–the impact to users–was not discussed enough, and in fact barely touched.

In my opinion the whole discussion was a mess of smoke screen after smoke screen and it never touched the only really important point: users. I’m going to tackle each subject separately, leaving the most important one at the end, but first I would like to address the actual context and some of the most obvious fallacies people left at the table.

The context

It’s not a coincidence that nobody found the term problematic for 15 years, and suddenly in the height of wokeness–2020 (the year of George Floyd, BLM/ANTIFA uprising, and so on)–it magically becomes an issue. This is a solution looking for a problem, not an actual problem, and it appeared precisely at the same time the Masters Tournament received attention for its name. The Masters being more renowned than Git certainly got more attention from the press, and plenty of articles have been written explaining why it makes no sense to link the word “masters” to slavery in 2020 in this context (even though the tournament’s history does have some uncomfortable relationship with racism) (No, the masters does not need renaming, Masters Name Offensive? Who Says That?, Will Masters Be Renamed Due to BLM Movement? Odds Favor “No” at -2500, Calls for The Masters to change its name over ‘slave’ connotations at Augusta). Few are betting on The Masters actually changing its name.

For more woke debates, take a look at the 2 + 2 = 5 debate (also in 2020).

The obvious fallacies

The most obvious fallacy is “others are doing it”. Does it have to be said? Just because all your friends are jumping off a cliff doesn’t mean you should too. Yes, other projects are doing it, that doesn’t mean they don’t have bad reasons for it. This is the bandwagon fallacy (argumentum ad populum).

Even if it was desirable for the git.git project to change the name of the master branch for itself–just like the Python project did, it’s an entirely different thing to change the name of the master branch for everyone. The bandwagon argument doesn’t even apply.

The second fallacy comes straight out of the title “offensive terminology”. This is a rhetorical technique called loaded language; “what kind of person has to deny beating his wife?”, or “why do you object to the USA bringing democracy to Iraq?”. Before the debate even begins you have already poisoned the well (another fallacy), and now it’s an uphill battle for your opponents (if they don’t notice what you are doing). It’s trying to smuggle a premise in the argument without anyone noticing.

Most people in the thread started arguing why it’s not offensive, while the onus was on the other side to prove that it was offensive. They had the burden of proof, and they inconspicuously shifted it.

If somebody starts a debate accusing you of racism, you already lost, especially if you try to defend yourself.

Sorry progressives, the word “master” is not “offensive terminology”. That’s what you have to prove. “What kind of project defends offensive terminology?” Is not an argument.

Adding a new configuration

This one is easy. There was no valid reason not to add a new configuration. In fact, people already had configurations that changed the default branch. Choice is good, this configuration was about making it easier to do what people were already doing.

The curious thing is that the only places in the thread where the configuration was brought up was as a diversion tactic called motte and bailey.

What they started with was a change of the default branch, a proposition that was hard to defend (bailey), and when opponents put enough pressure they retreated to the most defensible one (motte): “why are you against a configuration?”

No, nobody was against adding a new configuration, what people were against was changing the default configuration.

Should the name of the master branch be changed?

This was the crux of the matter, so it makes sense that this is where most of the time debating was spent. Except it wasn’t.

People immediately jumped to the next point, which is what is a good name for the default branch, but first it should be determined that changing the default is something desirable, which was never established.

You don’t just start discussing with your partner what color of apartment to choose. First, your girlfriend (or whatever) has to agree to live together!

Virtually any decision has to be weighted in with pros and cons, and they never considered the cons, nor established any real pro.

Pro

If the word “master” is indeed offensive, then it would be something positive to change it. But this was never established to be the case, it was just assumed so. Some arguments were indeed presented, but they were never truly discussed.

The argument was that in the past (when slavery was a thing), masters were a bad thing, because they owned slaves, and the word still has that bad connotation.

That’s it. This is barely an argument.

Not only is very tenuously relevant in the present moment, but it’s not actually necessarily true. Slavery was an institution, and masters simply played a role, they were not inherently good or bad. Just because George Washington was a slave owner, that doesn’t mean he was a monster, nor does it mean the word “master” had any negative connotation back then. It is an assumption we are making in the present, which, even if true: it’s still an assumption.

This is called presentism. It’s really hard to us to imagine the past because we didn’t live it. When we judge it we usually judge it wrong because we have a modern bias. How good or bad masters were really viewed by their subjects is a matter for debate, but not in a software project.

Note: A lot of people misunderstood this point. To make it crystal clear: slavery was bad. The meaning of the word “master” back then is a different issue.

Supposing that “master” was really a bad word in times of slavery (something that hasn’t been established), with no other meaning (which we know it isn’t true) this has no bearing in the modern world.

Prescriptivism

A misunderstanding many people have of language is the difference between prescriptive and descriptive language. In prescriptivism words are dictated (how they ought to be used). In descriptivism words are simply described (how they are actually used). Dictionaries can be found on both camps, but they are mainly on the descriptive side (especially the good ones).

This misunderstanding is the reason why many people think (wrongly) that the word “literally” should not mean “virtually” (even though many people use it this way today). This is prescriptiveness, and it doesn’t work. Words change meaning. For example, the word “cute” meant “sharp” in the past, but it slowly changed meaning, much to the dismay of prescriptivists. It does not matter how much prescriptivists kick and scream; the masses are the ones that dictate the meaning of words.

So it does not matter what you–or anyone–thinks, today the word “literally” means “virtually”. Good dictionaries simply describe the current use, they don’t fight it (i.e. prescribe against it).

You can choose how you use words (if you think literally should not mean virtually, you are free to not use it that way). But you cannot choose how others use language (others decide how they use it). In other words, you cannot prescribe language, it doesn’t matter how hard you try: you can’t fight everyone.

Language evolves on its own, and like democracy: it’s dictated by the masses.

So, what do the masses say about the word “master”? According to my favorite dictionary (Merriam-Webster):

  1. A male teacher
  2. A person holding an academic degree higher than a bachelor’s but
    lower than a doctor’s
  3. The degree itself (of above)
  4. A revered religious leader
  5. A worker or artisan qualified to teach apprentices
  6. An artist, performer, or player of consummate skill
  7. A great figure of the past whose work serves as a model or ideal
  8. One having authority over another
  9. One that conquers or masters
  10. One having control
  11. An owner especially of a slave or animal
  12. The employer especially of a servant
  13. A presiding officer in an institution or society
  14. Any of several officers of court appointed to assist a judge
  15. A master mechanism or device
  16. An original from which copies can be made

These are not even all the meanings, just the noun meanings I found relevant to today, and the world in general.

Yes, there is one meaning which has a negative connotation, but so does the word “shit”, and being Mexican, I don’t get offended when somebody says “Mexico is the shit”.

So no, there’s nothing inherently bad about the word “master” in the present. Like all words: it depends on the context.

By following this rationale the word “get” can be offensive too, one of the definitions is “to leave immediately”. If you shout “get!” to a subordinate, that might be considered offensive (and with good reason)–especially if this person is a discriminated minority. Does that mean we should ban the word “get” completely? No, that would be absurd.

Also, there’s another close word that can be considered offensive: git.

Prescriptives would not care how the word is actually used today, all they care about is to dictate how the word should be used (in their opinion).

But as we saw above: that’s not how language works.

People will decide how they want to use the word “master”. And thanks to the new configuration “init.defaultbranch”, they can decide how not to use that word.

If and when the masses of Git users decide (democratically) to shift away from the word “master”, that’s when the Git project should consider changing the default, not before, and certainly not in a prescriptive way.

Moreover, today the term is used in a variety of contexts that are unlikely to change any time soon (regardless of how much prescriptivists complain):

  1. An important room (master bedroom)
  2. An important key (master key)
  3. Recording (master record)
  4. An expert in a skill (a chess master)
  5. The process of becoming an expert (mastering German)
  6. An academic degree (Master of Economics)
  7. A largely useless thing (Master of Business Administration [MBA])
  8. Golf tournaments (Masters Tournament [The Masters])
  9. Famous classes by famous experts (MasterClass Online Classes)
  10. Online tournament (Intel Extreme Masters [IEM])
  11. US Navy rank (Master-at-Arms [MA])
  12. Senior member of a university (Master of Trinity College)
  13. Official host of a ceremony (master of ceremonies [MC])
  14. Popular characters (Jedi Master Yoda)
  15. A title in a popular game (Dungeon Master)
  16. An important order (Grand Master)
  17. Vague term (Zen master)
  18. Stephen Hawking (Master of the Universe)
  19. Inside joke (master of your domain)

And many, many more.

All these are current uses of the word, not to mention the popular BDSM context, where having a master is not a bad thing at all.

Subjectiveness

Even if we suppose that the word is “bad” (which is not), changing it does not solve the problem, it merely shuffles it around. This notion is called language creep (also concept creep). First there’s the n-word (which I don’t feel comfortable repeating, for obvious reasons), then there was another variation (which ends in ‘o’, I can’t repeat either), then there was plain “black”, but even that was offensive, so they invented the bullshit term African-American (even for people that are neither African, nor American, like British blacks). It never ends.

This is very well exemplified in the show Orange Is The New Black where a guard corrects another guard for using the term “bitches”, since that term is derogatory towards women. The politically correct term now is “poochies”, he argues, and then proceeds to say: “these fucking poochies”.

Words are neither good or bad, is how you use them that make them so.

You can say “I love you bitches” in a positive way, and “these fucking women make me vomit” in a completely derogatory way.

George Carlin became famous in 1972 for simply stating seven words he was forbidden from using, and he did so in a completely positive way.

So no, even if the word “master” was “bad”, that doesn’t mean it was always bad.

But supposing it’s always bad, who are the victims of this language crime? Presumably it’s black people, possibly descended from slaves, who actually had masters. Do all black people find this word offensive? No.

I’m Mexican, do I get offended when somebody uses the word “beaner”? No. Being offended is a choice. Just like nobody can make you angry, it’s you the one that gets angry, nobody inflicts offense on other people, it’s the choice of the recipients. There’s people with all the reason in the world, who don’t get offended, and people that have no reason, and yet they get easily offended. It’s all subjective.

Steve Hughes has a great bit explaining why nothing happens when you get offended. So what? Be offended; nothing happens. Being offended is part of living in a society. Every time you go out the door you risk being offended, and if you can’t deal with that, then don’t interact with other people. It’s that simple.

Collective Munchausen by proxy

But fine, let’s say for the sake of argument that “master” is a bad word, even on modern times, in any context, and the people that get offended by it have all the justification in the world (none of which is true). How many of these concerned offended users participated in the discussion?

Zero.

That’s right. Not one single person of African descent (or whatever term you want to use) complained.

What we got instead were complainers by proxy: people who get offended on behalf of other (possibly non-existent) people.

Gad Saad coined a term Collective Munchausen by proxy that explains the irrationality of modern times. He borrows from the established disorder called Munchausen Syndrome by Proxy.

So you see, Munchausen is when you feign illness to gain attention. Munchausen by proxy is when you feign the illness of somebody else to gain attention towards you. Collective Munchausen is when a group of people feign illness. And collective Munchausen by proxy is when a group of people feign the illness of another group of people.

If you check the mugshots of BLM activists arrested, most of them are actually white. Just like the people pushing for the rename (all white), they are being offended on behalf of others by proxy.

Black people did not ask for this (the master rename (but probably many don’t appreciate the destruction of their businesses in riots either)).

Another example is the huge backlash J. K. Rowling received for some supposedly transphobic remarks, but the people that complained were not transgender, they were professional complainers that did so by proxy. What many people in the actual transgender community said–like Blair White–is that this was not a real issue.

So why on Earth would a group of people complain about an issue that doesn’t affect them directly, but according to them it affects another group of people? Well, we know it has nothing to do with the supposed target victim: black people, and everything to do with themselves: they want to win progressive points, and are desperate to be “on the right side of history”.

They are like a White Knight trying to defend a woman who never asked for it, and in fact not only can she defend herself, but she would prefer to do so.

This isn’t about the “victims”, it’s all about them.

The careful observer probably has already noticed this: there are no pros.

Cons

Let’s start with the obvious one: it’s a lot of work. This is the first thing proponents of the change noticed, but it wasn’t such a big issue since they themselves offered to do the work. However, I don’t think they gauged the magnitude of the task, since just changing the relevant line of code basically breaks all the tests.

The tests are mostly done now, but all the documentation still needs to be updated. Not only the documentation of the project, but the online documentation too, and the Pro Git book, and plenty of documentation scattered around the web, etc. Sure, a lot of this doesn’t fall under the purview of Git developers, but it’s something that somebody has to do.

Then we have the people that are not subscribed to the mailing list and are completely unaware that this change is coming, and from one day to the next they update Git and they find out there’s no master branch when they create a new repository.

I call these the “silent majority”. The vast majority of Git users could not tell you the last Release Notes they read (probably because they haven’t read any). All they care about is that Git continues to work today as it did yesterday.

The silent majority doesn’t say anything when Git does what it’s supposed to do, but oh boy do they complain when it doesn’t.

This is precisely what happened in 2008, when Git 1.6.0 was released, and suddenly all the git-foo commands disappeared. Not only did end-users complained, but so did administrators in big companies, and distribution maintainers.

This is something any project committed to its user-base should try to avoid.

And this is a limited list, there’s a lot more than could go wrong, like scripts being broken, automated testing on other projects, and many many more.

So, on one side of the balance we have a ton of problems, and in other: zero benefits. Oh boy, such a tough choice.

Best alternative name for the master branch

Since people didn’t really discuss the previous subject, and went straight to the choice of name, this is where they spent a lot of the time, but this is also the part where I paid less attention, since I don’t think it’s interesting.

Initially I thought “main” was a fine replacement for “master”. If you had to choose a new name, “main” makes more sense, since “master” has a lot of implications other than the most important branch.

But then I started to read the arguments about different names, and really think about it, and I changed my mind.

If you think in terms of a single repository, then “main” certainly makes sense; it’s just the principal branch. However, the point of Git is that it’s distributed, there’s always many repositories with multiple branches, and you can’t have multiple “main” branches.

In theory every repository is as important as another, but in practice that’s not what happens. Humans–like pretty much all social animals–organize themselves in hierarchies, and in hierarchies there’s always someone at the top. My repository is not as important as the one of Junio (the maintainer).

So what happens is that my master branch continuously keeps track of Junio’s master branch, and I’d venture to say the same happens for pretty much all developers.

The crucial thing is what happens at the start of the development; you clone a repository. If somebody made a clone of you, I doubt you would consider your clone just as important as you. No, you are the original, you are the reference, you are the master copy.

The specific meaning in this context is:

an original from which copies can be made

Merriam-Webster

In this context the word has absolutely nothing to do with master/slaves. The opposite of a master branch is either a descendant (most branches), or an orphan (in rare cases).

The word “main” may describe correctly a special branch among a bunch of flat branches, but not the hierarchical nature of branches and distributed repositories of clones of clones.

The name “master” fits like a glove.

Culture war

This was the other topic where a lot of time was spent on.

I don’t want to spend too much time on this topic myself–even though it’s the one I’m most familiar with–because I think it’s something in 2020 most people are faced with already in their own work, family, or even romantic relationships. So I’d venture to say most people are tired of it.

All I want to say is that in this war I see three clear factions. The progressives, who are in favor of ANTIFA, BLM, inclusive language, have he/him in bio, use terms like anti-racism, or intersectional feminism, and want to be “on the right side of history”. The anti-progressives, who are pretty much against the progressives in all shapes or forms, usually conservatives, but not necessarily so. But finally we have the vast majority of people who don’t care about these things.

The problem is that the progressives are trying to push society into really unhealthy directions, such as blasphemy laws, essentially destroying the most fundamental values of modern western society, like freedom of speech.

The vast majority of people remain silent, because they don’t want to deal with this obvious nonsense, but eventually they will have to speak up, because these dangerous ideologies are creeping up everywhere.

For more about the subject I can’t recommend enough the new book of Gad Saad: The Parasitic Mind: How Infectious Ideas Are Killing Common Sense.

It really is a parasitic mindset, and sensible people must put a stop to it.

Update: The topic has been so controversial that as a result of this post reddit’s r/git decided to ban the topic completely, and remove the post. Hacker News also banned this post.

The impact to users

I already touched on this on the cons of the name change, but what I didn’t address are the mitigation strategies that could be employed.

For any change there’s good and bad ways of going about it.

Even if the change from “master” to “main’ was good and desirable (which it isn’t), simply jumping to it in the next version (Git 2.30) is the absolute worst way of doing it.

And this is precisely what the current patch is advancing.

I already briefly explained what happened in 2008 with the v1.6.0 release, but what I find most interesting is that looking back at those threads many of the arguments of how not to do a big change, apply exactly in the same way.

Back then what most people complained about was not the change itself (from git-foo to “git foo“) (which they considered to be arbitrary), but mainly the manner in which the change was done.

The main thing is that there was no deprecation period, and no clear warning. This lesson was learned, and the jump to Git 2.0 was much smoother precisely because of the warnings and period of adjustment, along with clear communication from the development team about what to expect.

This is not what is being done for the master branch rename.

I also find what I told Linus Torvalds very relevant:

What other projects do is make very visible when something is deprecated, like a big, annoying, unbearable warning. Next time you deprecate a command it might be a good idea to add the warning each time the command is used, and obsolete it later on.

Also, if it’s a big change like this git- stuff, then do a major version bump.

If you had marked 1.6 as 2.0, and added warnings when you deprecated the git-foo stuff then the users would have no excuse. It would have been obvious and this huge thread would have been avoided.

I doubt anyone listened to my suggestion, but they did this for 2.0, and it worked.

I like to refer to a panel Linus Torvalds participated in regarding the importance of users (educating Lennart Poettering). I consider this an explanation of the first principles of software: the main purpose of software is that it’s useful to users, and that it continues to be useful as it moves forward.

“Any time a program breaks the user experience, to me that is the absolute worst failure that a software project can make.”

Linus Torvalds
Linus Torvalds schools Lennart Poettering on the importance of users

Now it’s the same mistake of not warning the users of the upcoming change, except this time it’s much worse, since there’s absolutely no good reason for the change.

Update: the git.git developers listened to my advice and they eventually added a warning.

The Git project is simply another victim of the parasitic mindset that is infecting our culture. It’s being held hostage by a tiny amount of people pushing for a change nobody else wants, would benefit no one, would affect negatively everyone, and they want to do it in a way that maximizes the potential harm.

If I was a betting man, my money would be on the users complaining about this change when it hits them on the face with no previous warning.

What’s missing in Git v2.0.0

I recently blogged about the Git v2.0.0 release, what changed, and why should you care. Unfortunately the conclusion was that nothing much changed (other than the usual new features and bug fixes). In this post I will discuss what should have changed, and why.

What is needed

Fortunately, Git has had the Git User’s Survey in the past, so we know what users want.

  1. user-interface: 3.25
  2. documentation: 3.22
  3. tools (e.g. GUI): 3.01
  4. more features: 2.41
  5. portability: 2.34
  6. performance: 2.28
  7. community (mailing list): 1.70
  8. localization (translation): 1.65
  9. community (IRC): 1.65

Obviously, since user-interface and documentation are the areas that need more improvement, that’s what Git v2.0.0 should have focused, right?

History

I already mentioned this in the other post, but I’ll do it again.

First of all, Git as a long history of never breaking user expectations (other than the Git v1.6.0 fiasco (which changed all the git-foo commands with ‘git foo’)), and as such a lot of thought is devoted into ways to minimize changes in behavior, or even how to avoid it completely. Perhaps too much care is devoted into this.

The preparation for Git v2.0.0 started more than three years ago with a mail from Junio C Hamano, asking for developers to submit ideas for changes that normally would not happen because they break backwards compatibility, he invited us to think as if “we were writing Git from scratch”. This big release that would break backwards compatibility was going to be named “1.8.0″ and people started to submit ideas for this important release. Eventually too much time passed, the versioning scheme changed, v1.8.0 was released, and the changes proposed for v1.8. slipped into what is now v2.0.

Since no substantial changes in behavior happened since v1.0, it would follow that v2.0 was an important release, and a good opportunity to gather all the ideas about what needs to change in Git. However, seemingly out of nowhere, without any discussion or even a warning, the maintainer tagged v2.0.0-rc0, and therefore all the features that were not already merged couldn’t be merged for v2.0.0.

Thus v2.0.0 was destined to have a small list of changes, and that’s how it remained.

What could have changed

The following is a list of things that I argued should be part of Git v2.0.0.

git update

I wrote a whole post about the issue, but basically, ‘git pull‘ is broken for the most common use-case: update the current branch.

This is a known issue that has been discussed over and over, and everyone agrees that it is indeed an issue, and something needs to be done to fix it.

There have been different proposals, but by far the most comprehensive and simple is to add a new ‘git update‘ command.

This way when you want to merge a pull request, you do ‘git pull‘, and when you just want to update the current branch, you do ‘git update‘, which by default would barf if there’s divergence between your local branch (e.g. ‘master’), and the remote one (e.g. ‘origin/master’), instead of doing a merge by default. This should decrease substantially the amount of “evil merges”, merges that happened by mistake, usually by somebody that is not familiar with Git.

The patches are relatively new, but the command is simple, so there isn’t much danger of screwing things up.

The publish tracking branch

I also wrote a blog post about this; basically Git’s support for triangular workflows is not the best.

A triangular workflow is when you pull from one location (e.g. central repo), and push to another (e.g. personal GitHub fork). If you are using upstream tracking branches (you should), you have to make a decision where you set your upstream; the central repo, or your personal one. Depending on which you use, is the advantages you get, but you cannot have it all.

But with the publish tracking branch you can have all the advantages.

I’ve been cooking these patches for a long long time and I have to say this is one essential feature for me, and they patches work perfectly.

Support for Mercurial and Bazaar

Support for Mercurial and Bazaar repositories has been cooking for a long time in the “contrib” area (you can both pull and push). At this point in time the code is production-ready, and it was already graduated and merged to be released in Git v2.1.

However, the maintainer suddenly changed his mind and decided it would be better to distribute them as third party tools. He didn’t give any valid reason and clearly didn’t think it through, but they are now separate.

The code is already widely used (git-remote-hg, git-remote-bzr), and could easily be merged.

Use “stage” instead of “index”

Everybody agrees that “index” is a horrible name for Git’s “staging area”, however, nobody has done much to fix the problem.

One first step is to replace all the –cached and –index options with –staged and –no-work, which are much simpler to understand.

Another step is to add a ‘git stage‘ command that acts as a helper to work with the staging area: ‘git stage add‘, ‘git stage diff‘, ‘git stage reset‘, ‘git stage rm‘, ‘git stage edit‘, and so on.

The patches are very straight-forward.

Default aliases

Virtually every version control system has default aliases (e.g. hg co, cvs ci, svn di, etc.), except Git.

Adding default aliases is very simple to do and only brings advantages. If you don’t like the default alias, you can override it.

Patches here.

Shoulda coulda woulda

It would have been great if you could just do ‘git clone hg::mercurial-repo‘ without installing anything extra, if everybody could start using ‘git update‘ instead of ‘git pull‘, if you could do ‘git stage diff‘, or ‘git reset --stage‘. Also, if triangular workflows were properly supported.

Unfortunately that’s not the case, and Git v2.0.0 is already released, and there isn’t much to be excited about.

You might think “perhaps for Git v3.0” (which could happen in two years, or ten, how knows), but if the past is any indication of the future, it won’t happen, specially since I’ve given up on all these patches.

The fact of the matter is that in every release of Git, there is only one focus: performance. Despite the fact that it’s #6 in the list of concerns of users, Git developers work on this because that’s their area of expertise, because it’s fun for them, and because they get paid to do so. There are occasional new features, and a bit of portability now and then, but for the most part Windows support is neglected in Git, which is why the msysgit project was born.

The documentation will always remain cryptic, because for the developers, it’s not cryptic, it’s very clear. And the user-interface will never change, because the developers don’t like change.

If you don’t believe me look at the backwards-incompatible changes in Git v2.0.0, or in fact, try to think back to the last time Git changed anything. Personally other than the git-foo -> ‘git foo’ change in v1.6.0 (which was horribly handled), I can’t think of anything but minor changes.

Anyway, you can use all these features I listed today (and more) if you use git-fc instead of Git. It is my own fork of Git that has all the features of Git, plus more.

Is there anything in that list that I missed? Do you think Git v2.0.0 has enough changes as it is?

Git v2.0.0, what changed, and why should you care

Git v2.0.0 is a backward-incompatible release, which means you should expect differences since the v1.x series.

Unless you’ve been following closely the Git mailing list, you probably don’t know the history behind the v2.0 release, which started long time ago (more than three years). It all started with a mail from Junio C Hamano, asking for developers to submit ideas for changes that normally would not happen because they break backwards compatibility, he invited us to think as if “we were writing Git from scratch”. This big release that would break backwards compatibility was going to be named “1.8.0” and people started to submit ideas for this important release. Eventually too much time passed, the versioning scheme changed, v1.8.0 was released, and the changes proposed for v1.8. slipped into what is now v2.0.

Parts of v2.0 have been already been deployed one way or the other (for example if you have configured ‘push.default = simple’), but finally today we have v2.0 final. And here are the big changes that we got.

‘git push’ default has changed

Here’s what the release notes say:

When "git push [$there]" does not say what to push, we have used the
traditional "matching" semantics so far (all your branches were sent
to the remote as long as there already are branches of the same name
over there).  In Git 2.0, the default is now the "simple" semantics,
which pushes:

 - only the current branch to the branch with the same name, and only
   when the current branch is set to integrate with that remote
   branch, if you are pushing to the same remote as you fetch from; or

 - only the current branch to the branch with the same name, if you
   are pushing to a remote that is not where you usually fetch from.

You can use the configuration variable "push.default" to change
this.  If you are an old-timer who wants to keep using the
"matching" semantics, you can set the variable to "matching", for
example.  Read the documentation for other possibilities.

Is that clear? Given the bad track record of Git documentation it wouldn’t surprise me if you didn’t get what this chunk of text is trying to say at all. Personally I find it much easier to read the code to figure out what is happening.

So let me try to explain. When you type ‘git push’ (without any arguments), Git uses the configuration ‘push.default’ in order to find out what to push. Before ‘push.default’ defaulted to ‘matching’, and now it defaults to ‘simple’.

The ‘matching’ configuration essentially converts ‘git push‘ into ‘git push origin :‘, which means push all the matching branches, so if you have a local ‘master’, and there’s a remote ‘master’, ‘master’ is pushed; if you have a local and remote ‘fix-1’, ‘fix-1’ is pushed, if you have a local ‘ext-feature-1’, but there’s no matching remote branch, it’s not pushed, and so on.

The ‘simple’ configuration pushes a single branch instead, and it uses your configured upstream branch (see this post for a full explanation of the upstream branch), so if your current branch is ‘master’, and if ‘origin/master’ is the upstream of your ‘master’ branch, ‘git push’ will basically be the same as ‘git push origin master‘, or to be more specific ‘git push origin master:master‘ (the upstream branch can have a different name).

Note: If you are not familiar with the src:dst syntax; you can push a local branch ‘src’ and have the ‘dst’ name on the server, so you don’t need to rename a local branch, you can do ‘git push origin foobar:feature-a’, and your local branch “foobar” will be named “feature-a” on the server. This has nothing to do with v2.0.

However, if the current branch is ‘fix-1’ and the upstream is ‘origin/master’, ‘git push’ will complain that the name of the destination branch is not the same, because it doesn’t know if to do ‘git push origin fix-1:master‘ or ‘git push origin fix-1:fix-1‘.

Additionally if you do ‘git push github‘ (not the remote of your upstream branch), Git will simply use the name of the current branch, essentially ‘git push github fix-1‘ (‘fix-1’ being the name of the current branch).

This mode is anything but simple to describe. But perhaps the name is OK, because you can expect it to “simply work”.

Would I care?

If you don’t type ‘git push’, but instead specify what and where to push… you don’t care.

If you have configured ‘push.default’ already, which most likely you already did, because otherwise you will be getting the following annoying message all the time since two years ago… you don’t care.

warning: push.default is unset; its implicit value is changing in
Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the current behavior after the default changes, use:

  git config --global push.default matching

To squelch this message and adopt the new behavior now, use:

  git config --global push.default simple

When push.default is set to 'matching', git will push local branches
to the remote branches that already exist with the same name.

In Git 2.0, Git will default to the more conservative 'simple'
behavior, which only pushes the current branch to the corresponding
remote branch that 'git pull' uses to update the current branch.

See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)

So, most likely you don’t care.

‘git add’ in directory

Here’s what the release notes say:

When "git add -u" and "git add -A" are run inside a subdirectory
without specifying which paths to add on the command line, they
operate on the entire tree for consistency with "git commit -a" and
other commands (these commands used to operate only on the current
subdirectory).  Say "git add -u ." or "git add -A ." if you want to
limit the operation to the current directory.

Although this is a clearer explanation, it’s not very clear what is changing, so let me give you can example.

Say you have modified two files, ‘README’ and ‘test/basic.t’, then you go to the ‘test’ directory, and run ‘git add -u‘, in pre-v2.0 only ‘test/basic.t’ will be staged, in post-v2.0 both files will be staged. If you run the command in the top level directory, nothing changes.

Would I care?

If you haven’t seen the following warning while doing ‘git add -u‘ or ‘git add -A‘, or if you don’t even use those options, you are fine.

warning: The behavior of 'git add --update (or -u)' with no path argument from a
subdirectory of the tree will change in Git 2.0 and should not be used anymore.
To add content for the whole tree, run:

  git add --update :/
  (or git add -u :/)

To restrict the command to the current directory, run:

  git add --update .
  (or git add -u .)

With the current Git version, the command is restricted to the current directory.

‘git add’ adds removals

Here’s what the release notes say:

"git add " is the same as "git add -A " now, so that
"git add dir/" will notice paths you removed from the directory and
record the removal.  In older versions of Git, "git add " used
to ignore removals.  You can say "git add --ignore-removal " to
add only added or modified paths in , if you really want to.

Again, it should be clearer with an example. Say you removed the file ‘test/basic.t’ and added a new file ‘test/main.t’, those changes are not staged, so you stage them with ‘git add test/’, pre-v2.0 ‘test/basic.t’ would remain tracked, post-v2.0, ‘test/basic.t’ is removed from the stage.

Would I care?

If you haven’t seen the following warning while doing ‘git add‘, you are fine.

warning: You ran 'git add' with neither '-A (--all)' or '--ignore-removal',
whose behaviour will change in Git 2.0 with respect to paths you removed.
Paths like 'test/basic.t' that are
removed from your working tree are ignored with this version of Git.

* 'git add --ignore-removal ', which is the current default,
  ignores paths you removed from your working tree.

* 'git add --all ' will let you also record the removals.

Run 'git status' to check the paths you removed from your working tree.

The rest

The "-q" option to "git diff-files", which does *NOT* mean "quiet",
has been removed (it told Git to ignore deletion, which you can do
with "git diff-files --diff-filter=d").

Most people don’t use this command, thus don’t care.

"git request-pull" lost a few "heuristics" that often led to mistakes.

Again, most people don’t use this command, which is mostly broken anyway.

The default prefix for "git svn" has changed in Git 2.0.  For a long
time, "git svn" created its remote-tracking branches directly under
refs/remotes, but it now places them under refs/remotes/origin/ unless
it is told otherwise with its "--prefix" option.

If you don’t use ‘git svn’, you don’t care. If you don’t see a difference between ‘trunk’ and ‘origin/trunk’, you don’t care.

tl;dr

You probably don’t care about these backward-incompatible changes. Sure, Git v2.0.0 received a good dosage of new features and bug-fixes, but so did v1.9.0, and all the versions before.

Given the fact that Git v2.0.0 has been cooking for three years, I think it’s a big missed opportunity that nothing really changed, specially given that in previous user surveys people have said the user-interface and documentation needs to improve, and there have been patches to try to do so. In a separate post I discuss what I think Git v2.0.0 should have included.

Is ‘git pull’ broken? If so, what’s the fix?

Is ‘git pull’ really broken? I know what you are thinking; such a pervasive and basic command cannot possibly be broken. Unfortunately, it is.

It is not some marginal issue, many experienced Git users avoid ‘git pull’ and even urge newcomers to avoid using that command, there’s many sites that encourage you to not use the command, and there have been a lot of threads on the mailing list about the issue (Pull is mostly evil, A failing attempt to use Git in a centralized environment), the maintainer, Junio C Hamano has accepted there’s a big problem, even Linus Torvalds agreed something needs to change.

In order to identify the problem we first need to define the two main ways ‘git pull’ is used.

Pull requests

One way ‘git pull’ is used, is to integrate pull requests into the mainline. For example in the Linux kernel, the DRM maintainer sends a pull request to Linus Torvalds, saying basically:

The following changes are available in the git repository at:

git://people.freedesktop.org/~airlied/linux drm-next

So Linus can just do:

git pull git://people.freedesktop.org/~airlied/linux drm-next

In this mode ‘git pull’ actually works fine, which is not too surprising, since it’s the main thing Linus Torvalds does.

However, this is not the way most people use ‘git pull’.

Update branch

What most people do is for example update their local ‘master’ branch, to the remote ‘origin/master’ branch. Essentially doing ‘git fetch origin’, ‘git merge origin/master’.

However, that’s not exactly what most people actually want to do.

If you don’t have any changes of your own in ‘master’, then yes, ‘git pull’ does what you want, but if you do have changes, and thus the branches have diverged, then ‘git pull’ will create a new merge commit. This might or might not be what you want, but the majority of Git newbies do not want that, or rather, the team they contribute to don’t want those “evil merges”. Unfortunately these newbies don’t know what they are doing, and Git is not making it easier.

So you end up with something like this:

git-pull

Most likely what the team wants is that the local chances are rebased on top of the remote ones, but if they want a merge, they want it the other way around, that is: merge the local changes to the remote ones, as if a topic branch was merged.

git-pull-fix

A merge with this order of parents has many advantages, including a clearer history, however, it’s not possible to do that with ‘git pull’, so you have to do ‘git fetch’, create a new branch, switch to the master branch, merge the other branch, and finally remove the other branch. It’s not straight-forward at all.

It is this mode that is broken, and that’s the reason many people try to avoid ‘git pull’; it rarely does what you want by default.

The solution

There have been many solutions proposed, however, there are many many use-cases to consider, and a solution that takes them all into consideration for the future is not easy to find.

The best solution that seems to accommodate all present use-cases and future ones is the introduction of a new command: ‘git update‘.

By default this command will complain if the branches have diverged, so you have to either do ‘git update --rebase‘ or ‘git update --merge‘, this ensures that newbies aren’t going to do “evil merges” by mistake.

Also, when you do a ‘git update --merge‘ the order of the parents is reversed, which means it appears you are merging ‘master’ to ‘origin/master’, and not the other way around as it happens with ‘git pull’, which means it appears as if you are merging a topic branch, which is what most people want.

git-update

There are many many more advantages to this new command, but probably too subtle to mention in this post.

When will this be ready?

Probably never. I sent a summary of the issues and the solution to the mailing list, which addresses all the use-cases that were discussed. I have the required patches with tests and documentation on my personal branch, and I’ve been using this new command for a while now.

Why isn’t this picked? Maybe it’s because none of the core developers experience these issues. Maybe because they don’t use ‘git pull’ in the second form. Who knows.

The fact is that there is no interest to get this fixed, even though the issue has been acknowledged, so it’s not likely to be fixed any time soon.

So what can you do about it? The best thing you can do right now is simply avoid using ‘git pull’. Additionally, you might want to instruct your fellow coworkers to avoid unsing it as well, specially the ones that are not very familiar with Git.

Also, you might want to use my fork, git-fc, which does have the ‘git update‘ command, which works better than ‘git pull‘ even when there’s no branch divergence, and when there is, ‘git update --merge‘ is also superior, because the order of the parents is right.

Using Git with triangular workflows; tips, tricks, and more

Chances are you are using a triangular workflow, even if you don’t know it. A triangular workflow simply means that you pull from one repository, and push to another. This is what the vast majority of Git users do, unfortunately most of the good stuff is buried in the nearly incomprehensible official manpages.

In this blog post I’ll try to shine some light into triangular workflows, how to make use of the upstream tracking branch for them, and explain the new publish tracking branch.

The basics

Say you clone a repository:

% git clone https://github.com/tiimgreen/github-cheat-sheet
% cd github-cheat-sheet

Then you do some changes and want to share them back.

What most people would do is create a fork in GitHub and push their changes there.

% git remote add mine https://github.com/felipec/github-cheat-sheet
% git push mine

After doing that they do a pull request so their changes can be merged to the original repository.

This workflow is not specific to GitHub by any means, for example the Linux kernel developers have the main repository in git.kernel.org, and they send pull requests by mail using repositories all over the map (example).

The help

If you do this over and over it becomes clear that a little help from Git would be nice.

The first thing you can do is setup the configuration ‘remote.pushdefault’ to the repository you usually push to (in the above case ‘mine’). So now you can type `git push` instead of `git push mine` every time.

The next thing would be to setup an upstream tracking branch (read my blog post about it if you are not familiar with it).

% git branch --set-upstream-to mine/fix-typos

Then Git would greet you with the following help:

Your branch is ahead of 'mine/fix-typos' by 1 commit.

This is telling you that you probably want to push your branch again, since it’s not up-to-date in the remote. It shows you that each time you switch to that branch, or when you do `git status`.

Moreover, `git branch -vv` would show you this help:

* fix-typos ... [mine/fix-typos: ahead 1] Fix a bunch of typos

So it seems Git already has tons of help for this workflow, doesn’t it? Not so fast.

The real upstream

The upstream tracking branch is useful for other purposes, but for that we need to set a different upstream:

% git branch --set-upstream-to origin/master

Now that the upstream is ‘master’ in the ‘origin’ remote, and when you run `git status`, you get:

Your branch and 'origin/master' have diverged,
and have 2 and 10 different commits each, respectively.

What that message is telling you is that ‘origin/master’ has moved, so there are 10 commits in ‘origin/master’ that your branch doesn’t have (and your branch has 2 commits ‘origin/master’ doesn’t have). In those cases you probably would want to rebase on top of ‘origin/master’ so that it’s easier for upstream maintainers to merge your branch, although you can merge ‘origin/master’ too, or simply do nothing and hope there are no conflicts. Either way the information is useful so you can decide what to do.

In addition, if you want to rebase, the command is easier; instead of `git rebase origin/master` you can just type `git rebase`, since `git rebase` by default uses the upstream tracking branch.

Moreover, if you always stay up-to-date, you can do `git pull --rebase`, which will fetch all remote the branches, and then rebase your current branch (e.g. ‘fix-typos’) on top of the upstream (e.g. ‘origin/master’). You can also configure ‘pull.rebase = true’ to always do this when you type `git pull`.

Not to mention that `git branch -vv` gives a much more useful information:

* fix-typos ... [master: ahead 2, behind 10] Fix a bunch of typos

Check how it looks in my real repository:

git branch --vv with upstream

You get other additional benefits, like for example you get warned if you try to delete a branch that hasn’t been merged to its upstream:

warning: not deleting branch 'fix-typos' that is not yet merged to
'origin/master', even though it is merged to HEAD.
error: The branch 'fix-typos' is not fully merged.
If you are sure you want to delete it, run 'git branch -D fix-typos'.

This is actually what the upstream tracking branch is meant for: to track the upstream, that is; the target branch where eventually all the commits of the source branch eventually should end up. All the commits of ‘fix-typos’ should end up in ‘origin/master’, therefore ‘origin/master’ is the upstream of ‘fix-typos’.

We want to have all the goodies of tracking ‘origin/master’ as our upstream, but we also want to track ‘mine/fix-typos’ so we know when we need to push. Unfortunately we can’t set them both as upstream, so we must choose one set of benefits over the other. Or should we?

The solution

The solution is not that hard to figure out: we need another upstream! Or rather; we need some concept that is similar to the upstream tracking branch, but instead of tracking the final destination, we track the location we push our commits to.

This is the publish tracking tracking branch.

When you set it up, you get all the information:

Your branch and 'origin/master' have diverged,
and have 2 and 10 different commits each, respectively.
Some commits haven't been published to 'mine/fix-typos'.

* fix-typos ... [origin/master, mine/fix-typos *: ahead 2, behind 10]

Notice the extra ‘*’ next to the publish branch, which hints that it needs to be published.

Also, you can type `git pull` and `git rebase`, which will use the upstream branch as you would expect, and `git push` which will use the publish branch.

In other words; everything just works perfectly.

You set up the publish branch just like you set up the upstream branch:

% git branch --set-publish-to mine/fix-typo

Or:

% git push --set-publish mine

But wait, there’s more: you are not tied to push to a single remote; you can set different branches in different remotes as publish tracking. For example ‘fix-typos’ to ‘github/fix-typos’, ‘bug-fix’ to ‘client/bug-fix’, and so on. You can even choose a different branch name in the remote: ‘client-b-bug-fix’ to ‘client-b/bug-fix’.

Nice, isn’t it?
git branch -vv publish

The problem

There is only one problem with the publish branch: it’s not in upstream git 😦

It is part of my fork, git-fc. If you use my fork, you will get this and other features, and you won’t loose any feature from official Git. Or you can use the specific branch, ‘fc/publish‘.

I’ve been using this code for more than half a year, and it has been reviewed in the Git mailing list, so you can trust it won’t eat your babies 🙂

Why isn’t it in official Git?

WARNING: if you don’t like conflicts or you know me for “adversarial” style (and don’t like it), skip this section

That’s a very good question. If the maintainer (Junio C Hamano) has accepted the triangular workflows are lacking, and a separate ‘upstream’ tracking branch is needed. Why isn’t it there?

The short answer is that they have an ad hominem thing against me, so even if my patches are correct and they solve a long-standing problem, they are not applied. They are only picked if they are trivial, or not controversial, or obvious fixes. Which is why I started a fork.

I sent the original version of the patches in September 2013, with virtually no comments. Then on January 2014 people start discussing (once again) about the issues with triangular workflows, and even complain about the lack of @{publish}. Eventually they start writing preparatory patches. But I had already written the whole thing several months ago!

It can’t be attributed to the fact they went inadvertently unnoticed because I re-sent the series once, and because I wrote about the support for @{publish} when I announced the git-fc fork.

Then I returned to the project after a long hiatus, and noticed they were working on something I already did, so let them know and send the patches again. This time they receive more feedback, and even make it into Junio’s “pu” (proposed updates) branch. Patches are often dropped from “pu”, sometimes for no reason at all, so this is not a reason they will get in.

This is the message Junio attached to the patch series:

 Add branch@{publish}; it seems that this is somewhat different from
 Ram and Peff started working on.  There were many discussion
 messages going back and forth but it does not appear that the
 design issues have been worked out among participants yet.

The “design issues” have not been worked out because “Ram” is not actively working on Git anymore (possibly thanks to the fact that nothing ever changes), and “Peff” said he wasn’t interested in the @{publish} concept, but more like a @{push} concept which will only benefit him and his weird bare-bones mode of interacting with Git. The fact that the @{publish} concept is what would benefit a vast majority of the user base is of no consequence to “Peff”.

So will it ever get into Git’s mainline? Who knows.

Get the goodies

If you want to use the publish tracking branch feature, get git-fc and follow the installation instructions. In addition you would get a ton of other features, and will loose none 🙂

If you use ArchLinux, you can get the package from AUR.

Enjoy 🙂

Announcing git-fc; a friendly fork of Git

I’ll start with the obvious question; why a fork? Well, the short answer is; my patches are not being applied, the long answer is convoluted and would require long explanation of how Git development works, principles and guidelines, but more importantly the culture of the core developers, and I’m not going to get into that, maybe in the comments section if somebody is interested.

So what is git-fc? It is a friendly fork, and by that I mean that it’s a fork that won’t deviate from the mainline, it is more like a branch in Git terms. This branch will move forward close to Git’s mainline, and it could be merged at any point in time, if the maintainer wished to do so.

git-fc doesn’t include experimental code, or half-assed features, so you can expect the same level of stability as Git’s mainline. Also, it doesn’t remove any feature, or do any backwards incompatible changes, so you can replace git with git-fc and you wouldn’t notice the difference. The delta comes in the extra features that I’ll describe in detail below, that is all.

Who am I? I’ve contributed many patches to Git, mainly the git-remote-hg/bzr two-way bridges, but many many other things. Here’s a list of the top 10 contributors to Git since last year by number of patches:

% git shortlog --since='1 year ago' --no-merges -n -s | head -n 10
   388	Junio C Hamano
   308	Felipe Contreras
   230	Jeff King
   161	Nguyễn Thái Ngọc Duy
   122	Michael Haggerty
   103	Ramkumar Ramachandra
    96	John Keeping
    69	Eric Sunshine
    59	Thomas Rast
    51	René Scharfe

More info in ohloh.

As you see, I’ve done a lot of work for Git’s mainline, so chances are you have already benefited from my code one way or the other.

However, the most interesting patches are not merged. I wrote a summary of my 160 patches, explaining their status, so Git developers would prioritize them, but I think it’s fair to say they are just not going to apply them.

So, what do you get if you use git-fc?

@ shortcut

Many people have suggested a shortcut for the non-particularly-intuitive “HEAD”, but none of these suggestions seemed very appealing, or feasible.

Because Git already has an ref@op revision syntax, where if you remove the ref, HEAD is implied, I thought @ could be thought as HEAD.

This change was welcome and accepted by the Git mainline, and it even was on track for v1.8.4 but it was dropped last minute because of some issues that are fixed now, and you probably will see it in v1.8.5. But why wait? 🙂

Nice ‘branch -v’

If you have configured the upstream tracking branch for your branches (I wrote a blog post about them), when you do ‘git branch -v’ you see something like this:

  fc/branch/fast      177dcad [ahead 2] branch: reorganize verbose options
  fc/stage            abb6ad5 [ahead 14] completion: update 'git reset' ...
  fc/transport/improv eb4d3c7 [ahead 10] transport-helper: don't update ...

While that provides useful information, it doesn’t show the upstream tracking branch, just says “ahead 2” but “ahead 2” compared to what?

If you do ‘git branch -vv’, then you see the answer:

  fc/branch/fast      177dcad [master: ahead 2] branch: reorganize ...
  fc/stage            abb6ad5 [master: ahead 14] completion: update ...
  fc/transport/improv eb4d3c7 [master: ahead 10] transport-helper: don't ...

Unfortunately both options take a lot of time (relative to most Git commands which are instantaneous), because computing the “ahead 2” takes a lot of time. So I decided to switch things around, so ‘git branch -v’ gives you:

  fc/branch/fast      177dcad [master] branch: reorganize verbose options
  fc/stage            abb6ad5 [master] completion: update 'git reset' new ...
  fc/transport/improv eb4d3c7 [master] transport-helper: don't update refs ...

And it does so instantaneously.

Default aliases

Many (if not all) version control system tools have shortcuts for their most common operations; hg ci, svn co, cvs st. But not Git. You can configure your own aliases manually, but you might have some trouble if you use somebody else’s machine.

Adding default aliases is trivial, it helps everyone, and it doesn’t hurt anyone, yet the patch to do so was rejected.

For now, there are only four aliases, but more can be added later if they are requested.

co = checkout
ci = commit
rb = rebase
st = status

If you have already these aliases, or mapped to something else, your aliases would take precedence over the default ones, so you won’t have any problems.

Streamlined remote helpers

I have spent a lot of time working on git-remote-hg and git-remote-bzr, and although they are relatively new, they have proven to be quite stable and solid, yet they are only part of the “contrib” area side by side with much simpler and way less solid scripts.

In order these in Git mainline you might need a bit of tinkering, and it’s not straight-forward to package them for distributions.

With git-fc they are installed by default, and in the right way, making things easier for distributions.

Improvements to the transport helper

The two way bridges between Git and Mercurial/Bazaar already work quite well, but they lack some features, specifically you cannot do –force, or –dry-run, or use an old:new refspec. If you are not familiar with the old:new refspec; you can do ‘git push master:my-master’, which would push your ‘master’ branch, as if it was named ‘my-master’ in the remote repository.

This is extremely useful if you are really serious about using Git as a transparent client to access a Mercurial repository.

New core.mode configuration

Git is already preparing users for the v2.0 release which would bring minor backward compatibility breakage, but some people would rather get rid of the warnings which are going to stay probably for many releases more and just move to the new behavior already.

Testing Git v2.0 behavior today would not only help git-fc, but also the Git mainline, and you can do that by setting core.mode = next, so if you do this and provide feedback about any issues, that would be greatly appreciated. Unfortunately you cannot test the v2.0 behavior in Git mainline because they rejected the patches, but you can in git-fc.

Please note that the v2.0 behavior might change in the future, before v2.0 is released, so if you enable this mode you need to be aware of that. Chances are you are not going to notice any difference anyway.

In addition to the “next” (v2.0) mode, there’s the “progress” mode. This mode enables “next” plus other configurations that have been proposed to change by default in v2.0, but hasn’t yet been agreed.

In particular, you get these:

merge.defaulttoupstream = true
branch.autosetupmerge = always
mergetool.prompt = false

There might be more in the future, and suggestions are welcome.

It is recommended that you setup this mode for git-fc:

git config --global core.mode progress

Non-ff pulls rejected by default

Even in the Git project everybody has agreed this is the way to go in order to avoid the typical Git newbie making the mistake of doing a merge, when perhaps (s)he wanted to do git reset, or git rebase. With this change git complains that that a non-fast-forward branch is being pulled, so the user has to decide what to do.

The user would have to do either ‘git pull --merge‘ or ‘git pull --rebase‘, the former being what Git mainline currently does.

The user can of course choose the old behavior, which is easy to configure:

git config --global pull.mode merge

Official staging area

Everybody already uses the term “staging area” already, and Git developers also agreed it the best term to what is officially referred to as “the index”. So git-fc has new options for all commands that modify the staging area (e.g. git grep –staged, git rm –staged), and also adds a new git stage command that makes it easier to work with the staging area.

'git stage' [options] [--] [...]
'git stage add' [options] [--] [...]
'git stage reset' [-q|--patch] [--] [...]
'git stage diff' [options] [] [--] [...]
'git stage rm' [options] [--] [...]
'git stage apply' [options] [--] [...]
'git stage edit'

Without any command, git stage adds files to the stage, same as git add, same as in Git mainline.

New fetch.default configuration

When you have configured the upstream tracking branch for all your branches, you will probably have tracking branches that point to a local branch, for example feature-a pointing to master, in which case you would get something like:

% git fetch
From .
 * branch            master     -> FETCH_HEAD

Which makes absolutely no sense, since the ‘.’ repository is not even documented, and FETCH_HEAD is a marginally known concept. In this case git fetch is basically doing nothing from the user’s point of view.

So the user can configure fetch.default = simple to get a simple sensible default; ‘git fetch‘ will always use origin by default, which is not ideal for everyone, but it’s better than the current alternative.

If you use the “progress” mode, this option is also enabled.

Publish tracking branch

Git mainline doesn’t have the greatest support for triangular workflows, a good solution for that is to introduce a second “upstream” tracking branch which is for the reverse; the branch you normally push to.

Say you clone a repository (libgit2) in GitHub, then create a branch (feature-a) and push it to your personal repository, you would want to track two branches (origin/master), and (mine/feature-a), but Git mainline only provides support for a single upstream tracking branch.

If you setup your upstream tracking branch to origin/master, then you can just do git rebase without arguments and git will pick the right branch (origin/master) to rebase to. However, git push by default will also try to push to origin/master, which is not what you want. Plus git branch -v will show how ahead/behind your branch is compared to origin/master, not mine/feature-a.

If you set up your upstream to mine/feature-a, then git push will work, but git rebase won’t.

With this option, git rebase uses the upstream branch, and git push uses the publish branch.

Setting the publish tracking branch is easy:

git push --set-publish mine feature-a

Or:

git branch --set-publish mine/feature-a

And git branch -v will show it as well:

fc/branch/fast      177dcad [master, gh/fc/branch/fast] branch: ...
fc/stage            abb6ad5 [master, gh/fc/stage] completion: ...
fc/transport/improv eb4d3c7 [master, gh/fc/transport/improv] ...

Support for Ruby

By far the most complex and interesting feature, but unfortunately also the one that is not yet 100% complete.

There is partial optional support for Ruby. Git already has tooling so any language can use it’s plumbing and achieve plenty of tasks:

IO.popen(%w[git for-each-ref]) do |io|
io.each do |line|
sha1, kind, name = line.split()
# stuff
end
end

However, this a) requires a process fork, and b) requires I/O communication to get the desired data. While this is not a big deal on many systems, it is in Windows systems where forks are slow, and many Git core programs don’t work as well as they do in Linux.

Git has a goal to replace all the core scripts with native C versions, but it’s a goal only in name that is not actually pursued. In addition, that still leaves out any third party tools since Git doesn’t provide a shared libgit library, which is why an independent libgit2 was needed in the first place.

Ruby bindings solve these problems:

for_each_ref() do |name, sha1, flags|
# stuff
end

The command ‘git ruby‘ can use this script by providing the bindings for many Git’s internal C functions (though not all), which makes it easier to write Ruby programs that take full advantage of Git without any need of forks, or I/O communication.

Conclusion

As you might guess, I’ve spent a lot of time working on all these features, plus all the ones that are already merged in Git’s mainline. Hopefully they are useful to some people.

It’s easy to compile and install:

make install

By default git will be installed in your home directory, but you can also do what I do: ‘make prefix=/opt/git install‘, and add ‘/opt/git/bin’ to your $PATH. All you need is a few development packages; zlib, curl, expat, openssl.

The code is in Github, the home page is in Google code, and the mailing list in Google groups. All comments and patches are welcome.

You can find future comments and releases in this blog, under the git-fc tag.

git-fc