What’s missing in Git v2.0.0

I recently blogged about the Git v2.0.0 release: what changed, and why you should care. Unfortunately, the conclusion was that nothing much changed (other than the usual new features and bug fixes). In this post I will discuss what should have changed, and why.

What is needed

Fortunately, Git has had the Git User’s Survey in the past, so we know what users want.

  1. user-interface: 3.25
  2. documentation: 3.22
  3. tools (e.g. GUI): 3.01
  4. more features: 2.41
  5. portability: 2.34
  6. performance: 2.28
  7. community (mailing list): 1.70
  8. localization (translation): 1.65
  9. community (IRC): 1.65

Obviously, since user-interface and documentation are the areas that need the most improvement, that’s what Git v2.0.0 should have focused on, right?

History

I already mentioned this in the other post, but I’ll do it again.

First of all, Git has a long history of never breaking user expectations (other than the Git v1.6.0 fiasco, which replaced all the git-foo commands with ‘git foo’), and as such a lot of thought is devoted to ways of minimizing changes in behavior, or even avoiding them completely. Perhaps too much care is devoted to this.

The preparation for Git v2.0.0 started more than three years ago with a mail from Junio C Hamano, asking developers to submit ideas for changes that normally would not happen because they break backwards compatibility; he invited us to think as if “we were writing Git from scratch”. This big release that would break backwards compatibility was going to be named “1.8.0”, and people started to submit ideas for this important release. Eventually too much time passed, the versioning scheme changed, v1.8.0 was released, and the changes proposed for v1.8 slipped into what is now v2.0.

Since no substantial changes in behavior happened since v1.0, it would follow that v2.0 was an important release, and a good opportunity to gather all the ideas about what needs to change in Git. However, seemingly out of nowhere, without any discussion or even a warning, the maintainer tagged v2.0.0-rc0, and therefore all the features that were not already merged couldn’t be merged for v2.0.0.

Thus v2.0.0 was destined to have a small list of changes, and that’s how it remained.

What could have changed

The following is a list of things that I argued should be part of Git v2.0.0.

git update

I wrote a whole post about the issue, but basically, ‘git pull’ is broken for the most common use-case: updating the current branch.

This is a known issue that has been discussed over and over, and everyone agrees that it is indeed an issue, and something needs to be done to fix it.

There have been different proposals, but by far the most comprehensive and simple is to add a new ‘git update‘ command.

This way when you want to merge a pull request, you do ‘git pull’, and when you just want to update the current branch, you do ‘git update’, which by default would barf if there’s divergence between your local branch (e.g. ‘master’) and the remote one (e.g. ‘origin/master’), instead of doing a merge by default. This should substantially decrease the number of “evil merges”: merges that happened by mistake, usually made by somebody who is not familiar with Git.

The patches are relatively new, but the command is simple, so there isn’t much danger of screwing things up.

The publish tracking branch

I also wrote a blog post about this; basically Git’s support for triangular workflows is not the best.

A triangular workflow is when you pull from one location (e.g. central repo), and push to another (e.g. personal GitHub fork). If you are using upstream tracking branches (you should), you have to decide where to set your upstream: the central repo, or your personal one. Each choice gives you a different set of advantages, but you cannot have them all.

But with the publish tracking branch you can have all the advantages.

I’ve been cooking these patches for a long, long time and I have to say this is one essential feature for me, and the patches work perfectly.

Support for Mercurial and Bazaar

Support for Mercurial and Bazaar repositories has been cooking for a long time in the “contrib” area (you can both pull and push). At this point in time the code is production-ready, and it was already graduated and merged to be released in Git v2.1.

However, the maintainer suddenly changed his mind and decided it would be better to distribute them as third party tools. He didn’t give any valid reason and clearly didn’t think it through, but they are now separate.

The code is already widely used (git-remote-hg, git-remote-bzr), and could easily be merged.

Use “stage” instead of “index”

Everybody agrees that “index” is a horrible name for Git’s “staging area”, however, nobody has done much to fix the problem.

One first step is to replace all the --cached and --index options with --staged and --no-work, which are much simpler to understand.

Another step is to add a ‘git stage’ command that acts as a helper to work with the staging area: ‘git stage add’, ‘git stage diff’, ‘git stage reset’, ‘git stage rm’, ‘git stage edit’, and so on.

The patches are very straight-forward.

Default aliases

Virtually every version control system has default aliases (e.g. hg co, cvs ci, svn di, etc.), except Git.

Adding default aliases is very simple to do and only brings advantages. If you don’t like the default alias, you can override it.

Patches here.
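Until something like that ships by default, stock Git lets you define the aliases yourself. The following is a minimal sketch, assuming the short names I mentioned for other tools (co, ci, di) plus ‘st’; these are my own picks for illustration, not necessarily the list in the patches. It uses a throwaway repository and repository-local configuration so nothing in your real setup is touched; with --global the aliases would apply everywhere.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository (hypothetical path under a temp dir)
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git commit -q --allow-empty -m "initial"

# Aliases chosen for illustration only
git config alias.co checkout
git config alias.ci commit
git config alias.di diff
git config alias.st status

# 'git co' now behaves like 'git checkout'
git co -q -b topic
current=$(git symbolic-ref --short HEAD)
echo "current branch: $current"
```

If you don’t like one of them, setting the same alias.* key again simply replaces it.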

Shoulda coulda woulda

It would have been great if you could just do ‘git clone hg::mercurial-repo’ without installing anything extra, if everybody could start using ‘git update’ instead of ‘git pull’, if you could do ‘git stage diff’, or ‘git reset --stage’. Also, if triangular workflows were properly supported.

Unfortunately that’s not the case, and Git v2.0.0 is already released, and there isn’t much to be excited about.

You might think “perhaps for Git v3.0” (which could happen in two years, or ten, who knows), but if the past is any indication of the future, it won’t happen, especially since I’ve given up on all these patches.

The fact of the matter is that in every release of Git, there is only one focus: performance. Despite the fact that it’s #6 in the list of concerns of users, Git developers work on this because that’s their area of expertise, because it’s fun for them, and because they get paid to do so. There are occasional new features, and a bit of portability now and then, but for the most part Windows support is neglected in Git, which is why the msysgit project was born.

The documentation will always remain cryptic, because for the developers, it’s not cryptic, it’s very clear. And the user-interface will never change, because the developers don’t like change.

If you don’t believe me look at the backwards-incompatible changes in Git v2.0.0, or in fact, try to think back to the last time Git changed anything. Personally other than the git-foo -> ‘git foo’ change in v1.6.0 (which was horribly handled), I can’t think of anything but minor changes.

Anyway, you can use all these features I listed today (and more) if you use git-fc instead of Git. It is my own fork of Git that has all the features of Git, plus more.

Is there anything in that list that I missed? Do you think Git v2.0.0 has enough changes as it is?

Git v2.0.0, what changed, and why should you care

Git v2.0.0 is a backward-incompatible release, which means you should expect differences since the v1.x series.

Unless you’ve been closely following the Git mailing list, you probably don’t know the history behind the v2.0 release, which started a long time ago (more than three years). It all started with a mail from Junio C Hamano, asking developers to submit ideas for changes that normally would not happen because they break backwards compatibility; he invited us to think as if “we were writing Git from scratch”. This big release that would break backwards compatibility was going to be named “1.8.0”, and people started to submit ideas for this important release. Eventually too much time passed, the versioning scheme changed, v1.8.0 was released, and the changes proposed for v1.8 slipped into what is now v2.0.

Parts of v2.0 have already been deployed one way or the other (for example if you have configured ‘push.default = simple’), but finally today we have v2.0 final. And here are the big changes that we got.

‘git push’ default has changed

Here’s what the release notes say:

When "git push [$there]" does not say what to push, we have used the
traditional "matching" semantics so far (all your branches were sent
to the remote as long as there already are branches of the same name
over there).  In Git 2.0, the default is now the "simple" semantics,
which pushes:

 - only the current branch to the branch with the same name, and only
   when the current branch is set to integrate with that remote
   branch, if you are pushing to the same remote as you fetch from; or

 - only the current branch to the branch with the same name, if you
   are pushing to a remote that is not where you usually fetch from.

You can use the configuration variable "push.default" to change
this.  If you are an old-timer who wants to keep using the
"matching" semantics, you can set the variable to "matching", for
example.  Read the documentation for other possibilities.

Is that clear? Given the bad track record of Git documentation it wouldn’t surprise me if you didn’t get what this chunk of text is trying to say at all. Personally I find it much easier to read the code to figure out what is happening.

So let me try to explain. When you type ‘git push’ (without any arguments), Git uses the configuration ‘push.default’ in order to find out what to push. Before, ‘push.default’ defaulted to ‘matching’, and now it defaults to ‘simple’.

The ‘matching’ configuration essentially converts ‘git push’ into ‘git push origin :’, which means push all the matching branches. So if you have a local ‘master’, and there’s a remote ‘master’, ‘master’ is pushed; if you have a local and remote ‘fix-1’, ‘fix-1’ is pushed; if you have a local ‘ext-feature-1’, but there’s no matching remote branch, it’s not pushed, and so on.

The ‘simple’ configuration pushes a single branch instead, and it uses your configured upstream branch (see this post for a full explanation of the upstream branch), so if your current branch is ‘master’, and if ‘origin/master’ is the upstream of your ‘master’ branch, ‘git push’ will basically be the same as ‘git push origin master‘, or to be more specific ‘git push origin master:master‘ (the upstream branch can have a different name).

Note: If you are not familiar with the src:dst syntax, it lets you push a local branch ‘src’ and have it named ‘dst’ on the server. You don’t need to rename a local branch; you can do ‘git push origin foobar:feature-a’, and your local branch “foobar” will be named “feature-a” on the server. This has nothing to do with v2.0.

However, if the current branch is ‘fix-1’ and the upstream is ‘origin/master’, ‘git push’ will complain that the name of the destination branch is not the same, because it doesn’t know whether to do ‘git push origin fix-1:master’ or ‘git push origin fix-1:fix-1’.

Additionally if you do ‘git push github‘ (not the remote of your upstream branch), Git will simply use the name of the current branch, essentially ‘git push github fix-1‘ (‘fix-1’ being the name of the current branch).

This mode is anything but simple to describe. But perhaps the name is OK, because you can expect it to “simply work”.
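If you prefer to see it rather than read about it, here is a small sketch that compares the two modes against a throwaway bare repository. Everything in it (the temp directory, the ‘demo’ identity, the branch ‘fix-1’) is made up for the demonstration, and I pass push.default with -c so that whatever you have configured globally doesn’t interfere.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q --bare "$tmp/server.git"      # stands in for the remote
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is 'master'
git remote add origin "$tmp/server.git"

git commit -q --allow-empty -m "initial"
git push -q -u origin master              # sets origin/master as upstream
git branch fix-1
git push -q origin fix-1                  # 'fix-1' now exists on both sides

# New local commits on both branches
git commit -q --allow-empty -m "work on master"
git checkout -q fix-1
git commit -q --allow-empty -m "work on fix-1"
git checkout -q master
local_fix1=$(git rev-parse fix-1)

# 'simple': only the current branch (master) is pushed
git -c push.default=simple push -q
simple_fix1=$(git --git-dir="$tmp/server.git" rev-parse fix-1)

# 'matching': every branch that exists on both sides is pushed
git -c push.default=matching push -q
matching_fix1=$(git --git-dir="$tmp/server.git" rev-parse fix-1)

echo "local fix-1:              $local_fix1"
echo "server fix-1 (simple):    $simple_fix1"
echo "server fix-1 (matching):  $matching_fix1"
```

After the ‘simple’ push the server’s ‘fix-1’ still points at the old commit; only after the ‘matching’ push does it catch up with the local one.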

Would I care?

If you don’t type ‘git push’, but instead specify what and where to push… you don’t care.

If you have configured ‘push.default’ already (which you most likely did, because otherwise you have been getting the following annoying message all the time for the past two years)… you don’t care.

warning: push.default is unset; its implicit value is changing in
Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the current behavior after the default changes, use:

  git config --global push.default matching

To squelch this message and adopt the new behavior now, use:

  git config --global push.default simple

When push.default is set to 'matching', git will push local branches
to the remote branches that already exist with the same name.

In Git 2.0, Git will default to the more conservative 'simple'
behavior, which only pushes the current branch to the corresponding
remote branch that 'git pull' uses to update the current branch.

See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)

So, most likely you don’t care.

‘git add’ in directory

Here’s what the release notes say:

When "git add -u" and "git add -A" are run inside a subdirectory
without specifying which paths to add on the command line, they
operate on the entire tree for consistency with "git commit -a" and
other commands (these commands used to operate only on the current
subdirectory).  Say "git add -u ." or "git add -A ." if you want to
limit the operation to the current directory.

Although this is a clearer explanation, it’s not very clear what is changing, so let me give you an example.

Say you have modified two files, ‘README’ and ‘test/basic.t’, then you go to the ‘test’ directory and run ‘git add -u’. In pre-v2.0 only ‘test/basic.t’ will be staged; in post-v2.0 both files will be staged. If you run the command in the top-level directory, nothing changes.
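The same scenario as a runnable sketch, assuming Git v2.0 or later and a throwaway repository (paths and identity are placeholders of mine):

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
mkdir test
echo one > README
echo one > test/basic.t
git add README test/basic.t
git commit -q -m "initial"

# Modify both files, then work from the subdirectory
echo two >> README
echo two >> test/basic.t
cd test

git add -u                       # v2.0+: operates on the whole tree
staged_whole=$(git diff --cached --name-only)

git reset -q                     # unstage everything again
git add -u .                     # explicitly limited to the current directory
staged_dir=$(git diff --cached --name-only)

echo "git add -u   staged: $(echo $staged_whole)"
echo "git add -u . staged: $staged_dir"
```

On a pre-v2.0 Git the first case would have staged only ‘test/basic.t’, same as the second.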

Would I care?

If you haven’t seen the following warning while doing ‘git add -u‘ or ‘git add -A‘, or if you don’t even use those options, you are fine.

warning: The behavior of 'git add --update (or -u)' with no path argument from a
subdirectory of the tree will change in Git 2.0 and should not be used anymore.
To add content for the whole tree, run:

  git add --update :/
  (or git add -u :/)

To restrict the command to the current directory, run:

  git add --update .
  (or git add -u .)

With the current Git version, the command is restricted to the current directory.

‘git add’ adds removals

Here’s what the release notes say:

"git add <path>" is the same as "git add -A <path>" now, so that
"git add dir/" will notice paths you removed from the directory and
record the removal.  In older versions of Git, "git add <path>" used
to ignore removals.  You can say "git add --ignore-removal <path>" to
add only added or modified paths in <path>, if you really want to.

Again, it should be clearer with an example. Say you removed the file ‘test/basic.t’ and added a new file ‘test/main.t’. Those changes are not staged, so you stage them with ‘git add test/’. Pre-v2.0, ‘test/basic.t’ would remain tracked; post-v2.0, ‘test/basic.t’ is removed from the stage.
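Here is that example as a sketch you can run on Git v2.0 or later. The repository is a throwaway one, and I use ‘git diff --cached --name-status’ (with --no-renames, so the pair is never reported as a rename) to show what ended up staged.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
mkdir test
echo one > test/basic.t
git add test/basic.t
git commit -q -m "initial"

# Remove one file and create another, without staging anything
rm test/basic.t
echo new > test/main.t

git add test/                            # v2.0+: records the removal too
with_removal=$(git diff --cached --name-status --no-renames)

git reset -q                             # back to a clean index
git add --ignore-removal test/           # the old behavior, on request
without_removal=$(git diff --cached --name-status --no-renames)

echo "git add test/:"
echo "$with_removal"
echo "git add --ignore-removal test/:"
echo "$without_removal"
```

The first listing contains both a ‘D’ entry for ‘test/basic.t’ and an ‘A’ entry for ‘test/main.t’; the second only has the ‘A’ entry.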

Would I care?

If you haven’t seen the following warning while doing ‘git add‘, you are fine.

warning: You ran 'git add' with neither '-A (--all)' or '--ignore-removal',
whose behaviour will change in Git 2.0 with respect to paths you removed.
Paths like 'test/basic.t' that are
removed from your working tree are ignored with this version of Git.

* 'git add --ignore-removal <pathspec>', which is the current default,
  ignores paths you removed from your working tree.

* 'git add --all <pathspec>' will let you also record the removals.

Run 'git status' to check the paths you removed from your working tree.

The rest

The "-q" option to "git diff-files", which does *NOT* mean "quiet",
has been removed (it told Git to ignore deletion, which you can do
with "git diff-files --diff-filter=d").

Most people don’t use this command, thus don’t care.

"git request-pull" lost a few "heuristics" that often led to mistakes.

Again, most people don’t use this command, which is mostly broken anyway.

The default prefix for "git svn" has changed in Git 2.0.  For a long
time, "git svn" created its remote-tracking branches directly under
refs/remotes, but it now places them under refs/remotes/origin/ unless
it is told otherwise with its "--prefix" option.

If you don’t use ‘git svn’, you don’t care. If you don’t see a difference between ‘trunk’ and ‘origin/trunk’, you don’t care.

tl;dr

You probably don’t care about these backward-incompatible changes. Sure, Git v2.0.0 received a good dose of new features and bug-fixes, but so did v1.9.0, and all the versions before.

Given the fact that Git v2.0.0 has been cooking for three years, I think it’s a big missed opportunity that nothing really changed, especially given that in previous user surveys people have said the user-interface and documentation need to improve, and there have been patches to try to do so. In a separate post I discuss what I think Git v2.0.0 should have included.

Is ‘git pull’ broken? If so, what’s the fix?

Is ‘git pull’ really broken? I know what you are thinking; such a pervasive and basic command cannot possibly be broken. Unfortunately, it is.

It is not some marginal issue: many experienced Git users avoid ‘git pull’ and even urge newcomers to avoid it, there are many sites that encourage you not to use the command, and there have been a lot of threads on the mailing list about the issue (Pull is mostly evil, A failing attempt to use Git in a centralized environment). The maintainer, Junio C Hamano, has accepted there’s a big problem, and even Linus Torvalds agreed something needs to change.

In order to identify the problem we first need to define the two main ways ‘git pull’ is used.

Pull requests

One way ‘git pull’ is used, is to integrate pull requests into the mainline. For example in the Linux kernel, the DRM maintainer sends a pull request to Linus Torvalds, saying basically:

The following changes are available in the git repository at:

git://people.freedesktop.org/~airlied/linux drm-next

So Linus can just do:

git pull git://people.freedesktop.org/~airlied/linux drm-next

In this mode ‘git pull’ actually works fine, which is not too surprising, since it’s the main thing Linus Torvalds does.

However, this is not the way most people use ‘git pull’.

Update branch

What most people do is, for example, update their local ‘master’ branch to the remote ‘origin/master’ branch, essentially doing ‘git fetch origin’ followed by ‘git merge origin/master’.

However, that’s not exactly what most people actually want to do.

If you don’t have any changes of your own in ‘master’, then yes, ‘git pull’ does what you want, but if you do have changes, and thus the branches have diverged, then ‘git pull’ will create a new merge commit. This might or might not be what you want, but the majority of Git newbies do not want that, or rather, the team they contribute to don’t want those “evil merges”. Unfortunately these newbies don’t know what they are doing, and Git is not making it easier.

So you end up with something like this:

[image: git-pull]

Most likely what the team wants is that the local changes are rebased on top of the remote ones, but if they want a merge, they want it the other way around, that is: merge the local changes to the remote ones, as if a topic branch was merged.

[image: git-pull-fix]

A merge with this order of parents has many advantages, including a clearer history. However, it’s not possible to do that with ‘git pull’, so you have to do ‘git fetch’, create a new branch, switch to the master branch, merge the other branch, and finally remove the other branch. It’s not straight-forward at all.
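To make those manual steps concrete, here is one possible reading of them as a sketch with stock Git. The temporary branch name ‘tmp-local’, the second clone that plays the role of a coworker, and the hard reset of ‘master’ to ‘origin/master’ are my assumptions about how to get the parents in that order; treat it as an illustration, not as the one true recipe.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q --bare "$tmp/server.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/server.git"
git commit -q --allow-empty -m "base"
git push -q -u origin master

# Somebody else pushes to the server in the meantime
git clone -q -b master "$tmp/server.git" "$tmp/other"
git -C "$tmp/other" commit -q --allow-empty -m "remote change"
git -C "$tmp/other" push -q origin HEAD:master

# Meanwhile we commit locally, so the branches diverge
git commit -q --allow-empty -m "local change"
local_tip=$(git rev-parse HEAD)

# The manual dance
git fetch -q origin
git branch tmp-local                     # remember the local work
git reset -q --hard origin/master        # master restarts from the remote side
git merge -q --no-ff -m "Merge local changes" tmp-local
git branch -q -d tmp-local

remote_tip=$(git rev-parse origin/master)
first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)
echo "first parent is the remote tip:  $first_parent"
echo "second parent is the local work: $second_parent"
```

The first parent of the merge is now the remote history, and the local commits come in as the second parent, as if they were a topic branch.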

It is this mode that is broken, and that’s the reason many people try to avoid ‘git pull’; it rarely does what you want by default.

The solution

There have been many solutions proposed, however, there are many many use-cases to consider, and a solution that takes them all into consideration for the future is not easy to find.

The best solution that seems to accommodate all present use-cases and future ones is the introduction of a new command: ‘git update‘.

By default this command will complain if the branches have diverged, so you have to do either ‘git update --rebase’ or ‘git update --merge’. This ensures that newbies aren’t going to do “evil merges” by mistake.

Also, when you do a ‘git update --merge’ the order of the parents is reversed, so it appears you are merging ‘master’ into ‘origin/master’, and not the other way around as happens with ‘git pull’. In other words, it appears as if you are merging a topic branch, which is what most people want.

[image: git-update]

There are many many more advantages to this new command, but probably too subtle to mention in this post.
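If you are stuck with stock Git, the “complain on divergence” part can be roughly approximated with a fetch followed by a fast-forward-only merge. This is only a sketch of that default behavior, not the ‘git update’ command itself; the repositories, the second clone acting as a coworker, and the ‘demo’ identity are all made up for the demonstration.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q --bare "$tmp/server.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/server.git"
git commit -q --allow-empty -m "base"
git push -q -u origin master

# A coworker pushes, and we commit locally: the branches diverge
git clone -q -b master "$tmp/server.git" "$tmp/other"
git -C "$tmp/other" commit -q --allow-empty -m "remote change"
git -C "$tmp/other" push -q origin HEAD:master
git commit -q --allow-empty -m "local change"
before=$(git rev-parse HEAD)

# Update, but refuse to create a merge commit by accident
git fetch -q origin
if git merge -q --ff-only origin/master 2>/dev/null; then
    result=updated
else
    result=diverged      # time to decide: rebase, or merge on purpose
fi
after=$(git rev-parse HEAD)
echo "result: $result"
```

When the branches have diverged nothing is touched, and you get to choose between a rebase and a deliberate merge.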

When will this be ready?

Probably never. I sent a summary of the issues and the solution to the mailing list, which addresses all the use-cases that were discussed. I have the required patches with tests and documentation on my personal branch, and I’ve been using this new command for a while now.

Why isn’t this picked up? Maybe it’s because none of the core developers experience these issues. Maybe it’s because they don’t use ‘git pull’ in the second form. Who knows.

The fact is that there is no interest to get this fixed, even though the issue has been acknowledged, so it’s not likely to be fixed any time soon.

So what can you do about it? The best thing you can do right now is simply avoid using ‘git pull’. Additionally, you might want to instruct your coworkers to avoid using it as well, especially the ones that are not very familiar with Git.

Also, you might want to use my fork, git-fc, which does have the ‘git update‘ command, which works better than ‘git pull‘ even when there’s no branch divergence, and when there is, ‘git update --merge‘ is also superior, because the order of the parents is right.

Using Git with triangular workflows; tips, tricks, and more

Chances are you are using a triangular workflow, even if you don’t know it. A triangular workflow simply means that you pull from one repository, and push to another. This is what the vast majority of Git users do; unfortunately, most of the good stuff is buried in the nearly incomprehensible official manpages.

In this blog post I’ll try to shine some light into triangular workflows, how to make use of the upstream tracking branch for them, and explain the new publish tracking branch.

The basics

Say you clone a repository:

% git clone https://github.com/tiimgreen/github-cheat-sheet
% cd github-cheat-sheet

Then you do some changes and want to share them back.

What most people would do is create a fork in GitHub and push their changes there.

% git remote add mine https://github.com/felipec/github-cheat-sheet
% git push mine

After doing that they do a pull request so their changes can be merged to the original repository.

This workflow is not specific to GitHub by any means, for example the Linux kernel developers have the main repository in git.kernel.org, and they send pull requests by mail using repositories all over the map (example).

The help

If you do this over and over it becomes clear that a little help from Git would be nice.

The first thing you can do is set the configuration ‘remote.pushdefault’ to the repository you usually push to (in the above case ‘mine’). So now you can type `git push` instead of `git push mine` every time.

The next thing would be to set up an upstream tracking branch (read my blog post about it if you are not familiar with it).
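As a sketch, this is what that looks like end to end. Two throwaway bare repositories stand in for ‘origin’ and ‘mine’ (the paths are placeholders), and I pass push.default=simple explicitly so your own configuration doesn’t change the outcome.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"     # the repository you pull from
git init -q --bare "$tmp/mine.git"       # your personal fork
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/fix-typos
git remote add origin "$tmp/origin.git"
git remote add mine "$tmp/mine.git"
git commit -q --allow-empty -m "Fix a bunch of typos"

# Push to 'mine' whenever no remote is given
git config remote.pushdefault mine

git -c push.default=simple push -q       # no remote on the command line
pushed=$(git --git-dir="$tmp/mine.git" rev-parse fix-typos)
echo "mine/fix-typos is at $pushed"
```

The branch lands in ‘mine’, not in ‘origin’, even though no remote was named on the command line.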

% git branch --set-upstream-to mine/fix-typos

Then Git would greet you with the following help:

Your branch is ahead of 'mine/fix-typos' by 1 commit.

This is telling you that you probably want to push your branch again, since it’s not up-to-date in the remote. It shows you that each time you switch to that branch, or when you do `git status`.

Moreover, `git branch -vv` would show you this help:

* fix-typos ... [mine/fix-typos: ahead 1] Fix a bunch of typos

So it seems Git already has tons of help for this workflow, doesn’t it? Not so fast.

The real upstream

The upstream tracking branch is useful for other purposes, but for that we need to set a different upstream:

% git branch --set-upstream-to origin/master

Now the upstream is ‘master’ in the ‘origin’ remote, and when you run `git status`, you get:

Your branch and 'origin/master' have diverged,
and have 2 and 10 different commits each, respectively.

What that message is telling you is that ‘origin/master’ has moved, so there are 10 commits in ‘origin/master’ that your branch doesn’t have (and your branch has 2 commits ‘origin/master’ doesn’t have). In those cases you probably would want to rebase on top of ‘origin/master’ so that it’s easier for upstream maintainers to merge your branch, although you can merge ‘origin/master’ too, or simply do nothing and hope there are no conflicts. Either way the information is useful so you can decide what to do.
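If you ever need those two numbers in a script, one way to get them (a sketch, not necessarily how `git status` computes them internally) is `git rev-list --left-right --count`. The setup below fakes a diverged upstream in a throwaway repository, with 2 local commits and 3 upstream ones instead of 10 to keep it short.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q --bare "$tmp/server.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/server.git"
git commit -q --allow-empty -m "base"
git push -q -u origin master

# Two local commits on a topic branch
git checkout -q -b fix-typos
git commit -q --allow-empty -m "Fix typo 1"
git commit -q --allow-empty -m "Fix typo 2"

# Simulate upstream moving forward by three commits
git checkout -q --detach origin/master
for i in 1 2 3; do
    git commit -q --allow-empty -m "upstream change $i"
done
git push -q origin HEAD:master
git checkout -q fix-typos
git branch -q --set-upstream-to origin/master

# Left number: commits only in HEAD; right number: commits only upstream
counts=$(git rev-list --left-right --count 'HEAD...@{upstream}')
echo "ahead/behind: $counts"
```

The first number is how far ahead your branch is, the second how far behind, which is the same pair `git status` reports in its “have diverged” message.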

In addition, if you want to rebase, the command is easier; instead of `git rebase origin/master` you can just type `git rebase`, since `git rebase` by default uses the upstream tracking branch.

Moreover, if you always stay up-to-date, you can do `git pull --rebase`, which will fetch all the remote branches, and then rebase your current branch (e.g. ‘fix-typos’) on top of the upstream (e.g. ‘origin/master’). You can also configure ‘pull.rebase = true’ to always do this when you type `git pull`.

Not to mention that `git branch -vv` gives much more useful information:
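A quick sketch of the ‘pull.rebase’ setting in action, again in a throwaway repository with made-up file names. It uses real file changes rather than empty commits, because depending on the Git version a rebase may drop empty commits.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Small helper (hypothetical): write a file and commit it
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

tmp=$(mktemp -d)
git init -q --bare "$tmp/server.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/server.git"
commit_file README "base"
git push -q -u origin master

# Local topic branch with two commits
git checkout -q -b fix-typos
commit_file typo1.txt "Fix typo 1"
commit_file typo2.txt "Fix typo 2"

# Upstream moves forward
git checkout -q --detach origin/master
commit_file upstream.txt "Upstream change"
git push -q origin HEAD:master
git checkout -q fix-typos
git branch -q --set-upstream-to origin/master

# With pull.rebase set, a plain 'git pull' rebases instead of merging
git config pull.rebase true
git pull -q

merges=$(git rev-list --merges --count HEAD)
ahead=$(git rev-list --count origin/master..HEAD)
echo "merge commits: $merges, commits on top of upstream: $ahead"
```

The history stays linear: no merge commits, and the two local commits sit on top of ‘origin/master’.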

* fix-typos ... [master: ahead 2, behind 10] Fix a bunch of typos

Check how it looks in my real repository:

[image: git branch -vv with upstream]

You get other additional benefits, like for example you get warned if you try to delete a branch that hasn’t been merged to its upstream:

warning: not deleting branch 'fix-typos' that is not yet merged to
'origin/master', even though it is merged to HEAD.
error: The branch 'fix-typos' is not fully merged.
If you are sure you want to delete it, run 'git branch -D fix-typos'.

This is actually what the upstream tracking branch is meant for: to track the upstream, that is, the target branch where all the commits of the source branch should eventually end up. All the commits of ‘fix-typos’ should end up in ‘origin/master’, therefore ‘origin/master’ is the upstream of ‘fix-typos’.

We want to have all the goodies of tracking ‘origin/master’ as our upstream, but we also want to track ‘mine/fix-typos’ so we know when we need to push. Unfortunately we can’t set them both as upstream, so we must choose one set of benefits over the other. Or should we?

The solution

The solution is not that hard to figure out: we need another upstream! Or rather; we need some concept that is similar to the upstream tracking branch, but instead of tracking the final destination, we track the location we push our commits to.

This is the publish tracking branch.

When you set it up, you get all the information:

Your branch and 'origin/master' have diverged,
and have 2 and 10 different commits each, respectively.
Some commits haven't been published to 'mine/fix-typos'.

* fix-typos ... [origin/master, mine/fix-typos *: ahead 2, behind 10]

Notice the extra ‘*’ next to the publish branch, which hints that it needs to be published.

Also, you can type `git pull` and `git rebase`, which will use the upstream branch as you would expect, and `git push` which will use the publish branch.

In other words; everything just works perfectly.

You set up the publish branch just like you set up the upstream branch:

% git branch --set-publish-to mine/fix-typos

Or:

% git push --set-publish mine

But wait, there’s more: you are not tied to push to a single remote; you can set different branches in different remotes as publish tracking. For example ‘fix-typos’ to ‘github/fix-typos’, ‘bug-fix’ to ‘client/bug-fix’, and so on. You can even choose a different branch name in the remote: ‘client-b-bug-fix’ to ‘client-b/bug-fix’.

Nice, isn’t it?
[image: git branch -vv publish]

The problem

There is only one problem with the publish branch: it’s not in upstream git 😦

It is part of my fork, git-fc. If you use my fork, you will get this and other features, and you won’t lose any feature from official Git. Or you can use the specific branch, ‘fc/publish’.

I’ve been using this code for more than half a year, and it has been reviewed in the Git mailing list, so you can trust it won’t eat your babies 🙂

Why isn’t it in official Git?

WARNING: if you don’t like conflicts or you know me for “adversarial” style (and don’t like it), skip this section

That’s a very good question. If the maintainer (Junio C Hamano) has accepted that the triangular workflows are lacking, and that a separate ‘upstream’ tracking branch is needed, why isn’t it there?

The short answer is that they have an ad hominem thing against me, so even if my patches are correct and they solve a long-standing problem, they are not applied. They are only picked if they are trivial, or not controversial, or obvious fixes. Which is why I started a fork.

I sent the original version of the patches in September 2013, and received virtually no comments. Then in January 2014 people started discussing (once again) the issues with triangular workflows, and even complained about the lack of @{publish}. Eventually they started writing preparatory patches. But I had already written the whole thing several months before!

This can’t be attributed to the patches inadvertently going unnoticed, because I re-sent the series once, and because I wrote about the support for @{publish} when I announced the git-fc fork.

Then I returned to the project after a long hiatus, and noticed they were working on something I had already done, so I let them know and sent the patches again. This time they received more feedback, and even made it into Junio’s “pu” (proposed updates) branch. Patches are often dropped from “pu”, sometimes for no reason at all, so this is no guarantee they will get in.

This is the message Junio attached to the patch series:

 Add branch@{publish}; it seems that this is somewhat different from
 Ram and Peff started working on.  There were many discussion
 messages going back and forth but it does not appear that the
 design issues have been worked out among participants yet.

The “design issues” have not been worked out because “Ram” is not actively working on Git anymore (possibly thanks to the fact that nothing ever changes), and “Peff” said he wasn’t interested in the @{publish} concept, but more like a @{push} concept which will only benefit him and his weird bare-bones mode of interacting with Git. The fact that the @{publish} concept is what would benefit a vast majority of the user base is of no consequence to “Peff”.

So will it ever get into Git’s mainline? Who knows.

Get the goodies

If you want to use the publish tracking branch feature, get git-fc and follow the installation instructions. In addition you would get a ton of other features, and will lose none 🙂

If you use ArchLinux, you can get the package from AUR.

Enjoy 🙂

Announcing git-fc; a friendly fork of Git

I’ll start with the obvious question: why a fork? Well, the short answer is that my patches are not being applied. The long answer is convoluted and would require a long explanation of how Git development works, its principles and guidelines, and more importantly the culture of the core developers. I’m not going to get into that; maybe in the comments section if somebody is interested.

So what is git-fc? It is a friendly fork, and by that I mean that it’s a fork that won’t deviate from the mainline, it is more like a branch in Git terms. This branch will move forward close to Git’s mainline, and it could be merged at any point in time, if the maintainer wished to do so.

git-fc doesn’t include experimental code, or half-assed features, so you can expect the same level of stability as Git’s mainline. Also, it doesn’t remove any feature, or make any backwards-incompatible changes, so you can replace git with git-fc and you wouldn’t notice the difference. The delta comes in the extra features that I’ll describe in detail below, that is all.

Who am I? I’ve contributed many patches to Git, mainly the git-remote-hg/bzr two-way bridges, but also many other things. Here’s a list of the top 10 contributors to Git since last year by number of patches:

% git shortlog --since='1 year ago' --no-merges -n -s | head -n 10
   388	Junio C Hamano
   308	Felipe Contreras
   230	Jeff King
   161	Nguyễn Thái Ngọc Duy
   122	Michael Haggerty
   103	Ramkumar Ramachandra
    96	John Keeping
    69	Eric Sunshine
    59	Thomas Rast
    51	René Scharfe

More info in ohloh.

As you see, I’ve done a lot of work for Git’s mainline, so chances are you have already benefited from my code one way or the other.

However, the most interesting patches are not merged. I wrote a summary of my 160 patches, explaining their status, so Git developers would prioritize them, but I think it’s fair to say they are just not going to apply them.

So, what do you get if you use git-fc?

@ shortcut

Many people have suggested a shortcut for the non-particularly-intuitive “HEAD”, but none of these suggestions seemed very appealing, or feasible.

Because Git already has a ref@op revision syntax, where if you remove the ref, HEAD is implied, I thought @ could be thought of as HEAD.

This change was welcome and accepted by the Git mainline, and it even was on track for v1.8.4 but it was dropped last minute because of some issues that are fixed now, and you probably will see it in v1.8.5. But why wait? 🙂

Nice ‘branch -v’

If you have configured the upstream tracking branch for your branches (I wrote a blog post about them), when you do ‘git branch -v’ you see something like this:

  fc/branch/fast      177dcad [ahead 2] branch: reorganize verbose options
  fc/stage            abb6ad5 [ahead 14] completion: update 'git reset' ...
  fc/transport/improv eb4d3c7 [ahead 10] transport-helper: don't update ...

While that provides useful information, it doesn’t show the upstream tracking branch, just says “ahead 2” but “ahead 2” compared to what?

If you do ‘git branch -vv’, then you see the answer:

  fc/branch/fast      177dcad [master: ahead 2] branch: reorganize ...
  fc/stage            abb6ad5 [master: ahead 14] completion: update ...
  fc/transport/improv eb4d3c7 [master: ahead 10] transport-helper: don't ...

Unfortunately both options take a lot of time (relative to most Git commands which are instantaneous), because computing the “ahead 2” takes a lot of time. So I decided to switch things around, so ‘git branch -v’ gives you:

  fc/branch/fast      177dcad [master] branch: reorganize verbose options
  fc/stage            abb6ad5 [master] completion: update 'git reset' new ...
  fc/transport/improv eb4d3c7 [master] transport-helper: don't update refs ...

And it does so instantaneously.
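For readers on Git mainline, a similar listing can be approximated with ‘git for-each-ref‘, which prints the upstream name without computing any ahead/behind counts. This is only a sketch and not how git-fc implements ‘branch -v’; the throwaway repository, the "demo" identity and the "dev" branch are assumptions made for illustration.

```shell
set -e
repo=$(mktemp -d)
# Throwaway identity so the example does not depend on your configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Init"
# "dev" is a hypothetical branch whose upstream is the local master.
git branch -q --track dev master

# One line per branch: name, short hash, [upstream], subject.
listing=$(git for-each-ref \
    --format='%(refname:short) %(objectname:short) [%(upstream:short)] %(subject)' \
    refs/heads)
echo "$listing"
```

Branches without an upstream show up with empty brackets.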

Default aliases

Many (if not all) version control system tools have shortcuts for their most common operations; hg ci, svn co, cvs st. But not Git. You can configure your own aliases manually, but you might have some trouble if you use somebody else’s machine.

Adding default aliases is trivial, it helps everyone, and it doesn’t hurt anyone, yet the patch to do so was rejected.

For now, there are only four aliases, but more can be added later if they are requested.

co = checkout
ci = commit
rb = rebase
st = status

If you already have these aliases, or have them mapped to something else, your aliases would take precedence over the default ones, so you won’t have any problems.
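If you are on Git mainline, the same four aliases can be set up by hand. The sketch below defines them in a throwaway repository (an assumption, so nothing outside it is touched); passing --global to ‘git config‘ instead would apply them to every repository of the current user.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# The same four aliases listed above, defined for this repository only.
git config alias.co checkout
git config alias.ci commit
git config alias.rb rebase
git config alias.st status

# Show what was defined, then use one of the aliases.
aliases=$(git config --local --get-regexp '^alias\.')
echo "$aliases"
git st --short
```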

Streamlined remote helpers

I have spent a lot of time working on git-remote-hg and git-remote-bzr, and although they are relatively new, they have proven to be quite stable and solid, yet they are only part of the “contrib” area side by side with much simpler and way less solid scripts.

In order to use these in Git mainline you might need a bit of tinkering, and it’s not straightforward to package them for distributions.

With git-fc they are installed by default, and in the right way, making things easier for distributions.

Improvements to the transport helper

The two-way bridges between Git and Mercurial/Bazaar already work quite well, but they lack some features; specifically, you cannot do --force, or --dry-run, or use an old:new refspec. If you are not familiar with the old:new refspec: you can do ‘git push origin master:my-master’, which would push your ‘master’ branch as if it was named ‘my-master’ in the remote repository.

This is extremely useful if you are really serious about using Git as a transparent client to access a Mercurial repository.

New core.mode configuration

Git is already preparing users for the v2.0 release, which would bring minor backward compatibility breakage, but some people would rather get rid of the warnings (which are probably going to stay for many more releases) and just move to the new behavior already.

Testing Git v2.0 behavior today would not only help git-fc, but also the Git mainline, and you can do that by setting core.mode = next, so if you do this and provide feedback about any issues, that would be greatly appreciated. Unfortunately you cannot test the v2.0 behavior in Git mainline because they rejected the patches, but you can in git-fc.

Please note that the v2.0 behavior might change in the future, before v2.0 is released, so if you enable this mode you need to be aware of that. Chances are you are not going to notice any difference anyway.

In addition to the “next” (v2.0) mode, there’s the “progress” mode. This mode enables “next” plus other configurations that have been proposed to change by default in v2.0, but haven’t yet been agreed upon.

In particular, you get these:

merge.defaulttoupstream = true
branch.autosetupmerge = always
mergetool.prompt = false

There might be more in the future, and suggestions are welcome.

It is recommended that you set up this mode for git-fc:

git config --global core.mode progress
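core.mode only exists in git-fc, but the three settings listed above are ordinary configuration keys, so on Git mainline they can be set one by one. A minimal sketch, using a throwaway repository (an assumption) rather than --global so it is safe to try:

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# The individual settings that the "progress" mode is described as enabling.
git config merge.defaulttoupstream true
git config branch.autosetupmerge always
git config mergetool.prompt false

# Read them back to confirm what is now in effect for this repository.
autosetup=$(git config --local --get branch.autosetupmerge)
git config --local --get-regexp '^(merge|branch|mergetool)\.'
```

This does not cover whatever else the “next” mode changes; it only mirrors the list above.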

Non-ff pulls rejected by default

Even in the Git project everybody has agreed this is the way to go in order to avoid the typical Git newbie making the mistake of doing a merge, when perhaps (s)he wanted to do git reset, or git rebase. With this change git complains that a non-fast-forward branch is being pulled, so the user has to decide what to do.

The user would have to do either ‘git pull --merge‘ or ‘git pull --rebase‘, the former being what Git mainline currently does.

The user can of course choose the old behavior, which is easy to configure:

git config --global pull.mode merge
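pull.mode is a git-fc option. On Git mainline (v2.0 or later), the closest approximation I know of is pull.ff = only, which also refuses a non-fast-forward pull, although it does not offer the same ‘--merge‘ counterpart. The sketch below sets up two throwaway repositories that diverge (their names and the "demo" identity are assumptions) and shows the pull being refused.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "Init"
git clone -q "$work/upstream" "$work/clone"

# Make the two repositories diverge.
git -C "$work/upstream" commit -q --allow-empty -m "Update"
git -C "$work/clone" commit -q --allow-empty -m "Fix"

# Only fast-forward pulls are allowed; never rebase implicitly.
git -C "$work/clone" config pull.ff only
git -C "$work/clone" config pull.rebase false

if git -C "$work/clone" pull -q 2>/dev/null; then
    pull_result=merged
else
    pull_result=refused
fi
echo "pull: $pull_result"
```

After the refusal, the user picks explicitly, for example with ‘git pull --rebase‘.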

Official staging area

Everybody already uses the term “staging area”, and Git developers also agreed it is the best term for what is officially referred to as “the index”. So git-fc has new options for all commands that modify the staging area (e.g. git grep --staged, git rm --staged), and also adds a new git stage command that makes it easier to work with the staging area.

'git stage' [options] [--] [...]
'git stage add' [options] [--] [...]
'git stage reset' [-q|--patch] [--] [...]
'git stage diff' [options] [] [--] [...]
'git stage rm' [options] [--] [...]
'git stage apply' [options] [--] [...]
'git stage edit'

Without any command, git stage adds files to the stage, same as git add, same as in Git mainline.

New fetch.default configuration

When you have configured the upstream tracking branch for all your branches, you will probably have tracking branches that point to a local branch, for example feature-a pointing to master, in which case you would get something like:

% git fetch
From .
 * branch            master     -> FETCH_HEAD

Which makes absolutely no sense, since the ‘.’ repository is not even documented, and FETCH_HEAD is a marginally known concept. In this case git fetch is basically doing nothing from the user’s point of view.

So the user can configure fetch.default = simple to get a simple sensible default; ‘git fetch‘ will always use origin by default, which is not ideal for everyone, but it’s better than the current alternative.

If you use the “progress” mode, this option is also enabled.

Publish tracking branch

Git mainline doesn’t have the greatest support for triangular workflows. A good solution for that is to introduce a second “upstream” tracking branch which is for the reverse: the branch you normally push to.

Say you clone a repository (libgit2) in GitHub, then create a branch (feature-a) and push it to your personal repository, you would want to track two branches (origin/master), and (mine/feature-a), but Git mainline only provides support for a single upstream tracking branch.

If you set up your upstream tracking branch to origin/master, then you can just do git rebase without arguments and git will pick the right branch (origin/master) to rebase to. However, git push by default will also try to push to origin/master, which is not what you want. Plus git branch -v will show how ahead/behind your branch is compared to origin/master, not mine/feature-a.

If you set up your upstream to mine/feature-a, then git push will work, but git rebase won’t.

With this option, git rebase uses the upstream branch, and git push uses the publish branch.

Setting the publish tracking branch is easy:

git push --set-publish mine feature-a

Or:

git branch --set-publish mine/feature-a

And git branch -v will show it as well:

fc/branch/fast      177dcad [master, gh/fc/branch/fast] branch: ...
fc/stage            abb6ad5 [master, gh/fc/stage] completion: ...
fc/transport/improv eb4d3c7 [master, gh/fc/transport/improv] ...
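Without git-fc, part of this can be approximated on Git mainline with the remote.pushdefault and push.default settings: the upstream stays origin/master while a plain ‘git push‘ goes to a second remote. This is a sketch of that alternative, not of --set-publish itself; the repositories, the "demo" identity, and the names "mine" and "feature-a" (taken from the example above) are illustrative assumptions.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "origin" stands in for the project repository, "mine" for a personal one.
git init -q "$work/origin"
git -C "$work/origin" symbolic-ref HEAD refs/heads/master
git -C "$work/origin" commit -q --allow-empty -m "Init"
git init -q --bare "$work/mine.git"

git clone -q "$work/origin" "$work/clone"
cd "$work/clone"
git remote add mine "$work/mine.git"

# Upstream is origin/master; pushes go to "mine" under the same branch name.
git checkout -q -b feature-a --track origin/master
git config remote.pushdefault mine
git config push.default current
git commit -q --allow-empty -m "Feature A"
git push -q

upstream=$(git rev-parse --abbrev-ref 'feature-a@{upstream}')
published=$(git rev-parse --abbrev-ref mine/feature-a)
echo "rebase onto: $upstream, push to: $published"
```

The limitation is that the push destination is derived from configuration rather than recorded per branch, so ‘git branch -v‘ still only knows about the upstream. Newer Git releases also have an @{push} revision shorthand that resolves to the push destination.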

Support for Ruby

By far the most complex and interesting feature, but unfortunately also the one that is not yet 100% complete.

There is partial optional support for Ruby. Git already has tooling so any language can use its plumbing and achieve plenty of tasks:

# Spawn 'git for-each-ref' and read its output line by line.
IO.popen(%w[git for-each-ref]) do |io|
  io.each do |line|
    # Each line has the form "<sha1> <type>\t<refname>".
    sha1, kind, name = line.split
    # stuff
  end
end

However, this a) requires a process fork, and b) requires I/O communication to get the desired data. While this is not a big deal on many systems, it is on Windows systems, where forks are slow, and many Git core programs don’t work as well as they do in Linux.

Git has a goal to replace all the core scripts with native C versions, but it’s a goal only in name that is not actually pursued. In addition, that still leaves out any third party tools since Git doesn’t provide a shared libgit library, which is why an independent libgit2 was needed in the first place.

Ruby bindings solve these problems:

for_each_ref() do |name, sha1, flags|
  # stuff
end

The command ‘git ruby‘ can run this script by providing bindings for many of Git’s internal C functions (though not all), which makes it easier to write Ruby programs that take full advantage of Git without any need for forks or I/O communication.

Conclusion

As you might guess, I’ve spent a lot of time working on all these features, plus all the ones that are already merged in Git’s mainline. Hopefully they are useful to some people.

It’s easy to compile and install:

make install

By default git will be installed in your home directory, but you can also do what I do: ‘make prefix=/opt/git install‘, and add ‘/opt/git/bin’ to your $PATH. All you need is a few development packages; zlib, curl, expat, openssl.

The code is on GitHub, the home page is on Google Code, and the mailing list is on Google Groups. All comments and patches are welcome.

You can find future comments and releases in this blog, under the git-fc tag.


Advanced Git concepts; the upstream tracking branch

Probably one of the most powerful and under-utilized concepts of Git is the upstream tracking branch, and to be honest it probably was too difficult to use properly in the past, but not so much any more.

Here I’ll try to explain what it is, and how you can take the most advantage out of it.

Remote tracking branches

Before trying to understand what the upstream tracking branch is, you need to be familiar with remote branches (e.g. origin/master). If you are not, you probably want to read the section about them in the Pro Git book here.

To see all your remote tracking branches, you can use ‘git branch --remotes’.

The upstream tracking branch

Even if you have never heard of the concept, you probably already have at least one upstream tracking branch: master -> origin/master. When you clone a repository the current HEAD (usually ‘master’) is checked out for you, but it’s also set up to track ‘origin/master’, and thus ‘origin/master’ is the “upstream” of ‘master’.

This has some implications on some Git tools, for example, when you run ‘git status‘ you might see a message like this:

# Your branch is behind 'origin/master' by 1 commit.

Also, if you run ‘git branch -vv‘:

* master 549ca22 [origin/master: behind 1] Add bash_profile

This is useful in order to keep your local branches synchronized with the remote ones, but it’s only scratching the surface.

Once you have realized that your local branch has diverged from the remote one, you will probably want to either rebase or merge, so you might want to do something like:

git rebase origin/master

However, ‘origin/master’ is already configured as the upstream tracking branch of ‘master’, so you can do:

git rebase master@{upstream}

Maybe you think @{upstream} is too much to type, so you can do @{u} instead, and since we are already on ‘master’ we can do HEAD@{u}, or even simpler:

git rebase @{u}

But Git is smarter than that; by default both ‘git merge’ and ‘git rebase’ will use the upstream tracking branch, so:

git rebase
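To see these spellings side by side, here is a small self-contained sketch. It builds a throwaway "origin" and a clone of it (both assumptions, as is the "demo" identity), prints what each form resolves to, and then rebases with no argument after a new commit lands upstream.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/origin"
git -C "$work/origin" symbolic-ref HEAD refs/heads/master
git -C "$work/origin" commit -q --allow-empty -m "Init"
git clone -q "$work/origin" "$work/clone"
cd "$work/clone"

# All three spellings resolve to the same remote tracking branch.
long=$(git rev-parse --abbrev-ref 'master@{upstream}')
short=$(git rev-parse --abbrev-ref 'HEAD@{u}')
bare=$(git rev-parse --abbrev-ref '@{u}')
echo "$long $short $bare"

# A new commit lands upstream; fetch it, then rebase without naming a branch.
git -C "$work/origin" commit -q --allow-empty -m "Update"
git fetch -q
git rebase -q
git log -1 --format=%s
```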

Configuring the upstream branch

So now you know that upstream tracking branches are incredibly useful, but how do you configure them? There are many ways.

By default, when you check out a new branch, and you are using a remote branch as the starting point, the upstream tracking branch will be set up automatically.

git checkout -b dev origin/dev

Or:

git checkout dev

If the starting point is a local branch, you can force the tracking by specifying the --track option:

git checkout --track -b dev master

If you already created the branch, you can update only the tracking info:

git branch --set-upstream-to master dev

There’s a very similar option called --set-upstream; however, it’s not intuitive, and it’s now deprecated in favor of --set-upstream-to. To be sure and avoid confusion, simply use -u.

You can also set it up at the same time as you are pushing:

git push --set-upstream origin dev

Finally, you can configure Git so they are always created, even if you don’t specify the --track option:

git config --global branch.autosetupmerge always
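Whichever way you choose, the result is stored as two configuration entries per branch, which can be handy to know when something looks off. A minimal sketch in a throwaway repository (the repository, the "demo" identity and the "dev" branch are assumptions):

```shell
set -e
repo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Init"

# Create the branch without any tracking information...
git branch --no-track dev master
# ...then add it afterwards; -u is the short form of --set-upstream-to.
git branch -q -u master dev

# The upstream is recorded as a remote ("." means this repository) plus a ref.
remote=$(git config --get branch.dev.remote)
merge=$(git config --get branch.dev.merge)
resolved=$(git rev-parse --abbrev-ref 'dev@{u}')
echo "remote=$remote merge=$merge upstream=$resolved"
```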

Conclusion

So there you have it, go nuts and configure the upstream branch for all your branches 😉

git branch --vv with upstream

An in-depth analysis of Mercurial and Git branches

I’ve discussed the advantages of Git over Mercurial many times (e.g. here, and here), and I even created a challenge for Mercurial supporters, but in this blog post I’ll try to refrain from making judgments and concentrate on the actual facts (the key-word being try).

Continuing this full disclosure; I’ve never actually used Mercurial, at least on a day-to-day basis, where I actually had to get something done. But I’ve used it plenty of times testing many different things, precisely to find out how to do things that I can do easily in Git. In addition, I’ve looked deep into the code to figure out how to overcome some of what I considered limitations of the design. And finally, I wrote Git’s official Git<->Mercurial bridge; git-remote-hg (more here).

So, because I’ve spent months figuring out how to achieve certain things in Mercurial, and after talking with the best and the brightest (Git, gitifyhg, hg-git, and Mercurial developers), and exploring the code myself, I can say with a good degree of confidence that if I claim something cannot be done in Mercurial, that’s probably the case. In fact, I invited people from the #mercurial IRC channel in Freenode to review this article, and I invite everyone to comment down below if you think there’s any mistake (comments are welcome).

Git vs. Mercurial branches

Now, I’ve explained before why I think the only real difference between Git and Mercurial is how they handle branches. Basically; Git branches are all-purpose, all-terrain, and Mercurial has different tools for different purposes, which can almost do as much as Git branches, but not quite.

I thought the only real limitation was that Mercurial branches (or rather bookmarks) didn’t have a per-repository namespace. For example: in Git the branch “development” can be in different repositories, and point to different commits, and to visualize them, you can refer to “max/development” (Max’s development branch), “sarah/development” (Sarah’s), “origin/development” (The central repository version), “development” (your own version). In Mercurial you only have “development”, and that’s it. I consider that a limitation of Mercurial, but feel free to consider it a “difference”. But it turns out there’s more.

In Git, it’s easy to add, remove, rename, and move branches. In Mercurial, bookmarks are supposed to work like Git branches; however, they don’t change the basics of how Mercurial works. In Mercurial it doesn’t matter whether a bookmark points to a commit or not: the commit is still there, and completely visible, because each branch can have multiple “heads”. So in order to remove a bookmark (and its commits), you need to use the “hg strip” command, and to use that command you need to enable the MqExtension. That works for local repositories; for remote ones you need to cross your fingers and hope your server has a way to do that. Bitbucket does through its web UI, but it’s possible that there is just no way.

Mercurial advocates often repeat the mantra “history is sacred”, and Mercurial’s documentation attempts to explain why changing history is hard. That shows why it’s hard to remove bookmarks (and their commits); it’s just Mercurial’s design.

On the other hand, if you want to remove a branch in git; you can just do “git push origin :feature-a“. Whether “history is sacred” or not is left for each project to decide.
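As a concrete sketch of remote branch deletion in Git, the script below pushes two branches to a throwaway bare repository and removes them again, once with the empty-source refspec and once with the ‘--delete‘ option. The repository layout, the "demo" identity and the branch names are assumptions for illustration.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/origin.git"
git init -q "$work/clone"
cd "$work/clone"
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "Init"
git push -q origin HEAD:refs/heads/feature-a HEAD:refs/heads/feature-b

# An empty source in the refspec means "delete the destination".
git push -q origin :feature-a
# Equivalent, more readable spelling.
git push -q origin --delete feature-b

# Nothing should be left on the remote.
remaining=$(git ls-remote --heads origin)
echo "remote branches left: [$remaining]"
```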

Solving divergence

In any version control system, divergence is bound to happen, and in distributed ones, even more. Mercurial and Git solve this problem in very different ways; let’s see how by looking at a very simple divergent repository:

Diverged

As you can see we have a “Fix” in our local branch, but somebody already did an “Update” to this branch in the remote repository. Both Mercurial and Git would barf when you try to push this “Fix” commit, but let’s see how to solve it in each.

In Git this problem is called a “non fast-forward” push, which means that the tip of the remote branch (“Update”) is not an ancestor of “Fix”, so the branch cannot be fast-forwarded to “Fix”. There are three options: 1) force the push (git push --force), which basically means override “origin/master” to point to “master”, which effectively dumps “Update” 2) merge “Update” and “Fix” and then push 3) rebase “Fix” on top of “Update” and then push. Obviously dropping commits is not a good idea, so either a merge or a rebase is recommended, and both would create a new commit that can be fast-forwarded from “Update”.

In Mercurial, the problem is called “multiple heads”. In Git “origin/master” and “master” are two different branches, but in Mercurial, they are two heads of the same branch. To solve the problem, you can start by running “hg heads“, which will show you all the heads of all the branches; in this case “Fix” and “Update” would be the heads of the “default” branch (aka. “master”). Then you also have three options: 1) force the push (hg push --force), although in appearance it looks the same as the Git command, it does something completely different; it pushes the new head to the remote 2) merge and push 3) rebase and push (you need the rebase extension). Once again, the first option is not recommended, because it shifts the burden from one developer to multiple ones. In theory, the developer that is pushing the new commit would know how to resolve the conflicts in case they arise, so (s)he is the one that should resolve them, and not take the lazy way out and shift the burden to other developers.

Either way solves the problem, but Git uses remote namespaces, which I have already shown are useful regardless, while Mercurial requires the concept of multiple heads. That is one reason why the concept of “anonymous heads”, which is used as an example of a feature Mercurial has over Git, is not really needed.
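The fast-forward condition can be checked directly with ‘git merge-base --is-ancestor‘. The sketch below recreates the “Update”/“Fix” divergence in two throwaway repositories (an assumption, as are the "demo" identity and the can_ff helper), checks the condition, applies option 2 (merge), and checks again. It stops short of the actual push.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/origin"
git -C "$work/origin" symbolic-ref HEAD refs/heads/master
git -C "$work/origin" commit -q --allow-empty -m "Init"
git clone -q "$work/origin" "$work/clone"
git -C "$work/origin" commit -q --allow-empty -m "Update"
cd "$work/clone"
git commit -q --allow-empty -m "Fix"
git fetch -q

# Hypothetical helper: a push fast-forwards only if the remote tip ($1)
# is an ancestor of the local tip ($2).
can_ff() {
    if git merge-base --is-ancestor "$1" "$2"; then echo yes; else echo no; fi
}
before=$(can_ff origin/master master)

# Option 2 from the text: merge "Update" into the local branch, check again.
git merge -q --no-edit origin/master
after=$(can_ff origin/master master)
echo "fast-forward before merge: $before, after merge: $after"
```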

Mercurial bookmarks and the forced push problem

The biggest issue (IMO) I found with Mercurial bookmarks is how to create them in the first place. The issue is subtle, but it affects Git-like workflows, and especially Git<->Mercurial bridges; either way it’s useful to understand Mercurial’s design and behavior.

Suppose you have a very simple repository:

Simple repository

In Git, “feature-a” is a branch, and you can just push it without problems. In Mercurial, if “feature-a” is a bookmark, you can’t just push it, because if you do, the “default” branch would have two heads. To push this new bookmark, you need to do “hg push --force“. However, this only happens if the commit “Update” has been made. Also, you can push “feature-a” if it points to “Init”, and after pushing the bookmark, you can update it to include the “Feature A” commit. The end result is the same, but Mercurial barfs if you try to push the bookmarks and the commits at the same time, and there’s an update on the branch.

There’s no real reason why this happens; it’s probably baggage from the fact that Mercurial bookmarks are not an integral part of the design, and in fact began as an extension that was merged to the core in v1.8.

To work around this problem in git-remote-hg, I wrote my own simplified version of the push() method that ignores checks for new heads, because in Git there cannot be more than one head per branch. The code still checks that the remote commit of this branch is an ancestor of the new one; if not, you would need to do ‘git push --force’, just like in Git. Essentially, you get exactly the same behavior of Git branches, with Mercurial bookmarks.

Fixing Git

All right, I’m done trying to avoid judgement, but to try to be fair, I’ll start by mentioning the one (and only one) feature that Git lacks in comparison to Mercurial; find the branch-point of a branch, that is; the point where a branch was created (or rebased onto). It is trivial to figure that out visually, and there are scripts that do a pretty good job of finding that out from the topology of the repository, but there are always corner-cases where this doesn’t work. For more details on the problem and proposed solutions check the stackoverflow question.
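To illustrate the topology-based approach and why it is only an approximation, here is a minimal sketch using ‘git merge-base‘. It is not the @{tail} implementation, and the repository, the "demo" identity and the branch names are assumptions. It gives the right answer in this simple case, but once “development” is merged into “master” the merge base moves, which is one of the corner cases.

```shell
set -e
repo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Init"
init=$(git rev-parse HEAD)

# "development" is created on top of "Init"; both branches then move on.
git branch --no-track development
git commit -q --allow-empty -m "Update"
git checkout -q development
git commit -q --allow-empty -m "Feature"

# Approximate the branch point as the best common ancestor.
branch_point=$(git merge-base master development)
git log -1 --format='branch point: %s' "$branch_point"
```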

Personally I’ve never needed this, but if you absolutely need this, it’s easy to patch Git, I wrote a few patches that implement this:

https://github.com/felipec/git/commits/fc/base

This implements the @{tail} notation, which is similar to the official @{upstream} notation, so you can do something like “development@{tail}”, which will point to the first commit the “development” branch was created on.

If this was really needed, the patches could be merged to upstream Git, but really, it’s not.

Fixing Mercurial

On the other hand fixing Mercurial wouldn’t be that easy:

  1. Support remote ‘hg strip’. Just like Git can easily delete remote commits, Mercurial should be able to.
  2. Support remote namespaces for bookmarks. Being able to see where “sarah/development” points to is an invaluable feature.
  3. Improve bookmark creation, so the user doesn’t need to force the push depending on the circumstances.

Thanks to git-remote-hg, you can resolve 2) and 3) by using Git to work with Mercurial repositories. Unfortunately, there’s nothing anybody can do for 1); it’s something that has to be fixed in Mercurial’s core.

Conclusion

I often hear people say that what you can achieve with Git, you can achieve with Mercurial, and vice versa, and at the end of the day it’s a matter of preference, but that’s not true. Hopefully after reading this blog post, you are able to distinguish what can and cannot be done in each tool.

And again, as usual, all comments are welcome, so if you see a mistake in the article, by all means point it out.

Cheers.