Is ‘git pull’ broken? If so, what’s the fix?

Is ‘git pull’ really broken? I know what you are thinking; such a pervasive and basic command cannot possibly be broken. Unfortunately, it is.

It is not some marginal issue. Many experienced Git users avoid ‘git pull’ and urge newcomers to avoid it too; many sites encourage you not to use the command, and there have been a lot of threads on the mailing list about the issue (Pull is mostly evil, A failing attempt to use Git in a centralized environment). The maintainer, Junio C Hamano, has accepted there’s a big problem, and even Linus Torvalds agreed something needs to change.

In order to identify the problem we first need to define the two main ways ‘git pull’ is used.

Pull requests

One way ‘git pull’ is used is to integrate pull requests into the mainline. For example, in the Linux kernel, the DRM maintainer sends a pull request to Linus Torvalds, saying basically:

The following changes are available in the git repository at:

git://people.freedesktop.org/~airlied/linux drm-next

So Linus can just do:

git pull git://people.freedesktop.org/~airlied/linux drm-next

In this mode ‘git pull’ actually works fine, which is not too surprising, since it’s the main thing Linus Torvalds does.

However, this is not the way most people use ‘git pull’.

Update branch

What most people do is, for example, update their local ‘master’ branch to match the remote ‘origin/master’ branch, essentially doing ‘git fetch origin’ followed by ‘git merge origin/master’.

However, that’s not exactly what most people actually want to do.

If you don’t have any changes of your own in ‘master’, then yes, ‘git pull’ does what you want. But if you do have changes, and thus the branches have diverged, then ‘git pull’ will create a new merge commit. This might or might not be what you want, but the majority of Git newbies do not want that, or rather, the team they contribute to doesn’t want those “evil merges”. Unfortunately these newbies don’t know what they are doing, and Git is not making it easier.

So you end up with something like this:

[Image: git-pull]
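To make the parent order concrete, here is a minimal sketch in a throwaway repository. The repository names (‘upstream’, ‘local’) and the demo identity are made up for the illustration, and the fetch plus merge only approximates what a plain ‘git pull’ does.

```shell
set -e
# Throwaway identity so commits work without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial"
git clone -q upstream local
cd local

# Make the branches diverge: one new commit on each side.
git commit -q --allow-empty -m "local change"
git -C ../upstream commit -q --allow-empty -m "remote change"
local_tip=$(git rev-parse HEAD)

# Roughly what a plain 'git pull' does on 'master'.
git fetch -q origin
git merge -q -m "Merge origin/master" origin/master

remote_tip=$(git rev-parse origin/master)
first_parent=$(git rev-parse HEAD^1)    # the local work
second_parent=$(git rev-parse HEAD^2)   # the remote branch
git log --graph --oneline
```

The first parent of the merge is your local commit and the second is ‘origin/master’, so the remote history looks like the side branch.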

Most likely what the team wants is that the local changes are rebased on top of the remote ones. But if they want a merge, they want it the other way around, that is: merge the local changes into the remote ones, as if a topic branch was merged.

[Image: git-pull-fix]

A merge with this order of parents has many advantages, including a clearer history. However, it’s not possible to do that with ‘git pull’, so you have to do ‘git fetch’, create a new branch, switch to the master branch, merge the other branch, and finally remove the other branch. It’s not straightforward at all.
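One possible reading of those manual steps, sketched in a throwaway repository. The branch name ‘tmp-local’ and the repository names are hypothetical, and the ‘git reset --hard’ step is my assumption for how ‘master’ gets moved to the remote tip; it discards uncommitted changes, so treat this as an illustration rather than a recipe.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial"
git clone -q upstream local
cd local

# Diverge: one commit on each side.
git commit -q --allow-empty -m "local change"
git -C ../upstream commit -q --allow-empty -m "remote change"
local_tip=$(git rev-parse HEAD)

git fetch -q origin
git branch tmp-local                  # keep the local work under a temporary name
git reset -q --hard origin/master     # 'master' now sits at the remote tip
git merge -q --no-ff -m "Merge local changes" tmp-local
git branch -q -d tmp-local            # the temporary branch is merged, drop it

remote_tip=$(git rev-parse origin/master)
first_parent=$(git rev-parse HEAD^1)  # the remote branch
second_parent=$(git rev-parse HEAD^2) # the local work, like a topic branch
```

Here the first parent is ‘origin/master’ and the local commits hang off the merge as if they were a topic branch.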

It is this mode that is broken, and that’s the reason many people try to avoid ‘git pull’; it rarely does what you want by default.

The solution

Many solutions have been proposed. However, there are many, many use-cases to consider, and a solution that takes them all into consideration for the future is not easy to find.

The best solution that seems to accommodate all present use-cases and future ones is the introduction of a new command: ‘git update’.

By default this command will complain if the branches have diverged, so you have to do either ‘git update --rebase’ or ‘git update --merge’. This ensures that newbies aren’t going to do “evil merges” by mistake.

Also, when you do a ‘git update --merge’ the order of the parents is reversed: it appears you are merging ‘master’ into ‘origin/master’, and not the other way around as happens with ‘git pull’. In other words, it appears as if you are merging a topic branch, which is what most people want.

[Image: git-update]

There are many, many more advantages to this new command, but they are probably too subtle to mention in this post.

When will this be ready?

Probably never. I sent a summary of the issues and the solution to the mailing list, which addresses all the use-cases that were discussed. I have the required patches with tests and documentation on my personal branch, and I’ve been using this new command for a while now.

Why isn’t this picked up? Maybe it’s because none of the core developers experience these issues. Maybe it’s because they don’t use ‘git pull’ in the second form. Who knows.

The fact is that there is no interest in getting this fixed, even though the issue has been acknowledged, so it’s not likely to be fixed any time soon.

So what can you do about it? The best thing you can do right now is simply avoid using ‘git pull’. Additionally, you might want to instruct your coworkers to avoid using it as well, especially the ones that are not very familiar with Git.

Also, you might want to use my fork, git-fc, which does have the ‘git update’ command. It works better than ‘git pull’ even when there’s no branch divergence, and when there is, ‘git update --merge’ is also superior, because the order of the parents is right.


Using Git with triangular workflows; tips, tricks, and more

Chances are you are using a triangular workflow, even if you don’t know it. A triangular workflow simply means that you pull from one repository, and push to another. This is what the vast majority of Git users do. Unfortunately, most of the good stuff is buried in the nearly incomprehensible official manpages.

In this blog post I’ll try to shed some light on triangular workflows, show how to make use of the upstream tracking branch for them, and explain the new publish tracking branch.

The basics

Say you clone a repository:

% git clone https://github.com/tiimgreen/github-cheat-sheet
% cd github-cheat-sheet

Then you make some changes and want to share them back.

What most people would do is create a fork in GitHub and push their changes there.

% git remote add mine https://github.com/felipec/github-cheat-sheet
% git push mine

After doing that they do a pull request so their changes can be merged to the original repository.

This workflow is not specific to GitHub by any means. For example, the Linux kernel developers have the main repository on git.kernel.org, and they send pull requests by mail using repositories all over the map (example).

The help

If you do this over and over it becomes clear that a little help from Git would be nice.

The first thing you can do is set the configuration ‘remote.pushdefault’ to the repository you usually push to (in the above case ‘mine’). Now you can type `git push` instead of `git push mine` every time.
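A minimal sketch of that configuration in a throwaway setup. The bare repository ‘mine.git’ stands in for your personal fork; all paths and the demo identity are assumptions for the illustration, and it assumes the default ‘push.default’ of a reasonably recent Git.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial"
git init -q --bare mine.git           # stands in for your personal fork
git clone -q upstream local
cd local

git remote add mine ../mine.git
git checkout -q -b fix-typos
git commit -q --allow-empty -m "Fix a bunch of typos"

# Push to 'mine' whenever no remote is named on the command line.
git config remote.pushdefault mine
git push -q

pushed=$(git -C ../mine.git rev-parse refs/heads/fix-typos)
local_tip=$(git rev-parse HEAD)
```

After the bare `git push`, the ‘fix-typos’ branch exists in ‘mine’ and nothing was sent to ‘origin’.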

The next thing would be to set up an upstream tracking branch (read my blog post about it if you are not familiar with it).

% git branch --set-upstream-to mine/fix-typos

Then Git would greet you with the following help:

Your branch is ahead of 'mine/fix-typos' by 1 commit.

This is telling you that you probably want to push your branch again, since it’s not up-to-date in the remote. It shows you that each time you switch to that branch, or when you do `git status`.

Moreover, `git branch -vv` would show you this help:

* fix-typos ... [mine/fix-typos: ahead 1] Fix a bunch of typos

So it seems Git already has tons of help for this workflow, doesn’t it? Not so fast.

The real upstream

The upstream tracking branch is useful for other purposes, but for that we need to set a different upstream:

% git branch --set-upstream-to origin/master

Now the upstream is ‘master’ in the ‘origin’ remote, and when you run `git status`, you get:

Your branch and 'origin/master' have diverged,
and have 2 and 10 different commits each, respectively.

What that message is telling you is that ‘origin/master’ has moved, so there are 10 commits in ‘origin/master’ that your branch doesn’t have (and your branch has 2 commits ‘origin/master’ doesn’t have). In those cases you probably would want to rebase on top of ‘origin/master’ so that it’s easier for upstream maintainers to merge your branch, although you can merge ‘origin/master’ too, or simply do nothing and hope there are no conflicts. Either way the information is useful so you can decide what to do.
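If you want those two numbers on their own, they can be computed with ‘git rev-list’. This is a small sketch in a throwaway repository; it uses 2 local and 3 remote commits to keep the setup short, and the repository names are made up.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial"
git clone -q upstream local
cd local

# Two local commits, three new commits in the remote.
git commit -q --allow-empty -m "local 1"
git commit -q --allow-empty -m "local 2"
for i in 1 2 3; do
    git -C ../upstream commit -q --allow-empty -m "remote $i"
done
git fetch -q origin

# Left count: commits only in origin/master. Right count: commits only in HEAD.
set -- $(git rev-list --left-right --count origin/master...HEAD)
behind=$1
ahead=$2
echo "ahead $ahead, behind $behind"
```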

In addition, if you want to rebase, the command is easier; instead of `git rebase origin/master` you can just type `git rebase`, since `git rebase` by default uses the upstream tracking branch.

Moreover, if you always stay up-to-date, you can do `git pull --rebase`, which will fetch all the remote branches, and then rebase your current branch (e.g. ‘fix-typos’) on top of the upstream (e.g. ‘origin/master’). You can also configure ‘pull.rebase = true’ to always do this when you type `git pull`.
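A quick sketch of the ‘pull.rebase’ setting in action, again in a throwaway repository with made-up file and repository names. It works on ‘master’ for brevity; the same applies to any branch whose upstream is ‘origin/master’.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo base > upstream/base.txt
git -C upstream add base.txt
git -C upstream commit -q -m "initial"
git clone -q upstream local
cd local

# Diverge with real changes on both sides.
echo local > local.txt
git add local.txt
git commit -q -m "local change"
echo remote > ../upstream/remote.txt
git -C ../upstream add remote.txt
git -C ../upstream commit -q -m "remote change"

# With this set, a plain 'git pull' rebases instead of merging.
git config pull.rebase true
git pull -q

merge_count=$(git rev-list --count --merges HEAD)
top_subject=$(git log -1 --format=%s)
```

The history stays linear: the local commit ends up on top of the remote one and no merge commit is created.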

Not to mention that `git branch -vv` gives much more useful information:

* fix-typos ... [origin/master: ahead 2, behind 10] Fix a bunch of typos

Check how it looks in my real repository:

[Image: git branch -vv with upstream]

You get additional benefits too. For example, you get warned if you try to delete a branch that hasn’t been merged to its upstream:

warning: not deleting branch 'fix-typos' that is not yet merged to
'origin/master', even though it is merged to HEAD.
error: The branch 'fix-typos' is not fully merged.
If you are sure you want to delete it, run 'git branch -D fix-typos'.

This is actually what the upstream tracking branch is meant for: to track the upstream, that is, the target branch where all the commits of the source branch should eventually end up. All the commits of ‘fix-typos’ should end up in ‘origin/master’, therefore ‘origin/master’ is the upstream of ‘fix-typos’.

We want to have all the goodies of tracking ‘origin/master’ as our upstream, but we also want to track ‘mine/fix-typos’ so we know when we need to push. Unfortunately we can’t set them both as upstream, so we must choose one set of benefits over the other. Or should we?

The solution

The solution is not that hard to figure out: we need another upstream! Or rather; we need some concept that is similar to the upstream tracking branch, but instead of tracking the final destination, we track the location we push our commits to.

This is the publish tracking branch.

When you set it up, you get all the information:

Your branch and 'origin/master' have diverged,
and have 2 and 10 different commits each, respectively.
Some commits haven't been published to 'mine/fix-typos'.

* fix-typos ... [origin/master, mine/fix-typos *: ahead 2, behind 10]

Notice the extra ‘*’ next to the publish branch, which hints that it needs to be published.

Also, you can type `git pull` and `git rebase`, which will use the upstream branch as you would expect, and `git push` which will use the publish branch.

In other words, everything just works perfectly.

You set up the publish branch just like you set up the upstream branch:

% git branch --set-publish-to mine/fix-typos

Or:

% git push --set-publish mine

But wait, there’s more: you are not tied to push to a single remote; you can set different branches in different remotes as publish tracking. For example ‘fix-typos’ to ‘github/fix-typos’, ‘bug-fix’ to ‘client/bug-fix’, and so on. You can even choose a different branch name in the remote: ‘client-b-bug-fix’ to ‘client-b/bug-fix’.

Nice, isn’t it?
[Image: git branch -vv publish]
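For comparison only, stock Git can get part of the way there with the existing ‘branch.<name>.pushRemote’ setting: the upstream stays ‘origin/master’ while a bare `git push` goes to another remote. This is a rough approximation, not the publish branch itself; it does not give you the extra status line or the ‘*’ marker. All repository names are made up, and a default ‘push.default’ is assumed.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial"
git init -q --bare mine.git
git clone -q upstream local
cd local
git remote add mine ../mine.git

git checkout -q -b fix-typos
git commit -q --allow-empty -m "Fix a bunch of typos"

# Upstream: where the commits should eventually end up.
git branch --set-upstream-to origin/master >/dev/null
# Push destination for this branch only.
git config branch.fix-typos.pushRemote mine
git push -q

upstream_name=$(git rev-parse --abbrev-ref 'fix-typos@{upstream}')
pushed=$(git -C ../mine.git rev-parse refs/heads/fix-typos)
local_tip=$(git rev-parse HEAD)
```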

The problem

There is only one problem with the publish branch: it’s not in upstream git 😦

It is part of my fork, git-fc. If you use my fork, you will get this and other features, and you won’t lose any feature from official Git. Or you can use the specific branch, ‘fc/publish’.

I’ve been using this code for more than half a year, and it has been reviewed in the Git mailing list, so you can trust it won’t eat your babies 🙂

Why isn’t it in official Git?

WARNING: if you don’t like conflicts or you know me for “adversarial” style (and don’t like it), skip this section

That’s a very good question. If the maintainer (Junio C Hamano) has accepted that support for triangular workflows is lacking, and that a separate ‘upstream’ tracking branch is needed, why isn’t it there?

The short answer is that they have an ad hominem thing against me, so even if my patches are correct and they solve a long-standing problem, they are not applied. They are only picked if they are trivial, or not controversial, or obvious fixes. Which is why I started a fork.

I sent the original version of the patches in September 2013, and received virtually no comments. Then in January 2014 people started discussing (once again) the issues with triangular workflows, and even complained about the lack of @{publish}. Eventually they started writing preparatory patches. But I had already written the whole thing several months before!

It can’t be that the patches simply went unnoticed, because I re-sent the series once, and because I wrote about the support for @{publish} when I announced the git-fc fork.

Then I returned to the project after a long hiatus, and noticed they were working on something I had already done, so I let them know and sent the patches again. This time they received more feedback, and even made it into Junio’s “pu” (proposed updates) branch. Patches are often dropped from “pu”, sometimes for no reason at all, so this is no guarantee they will get in.

This is the message Junio attached to the patch series:

 Add branch@{publish}; it seems that this is somewhat different from
 Ram and Peff started working on.  There were many discussion
 messages going back and forth but it does not appear that the
 design issues have been worked out among participants yet.

The “design issues” have not been worked out because “Ram” is not actively working on Git anymore (possibly thanks to the fact that nothing ever changes), and “Peff” said he wasn’t interested in the @{publish} concept, but more like a @{push} concept which will only benefit him and his weird bare-bones mode of interacting with Git. The fact that the @{publish} concept is what would benefit a vast majority of the user base is of no consequence to “Peff”.

So will it ever get into Git’s mainline? Who knows.

Get the goodies

If you want to use the publish tracking branch feature, get git-fc and follow the installation instructions. In addition you will get a ton of other features, and will lose none 🙂

If you use ArchLinux, you can get the package from AUR.

Enjoy 🙂

Announcing git-fc; a friendly fork of Git

I’ll start with the obvious question: why a fork? Well, the short answer is that my patches are not being applied. The long answer is convoluted and would require a long explanation of how Git development works, its principles and guidelines, but more importantly the culture of the core developers. I’m not going to get into that here; maybe in the comments section if somebody is interested.

So what is git-fc? It is a friendly fork, and by that I mean that it’s a fork that won’t deviate from the mainline; it is more like a branch in Git terms. This branch will move forward close to Git’s mainline, and it could be merged at any point in time, if the maintainer wished to do so.

git-fc doesn’t include experimental code, or half-assed features, so you can expect the same level of stability as Git’s mainline. Also, it doesn’t remove any feature, or make any backwards-incompatible changes, so you can replace git with git-fc and you wouldn’t notice the difference. The delta comes in the extra features that I’ll describe in detail below; that is all.

Who am I? I’ve contributed many patches to Git, mainly the git-remote-hg/bzr two-way bridges, but also many, many other things. Here’s a list of the top 10 contributors to Git since last year by number of patches:

% git shortlog --since='1 year ago' --no-merges -n -s | head -n 10
   388	Junio C Hamano
   308	Felipe Contreras
   230	Jeff King
   161	Nguyễn Thái Ngọc Duy
   122	Michael Haggerty
   103	Ramkumar Ramachandra
    96	John Keeping
    69	Eric Sunshine
    59	Thomas Rast
    51	René Scharfe

More info in ohloh.

As you see, I’ve done a lot of work for Git’s mainline, so chances are you have already benefited from my code one way or the other.

However, the most interesting patches are not merged. I wrote a summary of my 160 patches, explaining their status, so Git developers would prioritize them, but I think it’s fair to say they are just not going to apply them.

So, what do you get if you use git-fc?

@ shortcut

Many people have suggested a shortcut for the non-particularly-intuitive “HEAD”, but none of these suggestions seemed very appealing, or feasible.

Because Git already has a ref@op revision syntax, where if you remove the ref, HEAD is implied, I thought @ could be thought of as HEAD.

This change was welcomed and accepted by the Git mainline, and it even was on track for v1.8.4, but it was dropped at the last minute because of some issues that are fixed now, and you probably will see it in v1.8.5. But why wait? 🙂
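A tiny sketch of the shortcut, assuming a Git build that includes it; the throwaway repository and commit messages are made up for the illustration.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q repo
cd repo
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# '@' on its own resolves to the same commit as 'HEAD'.
head_id=$(git rev-parse HEAD)
at_id=$(git rev-parse @)

# It also combines with the usual suffixes.
head_parent=$(git rev-parse 'HEAD~1')
at_parent=$(git rev-parse '@~1')
```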

Nice ‘branch -v’

If you have configured the upstream tracking branch for your branches (I wrote a blog post about them), when you do ‘git branch -v’ you see something like this:

  fc/branch/fast      177dcad [ahead 2] branch: reorganize verbose options
  fc/stage            abb6ad5 [ahead 14] completion: update 'git reset' ...
  fc/transport/improv eb4d3c7 [ahead 10] transport-helper: don't update ...

While that provides useful information, it doesn’t show the upstream tracking branch. It just says “ahead 2”, but “ahead 2” compared to what?

If you do ‘git branch -vv’, then you see the answer:

  fc/branch/fast      177dcad [master: ahead 2] branch: reorganize ...
  fc/stage            abb6ad5 [master: ahead 14] completion: update ...
  fc/transport/improv eb4d3c7 [master: ahead 10] transport-helper: don't ...

Unfortunately both options take a lot of time (relative to most Git commands which are instantaneous), because computing the “ahead 2” takes a lot of time. So I decided to switch things around, so ‘git branch -v’ gives you:

  fc/branch/fast      177dcad [master] branch: reorganize verbose options
  fc/stage            abb6ad5 [master] completion: update 'git reset' new ...
  fc/transport/improv eb4d3c7 [master] transport-helper: don't update refs ...

And it does so instantaneously.

Default aliases

Many (if not all) version control system tools have shortcuts for their most common operations; hg ci, svn co, cvs st. But not Git. You can configure your own aliases manually, but you might have some trouble if you use somebody else’s machine.

Adding default aliases is trivial, it helps everyone, and it doesn’t hurt anyone, yet the patch to do so was rejected.

For now, there are only four aliases, but more can be added later if they are requested.

co = checkout
ci = commit
rb = rebase
st = status

If you already have these aliases, or have them mapped to something else, your aliases take precedence over the default ones, so you won’t have any problems.
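Without the fork, the same four aliases can be configured by hand. The sketch below sets them in a throwaway repository’s local config so it doesn’t touch your real settings; on your own machine you would more likely use `git config --global`.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q repo
cd repo

# The same four aliases, set for this repository only.
git config alias.co checkout
git config alias.ci commit
git config alias.rb rebase
git config alias.st status

# Use one of them to make a commit.
git ci -q --allow-empty -m "first commit via alias"
subject=$(git log -1 --format=%s)
git st --short
```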

Streamlined remote helpers

I have spent a lot of time working on git-remote-hg and git-remote-bzr, and although they are relatively new, they have proven to be quite stable and solid, yet they are only part of the “contrib” area side by side with much simpler and way less solid scripts.

In order to use these with Git mainline you might need a bit of tinkering, and it’s not straightforward to package them for distributions.

With git-fc they are installed by default, and in the right way, making things easier for distributions.

Improvements to the transport helper

The two-way bridges between Git and Mercurial/Bazaar already work quite well, but they lack some features; specifically, you cannot do --force or --dry-run, or use an old:new refspec. If you are not familiar with the old:new refspec: you can do ‘git push origin master:my-master’, which pushes your ‘master’ branch as if it was named ‘my-master’ in the remote repository.

This is extremely useful if you are really serious about using Git as a transparent client to access a Mercurial repository.
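For readers who haven’t used the old:new refspec, here is a minimal sketch against a plain Git remote (not a Mercurial or Bazaar one). The bare repository ‘remote.git’ and the remote name ‘origin’ are assumptions for the illustration.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q --bare remote.git
git init -q local
cd local
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial"
git remote add origin ../remote.git

# Push local 'master' under the name 'my-master' in the remote.
git push -q origin master:my-master

local_tip=$(git rev-parse master)
remote_tip=$(git -C ../remote.git rev-parse refs/heads/my-master)
remote_branches=$(git -C ../remote.git for-each-ref --format='%(refname:short)' refs/heads)
```

The remote ends up with a single branch called ‘my-master’ pointing at the same commit as the local ‘master’.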

New core.mode configuration

Git is already preparing users for the v2.0 release, which would bring minor backward compatibility breakage, but some people would rather get rid of the warnings, which are probably going to stay for many more releases, and just move to the new behavior already.

Testing Git v2.0 behavior today would not only help git-fc, but also the Git mainline, and you can do that by setting core.mode = next, so if you do this and provide feedback about any issues, that would be greatly appreciated. Unfortunately you cannot test the v2.0 behavior in Git mainline because they rejected the patches, but you can in git-fc.

Please note that the v2.0 behavior might change in the future, before v2.0 is released, so if you enable this mode you need to be aware of that. Chances are you are not going to notice any difference anyway.

In addition to the “next” (v2.0) mode, there’s the “progress” mode. This mode enables “next” plus other configurations that have been proposed as default changes for v2.0, but haven’t yet been agreed on.

In particular, you get these:

merge.defaulttoupstream = true
branch.autosetupmerge = always
mergetool.prompt = false

There might be more in the future, and suggestions are welcome.
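The three settings listed above are ordinary Git configuration keys, so they can also be set one by one without ‘core.mode’. A minimal sketch, using a throwaway repository’s local config rather than your global one:

```shell
set -e
cd "$(mktemp -d)"
git init -q repo
cd repo

# Set the three options individually (repository-local for the demo).
git config merge.defaulttoupstream true
git config branch.autosetupmerge always
git config mergetool.prompt false

# Read them back.
default_to_upstream=$(git config merge.defaulttoupstream)
auto_setup_merge=$(git config branch.autosetupmerge)
mergetool_prompt=$(git config mergetool.prompt)
```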

It is recommended that you set up this mode for git-fc:

git config --global core.mode progress

Non-ff pulls rejected by default

Even in the Git project everybody has agreed this is the way to go in order to avoid the typical Git newbie making the mistake of doing a merge, when perhaps (s)he wanted to do git reset, or git rebase. With this change git complains that a non-fast-forward branch is being pulled, so the user has to decide what to do.

The user would have to do either ‘git pull --merge’ or ‘git pull --rebase’, the former being what Git mainline currently does.

The user can of course choose the old behavior, which is easy to configure:

git config --global pull.mode merge
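For comparison, stock Git can be asked for a similar refusal on a case-by-case basis with the existing ‘--ff-only’ option. This sketch is not the git-fc behavior itself, just the closest built-in I can point to; the repository names are made up.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial"
git clone -q upstream local
cd local

# Diverge: one commit on each side.
git commit -q --allow-empty -m "local change"
git -C ../upstream commit -q --allow-empty -m "remote change"

before=$(git rev-parse HEAD)
# A fast-forward is impossible here, so the pull is refused.
if git pull -q --ff-only 2>/dev/null; then
    outcome=updated
else
    outcome=refused
fi
after=$(git rev-parse HEAD)
echo "pull was $outcome"
```

The branch is left untouched, and the user then chooses between a merge and a rebase explicitly.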

Official staging area

Everybody already uses the term “staging area”, and Git developers have also agreed it is the best term for what is officially referred to as “the index”. So git-fc has new options for all commands that modify the staging area (e.g. git grep --staged, git rm --staged), and also adds a new git stage command that makes it easier to work with the staging area.

'git stage' [options] [--] [...]
'git stage add' [options] [--] [...]
'git stage reset' [-q|--patch] [--] [...]
'git stage diff' [options] [] [--] [...]
'git stage rm' [options] [--] [...]
'git stage apply' [options] [--] [...]
'git stage edit'

Without any command, git stage adds files to the stage, same as git add, same as in Git mainline.
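The plain form already works in Git mainline, where ‘git stage’ is a synonym for ‘git add’. A small sketch with a made-up file name (the subcommands listed above are git-fc only and are not shown here):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q repo
cd repo
git commit -q --allow-empty -m "initial"

echo hello > note.txt
git stage note.txt                       # same effect as 'git add note.txt'

# List what is currently in the staging area relative to HEAD.
staged=$(git diff --staged --name-only)
```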

New fetch.default configuration

When you have configured the upstream tracking branch for all your branches, you will probably have tracking branches that point to a local branch, for example feature-a pointing to master, in which case you would get something like:

% git fetch
From .
 * branch            master     -> FETCH_HEAD

Which makes absolutely no sense, since the ‘.’ repository is not even documented, and FETCH_HEAD is a marginally known concept. In this case git fetch is basically doing nothing from the user’s point of view.

So the user can configure fetch.default = simple to get a simple sensible default: ‘git fetch’ will always use origin by default, which is not ideal for everyone, but it’s better than the current alternative.

If you use the “progress” mode, this option is also enabled.

Publish tracking branch

Git mainline doesn’t have the greatest support for triangular workflows. A good solution for that is to introduce a second “upstream” tracking branch which is for the reverse: the branch you normally push to.

Say you clone a repository (libgit2) from GitHub, then create a branch (feature-a) and push it to your personal repository. You would want to track two branches, origin/master and mine/feature-a, but Git mainline only provides support for a single upstream tracking branch.

If you set up your upstream tracking branch to origin/master, then you can just do git rebase without arguments and git will pick the right branch (origin/master) to rebase to. However, git push by default will also try to push to origin/master, which is not what you want. Plus git branch -v will show how ahead/behind your branch is compared to origin/master, not mine/feature-a.

If you set up your upstream to mine/feature-a, then git push will work, but git rebase won’t.

With this option, git rebase uses the upstream branch, and git push uses the publish branch.

Setting the publish tracking branch is easy:

git push --set-publish mine feature-a

Or:

git branch --set-publish mine/feature-a

And git branch -v will show it as well:

fc/branch/fast      177dcad [master, gh/fc/branch/fast] branch: ...
fc/stage            abb6ad5 [master, gh/fc/stage] completion: ...
fc/transport/improv eb4d3c7 [master, gh/fc/transport/improv] ...

Support for Ruby

By far the most complex and interesting feature, but unfortunately also the one that is not yet 100% complete.

There is partial optional support for Ruby. Git already has tooling so any language can use its plumbing and achieve plenty of tasks:

IO.popen(%w[git for-each-ref]) do |io|
  io.each do |line|
    # Default output is "<sha1> <type>\t<refname>"; split on whitespace
    sha1, kind, name = line.split
    # stuff
  end
end

However, this a) requires a process fork, and b) requires I/O communication to get the desired data. While this is not a big deal on many systems, it is on Windows systems, where forks are slow and many Git core programs don’t work as well as they do in Linux.

Git has a goal to replace all the core scripts with native C versions, but it’s a goal only in name that is not actually pursued. In addition, that still leaves out any third party tools since Git doesn’t provide a shared libgit library, which is why an independent libgit2 was needed in the first place.

Ruby bindings solve these problems:

for_each_ref() do |name, sha1, flags|
  # stuff
end

The command ‘git ruby’ can use this script by providing the bindings for many of Git’s internal C functions (though not all), which makes it easier to write Ruby programs that take full advantage of Git without any need of forks, or I/O communication.

Conclusion

As you might guess, I’ve spent a lot of time working on all these features, plus all the ones that are already merged in Git’s mainline. Hopefully they are useful to some people.

It’s easy to compile and install:

make install

By default git will be installed in your home directory, but you can also do what I do: ‘make prefix=/opt/git install’, and add ‘/opt/git/bin’ to your $PATH. All you need is a few development packages: zlib, curl, expat, openssl.

The code is in Github, the home page is in Google code, and the mailing list in Google groups. All comments and patches are welcome.

You can find future comments and releases in this blog, under the git-fc tag.
