Why is git pull broken?

A lot of people complained that my previous post, “git update: the odyssey for a sensible git pull”, was too long (really? an article documenting 13 years of discussions was long?), and that a shorter version would be valuable. The problem is that the short version is actually too short:

Do not use git pull.

That’s it, really.

But why? Even though it’s obvious to me and many other developers why git pull is broken and should not be used by most users, presumably a lot of people don’t know that, since they continue to use it.

Here it is.

Caveat

Let’s start by explaining where git pull is not broken.

It was created for maintainers; when you send a pull request, a maintainer is supposed to run git pull on the other side. For this git pull works perfectly fine.

If you are a developer (non-maintainer), and use a topic branch workflow, then you don’t even need git pull.

That leaves developers who work on a centralized workflow (e.g. trunk-based development). Unfortunately they are the vast majority of users, especially novices, and the rest of the article is written with them in mind.

It creates merge commits

What most people want to do is synchronize their local branch (e.g. “master”) with the corresponding remote branch (e.g. “origin/master”), in particular because if they don’t, git push fails with:

To origin
! [rejected] master -> master (non-fast-forward)
error: failed to push some refs to 'origin'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull …') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

OK, so we need to “integrate” the remote changes with git pull. Presumably git pull is the mirror of git push, so it makes sense.

Except it’s not, and git pull was never designed with this use case in mind. The mirror of git push is git fetch which simply pulls the remote changes locally so you can decide later on how to integrate those changes. In Mercurial hg pull is the equivalent of git fetch, so in Mercurial hg push and hg pull are symmetric, but not in git.
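To see the difference in practice, here is a minimal sketch in a throwaway sandbox. The repository names `upstream` and `local`, the file names, and the `commit` helper are all made up for the illustration; the point is only that git fetch moves “origin/master” and leaves “master” where it was.

```shell
set -e
# Throwaway sandbox; "upstream" and "local" are hypothetical repository names.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: write message $1 into file $2 and commit it.
commit() { echo "$1" > "$2"; git add "$2"; git commit -q -m "$1"; }

git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit A base.txt)
git clone -q upstream local

# The remote gains commits B and C after we cloned.
(cd upstream && commit B remote.txt && commit C remote.txt)

cd local
before=$(git rev-parse master)
git fetch -q origin
after=$(git rev-parse master)
remote=$(git rev-parse origin/master)

# The local branch has not moved; only the remote-tracking branch has.
git log --oneline --decorate --graph --all
```

Nothing has been integrated yet at this point; that decision is still yours to make.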

At some point in time a path to make git pull be symmetric to git push was delineated, but the maintainer of the Git project considered it “mindless mental masturbation”, so forget about it.

After you have pulled the changes with git fetch, there are two possibilities: fast-forward and diverging.

fast-forward

A fast-forward is simple; if the local branch and the remote branch have not diverged, then the former can be easily updated to the latter.

In this case “master” (A) can be fast-forwarded to “origin/master” (C) (only possible if the branches have not diverged).
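A rough sketch of the fast-forward case, under the same assumptions as before (a throwaway sandbox, with hypothetical `upstream` and `local` repositories and a made-up `commit` helper). `git merge-base --is-ancestor` checks that “master” has not diverged, and `git merge --ff-only` refuses to do anything other than a fast-forward.

```shell
set -e
# Throwaway sandbox; "upstream" and "local" are hypothetical repository names.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: write message $1 into file $2 and commit it.
commit() { echo "$1" > "$2"; git add "$2"; git commit -q -m "$1"; }

git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit A base.txt)
git clone -q upstream local
(cd upstream && commit B remote.txt && commit C remote.txt)

cd local
git fetch -q origin

# A fast-forward is possible only if master is an ancestor of origin/master.
if git merge-base --is-ancestor master origin/master; then
    git merge -q --ff-only origin/master
fi

# No merge commit was needed: master simply moved from A to C.
merges=$(git rev-list --count --merges master)
git log --oneline --decorate --graph --all
```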

merge

However, if the branches have diverged, it’s not that easy:

In this case “master” (D) has split off from “origin/master” (C), so a new commit (E) is needed to synchronize both.

rebase

There’s another more advanced possibility if the branches have diverged:

In this case the diverging commit of the local branch “master” (D) is recreated on top of “origin/master” (C) so the resulting history is linear (as if it never had diverged in the first place and the base of the local branch was C).
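The rebase case can be sketched the same way. The layout is an assumption for the illustration: the remote has A, B and C, while the local commit D was made on top of A. The names `upstream`, `local` and the `commit` helper are hypothetical.

```shell
set -e
# Throwaway sandbox; "upstream" and "local" are hypothetical repository names.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: write message $1 into file $2 and commit it.
commit() { echo "$1" > "$2"; git add "$2"; git commit -q -m "$1"; }

git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit A base.txt)
git clone -q upstream local
(cd upstream && commit B remote.txt && commit C remote.txt)

cd local
commit D local.txt          # local work based on A: the branches have diverged
git fetch -q origin

# Recreate D on top of origin/master (C) instead of merging.
git rebase -q origin/master

# The history is now linear, with D as the newest commit.
history=$(git log --format=%s master | xargs)
echo "$history"
```

After the rebase, D is a new commit whose parent is C, so a later git push is a plain fast-forward on the remote side.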

Choices

OK, so if the branches have diverged you have two options (merge or rebase), which one should you pick? The answer is: it depends.

Some projects prefer a linear history; in those cases you must rebase. Other projects prefer to keep the history intact, so it’s fine if you merge. If you don’t make many changes, then most of the time you can fast-forward.

Most experts would do a rebase, but if you are new to git a merge is easier.

We are still nowhere near a universal answer, and what do most people do when the answer is not clear? Nothing. By default git pull does a merge, so that’s what most people end up doing by omission, but that’s not always right.

So that’s the first problem: git pull creates a merge commit by default, when it shouldn’t. People should be doing git fetch instead and then decide whether to merge or rebase if the branches have diverged (a fast-forward is not possible).
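One possible way to make that decision explicit is to count the commits on each side after the fetch. The sketch below is only an illustration: `sync_state` is a hypothetical helper, not a git command, and the sandbox repositories `upstream` and `local` are made up. It relies on `git rev-list --left-right --count`, which prints how many commits are reachable only from each side.

```shell
set -e
# Throwaway sandbox; "upstream" and "local" are hypothetical repository names.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: write message $1 into file $2 and commit it.
commit() { echo "$1" > "$2"; git add "$2"; git commit -q -m "$1"; }

# Hypothetical helper: classify branch $1 against branch $2.
sync_state() {
    # rev-list prints two counts: commits only in $1, then commits only in $2.
    set -- $(git rev-list --left-right --count "$1...$2")
    if   [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then echo up-to-date
    elif [ "$1" -eq 0 ]; then echo fast-forward
    elif [ "$2" -eq 0 ]; then echo ahead
    else echo diverged
    fi
}

git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit A base.txt)
git clone -q upstream local
(cd upstream && commit B remote.txt && commit C remote.txt)

cd local
commit D local.txt
git fetch -q origin

state=$(sync_state master origin/master)
case $state in
    fast-forward) git merge -q --ff-only origin/master ;;
    diverged)     echo "diverged: choose 'git merge origin/master' or 'git rebase origin/master'" ;;
    *)            echo "nothing to integrate ($state)" ;;
esac
```

Only the fast-forward is applied automatically here; in the diverged case the script stops and leaves the choice to you, which is the whole point.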

Merges are created in the wrong order

Let’s say the project allows merges, in that case it’s OK to just do git pull (since the default action is merge) right?

Wrong.

This is what git pull does by default: a merge commit. However, it’s merging “origin/master” (C) into “master” (D), but upstream is the remote repository, not the local one.

The order is wrong:

This is a correct merge: the local “master” (D) is merged into the remote “origin/master” (C). A similar result would happen if you had created a topic branch for D, and then merged that into “master”.

In git, merge commits are commits with more than one parent, and the order matters. In the example above the first parent of E is C, and the second one is D. To refer to the first parent you do master^1, the second is master^2.
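The parent order can be checked directly with `git rev-parse`. The sketch below is an illustration under assumptions: the sandbox repositories `upstream` and `local` and the `commit` helper are hypothetical, and `--no-rebase` is passed to git pull so the merge is requested explicitly, since defaults can vary with the Git version and configuration. The second half is just one manual way to get the other parent order (it leaves the result on a detached HEAD); it is not the proposed git update command.

```shell
set -e
# Throwaway sandbox; "upstream" and "local" are hypothetical repository names.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: write message $1 into file $2 and commit it.
commit() { echo "$1" > "$2"; git add "$2"; git commit -q -m "$1"; }

git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit A base.txt)
git clone -q upstream local
(cd upstream && commit B remote.txt && commit C remote.txt)

cd local
commit D local.txt
D=$(git rev-parse HEAD)

# git pull with a merge: origin/master (C) is merged into master (D).
git pull -q --no-rebase --no-edit
C=$(git rev-parse origin/master)
pull_first=$(git rev-parse master^1)     # the local commit D
pull_second=$(git rev-parse master^2)    # the upstream commit C

# The other order: start from origin/master (C) and merge D into it.
git checkout -q --detach origin/master
git merge -q --no-edit "$D"
fixed_first=$(git rev-parse HEAD^1)      # the upstream commit C
fixed_second=$(git rev-parse HEAD^2)     # the local commit D
```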

Proper history

Does it really matter which is the first parent? Yes it does.

Correct vs. incorrect order

In the correct history (left) it’s clear how the different topic branches are integrated into “master” (blue). Visualization tools (e.g. gitk) are able to represent such history nicely. Additionally you can do git log --first-parent to traverse only the main commits (blue).

In the incorrect history (right) the merges are a mess. It’s not clear what merged into what, visualization tools will show a mess, and git log --first-parent will traverse the wrong commits (green ones).
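The effect on `git log --first-parent` is easy to reproduce. In this sketch (hypothetical sandbox repositories `upstream` and `local`, made-up `commit` helper), the first merge has the same shape as the one git pull creates, and the second one has the parents the other way around. Both merge commits are named E so the two listings can be compared.

```shell
set -e
# Throwaway sandbox; "upstream" and "local" are hypothetical repository names.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: write message $1 into file $2 and commit it.
commit() { echo "$1" > "$2"; git add "$2"; git commit -q -m "$1"; }

git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit A base.txt)
git clone -q upstream local
(cd upstream && commit B remote.txt && commit C remote.txt)

cd local
commit D local.txt
D=$(git rev-parse HEAD)
git fetch -q origin

# Same shape as a git pull merge: origin/master merged into master.
git merge -q -m E origin/master
pull_style=$(git log --first-parent --format=%s master | xargs)

# Reversed order: D merged into origin/master (left on a detached HEAD).
git checkout -q --detach origin/master
git merge -q -m E "$D"
upstream_first=$(git log --first-parent --format=%s HEAD | xargs)

echo "pull-style:     $pull_style"
echo "upstream-first: $upstream_first"
```

With the pull-style merge the first-parent walk goes through the local commit D and skips the upstream commits B and C; with the reversed order it follows the upstream line.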

Better conflict resolution

If that wasn’t enough, at the time of resolving conflicts it makes more sense to think of integrating your changes to upstream (“origin/master”) rather than the other way around. Mergetools like meld would present the flow correctly: from right to the middle.

Consensus

Update: In the original version of the article I only concentrated on the facts, and I didn’t include the opinion of other developers, but since there seem to be a lot of people ignoring the facts, and distrusting my judgement, I’ve decided to list some of the developers who agree git pull isn’t doing what it should be doing (at least by default, for non-maintainers).

Conclusion

So every time you do a merge with git pull, you do it wrong. The only way to use git pull correctly is to configure it to always do a rebase, but since most newcomers don’t know what a rebase is, that’s hardly a universal solution.
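For reference, this is roughly what that configuration looks like, using the existing `pull.rebase` setting. The sketch also tries `pull.ff=only` first, which makes git pull refuse anything that is not a fast-forward. The sandbox repositories `upstream` and `local` and the `commit` helper are hypothetical.

```shell
set -e
# Throwaway sandbox; "upstream" and "local" are hypothetical repository names.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: write message $1 into file $2 and commit it.
commit() { echo "$1" > "$2"; git add "$2"; git commit -q -m "$1"; }

git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit A base.txt)
git clone -q upstream local
(cd upstream && commit B remote.txt && commit C remote.txt)

cd local
commit D local.txt          # the branches have now diverged

# Option 1: accept only fast-forwards; on a diverged branch the pull fails.
if git -c pull.ff=only pull -q 2>/dev/null; then
    ff_only=accepted
else
    ff_only=refused
fi

# Option 2: always rebase local commits on top of the fetched branch.
git config pull.rebase true
git pull -q
merges=$(git rev-list --count --merges master)
git log --oneline --decorate --graph --all
```

Use `git config --global pull.rebase true` instead if you want the setting for every repository rather than just this one.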

The proper solution is my proposal for a git update command that creates merge commits with the correct order of the parents, does only fast-forwards by default, and can be properly configured.

So there you have it. Now you know why git pull is definitely broken and should not be used. It was never intended to be used by normal users, only maintainers.

Do git fetch instead, and then decide how to integrate your changes to the remote branch if necessary.

8 thoughts on “Why is git pull broken?”

  1. Pingback: git update: the odyssey for a sensible git pull – Felipe Contreras

  2. You wouldn’t believe how many developers don’t understand this and how many more do not even care when you try to explain. Your writing a post about this is an act of heroism at this point.

  3. Great article, one piece of commentary: it would be great if the article actually included instructions/examples to choose between a merge and rebase after the git fetch!

    I use something similar to the following:

    git fetch origin master
    git log master..origin/master (look at changes)

    Then what? (git merge FETCH_HEAD or git rebase FETCH_HEAD?)

    How do you execute the correct merge scenario described above?

    • OK. I’m not opposed to that, but most of those options are not necessary: if you are in “master”, then “master” is implied. And whether or not you have an upstream branch configured, “origin” will end up being used, so `git fetch origin master` is the same as `git fetch`.

      Then, I would not recommend a merge, so `rebase` it is. `git rebase FETCH_HEAD` would be the best, but if you did `git fetch` as before, and you have configured the upstream branch, `git rebase` suffices.

      Granted, **if** you haven’t configured the upstream branch, then `FETCH_HEAD` is needed. I’ll consider it for a bit.

  4. There is one subtle difference:

    git fetch implies a destination in the refspec and applies a fast-forward if it can.

    git fetch origin master has no destination in the refspec, so it will consequently show meaningful differences for the git log command, enabling you to then run `git merge --ff-only origin/master` if preferred, or do a rebase.

    I admit a lot of this is academic and has little practical consequence. Either way, I found your article really useful for understanding some finer points I had missed!

    Cheers

  5. Thanks for the great article, it really helped me understand git’s behavior.

    I would like to share this article with colleagues in my country who are not native English speakers.
    Could you allow me to translate this article (and post it to other sites)?
