Understanding the Enigma machine

I was fascinated by the movie The Imitation Game, not just because it brings awareness to a great man who advanced our civilization tremendously, and to the great injustice he suffered, but also because it presents the study of cryptanalysis, something most people don’t even know exists, yet it’s incredibly important when dealing with information, especially in our modern day and age.

However, the Enigma machine to me was simply that: an enigma. I’m not a mechanic, so if you put that thing in front of me, it would take me forever to understand what it does, if I ever managed to find the interest to do so. I was happy thinking it was a magic box.

That was until Albert Still decided to implement the machine in Ruby (my favorite computer language), which he explained in a blog post. I’m a programmer; code I understand, and this was 30 lines. In a minute I understood the machine (literally).

I was blown away by its simplicity, and I thought: hey! anybody can understand this. And ultimately that’s the beauty of cryptography: it doesn’t matter if you know exactly how the algorithm works; you still cannot decrypt the message. This is what security today relies on; everybody knows the algorithms running in your web browser, yet you are secure accessing your bank account, because those algorithms are cryptographically secure. The phrase “cryptographically secure” might not mean much to most people, but it’s really important.

I will try to explain how the Enigma machine works in simple terms; if you are a programmer, you might be better off just reading the code.

The reflector

You don’t need to understand this code, but it might help to understand the algorithm.

CHARS = ('A'..'Z') # the machine's alphabet; this definition is assumed from the original post

$reflector = Hash[*CHARS.to_a.shuffle] # pair the 26 shuffled characters into 13 pairs
$reflector.merge!($reflector.invert)   # add the reverse of each pair

[Image: the reflector pairings]

What this means is that we pair each one of the 26 characters (A to Z) with another one, randomly. So, for example, W is paired with L, which means that whenever we find a W, we switch it with an L, and whenever we find an L, we switch it with a W.

If we run this algorithm with the text HI, we get RF (H=>R, I=>F); pretty simple. The interesting thing is what happens when we feed this back to the algorithm: it becomes HI again (R=>H, F=>I). This is why it’s called a reflector.
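In Ruby, running a message through the reflector takes a couple of lines (my own illustration, not part of the original code):

def reflect(text)
  text.chars.map { |c| $reflector[c] }.join
end

reflect('HI')          # => e.g. 'RF' (the pairs are random on each run)
reflect(reflect('HI')) # => 'HI' again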

This is actually so simple that you don’t even need a machine to do the conversion; you can do it manually by looking at a piece of paper with the mapping. And there’s nothing cryptographic about this: if the Enigma machine only had this algorithm, you would only need to steal one machine, and you could decipher every message immediately. It’s not cryptographically secure at all.

You intercept the message RFJWNH, you feed this to the machine, and you get HITLER. And that’s it.

Let’s put a cryptographic value to this algorithm: 0. It’s useful, but not for cryptographic reasons.

The rotor

Let’s jump to something more complicated.

$rotor = Hash[CHARS.zip(CHARS.to_a.shuffle)] # map each character to a random character

[Image: the rotor mapping, starting at K]

This time each character gets another character, randomly, and there’s no reciprocity (A=>K, K=>V). Here is the twist: the rotor starts at K, but that is configurable, so let’s say tomorrow it starts at N; then the associated values rotate, and you get this:

[Image: the rotor mapping, rotated to start at N]

Now it’s not so easy any more. You receive the message DGOKIP, but you can’t do anything with it unless you know what the first value, or “key”, was (in this case it was E). The only alternative is to do what is called a brute force attack: you try every possibility. Fortunately there are only 26 possibilities, so soon enough you will stumble upon the key E, and unlock the message: HITLER.

[Image: the rotor mapping with key E]

The value of this is: 26. It’s not much, but it’s better than zero.
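The attack is easy to sketch in Ruby (again my own illustration; it assumes you know the rotor's wiring, but not its starting position):

VALUES = $rotor.values # the rotor's output sequence

# try all 26 rotations; with the rotor that produced the
# message, one of these 26 lines reads HITLER
26.times do |n|
  rotor = Hash[CHARS.zip(VALUES.rotate(n))]
  puts 'DGOKIP'.chars.map { |c| rotor.invert[c] }.join
end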

The rotor, part two

We’ve managed to make things a bit difficult for our cryptanalysts; however, if, say, they notice the character G appearing too often in today’s messages, they’ll assume that perhaps G is actually a vowel. We need to make things more difficult for them.

Right now, the message III would be encrypted into GGG; that’s too easy. Instead, what we can do is rotate the first part of the rotor each time a character is processed, so III becomes GDM (I=>G, rotate, I=>D, rotate, I=>M):

[Images: the rotor stepping after each of the three characters]
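In Ruby, the stepping might look like this (a sketch of mine, not the original code; the output wiring rotates by one position after every character):

def encrypt_rotor(text)
  text.chars.map do |c|
    out = $rotor[c]
    # step: rotate the output side of the rotor by one position
    $rotor = Hash[$rotor.keys.zip($rotor.values.rotate)]
    out
  end.join
end

encrypt_rotor('III') # => something like 'GDM'; each I encrypts differently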

This doesn’t really increase the possibilities to test, but makes their job harder.

The rotor, part three

Since the thing is already rotating, it would make sense to start at something other than A. This starting position is also part of the key, and again, you need to get it right in order to decrypt the message properly.

So you have 26 ways to configure the rotor, and 26 ways to start it; now the value is: 26 × 26 = 676. It would take quite a bit of time to go through each and every possibility now.

The plugboard

This is where the fun begins.

$plugboard = Hash[*CHARS.to_a.shuffle.first(20)] # pair 20 random characters into 10 pairs
$plugboard.merge!($plugboard.invert)             # like the reflector, the pairs go both ways

[Image: the plugboard pairings]

We take 20 random characters and pair them with each other. In a way, this is similar to the reflector, except this is configurable, and this time we are not picking 1 out of 26; the number of combinations is much larger than that.

The formula for the number of ways to choose m pairs out of n objects is: n! / ((n − 2m)! m! 2^m). We are picking 10 pairs out of 26 objects, so: 26! / (6! 10! 2^10). The result is: 150,738,274,937,250.
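You can verify that number with a couple of lines of Ruby:

fact = ->(n) { (1..n).reduce(1, :*) }

# 10 pairs out of 26 objects: 26! / (6! * 10! * 2^10)
fact[26] / (fact[6] * fact[10] * 2**10) # => 150738274937250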

That would take a bit more to test :/

More rotors

Each rotor needs 676 tries to brute force, so why not add two more? That moves us up to 676³ = 308,915,776.

While we are at it, make the order of the rotors part of the daily key; that’s 3 × 2 × 1 = 6 possibilities.

And why not add two more rotors to pick from, so every day you pick 3 out of 5; that’s 5 × 4 × 3 = 60 possibilities (this count already includes the ordering from above).

In total, that’s 308,915,776 × 60 = 18,534,946,560 just from the rotors.

And hey, make them rotate at different speeds to make the job of the analysts even harder.

Bring it home

Put everything together, and the process goes like this:

[Diagram: the full Enigma machine]

  1. Plugboard
  2. Rotor 1
  3. Rotor 2
  4. Rotor 3
  5. Reflector
  6. Rotor 3
  7. Rotor 2
  8. Rotor 1
  9. Plugboard
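In code, one character’s round trip might look like this (my own sketch; it assumes the three rotors are kept in an array $rotors of hashes like the one above, and it ignores the stepping):

def encrypt_char(c)
  c = $plugboard[c] || c                       # 1. plugboard (unpaired letters pass through)
  $rotors.each { |r| c = r[c] }                # 2-4. forward through the rotors
  c = $reflector[c]                            # 5. reflector
  $rotors.reverse_each { |r| c = r.invert[c] } # 6-8. back through the rotors, inverted
  $plugboard[c] || c                           # 9. plugboard again
end

Because the reflector is reciprocal and the return trip inverts the rotors, feeding the ciphertext to the same machine with the same key produces the plaintext again; the same machine both encrypts and decrypts.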

So, here is a simple message: YWXRVH. In order to decrypt it you need the full key: the whole plugboard, the configuration of the rotors, and their starting positions. Even if I tell you the original message was HITLER, you would still need to do a lot of work.

For the record, this was the key used to generate that message:

I V III, BFR, SD HY GM EB UO LJ WZ QT AC FR, OIZ

If you try every key until you find it, you would potentially need 2,793,925,870,508,516,103,360,000 tries. Clearly, pure brute force is not the way to solve the problem :/
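The arithmetic checks out:

rotors    = 18_534_946_560      # 3 out of 5 rotors, with their configurations
plugboard = 150_738_274_937_250 # 10 plugboard pairs
rotors * plugboard # => 2793925870508516103360000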

This is just the machine itself; on top of that there were many protocols to encipher the message even further, but let’s just leave it at that.

Back to the present

That is the power of cryptography: understanding the machine, understanding the algorithm, gives you absolutely no leverage; that is the easy part. You are supposed to understand it, and still be unable to crack it.

The algorithm in Enigma is puny compared to modern algorithms, which are incredibly complex and have a lot of research behind them. That’s what keeps the communication with your bank secure, and even though most people don’t know it, you can use these algorithms to send secure messages to anyone, messages that in theory not even the government with the most powerful supercomputers can decrypt.

I think it’s time we stop saying “this is not rocket science”; rocket science is easy, we should be saying “this is not cryptanalysis”.

The white and gold dress, and the illusion of free will

At first I didn’t really understand what all the fuss was about; the dress was obviously white and gold, and everybody that saw it any other way was wrong, end of story. However, I saw an article in IFLScience that explained why this might be an optical illusion, but I still thought I was seeing it right; the other people were the ones getting it wrong. Then I saw the original dress:

[Image: the original dress (#TheDress)]

Well, maybe it was a different version of the dress, or maybe the colors were washed out, or maybe it was a weird camera filter, or a bug in the lens. Sure, everything is possible, but maybe I was just seeing it wrong.

I’ve read and heard a lot about cognitive science, and the more we learn about the brain, the more faults we find in it. We don’t see the world as it is; we see the world as it is useful for us to see it. In fact, we cannot see the world as it is, in atoms and quarks; we cannot, because we don’t even fully understand it yet. We see the world in ways that managed to get us where we are. We sometimes get an irrational fear of the dark and run quickly up the stairs in our safe home, even if we know there can’t possibly be any tigers chasing us; in the past it was better to be safe than sorry, and the ones that didn’t have that fear gene are not with us any more; they got a Darwin award.

I know what some people might be thinking: my brain is not faulty! I see the world as it truly is! Well, sorry to burst your bubble, but you don’t. Optical illusions are a perfect example, and here is one:

[Image: optical illusion of two identical orange spots, one in shadow]

If you are human, you will see the orange spot at the top as darker than the one at the bottom. Why? Because your brain assumes the one at the bottom is in a shadow, and therefore it should be darker. However, they are exactly the same color (#d18600 in hex notation). Remove the context, and you’ll see that; put the context back, and you can’t see them the same, you just can’t, and all of us humans have the same “fault”.

This phenomenon can be explained by the theory of color constancy, and these faults are not limited to our eyes, but ears, and even rational thinking.

So, could the white and gold vs. blue and black debate be an example of this? The argument is that the people who see the dress as white and gold perceive it to be in a shadow behind a brightly lit part of a room, while the people who see it as blue and black see it washed in bright light. Some people say they can see it both ways: sometimes white, sometimes blue.

[Image: the relevant XKCD comic]

I really did try not to see it in a shadow, but I just couldn’t, even after I watched modified photos; I just saw a white and gold dress with a lot of contrast. I decided they were all wrong; no amount of lighting would turn a royal blue dress into white.

But then I fired up GIMP (the open source counterpart of Photoshop) and played around with filters. Eventually I found what did the trick for me, and here you can see the progress:

So eventually I managed to see it. Does that mean I was wrong? Well, yes: my brain saw something that wasn’t there. However, it happened for a reason; if the context had been different, what my brain saw would have been correct. Perhaps in a parallel universe there’s a photo that looks exactly the same, but the dress actually was white and gold.

At the end of the day our eyes are the windows through which we see reality, and they are imperfect, just like our brains. We can be one hundred percent sure that what we are seeing is actually there, that what we remember is what happened, and that we are being rational in a discussion. Sadly one can be one hundred percent sure of something, and still be wrong.

To me the most perfect example is the illusion that we are in control of our lives. The more science finds out about the brain, the more we realize how little we know of what actually happens in the 1.5 kg meatloaf between our ears. You are not in control of your next thought any more than you are of my next thought, and when people try to explain their decisions, their reasons are usually wrong. Minds can be easily manipulated, and we rarely realize it.

There’s a lot of interesting stuff on the Internet about the subconscious and how the brain really works (as far as we know). Here is one talk that I find particularly interesting.

So, if you want to believe you are the master of your own will, go ahead, you can also believe the dress was white and gold. Those are illusions, regardless of how useful they might be. Reality, however, is different.

My favorite public intellectuals

Here’s a selection of my favorite public intellectuals. I love how these guys talk and write, and generally everything they do. It might be worth checking them out :)

Sam Harris

Sam Harris is an author, philosopher, and neuroscientist. Among his most notable books are The End of Faith and The Moral Landscape. He has a blog, is on Twitter, appears on many TV shows as a guest, has taken part in many debates as well as lengthy talks, and has written numerous articles for respectable publications such as The New York Times.

His topics mostly concentrate around religion, faith, morality, and science.

What I like most about Sam Harris is the way he conveys very complex and nuanced ideas very effectively. He is very precise with words and has the patience to go on for ages in order to explain his ideas, but he is also very witty and can deliver crushingly funny one-liners.

@samharrisorg

In the following video Harris is in a debate with a religious apologist and shows, with a very funny train of thought, the ridiculousness of believing in things without evidence.

This is a quick talk at TED in which he explains how science can answer moral questions, which is the main idea behind The Moral Landscape.

Finally, my favorite talk, in which he basically destroys the idea of free will. Every minute of this hour-long talk is pure gold.

Steven Pinker

Steven Pinker is an experimental psychologist, cognitive scientist, linguist, and popular science author. He is best known for his advocacy of evolutionary psychology, and the computational theory of mind.

Being an expert on language, the way he communicates in every medium is simply superb. Aside from linguistics, he goes into other topics, such as the history of violence, religion, and reason.

@sapinker

Here Pinker explains why taboos are bad, and why political correctness can be dangerous.

This is a quick video where Pinker explains the importance of language in order to understand human nature.

Here’s a much longer version in which he goes into a lot of detail to explain language, and what we know about it.

Noam Chomsky

Noam Chomsky should need no introduction: he is a linguist, philosopher, cognitive scientist, logician, political commentator, and anarcho-syndicalist activist. He has written hundreds of books and countless articles, has taken part in many debates, and constantly gives talks all around the globe; in fact, he has done so many things in his life that there is even a documentary devoted to him: Noam Chomsky: Rebel Without a Pause. Not content with defining the whole field of modern linguistics at an early age, he devoted his life to political activism, even risking the well-being of his own family. Today he is considered the most influential living intellectual, and one of the most cited authors of all time, right after Plato. Even at his advanced age, and after losing his wife of almost 60 years, he continues to tirelessly inform the public about what happens in the world, and as he has stated before, he will continue to do so as long as he is ambulatory.

Chomsky might not be the most entertaining public speaker, but what he lacks in charisma, he makes up for in content. He is basically a human encyclopedia, and he rarely states his own opinion; everything he says is basically facts gathered from one place or another, and for every fact he mentions, he knows the reference where you can verify it.

It’s hard to find a short video that shows Chomsky’s brilliance, but this interview seems to do the job perfectly. Watch this interviewer get completely owned by Chomsky. Don’t forget part two.


Manufacturing Consent is one of Chomsky’s most powerful ideas, and if you are not in the mood to read the book, this documentary explains the idea very well. It’s long, but you won’t regret watching it.

Sorry Lennart, but you are wrong once again

Lennart Poettering’s post on G+ is gathering a lot of attention these days. Most of the feedback is supportive and positive, which is not surprising to me, because although Poettering would like us to believe otherwise, most of the open source community is pretty accommodating and non-confrontational.

I am, however, going to go against the current here and criticize him. But first let me state clearly that I do not condone any physical attacks towards his person, or threats of such. His ideas, however, are a different matter.

Lennart’s chief mistake is to attack the way the Linux kernel community is run, and to say its success happens despite this. How does he know? Has he ever run a more successful community? Has anybody ever? Linux is the most successful software project in history, by more than an order of magnitude any way you look at it. It would be presumptuous for anybody to claim they know how to run this project better, especially without any evidence to back such a claim, which is precisely what Poettering is doing.

In this blog I’ve analyzed the many reasons why the Linux kernel is so successful, and one of them is its combative style of discussion, in which ideas are not exempt from ridicule, and strong language is often used to drive one’s point home as efficiently as possible. Many people in the community agree this is desirable, and there’s even scientific evidence that supports this notion: the best ideas arise in a confrontational environment, not in a protective one.

What’s more, Poettering himself accepts he hasn’t been involved in this community. So what the hell does he know about it? Nothing.

Poettering’s second mistake is to assume that for non-white, non-western, non-straight people the situation surely must be worse… That is not the case. Maybe, just maybe, he receives such vitriolic feedback not just because of what he does, but because of the horrible way he does it. But of course not: Poettering doesn’t need to change, his approach is perfect; in fact, the only reason he receives criticism is because he is too progressive, too audacious, too efficient. Surely, that must be the reason!

Personally, my beef with Poettering starts from the fact that he blocked me on Google+. Why? Because I was complaining about a technical issue with systemd, which he initially spotted and commented on, but then ignored. In the middle of the discussion I made some value judgements about certain systemd code, and he stopped responding and blocked me. That is the worst way to end a discussion: block the people who disagree with you.

Sorry Lennart, but actions have consequences, and you can only make so many disruptive changes to the Linux ecosystem without much care or consideration for others; there’s a limit to the number of people you can block and the amount of criticism you can ignore. You can grow as thick a skin as you want; you are still wrong. No community is going to let you continue being wrong and acting as if you are beyond reproach just like that (unless you run that community and have blocked any dissident voices, of course).

Maybe it’s time to take a hard look in the mirror.

What’s missing in Git v2.0.0

I recently blogged about the Git v2.0.0 release: what changed, and why you should care. Unfortunately the conclusion was that not much changed (other than the usual new features and bug fixes). In this post I will discuss what should have changed, and why.

What is needed

Fortunately, Git has had the Git User’s Survey in the past, so we know what users want. Here is how respondents rated the areas of Git that need improvement (higher means more improvement is needed):

  1. user-interface: 3.25
  2. documentation: 3.22
  3. tools (e.g. GUI): 3.01
  4. more features: 2.41
  5. portability: 2.34
  6. performance: 2.28
  7. community (mailing list): 1.70
  8. localization (translation): 1.65
  9. community (IRC): 1.65

Obviously, since user-interface and documentation are the areas that need the most improvement, that’s what Git v2.0.0 should have focused on, right?

History

I already mentioned this in the other post, but I’ll do it again.

First of all, Git has a long history of never breaking user expectations (other than the Git v1.6.0 fiasco, which changed all the git-foo commands to ‘git foo’), and as such a lot of thought is devoted to ways of minimizing changes in behavior, or even avoiding them completely. Perhaps too much care is devoted to this.

The preparation for Git v2.0.0 started more than three years ago with a mail from Junio C Hamano, asking developers to submit ideas for changes that normally would not happen because they break backwards compatibility; he invited us to think as if “we were writing Git from scratch”. This big release that would break backwards compatibility was going to be named “1.8.0”, and people started to submit ideas for this important release. Eventually too much time passed, the versioning scheme changed, v1.8.0 was released, and the changes proposed for v1.8.0 slipped into what is now v2.0.

Since no substantial changes in behavior had happened since v1.0, it would follow that v2.0 was an important release, and a good opportunity to gather all the ideas about what needs to change in Git. However, seemingly out of nowhere, without any discussion or even a warning, the maintainer tagged v2.0.0-rc0, and therefore all the features that were not already merged couldn’t make it into v2.0.0.

Thus v2.0.0 was destined to have a small list of changes, and that’s how it remained.

What could have changed

The following is a list of things that I argued should be part of Git v2.0.0.

git update

I wrote a whole post about the issue, but basically, ‘git pull’ is broken for the most common use-case: updating the current branch.

This is a known issue that has been discussed over and over, and everyone agrees that it is indeed an issue, and something needs to be done to fix it.

There have been different proposals, but by far the most comprehensive and simple is to add a new ‘git update’ command.

This way when you want to merge a pull request, you do ‘git pull’, and when you just want to update the current branch, you do ‘git update’, which by default would barf if there’s divergence between your local branch (e.g. ‘master’) and the remote one (e.g. ‘origin/master’), instead of doing a merge by default. This should substantially decrease the number of “evil merges”: merges that happen by mistake, usually made by somebody who is not familiar with Git.
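Usage would look something like this (the flags are the ones from the proposal; the command only exists in my own branch):

git update           # update the current branch; barfs if it has diverged
git update --rebase  # rebase your local changes on top of the remote ones
git update --merge   # merge, with the order of the parents reversed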

The patches are relatively new, but the command is simple, so there isn’t much danger of screwing things up.

The publish tracking branch

I also wrote a blog post about this; basically Git’s support for triangular workflows is not the best.

A triangular workflow is when you pull from one location (e.g. a central repo) and push to another (e.g. a personal GitHub fork). If you are using upstream tracking branches (you should), you have to decide where to set your upstream: the central repo, or your personal one. Each choice gives you a different set of advantages, but you cannot have it all.
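For illustration, this is roughly how a triangular setup looks with stock Git (the URLs are made up):

git clone https://central.example.org/project.git
git remote add mine https://github.com/you/project.git
git config remote.pushDefault mine   # fetch from 'origin', push to 'mine'

Even with this, your upstream tracking branch can still point to only one of the two remotes.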

But with the publish tracking branch you can have all the advantages.

I’ve been cooking these patches for a long, long time, and I have to say this is one essential feature for me, and the patches work perfectly.

Support for Mercurial and Bazaar

Support for Mercurial and Bazaar repositories has been cooking for a long time in the “contrib” area (you can both pull and push). At this point the code is production-ready, and it had already graduated and been merged, slated for release in Git v2.1.

However, the maintainer suddenly changed his mind and decided it would be better to distribute them as third party tools. He didn’t give any valid reason and clearly didn’t think it through, but they are now separate.

The code is already widely used (git-remote-hg, git-remote-bzr), and could easily be merged.
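With git-remote-hg available, a Mercurial repository behaves like any other remote (the URL is illustrative):

git clone hg::https://example.org/some-mercurial-repo
cd some-mercurial-repo
git pull   # fetch new Mercurial changesets
git push   # push your commits back as Mercurial changesets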

Use “stage” instead of “index”

Everybody agrees that “index” is a horrible name for Git’s “staging area”, however, nobody has done much to fix the problem.

A first step is to replace the --cached and --index options with --staged and --no-work, which are much simpler to understand.

Another step is to add a ‘git stage’ command that acts as a helper to work with the staging area: ‘git stage add’, ‘git stage diff’, ‘git stage reset’, ‘git stage rm’, ‘git stage edit’, and so on.
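Roughly, the helpers would map to existing commands like this (my reading of the proposal; these subcommands are not in stock Git):

git stage add file.c     # git add file.c
git stage diff           # git diff --cached
git stage reset file.c   # git reset -- file.c
git stage rm file.c      # git rm --cached file.c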

The patches are very straightforward.

Default aliases

Virtually every version control system has default aliases (e.g. hg co, cvs ci, svn di, etc.), except Git.

Adding default aliases is very simple to do and brings only advantages. If you don’t like a default alias, you can override it.
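Until that happens, you have to define them yourself, for example:

git config --global alias.co checkout
git config --global alias.ci commit
git config --global alias.st status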

Patches here.

Shoulda coulda woulda

It would have been great if you could just do ‘git clone hg::mercurial-repo’ without installing anything extra, if everybody could start using ‘git update’ instead of ‘git pull’, if you could do ‘git stage diff’, or ‘git reset --stage’. Also, if triangular workflows were properly supported.

Unfortunately that’s not the case, and Git v2.0.0 is already released, and there isn’t much to be excited about.

You might think “perhaps for Git v3.0” (which could happen in two years, or ten, who knows), but if the past is any indication of the future, it won’t happen, especially since I’ve given up on all these patches.

The fact of the matter is that in every release of Git there is only one focus: performance. Despite the fact that it’s #6 on the list of user concerns, Git developers work on it because that’s their area of expertise, because it’s fun for them, and because they get paid to do so. There are occasional new features, and a bit of portability now and then, but for the most part Windows support is neglected in Git, which is why the msysgit project was born.

The documentation will always remain cryptic, because for the developers, it’s not cryptic, it’s very clear. And the user-interface will never change, because the developers don’t like change.

If you don’t believe me, look at the backwards-incompatible changes in Git v2.0.0, or in fact, try to think back to the last time Git changed anything. Personally, other than the git-foo → ‘git foo’ change in v1.6.0 (which was horribly handled), I can’t think of anything but minor changes.

Anyway, you can use all these features I listed today (and more) if you use git-fc instead of Git. It is my own fork of Git that has all the features of Git, plus more.

Is there anything in that list that I missed? Do you think Git v2.0.0 has enough changes as it is?

Git v2.0.0, what changed, and why should you care

Git v2.0.0 is a backward-incompatible release, which means you should expect differences from the v1.x series.

Unless you’ve been following the Git mailing list closely, you probably don’t know the history behind the v2.0 release, which started a long time ago (more than three years). It all started with a mail from Junio C Hamano, asking developers to submit ideas for changes that normally would not happen because they break backwards compatibility; he invited us to think as if “we were writing Git from scratch”. This big release that would break backwards compatibility was going to be named “1.8.0”, and people started to submit ideas for this important release. Eventually too much time passed, the versioning scheme changed, v1.8.0 was released, and the changes proposed for v1.8.0 slipped into what is now v2.0.

Parts of v2.0 have already been deployed one way or another (for example if you have configured ‘push.default = simple’), but today we finally have v2.0 final. Here are the big changes we got.

‘git push’ default has changed

Here’s what the release notes say:

When "git push [$there]" does not say what to push, we have used the
traditional "matching" semantics so far (all your branches were sent
to the remote as long as there already are branches of the same name
over there).  In Git 2.0, the default is now the "simple" semantics,
which pushes:

 - only the current branch to the branch with the same name, and only
   when the current branch is set to integrate with that remote
   branch, if you are pushing to the same remote as you fetch from; or

 - only the current branch to the branch with the same name, if you
   are pushing to a remote that is not where you usually fetch from.

You can use the configuration variable "push.default" to change
this.  If you are an old-timer who wants to keep using the
"matching" semantics, you can set the variable to "matching", for
example.  Read the documentation for other possibilities.

Is that clear? Given the bad track record of Git documentation it wouldn’t surprise me if you didn’t get what this chunk of text is trying to say at all. Personally I find it much easier to read the code to figure out what is happening.

So let me try to explain. When you type ‘git push’ (without any arguments), Git uses the configuration ‘push.default’ in order to find out what to push. Before, ‘push.default’ defaulted to ‘matching’; now it defaults to ‘simple’.

The ‘matching’ configuration essentially converts ‘git push’ into ‘git push origin :’, which means push all the matching branches. So if you have a local ‘master’ and there’s a remote ‘master’, ‘master’ is pushed; if you have a local and a remote ‘fix-1’, ‘fix-1’ is pushed; if you have a local ‘ext-feature-1’, but there’s no matching remote branch, it’s not pushed; and so on.

The ‘simple’ configuration pushes a single branch instead, and it uses your configured upstream branch (see this post for a full explanation of the upstream branch), so if your current branch is ‘master’, and ‘origin/master’ is the upstream of your ‘master’ branch, ‘git push’ will basically be the same as ‘git push origin master’, or to be more specific ‘git push origin master:master’ (the upstream branch can have a different name).

Note: If you are not familiar with the src:dst syntax; you can push a local branch ‘src’ and have the ‘dst’ name on the server, so you don’t need to rename a local branch, you can do ‘git push origin foobar:feature-a’, and your local branch “foobar” will be named “feature-a” on the server. This has nothing to do with v2.0.

However, if the current branch is ‘fix-1’ and the upstream is ‘origin/master’, ‘git push’ will complain that the name of the destination branch is not the same, because it doesn’t know whether to do ‘git push origin fix-1:master’ or ‘git push origin fix-1:fix-1’.

Additionally, if you do ‘git push github’ (not the remote of your upstream branch), Git will simply use the name of the current branch, essentially doing ‘git push github fix-1’ (‘fix-1’ being the name of the current branch).

This mode is anything but simple to describe. But perhaps the name is OK, because you can expect it to “simply work”.

Would I care?

If you don’t type ‘git push’, but instead specify what and where to push… you don’t care.

If you have already configured ‘push.default’, which most likely you have, because otherwise you would have been getting the following annoying message all the time for the past two years… you don’t care.

warning: push.default is unset; its implicit value is changing in
Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the current behavior after the default changes, use:

  git config --global push.default matching

To squelch this message and adopt the new behavior now, use:

  git config --global push.default simple

When push.default is set to 'matching', git will push local branches
to the remote branches that already exist with the same name.

In Git 2.0, Git will default to the more conservative 'simple'
behavior, which only pushes the current branch to the corresponding
remote branch that 'git pull' uses to update the current branch.

See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)

So, most likely you don’t care.

‘git add’ in directory

Here’s what the release notes say:

When "git add -u" and "git add -A" are run inside a subdirectory
without specifying which paths to add on the command line, they
operate on the entire tree for consistency with "git commit -a" and
other commands (these commands used to operate only on the current
subdirectory).  Say "git add -u ." or "git add -A ." if you want to
limit the operation to the current directory.

Although this is a clearer explanation, it’s not very clear what is changing, so let me give you an example.

Say you have modified two files, ‘README’ and ‘test/basic.t’, then you go to the ‘test’ directory and run ‘git add -u’. In pre-v2.0 only ‘test/basic.t’ will be staged; in post-v2.0 both files will be staged. If you run the command from the top-level directory, nothing changes.
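As a transcript (with the two modified files from the example):

cd test
git add -u     # pre-v2.0: stages only test/basic.t
               # post-v2.0: stages README as well
git add -u .   # stages only test/basic.t in any version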

Would I care?

If you haven’t seen the following warning while doing ‘git add -u’ or ‘git add -A’, or if you don’t even use those options, you are fine.

warning: The behavior of 'git add --update (or -u)' with no path argument from a
subdirectory of the tree will change in Git 2.0 and should not be used anymore.
To add content for the whole tree, run:

  git add --update :/
  (or git add -u :/)

To restrict the command to the current directory, run:

  git add --update .
  (or git add -u .)

With the current Git version, the command is restricted to the current directory.

‘git add’ adds removals

Here’s what the release notes say:

"git add <pathspec>" is the same as "git add -A <pathspec>" now, so that
"git add dir/" will notice paths you removed from the directory and
record the removal.  In older versions of Git, "git add <pathspec>" used
to ignore removals.  You can say "git add --ignore-removal <pathspec>" to
add only added or modified paths in <pathspec>, if you really want to.

Again, it should be clearer with an example. Say you removed the file ‘test/basic.t’ and added a new file ‘test/main.t’. Those changes are not staged, so you stage them with ‘git add test/’. Pre-v2.0, ‘test/basic.t’ would remain in the index; post-v2.0, its removal is staged as well.
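Or as a transcript:

rm test/basic.t
touch test/main.t   # the new file
git add test/
# pre-v2.0:  test/main.t is staged, but test/basic.t stays in the index
# post-v2.0: test/main.t is staged, and so is the removal of test/basic.t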

Would I care?

If you haven’t seen the following warning while doing ‘git add’, you are fine.

warning: You ran 'git add' with neither '-A (--all)' or '--ignore-removal',
whose behaviour will change in Git 2.0 with respect to paths you removed.
Paths like 'test/basic.t' that are
removed from your working tree are ignored with this version of Git.

* 'git add --ignore-removal <pathspec>', which is the current default,
  ignores paths you removed from your working tree.

* 'git add --all <pathspec>' will let you also record the removals.

Run 'git status' to check the paths you removed from your working tree.

The rest

The "-q" option to "git diff-files", which does *NOT* mean "quiet",
has been removed (it told Git to ignore deletion, which you can do
with "git diff-files --diff-filter=d").

Most people don’t use this command, thus don’t care.

"git request-pull" lost a few "heuristics" that often led to mistakes.

Again, most people don’t use this command, which is mostly broken anyway.

The default prefix for "git svn" has changed in Git 2.0.  For a long
time, "git svn" created its remote-tracking branches directly under
refs/remotes, but it now places them under refs/remotes/origin/ unless
it is told otherwise with its "--prefix" option.

If you don’t use ‘git svn’, you don’t care. If you don’t see a difference between ‘trunk’ and ‘origin/trunk’, you don’t care.

tl;dr

You probably don’t care about these backward-incompatible changes. Sure, Git v2.0.0 received a good dose of new features and bug fixes, but so did v1.9.0, and all the versions before it.

Given that Git v2.0.0 had been cooking for three years, I think it’s a big missed opportunity that nothing really changed, especially given that in previous user surveys people said the user-interface and documentation need to improve, and there have been patches that try to do so. In a separate post I discuss what I think Git v2.0.0 should have included.

Is ‘git pull’ broken? If so, what’s the fix?

Is ‘git pull’ really broken? I know what you are thinking: such a pervasive and basic command cannot possibly be broken. Unfortunately, it is.

It is not some marginal issue; many experienced Git users avoid ‘git pull’ and even urge newcomers to avoid the command, there are many sites that encourage you not to use it, and there have been a lot of threads on the mailing list about the issue (Pull is mostly evil, A failing attempt to use Git in a centralized environment). The maintainer, Junio C Hamano, has accepted there’s a big problem, and even Linus Torvalds agreed something needs to change.

In order to identify the problem we first need to define the two main ways ‘git pull’ is used.

Pull requests

One way ‘git pull’ is used is to integrate pull requests into the mainline. For example in the Linux kernel, the DRM maintainer sends a pull request to Linus Torvalds, saying basically:

The following changes are available in the git repository at:

git://people.freedesktop.org/~airlied/linux drm-next

So Linus can just do:

git pull git://people.freedesktop.org/~airlied/linux drm-next

In this mode ‘git pull’ actually works fine, which is not too surprising, since it’s the main thing Linus Torvalds does.

However, this is not the way most people use ‘git pull’.

Update branch

What most people do is, for example, update their local ‘master’ branch to the remote ‘origin/master’ branch, essentially doing ‘git fetch origin’, ‘git merge origin/master’.

However, that’s not exactly what most people actually want to do.

If you don’t have any changes of your own in ‘master’, then yes, ‘git pull’ does what you want, but if you do have changes, and thus the branches have diverged, ‘git pull’ will create a new merge commit. This might or might not be what you want, but the majority of Git newbies do not want that; or rather, the team they contribute to doesn’t want those “evil merges”. Unfortunately these newbies don’t know what they are doing, and Git is not making it easier.

So you end up with something like this:

[Diagram: the merge ‘git pull’ creates]

Most likely what the team wants is for the local changes to be rebased on top of the remote ones, but if they do want a merge, they want it the other way around; that is: merge the local changes into the remote ones, as if a topic branch were being merged.

[Diagram: the same merge with the order of the parents reversed]

A merge with this order of parents has many advantages, including a clearer history; however, it’s not possible to do that with ‘git pull’, so you have to do ‘git fetch’, create a new branch, switch to the master branch, merge the other branch, and finally remove the other branch. It’s not straightforward at all.
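By hand, the dance looks something like this (a sketch; it assumes ‘master’ is checked out):

git fetch origin
git branch topic                 # give your local changes a name
git reset --hard origin/master   # make 'master' match the remote branch
git merge topic                  # merge them back as if they were a topic branch
git branch -d topic              # clean up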

It is this mode that is broken, and that’s the reason many people try to avoid ‘git pull’: it rarely does what you want by default.

The solution

There have been many solutions proposed; however, there are many, many use-cases to consider, and a solution that takes them all into consideration, present and future, is not easy to find.

The best solution that seems to accommodate all present use-cases and future ones is the introduction of a new command: ‘git update’.

By default this command will complain if the branches have diverged, so you have to do either ‘git update --rebase’ or ‘git update --merge’; this ensures that newbies aren’t going to do “evil merges” by mistake.

Also, when you do a ‘git update --merge’, the order of the parents is reversed, which means it appears you are merging ‘master’ into ‘origin/master’, and not the other way around as happens with ‘git pull’; it appears as if you are merging a topic branch, which is what most people want.

[Diagram: the merge created by ‘git update --merge’]

There are many, many more advantages to this new command, but they are probably too subtle to mention in this post.

When will this be ready?

Probably never. I sent a summary of the issues and the solution to the mailing list, which addresses all the use-cases that were discussed. I have the required patches with tests and documentation on my personal branch, and I’ve been using this new command for a while now.

Why haven’t these patches been picked up? Maybe it’s because none of the core developers experience these issues. Maybe because they don’t use ‘git pull’ in the second form. Who knows.

The fact is that there is no interest in getting this fixed, even though the issue has been acknowledged, so it’s not likely to be fixed any time soon.

So what can you do about it? The best thing you can do right now is simply avoid using ‘git pull’. Additionally, you might want to instruct your fellow coworkers to avoid using it as well, especially the ones that are not very familiar with Git.

Also, you might want to use my fork, git-fc, which does have the ‘git update’ command; it works better than ‘git pull’ even when there’s no branch divergence, and when there is, ‘git update --merge’ is also superior, because the order of the parents is right.