Salaries in Mexico and INEGI's bad statistics

(This one was originally written in Spanish since it mainly interests Mexican people.)

I grew up in Mexico in what I considered the middle class, but later I lived in the U.S. and in Europe, so perhaps my idea of the different classes drifted away from reality. I thought a monthly salary of $20,000 pesos (a modest amount in first-world countries) would be considered middle class, and when someone told me the average salary was $8,000, I didn't believe it. That is how I started looking for the salaries of the different classes in Mexico, which turned out not to be as easy as it seemed.

ENIGH

INEGI ran a survey (the Encuesta Nacional de Ingresos y Gastos de los Hogares, the National Survey of Household Income and Expenditure) that supposedly contains this data; however, these are the results:

I $1,674
II $3,033
III $3,977
IV $4,900
V $5,959
VI $7,183
VII $8,800
VIII $11,313
IX $16,012
X $42,120
* Quarterly figures.

In theory all the information is there and the job is done, but there is a problem when trying to understand these numbers. The table is titled “ingreso corriente total promedio trimestral per cápita en deciles de personas” (average total current quarterly income per capita, in deciles of people). There we see the ten groups, but a decile is defined as “any of the nine values that divide the sorted data into ten equal parts”. Nine values, and the table has ten: those numbers are not deciles.

Basically, the table is completely useless. If a person earns $10,000 per quarter, are they in group VII or VIII? The number we need to answer that is not in this table.
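To make the distinction concrete, here is a minimal Ruby sketch on a made-up list of incomes (hypothetical values, not ENIGH data). Deciles are the nine cutoffs; the ten numbers in the INEGI table look, as far as I can tell, like the average of each of the ten groups, which answers a different question. The `quantile` helper below uses the simple nearest-rank definition; R's `quantile()` interpolates by default, so its results can differ slightly.

```ruby
# Hypothetical per capita incomes (NOT real ENIGH data), already sorted, 20 values
incomes = [500, 700, 800, 900, 1000, 1100, 1200, 1300, 1500, 1600,
           1800, 2000, 2200, 2500, 2800, 3200, 4000, 5000, 8000, 20000]

# Nearest-rank quantile: the value at rank ceil(p * n) in the sorted data.
# Rational avoids floating point surprises in p * n.
def quantile(values, p)
  sorted = values.sort
  sorted[(p * sorted.size).ceil - 1]
end

# Deciles: the NINE cutoffs that split the data into ten equal parts
deciles = (1..9).map { |k| quantile(incomes, Rational(k, 10)) }

# Group averages: the mean of each of the TEN groups
group_means = incomes.sort.each_slice(incomes.size / 10).map { |g| g.sum.to_f / g.size }

puts "deciles:     #{deciles.inspect}"
puts "group means: #{group_means.inspect}"
```

Note how a single large income drags the last group average far up, while the cutoffs are unaffected by how rich the richest person is.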

By hand

Fortunately INEGI provides the original data, and thanks to my programming skills I was able to do the manipulations needed to extract the data of interest. Unfortunately, in the process I realized that INEGI's tables are full of errors, so I had to do the calculations myself.

10% $632
20% $959
30% $1,234
40% $1,541
50% $1,907
60% $2,358
70% $3,012
80% $4,024
90% $6,476

These numbers are actual deciles, and it's easy to know which group you belong to. If your monthly income is $3,000 pesos (and you have no family to support), you earn more than 60% of the population, and almost 70% (group VII). Conveniently, it's also easy to see the median (50%), which is $1,907; that is, 50% of the population earns less than $1,907 and 50% earns more.
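Finding your group is just a matter of counting how many cutoffs lie below your income. A small Ruby sketch using the monthly per capita cutoffs from the table above (the helper name `decile_group` is made up for illustration):

```ruby
# Monthly per capita decile cutoffs from the table above (pesos)
CUTOFFS = [632, 959, 1234, 1541, 1907, 2358, 3012, 4024, 6476]
GROUPS = %w[I II III IV V VI VII VIII IX X]

# The cutoffs are sorted, so a binary search finds the first cutoff that is
# not below the income; its index is the number of groups below you.
def decile_group(monthly_income)
  index = CUTOFFS.bsearch_index { |c| c >= monthly_income } || CUTOFFS.size
  GROUPS[index]
end

puts decile_group(3000)   # the example from the text
puts decile_group(10_000)
```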

Similarly, we can divide the population into three groups:

lower: less than $1,329
middle: $1,329 to $2,777
upper: more than $2,777

It seems hard to believe, but these numbers can be verified easily. INEGI's sample consists of 19,479 households; filtering for those with a per capita income above $2,777, the result is 6,491 (33.32307%). It's worth mentioning that the numbers are per capita. That is, if you earn $8,000 pesos and support a family of 4, each person is considered to receive an income of $2,000 pesos; more details below.
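Both calculations are simple enough to sketch in a few lines of Ruby. The helper names are hypothetical; the figures (the family of 4, 6,491 out of 19,479) come from the text. Keep in mind the thresholds are monthly: the R code in the Method section divides the quarterly `ingreso_pc` by 3.

```ruby
# Per capita income: total household income split evenly among its members
def per_capita(household_income, members)
  household_income.to_f / members
end

# Percentage of values strictly above a threshold
def share_above(values, threshold)
  100.0 * values.count { |v| v > threshold } / values.size
end

puts per_capita(8000, 4)  # the family-of-4 example from the text

# The sample check from the text: 6,491 of 19,479 households above $2,777
share = 100.0 * 6491 / 19479
puts share.round(5)
```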

The numbers for the super rich are:

91% $6,926
92% $7,433
93% $8,028
94% $8,750
95% $9,674
96% $10,853
97% $12,713
98% $15,858
99% $20,724

Averages

Averages paint a very different picture. For example, the upper class is anyone above $2,777, but there is a big difference between an income of $3,000 and one of $300,000 pesos; both fall in the same range, and when you average everyone in that range, the result ends up far from what most of them earn. The average of the upper class (top 33%) is $6,795, the average of the top 10% is $13,115, and the average of the top 1% is $37,644.

That's why averages are dangerous. The average for the whole country is $3,187, yet the average of the bottom 99% is $2,839; adding the top 1%, with its average of $37,644, raises it considerably (2839 * 0.99 + 37644 * 0.01).
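A quick Ruby illustration: a small hypothetical list with one very large income shows how the mean moves while the median barely does, and the last lines redo the weighted average from the text.

```ruby
# Mean vs. median on a small hypothetical list with one very large income
sample = [1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 300_000]
mean = sample.sum.to_f / sample.size
sorted = sample.sort
# Median for an even number of values: average of the two middle ones
median = (sorted[sorted.size / 2 - 1] + sorted[sorted.size / 2]) / 2.0

puts mean    # dragged up by a single value
puts median  # barely affected

# The national figure from the text as a weighted average of two groups
overall = 0.99 * 2839 + 0.01 * 37644
puts overall.round(2)
```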

Details

There are many details behind these numbers, but in general they are the income of the whole household (salaries, profit sharing, rents, earnings from one's own business, and transfers) divided by the number of members. For some reason the aguinaldo (the mandatory year-end bonus) is not counted.

If we add the aguinaldo and ignore the members who don't receive any income (e.g. children), the result is a bit more positive.

10% $895
20% $1,416
30% $1,946
40% $2,533
50% $3,189
60% $3,983
70% $5,091
80% $6,912
90% $10,480

Inequality

There's a number used to measure inequality quickly: the Gini coefficient. Although it's not perfect, it's the most widely used, and it's still useful. A perfectly equal society would have a value of 0%, while a totally unequal one would have 100%. Germany, a country with a lot of social equality, has a value of 27%; the United States, known for its inequality, 45%. According to INEGI, Mexico has a value of 48%, but according to my calculations the value is 52%.
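For the curious, the Gini coefficient can be computed from the sorted incomes with a standard formula, G = 2 Σ i·x_i / (n Σ x_i) - (n + 1) / n, where x_i is the i-th smallest income. A minimal Ruby sketch (my own numbers come from R's ineq package, shown in the Method section; this is just an illustration of the formula):

```ruby
# Gini coefficient of a list of incomes:
# 0 = everybody earns the same, close to 1 = one person has everything
def gini(incomes)
  x = incomes.sort
  n = x.size
  # Sum of rank * income, with ranks starting at 1 for the smallest income
  weighted = x.each_with_index.sum { |v, i| (i + 1) * v }
  2.0 * weighted / (n * x.sum) - (n + 1.0) / n
end

puts gini([1, 1, 1, 1])  # perfect equality
puts gini([0, 0, 0, 1])  # one person has all the income
```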

Lorenz Curve

Probably the easiest way to visualize the incredible inequality is to plot all the incomes in the sample:

graph
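The Lorenz curve sorts everyone from poorest to richest and plots the cumulative share of the population (x axis) against the cumulative share of total income (y axis); perfect equality would be the diagonal, and the bigger the gap between the curve and the diagonal, the bigger the Gini coefficient. A minimal Ruby sketch that computes the points (plotting is left to any tool):

```ruby
# Points of the Lorenz curve for a list of incomes
def lorenz_points(incomes)
  x = incomes.sort
  total = x.sum.to_f
  running = 0.0
  points = [[0.0, 0.0]]
  x.each_with_index do |v, i|
    running += v
    # [share of people so far, share of income they receive]
    points << [(i + 1).to_f / x.size, running / total]
  end
  points
end

lorenz_points([1, 2, 3, 4]).each { |px, py| puts format("%.2f %.2f", px, py) }
```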

INEGI's errors

I already mentioned that, to begin with, their decile table doesn't contain deciles; it seems to contain the averages of the various groups, and as we saw, averages are dangerous because they can make things look better than they are. But even so, the numbers don't add up.

There are also some very curious discrepancies. For example, there are two linked tables, ‘ingresos’ and ‘concentradohogar’; the second, as its name says, is a summary (“concentrado”).

Here is a simplified example from the ‘ingresos’ table:

folioviv foliohg numren clave ing_tri
0860298316 1 01 P043 4499.99
0860298316 1 01 P071 24173.8

Here we see two incomes for one person: P043 is a PROCAMPO benefit, and P071 is the key for agricultural businesses.

The data for the same household in the ‘concentradohogar’ table:

folioviv foliohg ing_mon agricolas bene_gob
0860298316 1 175412.92 170912.93 4499.99

We can see that the government benefits are correct, but the income from agricultural businesses is 7 times the original amount. Where did that number come from? The table description says this column is generated by adding the incomes with key P071 or P078, and as we saw above, there are only two incomes. Searching for the number 175412.92 in the ingresos table returns nothing, so there seems to be no reason for that number to exist.

Most of the numbers seem to be correct, but there are discrepancies, both positive and negative. The total absolute discrepancy is $4,585,356, but since some are negative, the net difference is $2,685,778.
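The check itself is easy to sketch: recompute the agricultural column per household from ‘ingresos’ and compare it with the published value, adding up both the absolute and the signed differences. In this minimal Ruby sketch the two tables are hypothetical in-memory rows copied from the example above; with the real CSV files one could load the rows with Ruby's standard `csv` library (e.g. `CSV.foreach(path, headers: true)`). The key list is taken from the table description quoted above.

```ruby
# Hypothetical rows mirroring the two tables (values from the example above)
ingresos = [
  { folioviv: "0860298316", foliohg: "1", clave: "P043", ing_tri: 4499.99 },
  { folioviv: "0860298316", foliohg: "1", clave: "P071", ing_tri: 24173.8 }
]
concentrado = [
  { folioviv: "0860298316", foliohg: "1", agricolas: 170912.93 }
]

AGRICULTURAL_KEYS = %w[P071 P078]  # keys named in the table description

# Recompute the agricultural column per household (folioviv + foliohg)
recomputed = Hash.new(0.0)
ingresos.each do |row|
  next unless AGRICULTURAL_KEYS.include?(row[:clave])
  recomputed[[row[:folioviv], row[:foliohg]]] += row[:ing_tri]
end

# Per-household discrepancy: published value minus recomputed value
diffs = concentrado.map do |row|
  row[:agricolas] - recomputed[[row[:folioviv], row[:foliohg]]]
end

puts "total (absolute) discrepancy: #{diffs.sum(&:abs).round(2)}"
puts "net discrepancy: #{diffs.sum.round(2)}"
```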

Map of the sample

Update: Many people asked where the surveys were taken, so here is a map to visualize it. For some strange reason, central Tabasco is where they collected the most information. It's very evident that they didn't pay much attention to the north.

Map

Conclusion

There's nothing left but to accept that we are much worse off than I thought, not only in terms of salaries, but also inequality, and even the availability of information. If the agency dedicated to providing statistical data doesn't even know what a decile is, you really can't expect much from the future.

Note: These numbers are reliable only if INEGI's sample is truly random. Given that I've already found many errors in their tables, it's possible that INEGI also got the sampling wrong. Unfortunately there is no better data, so as far as I know, these are the most reliable numbers.

Method

Anyone interested can verify the data. The databases are at the microdata link: ‘ingresos’ is “Ingresos y percepciones financieras y de capital de cada uno de los integrantes del hogar” (financial and capital income of each household member), and ‘concentradohogar’ is “Principales variables por hogar” (main variables per household). The format should be CSV; once the files are extracted, run my script, which generates the corrected incomes.

The result is this file, where each row corresponds to a household. The “ingreso” column contains the quarterly monetary income per household, and “ingreso_pc” is the same but per capita.

It can be imported into Excel and the operations done there, but I used a statistical package called R.

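data = read.csv("corregido.csv")
# ingreso_pc is quarterly per capita income; divide by 3 to get monthly figures
ingreso = data$ingreso_pc / 3

# deciles
quantile(ingreso, probs = seq(1/10, 9/10, 1/10))

# terciles
quantile(ingreso, probs = seq(1/3, 2/3, 1/3))

# top percentiles (91% to 99%)
quantile(ingreso, probs = seq(91/100, 99/100, 1/100))

# average of the upper class (above the upper tercile cutoff)
mean(ingreso[ingreso > 2777])

# Gini coefficient (requires the ineq package: install.packages("ineq"))
library(ineq)
ineq(ingreso, type = "Gini")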

Best TV series of all time

After watching a lot of TV series, here is my list of what I consider the best TV series of all time. It’s mostly based on this list by IMDB, but also my personal preferences.

1. Game of Thrones

This one doesn’t really need an explanation; it’s the best TV series of all time by far. Not only is it based on an amazing series of books, but it has unparalleled production value. Each character is incredibly rich and complex, and there are scores of them, many of whom will die sooner than you would expect.

It’s a huge phenomenon and if you haven’t watched it already, you should be ashamed and do it now.

Yes, it’s fantasy, but only the right amount. Paradoxically it is more realistic than most shows; there is no such thing as good or evil, just people with different points of view, different motivations, and in different circumstances. Good people die, bad people win, honor can kill you, a sure victory can turn into crap. And just when you think you know what will happen next, your favorite character dies.

2. Breaking Bad

Breaking Bad is the story of a high school teacher going, as the title suggests, bad. Step by step a seemingly average family man starts to secretly change his life. While at first you might think you would do the same morally dubious actions, eventually you will reach a point where you will wonder if the protagonist has gone too far.

It is incredibly rewarding to see how a teacher of chemistry, a man of science, would fare in the underworld of drug cartels. His knowledge and intelligence come in handy in creative ways to find solutions to hard problems.

His arrival on the scene doesn’t go unnoticed, and a host of characters are affected by this new player; the chain reaction that follows is interesting to see, to say the least.

3. The Wire

The Wire is simply a perfect story. It is local, and although you might not relate to most of the characters, it feels very real. The politics, the drama, the power dynamics, the everyday struggles: everything is dealt with masterfully.

The characters are rich, some drug dealers are human, some politicians monsters, street soldiers incredibly smart. This show would give you insight into why a clean police detective would choose not to investigate a series of (possible) murders, why breaking the law can be sometimes good, and why in general violence is a much deeper problem that won’t be solved by simply putting some bad people in jail.

4. True Detective

What are Matthew McConaughey and Woody Harrelson doing in a TV series? Making history. True Detective is anything but a typical show. It might start slow, and if you are not keen on admiring the superb acting that shows in every gesture, you might find it boring, but sooner or later it will hit you like a truck.

This is not CSI, do not expect easy resolutions to multiple cases, in fact do not expect any resolution at all. The show is about the journey of investigation and everything that goes along with it, including the political roadblocks, and the toll it has on the people doing it (officially or unofficially), and their loved ones.

Also, thanks to the beloved character played by McConaughey (Rust), we are treated to a heavy dose of philosophy, human relations, and, in general, life.

5. Last Week Tonight with John Oliver

John Oliver is relatively new to the world of comedy, and as many students of The Daily Show, he graduated to be one of the best. Now he has his own political/comedic show dealing with subjects that actually matter, weekly, and deals with them masterfully, and at length.

Since the show is on HBO, it is not afraid of reprisals from advertisers, and fiercely attacks commercial companies (as any real news show should) when they do something bad (which is very often).

The first season became an instant hit, and since all the important segments are available on YouTube for free, and are 10 to 30 minutes long, you really have no excuse not to watch it. In fact, do it now. Seriously.

6. Sherlock

Imagine the most egotistical asshole you know, add a big dose of raw pure genius, spray a chunk of autistic disregard for what anybody else thinks, and disinterest in money, love, or hobbies. Finally add a side-kick who is well mannered, polite, and in general: normal. Use this concoction to solve crimes, and what you have is Sherlock.

Sherlock is a very uncommon show, starting from the fact that each episode feels more like a movie. So if you don’t want to watch a movie, perhaps you shouldn’t watch an episode of Sherlock either.

The show is not without its flaws, and sometimes has caricaturesque endings (as I said, it’s different), but it is definitely worthwhile.

7. The Sopranos

Can you ever sympathize with a psychopath? After watching The Sopranos you might. The show follows the life of Tony Soprano, the boss of a New Jersey-based mafia. As you would expect, there will be violence, betrayals, and a constant supply of lies. However, you would also experience Tony’s human side, including caring for a family of ducks, and his constant duel with his psychologist.

Can you actually get better if you can’t even tell your psychologist that you killed one of your closest friends? How do you take care of your friend’s family with a straight face? These are the problems Tony faces all the time, not to mention trying to raise a couple of teenagers, and keep a marriage together which is surrounded by mystery.

And can you even blame him for being the way he is after you learn about his mother and father? Can a monster have a conscience?

After watching the show a lot of these questions will have clearer answers.

8. Rick and Morty

Rick and Morty is a cartoon, but it’s deep, funny, witty, and definitely not for children. It centers on an old, mad, drunk scientist and his grandson companion (who is not so smart). Together they have many ridiculous adventures, so crazy that the mere premise of them will make you laugh.

Yet, despite the overblown adventures they have (due to the impossibly advanced technology the old man has developed), the show is at times deep and will leave you thinking with a renewed perspective about life, family, love, priorities, the human race and its place in the universe, and all the things that could have been, and might be… In a parallel universe.

9. Firefly

Cowboys in space. Star Wars but better. Relatable, warm and interesting characters. Renegades, an empire, the wild outskirts of the galaxy in a distant future that is so different, yet feels so familiar.

Easily the best science fiction series of all time, unfortunately there’s only one season, which is why Firefly became so much of a cult, and a phenomenon. There’s a movie (not as good), and even a documentary about the phenomenon. It is really something else.

There is only one drawback: after watching it, you will become one of us and wonder: why the f*ck did they cancel this wonder?

10. Better Call Saul

Better Call Saul is a spin-off of Breaking Bad. A good honest lawyer in an extremely precarious situation tries his best to succeed with integrity, but it turns out it’s not so easy to achieve that.

The show is very recent, and the first season hasn’t finished yet, so there is really not much more to explain, except that it is dark and intense.

So why is it in the list of the best tv shows of all time? I just know :)

Understanding the Enigma machine

I was fascinated by the movie The Imitation Game, not just because it raises awareness of a great man who advanced our civilization tremendously, and of the great injustice he suffered, but also because it presents the study of cryptanalysis, something most people don’t even know exists, but which is incredibly important when dealing with information, especially in this day and age.

However, the Enigma machine to me was simply that; an enigma. I’m not a mechanic, so you put that thing in front of me, and it would take me forever to understand what it does, if I ever manage to find the interest to do so. I was happy thinking it was a magic box.

That was until Albert Still decided to write the code of the machine in Ruby (my favorite computer language), which he explained in a blog post. I’m a programmer, code I understand, and this was 30 lines; within a minute I understood the machine (literally).

I was blown away by the simplicity of it, and I thought: hey! anybody can understand this. And ultimately that’s the beauty of cryptography; it doesn’t matter if you know exactly how the algorithm works; you still cannot decrypt the message. This is what security today relies on; everybody knows the algorithms running in your web browser, yet you are secure accessing your bank account, because those algorithms are cryptographically secure. The phrase “cryptographically secure” might not mean much to most people, but it’s really important.

I will try to explain how the Enigma machine works in simple terms; if you are a programmer, you might be better off just reading the code.

The reflector

You don’t need to understand this code, but it might help to understand the algorithm.

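CHARS = ('A'..'Z')  # the 26 letters (assumed to be defined like this in the full script)
$reflector = Hash[*CHARS.to_a.shuffle]   # shuffle, then pair the letters two by two
$reflector.merge!($reflector.invert)     # add the reverse of every pair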

reflector

So, what this means is that we pair each of the 26 characters (A to Z) with another one at random. For example, W is paired with L, which means that whenever we find a W, we switch it with an L, and when we find an L, we switch it with a W.

If we run this algorithm with the text HI, we get RF (H=>R, I=>F), pretty simple. The interesting thing is what happens when we feed this back to the algorithm; it becomes HI again (R=>H, F=>I). This is why it’s called a reflector.

This is actually so simple that you don’t even need a machine to do the conversion; you can do it manually by looking at a piece of paper with the mapping. And there’s nothing cryptographic about this: if the Enigma machine only had this algorithm, you would only need to steal one machine, and you could decipher every message immediately. It’s not cryptographically secure at all.

You intercept the message RFJWNH, you feed this to the machine, and you get HITLER. And that’s it.
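Here is a small self-contained Ruby sketch of that round trip. It builds a random reflector the same way as the snippet above, runs a word through it, and runs the result through it again; because every pair works both ways, the second pass undoes the first, and no letter ever maps to itself.

```ruby
chars = ('A'..'Z').to_a

# Random symmetric pairing of the 26 letters
reflector = Hash[*chars.shuffle]
reflector.merge!(reflector.invert)

# Swap every letter with its partner
def reflect(text, table)
  text.chars.map { |c| table.fetch(c) }.join
end

secret = reflect("HITLER", reflector)
puts secret
puts reflect(secret, reflector)  # the same table turns it back into HITLER
```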

Let’s put a cryptographic value to this algorithm: 0. It’s useful, but not for cryptographic reasons.

The rotor

Let’s jump to something more complicated.

$rotor = Hash[CHARS.zip(CHARS.to_a.shuffle)]

rotor_0

This time each character is mapped to another character at random, and there’s no reciprocity (A=>K, K=>V). Here is the twist: in this example the rotor starts with K, but that could be configurable, so say that tomorrow it starts with N; the associated values then rotate, and you get this:

rotor_1

Now it’s not so easy any more. You receive the message DGOKIP, but you can’t do anything with it unless you know which was the first value, or “key” (in this case it was E). The only alternative you have is what is called a brute force attack: you try every possibility. Fortunately there are only 26 possibilities, so soon enough you will stumble upon the key E and unlock the message: HITLER.

rotor_2

The value of this is: 26. It’s not much, but it’s better than zero.
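A minimal Ruby sketch of this single rotor and the brute force attack. The wiring is a hypothetical fixed random permutation (so the ciphertext won’t be DGOKIP), and “configuring the rotor with key E” is modeled, as described above, by rotating the wiring so that A maps to E. The attacker simply decrypts with all 26 keys and reads the results.

```ruby
LETTERS = ('A'..'Z').to_a
# Hypothetical rotor wiring: a fixed random permutation of the alphabet
BASE = LETTERS.shuffle(random: Random.new(7))

# Rotor configured with `key`: the wiring is rotated so that 'A' maps to `key`
def rotor_for(key)
  LETTERS.zip(BASE.rotate(BASE.index(key))).to_h
end

def encrypt(text, key)
  rotor = rotor_for(key)
  text.chars.map { |c| rotor.fetch(c) }.join
end

def decrypt(text, key)
  back = rotor_for(key).invert
  text.chars.map { |c| back.fetch(c) }.join
end

ciphertext = encrypt("HITLER", "E")

# Brute force: only 26 possible keys, so try them all and read the results
candidates = LETTERS.map { |key| [key, decrypt(ciphertext, key)] }
candidates.each { |key, plain| puts "#{key}: #{plain}" }
```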

The rotor, part two

We’ve managed to make things a bit difficult for our cryptanalysts. However, if, say, they notice the character G appearing too often in today’s messages, they’ll assume that perhaps G is actually a vowel. We need to make things more difficult for them.

As it stands, the message III would be encrypted into GGG; that’s too easy. Instead, what we can do is rotate the first part of the rotor each time a character is processed, so III becomes GDM (I=>G, rotate, I=>D, rotate, I=>M).

rotor_2
rotor_3
rotor_4

This doesn’t really increase the possibilities to test, but makes their job harder.

The rotor, part three

Since the thing is already rotating it would make sense to start with something other than A. This starting position is also part of the key, and again, you need to get it right in order to decrypt the message properly.

So you have 26 ways to configure the rotor, and 26 ways to start it, now the value is: 676. This would take quite a bit of time to go through each and every possibility now.
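The stepping and the starting position fit in a few lines of Ruby. In this sketch (hypothetical wiring, so III won’t come out as GDM) the rotor advances one position after every letter, and `start` is the starting position that becomes part of the key; the same letter repeated now produces different output letters.

```ruby
LETTERS = ('A'..'Z').to_a
WIRING = LETTERS.shuffle(random: Random.new(3))  # hypothetical rotor wiring

# The rotor steps one position after every letter; `start` (0..25) is where it begins
def stepping_encrypt(text, start)
  text.chars.each_with_index.map do |c, i|
    WIRING[(LETTERS.index(c) + start + i) % 26]
  end.join
end

def stepping_decrypt(text, start)
  text.chars.each_with_index.map do |c, i|
    LETTERS[(WIRING.index(c) - start - i) % 26]
  end.join
end

cipher = stepping_encrypt("III", 0)
puts cipher                       # three different letters
puts stepping_decrypt(cipher, 0)  # III
puts 26 * 26                      # configurations times starting positions: 676
```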

The plugboard

This is where the fun begins.

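$plugboard = Hash[*CHARS.to_a.shuffle.first(20)]  # 10 random pairs from 20 letters
$plugboard.merge!($plugboard.invert)              # make every pair work both ways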

plugboard

We take 20 random characters and pair them with each other. In a way, this is similar to the reflector, except this is configurable, and this time we are not picking 1 out of 26; the combinations are many more than that.

The formula for the number of ways to choose m pairs out of n objects is n! / ((n-2m)! m! 2^m). We are picking 10 pairs out of 26 objects, so: 26! / (6! 10! 2^10). The result is 150,738,274,937,250.

That would take a bit more to test :/
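The formula is easy to check in Ruby, whose integers grow as needed, so 26! doesn’t overflow. The helper names are just for illustration; the small case (4 letters into 2 pairs) can be checked by hand: AB/CD, AC/BD, AD/BC.

```ruby
def factorial(k)
  (1..k).reduce(1, :*)
end

# Ways to choose m unordered pairs out of n letters: n! / ((n - 2m)! * m! * 2^m)
def plugboard_settings(n, m)
  factorial(n) / (factorial(n - 2 * m) * factorial(m) * 2**m)
end

puts plugboard_settings(4, 2)    # 3
puts plugboard_settings(26, 10)  # 150738274937250
```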

More rotors

Each rotor needed 676 tries to brute force, why not add two more? That moves us up to 308,915,776.

While we are at it, make the order of the rotors part of the daily key; that’s 3 * 2 * 1: 6 possibilities.

And why not add two more to pick from, so every day you pick 3 out of 5; 5 * 4 * 3: 60 possibilities.

In total, that’s 18,534,946,560 just from the rotors.
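The rotor arithmetic of this section, as a quick Ruby check:

```ruby
per_rotor = 26 * 26            # configuration x starting position
three_rotors = per_rotor**3    # three independent rotors
orderings = 5 * 4 * 3          # pick 3 of 5 rotors, order matters
puts three_rotors              # 308915776
puts three_rotors * orderings  # 18534946560
```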

And hey, make them rotate at different speeds to make the job of the analysts even harder.

Bring it home

Put everything together, and the process goes like this:

Enigma machine

  1. Plugboard
  2. Rotor 1
  3. Rotor 2
  4. Rotor 3
  5. Reflector
  6. Rotor 3
  7. Rotor 2
  8. Rotor 1
  9. Plugboard
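To see the whole pipeline in one place, here is a toy Enigma-style machine in Ruby. It is a sketch, not a historically accurate Enigma: the rotor and reflector wirings are random (picked by a hypothetical `seed`), there are no ring settings, and the rotors step like an odometer. One detail the list leaves implicit is that on the way back (steps 6 to 8) each rotor is traversed in reverse, using the inverse of its wiring. Because of the reflector, the same key encrypts and decrypts, and no letter ever encrypts to itself. With its random wirings it will not reproduce the message below.

```ruby
# Toy Enigma-style machine (hypothetical, NOT historically accurate):
# plugboard -> 3 stepping rotors -> reflector -> rotors in reverse -> plugboard
class ToyEnigma
  SIZE = 26

  # seed:       picks the random rotor and reflector wirings
  # positions:  starting positions of the three rotors, e.g. "BFR"
  # plug_pairs: plugboard cables, e.g. %w[SD HY GM]
  def initialize(seed:, positions:, plug_pairs:)
    rng = Random.new(seed)
    letters = (0...SIZE).to_a

    # Each rotor is a random permutation; keep its inverse for the way back
    @rotors = Array.new(3) { letters.shuffle(random: rng) }
    @inverses = @rotors.map do |wiring|
      inverse = Array.new(SIZE)
      wiring.each_with_index { |out, inp| inverse[out] = inp }
      inverse
    end

    # Reflector: pair up all 26 letters (no letter maps to itself)
    @reflector = Array.new(SIZE)
    letters.shuffle(random: rng).each_slice(2) do |a, b|
      @reflector[a] = b
      @reflector[b] = a
    end

    # Plugboard: swap each cabled pair; unplugged letters pass through unchanged
    @plugboard = letters.dup
    plug_pairs.each do |pair|
      a, b = pair.chars.map { |ch| index_of(ch) }
      @plugboard[a] = b
      @plugboard[b] = a
    end

    @start = positions.chars.map { |ch| index_of(ch) }
  end

  # Encrypts or decrypts (the same operation) starting from the initial positions
  def process(text)
    pos = @start.dup
    text.upcase.chars.map do |ch|
      next ch unless ch.between?("A", "Z")  # leave spaces and punctuation alone

      step!(pos)
      c = @plugboard[index_of(ch)]
      3.times { |i| c = through(@rotors[i], c, pos[i]) }        # rotor 1, 2, 3
      c = @reflector[c]
      2.downto(0) { |i| c = through(@inverses[i], c, pos[i]) }  # rotor 3, 2, 1 reversed
      (@plugboard[c] + 65).chr
    end.join
  end

  private

  def index_of(ch)
    ch.ord - 65
  end

  # Pass a letter through a wiring rotated by `offset` positions
  def through(wiring, c, offset)
    (wiring[(c + offset) % SIZE] - offset) % SIZE
  end

  # Odometer stepping: rotor 1 moves every letter, the next moves when the previous wraps
  def step!(pos)
    pos.each_index do |i|
      pos[i] = (pos[i] + 1) % SIZE
      break unless pos[i].zero?
    end
  end
end

machine = ToyEnigma.new(seed: 2015, positions: "BFR", plug_pairs: %w[SD HY GM])
secret = machine.process("HITLER")
puts secret
puts machine.process(secret)  # same key, same operation: back to HITLER
```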

So, here is a simple message: YWXRVH. In order to decrypt it you need the full key: the whole plugboard, the configuration of the rotors, and their starting position. Even if I tell you the original message was HITLER, you would still need to do a lot of work.

For the record, this was the key used to generate that message:

I V III, BFR, SD HY GM EB UO LJ WZ QT AC FR, OIZ

If you try every key until you find it, you potentially would need 2,793,925,870,508,516,103,360,000 tries. Clearly, pure brute force is not the way to solve the problem :/
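That figure is simply the rotor count multiplied by the plugboard count from the previous sections:

```latex
18{,}534{,}946{,}560 \times 150{,}738{,}274{,}937{,}250
  = 2{,}793{,}925{,}870{,}508{,}516{,}103{,}360{,}000
  \approx 2.79 \times 10^{24}
```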

This is just the machine itself; on top of that there were many protocols to encrypt the message even further, but let’s just leave it at that.

Back to the present

That is the power of cryptography: understanding the machine, or the algorithm, gives you absolutely no leverage; that is the easy part. You are supposed to understand it, and still be unable to crack it.

The algorithm in Enigma is puny compared to modern algorithms which are incredibly complex and with a lot of research behind them. That’s what keeps the communication to your bank secure, and even though most people don’t know it, you can use these algorithms to send secure messages to anyone that in theory not even the government using the most powerful supercomputers can decrypt.

I think it’s time we stop saying “this is not rocket science”; rocket science is easy, we should be saying “this is not cryptanalysis”.

The white and gold dress, and the illusion of free will

At first I didn’t really understand what all the fuss was about: the dress was obviously white and gold, and everybody who saw it any other way was wrong, end of story. However, I saw an article in IFLScience that explained why this might be an optical illusion, but I still thought I was seeing it right, and the other people were the ones getting it wrong. Then I saw the original dress:

Original dress

#TheDress

Well, maybe it was a different version of the dress, or maybe the colors were washed away, or maybe it was a weird camera filter, or a bug in the lens. Sure, everything is possible, but maybe, I was just seeing it wrong.

I’ve read and heard a lot about cognitive science and the more we learn about the brain, the more faults we find in it. We don’t see the world as it is, we see the world as it is useful for us to see the world. In fact, we cannot see the world as it is, in atoms and quarks, we cannot, because we don’t even fully understand it yet. We see the world in ways that managed to get us where we are, we sometimes get an irrational fear of the dark and run quickly up the stairs in our safe home even if we know there can’t possibly be any tigers chasing behind us, but in the past it was better to be safe than sorry, and the ones that didn’t have that fear gene are not with us any more; they got a Darwin award.

I know what some people might be thinking; my brain is not faulty! I see the world as it truly is! Well, sorry to burst your bubble, but you don’t. Optical illusions are a perfect example, and here is one:

Optical illusion

If you are human, you will see the orange spot at the top as darker than the one at the bottom. Why? Because your brain assumes the one at the bottom is in a shadow, so its true color must be lighter than it looks, and it compensates. However, they are exactly the same color (#d18600 in hex notation): remove the context, and you’ll see that; put the context back, and you can’t see them as the same. You just can’t, and all of us humans have the same “fault”.

This phenomenon can be explained by the theory of color constancy, and these faults are not limited to our eyes, but ears, and even rational thinking.

So, could the white and gold vs. blue and black debate be an example of this? The argument is that people who see the dress as white and gold perceive it to be in a shadow, behind a brightly lit part of a room, while people who see it as blue and black see it washed in bright light. Some people say they can see it both ways: sometimes white, sometimes blue.

XKCD

I really did try not to see it in a shadow, but I just couldn’t, even after I watched modified photos; I just saw a white and gold dress with a lot of contrast. I decided they were all wrong, no amount of lighting would turn a royal blue dress into white.

But then I fired up GIMP (an open source alternative to Photoshop) and played around with filters. Eventually I found what did the trick for me, and here you can see the progress:

So eventually I managed to see it, does that mean I was wrong? Well, yes, my brain saw something that wasn’t there, however, it happened for a reason, if the context was different, what my brain saw would have been correct. Perhaps in a parallel universe there’s a photo that looks exactly the same, but the dress was actually white and gold.

At the end of the day our eyes are the windows through which we see reality, and they are imperfect, just like our brains. We can be one hundred percent sure that what we are seeing is actually there, that what we remember is what happened, and that we are being rational in a discussion. Sadly one can be one hundred percent sure of something, and still be wrong.

To me the most perfect example is the illusion that we are in control of our lives. The more science finds out about the brain, the more we realize how little we know of what actually happens in the 1.5 kg meatloaf between our ears. You are not in control of your next thought any more than you are of my next thought, and when people try to explain their decisions, their reasons are usually wrong. Minds can be easily manipulated, and we rarely realize it.

There’s a lot of interesting stuff on the Internet about the subconscious and how the brain really works (as far as we know). Here is one talk that I find particularly interesting.

So, if you want to believe you are the master of your own will, go ahead, you can also believe the dress was white and gold. Those are illusions, regardless of how useful they might be. Reality, however, is different.

My favorite public intellectuals

Here’s a selection of my favorite public intellectuals. I love how these guys talk, write, and generally everything they do. Might be worth checking them out :)

Sam Harris

Sam Harris is an author, philosopher, and neuroscientist. Among his most notable books are The End of Faith, and The Moral Landscape. He has a blog, is on Twitter, appears on many TV shows as guest, has been on many debates, as well as lengthy talks, and has written numerous articles in respectable magazines such as The New York Times.

His topics mostly concentrate around religion, faith, morality, and science.

What I like about Sam Harris the most is the way he conveys very complex and nuanced ideas in a very effective way. He is very precise with words and has the patience to go on for ages in order to explain his ideas, but also, he is very witty and can deliver crushingly funny one-liners.

@samharrisorg

In the following video Harris is in a debate with a religious apologist and shows, with a very funny train of thought, the ridiculousness of believing in things without evidence.

This is a quick talk at TED in which he explains how science can answer moral questions, which is the main idea behind The Moral Landscape.

Finally, my favorite talk, in which he basically destroys the idea of free will. Every minute in this hour long talk is pure gold.

Steven Pinker

Steven Pinker is an experimental psychologist, cognitive scientist, linguist, and popular science author. He is best known for his advocacy of evolutionary psychology, and the computational theory of mind.

Being an expert of language, the way he communicates in every medium is simply superb. Aside from linguistics, he goes into other topics, such as the history of violence, religion, and reason.

@sapinker

Here Pinker explains why taboos are bad, and political correctness can be dangerous.

This is a quick video where Pinker explains the importance of language in order to understand human nature.

Here’s a much longer version in which he goes into a lot of detail to explain language, and what we know about it.

Noam Chomsky

Noam Chomsky should need no introduction; he is a linguist, philosopher, cognitive scientist, logician, political commentator, and anarcho-syndicalist activist. He has hundreds of books, countless articles, has been in many debates, and gives constant talks all around the globe; in fact, he has done so many things in his life that there is even a documentary devoted to him: Noam Chomsky: Rebel Without a Pause. Not content with defining the whole field of modern linguistics at an early age, he devoted his life to political activism, even risking the well-being of his own family. Today he is considered the most influential living intellectual, and the most cited author alive, right after Plato. Even at his advanced age and after losing his wife of almost 60 years, he continues to tirelessly inform the public about what happens in the world, and as he has stated before, he will continue to do so as long as he is ambulatory.

Chomsky might not be the most entertaining public speaker, but what he lacks in charisma he makes up for in content. He is basically a human encyclopedia; he rarely states his opinion, everything he says is basically facts gathered from one place or another, and for every fact he states, he knows the reference where you can verify it.

It’s hard to find a short video that shows Chomsky’s brilliance, but this interview seems to do the job perfectly. Watch this interviewer get completely owned by Chomsky. Don’t forget part two.


Manufacturing Consent is one of Chomsky’s most powerful ideas, and if you are not in the mood for reading the book, this documentary explains the idea very well. It’s long, but you won’t regret watching it.

Sorry Lennart, but you are wrong once again

Lennart Poettering’s post on G+ is gathering a lot of attention these days; most of the feedback is supportive and positive, which doesn’t surprise me, because although Poettering would like us to believe otherwise, most of the open source community is pretty accommodating and non-confrontational.

I am however going to go against the current here, and criticize him, but first let me state clearly that I do not condone any physical attacks towards his person, or the threats of such. His ideas however are a different matter.

Lennart’s chief mistake is to attack the way the Linux kernel community is run, and to say that its success happens despite this. How does he know? Has he ever run a more successful community? Has anybody ever? Linux is the most successful software project in history, by more than an order of magnitude, any way you look at it. It would be presumptuous for anybody to say they know how to run this project better, especially without any evidence to back such a claim, which is precisely what Poettering is doing.

In this blog I’ve analyzed the many reasons why the Linux kernel is so successful, and one of them is its combative style of discussion, in which ideas are not exempt from ridicule, and strong language is often used to drive one’s point home as efficiently as possible. Many people in the community agree this is desirable, and there’s even scientific evidence that supports this notion: the best ideas arise in a confrontational environment, not in a protective one.

What’s more, Poettering himself accepts he hasn’t been involved in this community. So what the hell does he know about it? Nothing.

Poettering’s second mistake is to assume that for non-white, non-western, non-straight people the situation surely must be worse… That is not the case. Maybe, just maybe, he receives such vitriolic feedback not just because of what he does, but because of the horrible way he does it. Of course not; Poettering doesn’t need to change, his approach is perfect. In fact, the only reason he receives criticism is that he is too progressive, too audacious, too efficient. Surely that must be the reason!

Personally, my beef with Poettering starts with the fact that he blocked me on Google+. Why? Because I was complaining about a technical issue with systemd, which he initially noticed and commented on, but then ignored. In the middle of the discussion I made some value judgements about certain systemd code, and he stopped responding and blocked me. That is the worst way to end a discussion: blocking the people who disagree with you.

Sorry Lennart, but actions have consequences. You can only make so many disruptive changes to the Linux ecosystem without much care or consideration for others; there’s a limit to the number of people you can block, and to the amount of criticism you can ignore. You can grow as thick a skin as you want; you are still wrong. No community is going to let you keep being wrong and acting as if you are beyond reproach (unless you run that community and have blocked all dissident voices, of course).

Maybe it’s time to take a hard look in the mirror.

What’s missing in Git v2.0.0

I recently blogged about the Git v2.0.0 release, what changed, and why you should care. Unfortunately, the conclusion was that not much changed (other than the usual new features and bug fixes). In this post I will discuss what should have changed, and why.

What is needed

Fortunately, Git has had the Git User’s Survey in the past, so we know what users want. These are the areas that, according to the survey, need improvement, ranked by score:

  1. user-interface: 3.25
  2. documentation: 3.22
  3. tools (e.g. GUI): 3.01
  4. more features: 2.41
  5. portability: 2.34
  6. performance: 2.28
  7. community (mailing list): 1.70
  8. localization (translation): 1.65
  9. community (IRC): 1.65

Obviously, since user-interface and documentation are the areas that need the most improvement, that’s what Git v2.0.0 should have focused on, right?

History

I already mentioned this in the other post, but I’ll repeat it here.

First of all, Git has a long history of never breaking user expectations (other than the Git v1.6.0 fiasco, which replaced all the git-foo commands with ‘git foo’), and as such a lot of thought is devoted to ways to minimize changes in behavior, or even to avoid them completely. Perhaps too much care is devoted to this.

The preparation for Git v2.0.0 started more than three years ago with an email from Junio C Hamano asking developers to submit ideas for changes that normally would not happen because they break backwards compatibility; he invited us to think as if “we were writing Git from scratch”. This big release that would break backwards compatibility was going to be named “1.8.0”, and people started to submit ideas for this important release. Eventually too much time passed, the versioning scheme changed, v1.8.0 was released, and the changes proposed for v1.8.0 slipped into what is now v2.0.

Since there had been no substantial changes in behavior since v1.0, it would follow that v2.0 was an important release, and a good opportunity to gather all the ideas about what needs to change in Git. However, seemingly out of nowhere, without any discussion or even a warning, the maintainer tagged v2.0.0-rc0, and therefore none of the features that were not already merged could make it into v2.0.0.

Thus v2.0.0 was destined to have a small list of changes, and that’s how it remained.

What could have changed

The following is a list of things that I argued should be part of Git v2.0.0.

git update

I wrote a whole post about the issue, but basically, ‘git pull’ is broken for the most common use-case: updating the current branch.

This is a known issue that has been discussed over and over, and everyone agrees that it is a real problem and that something needs to be done to fix it.

There have been different proposals, but by far the most comprehensive and simple one is to add a new ‘git update’ command.

This way, when you want to merge a pull request, you do ‘git pull’, and when you just want to update the current branch, you do ‘git update’, which by default would barf if your local branch (e.g. ‘master’) and the remote one (e.g. ‘origin/master’) have diverged, instead of doing a merge. This should substantially decrease the number of “evil merges”: merges that happen by mistake, usually made by somebody who is not familiar with Git.
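To make the proposed behavior concrete, here is a minimal Python sketch of the decision ‘git update’ would make, modeling history as a toy commit graph. The graph, the commit names, and the ‘update’ function are hypothetical illustrations, not the actual patches: the branch is fast-forwarded when possible, and the command refuses, instead of merging, when the histories have diverged.

```python
"""Toy model of the proposed update logic: fast-forward if possible,
otherwise refuse instead of creating a merge. Hypothetical sketch only."""

from typing import Dict, List, Set

# Hypothetical commit graph: each commit id maps to its parent ids.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, commit: str) -> Set[str]:
    """Return the commit itself plus every commit reachable through parents."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph.get(c, []))
    return seen


def update(graph: Graph, local: str, remote: str) -> str:
    """Return the new tip of the local branch, or raise if histories diverged."""
    if remote in ancestors(graph, local):
        return local  # already up to date (or ahead): nothing to do
    if local in ancestors(graph, remote):
        return remote  # fast-forward: just move the branch pointer
    # Diverged: a merge would be needed, so refuse instead of merging.
    raise RuntimeError(f"{local} and {remote} have diverged; refusing to merge")
```

With graph = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}, update(graph, "B", "C") returns "C" (a fast-forward), while update(graph, "C", "D") raises, because C and D were both built on top of B independently.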

The patches are relatively new, but the command is simple, so there isn’t much danger of screwing things up.

The publish tracking branch

I also wrote a blog post about this; basically Git’s support for triangular workflows is not the best.

A triangular workflow is when you pull from one location (e.g. a central repo) and push to another (e.g. your personal GitHub fork). If you are using upstream tracking branches (you should), you have to decide where to set your upstream: the central repo, or your personal one. Which advantages you get depends on which one you choose, but you cannot have them all.
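A minimal Python sketch of what tracking both sides buys you, assuming a toy commit graph: the hypothetical ‘Branch’ class records an upstream (the central repo) and a publish branch (your fork), and reports ahead/behind counts against each, so you know both what hasn’t reached the central repo and what you haven’t pushed to your fork yet. The names are illustrative only; this is not Git’s configuration or code.

```python
"""Hypothetical model of a branch that tracks two remote branches:
an upstream (where you pull from) and a publish branch (where you push to)."""

from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # commit id -> parent ids


def ancestors(graph: Graph, commit: str) -> Set[str]:
    """Return the commit plus everything reachable through its parents."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph.get(c, []))
    return seen


def ahead_behind(graph: Graph, local: str, other: str) -> Tuple[int, int]:
    """Count commits only in local (ahead) and only in other (behind)."""
    mine, theirs = ancestors(graph, local), ancestors(graph, other)
    return len(mine - theirs), len(theirs - mine)


@dataclass
class Branch:
    tip: str
    upstream: str  # hypothetical: tip of the branch in the central repo
    publish: str   # hypothetical: tip of the branch in your personal fork

    def status(self, graph: Graph) -> Dict[str, Tuple[int, int]]:
        """Report (ahead, behind) counts against both tracking branches."""
        return {
            "upstream": ahead_behind(graph, self.tip, self.upstream),
            "publish": ahead_behind(graph, self.tip, self.publish),
        }
```

For a linear history A, B, C, D with the local tip at D, upstream at B, and publish at C, status() returns {"upstream": (2, 0), "publish": (1, 0)}: two commits not yet in the central repo, and one not yet pushed to the fork.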

But with the publish tracking branch you can have all the advantages.

I’ve been cooking these patches for a long, long time, and I have to say this is an essential feature for me; the patches work perfectly.

Support for Mercurial and Bazaar

Support for Mercurial and Bazaar repositories has been cooking for a long time in the “contrib” area (you can both pull and push). At this point the code is production-ready, and it had already graduated and been merged for release in Git v2.1.

However, the maintainer suddenly changed his mind and decided it would be better to distribute them as third party tools. He didn’t give any valid reason and clearly didn’t think it through, but they are now separate.

The code is already widely used (git-remote-hg, git-remote-bzr), and could easily be merged.
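These tools plug into Git through its remote-helper mechanism: when a URL has the form <transport>::<address>, Git runs a program named git-remote-<transport> to talk to that repository. The Python sketch below shows only that naming rule in simplified form; the ‘helper_for’ function is a hypothetical illustration, and Git’s real URL handling covers more cases.

```python
"""Simplified sketch of how a '<transport>::<address>' URL selects a helper."""

from typing import Optional, Tuple


def helper_for(url: str) -> Optional[Tuple[str, str]]:
    """Return (helper program, address) for '<transport>::<address>' URLs,
    or None when the URL does not use the explicit '::' form."""
    transport, sep, address = url.partition("::")
    if not sep or not transport:
        return None
    return f"git-remote-{transport}", address
```

For example, helper_for("hg::mercurial-repo") returns ("git-remote-hg", "mercurial-repo"), which is why having git-remote-hg installed is what makes ‘git clone hg::mercurial-repo’ work.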

Use “stage” instead of “index”

Everybody agrees that “index” is a horrible name for Git’s “staging area”; however, nobody has done much to fix the problem.

A first step is to replace all the --cached and --index options with --staged and --no-work, which are much simpler to understand.

Another step is to add a ‘git stage’ command that acts as a helper to work with the staging area: ‘git stage add’, ‘git stage diff’, ‘git stage reset’, ‘git stage rm’, ‘git stage edit’, and so on.
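A rough Python sketch of how such a helper could be a thin layer over existing commands, translating each ‘git stage’ subcommand into the Git invocation that already does the job. The mapping and the ‘translate’ function are illustrative assumptions, not the actual patches; ‘edit’ is left out because it has no single obvious equivalent.

```python
"""Hypothetical dispatcher mapping proposed 'git stage' subcommands onto
existing Git commands."""

from typing import Dict, List

# Existing Git invocations that already do each job (illustrative mapping).
STAGE_SUBCOMMANDS: Dict[str, List[str]] = {
    "add": ["add"],                # stage changes
    "diff": ["diff", "--cached"],  # show what is staged
    "reset": ["reset"],            # unstage (staging area back to HEAD)
    "rm": ["rm", "--cached"],      # remove from the staging area only
}


def translate(argv: List[str]) -> List[str]:
    """Turn ['stage', <sub>, *args] into the equivalent plain Git argv."""
    if len(argv) < 2 or argv[0] != "stage":
        raise ValueError("expected: stage <subcommand> [args...]")
    sub, rest = argv[1], argv[2:]
    if sub not in STAGE_SUBCOMMANDS:
        raise ValueError(f"unknown stage subcommand: {sub}")
    return ["git"] + STAGE_SUBCOMMANDS[sub] + rest
```

For example, translate(["stage", "rm", "foo.c"]) returns ["git", "rm", "--cached", "foo.c"]: the user only has to remember “stage”, not which option each command uses to mean it.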

The patches are very straightforward.

Default aliases

Virtually every version control system has default aliases (e.g. hg co, cvs ci, svn di, etc.), except Git.

Adding default aliases is very simple to do and only brings advantages. If you don’t like the default alias, you can override it.
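A minimal Python sketch of that override rule, assuming a small hypothetical table of built-in defaults modeled on the aliases above (co, ci, di): the user’s own aliases are merged on top, so any default can be replaced. This is an illustration, not how Git parses its configuration.

```python
"""Sketch of default aliases that user configuration can override."""

from typing import Dict, List, Optional

# Hypothetical built-in defaults, modeled on other VCSs (hg co, cvs ci, svn di).
DEFAULT_ALIASES: Dict[str, str] = {"co": "checkout", "ci": "commit", "di": "diff"}


def expand(argv: List[str], user_aliases: Optional[Dict[str, str]] = None) -> List[str]:
    """Expand the first word of argv; user aliases take precedence over defaults."""
    if not argv:
        return argv
    aliases = {**DEFAULT_ALIASES, **(user_aliases or {})}
    head = aliases.get(argv[0], argv[0])  # unknown words pass through unchanged
    return head.split() + argv[1:]
```

expand(["co", "master"]) gives ["checkout", "master"], while a user who prefers something else can pass {"co": "checkout --quiet"} and get ["checkout", "--quiet", "master"] instead.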

Patches here.

Shoulda coulda woulda

It would have been great if you could just do ‘git clone hg::mercurial-repo’ without installing anything extra, if everybody could start using ‘git update’ instead of ‘git pull’, if you could do ‘git stage diff’ or ‘git reset --stage’, and if triangular workflows were properly supported.

Unfortunately, that’s not the case: Git v2.0.0 has already been released, and there isn’t much in it to be excited about.

You might think “perhaps for Git v3.0” (which could happen in two years, or ten, who knows), but if the past is any indication of the future, it won’t happen, especially since I’ve given up on all these patches.

The fact of the matter is that in every release of Git there is only one focus: performance. Despite the fact that it’s #6 in users’ list of concerns, Git developers work on it because that’s their area of expertise, because it’s fun for them, and because they get paid to do so. There are occasional new features, and a bit of portability now and then, but for the most part Windows support is neglected in Git, which is why the msysgit project was born.

The documentation will always remain cryptic, because for the developers, it’s not cryptic, it’s very clear. And the user-interface will never change, because the developers don’t like change.

If you don’t believe me, look at the backwards-incompatible changes in Git v2.0.0, or in fact, try to think back to the last time Git changed anything. Personally, other than the git-foo -> ‘git foo’ change in v1.6.0 (which was horribly handled), I can’t think of anything but minor changes.

Anyway, you can use all the features I listed today (and more) if you use git-fc instead of Git. It is my own fork of Git, with all of Git’s features plus more.

Is there anything in that list that I missed? Do you think Git v2.0.0 has enough changes as it is?