First principles of income distribution

Anyone attempting to live well within a society must have a concept of fairness; there’s a limit to how much debasement you should tolerate. You wouldn’t let somebody cut in line in front of you, not only because you would have to wait longer, but as a matter of principle: it’s not fair. This isn’t unique to humans. Even a monkey will refuse to do the same job as another monkey if the other one receives a grape while he gets a cucumber slice. I wouldn’t do it either; I prefer grapes too.

If your potatoes are as good as the neighbor’s potatoes, but he sells them at $20 while you can only sell them at $10, then either he is doing something right, you are doing something wrong, or your view of the world is incorrect and your potatoes are not as good as you thought. It is a puzzle that must be solved, because those extra $10 might make the difference between surviving and attracting a potential mate, or dying.

You might not be able to solve every mystery of life and the universe, but surely you should be able to sell your potatoes at $20, and so you must.

This is the simplest explanation of why we are hard-wired for fairness, and why we refuse to be part of a system that exploits us. We might not understand every aspect of an extremely complex socio-economic system, but we recognize we should be paid at least the same amount as other cogs of the same level in the machine.


We understand we can’t all have the same wealth. We can see people who work less than we do, or have less ability, so we should be paid more than them. It follows that other people provide more value than us, and therefore a certain level of inequality might be tolerated, if not even favored.

Many staunch capitalists shrug at the question of inequality. “You want everyone to have the same wealth?” they belch. Few things in this world are black-and-white, and inequality is no exception. The question is not “equality vs. inequality”, the question is “how much inequality?”.

It should not be surprising that high levels of inequality create social instability; large masses of people don’t like to be screwed over. If high levels of inequality are not rectified, crime increases, and eventually revolutions erupt. Ask Marie Antoinette if leveling the playing field a bit more wouldn’t have been a good policy, or rather, ask her severed head.

It should not be surprising either that elites pay close attention to inequality metrics; it stands to reason that nobody is fond of surprise revolutions. But more pressing than inequality metrics are perceptions of inequality, because it doesn’t matter if large masses of people are being screwed over… if they don’t know it.

However, extremely high levels of inequality can’t be ignored forever, and eventually society descends into chaos. But what is that level? How much is way, way too much?


The first thing a right-wing capitalist would tell you is that “it doesn’t matter”. It doesn’t matter how much money your neighbor is making selling potatoes, as long as you are making good money. This feels wrong, just like it feels wrong to receive a cucumber slice instead of a grape, but perhaps it is our base instincts at play, and in fact there is nothing inherently wrong.

The phrase they often use is “a rising tide lifts all boats”. The idea is that rich people are the ones that provide the most value to the economy, so if they have a lot of profit, they will know how to use that money best, and therefore the whole economy would benefit, including you. This is also called trickle-down economics; the earnings of the rich trickle down to the poor.

However, it doesn’t take a genius to find a caveat: what if the rich hoard 100% of the earnings? How much do you get in that case? Well, nothing. Right-wing governments have tried time and time again to decrease taxes for the rich in order to incentivize the supposed “job-creators”, grow the economy, and collect more total taxes as a result. The latest instance is Trump’s Tax Reform. It has never worked.

What many dogmatic capitalists seem to forget is the first principle of economics: resources are limited. So naturally there’s a limit to how many resources the rich may hoard before the poor classes start to starve.

There is no magic bullet: there is a limit to the amount of value an economy can create. And how you distribute the fruits of that value does matter, and that is the distribution of income.


So we start with two premises: a) too much inequality is bad, and b) income is finite. We need a number that expresses how much inequality there is. That much is certain, and anybody who lives in the real world understands you can’t give four people half the pie each (4 × ½ = 200%, not 100%), although staunch capitalists seem to forget that.

I can tell you that a nation has an R/P 10% of 23.05, a Gini of 48.86, or a top 1% share of 13.5, but what does that really say? I would have to explain what each metric means, and you would still not get a good picture. I could show the Lorenz curve as well, but I would have to explain it, and it would still be hard to see what the problem is, if there is any.

Example 1

Let’s say there’s an economy of two people with a total value created of $100,000 (the units don’t matter), and we divide that total evenly ($50,000, $50,000). This is perfect equality, or no inequality, something nobody is advocating for, and which isn’t even possible. The Gini index is 0, but we’ll see later how to get that number in a more realistic example.

Example 1 (Gini: 0.0).
Median: $50,000 – Richest: $50,000 – R/M: 1.0

Example 2

A slightly more realistic example divides the value unevenly ($20,000, $80,000). In this case there is inequality, but how much?

Example 2 (Gini: 30.0).
Median: $20,000 – Richest: $80,000 – R/M: 4.0

The Gini index is often referred to as a representation of the Lorenz curve of an income distribution, but we don’t need extra layers of complexity to understand what the value means. Another way to define Gini is in terms of the relative mean absolute difference (RMAD): we find all the relative differences, and divide by n.

The total is $100,000, x₁ is $20,000, x₂ is $80,000, so: |x₁ − x₂| / total → |$20,000 − $80,000| / $100,000 → $60,000 / $100,000 → 60%. The relative difference of x₁ and x₂ is 60%, and the other way around (|x₂ − x₁|) is the same, so the sum is 120%. We divide that by n (2), and the result is 60%. The Gini index is half of the RMAD, so: 30.
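
The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the definition above, not the code used for the real analysis; the function name `gini` is my own choice here.

```python
from itertools import product

def gini(incomes):
    """Gini index as half the relative mean absolute difference (RMAD)."""
    n = len(incomes)
    total = sum(incomes)
    # Sum |xi - xj| over every ordered pair, relative to the total income.
    relative_sum = sum(abs(a - b) for a, b in product(incomes, repeat=2)) / total
    rmad = relative_sum / n  # divide by n, as in the text
    return rmad / 2          # the Gini index is half of the RMAD

print(round(gini([50_000, 50_000]) * 100, 1))  # Example 1: 0.0
print(round(gini([20_000, 80_000]) * 100, 1))  # Example 2: 30.0
```

The function returns a fraction between 0 and 1; multiplying by 100 gives the index as quoted in this article.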

So when you see a Gini index of 30, you can picture the above distribution (20, 80), but is that a fair distribution? Well, 30 or above is considered medium inequality (30 < x < 50), but I leave it to you to decide if it actually is.

Example 3

Let’s move to a more complicated example ($7,000, $13,000, $20,000, $60,000):

Example 3 (Gini: 41.5).
Median: $13,000 – Richest: $60,000 – R/M: 4.6

At first this looks like it has more inequality, but in fact the economy follows the same distribution as the previous example, except with more granularity: x₁ + x₂ = $20,000, x₃ + x₄ = $80,000.

It’s much more tedious to calculate the Gini by hand. Just the first element would be: (|x₁ − x₁| + |x₁ − x₂| + |x₁ − x₃| + |x₁ − x₄|) / total → ($0 + $6,000 + $13,000 + $53,000) / $100,000 → $72,000 / $100,000 → 72%. The whole RMAD is (72% + 60% + 60% + 140%) / 4 → 332% / 4 → 83%. So the Gini is 41.5.
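
To avoid doing this by hand, the per-person terms can be computed with a short Python sketch. The helper name `relative_differences` is hypothetical; it simply mirrors the steps of the calculation above.

```python
def relative_differences(incomes):
    """For each income, sum its absolute differences to every income, as a share of the total."""
    total = sum(incomes)
    return [sum(abs(x - y) for y in incomes) / total for x in incomes]

incomes = [7_000, 13_000, 20_000, 60_000]
per_person = relative_differences(incomes)  # one term per person
rmad = sum(per_person) / len(incomes)       # relative mean absolute difference
gini = rmad / 2                             # Gini is half of the RMAD

print([round(p * 100) for p in per_person])  # [72, 60, 60, 140]
print(round(gini * 100, 1))                  # 41.5
```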

But wait a second! Why is the Gini higher in this case, if the distribution is the same? Well, that’s the first caveat of the Gini index: the result depends on how finely the population is sampled. Grouping people together hides the inequality inside each group, so the more samples, the more precise the index is.

But that’s not the only caveat. If you have been paying attention, you might have deduced already that there’s more than one set of four numbers whose relative absolute differences add up to 332%. That means many income distributions result in the same Gini index, and here are some:

Alt 1 (Gini: 41.5).
Median: $17,000 – Richest: $54,000 – R/M: 3.2
Alt 2 (Gini: 41.5).
Median: $11,000 – Richest: $46,000 – R/M: 4.2
Alt 3 (Gini: 41.5).
Median: $12,000 – Richest: $51,000 – R/M: 4.2
Alt 4 (Gini: 41.5).
Median: $11,000 – Richest: $64,000 – R/M: 5.8

So that’s the second caveat: a single Gini index cannot fully represent a distribution of income. It is by far the best way to represent economic inequality in a single number, but it cannot give you the whole picture.

The last example is particularly interesting, as the richest person earns 5.8 times the income of the median person, yet the Gini is exactly the same because the bottom 75% is quite homogeneous.
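
A quick way to convince yourself is to check a few candidate sets in Python. The four "Alt" sets below are not given in this article: they are reconstructions under the assumptions that each has four incomes adding up to $100,000 and that the reported median is the second-lowest value (as in Example 3). The rank-weighted formula used here is an equivalent way of writing the same Gini definition for sorted data.

```python
def gini(incomes):
    """Gini index from sorted incomes (rank-weighted form of the same definition)."""
    xs = sorted(incomes)
    n = len(xs)
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked / (n * sum(xs)) - (n + 1) / n

# Hypothetical reconstructions consistent with the captions (median, richest).
candidates = {
    "Example 3": [7_000, 13_000, 20_000, 60_000],
    "Alt 1": [2_000, 17_000, 27_000, 54_000],
    "Alt 2": [1_000, 11_000, 42_000, 46_000],
    "Alt 3": [3_000, 12_000, 34_000, 51_000],
    "Alt 4": [10_000, 11_000, 15_000, 64_000],
}

for name, incomes in candidates.items():
    # Every set should report the same index, 41.5.
    print(name, round(gini(incomes) * 100, 1))
```

Under those assumptions every set yields 41.5, even though the shapes differ a lot: in the Alt 4 reconstruction three people earn between $10,000 and $15,000 and one earns $64,000.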

Example 4

Finally we arrive at the most realistic example ($2,000, $3,000, $4,000, $5,000, $6,000, $7,000, $8,000, $11,000, $15,000, $39,000):

Example 4 (Gini: 46.2).
Median: $6,000 – Richest: $39,000 – R/M: 6.5

This again follows the same distribution as the previous examples: add up the first five elements and you get $20,000, add the rest and you get $80,000. But the granularity makes the inequality more visible: the Gini index is increasing, and so is the ratio between the richest and the median person.
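
The effect of granularity can be reproduced with a small Python sketch that collapses the ten incomes of Example 4 into fewer groups. The helper `collapse` is hypothetical and assumes the number of incomes is divisible by the number of groups.

```python
def gini(incomes):
    """Gini index as half the relative mean absolute difference."""
    n, total = len(incomes), sum(incomes)
    pair_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return pair_sum / (2 * n * total)

def collapse(incomes, groups):
    """Sum sorted incomes into equal-sized groups, treating each group as one 'person'."""
    xs = sorted(incomes)
    size = len(xs) // groups
    return [sum(xs[i:i + size]) for i in range(0, len(xs), size)]

example_4 = [2_000, 3_000, 4_000, 5_000, 6_000, 7_000, 8_000, 11_000, 15_000, 39_000]

print(collapse(example_4, 2))                          # [20000, 80000]
print(round(gini(collapse(example_4, 2)) * 100, 1))    # 30.0
print(round(gini(collapse(example_4, 5)) * 100, 1))    # 43.2
print(round(gini(example_4) * 100, 1))                 # 46.2
```

The data is the same in all three cases; only the level of detail changes, and the index grows as the groups get smaller.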

At this point I must confess that this is not an entirely fake economy; this is in fact a simplification of the economy of Mexico, and each example follows exactly the distribution of income in Mexico, which has a Gini of 48.86. As the granularity increases, the Gini index gets closer to the real value. Unfortunately even official sources list the Gini index as 47.13, but that’s because the economy has been simplified to ten values, when the real Gini is 48.86 (if you use the whole surveyed sample).

So we actually have reached the limit of official sources and we are going beyond.

Real numbers

It’s time to move from fake numbers to real ones, and from sets of 4 or 10 to hundreds or millions. Values are adjusted to have a mean of $10,000, but the proportions are the same.
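
Adjusting the mean does not distort anything, because the Gini index only depends on proportions. Here is a minimal Python sketch of that property, using Example 3 as input; the helper `rescale` is hypothetical and is not the adjustment code used for the real data.

```python
import math

def gini(incomes):
    """Gini index as half the relative mean absolute difference."""
    n, total = len(incomes), sum(incomes)
    pair_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return pair_sum / (2 * n * total)

def rescale(incomes, target_mean=10_000):
    """Multiply every income by one factor so that the mean becomes target_mean."""
    factor = target_mean / (sum(incomes) / len(incomes))
    return [x * factor for x in incomes]

original = [7_000, 13_000, 20_000, 60_000]  # mean $25,000
scaled = rescale(original)                  # mean $10,000, same proportions

print(math.isclose(gini(original), gini(scaled)))  # True
```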

Real 10

If we divide the real sample into 10 values, we get a graph closely following our fake example #4 (these numbers are not rounded):

Real 10 (Gini: 47.1).
Median: $5,791 – Richest: $39,476 – R/M: 6.8

Real 100

Dividing the real sample into 100 values we start to see how the inequality shapes up. Also, the Gini index is very close to the real value.

Real 100 (Gini: 48.8).
Median: $6,259 – Richest: $134,956 – R/M: 21.6

If you pay attention to the richest person, you will see his income keeps increasing as we add more samples. At this point he receives 21.6 times the income of the median person.


Finally, if we plot the real sample as it is (122,643,890 weighted values), we get the following graph:

Real (Gini: 48.9).
Median: $6,313 – Richest: $4,406,353 – R/M: 698

Does that graph look remotely similar to a fair distribution of income? The richest person has an income of $4,400,000, 700 times what the median person gets. Yet the Gini doesn’t even reach the index of 50 needed to be considered high inequality; 48.9 is still considered medium. And yes, this is real.

There is a final caveat to income surveys: the richest of the rich are extremely underrepresented. The richest person in Mexico doesn’t receive an income of $4,400,000; it’s closer to $4,000,000,000 (400,000 times the mean), but the chances of interviewing that person in a random survey are virtually zero. The real number of entries in the survey is 70,000, with a mean household size of 3.6, so you can’t say much about the top 0.001%, except that they have an insane amount of income.

At which point does an economy become ridiculously unfair? Well, apparently it’s not with a Gini of 48.9 (or at least not with this distribution), because Mexico has not exploded into a revolution, although that might be due to ignorance. Perhaps if the population of Mexico knew how unfair the distribution of income is, they would do something about it. But at the moment it seems a Gini index of 50 is manageable.


Hopefully after reading this article you have a better understanding of what the Gini index is, why it’s a good measure of inequality, although not a perfect one, and what a distribution of income with a Gini of 50 looks like.

This article only scratches the surface of income distribution measurements. There are many ways to stratify the data: by area, by urban vs. rural areas, by number of inhabitants, by age, by work status (full-time vs. part-time), by sex, etc. The per capita income can be recalculated through equivalization, which increases it dramatically. And the top incomes can be calculated through other means. Plus, there are confidence intervals to take into consideration.

And we didn’t even mention wealth and income dynamics. The income distribution is the number that is most easily obtained, but what matters most is how that number changes and increases the wealth of each individual. The distribution of wealth is a much more complicated subject, but suffice it to say: it’s much more unequal than the distribution of income.

But all this doesn’t change the fact that an inequality in the distribution of income can be measured and visualized. Personally I think anyone with a pair of working eyes can say with confidence: yes, some distributions of income are unfair.

Wages in Mexico and INEGI’s bad statistics

Update: Initially my calculations did not take into account the weighting factor that INEGI uses. The numbers have been updated to reflect it.

I grew up in Mexico in what I considered the middle class, but afterwards I lived in the U.S. and in Europe, so my conception of the different classes may have drifted away from reality. I thought a monthly salary of $20,000 (a modest amount in first-world countries) would count as middle class, and when someone told me the average salary was $8,000 I didn’t believe it. So began the task of finding the salaries of the different classes in Mexico, which turned out not to be as easy as it seemed.


INEGI conducted a survey (Encuesta Nacional de Ingresos y Gastos de los Hogares, the National Survey of Household Income and Expenditure) that supposedly contains this data; however, these are the results:

I $1,674
II $3,033
III $3,977
IV $4,900
V $5,959
VI $7,183
VII $8,800
VIII $11,313
IX $16,012
X $42,120
* Quarterly.

In theory all the information is there and the job is done; however, there is a problem when trying to understand these numbers. The table is titled “ingreso corriente total promedio trimestral per cápita en deciles de personas” (average quarterly total current income per capita, in deciles of persons). It shows ten groups, but a decile is defined as “any of the nine values that divide the sorted data into ten equal parts”: nine values, and the table has ten. Those numbers are not deciles.

Basically, the table is completely useless. If a person earns $10,000 per quarter, are they in group VII or VIII? The number we need to answer that is not in this table.

By hand

Fortunately INEGI provides the original data, and thanks to my programming skills I was able to do the manipulations needed to extract the data of interest. Unfortunately, in the process I noticed discrepancies in INEGI’s tables, so I had to do the calculations on my own.

10% $835
20% $1,180
30% $1,478
40% $1,798
50% $2,182
60% $2,632
70% $3,306
80% $4,315
90% $6,753

These numbers are deciles, and it’s easy to see which group you belong to. If your monthly income is $3,000 pesos (and you have no family), you earn more than 60% of the population (group VII). Conveniently, the median (50%) is easy to read off: it is $2,182, that is, 50% of the population earns less than $2,182 and 50% earns more.
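
With real cut points, finding someone’s group becomes a simple lookup. Here is a minimal Python sketch using the nine deciles listed above; the function name and the choice of putting an income exactly equal to a cut point in the higher group are my own assumptions for illustration.

```python
from bisect import bisect_right

# The nine decile cut points (monthly per-capita income in pesos) from the list above.
DECILES = [835, 1_180, 1_478, 1_798, 2_182, 2_632, 3_306, 4_315, 6_753]

def decile_group(monthly_income):
    """Return the group (1 to 10) that a monthly per-capita income falls into."""
    return bisect_right(DECILES, monthly_income) + 1

group = decile_group(3_000)
print(group)             # 7: $3,000 sits between the 60% and 70% cut points
print((group - 1) * 10)  # 60: share of the population below that group
```

This lookup is exactly what cannot be done with a table of group averages, because the averages do not tell you where one group ends and the next begins.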

Similarly, we can divide the population into three groups:

low less than $1,579
middle from $1,579 to $3,033
high more than $3,033

It seems hard to believe, but these numbers are easy to check. INEGI’s sample has 19,479 people; filtering for those who earn more than $3,033 gives 8,040 (41.28%), but after applying the expansion factor the result is 33.32% (one third).
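
The gap between the raw and the weighted percentage comes from the fact that each record stands for a different number of people. The Python sketch below shows the mechanics with made-up records; the incomes and weights are hypothetical and are not INEGI data. Only the 8,040 out of 19,479 figure comes from the text.

```python
def share_above(records, threshold, weighted=True):
    """Share of people above threshold; records are (income, expansion_factor) pairs."""
    total = sum(w if weighted else 1 for _, w in records)
    above = sum(w if weighted else 1 for income, w in records if income > threshold)
    return above / total

# Hypothetical records: (monthly income, expansion factor). Not INEGI data.
sample = [(1_000, 900), (2_500, 700), (4_000, 300), (9_000, 100)]

print(share_above(sample, 3_033, weighted=False))  # 0.5: two of the four records
print(share_above(sample, 3_033))                  # 0.2: 400 of 2,000 weighted people

# The unweighted share reported in the text:
print(round(8_040 / 19_479 * 100, 2))              # 41.28
```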

Note that the numbers are per capita. That is, if you earn $8,000 pesos and support a family of 4, each person is considered to receive an income of $2,000 pesos. More details below.

The numbers for the super-rich class are:

91% $7,156
92% $7,663
93% $8,325
94% $9,130
95% $10,065
96% $11,516
97% $13,159
98% $16,475
99% $22,763


Averages paint a very different picture. For example, the upper class starts above $3,033, yet there is a big difference between an income of $4,000 and one of $400,000 pesos, and both are in the same group. Averaging everyone in this group changes the result a lot. The average of the upper class (top 33%) is $7,262, the average of the top 10% is $14,040, and the average of the top 1% is $42,910.

That’s why averages are dangerous. The average of the whole country is $3,499, but the average of the bottom 99% is only $3,101; adding in the top 1%, at $42,910, raises it to $3,499 (3101 * 0.99 + 42910 * 0.01).
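
These averages can be cross-checked against INEGI’s own table with a few lines of Python. The sketch assumes that the ten groups in that table each cover 10% of the population and that a quarter is three months; the variable names are mine.

```python
# INEGI's quarterly per-capita averages for groups I to X, from the table above.
quarterly_averages = [1_674, 3_033, 3_977, 4_900, 5_959, 7_183, 8_800, 11_313, 16_012, 42_120]

# Equal-sized groups, so the national mean is the mean of the group means.
monthly_mean = sum(quarterly_averages) / len(quarterly_averages) / 3
top_10_monthly = quarterly_averages[-1] / 3

# Mixing the bottom 99% with the top 1%, as in the text.
mixed = 3_101 * 0.99 + 42_910 * 0.01

print(round(monthly_mean))    # 3499
print(round(top_10_monthly))  # 14040
print(round(mixed))           # 3499
```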

Labor income

There are many details behind these numbers, but in general they are the income of the whole household: wages, profits, rents, transfers, and estimated rental value, divided by the number of members.

If we take into account only the labor income of employed people, the result is more promising.

10% $782
20% $1,545
30% $2,201
40% $2,872
50% $3,563
60% $4,303
70% $5,324
80% $7,008
90% $10,492

In three groups:

33% $2,413
66% $4,937

And the top 10%:

91% $11,041
92% $11,848
93% $12,739
94% $13,960
95% $15,473
96% $16,988
97% $19,565
98% $24,508
99% $32,983



There is a number used to measure inequality quickly: the Gini coefficient. Although it’s not perfect, it is the most widely used, and it is still useful. A perfectly equal society would have a value of 0%, while a totally unequal one would have 100%. Germany, a country with a lot of social equality, has a value of 27%; the United States, known for its inequality, 45%. According to INEGI Mexico has a value of 48%, but according to my calculations the value is 49.70%. The difference arises because INEGI uses its averages per decile (10 data points), while I use all the records (19,479 data points), so my calculation is more precise.
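
The 48% figure can be approximated from the ten averages in INEGI’s table. The Python sketch below is not INEGI’s procedure nor my original code; it just applies a standard rank-weighted form of the Gini definition to those ten values, treating each group as one person.

```python
def gini(incomes):
    """Gini coefficient from sorted values (rank-weighted form)."""
    xs = sorted(incomes)
    n = len(xs)
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked / (n * sum(xs)) - (n + 1) / n

# INEGI's quarterly per-capita averages for groups I to X.
quarterly_averages = [1_674, 3_033, 3_977, 4_900, 5_959, 7_183, 8_800, 11_313, 16_012, 42_120]

print(round(gini(quarterly_averages) * 100, 2))  # about 48.06
```

Ten averages hide all the inequality inside each group, which is why this value comes out lower than the one computed from every record.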


Probably the easiest way to visualize the incredible inequality is to plot all the incomes in the sample:


INEGI’s errors

I already mentioned that, to begin with, their table of deciles doesn’t contain deciles; it contains averages of the various groups, and as we saw, averages are dangerous because they can paint things as more positive than they are.

There are also very curious discrepancies. The same ‘ingresos’ table comes in a “traditional” format and a “new construction” format.

folioviv foliohg numren clave ing_5
0860298316 1 01 P043 8000
0860298316 1 01 P071 8237
folioviv foliohg numren clave ing_5
0860298316 1 01 P043 8000
0860298316 1 01 P071 58237

Here we see two incomes of one person; P043 is a PROCAMPO benefit, and P071 is the code for agricultural businesses. One table says the person made $8,237 from agricultural businesses, and the other says $58,237. It looks like a typo (one that changes the amounts drastically), but why don’t they compare their own tables?
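
Comparing the two tables is not hard. The Python sketch below shows one way to do it with the rows above; the function names are hypothetical, and treating the four columns folioviv, foliohg, numren and clave as the record key is my assumption.

```python
KEY = ("folioviv", "foliohg", "numren", "clave")  # assumed to identify a record

def parse(text):
    """Parse whitespace-separated rows with a header line into a list of dicts."""
    header, *rows = [line.split() for line in text.strip().splitlines()]
    return [dict(zip(header, row)) for row in rows]

def mismatches(a, b, field="ing_5"):
    """Return (key, value_a, value_b) for records in both tables whose field differs."""
    index_b = {tuple(r[k] for k in KEY): r[field] for r in b}
    found = []
    for r in a:
        key = tuple(r[k] for k in KEY)
        if key in index_b and index_b[key] != r[field]:
            found.append((key, r[field], index_b[key]))
    return found

table_a = parse("""
folioviv foliohg numren clave ing_5
0860298316 1 01 P043 8000
0860298316 1 01 P071 8237
""")
table_b = parse("""
folioviv foliohg numren clave ing_5
0860298316 1 01 P043 8000
0860298316 1 01 P071 58237
""")

print(mismatches(table_a, table_b))
# [(('0860298316', '1', '01', 'P071'), '8237', '58237')]
```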

There seem to be many errors of this kind. For example, in the documentation of the variable ‘ing_cor’:

inc_cor: current income
The sum of ing_cor and percep_tot

Huh? To compute ‘ing_cor’ I need ‘ing_cor’?

Another example is the records marked as “indemnizaciones” (compensation payments, P034), which don’t seem to be used anywhere.

Everything indicates that INEGI needs to review its own information.

Map of the sample

Update: Many people asked where the surveys were taken; here is a map to visualize it. For some strange reason, the center of Tabasco is where they collected the most information. It is quite evident that the north did not get much attention.



There’s nothing left but to accept that we are much worse off than I thought, not only in terms of wages, but also inequality, and even availability of information. If the agency dedicated to providing statistical data doesn’t even know what a decile is, one really can’t expect much from the future.

Note: These numbers are reliable only if INEGI’s sample is truly random. Given that I already detected many errors in their tables, it’s possible that INEGI’s sample also leaves something to be desired. Unfortunately there is no better data, so as far as I know, these are the most reliable numbers.


Anyone can verify the data if they are interested. All the code is on GitHub; I used Linux, but it’s possible to run Ruby on Windows too.