That there is inequality in income is beyond question, but it's worth asking about the best way to measure it. A common approach is to state something like this:
The 30% of individuals with the lowest incomes in Scotland receive 14% of the total income, whereas the top 30% receive 51%, and the middle 40% get 35%.
I've picked this just as one example, but you can find many more like it for many countries. In this post I'll argue that this is a poor measure of inequality, and show how other measures can yield more meaningful information that is more sensitive to changes in inequality.
The mathematics that underpin the arguments here can be found in this blog post on my personal blog.
A simple distribution
Let's look at how these income percentages are calculated. I'm conscious that some might find this bit rather numerical and, well, boring, so if you really don't like it, eyeball the graphs then skip to the table.
To keep it clear I'll start with the mathematically simple uniform distribution, as illustrated in the graph below, and consider more realistic ones later on.
[Figure: A uniform income distribution in which 5 million people are evenly spread between incomes of £0 and £50,000 per year.]
The distribution has a constant height of 100 (people per pound of income) from an annual income of £0 up to £50,000, and is zero above £50,000. This tells us that no-one earns more than £50,000 in this population. The size of the population is the area under the graph, which is 100 x 50,000 = 5 million, roughly the population of a small country such as Scotland, Norway or Slovakia.
We can use the distribution to calculate the number of people in any range of incomes. So, for example, the number of people with income between £0 and £5000 is
100 x (5000-0)=500,000
Similarly, the number who earn between £45,000 and £50,000 is
100 x (50,000-45,000)=500,000
In fact, any £5000-wide range of income will have 500,000 people - that's the simplicity of a uniform distribution.
You'll notice in these two examples that we've given examples of the poorest and richest 500,000 people in terms of income. With a total population of 5,000,000 that means we're talking about the poorest and richest 10% of the population. In the jargon these are known as the poorest and richest deciles of the population. We can divide the whole population into ten such deciles ("dec-" means ten). The uniform distribution makes this simple: the boundary between the first and second decile is at £5000, the second and third is at £10,000, the third and fourth is at £15,000 and so on in increments of £5000 up to the tenth decile which starts at £45,000.
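The head-counts and decile boundaries above can be reproduced with a few lines of Python. This is only a sketch of the toy uniform distribution described in the text; the names DENSITY, MAX_INCOME and people_between are mine, chosen for illustration.

```python
# Toy uniform distribution from the text: 100 people per £1 of income, from £0 to £50,000.
DENSITY = 100          # people per pound of annual income
MAX_INCOME = 50_000    # pounds per year

def people_between(low, high):
    """Number of people with income between low and high (area under the flat graph)."""
    low = max(low, 0)
    high = min(high, MAX_INCOME)
    return DENSITY * max(high - low, 0)

population = people_between(0, MAX_INCOME)

# Upper boundaries of deciles 1 to 9: incomes below which 10%, 20%, ... of people fall.
decile_boundaries = [MAX_INCOME * k // 10 for k in range(1, 10)]

print("population:", population)
print("poorest decile:", people_between(0, 5_000))
print("richest decile:", people_between(45_000, 50_000))
print("decile boundaries:", decile_boundaries)
```

Any £5000-wide band passed to people_between gives the same count, which is the defining feature of the uniform distribution.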
Next, let's calculate the total income of this population. This usually requires calculus, specifically integration, but since we're dealing with a uniform distribution there's a simple way to do it: multiply the income at the centre of the population, £25,000, by the number of people in the population, 5 million:
total income = £25,000 x 5,000,000=£125,000,000,000
The middle income is the mean (or average) income, which also happens to be the median income for a uniform distribution. The median is the income that splits the population into two equally sized groups. If you think about it, the median is equal to the upper boundary of the fifth decile.
Let's now calculate the total income of the bottom 30%, or, in other words, the bottom three deciles. This will be everyone earning up to 30% of £50,000, which is £15,000. The same trick as we used above still works: we multiply the mean income of this group, £7500, by its population, 30% of 5 million, which is 1,500,000:
poorest 30%'s income = £7500 x 1,500,000 = £11,250,000,000
So the total income is £11.25 billion. Likewise, we can calculate the income of the richest 30%, which is a group with the same number of people, but its mean income is £42,500, so:
richest 30%'s income = £42,500 x 1,500,000 = £63,750,000,000
The middle 40% has the same mean income as the whole distribution, £25,000, and 40% of 5,000,000 is 2,000,000 so
middle 40%'s income = £25,000 x 2,000,000 = £50,000,000,000
Now let's summarise all of this in a table, and also include the percentage of the total income:
Group | Mean income | People | Group income | % of total |
Poorest 30% | £7500 | 1.5mn | £11.25bn | 9% |
Middle 40% | £25,000 | 2.0mn | £50.00bn | 40% |
Richest 30% | £42,500 | 1.5mn | £63.75bn | 51% |
Total | £25,000 | 5.0mn | £125.00bn | 100% |
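For anyone who would rather check the table than take it on trust, here is a minimal sketch that recomputes the percentage column. It assumes the same flat distribution (100 people per pound, £0 to £50,000); the helper group_income is a name I've made up for this example.

```python
def group_income(low, high, density=100):
    """Total income of everyone between low and high in a flat distribution."""
    people = density * (high - low)      # head-count in the band
    mean_income = (low + high) / 2       # midpoint of the band
    return people * mean_income

max_income = 50_000
cut_30 = max_income * 3 // 10   # top of the poorest 30%: £15,000
cut_70 = max_income * 7 // 10   # bottom of the richest 30%: £35,000

total = group_income(0, max_income)
shares = {
    "Poorest 30%": group_income(0, cut_30) / total,
    "Middle 40%": group_income(cut_30, cut_70) / total,
    "Richest 30%": group_income(cut_70, max_income) / total,
}

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```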
Note that the percentages 9%/40%/51% are not too different from the reality of Scotland mentioned at the start, i.e. 14%/35%/51%. This might strike you as odd because the actual income distribution of Scotland is quite different to our toy uniform distribution. There's an interesting reason for this, which we can demonstrate by playing with our uniform distribution for a bit longer.
Imagine now that we double the top salary of the distribution from £50,000 to £100,000. The distribution looks like this:
[Figure: A uniform income distribution in which 5 million people are evenly spread between incomes of £0 and £100,000 per year.]
Notice that the population is 5 million, as before, because although the range has doubled (zero to £100,000) the height has halved to 50. You can think of this as everyone in the population having their income doubled. Intuitively, you might think that inequality is more pronounced with this distribution, and you can rationalise it as follows: the mean income of the poorest 30% increases by £7500, but that of the richest 30% increases by £42,500. The rich get a lot richer than the poor do, so inequality is greater.
But let's work out the income percentages. The mean salary is double what we had before at £50,000, as are the mean salaries of the poorest and richest groups. So if we were to amend the table above we'd need to double the Mean income column, which when multiplied by the number of people means that the Group income doubles also, including the Total. And here's the punchline: the factors of two cancel when calculating the percentages, which means that the % of total column remains unchanged at 9%/40%/51%.
Scale invariant
This result is not just true for doubling: we could have halved income, tripled it, multiplied it by a million, or divided it by a billion or pi, or by any (positive) number. The percentages will stay the same. The percentages of 9%/40%/51% are true of a uniform distribution that starts at zero income and goes up to any maximum value.
This is an example of what is called a scale invariant property. The more astounding thing (unless you're used to mathematical statistics) is that this is true for any distribution, not just uniform distributions.
To emphasise the point, imagine if everyone in Scotland saw their income double. There would be a significant improvement in living standards for all, with the rich benefiting most in absolute terms, but the income percentages would remain exactly the same as they are today, i.e. 14%/35%/51%.
The bottom line is this: the percentages of total income going to income groups can be completely insensitive to dramatic changes in the distribution of income.
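Scale invariance is easy to check numerically. The sketch below uses a made-up skewed sample drawn from a log-normal distribution; that choice, its parameters and the function name income_shares are my assumptions for illustration, and the sample is not Scottish data.

```python
import random

def income_shares(incomes):
    """Fractions of total income going to the poorest 30%, middle 40% and richest 30%."""
    ordered = sorted(incomes)
    n = len(ordered)
    lo, hi = (3 * n) // 10, (7 * n) // 10   # positions of the two group boundaries
    total = sum(ordered)
    return (sum(ordered[:lo]) / total,
            sum(ordered[lo:hi]) / total,
            sum(ordered[hi:]) / total)

random.seed(1)  # fixed seed so the run is repeatable
# Hypothetical skewed incomes: a peak at lower incomes and a long tail of high ones.
incomes = [random.lognormvariate(10, 0.6) for _ in range(10_000)]

base_shares = income_shares(incomes)
doubled_shares = income_shares([2 * x for x in incomes])
scaled_shares = income_shares([x / 3.14159 for x in incomes])

print("original:", [f"{s:.1%}" for s in base_shares])
print("doubled: ", [f"{s:.1%}" for s in doubled_shares])
print("divided: ", [f"{s:.1%}" for s in scaled_shares])
```

All three lines print the same percentages, whatever positive factor is used.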
Symmetry invariant
But there's more. It turns out that for any distribution that's symmetric about its mean income, which is true of a uniform distribution and many others, you can mathematically prove that the middle 40% always have a 40% share of the income. The corollary of this is that the poorest and richest 30% groups combined will have 60%.
From this we can conclude that Scotland's distribution cannot be symmetric because its middle group receives only 35% of the income, not 40%. You can think of the "lost" 5% as going to the poor, who receive 14% rather than 9%. This is a common feature of real income distributions: they are not symmetric but significantly skewed so that they peak towards lower incomes, with a long tail extending up to high incomes.
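The symmetry result can also be checked numerically rather than proved. In this sketch I build an artificial sample that is exactly symmetric about a mean of £30,000 by mirroring a set of random offsets; the triangular shape, the mean and the name income_shares are arbitrary choices of mine.

```python
import random

def income_shares(incomes):
    """Fractions of total income going to the poorest 30%, middle 40% and richest 30%."""
    ordered = sorted(incomes)
    n = len(ordered)
    lo, hi = (3 * n) // 10, (7 * n) // 10
    total = sum(ordered)
    return (sum(ordered[:lo]) / total,
            sum(ordered[lo:hi]) / total,
            sum(ordered[hi:]) / total)

random.seed(2)
mean_income = 30_000
# Offsets from the mean, bunched near zero (assumed shape, purely for illustration).
offsets = [random.triangular(0, 30_000, 0) for _ in range(5_000)]
# Mirror each offset either side of the mean so the sample is symmetric.
symmetric = [mean_income - d for d in offsets] + [mean_income + d for d in offsets]

poorest, middle, richest = income_shares(symmetric)
print(f"poorest 30%: {poorest:.1%}, middle 40%: {middle:.1%}, richest 30%: {richest:.1%}")
```

The middle share comes out at 40%, and the other two shares sum to 60%, however the offsets are distributed.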
So what do income percentages tell us?
It turns out that the percentages are determined by two things:
1. the shape of the distribution
2. the width of the distribution relative to its mean
We've seen one effect of point 1 in considering symmetry, and will come back to it later, but let's consider point 2.
In statistics, the width of a distribution is usually measured by something called the standard deviation. I don't want to get into the gory details, but rest assured there is a simple mathematical formula for calculating the standard deviation, and you can find it implemented in spreadsheets as the STDEV() function.
For the first uniform distribution we looked at above, the mean is £25,000 and the standard deviation is £14,434, as shown on the red distribution in the graph below.
[Figure: Means are shown with blobs, and a standard deviation either side of the mean is marked with short vertical lines. The blue distribution's mean and standard deviation are twice that of red.]
The blue distribution is the "doubled" uniform distribution discussed above. As you may have guessed, its mean and standard deviation are double the red values: £50,000 and £28,868.
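The quoted standard deviations follow from a standard result for a flat distribution: its standard deviation is its width divided by the square root of 12. The sketch below applies that formula and cross-checks it against Python's statistics.pstdev (the population standard deviation) on a notional sample with one person in the middle of each £1 band; the sample construction is my own device, not something from the report.

```python
import math
import statistics

def uniform_std(low, high):
    """Standard deviation of a flat distribution between low and high."""
    return (high - low) / math.sqrt(12)

red_std = uniform_std(0, 50_000)     # the original distribution
blue_std = uniform_std(0, 100_000)   # the "doubled" distribution

# Cross-check: one notional person at the middle of each £1-wide income band.
sample = [pounds + 0.5 for pounds in range(50_000)]
sample_std = statistics.pstdev(sample)

print(f"red: £{red_std:,.0f}  blue: £{blue_std:,.0f}  sample check: £{sample_std:,.0f}")
```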
If instead we imagined increasing everyone's income by £25,000, that would just shift the red distribution £25,000 to the right, but leave the width unchanged, ranging from £25,000 up to £75,000:
[Figure: Here the red distribution is moved to the right by £25,000. Its mean is £50,000, same as the blue's, but the standard deviation is unchanged.]
The mean is now doubled, but the standard deviation remains the same. As a result the income percentages change significantly to 19.5%/40%/40.5% indicating reduced inequality. This suggests that the larger the mean to standard deviation ratio, the smaller the inequality, or, to put it more clearly, a more equal society has a narrower distribution about the mean.
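The 19.5%/40%/40.5% figures can be reproduced with the same mean-times-people trick, generalised to a flat distribution that doesn't start at zero. This is a sketch; uniform_shares and std_over_mean are names I've invented for it.

```python
import math

def uniform_shares(low, high):
    """Income shares of the poorest 30%, middle 40% and richest 30% of a flat distribution."""
    width = high - low
    cut_30 = low + 0.3 * width
    cut_70 = low + 0.7 * width

    def income(a, b):
        # Proportional to the total income between a and b:
        # width of the band (people) times its midpoint (mean income).
        return (b - a) * (a + b) / 2

    total = income(low, high)
    return (income(low, cut_30) / total,
            income(cut_30, cut_70) / total,
            income(cut_70, high) / total)

def std_over_mean(low, high):
    """Width of the distribution relative to its mean."""
    return ((high - low) / math.sqrt(12)) / ((low + high) / 2)

original = uniform_shares(0, 50_000)
shifted = uniform_shares(25_000, 75_000)

print("original:", [f"{s:.1%}" for s in original], f"std/mean = {std_over_mean(0, 50_000):.3f}")
print("shifted: ", [f"{s:.1%}" for s in shifted], f"std/mean = {std_over_mean(25_000, 75_000):.3f}")
```

Shifting halves the standard deviation to mean ratio, and the shares of the poorest and richest groups move towards each other accordingly.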
A more realistic shape
Let's now look at a different distribution shape called a Gaussian (also called a bell-curve or a normal distribution). It falls off in a more realistic way than the "cliff edges" of the uniform distribution, but is still perfectly symmetric about the mean. The means and standard deviations of the red and blue Gaussian distributions in the graph below are identical to the uniform ones plotted previously.
[Figure: Gaussian distributions with the same mean and standard deviation as the red and blue uniform distributions above. The vertical line marks the mean, and the horizontal line is two standard deviations long.]
Although it may not be immediately obvious, the areas under these two curves are the same as before, that is, equal to a population of 5 million (though notice that a small fraction of the population are on negative income).
The income percentages for both these distributions can be calculated (using calculus) as 10%/40%/50%, i.e. almost the same as for the uniform distributions.
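If you'd rather not do the calculus by hand, the Gaussian percentages can be obtained from a standard identity for the normal distribution: the share of income going to the poorest fraction p is p minus (standard deviation / mean) times the height of the standard bell curve at the p-th quantile. The sketch below uses Python's statistics.NormalDist; the function names are mine.

```python
from statistics import NormalDist

def gaussian_shares(mean, std):
    """Income shares of the poorest 30%, middle 40% and richest 30% of a Gaussian."""
    unit = NormalDist()  # standard bell curve: mean 0, standard deviation 1

    def share_below(p):
        z = unit.inv_cdf(p)                    # quantile in standard deviations from the mean
        return p - (std / mean) * unit.pdf(z)  # identity for a normal distribution

    bottom = share_below(0.3)
    below_70 = share_below(0.7)
    return bottom, below_70 - bottom, 1 - below_70

red = gaussian_shares(25_000, 14_434)
blue = gaussian_shares(50_000, 28_868)

print("red: ", [f"{s:.0%}" for s in red])
print("blue:", [f"{s:.0%}" for s in blue])
```

Only the ratio of standard deviation to mean enters the formula, which is why the red and blue curves give the same answer.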
In fact, for a whole range of realistic distributions the shape of the distribution has little effect on the percentages, generally altering them by no more than a few percentage points. However, as we saw above, the mean to standard deviation ratio can significantly change the income percentages.
A better measure of inequality
I hope that you're now convinced that income percentages are an insensitive measure of inequality. In addition, they lack an intuitive meaning for most people, because few would think of themselves as, say, belonging to the middle 40%, and would have little feel for the fraction of total income that group receives.
Instead, most people assess their prosperity relative to their peers with similar incomes; possibly feeling jealous of a neighbour's expensive holiday, or perhaps experiencing a slight smugness at having a better car. A good inequality measure would be both sensitive to changes in the income distribution, and present it in an intuitive way that accords with pre-existing perceptions.
Once you understand what the income percentages are telling you, as we've explored above, you might prefer to know the mean and the standard deviation. After all, the ideal of perfect equality is obtained when the width of a distribution is zero and everyone earns the same. In that sense, perhaps the best measure of inequality is the standard deviation divided by the mean, because as it approaches zero, so does inequality. However, although this might appeal to a statistician or mathematician, I think most people would be put off by the very mention of standard deviation. Also, the standard deviation to mean ratio is, like the income percentages, scale invariant, and so is insensitive to significant changes in income distributions.
Decile boundaries offer a better measure of inequality. They have a tangible meaning: if a person can see that their income lies between, say, the second and third decile boundaries, then they know they are relatively poor, but far from being the poorest. Further, if they see that both boundaries increase over time then they can conclude that they, along with all of their peers, are benefiting (assuming inflation is accounted for). In this sense they are sensitive to crucial changes in the income distribution.
And it needn't be any more complicated: instead of quoting three
percentages, you could summarise the state of the income distribution by
quoting just two incomes: the top of the poor 30% group and the bottom
of the rich 30% group.
As we saw above, if a society did experience a doubling of everyone's income, then the income percentages remain the same, but all the decile boundaries will double. More importantly, if poor people gain more than the rich, this will be evidenced by the poor decile boundary increasing more than the rich one.
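Here is a small sketch of how the two boundaries respond to changes. The eleven incomes are entirely hypothetical, and I've used Python's statistics.quantiles to find the boundaries, which is one of several reasonable conventions for small samples.

```python
import statistics

def boundaries(incomes):
    """Return the 3rd and 7th decile boundaries of a list of incomes."""
    cuts = statistics.quantiles(incomes, n=10)  # nine cut points between the ten deciles
    return cuts[2], cuts[6]

# Hypothetical incomes in pounds per year, made up for illustration.
incomes = [9_000, 11_000, 12_500, 14_000, 16_000, 18_500,
           21_000, 24_000, 28_000, 35_000, 60_000]

d3, d7 = boundaries(incomes)
d3_doubled, d7_doubled = boundaries([2 * x for x in incomes])
d3_flat, d7_flat = boundaries([x + 2_000 for x in incomes])  # everyone gains £2,000

print(f"original:     3rd £{d3:,.0f}  7th £{d7:,.0f}  ratio {d7 / d3:.2f}")
print(f"doubled:      3rd £{d3_doubled:,.0f}  7th £{d7_doubled:,.0f}  ratio {d7_doubled / d3_doubled:.2f}")
print(f"flat £2k rise: 3rd £{d3_flat:,.0f}  7th £{d7_flat:,.0f}  ratio {d7_flat / d3_flat:.2f}")
```

Doubling moves both boundaries but leaves their ratio alone, whereas a flat rise lifts the lower boundary proportionally more and the ratio falls.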
Real data
Compare the two graphs below showing real data for Scotland and decide for yourself which is more informative about the state of its inequality over the last twenty years.
The Poverty and Income Inequality in Scotland report contains this graph in Chapter 2 showing the percentage of total income going to the poorest 30% (deciles 1-3); the middle (deciles 4-7); and richest 30% (deciles 8-10):
[Figure: Income percentages going to bottom and top 30% groups and middle 40% group.]
From data in the spreadsheet supplied with that report I constructed the graph below showing the changes in the boundaries that divide these three groups, i.e. the 3rd and the 7th deciles. Also shown for reference is the median.
[Figure: Decile boundaries for Scotland. Income is stated after tax, including any benefits received, but before housing costs, and corrected for inflation so it's in 2013/14 prices. The spreadsheet with this data can be downloaded here - see Table A10.]
The income percentages graph shows very little change whereas the decile boundary graph clearly shows an increasing trend and also a dip due to the recession after 2008. Interestingly, the dip is more pronounced for the 7th decile than for the 3rd decile, i.e. the recession slightly reduced inequality. There is a hint of this in the income percentages graph in that the poorest 30% changes from 13% to 15% during 2010.
If we look at the ratio of the 7th decile to the 3rd decile we find that it went from 1.84 in 1994/95 to 1.71 in 2013/14. By this measure there has been a small but significant drop in income inequality. In contrast, the income percentages in the previous graph are completely insensitive to this change.
A similar measure of inequality I've seen used is the so-called 90/10 ratio, which is just the upper boundary of the 9th decile divided by the upper boundary of the 1st decile. In this case it drops from 3.89 in 1994/95 down to 3.49 in 2013/14.
It's important to note that everyone has benefited from a real income rise in Scotland over the last two decades (even if they're unaware of it because it has been gradual). The 3rd decile has risen by £4160 over the twenty years, an increase of 30%, and the 7th decile by £5304, an increase of 23%.
The 1%
The Occupy movement popularised the notion of the top 1% as a symbol of inequality, and so I couldn't resist a brief look at the income percentages
for the top 1% of the income distributions. For the uniform distributions
described above, the income percentage is 2% and for the Gaussian it is
2.5%.
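Both figures can be checked in a few lines. For the uniform case the top 1% sit in the band from £49,500 to £50,000; for the Gaussian I've used a standard identity for the normal distribution (the share of the richest fraction q is q plus the standard deviation to mean ratio times the height of the standard bell curve at the cut-off). The variable names are mine, and this is a sketch rather than the calculation from the linked maths post.

```python
from statistics import NormalDist

top = 0.01  # the richest 1%

# Uniform distribution from £0 to £50,000.
max_income = 50_000
uniform_mean = max_income / 2
top_group_mean = max_income * (1 - top / 2)   # midpoint of the band holding the top 1%
uniform_top_share = top * top_group_mean / uniform_mean

# Gaussian with the same mean and standard deviation as the uniform distribution.
mean, std = 25_000, 14_434
unit = NormalDist()              # standard bell curve
z = unit.inv_cdf(1 - top)        # cut-off for the top 1%, in standard deviations
gaussian_top_share = top + (std / mean) * unit.pdf(z)

print(f"uniform:  {uniform_top_share:.1%}")
print(f"Gaussian: {gaussian_top_share:.1%}")
```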
It's hard to find comparable post-tax figures that include benefits, but by combining data from this older IFS report (inflating using RPI) with the HBAI distribution, my rough estimate for the whole UK is 4%. It's likely that Scotland's figure would be a little lower than this (London bumps up the UK figure), but still well above the 2.5% of the Gaussian. This is a reflection of the fact that the income distribution is skewed so that the peak is pushed to lower incomes and there's a "long tail" extending up to extremely high incomes.
There are much greater uncertainties in estimating the top 1% figures because very rich people receive income in a different way to the rest of us, and because they can "tax efficiently" shunt money across national boundaries. See Thomas Piketty's book Capital in the Twenty-First Century if you'd like to understand why. In all probability most estimates are under-estimates.
Y U lie - inequality is increasing!
When I started investigating income inequality I fully expected to find what I'd frequently been told and what I'd often read - inequality is increasing in Scotland. But the numbers above do not lie. It is a fact that post-tax income including benefits shows inequality has lessened over the last twenty years. But there's no contradiction here, for three reasons.
Firstly, I'm not talking about wealth inequality. It is quite different and much greater (see Chapter 7 of Piketty).
Secondly, most figures I've seen quoted on income inequality relate to pre-tax income excluding benefits, which shows inequality as constant or rising slightly over the last twenty years. But, to my mind, ignoring the redistributive actions of tax and benefits is misleading: it includes a substantial portion of rich people's earnings that is actually income to the government (tax), and excludes a significant portion of what makes up poor people's income (benefits). For some uses this may be appropriate, but I'd argue that post-tax income including benefits is most appropriate for headline figures.
Finally, I'm not looking at the extremes of the income distribution. The top 1% (and especially 0.1%) and bottom few % show distinct trends that require special analysis using other datasets.
This report by Bell and Eiser gives further information on these points and their Figure 6 shows the drop in inequality that I've found, including across the recession since 2008. It's also interesting to compare how their work translates into headlines that lead on growing inequality, such as in this article in the Herald. The plight of the super-poor is rightly given prominence, but the fact that the vast majority of households have seen their real income rise, and post-tax inequality has reduced, is only touched on.
Conclusion
Income percentages are insensitive to some significant changes in real income distributions. I've argued that stating the incomes which mark out the poorest 30% and the richest 30% provides a more intuitive measure of inequality that is better suited to highlighting important changes in the income distribution.
Taking the ratio of these two incomes, or using the 90/10 measure, also gives interesting information, but I don't think such ratios should be considered alone. The problem with dividing the two numbers in this way is that it provides a scale invariant measure which, for example, would show no change if everyone's income doubled.
Unless there is good reason to do otherwise, I think it is most appropriate to work with post-tax income including benefits. This is the income that people receive and correctly accounts for a society's income redistribution.
Finally, it's always important to use the right tool for the job. I've concentrated on inequality measures for general use, but for specific purposes other measures must be used. For example, if you are looking at either extreme poverty or very high incomes, then that information is hidden within the first and last deciles. You can turn to percentiles rather than deciles (100 divisions rather than 10), but it's important to remember that such statistics are one-dimensional and exclude important human aspects. In my opinion, the area that requires most attention and urgent action is extreme poverty, and for that you must consider foodbanks, homelessness and substance abuse, amongst many other things. A statistic on income inequality is just one tool in the box for analysing society.