
Decomposition analysis of earnings inequality in rural India: 2004–2012


We analyze the changes in earnings of paid workers (wage earners) in rural India from 2004/05 to 2011/12. Real earnings increased at all percentiles, and the percentage increase was larger at the lower end. Consequently, earnings inequality declined. Recentered influence function decompositions show that throughout the earnings distribution, except at the very top, both changes in “worker characteristics” and in “returns to these characteristics” increased earnings, with the latter having played a bigger role. Decompositions of inequality measures reveal that although the change in characteristics had an inequality-increasing effect, chiefly attributable to increased education levels, inequality declined because workers at lower quantiles experienced greater improvements in returns to their characteristics than those at the top.

JEL Classification: J30, J31, O53

1 Introduction

In their discussion of India’s economic growth, Kotwal et al. (2011) point to the existence of two Indias: “One of educated managers and engineers who have been able to take advantage of the opportunities made available through globalization and the other—a huge mass of undereducated people who are making a living in low productivity jobs in the informal sector—the largest of which is still agriculture.” This paper is about the second India that mainly resides in its rural parts. Agriculture, the mainstay of the rural economy, continues to employ the largest share of the Indian workforce, but its contribution to gross value added (GVA) is much smaller. In 2011, the employment shares of agriculture, industry, and services were 49, 24 and 27 %, respectively, whereas their shares in GVA were 19, 33, and 48 %, respectively (GOI 2015). In addition, between 2004/05 and 2011/12, real gross domestic product (GDP) in these sectors grew at 4.2, 8.5 and 9.6 % per annum, respectively, making agriculture the slowest growing sector of the economy (authors’ calculations based on RBI 2015). Given these figures, the concern about whether high overall GDP growth has benefitted those at the bottom, and to what extent they have benefitted compared to those at the top, is even more pertinent for rural India. We therefore focus on rural India and examine how real earnings of paid workers (wage earners) evolved over the 7-year period between 2004/05 and 2011/12.

Several studies have documented that along with the high growth rates of GDP that have characterized the Indian economy since the 1980s, there has been an increase in inequality.Footnote 1 However, most of these studies have focused either on consumption expenditure (Sen and Himanshu 2004; Cain et al. 2010; Motiram and Vakulabharanam 2012; Jayaraj and Subramanian 2015; Datt et al. 2016)Footnote 2 or on earnings of paid workers in urban India (Kijima 2006; Azam 2012a). Two notable exceptions are Hnatkovska and Lahiri (2013) and Jacoby and Dasgupta (2015). Hnatkovska and Lahiri (2013) compare wages in rural and urban areas over the period 1983 to 2010. They find that urban agglomeration led to a massive increase in urban labor supply that in turn reduced the rural-urban wage gap. Unlike Hnatkovska and Lahiri (2013), we focus exclusively on rural India to provide a more detailed picture of the changes within this sector. Jacoby and Dasgupta (2015) adopt the supply-demand-institutions (SDI) framework pioneered by Katz and Murphy (1992) and Bound and Johnson (1992) to decompose wage changes between 1993 and 2011 in both rural and urban India. We use a very different approach, namely, the recentered influence function (RIF) decomposition developed by Firpo, Fortin, and Lemieux (2009), to study earnings evolution in rural India.Footnote 3 Jacoby and Dasgupta (2015) decompose the change in an indirect measure of wage inequality, namely, the relative wages of educated and uneducated workers, into changes in employment shares of different demographic groups and changes in the industrial composition. In this paper, we focus on direct measures of inequality such as the Gini and the 90/10 percentile ratio, and decompose changes in these measures into changes in worker characteristics and changes in returns to these characteristics. Our finding that the change in returns to characteristics is driving the decline in earnings inequality in rural India is a novel one.
Moreover, we document changes not just at the mean but also at various quantiles. It is important to do so because several studies have found that earnings inequality is mainly concentrated at the upper end. For India, Azam (2012a) and Kijima (2006) find this for urban wage earners and Banerjee and Piketty (2005) find it for income tax payers. We use unconditional quantile regressions to account for the effects of workers’ characteristics at different quantiles and thereby make inferences about their effects on earnings inequality. Finally, we use the RIF decompositions to divide the overall change in earnings inequality into a composition effect (the component due to changes in the distribution of worker characteristics) and a structure effect (the component due to changes in returns to these characteristics).

We find that during the period from 2004 to 2012, real earnings among paid workers increased at all percentiles and the percentage increase was greater at lower percentiles. Consequently, earnings inequality declined in rural India. The RIF decompositions reveal that throughout the earnings distribution, except at the very top, both the composition effect and the structure effect increased earnings, with changes in the latter having played a bigger role. Decompositions of inequality measures reveal that in spite of the composition effect having had an inequality-increasing role, inequality fell because workers at lower quantiles experienced greater improvements in returns to their characteristics than those at the top. Earnings inequality increased as workers acquired higher levels of education. At the same time, lower returns to higher education reduced inequality.

The rest of the paper is organized as follows. Section 2 discusses the methodology used to analyze the change in earnings. Section 3 describes the data and the analysis sample. Section 4 presents the results, and Section 5 concludes.

2 Methodology

We briefly explain the RIF regression for unconditional quantiles, followed by the RIF decomposition technique. For a detailed exposition of this and other decomposition techniques, see Fortin et al. (2011).

2.1 Unconditional quantile regressions

Unconditional quantile regressions (UQR) introduced by Firpo et al. (2009) help us examine the marginal effects of covariates on the unconditional quantiles of an outcome variable. UQR differ from the traditional quantile regressions (Koenker and Bassett 1978) in that the latter examine the marginal effects on the conditional quantiles. For instance, if we observe that the conditional quantile regression coefficients for college education increase as we move from the first to the ninth decile, we can say that having more people with a college education would increase earnings dispersion within a group of individuals having the same vector of covariate values. However, in order to claim that college education increases overall earnings dispersion (among all individuals irrespective of their covariates), we need to rely on unconditional quantile regressions. To understand UQRs, we begin with the concept of an influence function (IF).

The IF of any distributional statistic represents the influence of an observation on that statistic. Specifically, let \( w \) denote earnings and let \( q_{\theta} \) denote the \( \theta \)th quantile of the unconditional earnings distribution. Then,

$$ \mathrm{IF}\left(w,{q}_{\theta}\right)=\left(\theta -\mathbb{I}\left\{w\le {q}_{\theta}\right\}\right)/{f}_w\left({q}_{\theta}\right) \tag{1} $$

where \( \mathbb{I}\left\{.\right\} \) is an indicator function and \( f_w \) is the density of the marginal distribution of earnings. The RIF is obtained by adding the statistic back to the IF. Thus, the RIF for the \( \theta \)th quantile is given by:

$$ \mathrm{RIF}\left(w,{q}_{\theta}\right)={q}_{\theta}+\mathrm{IF}\left(w,{q}_{\theta}\right)={q}_{\theta}+\left(\theta -\mathbb{I}\left\{w\le {q}_{\theta}\right\}\right)/{f}_w\left({q}_{\theta}\right) \tag{2} $$

Note that the expected value of the RIF is \( q_{\theta} \) itself. The conditional expectation of the RIF, modelled as a function of explanatory variables \( \boldsymbol{X} \), gives us the UQR or RIF regression model:

$$ E\left[\mathrm{RIF}\left(w,{q}_{\theta}\right)\Big|\boldsymbol{X}\right]={m}_{\theta}\left(\boldsymbol{X}\right) \tag{3} $$

In its simplest form,

$$ E\left[\mathrm{RIF}\left(w,{q}_{\theta}\right)\Big|\boldsymbol{X}\right]=\boldsymbol{X}\boldsymbol{\beta} \tag{4} $$

where \( \boldsymbol{\beta} \) represents the marginal effect of \( \boldsymbol{X} \) on the \( \theta \)th quantile. The coefficient vector \( \boldsymbol{\beta} \) can be estimated by ordinary least squares (OLS), with the dependent variable replaced by the estimated RIF. The RIF is estimated by plugging the sample quantile, \( \widehat{q_{\theta }} \), and the empirical density, \( \widehat{f_w\left({q}_{\theta}\right)} \), estimated using kernel methods, into Eq. (2).
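To make these estimation steps concrete, here is a minimal Python sketch of a RIF regression for a single unconditional quantile. It assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth, ignores survey weights, and uses simulated data with one hypothetical binary covariate (`educ`); the function names are illustrative and not the authors' code.

```python
import numpy as np

def kde_at(x, point):
    """Gaussian kernel density of sample x at one point (Silverman bandwidth, an assumption)."""
    n = x.size
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    h = 0.9 * min(x.std(ddof=1), iqr / 1.349) * n ** (-0.2)
    u = (point - x) / h
    return np.exp(-0.5 * u**2).sum() / (n * h * np.sqrt(2 * np.pi))

def rif_quantile(w, theta):
    """Estimated RIF of the theta-th quantile, Eq. (2): q + (theta - 1{w <= q}) / f(q)."""
    w = np.asarray(w, dtype=float)
    q = np.quantile(w, theta)
    return q + (theta - (w <= q).astype(float)) / kde_at(w, q)

def uqr(w, X, theta):
    """UQR: OLS of the estimated RIF on an intercept and the covariates X."""
    rif = rif_quantile(w, theta)
    Xc = np.column_stack([np.ones(len(rif)), X])
    beta, *_ = np.linalg.lstsq(Xc, rif, rcond=None)
    return beta, rif

# Simulated example: the hypothetical covariate raises both the level and the spread of log earnings.
rng = np.random.default_rng(0)
n = 5000
educ = rng.integers(0, 2, n)
log_w = 5.0 + 0.3 * educ + rng.normal(0.0, 0.5 + 0.3 * educ, n)
for theta in (0.1, 0.5, 0.9):
    beta, _ = uqr(log_w, educ, theta)
    print(f"theta={theta}: coefficient on educ = {beta[1]:.3f}")
```

Because the covariate widens the spread as well as raising the level, its UQR coefficient rises across quantiles, which is the pattern read as "inequality-increasing" in Section 4.2.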

2.2 RIF decomposition

The RIF decomposition divides the overall change in any distributional statistic into a structure effect (due to changes in returns to characteristics/covariates) and a composition effect (due to changes in the distribution of covariates). Compared to other decomposition methods such as that of Machado and Mata (2005), the RIF decomposition has the added advantage of further dividing the structure and composition effects into the contribution of each covariate. In this way, it is closest in spirit to the decomposition method proposed by Blinder (1973) and Oaxaca (1973).

In the case of quantiles, the RIF decomposition is carried out using the estimated UQR/RIF regression coefficients explained in Section 2.1. The RIF regression coefficients for each year (T) are given by:

$$ {\widehat{\boldsymbol{\beta}}}_{T,\theta}={\left(\sum_{i\in T}{\boldsymbol{X}}_{Ti}\cdot {\boldsymbol{X}}_{Ti}^{\prime}\right)}^{-1}\sum_{i\in T}\widehat{\mathrm{RIF}}\left({w}_{Ti},{q}_{T\theta}\right)\cdot {\boldsymbol{X}}_{Ti},\kern1em T=1,2 \tag{5} $$

The aggregate decomposition for any unconditional quantile θ is given by:

$$ {\widehat{\varDelta}}_{\mathrm{Total}}^{\theta }=\underset{{\widehat{\varDelta}}_{\mathrm{Structure}}^{\theta }}{\underbrace{{\overline{\boldsymbol{X}}}_2\left({\widehat{\boldsymbol{\beta}}}_{2,\theta }-{\widehat{\boldsymbol{\beta}}}_{1,\theta}\right)}}+\underset{{\widehat{\varDelta}}_{\mathrm{Composition}}^{\theta }}{\underbrace{\left({\overline{\boldsymbol{X}}}_2-{\overline{\boldsymbol{X}}}_1\right){\widehat{\boldsymbol{\beta}}}_{1,\theta }}} \tag{6} $$

To examine the contribution of each covariate, the two terms in (6) can be further written as:

$$ {\widehat{\varDelta}}_{\mathrm{Composition}}^{\theta }=\sum_{k=1}^K\left({\overline{X}}_{2k}-{\overline{X}}_{1k}\right){\widehat{\beta}}_{1k,\theta} \tag{7} $$
$$ {\widehat{\varDelta}}_{\mathrm{Structure}}^{\theta }=\sum_{k=0}^K{\overline{X}}_{2k}\left({\widehat{\beta}}_{2k,\theta }-{\widehat{\beta}}_{1k,\theta}\right) \tag{8} $$

Equations (7) and (8) represent the detailed decompositions of the composition and structure effects, respectively.
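Once the covariate means and RIF coefficients for both years are in hand, the aggregate and detailed decompositions in Eqs. (6), (7), and (8) are accounting identities. The sketch below computes them with purely hypothetical inputs (a constant plus two unnamed covariates) and checks that the detailed terms add up to the total; the function name is illustrative.

```python
import numpy as np

def rif_decomposition(xbar1, xbar2, beta1, beta2):
    """Decompose the change xbar2·beta2 - xbar1·beta1 as in Eqs. (6)-(8).

    The first element of each vector is the constant (mean 1, coefficient = intercept).
    Returns the total change and the per-covariate structure and composition terms.
    """
    xbar1, xbar2, beta1, beta2 = (np.asarray(a, dtype=float) for a in (xbar1, xbar2, beta1, beta2))
    structure = xbar2 * (beta2 - beta1)        # Eq. (8): k = 0..K, includes the constant
    composition = (xbar2 - xbar1) * beta1      # Eq. (7): the constant's term is 0 because 1 - 1 = 0
    total = xbar2 @ beta2 - xbar1 @ beta1
    return total, structure, composition

# Hypothetical inputs: [constant, covariate 1, covariate 2] for one quantile
xbar1 = [1.0, 0.40, 0.20]
xbar2 = [1.0, 0.55, 0.25]
beta1 = [5.00, 0.30, 0.50]   # RIF coefficients, year 1
beta2 = [5.40, 0.25, 0.45]   # RIF coefficients, year 2

total, structure, composition = rif_decomposition(xbar1, xbar2, beta1, beta2)
print(f"total={total:.3f}  structure={structure.sum():.3f}  composition={composition.sum():.3f}")
# total=0.430  structure=0.360  composition=0.070
```

Weighting the coefficient change by year-2 means and the mean change by year-1 coefficients is what makes the two effects sum exactly to the total; the opposite weighting gives an equally valid decomposition with different terms.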

The detailed decomposition of the structure effect has a limitation when categorical variables are included as covariates. The choice of the omitted or reference group (for caste, education, industry, occupation, or state of residence in our analysis) can influence the contribution of each covariate to the structure effect. Since the choice of the reference categories is arbitrary, results of the detailed decomposition can vary. Existing solutions to the omitted category problem come at the cost of interpretability (see Fortin et al. 2011). To ensure the robustness of our results regarding the contribution of factor-specific structure effects, we use several specifications, each of which uses a different set of omitted categories for the categorical variables.
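A small numerical sketch illustrates the omitted-category problem. It uses hypothetical mean log earnings for one categorical covariate with three groups. In a saturated dummy regression, the intercept equals the reference group's mean and each dummy coefficient equals that group's gap from the reference. Switching the reference group therefore reshuffles the detailed structure effect between the constant and the category terms, while the total stays unchanged.

```python
import numpy as np

def structure_detail(means1, means2, shares2, base):
    """Detailed structure effect (Eq. 8) for one categorical covariate coded as dummies.

    With only this covariate, OLS on an intercept plus dummies reproduces group means:
    intercept = mean of the reference group, dummy coefficient = gap from the reference.
    """
    means1, means2, shares2 = (np.asarray(a, dtype=float) for a in (means1, means2, shares2))
    beta1 = means1 - means1[base]            # dummy coefficients, year 1 (base entry is 0)
    beta2 = means2 - means2[base]            # dummy coefficients, year 2
    constant = means2[base] - means1[base]   # change in intercept (its mean regressor is 1)
    dummies = shares2 * (beta2 - beta1)      # one term per category; 0 for the reference
    return constant, dummies

# Hypothetical mean log earnings for three education groups (low, middle, high) in each year,
# and hypothetical year-2 population shares.
means1 = [4.8, 5.2, 5.9]
means2 = [5.4, 5.7, 6.1]
shares2 = [0.3, 0.5, 0.2]

for base, label in [(0, "lowest group omitted"), (2, "highest group omitted")]:
    constant, dummies = structure_detail(means1, means2, shares2, base)
    print(f"{label}: constant={constant:.2f}, education={dummies.sum():.2f}, "
          f"total={constant + dummies.sum():.2f}")
# lowest group omitted: constant=0.60, education=-0.13, total=0.47
# highest group omitted: constant=0.20, education=0.27, total=0.47
```

In this example, the education term even changes sign with the reference group, which is why robustness checks with alternative omitted categories are informative.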

Though the above discussion of RIF decomposition focused on quantiles, the method applies to any distributional statistic whose influence function can be computed. We present RIF decompositions for quantiles as well as for selected inequality measures, including the Gini.
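The text does not spell out the RIF used for the Gini. As a sketch, the code below implements one standard expression derived from the Gini's influence function, using NumPy on simulated earnings. Survey weights are ignored, and ties are broken by sort order. As a check, the sample mean of this RIF reproduces the sample Gini, just as the mean of the quantile RIF reproduces \( q_{\theta} \).

```python
import numpy as np

def gini(y):
    """Sample Gini coefficient from sorted data: 2*sum(i*y_(i)) / (n*sum(y)) - (n+1)/n."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    i = np.arange(1, n + 1)
    return 2.0 * np.sum(i * y) / (n * y.sum()) - (n + 1) / n

def rif_gini(y):
    """Empirical RIF of the Gini for each observation.

    Uses RIF(y) = 1 + (2R/mu^2) y - (2/mu) [y (1 - F(y)) + GL(F(y))], where GL is the
    generalized Lorenz curve and R = mu (1 - G) / 2, so 2R/mu^2 = (1 - G)/mu.
    """
    y = np.asarray(y, dtype=float)
    n, mu, g = y.size, y.mean(), gini(y)
    order = np.argsort(y)
    F = np.empty(n)
    F[order] = np.arange(1, n + 1) / n             # empirical CDF at each observation
    GL = np.empty(n)
    GL[order] = np.cumsum(y[order]) / n            # generalized Lorenz ordinate at F(y)
    return 1.0 + (1.0 - g) / mu * y - (2.0 / mu) * (y * (1.0 - F) + GL)

rng = np.random.default_rng(1)
earnings = rng.lognormal(mean=5.5, sigma=0.8, size=10_000)   # simulated, not survey data
rif = rif_gini(earnings)
print(f"Gini={gini(earnings):.4f}  mean RIF={rif.mean():.4f}")
```

These RIF values would then take the place of the quantile RIF as the dependent variable in the OLS step described in Section 2.1.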

3 Data

We use two rounds of the nationally representative Employment Unemployment Survey (EUS) conducted by the National Sample Survey Organization (NSSO) for the years 2004/05 and 2011/12. Our target population is wage earners between the ages of 15 and 64 (working age), living in rural areasFootnote 4 of 23 major states of India.Footnote 5

In both years, wage earners constituted around 25 % of the rural working-age population.Footnote 6 Nominal earnings are converted into real terms (2004/05 prices) using consumer price indices provided by the Labour Bureau, Government of India.Footnote 7 We also trim the real earnings distribution of each year by dropping 0.1 % of observations from the top and the bottom.Footnote 8 Ultimately, our analysis sample consists of 44,634 workers in 2004/05 and 36,050 in 2011/12. This corresponds to about 104 million paid workers in 2004/05 and about 118 million in 2011/12.
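The two sample-preparation steps in this paragraph, deflating to 2004/05 prices and trimming 0.1 % from each tail, can be sketched as follows. The CPI values, the simulated earnings, and the function name are hypothetical. The sketch does not reproduce the Labour Bureau indices and ignores survey weights.

```python
import numpy as np

def deflate_and_trim(nominal, cpi, cpi_base, trim=0.001):
    """Convert nominal to real earnings and drop the top and bottom `trim` share of observations.

    cpi may be a scalar or a per-observation array (e.g., a hypothetical state-specific index);
    survey weights are ignored in this sketch.
    """
    real = np.asarray(nominal, dtype=float) * (cpi_base / np.asarray(cpi, dtype=float))
    n = real.size
    k = int(round(trim * n))                 # observations dropped at each end
    order = np.argsort(real)
    keep = np.zeros(n, dtype=bool)
    keep[order[k:n - k]] = True
    return real[keep], keep

rng = np.random.default_rng(2)
nominal = rng.lognormal(mean=6.0, sigma=0.8, size=10_000)   # simulated weekly earnings
real, keep = deflate_and_trim(nominal, cpi=125.0, cpi_base=100.0)
print(real.size)   # 9980: 10 observations dropped at each end
```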

4 Results

In this section we present our findings related to the evolution of the earnings distribution in rural India between 2004/05 and 2011/12.

4.1 Changes in the distribution of earnings from paid work

Figure 1 presents kernel density estimates of the log of real weekly earnings for 2004/05 and 2011/12. The earnings density for each year is skewed to the right, implying that median earnings were below the mean. Over the 7-year period, the earnings density shifted to the right and became more peaked (less dispersed). Mean real weekly earnings increased from 391 to about 604 rupees, while median earnings increased from 263 to 457 rupees. For 2004/05, the all-India rural poverty line (defined in terms of the minimum consumption expenditure needed to meet a specified nutritional and living standard) was 447 rupees per capita per month (Planning Commission 2014).Footnote 9 Thus, in 2004/05 mean (median) real monthly earnings were 3.5 (2.4) times the poverty line, and in 2011/12 they were 5.4 (4.1) times this value.
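The poverty-line ratios can be reproduced from the reported weekly figures if monthly earnings are taken to be four times weekly earnings. That conversion factor is inferred from the numbers and is not stated in the text. Using 30/7 weeks per month instead would give noticeably higher ratios (about 3.75 for the 2004/05 mean).

```python
POVERTY_LINE = 447       # rupees per capita per month, rural India, 2004/05 (Planning Commission 2014)
WEEKS_PER_MONTH = 4      # assumption inferred from the reported ratios

# (mean, median) real weekly earnings in rupees at 2004/05 prices, from the text
weekly = {"2004/05": (391, 263), "2011/12": (604, 457)}

ratios = {
    year: tuple(round(x * WEEKS_PER_MONTH / POVERTY_LINE, 1) for x in values)
    for year, values in weekly.items()
}
print(ratios)  # {'2004/05': (3.5, 2.4), '2011/12': (5.4, 4.1)}
```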

Fig. 1 Earnings densities, 2004/05 and 2011/12

4.1.1 Changes in earnings inequality

Figure 2 plots the real weekly earnings (in rupees) at each percentile for 2004/05 and 2011/12. At each percentile, earnings were higher in 2011/12 than in 2004/05. The gap between the two curves reveals that the increase in earnings was, in absolute terms (i.e., measured in rupees), greater for higher percentiles. For instance, real weekly earnings increased by 99 rupees at the first decile, 194 rupees at the median, and 307 rupees at the ninth decile. However, as seen in Fig. 3, the percentage increase in earnings was greater at the lower end of the distribution.Footnote 10 For instance, earnings increased by 91 % at the first decile, 74 % at the median, and 44 % at the ninth decile. Thus, earnings inequality―defined in relative rather than absolute terms―declined over the 7-year period.
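A quick arithmetic check shows how both statements in this paragraph can hold at once. Dividing each absolute gain by the corresponding percentage gain gives the implied 2004/05 level at that percentile. These levels are approximate because the percentages are rounded. The implied median of about 262 rupees agrees with the 263 rupees reported in Section 4.1, and the log changes are the quantity plotted in Fig. 3.

```python
import math

# Percentile: (absolute gain in rupees, proportional gain), as reported in the text
gains = {10: (99, 0.91), 50: (194, 0.74), 90: (307, 0.44)}

implied = {}
for p, (abs_gain, pct_gain) in gains.items():
    base = abs_gain / pct_gain            # implied 2004/05 earnings at this percentile
    log_change = math.log1p(pct_gain)     # ln(1 + g): the change in log earnings
    implied[p] = (base, base + abs_gain, log_change)
    print(f"P{p}: {base:.0f} -> {base + abs_gain:.0f} rupees, log change {log_change:.2f}")
```

The largest rupee gain (at P90) sits on the largest base, so it is the smallest proportional gain.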

Fig. 2 Real weekly earnings, by percentile, 2004/05 and 2011/12

Fig. 3 Change in log real weekly earnings, by percentile, 2004/05 to 2011/12

Figure 4 confirms the decline in earnings inequality: It shows that the Lorenz curve of weekly earnings for 2011/12 lies above the one for 2004/05, unambiguously indicating that inequality declined.
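Lorenz dominance can be checked numerically by comparing the two curves on a common grid of population shares. The sketch below uses simulated data in which later earnings are a concave power transformation of earlier ones (`old ** 0.8`). Because `y ** 0.8 / y` falls as `y` rises, this transformation compresses relative differences, so the new Lorenz curve lies above the old one. Survey weights are ignored, and the function names are illustrative. When one curve lies everywhere above another, every inequality measure that respects the Lorenz ordering, including the Gini, indicates lower inequality.

```python
import numpy as np

def lorenz(y, grid):
    """Lorenz ordinates L(p) on `grid`: share of total earnings held by the poorest fraction p."""
    y = np.sort(np.asarray(y, dtype=float))
    cum_share = np.concatenate([[0.0], np.cumsum(y)]) / y.sum()
    pop_share = np.arange(y.size + 1) / y.size
    return np.interp(grid, pop_share, cum_share)   # linear interpolation between data points

def lorenz_dominates(y_new, y_old, grid_size=101, tol=1e-12):
    """True if the Lorenz curve of y_new is on or above that of y_old at every grid point."""
    grid = np.linspace(0.0, 1.0, grid_size)
    return bool(np.all(lorenz(y_new, grid) >= lorenz(y_old, grid) - tol))

rng = np.random.default_rng(3)
old = rng.lognormal(mean=5.5, sigma=0.9, size=20_000)   # simulated "earlier" earnings
new = old ** 0.8                                        # compressed "later" earnings
print(lorenz_dominates(new, old), lorenz_dominates(old, new))
```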

Fig. 4 Lorenz curves of real weekly earnings, 2004/05 and 2011/12

Table 1 supplements Figs. 2, 3, and 4 and shows how various summary measures of inequality changed over time. The ratio of (raw) earnings at the 25th percentile to earnings at the 10th percentile was steady at about 1.52. In the middle of the distribution, inequality as measured by the ratio of the 60th to the 40th percentile decreased somewhat. In contrast, the ratio of the 90th to the 75th percentile fell very sharply, from 1.72 to 1.53. Thus, the decrease in inequality came mainly from changes at the top and middle of the distribution rather than from the bottom.
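The percentile ratios discussed here are straightforward to compute. The following is a minimal unweighted NumPy sketch on simulated earnings; survey weights are omitted for simplicity, and the function name is illustrative. Because each ratio compares two points of the same distribution, the measures are unaffected by rescaling all earnings, for example by a common price deflator.

```python
import numpy as np

def percentile_ratios(y, pairs=((25, 10), (60, 40), (90, 75))):
    """Ratios of earnings at pairs of percentiles, e.g. P25/P10 (unweighted)."""
    y = np.asarray(y, dtype=float)
    return {f"P{hi}/P{lo}": np.percentile(y, hi) / np.percentile(y, lo) for hi, lo in pairs}

rng = np.random.default_rng(4)
earnings = rng.lognormal(mean=5.5, sigma=0.8, size=10_000)   # simulated weekly earnings
print({k: round(float(v), 2) for k, v in percentile_ratios(earnings).items()})
```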

Table 1 Inequality measures for real weekly earnings from paid work

The decrease in inequality is also reflected in the variance of log earnings and in the Gini coefficient. The Gini of real weekly earnings fell from 0.462 to 0.396.Footnote 11 This is in sharp contrast to urban India, where earnings inequality remained virtually unchanged over the period: the Gini of real weekly earnings in urban India was 0.506 in 2004/05 and 0.499 in 2011/12. Jayaraj and Subramanian (2015) use consumption expenditure data (also from the NSSO) and find that between 2004/05 and 2009/10, the Gini declined from 0.305 to 0.299 in rural India, while in urban India it increased from 0.376 to 0.393. Notably, while the direction of change in rural inequality that they find using consumption expenditure is the same as what we find using earnings, this is not the case for urban inequality. This makes a strong case for studying both consumption and earnings inequality.

4.1.2 Wage rates or days worked: decomposition of the variance in log earnings

So far our analysis has been about weekly earnings. The EUS also collects data on the number of half-days worked during the week. The following equations illustrate the decomposition of earnings inequality as measured by the variance in log earnings:

$$ \begin{array}{l}\text{Weekly earnings }(E)=\text{Average daily wage rate }(W)\times \text{Number of days worked }(D)\\ \Rightarrow \ln (E)=\ln (W)+\ln (D)\\ \Rightarrow \underset{1}{\underbrace{\mathrm{Var}\left[\ln (E)\right]}}=\underset{2}{\underbrace{\mathrm{Var}\left[\ln (W)\right]}}+\underset{3}{\underbrace{\mathrm{Var}\left[\ln (D)\right]}}+\underset{4}{\underbrace{2\,\mathrm{Cov}\left[\ln (W),\ln (D)\right]}}\end{array} $$

The decomposition shows how much of earnings inequality (term 1) is accounted for by inequality of wage rates (term 2), inequality of days worked (term 3), and the co-movement of wage rates and days worked (term 4). We implement this decomposition for both years and then calculate the difference between corresponding terms.Footnote 12 The results are shown in Table 2.
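Because the decomposition is an exact identity, it can be verified directly on data. The sketch below uses simulated wage rates and half-days, no survey weights, and hypothetical function names. It computes the four terms for two years and the share of the change in Var[ln(E)] attributable to each component, which is how the last row of a table like Table 2 can be read.

```python
import numpy as np

def variance_terms(earnings, days):
    """Terms 1-4: Var ln E = Var ln W + Var ln D + 2 Cov(ln W, ln D), with W = E / D."""
    ln_e, ln_d = np.log(earnings), np.log(days)
    ln_w = ln_e - ln_d                                   # log average daily wage rate
    return np.array([
        ln_e.var(),                                      # term 1
        ln_w.var(),                                      # term 2
        ln_d.var(),                                      # term 3
        2.0 * np.cov(ln_w, ln_d, ddof=0)[0, 1],          # term 4 (same ddof as .var())
    ])

def simulate_year(rng, n, wage_sd, corr):
    """Simulated workers: days from half-days (1..14), log wage rates partly tied to days."""
    days = rng.integers(1, 15, n) / 2.0
    z = (np.log(days) - np.log(days).mean()) / np.log(days).std()
    ln_wage = 4.0 + wage_sd * (corr * z + np.sqrt(1 - corr**2) * rng.standard_normal(n))
    return np.exp(ln_wage) * days, days

rng = np.random.default_rng(5)
t1 = variance_terms(*simulate_year(rng, 20_000, wage_sd=0.7, corr=0.3))
t2 = variance_terms(*simulate_year(rng, 20_000, wage_sd=0.55, corr=0.2))
change = t2 - t1
shares = change[1:] / change[0]      # contributions of terms 2-4 to the change in term 1
print("identity holds:", np.isclose(t1[0], t1[1:].sum()), "shares:", shares.round(2))
```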

Table 2 Decomposition of earnings inequality

In both years, the covariance between wage rates and days worked was positive, implying that more highly paid workers also worked more days. Earnings inequality was largely due to inequality of wage rates rather than to inequality of days worked or to the tendency of highly paid workers to work longer: over 70 % of earnings inequality was due to inequality of wage rates.Footnote 13

The last row of Table 2 decomposes the decline in earnings inequality, as measured by the decrease in the variance of log earnings. About 50 % of this decline was due to a decline in inequality of wage rates. The rest was due to a decrease in inequality of days worked (about 30 %) and a weaker association between wage rates and days worked (about 20 %).

4.2 Unconditional quantile regression results

Before moving to the regression results, we present some descriptive statistics for paid workers in rural India in Table 3. Mean (log) weekly earnings increased over the period. The average age also increased by about 1.7 years, perhaps an indication of later entry into the labor market as more people acquire higher education. There was also an increase in the share of males, married workers, and Muslims. The proportion of those belonging to ST (Scheduled Tribes) and SC (Scheduled Castes) declined.Footnote 14 Education levels rose substantially: the share of illiterates decreased by around 11 percentage points, while the share at each schooling level, including college education, increased.

Table 3 Descriptive statistics, wage earners in rural India

We classify industries into seven categories: agriculture, manufacturing (including mining), construction, utilities, wholesale and retail trade, public administration (including defense), and other services (including education, health, real estate, and finance). Over the period, the major change in the industrial distribution was a 12 percentage point decrease in the share of agriculture and a roughly equivalent increase in the share of construction.Footnote 15

Next, we estimate earnings regressions (both OLS and UQR) separately for the years 2004/05 and 2011/12 with the log of real weekly earnings as the dependent variable. The covariates include all characteristics shown in Table 3 and the state of residence.Footnote 16 Age enters the regressions in quadratic form as a proxy for work experience. “Others” and illiterates are the omitted categories for caste and education, respectively. Agriculture, and laborers and unskilled workers, are the omitted categories for industry and occupation, respectively. Figures 5 and 6 plot regression coefficients for select covariates. The left column of plots is for 2004/05 and the right for 2011/12. For each selected covariate, UQR coefficients are plotted against the corresponding nine deciles. The dashed lines represent the 95 % confidence interval of the coefficients. The solid horizontal line is the OLS coefficient. As we move across deciles, whether the coefficients for a particular characteristic increase or decrease reveals the effect of changing that characteristic on wage inequality. An upward slope suggests that increasing the share of workers with that characteristic would increase inequality, while a downward slope suggests that it would decrease inequality. It is important to note that these predictions are based on the assumption that the wage structure, i.e., the returns to observed worker characteristics, remains intact as the distribution of characteristics changes. In effect, this amounts to assuming away general equilibrium effects, a standard assumption in this literature.

Fig. 5 UQR coefficients for select covariates, 2004/05 and 2011/12

Fig. 6 UQR coefficients for education categories, 2004/05 and 2011/12

The first row of plots in Fig. 5 shows that the coefficients for being male were positive and significant, implying the presence of a gender earnings gap. The UQR male coefficients decreased across deciles: in 2011/12, the male coefficient was 0.69 at the first decile, 0.44 at the median, and 0.40 at the ninth decile. This pattern is termed a “sticky floor” effect: while men earned more than women throughout the distribution, the penalty for being female was more pronounced at the bottom of the distribution.Footnote 17 The decreasing UQR coefficients also mean that having a greater proportion of men would reduce earnings inequality among wage earners. This was unambiguously true for 2004/05, as the coefficients declined monotonically across deciles, and it was true for the lower part of the 2011/12 distribution.

The second through fourth rows of plots in Fig. 5 show the presence of caste earnings gaps, though not in all parts of the distribution. In 2004/05, the UQR coefficients for ST, SC, and Other Backward Classes (OBC) vis-à-vis “Others” show that there was an earnings penalty for all three groups at the upper deciles but not at the lower ones.Footnote 18 In 2011/12, the caste penalty for ST persisted, although, unlike in 2004/05, it was experienced at the lower deciles. Surprisingly, the caste penalty for SC and OBC disappeared in 2011/12. Interestingly, in the regressions without industry and occupation controls, the caste earnings gap for SC and OBC persisted even in 2011/12. This suggests that in 2011/12, the caste earnings gaps were overwhelmingly due to occupational and industrial segregation by caste.

The fifth row of Fig. 5 indicates that returns to being married moved from being insignificant at lower deciles to being positive at upper ones. Thus, if the proportion of married individuals were to increase, earnings inequality among wage earners would increase. Except at the ninth decile in 2004/05, there was no penalty for being Muslim in either year.

Figure 6 examines coefficients for various education categories vis-à-vis the illiterates. First, there is clear evidence of positive returns to education. Additionally, in 2004/05, for each education category, returns increased monotonically as we move up the earnings distribution, with an especially sharp increase at the ninth decile. This pattern persisted in 2011/12 for all categories except primary and middle: for instance, the coefficient of “college and beyond” was 0.22 at the first decile, 0.28 at the median, and 1.7 at the ninth decile. Thus, educating the illiterate population would increase earnings dispersion.Footnote 19 Figure 6 also reveals how the impact of education on earnings dispersion changed over time. The profile of UQR coefficients across deciles was flatter in 2011/12 than in 2004/05, revealing that the inequality-enhancing effect of education weakened over the period. The detailed decomposition of the structure effect in Section 4.3.3 shows this more formally.

4.3 RIF decomposition results

Next we turn to RIF decompositions to understand the factors behind the changes in the real earnings distribution. We first present the aggregate decomposition followed by the detailed decompositions of the composition and structure effects.

4.3.1 Aggregate decomposition of change in earnings

Figure 7 shows the results of the aggregate decomposition of the change in the (log) real earnings distribution at different vigintiles. We present the decomposition based on the counterfactual that relies on the characteristics of 2004/05 and returns of 2011/12.Footnote 20 For each vigintile, the total difference in log real earnings over the period is plotted (solid line). The downward slope of the total difference graph once again shows that the lower quantiles experienced a larger percentage increase in earnings than the higher quantiles.

Fig. 7 The RIF aggregate decomposition

The total difference is decomposed into the structure effect (dashed) and the composition effect (dotted). Both components made significant contributions to the overall increase in earnings over the 7-year period. The only exception is the 19th vigintile (95th percentile), where the structure effect is not significant. At all but this top vigintile, the contribution of the structure effect to the overall increase in earnings was positive and much larger than that of the composition effect.Footnote 21

An important conclusion from the decomposition is that most of the decline in inequality occurred because returns to characteristics improved much more at lower percentiles. While changing characteristics did raise real earnings throughout the distribution, they had an inequality-increasing effect: the composition effect increased sharply after the eighth decile, implying that had “returns to characteristics” been held constant over the period, earnings inequality would have risen.

Table 4 confirms this by decomposing several measures of inequality.Footnote 22 The first column shows the difference between the log of real weekly earnings at the 90th and the 10th percentiles, while the second and third columns present the 50-10 and 90-50 differences. The final column gives the Gini values for real weekly earnings. The third row presents the difference between the years that is to be decomposed. Aggregate decompositions of all four inequality measures confirm that the structure effect had an inequality-decreasing effect, while the composition effect (except for the 50-10 measure, for which it was statistically insignificant) had an inequality-increasing effect. In other words, had labor market characteristics remained the same in 2011/12 as they were in 2004/05, earnings inequality would have fallen even further: e.g., the Gini coefficient would have dropped from 0.461 to 0.389, rather than to the observed 0.396 in 2011/12. Decompositions of the 90-50 and 50-10 measures reveal that the inequality-increasing composition effect came mainly from changes at the top end of the wage distribution. This is reflected in the larger contribution of the composition effect to the 90-50 measure than to the 50-10 measure, and in the fact that the latter is not statistically significant.
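The RIF of a difference between two quantiles is the difference of their RIFs, and OLS is linear in the dependent variable. The RIF coefficients for the 90-10 measure, and hence every term of its decomposition, therefore equal the sum of those for the 90-50 and 50-10 measures. The sketch below verifies this on simulated data for the composition effect of Eq. (7). It assumes a Gaussian kernel density with a rule-of-thumb bandwidth, ignores survey weights, and uses a single hypothetical binary characteristic.

```python
import numpy as np

def rif_quantile(y, theta):
    """RIF of the theta-th quantile with a Gaussian kernel density (rule-of-thumb bandwidth)."""
    q = np.quantile(y, theta)
    h = 1.06 * y.std(ddof=1) * y.size ** (-0.2)
    f_q = np.exp(-0.5 * ((q - y) / h) ** 2).sum() / (y.size * h * np.sqrt(2 * np.pi))
    return q + (theta - (y <= q).astype(float)) / f_q

def ols(dep, X):
    """OLS coefficients of dep on an intercept and X."""
    Xc = np.column_stack([np.ones(len(dep)), X])
    return np.linalg.lstsq(Xc, dep, rcond=None)[0]

def simulate(rng, n, share):
    """Hypothetical year: a binary characteristic held by `share` of workers."""
    x = (rng.random(n) < share).astype(float)
    y = 5.0 + 0.4 * x + rng.normal(0.0, 0.5 + 0.2 * x, n)   # log weekly earnings
    return y, x

rng = np.random.default_rng(6)
(y1, x1), (y2, x2) = simulate(rng, 8000, 0.3), simulate(rng, 8000, 0.5)

def composition(hi, lo):
    """Composition effect, Eq. (7), for the log-earnings gap between quantiles hi and lo."""
    beta1 = ols(rif_quantile(y1, hi) - rif_quantile(y1, lo), x1)
    xbar1 = np.array([1.0, x1.mean()])
    xbar2 = np.array([1.0, x2.mean()])
    return (xbar2 - xbar1) @ beta1

c_90_10, c_90_50, c_50_10 = composition(0.9, 0.1), composition(0.9, 0.5), composition(0.5, 0.1)
print(f"90-10: {c_90_10:.4f}  90-50 + 50-10: {c_90_50 + c_50_10:.4f}")
```

Point estimates add up in this way, but standard errors do not, so one component can be significant while another is not.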

Table 4 Decomposition of changes in inequality measures from 2004/05 to 2011/12

In summary, the aggregate decomposition of all inequality measures reveals that the decline in inequality came exclusively from the structure effect, but the detailed decomposition that follows presents a more nuanced picture.

4.3.2 Detailed decomposition of the composition effect

The second panel of Table 4 and Fig. 8 present the detailed decomposition of the composition effect, which identifies the sets of covariates that drove the total composition effect. Looking at the 90-10 and the Gini, we find that the inequality-increasing effect was mainly driven by changes in the distribution of education and, to a lesser extent, of experience and occupation. The same pattern is observed when we focus on the top of the distribution (90-50 measure). However, education and occupation did not play a significant role at the bottom (50-10 measure). On the other hand, the change in the industrial distribution had a significant inequality-decreasing effect, confined to the top of the distribution (the change was significant for the 90-50 measure but not for the 50-10). Further decomposing the industry category into its constituents points to a large contribution from the shift into construction: the large shift from agriculture to construction noted earlier decreased earnings inequality. The greater proportion of male workers also contributed to the decline in inequality, mainly through changes at the bottom of the distribution (the change was significant for the 50-10 measure but not for the 90-50). Changes in the distribution of state of residence, marital status, caste, and religion did not have a major effect on the change in inequality.

Fig. 8 Detailed decomposition of the composition effect for select covariates

Before we move to the detailed decomposition of the structure effect, we remark on the inclusion of industry and occupation as separate factors in the decomposition. Changes in the composition of, and returns to, industry and occupation may be partly driven by changes in education. To that extent, we should not include them as controls if we are interested in the overall contribution of education. Following the decomposition literature, we also re-estimate the decompositions in Table 4 without industry and occupation controls. The results are in Appendix 1.Footnote 23 Compared with Table 4, the one major difference with regard to the composition effect is that, without industry and occupation controls, the change in the distribution of education plays a significant role even at the bottom of the distribution (as seen in the 50-10 measure). Otherwise, the conclusions are qualitatively the same.

4.3.3 Detailed decomposition of the structure effect

The bottom panel of Table 4 presents the decomposition of the structure effect. Both the 90-10 and the Gini decompositions reveal that education, occupation, and being married were largely responsible for the negative structure effect. Further, comparing the 50-10 and 90-50 measures shows that for all three characteristics, it was changes in returns at the top end of the distribution that mainly contributed to the overall negative structure effect.

This pattern is also visible in Fig. 6, where the returns to education (with illiterates as the base category) declined at the higher end of the wage distribution but did not change significantly in the middle. The same is true for the returns to higher occupations (with laborers and unskilled workers as the base category). The conclusions remain broadly the same without industry and occupation controls (Appendix 1).

The contribution of returns to industry in Table 4 is interesting: these returns changed in a way that had an inequality-decreasing effect at the bottom and an inequality-increasing effect at the top, as seen in the negative and positive effects for the 50-10 and 90-50 measures, respectively. It is therefore not surprising that their contribution to the 90-10 measure is insignificant.

In Table 4, the contribution of the "constant" term to the overall structure effect is large and statistically significant. It is hard to interpret meaningfully because it depends on the choice of omitted categories for the categorical variables. As described in Section 2.2, the choice of omitted category affects the decomposition of the structure effect. We test the sensitivity of our results to this choice by re-estimating Table 4 under two additional specifications, presented in Appendix 2. Because returns to education were the main driver of the structure effect, in the first specification we change the omitted education category from illiterates to the highest educational category, "college and beyond". As Appendix 2 shows, the returns to education are now positive (relative to college and beyond) and the constant term is now negative. The broad conclusions are therefore the same. In the second specification, we collapse each categorical variable into a single dummy variable equal to "0" for the omitted category and "1" for all remaining categories.Footnote 24 Education continues to explain a large part of both the composition and structure effects.
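The omitted-category problem can be seen in a small simulation. The sketch below uses a mean-based (OLS) decomposition rather than the RIF version for brevity, and fits the same earnings model with two different base categories for a hypothetical three-level education variable. The total structure effect is identical under both parameterizations, but its split between the education dummies and the constant changes, which is why the constant term has no stand-alone interpretation. The data, category labels and function names are hypothetical.

```python
import numpy as np

def dummies(levels, base, n_levels=3):
    """Constant plus one indicator per non-base category."""
    cols = [np.ones(levels.size)]
    cols += [(levels == k).astype(float) for k in range(n_levels) if k != base]
    return np.column_stack(cols)

def structure_effect(y0, lev0, y1, lev1, base):
    """Split the mean structure effect into (constant part, education part)."""
    X0, X1 = dummies(lev0, base), dummies(lev1, base)
    b0, *_ = np.linalg.lstsq(X0, y0, rcond=None)
    b1, *_ = np.linalg.lstsq(X1, y1, rcond=None)
    detail = X0.mean(axis=0) * (b1 - b0)   # [constant, education dummies...]
    return detail[0], detail[1:].sum()

rng = np.random.default_rng(1)
n = 4_000
lev0 = rng.integers(0, 3, n)   # hypothetical: 0 illiterate, 1 middle, 2 college
lev1 = rng.integers(0, 3, n)
y0 = np.array([5.8, 6.1, 6.9])[lev0] + rng.normal(0, 0.5, n)
y1 = np.array([6.3, 6.5, 7.1])[lev1] + rng.normal(0, 0.5, n)

const_a, edu_a = structure_effect(y0, lev0, y1, lev1, base=0)  # illiterates omitted
const_b, edu_b = structure_effect(y0, lev0, y1, lev1, base=2)  # college omitted
print(f"base=illiterate: constant {const_a:.3f}, education {edu_a:.3f}")
print(f"base=college:    constant {const_b:.3f}, education {edu_b:.3f}")
```

In this simulation the returns to schooling relative to illiterates fall, so the education term is negative with illiterates as the base and positive with college as the base, while the sum of the two terms does not change.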

4.4 Robustness check using state poverty lines

Recall that we used the Consumer Price Index – Rural Labourers (CPI-RL) to deflate nominal earnings to 2004/05 prices. These price indices track price changes over time but do not adjust for differences in price levels across states. As a robustness check, we instead use state-level poverty lines computed using the Tendulkar methodology (Planning Commission 2014), which account for spatial price variation across states. We replicate Tables 1 and 4 using state-level poverty lines and present the results in Appendix 3. Our results are robust to the choice of deflator.
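The difference between the two deflators can be sketched as follows. The form of the poverty-line deflator is an assumption made for illustration (earnings expressed relative to the state poverty line in the survey year, then scaled by a common reference line), not necessarily the paper's exact procedure; the state labels and all numbers are placeholders, not CPI-RL or Tendulkar values.

```python
def deflate_cpi(nominal, state, cpi_t, cpi_base):
    """State price-index deflator: adjusts for inflation within each state only."""
    return nominal * cpi_base[state] / cpi_t[state]

def deflate_poverty_line(nominal, state, pl_t, pl_reference):
    """Poverty-line deflator (assumed form): also adjusts for price-level
    differences across states at a point in time."""
    return nominal * pl_reference / pl_t[state]

# Placeholder values: two hypothetical states with identical inflation
# but different price levels (reflected in different poverty lines).
cpi_base = {"A": 100.0, "B": 100.0}
cpi_2011 = {"A": 150.0, "B": 150.0}
pl_2011 = {"A": 800.0, "B": 1000.0}
pl_ref = 500.0

cpi_a = deflate_cpi(1200.0, "A", cpi_2011, cpi_base)
cpi_b = deflate_cpi(1200.0, "B", cpi_2011, cpi_base)
pl_a = deflate_poverty_line(1200.0, "A", pl_2011, pl_ref)
pl_b = deflate_poverty_line(1200.0, "B", pl_2011, pl_ref)
```

With the price-index deflator the two workers have the same real earnings, whereas the poverty-line deflator treats the worker in the higher-cost state as worse off.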

5 Conclusions

Using nationally representative data from the Employment Unemployment Survey, we examine the changes in real weekly earnings from paid work in rural India from 2004/05 to 2011/12.

For wage earners, who constituted about a quarter of the rural working age population, we find that real earnings increased at all percentiles. Using consumption expenditure data that span the entire population, other studiesFootnote 25 have also documented an improvement in all parts of the distribution. Taken together, there is clear evidence that economic growth in the post-reform period (after the early 1990s) has been accompanied by a reduction in poverty.Footnote 26 At the same time, according to official estimates, 25.7 % of the rural population was below the poverty line in 2011/12. This figure represents about 216.7 million poor persons, a large number of people living below a minimum acceptable standard.Footnote 27

Our analysis also reveals that earnings inequality in rural India decreased over the 7-year period, and that about half of the decline can be accounted for by the decline in daily wage inequality. However, while the rural Gini fell over this period, the urban Gini remained virtually unchanged. This suggests that earnings dynamics differ between the two sectors, possibly because their underlying structural characteristics differ. For example, agriculture is the largest employer in rural India, whereas services are the largest employer in urban India. The difference could also reflect different redistributive policies in the two sectors. These aspects need to be recognized when designing future policies to tackle inequality in the two regions.
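The accounting behind the "about half" statement (Footnotes 11 and 12) rests on an identity. Assuming the daily wage rate is weekly earnings divided by days worked, log weekly earnings equal log daily wage plus log days, so the variance of log weekly earnings splits into a wage rate effect, a workday effect and a covariance effect. The sketch below checks this on synthetic data (not EUS records); the function name is hypothetical.

```python
import numpy as np

def variance_components(weekly_earnings, days_worked):
    """Split var(log weekly earnings) into wage rate, workday and covariance terms."""
    log_e = np.log(weekly_earnings)
    log_d = np.log(days_worked)
    log_w = log_e - log_d                       # log daily wage rate
    wage = np.var(log_w)                        # population variance (ddof=0)
    days = np.var(log_d)
    cov = 2 * np.cov(log_w, log_d, bias=True)[0, 1]   # bias=True matches ddof=0
    return {"total": np.var(log_e), "wage_rate": wage,
            "workdays": days, "covariance": cov}

rng = np.random.default_rng(2)
days = rng.choice([0.5, 1, 2, 3, 4, 5, 6, 7], size=2_000)
daily_wage = np.exp(rng.normal(4.5, 0.4, size=2_000))
parts = variance_components(daily_wage * days, days)
```

Computing these components for each survey round and differencing them gives the share of the change in the variance of log weekly earnings that is attributable to each term.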

Aggregate decompositions of the change in inequality measures reveal that the change in returns to worker characteristics was mainly responsible for the decrease in earnings inequality. Further detailed decompositions reveal that higher levels of education in the population contributed to an increase in earnings inequality, while lower returns to higher education contributed to a decrease. Rural India also experienced a construction boom during this period, which contributed to the decrease in earnings inequality.

Some studies (Datt et al. 2016; Thomas 2015) have attributed the tightening of the rural casual labor market between 2000 and 2012 to the expansion of schooling and to the construction boom. Others (Azam 2012b; Berg et al. 2015; Imbert and Papp 2015) have found that the MGNREGS (Mahatma Gandhi National Rural Employment Guarantee Scheme), a large-scale employment guarantee scheme initiated in rural India in 2005, led to an increase in casual wages.

One cannot be certain that this trend of rising casual wages and declining earnings inequality will continue into the future. Regardless of the underlying causes of the recent decline in earnings inequality in rural India, volatility in global crop prices and the drought conditions currently experienced by large parts of the country because of two consecutive weak monsoons are important reminders that policies designed to foster employment opportunities and wage growth of unskilled workers outside of agriculture are crucial for improving the economic wellbeing of the second part of India.

Finally, we end with the caveat that although India has the lowest Gini value among the BRICS countries,Footnote 28 and we find that earnings inequality declined in rural India between 2004/05 and 2011/12, these facts mask extreme deprivations and inequities in access to health care, education, and physical infrastructure such as safe water and sanitation (Drèze and Sen 2013). One needs to be cognizant that extreme inequalities prevail in many other dimensions beyond earnings and consumption expenditure.


Notes

  1. A notable exception is Dutta (2005). For the period 1983–1999, at the all-India level, she finds an increase in wage rate inequality among regular salaried workers, but a decrease among casual labor.

  2. There are some advantages to looking at consumption expenditure instead of earnings (Goldberg and Pavcnik 2007). The former is a better measure of lifetime wellbeing and suffers from fewer reporting errors. Nevertheless, we feel that it is important to juxtapose the two to get a complete picture, especially as the two measures may exhibit different trends. Krueger and Perri (2006) document this for the USA and then develop a model to show how income inequality can affect consumption inequality.

  3. It is hard to establish the superiority of one approach over the other. In the SDI framework, changes in supply (changes in the employment shares of demographic groups) and demand (changes in industrial composition) are assumed to be exogenous and therefore unaffected by changes in the relative wage structure. In the RIF decomposition, the feedback between changing characteristics and changing returns is ignored. Both assumptions abstract from general equilibrium effects.

  4. In 2004/05, 75.3 % of India’s working age population lived in rural areas, while in 2011/12 this figure was 71.1 %.

  5. In 2004/05 India had 28 states and 7 union territories. We excluded the states and union territories for which there were no price deflators. The 23 included states are Andhra Pradesh, Assam, Bihar, Chhattisgarh, Gujarat, Haryana, Himachal Pradesh, Jammu and Kashmir, Jharkhand, Karnataka, Kerala, Madhya Pradesh, Maharashtra, Manipur, Meghalaya, Orissa, Punjab, Rajasthan, Tamil Nadu, Tripura, Uttar Pradesh, Uttaranchal, and West Bengal. In both years, they constituted 99.3 % of India’s rural working age population.

  6. In 2011/12, of the remaining rural working age population, 30 % were self-employed, 2 % were unemployed, and 43 % were not in the labor force. The main reason for restricting our analysis to wage earners is that the EUS does not collect earnings data for self-employed individuals. Kijima (2006) imputes the earnings of the self-employed using Mincerian equations estimated on the sample of regular wage/salaried workers. We refrain from this imputation as it imposes identical returns to covariates for both sets of workers, an assumption that may not be true.

  7. We use the Consumer Price Index – Rural Labourers (CPI-RL), the relevant price index for rural areas.

  8. While we are aware that this may underestimate our inequality measures, we do this in order to remove potential data entry errors.

  9. The poverty line is based on the methodology proposed by the Tendulkar Committee in 2009. The committee was appointed by the Planning Commission, Government of India.

  10. Using consumption expenditure data (also collected by the NSSO), for the period between 2004/05 and 2009/10, Jayaraj and Subramanian (2015) find a similar pattern of an increase in real consumption expenditures at all deciles for rural India, with the highest growth occurring at the third and fourth deciles.

  11. If we consider daily wage rates instead of real weekly earnings, the Gini falls from 0.398 to 0.358. This indicates that it is wage rates, and not so much the time spent working, that are driving the decrease in earnings inequality. We study this in detail in the next sub-section, where we show the same result by decomposing the variance in log earnings.

  12. Although the variance of log weekly earnings allows us to quantify a “wage rate effect”, a “workday effect”, and a “covariance effect”, it does not necessarily fall when one rupee is transferred from a rich worker to a poor one. However, this limitation is inconsequential since we have shown (using the Lorenz curves) that inequality has unambiguously fallen over time.

  13. Admittedly, because the number of days worked is bounded (from half a day to 7 days), this may have partly contributed to the lower inequality of days worked.

  14. Scheduled Castes and Tribes (SC and ST, respectively) are administrative categories and represent groups of castes and tribes that are entitled to benefits from affirmative action policies such as reservations in educational institutions and government jobs to overcome historical social and economic discrimination against them. OBC stands for Other Backward Classes and is a collective term used by the Government of India to classify other castes that are socially and educationally backward (for details on the caste system, see Deshpande 2011).

  15. This shift in the industrial distribution in rural India has been documented in several other studies, including Thomas (2015) and Jacoby and Dasgupta (2015).

  16. Following the literature on earnings regressions, we also estimated the regressions and decompositions without the industry and occupation controls. The results are qualitatively the same and are available from the authors on request.

  17. Deshpande et al. (2015) also find a sticky floor for 1999/2000 and 2009/10 among regular salaried workers in India.

  18. The “Others” group includes, but is not confined to, the Hindu upper castes as the EUS data do not allow us to isolate the Hindu upper castes. Consequently, this four-way division understates the gaps between the Hindu upper castes and the most marginalized ST and SC groups (Deshpande 2011).

  19. This finding for rural India is similar to the evidence presented in Azam (2012a) for regular salaried workers in urban India. Using conditional quantile regressions on EUS data for 1983, 1993/94, and 2004/05, he finds that returns to secondary and tertiary education have increased over time and are larger at higher quantiles.

  20. The results based on the other counterfactual that relies on the characteristics of 2011/12 and returns of 2004/05 are very similar and are available on request.

  21. We also implemented the aggregate decomposition using Melly’s refinement (Melly 2006) of the Machado-Mata Decomposition (Machado and Mata 2005) and found similar results.

  22. Standard errors for Table 4 (and for all its variants in the appendices) were calculated using 1000 bootstrap replications, following the procedure in Fortin et al. (2011). The basic code for this is available from Fortin's website and was suitably modified for this paper.

  23. We decided to present the decomposition with industry and occupation controls in the main text because, as noted earlier, there was a massive shift from agriculture to industry, which we believe was largely exogenous to education. Because this change has been widely discussed in the related literature on the Indian economy, we feel that readers may be more interested in the specification that includes industry and occupation controls, despite its endogeneity problem.

  24. We had to exclude controls for state of residence, as there is no natural criterion for classifying states as high or low.

  25. Kotwal et al. 2011, for all-India, 1983–2004/05; Jayaraj and Subramanian 2015, for rural and urban separately, 2004/05–2009/10.

  26. Using NSS data on consumption expenditure from 1957 to 2012, Datt et al. (2016) provide direct evidence that growth in India has been accompanied by a decline in poverty, especially after economic reforms were initiated in the early 1990s.

  27. The corresponding figures for the urban population below the poverty line are 13.7 % (53.1 million persons).

  28. According to World Bank estimates, the Gini values for the BRICS countries are: Brazil, 0.539 (2009); Russia, 0.397 (2009); India, 0.339 (2009); China, 0.421 (2010); and South Africa, 0.630 (2008). These are available at Gini Index (World Bank Estimate), accessed on June 1, 2016.



Abbreviations

BRICS: Brazil, Russia, India, China, South Africa
CPI-RL: Consumer Price Index – Rural Labourers
EUS: Employment Unemployment Survey
GDP: Gross domestic product
GVA: Gross value added
IF: Influence functions
MGNREGS: Mahatma Gandhi National Rural Employment Guarantee Scheme
NSSO: National Sample Survey Organization
OBC: Other Backward Classes
OLS: Ordinary least squares
RIF: Recentered influence functions
SC: Scheduled Castes
SDI: Supply, demand, and institutions
ST: Scheduled Tribes
UQR: Unconditional quantile regressions


References

  • Azam M (2012a) Changes in Wage Structure in Urban India, 1983–2004: A Quantile Regression Decomposition. World Dev 40(6):1135–1150

  • Azam M (2012b) The Impact of Indian Job Guarantee Scheme on Labor Market Outcomes: Evidence from a Natural Experiment. IZA Discussion Paper No. 6548

  • Banerjee A, Piketty T (2005) Top Indian incomes, 1922–2000. World Bank Econ Rev 19(1):1–20

  • Berg E, Bhattacharyya S, Rajasekhar D, Manjula R (2015) Can Public Works Increase Equilibrium Wages? Evidence from India’s National Rural Employment Guarantee. Accessed on 1 June 2016

  • Blinder A (1973) Wage discrimination: reduced form and structural estimates. J Hum Resour 8:436–455

  • Bound J, Johnson G (1992) Changes in the structure of wages in the 1980’s: an evaluation of alternative explanations. Am Econ Rev 82(3):371–392

  • Cain JS, Hasan R, Magsombol R, Tandon A (2010) Accounting for inequality in India: evidence from household expenditures. World Dev 38(3):282–297

  • Datt G, Ravallion M, Murgai R (2016) Growth, Urbanization, and Poverty Reduction in India. Policy Research Working Paper 7568, World Bank Group

  • Deshpande A (2011) The grammar of caste: economic discrimination in contemporary India. Oxford University Press, New Delhi

  • Deshpande A, Goel D, Khanna S (2015) Bad Karma or Discrimination? Male-Female Wage Gaps among Salaried Workers in India. IZA Discussion Paper No. 9485

  • Drèze J, Sen A (2013) An Uncertain Glory: India and its Contradictions. Princeton University Press, Princeton

  • Dutta PV (2005) Accounting for Wage Inequality in India. PRUS Working Paper No. 29, Poverty Research Unit at Sussex

  • Firpo S, Fortin NM, Lemieux T (2009) Unconditional quantile regressions. Econometrica 77:953–973. doi:10.3982/ECTA6822

  • Fortin NM, Lemieux T, Firpo S (2011) Decomposition methods in economics. In: Ashenfelter O, Card DE (eds) Handbook of labor economics, vol 4A, Chapter 1

  • GOI (2015) Economic survey 2014–15. Government of India, Ministry of Finance

  • Goldberg PK, Pavcnik N (2007) Distributional effects of globalization in developing countries. J Econ Lit XLV:39–82

  • Hnatkovska V, Lahiri A (2013) Structural Transformation and the Rural-Urban Divide. Working Paper, International Growth Centre, London School of Economics

  • Imbert C, Papp J (2015) Labor market effects of social programs: evidence from India’s employment guarantee. Am Econ J Appl Econ 7(2):233–263

  • Jacoby H, Dasgupta B (2015) Changing Wage Structure in India in the Post-reform Era: 1993–2011. Policy Research Working Paper 7426, World Bank

  • Jayaraj D, Subramanian S (2015) Growth and Inequality in the Distribution of India’s Consumption Expenditure: 1983-2009-10. Econ Pol Wkly 50(32):39–47

  • Katz LF, Murphy KM (1992) Changes in relative wages, 1963–1987: supply and demand factors. Q J Econ 107(1):35–78

  • Kijima Y (2006) Why did wage inequality increase? Evidence from urban India 1983–99. J Dev Econ 81:97–117

  • Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50

  • Kotwal A, Ramaswami B, Wadhwa W (2011) Economic liberalization and Indian economic growth: what’s the evidence? J Econ Lit 49(4):1152–1199

  • Krueger D, Perri F (2006) Does income inequality lead to consumption inequality? Evidence and theory. Rev Econ Stud 73(1):163–193

  • Machado JF, Mata J (2005) Counterfactual decomposition of changes in wage distributions using quantile regression. J Appl Economet 20:445–465

  • Melly B (2006) Estimation of Counterfactual Distributions using Quantile Regression. Discussion Paper, University of St. Gallen

  • Motiram S, Vakulabharanam V (2012) Indian Inequality: Patterns and Changes, 1993–2010. In: India Development Report, vol 7. Oxford University Press, New Delhi, pp 224–232

  • Oaxaca RL (1973) Male-female wage differentials in urban labor markets. Int Econ Rev 14:693–709

  • Planning Commission (2014) Report of the expert group to review the methodology for measurement of poverty. Planning Commission, Government of India

  • RBI (2015) Handbook of statistics on the Indian economy 2014–15. Reserve Bank of India

  • Sen A, Himanshu (2004) Poverty and inequality in India: II: widening disparities during the 1990s. Econ Pol Wkly 39(39):4361–4375

  • Thomas JJ (2015) India’s labour market during the 2000s: an overview. In: Ramaswamy KV (ed) Labour, employment and economic growth in India. Cambridge University Press, New Delhi


Acknowledgements

The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013) under grant agreement no. 290752. The views expressed in this paper are those of the authors and do not reflect the views of Statistics Canada or of other institutions with which the authors are affiliated. We are grateful to participants at the Nopoor India Policy Conference in Delhi, and to an anonymous referee and the editor for many insightful comments, which greatly improved the paper.

Responsible editor: David Lam


Funding

The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013) under grant agreement no. 290752. The funding agency had no role in collecting data, interpreting the results, or writing the manuscript.

Availability of data and materials

This paper uses the Employment Unemployment Survey data collected by the National Sample Survey Organization (NSSO), Government of India. These data are available for purchase from the NSSO.

Competing interests

The IZA Journal of Labor & Development is committed to the IZA Guiding Principles of Research Integrity. The authors declare that they have observed these principles.

Author information



Corresponding author

Correspondence to Deepti Goel.


Appendix 1

1.1 Re-estimating Table 4 without controls for industry and occupation

Table 5 Decomposition of changes in inequality measures from 2004/05 to 2011/12

Appendix 2

2.1 Sensitivity checks to choice of omitted categories

Table 6 Decomposition of changes in inequality measures from 2004/05 to 2011/12
Table 7 Decomposition of changes in inequality measures from 2004/05 to 2011/12

Appendix 3

3.1 Robustness check using state-level poverty lines as deflators

Table 8 Inequality measures for real weekly earnings from paid work
Table 9 Decomposition of changes in inequality measures from 2004/05 to 2011/12

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Khanna, S., Goel, D. & Morissette, R. Decomposition analysis of earnings inequality in rural India: 2004–2012. IZA J Labor Develop 5, 18 (2016).


  • Received:

  • Accepted:

  • Published:

  • DOI: