The Wage Returns to On-the-Job Training: Evidence from Matched Employer-Employee Data

Skills shortages and skill mismatch are a pressing concern for policymakers in several developing countries, and in East Asia specifically. Providing on-the-job training can be an effective policy tool to shape the skills of the existing workforce to the specific needs of firms. This paper uses a unique matched employer-employee data set for Malaysia and Thailand to estimate the wage return to on-the-job training in these two countries. Using propensity score matching estimates, we show that the average wage returns to on-the-job training are 7.7% for Malaysia and 4.5% for Thailand. Furthermore, we find evidence that the wage returns to on-the-job training are higher for males than for females in Malaysia and that, for both countries, returns are higher for workers with at least secondary education.


Introduction
Many economists have emphasized the importance of human capital accumulation for growth (e.g., Lucas, 1988; Romer, 1990; Aghion and Howitt, 1998). Human capital is accumulated throughout life, but more than one half of this lifetime accumulation takes place on the job after completing formal schooling (e.g., Heckman et al., 1998). Despite this importance, much more is known about the investment in formal schooling, and specifically about the returns to formal schooling, than about the investment in on-the-job training and its returns.
2004, these economies were growing at similar rates (5.6% and 6.1%). In terms of population, Thailand is more than twice the size of Malaysia (65m vs. 25m). Both countries have a high literacy rate of around 93%. East Asian countries have attracted considerable attention. Impressive growth rates, competitive wages and high levels of education of the workforce are just some of the reasons why these countries have become so interesting to study. Having a detailed data set from these two countries offers the possibility of studying the dynamics of these labor markets and their investment in job training.
In developing countries, governments are increasingly concerned with the rapidly changing demand for skills and with the slow pace at which the general and vocational schooling tracks have adjusted the provision of skills. As a consequence, many employers complain about the lack of skills and education of their workforce. Policymakers are thus increasingly concerned that the supply of skills in the labor market does not keep pace with the demand. Investment by firms in on-the-job training is one important way to mitigate this skills gap, as it develops job-relevant skills among the existing workforce.
The evidence on both the incidence and the economic returns to on-the-job training is generally scarce in developing countries. It is also unclear how different the returns should be in developed and developing countries. On the one hand, the returns to the investment in job training (as well as in schooling) could be higher in developing than in developed countries simply because skilled labor is scarcer in developing countries (e.g., Psacharopoulos and Patrinos, 2004). On the other hand, if skilled labor and capital are complements, the returns to this investment could be smaller in developing countries, where capital is relatively scarce.
In theory, whether workers with and without on-the-job training receive, all else constant, significantly different wages will also depend on whether the training provided general or firm-specific skills. It may also depend on differences in the competitiveness of local labor markets. When the labor market is perfectly competitive and training is general, workers bear the cost of job training through lower wages during the training period. Once training is received, the worker is paid his marginal product, which is now assumed to be higher (e.g., Becker, 1964). But when training is firm specific, the costs and benefits will likely be shared between the firm and the worker depending on the bargaining power of each. In principle, the worker will receive a lower wage at the time of the training, to account for his share of the costs, and a higher wage after the training event, depending on the benefit he can extract from the firm (e.g., Leuven and Oosterbeek, 2001). If the labor market is not competitive and firms are able to pay a wage lower than the worker's marginal productivity, firms will only want to invest in training if the increase in productivity is higher than the effect on the growth rate of wages (e.g., Acemoglu and Pischke, 1999). Even in this scenario, there is no theoretical reason for wages to decrease after the training program. They should increase or remain constant. In sum, whatever the assumptions, theory predicts that after participating in a training event the worker's wage should increase or remain unchanged. Finally, as training is a decision variable for the firm, one expects the wage returns to job training to be a lower bound estimate for the impact of training on firm productivity. This paper estimates whether the firm's investment in job training translates into higher wages for workers in Malaysia and in Thailand.
Our findings show that the wage returns to the investment in job training decrease significantly as one controls for worker and firm characteristics. We find that on-the-job training is associated with increases in individual wages of 7.7% in Malaysia and 4.5% in Thailand. We also estimate that wage returns to on-the-job training tend to be quantitatively higher for men than for women, although in Thailand they are not statistically significant for males. In Malaysia, the returns for males are 11% while for women they are not statistically different from zero. Workers who have at least completed secondary education also report higher returns to on-the-job training than other workers (returns are 9% and 10% for Malaysia and Thailand, respectively).
In the empirical work, we start from a simple worker level Mincer type equation relating hourly wages with several observable worker and firm characteristics, including differences in the incidence of on-the-job training. Our main coefficient of interest quantifies the average effect on wages of having received on-the-job training. However, the estimation of the effect of on-the-job training on wages poses a major challenge as training is likely to be an endogenous variable to wages. On-the-job training is a choice variable for both firms and workers and most likely is also correlated with worker and firm characteristics, which in turn are also correlated with labor productivity and wages. Failure to control in a flexible manner for these characteristics may create a bias in the estimates of the effect of training on wages, as workers selecting into training may have different characteristics. In our empirical approach, we hope to minimize this problem by exploring a rich data set with many worker and firm characteristics and the propensity score matching (PSM) method. When compared to ordinary least squares (OLS), the PSM estimates allow for a more flexible (non-linear) functional form relating observable worker characteristics and their wages.
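As a rough sketch of this starting point, a Mincer-type wage equation with a training indicator could be written as below. The notation is an assumption made here for illustration and may differ from the paper's own: $w_{ijfr}$ is the hourly wage of worker $i$ in firm $j$, industry $f$ and region $r$, $Train_{ijfr}$ is the worker level training dummy, $X_{ijfr}$ collects worker characteristics, $Z_{jfr}$ collects firm characteristics, and $u_{ijfr}$ is an error term. The coefficient of interest is $\beta$.

```latex
\ln w_{ijfr} = \alpha + \beta \, Train_{ijfr} + X_{ijfr}'\gamma + Z_{jfr}'\delta + u_{ijfr}
```

Under this linear form, OLS recovers the average effect of training only if $Train_{ijfr}$ is uncorrelated with $u_{ijfr}$ given the controls, which is the endogeneity concern raised above.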

The propensity score matching method proceeds in two steps. First, it estimates the probability that each worker is selected into the training provided by the firm, given his or her observable characteristics. Based on this probability, it generates a "control group" of workers who did not participate in the training but whose probability of being selected into on-the-job training was very similar to the probability observed for the sample of trained individuals. These workers are very "similar" to those actually participating in training in all their observable characteristics (e.g., education, occupation in the labor market, years of experience). The only thing that distinguishes them from the trained workers is the fact of not having received on-the-job training. Hence, provided that selection into training depends only on these observable characteristics, the wage difference between these two groups can be attributed to the wage impacts of on-the-job training.
Our empirical findings document two interesting patterns across the two countries.
First, the incidence of on-the-job training differs significantly by several worker and firm characteristics. In particular, we find that the more educated and more tenured workers are, in both countries, more likely to receive on-the-job training. We also show that larger, more innovative foreign firms are also more likely to invest in on-the-job training. Second, there is strong evidence that the workers' wages increase with the incidence of on-the-job training in both countries. In our preferred estimates exploring propensity score matching, the average wage returns to on-the-job training are 7.7% in Malaysia and 4.5% in Thailand.
Furthermore, the heterogeneity analysis shows that in Malaysia the wage returns to job training are larger for men (11%) than for women (for whom they are not statistically different from zero). We also find that in both countries there are higher wage returns to job training for workers with completed secondary education or more years of education when compared to those who have not completed secondary education. The returns to on-the-job training for workers with at least secondary education are 9% in Malaysia and 10% in Thailand, respectively. In contrast, for workers with lower levels of education, there is no evidence of positive wage impacts both for Malaysia and Thailand. These findings clearly reinforce the idea that the investment in job training is complementary to the initial level of education of workers.
Our paper relates closely to two empirical literatures. First, it relates to the work analyzing the firm's investment in on-the-job training in developing countries (e.g., Ariga and Brunello, 2002; Almeida and Aterido, 2010, 2011; Almeida, 2010; and Almeida and Cho, 2012). The main empirical patterns found in these papers for developing countries are close to those found for developed countries (e.g., Bassanini et al., 2002). Larger, more open and innovative firms, with a more skilled workforce and operating in more technology-intensive sectors, are more likely to train their employees. The major exception is Ariga and Brunello (2002). Exploring an employee survey for Thailand in 2001, they find a significant and negative relationship between years of formal education and training.
Second, we relate to the empirical work quantifying the wage returns to on-the-job training using worker level data.6 Table A1 in the appendix summarizes some of the main empirical studies quantifying the wage returns to on-the-job training, for developing and developed countries. Panels A and B report the estimates from papers using worker level data. Panel A refers to developed countries and Panel B to developing countries.7 A word of caution is needed when comparing cross-country estimates of the returns to on-the-job training. First, the variable capturing on-the-job training differs significantly across data sets, which reduces comparability across studies. Second, there is little comparability in the reduced form equation used across most of the analyses.8
6 Few empirical papers have looked at the extent to which the benefits of training (ultimately effects on higher firm productivity) are shared with workers. One exception is Dearden et al. (2005) for the UK.
7 Panels C and D summarize the works using firm and industry level data.
8 For example, some papers have defined training incidence with a dummy variable capturing whether training was offered over the year prior to the survey. Others, like Bassanini et al. (2005), use the accumulated stock of training hours over the sample period (6 years). Moreover, the reduced form estimated typically depends on the data available, which in turn differs across data sets and countries. For a similar point see Haelermans and Borghans (2012).
The point estimates reported in Panel A for developed countries are very diverse. Some studies report positive and significant wage returns to training. However, more recently, as longitudinal data have become available and experimental methods are used, the estimated wage returns to on-the-job training tend to be smaller than in the cross-section studies. Furthermore, in some cases, the returns are even zero (e.g., Leuven and Oosterbeek, 2002, 2004).9
9 Leuven and Oosterbeek (2002, 2004) use two different methods to estimate the returns to training in Holland. Leuven and Oosterbeek (2002) identify individuals planning to enroll in a training program but who did not do so due to a random event and find evidence of no returns to job training. Leuven and Oosterbeek (2004) explore a discontinuity which allowed firms to deduct their training expenses only for workers more than 40 years old. Although their results are only valid locally, they also conclude that there were no returns from the investment in job training. Similarly, Sousounis (2009) explores longitudinal data and does not find evidence that training increases wages in the U.K. between 1998 and 2005.
The point estimates for most developing countries, reported in Panel B, are generally in the order of 20%. The evidence in the panel is also quite diverse. Chung (2000) and Johanson and Wanga (2008) explore cross-sectional data and find evidence of large returns (between 20% and 38%) for Malaysia and Tanzania, respectively. On the other hand, Frazer (2006) finds that in Ghana, during the 1990s, the returns to apprenticeship training were not statistically different from zero. Monk et al. (2008) find in addition some heterogeneity within the country and across education levels. They show that the returns to apprenticeships are 50% for individuals with no education but decline as education rises. They find evidence that the returns are zero for individuals with more than 6 years of formal education.
The methodology in our paper is closer to Rosholm et al. (2005). They estimate that the returns to training are on average 21% for Kenya and that in Zambia training is not associated with higher wages. Like us, they also explore a matched employer-employee data set (collected by the World Bank) and a propensity score matching methodology. However, the larger number of observations in our sample and the more detailed information on worker and firm characteristics allow us to conduct a deeper analysis. First, we consider hourly wages as the dependent variable while Rosholm et al. (2005) consider only monthly wages. Second, we are able to control both for detailed worker human capital characteristics and for several firm characteristics that they do not. At the worker level, we include variables such as having received training at a previous employer, owning a bank account and using the internet. These variables, especially past training, will prove to be important in explaining the selection into training. At the firm level, we are able to control for the average years of schooling of the workforce, for the degree of innovation and for the degree of exports.
The paper proceeds as follows. Section 2 describes the dataset used and the descriptive statistics. In particular, in Section 2.1 we explain in detail our main dependent variable of interest: the logarithm of workers' hourly wages. In Section 3 we analyze which variables determine the selection into training. Section 4 presents the propensity score matching estimates for the wage returns to on-the-job training. In Section 4.1, we explain the empirical model and in Section 4.2 we report the main empirical findings for the wage returns to on-the-job training. In Section 5, we report heterogeneity analysis by gender and level of education. Finally, Section 6 concludes.

Data and Descriptive Statistics
We explore a matched employer-employee data set collected by the World Bank, Enterprise Surveys, for Malaysia (2002) and Thailand (2004).10 A total of 1,152 firms were surveyed in Malaysia and 1,385 in Thailand. In each firm, a random sample of 10 employees was interviewed, yielding a total of 10,822 and 13,850 firm-worker observations in Malaysia and Thailand, respectively. However, in the analysis we have excluded observations with missing values for the main covariates of interest both at the firm and worker level. As a result, the number of observations used is 6,679 for Malaysia and 9,418 for Thailand.
10 The information collected in the Enterprise Surveys is based on a one to two hour interview with the firm manager. This data set has been used for studying this and other topics (see e.g., Svensson, 2003, Almeida and Aterido, 2011, Almeida and Carneiro, 2008a, Almeida and Fernandes, 2008, Pierre and Scarpetta, 2004, and Aterido et al., 2007). Previous versions of this project within the World Bank include the Regional Program on Enterprise Development, collecting firm and worker level data in Sub-Saharan African countries for a decade (e.g., Rosholm et al., 2007, Frazer, 2006), and the World Business Environment Survey.
This data set has several advantages for studying this topic. First, the questionnaire is similar across the two countries, which ensures comparability of the results. Second, the survey simultaneously collects detailed information on worker and firm characteristics. In particular, at the firm level it collects information on the sector of activity, geographical location,11 total number of employees, public and foreign ownership as well as information on the human capital of the manager, on the average years of formal education of the workforce, number of employees per occupation, and percentage of women in the firm. The survey also gathers information on technological variables or investments in new production technologies such as R&D expenses, introduction of new products and adoption of new technologies. At the worker level, it collects information on gender, age, marital status and nationality. Most importantly, it also collects detailed human capital characteristics like years of formal education, tenure with the firm, years of experience in the labor market, and whether each worker enrolled in vocational training programs in the past. Finally, the survey collects information on whether the firm offered on-the-job training to its employees last year and whether the employees interviewed took any formal training since they joined that firm. In addition, monthly wages and hours of work per week are also reported.
11 In the Malaysian sample we have firms from Central Region: Selangor, KL, Melaka (4,641 observations); North Region: Penang, Kedah (1,899 observations); South Region: Johor (3,290 observations); East Coast: Terengganu (181 observations); Northeast (320 observations) and South (390 observations). In Thailand firms in the sample operate in the North (730 observations); Centre (3,260 observations); Bangkok and Vicinity (6,160 observations); East (1,920 observations); Northeast (320 observations) and South (390 observations).
In particular, the survey contains the following information about formal training programs at the firm and at the worker level. At the firm level, the survey asks: "Did your plant run formal in-house training programs for its employees in 2001?" and "Did your plant send employees to formal training programs run by other organizations during the fiscal year of 2001?" At the worker level the survey asks: "Have you received formal training since you joined this firm?" Based on these questions, we constructed two variables capturing the incidence of on-the-job training at the firm and at the worker level. First, we constructed a firm level dummy variable that equals one if the firm offered formal training to its workers in the year prior to the survey. Second, we constructed a worker level dummy variable that equals one if the worker has received formal training since he joined that firm.
In addition, for those workers whose current position is not their first job, we have information on whether the worker received training at his previous job. Table A2 in the appendix describes the main variables used. Tables A3 and A4 in the appendix report summary statistics for the main firm and worker characteristics used in the paper. In Malaysia, the final sample covers manufacturing (79%) and services (21%). In Thailand, the sample only covers manufacturing. In addition, the distribution of firms across the two countries is different. While in Malaysia small firms are approximately half of the sample, in Thailand medium, large and very large firms account for more than 70% of the sample. In the two countries, approximately 70% of the firms are domestically owned and a large share export at least some of their sales (66% in Thailand and 62% in Malaysia). Rubber and Plastics (22%) and Food Processing (18%) are the two most represented industries in Malaysia. In the Thai sample, firms are more equally divided among the different sectors. Finally, firms in Malaysia have a higher share of skilled labor (49%) than in Thailand (24%) and the average years of formal education is also slightly higher in Malaysia than in Thailand.
Finally, the table also shows that the training incidence at the firm level is 51% in Malaysia and 76% in Thailand. The incidence of training is smaller in Malaysia in part due to the low incidence of job training among the firms operating in Rubber and Plastics (36% of the firms train) and in Food Processing (55% of the firms train). Also interestingly, in Malaysia most of the firms that offer training use both in-house and external facilities. Most of the training costs are borne directly by the firms (at least formally, as firms can transfer the cost of training to employees through lower wages).
Only 6% of the firms in Malaysia and 3% in Thailand report having shared the costs of training with their employees. Table A4 in the appendix reports summary statistics for the sample of workers in both countries. In both samples women represent approximately half of the sample. The average age, tenure and years of experience are also quite similar across the two countries.
Again, the human capital of the workforce is higher in Malaysia than in Thailand. In Malaysia only 15% of the workers have up to primary education compared with 30% of the workers in Thailand. In Malaysia there are also more workers with polytechnic or vocational education than in Thailand (15% vs. 6%). This higher human capital also translates into more skilled occupations in Malaysia than in Thailand in our sample. While skilled production workers are the most represented group in the Malaysian sample (36%), in Thailand the most represented occupation group is unskilled production workers (37%).
Also interestingly, Malaysian workers have been more exposed to foreign languages and cultures. In particular, 7% of the Malaysian workers but less than 1% of Thai workers studied in a foreign country. Table A4 in the appendix also shows that the incidence of on-the-job training is higher in Thailand than in Malaysia at the worker level as well. In Malaysia, 33% of the workers report having received some training since they joined the firm. In Thailand, the corresponding figure is 52% of the workforce. The percentage of workers who received training at the previous employer was 17% in Malaysia and 24% in Thailand.

The main dependent variable of interest is the natural logarithm of the hourly wage (in USD).15 In Table 1 we present the average log hourly wage as well as the raw difference in the average hourly wage for both trained and not trained workers.
The results in the table show that workers who report having received some formal on-the-job training since joining the firm report higher earnings than non-trainees in both countries. However, this difference in the average wages is likely capturing the effect of worker and firm characteristics that drive the selection of workers into training and that, simultaneously, also influence their hourly wages.
15 We report log hourly wages in USD in 2002 prices for Malaysia and 2004 prices for Thailand.
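The raw comparison described for Table 1 amounts to a difference in mean log hourly wages between trained and untrained workers. A minimal sketch of that calculation follows. The wage figures are made up for illustration and are not taken from the survey, and the function names are hypothetical. A difference in logs of d corresponds to a proportional wage gap of exp(d) - 1, which the second helper converts to a percentage.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def raw_log_wage_gap(hourly_wages, trained):
    """Mean log hourly wage of trained workers minus that of untrained workers."""
    logs_trained = [math.log(w) for w, t in zip(hourly_wages, trained) if t]
    logs_untrained = [math.log(w) for w, t in zip(hourly_wages, trained) if not t]
    return mean(logs_trained) - mean(logs_untrained)


def log_points_to_percent(gap):
    """Convert a difference in logs into a percentage wage difference."""
    return 100.0 * (math.exp(gap) - 1.0)


# Hypothetical hourly wages (USD) and training indicators, for illustration only.
wages = [1.0, 1.2, 0.9, 1.5, 1.1, 1.4]
trained = [0, 1, 0, 1, 0, 1]

gap = raw_log_wage_gap(wages, trained)
print(f"raw gap: {gap:.3f} log points, {log_points_to_percent(gap):.1f}%")
```

This gap makes no adjustment for worker or firm characteristics, so it mixes any effect of training with selection into training.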

The selection of firms and workers into training
In order to understand which variables influence the selection into on-the-job training we run a series of regressions at the firm and worker level. We assume that firms decide to train their workers if the profits from this investment are greater than the costs:

$$Train_{jfr} = \begin{cases} 1 & \text{if } \pi^{*}_{jfr} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

where $Train_{jfr}$ is a dummy variable that equals one if firm $j$, operating in industry $f$ and region $r$, offered on-the-job training to its employees during the year prior to the survey and $\pi^{*}_{jfr}$ are the net benefits of investing in training. Since $\pi^{*}_{jfr}$ is unobservable we assume $\pi^{*}_{jfr}$ is a function of several firm, industry and region characteristics. We also assume that

$$\pi^{*}_{jfr} = Z_{jfr}'\beta + \mu_f + \eta_r + \varepsilon_{jfr}$$

where $Z_{jfr}$ is a vector of firm characteristics, $\mu_f$ are industry fixed effects, $\eta_r$ are region fixed effects, and $\varepsilon_{jfr}$ captures unobserved firm characteristics. Given this linear form, the probability that firm $j$ offers formal on-the-job training to its employees is given by:

$$\Pr(Train_{jfr} = 1) = \Pr\left(Z_{jfr}'\beta + \mu_f + \eta_r + \varepsilon_{jfr} > 0\right) \qquad (2)$$

Assuming that the error term $\varepsilon_{jfr}$ follows a normal distribution, equation (2) can be estimated by maximum likelihood (probit). Tables A5 and A6 in the appendix report the estimates of different specifications of equation (2) in the text for Malaysia and Thailand, respectively. Specifications (1) through (6) differ in the observable firm characteristics that are included. In all specifications we control for two-digit ISIC industry codes and for region dummies. Specification (7), which we will consider our baseline specification of firm characteristics, includes the interaction of industry and region fixed effects. In this specification, we control for size of the firm, age, export intensity, foreign ownership, education of the workforce (including managerial education) and degree of technological adoption. The findings show that training incidence increases with firm size in both countries although not with age of the firm. Training incidence also increases with the firm's presence in external markets and with foreign ownership.
For example, in Malaysia, training incidence is 56.4 percentage points higher in a firm with more than 250 employees than in a micro firm. We also find robust evidence that training incidence increases with the human capital of the workforce (measured by years of education, skills of the workforce and managerial education) and with the degree of technological adoption in the firm.
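Figures such as the 56.4 percentage point gap for large firms are changes in predicted probability rather than raw probit coefficients. As a rough sketch, for a dummy regressor the change can be computed as the difference between two standard normal CDF evaluations. The index value and coefficient below are hypothetical, chosen only to illustrate the arithmetic, and the paper's tables may evaluate the effect at a different point.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def discrete_change(base_index, dummy_coef):
    """Change in Pr(Train = 1) when a dummy switches from 0 to 1,
    holding the rest of the probit index fixed at base_index."""
    return normal_cdf(base_index + dummy_coef) - normal_cdf(base_index)


# Hypothetical values: index for the reference category and the coefficient
# on a "large firm" dummy. Neither number comes from the paper.
effect = discrete_change(base_index=-0.5, dummy_coef=1.5)
print(f"{100 * effect:.1f} percentage points")
```

Because the normal CDF is nonlinear, the same coefficient implies a different change in probability at a different baseline index.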
In tables A7 and A8 we replicate the estimation of equation (2) by maximum likelihood (probit model) but considering on-the-job training incidence at the worker level as the dependent variable.16 This analysis is a critical first step of the propensity score matching methodology. We consider several observable characteristics that likely determine the selection into training and that we can quantify with our detailed data set. We then compute the fitted values for each worker level observation. Therefore, each worker who has received training can be matched with a worker who has not received training but has a close enough fitted probability. This group of workers will constitute the control group in the estimation of the impacts of on-the-job training on wages.
16 We assume that a firm offers formal training to a worker if there is a net positive benefit of this investment. The main difference is that now the benefits should also be a function of the worker level observable characteristics, captured by $X_{ijfr}$. In this case, the probability of a worker $i$ being employed in firm $j$ is determined by his characteristics ($X_{ijfr}$) and the firm characteristics.
Specifications (1) through (5) of tables A7 and A8 always include the baseline firm characteristics reported in column (7) of tables A5 and A6. However, the set of worker level characteristics differs across columns. Column (1), in addition to the baseline firm characteristics, controls for the worker's education (including vocational education), gender, age, tenure with the firm, potential experience, marital status, occupation, whether the worker is an apprentice and whether he belongs to a trade union. In columns (2) through (5) we add dummy variables capturing whether the worker has a computer at home (specification 2), owns a bank account (specification 3), regularly uses the internet for transactions (specification 4) and has received training at a previous employer (specification 5). In table 2, we report the results for both countries exploring our preferred specification (specification (5) in tables A7 and A8 in the appendix).
Note to Table 2: The dependent variable is a dummy variable that assumes the value 1 if the worker received any formal training after joining the firm. The table reports the marginal effects (at mean values) on the firm's propensity to train from probit regressions (equation 3 in the text). The regressions control for several firm level characteristics, as listed in the specification reported in column (7) of tables A5 and A6 of the appendix. * significant at 10%, ** significant at 5%, *** significant at 1%. Columns (1) and (3) report the coefficient of the variable and columns (2) and (4) report standard errors. These are the same coefficients as column (5) in tables A7 and A8 of the appendix.
The findings show that training incidence increases with the level of human capital of the worker from secondary education onwards and, interestingly, also increases when individuals hold some degree of vocational training. Women are less likely to receive on-the-job training in Thailand but not in Malaysia. Workers with longer tenure with the firm are more likely to receive on-the-job training in both countries. Differences in the probability of participating in training are not statistically significant when comparing workers of different ages. In Thailand, training incidence is high both for skilled and unskilled production workers than for non-production workers. In Malaysia, we only find that unskilled production workers are less likely to train than non-production workers. This might be a result driven by the industries represented in the sample.17 In both countries, we find that workers who belong to a trade union and use a computer at home are more likely to participate in job training than others. In Malaysia, workers who have ever made a transaction through the internet also tend to be selected into training more frequently than those who have never used e-commerce. Moreover, we also find that current training incidence is very strongly and positively correlated with past training incidence for both countries.
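The worker level probit that produces the fitted probabilities can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's estimation code: the data are simulated, the two covariates (standardized education and tenure) and all coefficient values are hypothetical, and the probit is fitted with a hand-written Newton-Raphson routine so that the example depends only on the standard library.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    # erfc keeps precision in the lower tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_probit(rows, y, iterations=25):
    """Maximize the probit log-likelihood by Newton-Raphson.
    rows: covariate lists (first entry a constant); y: 0/1 outcomes."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        neg_hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            q = 2 * yi - 1
            index = sum(b * v for b, v in zip(beta, x))
            lam = q * norm_pdf(q * index) / norm_cdf(q * index)
            w = lam * (lam + index)
            for a in range(k):
                grad[a] += lam * x[a]
                for c in range(k):
                    neg_hess[a][c] += w * x[a] * x[c]
        step = solve(neg_hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def propensity_scores(rows, beta):
    """Fitted probability of training for each observation."""
    return [norm_cdf(sum(b * v for b, v in zip(beta, x))) for x in rows]


# Simulated data with hypothetical coefficients (not from the paper).
random.seed(0)
rows, y = [], []
for _ in range(2000):
    educ = random.gauss(0.0, 1.0)
    tenure = random.gauss(0.0, 1.0)
    latent = -0.3 + 0.8 * educ + 0.4 * tenure + random.gauss(0.0, 1.0)
    rows.append([1.0, educ, tenure])
    y.append(1 if latent > 0 else 0)

beta_hat = fit_probit(rows, y)
pscore = propensity_scores(rows, beta_hat)
```

The list `pscore` plays the role of the fitted values used to pair trained and untrained workers. In applied work one would normally rely on an established statistics package rather than a hand-rolled optimizer.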

The Wage Returns to On-the-Job Training
The accumulation of human capital has long been seen as an investment decision (Becker, 1964). While investing, each individual gives up some proportion of income during the education and training period in exchange for increased future earnings.
Individuals will be willing to take additional schooling or training if the costs (tuition and training course fees, forgone earnings while at school and reduced wages during the training period) are compensated by higher future earnings. Assuming perfectly competitive labor markets, wages reflect the marginal product of workers and should increase with the accumulation of human capital if individuals become more productive in their current job. 18

1. Propensity Score Matching
We use propensity score matching to quantify the wage returns to job training at the worker level. The main idea underlying the propensity score matching methodology is to match as closely as possible individuals who have received training to those not receiving training.
As we have argued in a previous section, to assess the impact of job training it is not enough to compute the average wage difference between the workers who received training and those who did not participate in training. The main reason is that a large part of this difference is most likely caused not by training itself but by other worker and firm characteristics that determine the selection into training. The propensity score matching (PSM) methodology starts from the assumption that all the relevant differences between the treated and the untreated individuals are captured by their observables. Within the group of the untreated it selects a group as similar as possible to the treated group. The difference in wages between workers who received training and this set of workers is a better estimate of the returns to training.
17 As explained in section 2 of the text our sample only includes formal manufacturing firms.
18 With imperfect competition wages do not necessarily reflect labor productivity and therefore might not reflect changes in the worker's productivity.
First, given the richness of our data set, we assume that a significant number of worker and firm variables (X) explain the relevant differences between the treated and the untreated groups. For consistent estimation it is required that, conditional on X, the wage a worker would earn without training is independent of whether the worker actually received training, where X is the set of observed variables. 20 However, if X is multidimensional it becomes difficult to match the individuals. Rosenbaum and Rubin (1983) show that matching can instead be performed on a single index, the propensity score, which is given by the fitted values of the probability of participating in training. Therefore the untreated individuals that present higher probabilities of receiving training will compose the counterfactual group.
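The identifying condition and the dimension-reduction result can be written compactly. The notation below is an assumption borrowed from the standard matching literature and may differ from the authors' own: D is the training indicator, Y_1 and Y_0 are the potential log wages with and without training, and P(X) is the propensity score.

```latex
\begin{align}
  & Y_0 \perp D \mid X
    && \text{(conditional independence)} \\
  & Y_0 \perp D \mid X \;\Longrightarrow\; Y_0 \perp D \mid P(X),
    \qquad P(X) = \Pr(D = 1 \mid X)
    && \text{(Rosenbaum and Rubin, 1983)} \\
  & \mathrm{ATT} = E\left[ Y_1 - Y_0 \mid D = 1 \right]
    = E\left[ Y_1 \mid D = 1 \right]
    - E_{P(X) \mid D = 1}\left\{ E\left[ Y_0 \mid D = 0, P(X) \right] \right\}
    && \text{(estimand)}
\end{align}
```

The last line also requires that treated workers have comparable untreated workers, that is, P(X) < 1 on the support of the treated.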

The first step of this method is thus to estimate the probability that each worker receives on-the-job training. This is given by the fitted values of the worker level probit regressions for the incidence of training. We consider the specification reported in table 2.
There, we control for several worker (including education, gender, age, tenure with the firm, potential experience, marital status, occupation) and firm characteristics (including firm size, age, export intensity, foreign ownership, education of the workforce, degree of technological adoption). The reason why we also include firm level characteristics is to control for the fact that the training decision depends partly on the firm. By matching workers with similar characteristics and who work for similar firms we hope to minimize the selection bias that is likely arising from the fact that individuals selected into training may be the ones with higher unobserved ability.
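This first step can be illustrated with a small, self-contained sketch. It fits a probit for training incidence by Fisher scoring and takes the fitted probabilities as propensity scores. Everything in it is an assumption made for illustration: the data are simulated, a single hypothetical covariate `x` (for instance, standardized years of schooling) stands in for the full set of worker and firm controls in table 2, and the function names are hypothetical rather than the authors' own.

```python
import math
import random


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fit_probit(x, d, iterations=25):
    """Fit P(d = 1 | x) = Phi(a + b * x) by Fisher scoring; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            eta = a + b * xi
            # Clip the probability away from 0 and 1 to keep the weights finite.
            p = min(max(norm_cdf(eta), 1e-10), 1.0 - 1e-10)
            f = norm_pdf(eta)
            score = f * (di - p) / (p * (1.0 - p))  # gradient contribution
            info = f * f / (p * (1.0 - p))          # expected information weight
            g0 += score
            g1 += score * xi
            h00 += info
            h01 += info * xi
            h11 += info * xi * xi
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Simulated (hypothetical) data: training is more likely for higher x.
random.seed(0)
TRUE_A, TRUE_B = -0.5, 0.8
x = [random.gauss(0.0, 1.0) for _ in range(4000)]
d = [1 if TRUE_A + TRUE_B * xi + random.gauss(0.0, 1.0) > 0 else 0 for xi in x]

a_hat, b_hat = fit_probit(x, d)
# The propensity score is the fitted probability of receiving training.
pscore = [norm_cdf(a_hat + b_hat * xi) for xi in x]
```

With real data one would normally rely on an established probit routine and include all the controls; the hand-written fit is only meant to make the definition of the fitted values concrete.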
Because each treated individual i can be matched with one or with n individuals in the non-treated group, we choose the one-to-n matching method. This implies that each individual in the "treatment" group is matched with a weighted average of all individuals in the "control" group that have similar fitted values. 21 After associating each treated individual i with a weighted mean of untreated individuals, we simply compute the difference between the average log wage in the treated group and the weighted average in the control group to quantify the causal effect of on-the-job training on wages.
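The matching step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the propensity scores are already available, uses a normal (Gaussian) kernel for the weights, and the bandwidth value and the toy numbers are hypothetical.

```python
import math


def kernel_att(treated, untreated, bandwidth=0.06):
    """Kernel-matching estimate of the ATT on log wages.

    treated, untreated: lists of (propensity_score, log_wage) pairs.
    Each treated worker is compared with a weighted average of all
    untreated workers, with weights from a normal kernel in the
    distance between propensity scores.
    """
    gaps = []
    for p_i, y_i in treated:
        weights = [
            math.exp(-0.5 * ((p_i - p_j) / bandwidth) ** 2)
            for p_j, _ in untreated
        ]
        total = sum(weights)
        if total == 0.0:
            continue  # no comparable untreated worker for this individual
        counterfactual = sum(
            w * y_j for w, (_, y_j) in zip(weights, untreated)
        ) / total
        gaps.append(y_i - counterfactual)
    # Average gap over the treated workers that could be matched
    # (raises ZeroDivisionError if none could be matched).
    return sum(gaps) / len(gaps)


# Toy (hypothetical) data: (propensity score, log hourly wage).
treated = [(0.6, 2.10), (0.4, 1.90)]
untreated = [(0.6, 2.00), (0.4, 1.85), (0.1, 1.50)]

att = kernel_att(treated, untreated)
```

A smaller bandwidth moves the estimator towards exact matching on the propensity score, while a larger one averages over less similar untreated workers.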
In Malaysia there is a set of 4,425 untreated individuals and 2,202 treated (a total support group of 6,627). In Thailand, the untreated and treated groups have 4,477 and 4,941 individuals, respectively (9,418 individuals in total). We compute the average treatment effect on the treated individuals (ATT), which gives the impact of training on the set of workers who actually end up receiving it. Table A11, in the appendix, reports balancing tests that check the quality of the matching in our sample. It is reassuring to see statistically similar means in most of the covariates for both the treated and the control groups. The only exception is the variable tenure in Malaysia, for which the t-statistic rejects the null hypothesis of equal means.
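A balancing test of this kind compares the mean of each covariate across the two groups. The sketch below computes an unweighted two-sample t-statistic with unequal variances; it is an assumption about the form of the test, since the tests in table A11 may weight the matched controls, and the tenure values are hypothetical.

```python
import math


def balance_t(treated_values, control_values):
    """t-statistic for the difference in means of one covariate."""
    n1, n0 = len(treated_values), len(control_values)
    m1 = sum(treated_values) / n1
    m0 = sum(control_values) / n0
    # Sample variances (divisor n - 1).
    v1 = sum((v - m1) ** 2 for v in treated_values) / (n1 - 1)
    v0 = sum((v - m0) ** 2 for v in control_values) / (n0 - 1)
    standard_error = math.sqrt(v1 / n1 + v0 / n0)
    return (m1 - m0) / standard_error


# Hypothetical tenure (in years) for treated and matched control workers.
tenure_treated = [2.0, 4.0, 6.0]
tenure_control = [1.0, 2.0, 3.0]
t_stat = balance_t(tenure_treated, tenure_control)
```

A t-statistic that is small in absolute value indicates that the covariate is balanced across the two groups; a large one, as reported for tenure in Malaysia, signals a remaining difference.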

Empirical Results for the Returns to Training
21 A normal kernel is used to define the weights.

As mentioned above, workers selected into training may on average receive different wages than those not selected. In this section, we present estimates of the average treatment effect on the treated (ATT), which is the effect of the training (received since joining the firm) on the hourly log wages of trained workers. We report both the raw log wage differences and the ATT using propensity score matching. Table 3 reports that the raw difference in wages between the treated and the untreated groups is 42.9% for Malaysia and 28.4% for Thailand. Once we apply the propensity score matching methodology, we find that the impact of on-the-job training on hourly wages falls to 7.7% for Malaysia and 4.5% for Thailand. The estimates are significant at the 5% level for both countries. Malaysia thus presents higher returns to job training than Thailand. A priori, this is not obvious. On the one hand, Malaysia has a higher per capita gross domestic product and also has more youth in school than Thailand, suggesting that its returns to human capital could be smaller than in Thailand. On the other hand, the accumulated stock of capital in Malaysia is higher and, if skills and capital are complementary, all else constant, the returns to human capital could be higher than in Thailand. Also, if training presents decreasing returns, it is reassuring to see that returns are lower in the country with the highest training incidence: Thailand.
For comparability, in tables A9 and A10 in the appendix, we show the estimates for the impact of training on wages using ordinary least squares (OLS) and considering alternative specifications. The specification closest to the set of variables we control for in the PSM estimates is reported in column (3). Comparing columns (1) and (3) in tables A9 and A10, we see that the wage difference between trainees and non-trainees falls from 43.1% to 4.3% in Malaysia and from 28.4% to 4.2% in Thailand. Although the two sets of estimates are very similar for Thailand, for Malaysia the OLS estimates are lower than the PSM estimates, suggesting that the least squares estimates have a downward bias.

Heterogeneity Analysis
Until now we assumed that returns to on-the-job training are the same for all workers and firms within each country. In this section, we allow the returns to differ by two fundamental worker characteristics: gender and education. Our results show that the returns to training are higher for men than for women in Malaysia and, in both countries, higher for workers who have completed secondary education or more than for workers who have not completed secondary education.
Table 4 shows that, in Malaysia, wage returns are higher for men than for women. Men present wage returns to on-the-job training of 11% while there are no statistically significant returns for women. This may be explained by the fact that women tend to go into and out of the job market more frequently than men and thus may be less likely to receive on-the-job training. This higher turnover may also make it difficult to appropriate the returns from the investments in job training. We do not find, however, the same result for Thailand. There the wage returns to job training are quantitatively larger for men than for women, although for men they are not statistically different from zero (at a 10% level).
Source: Authors' calculations based on the Enterprise Surveys (World Bank). Note: * significant at 10%, ** significant at 5%, *** significant at 1%. Table uses propensity score matching to estimate equation (4) in the text. We estimate separate regressions by gender. Columns (1) and (4) report the Average Treatment Effect on the Treated (ATT), which evaluates the wage impact of training for those actually participating in training. Columns (2) and (5) report standard errors. Columns (3) and (6) report the t-statistic.

Conclusion
In developing countries, governments are increasingly concerned with the rapidly changing demand for skills and the slow response of the general and vocational schooling tracks. As a consequence, many employers complain about the lack of skills and education of their workforce. Policymakers are thus increasingly concerned that the supply of skills in the market does not keep pace with the demand and think about the design of policies to address this problem. The investment in on-the-job training is one important way to mitigate this gap by developing job-relevant skills among the workforce.
The evidence on this topic is generally scarce for developing countries. The measurement of returns to training presents several challenges, which is why results differ so much across the literature. Variables are usually not comparable across studies and sometimes data sets do not allow for an accurate estimation of the results.
In this paper we quantify the wage returns to on-the-job training in Malaysia and in Thailand exploring a unique data set matching workers and firms. Using a matching estimator to control for selection bias, we find returns of 7.7% and 4.5% for Malaysia and Thailand, respectively. In Malaysia, we find that returns are clearly higher for men than for women. Workers who have completed secondary education or more also show higher wage returns than those who have not completed secondary schooling. Economic theory tells us that the wage effects are a lower bound estimate for the effect of training on productivity. Therefore the productivity impact of training in these countries should be even higher than the estimated values.

Note: * significant at 10%, ** significant at 5%, *** significant at 1%. Table uses propensity score matching to estimate equation (4) in the text. We estimate separate regressions by education group. Columns (1) to (3) refer to the sample of workers that have completed secondary education or more. Columns (4) through (6) refer to workers with lower levels of education (that is, those with up to incomplete secondary education). Columns (1) and (4) report the ATT (Average Treatment Effect on the Treated), which evaluates the wage impact of training for those actually participating in training. Columns (2) and (5) report standard errors and columns (3) and (6) report the t-statistic. Treated individuals are those who have participated in training and the untreated individuals are the "control group" that is similar in all characteristics to the treated group except for the fact of receiving training. Panel A reports the estimates for the sample of workers in Malaysia and Panel B reports the estimates for the sample of workers in Thailand.
Column groups: Workers Completed Secondary Education or More Years of Schooling; Workers with up to Incomplete Secondary Education.

A1. Least Squares Returns to On-the-Job Training
Following Mincer (1974), we assume that (log) wages are a linear function of several human capital and other worker characteristics, and of firm characteristics:
$$\ln w_{ij} = \alpha + \beta \, Train_{ij} + X_{ij}'\gamma + Z_{j}'\delta + \varepsilon_{ij}$$
where $w_{ij}$ is the hourly wage (in local currency) of worker $i$ in firm $j$, $Train_{ij}$ indicates whether the worker received on-the-job training, $X_{ij}$ is a vector of worker characteristics, $Z_{j}$ is a vector of firm characteristics and $\varepsilon_{ij}$ is the error term. The least squares estimates of $\beta$, the coefficient on $Train_{ij}$, are consistent if $Train_{ij}$ is uncorrelated with the error term $\varepsilon_{ij}$. However, this assumption may not hold. On the one hand, there is likely self-selection into on-the-job training. We have shown that workers with certain observable characteristics (and most likely also unobservable ones) are more likely to have taken on-the-job training programs than others. Therefore, it is possible that the higher earnings of those who are trained are caused not by training itself but by the fact that those taking up training have a greater earning capacity and ability than the non-trainees. In this case, the least squares estimates of $\beta$ will probably be upward biased due to a possible "ability bias". On the other hand, if the on-the-job training variable is measured with error, the least squares estimates could be downward biased. Therefore the overall sign of the least squares bias is unclear.
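The second mechanism, downward bias from a mismeasured training variable, can be illustrated with a small simulation. All numbers below are hypothetical and chosen only for illustration: a true return of 0.08 log points, a 40% training incidence and a 20% chance that training status is misreported. With a single binary regressor and no controls, the least squares coefficient equals the difference in mean log wages between the two groups, which is what the sketch computes.

```python
import random


def mean_gap(train, log_wage):
    """Difference in mean log wages between trained and untrained workers.

    With a single dummy regressor this equals the OLS slope coefficient.
    """
    trained = [y for d, y in zip(train, log_wage) if d == 1]
    untrained = [y for d, y in zip(train, log_wage) if d == 0]
    return sum(trained) / len(trained) - sum(untrained) / len(untrained)


random.seed(1)
TRUE_RETURN = 0.08   # hypothetical true effect of training on log wages
FLIP_PROB = 0.2      # hypothetical probability of misreporting training
N = 20000

# True training status, assigned at random (so there is no ability bias here).
train = [1 if random.random() < 0.4 else 0 for _ in range(N)]
log_wage = [2.0 + TRUE_RETURN * d + random.gauss(0.0, 0.3) for d in train]

# Reported training status: flipped with probability FLIP_PROB.
reported = [1 - d if random.random() < FLIP_PROB else d for d in train]

clean_estimate = mean_gap(train, log_wage)     # uses true training status
noisy_estimate = mean_gap(reported, log_wage)  # uses misreported status
```

Because some trained workers are recorded as untrained and vice versa, the two reported groups are more alike than the true groups, and the estimate based on reported status is pulled towards zero.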
We minimize the first problem by accounting in the reduced form for several observable individual and firm characteristics simultaneously correlated with training and with hourly wages. In particular, in $X_{ij}$ we include detailed information on schooling, gender, age, tenure in the firm, potential experience, marital status, occupation and ethnicity. In $Z_{j}$ we include information on firm size, foreign ownership, exports, average schooling of the workforce, managerial ability, degree of technological innovation, industry and geographical location of the firm.
Tables A9 and A10 report the least squares estimates of the training coefficient under different specifications, with standard errors clustered at the firm level. Column (1) controls only for training incidence since joining the firm, column (2) adds the baseline worker characteristics (as in column (6) of tables A7 and A8), 22 and column (3) adds the baseline firm characteristics (reported in column (7) of tables A5 and A6) 23 to the specification in column (2). The OLS estimates strongly suggest that there are positive returns to the investment in on-the-job training in both countries. As expected, the magnitude of the returns decreases as we introduce additional firm and worker controls. In table A9, the wage returns to on-the-job training for Malaysia start at 43.1% but fall to 8.1% when we control for worker characteristics and to 4.3% once we control for firm characteristics. Table A10 reports similar findings for Thailand. Returns start at 28.4%, falling to 6.2% when we include worker characteristics and to 4.2% when we include firm characteristics. 24
22 The worker variables included were: educational attainment, gender, age, tenure in the firm, years of labor market experience, marital status, occupation, whether the individual is a member of a labor union, owns a computer, has a bank account, has ever made an internet transaction and whether the worker received training at a previous employer.
23 In addition to the worker variables described in the previous footnote we include the following firm characteristics: size, foreign capital participation, exports, average years of education of the work force, education of the manager, introduction of new production technologies, industry and region.
24 Tables A9 and A10, in the appendix, report the results for the worker variables included in the regressions. We focus on the findings in column (3). The estimates show that, in both countries, the returns to schooling are increasing with the level of formal education completed. Women and unionized workers earn lower wages than men and non-unionized workers. Wages also tend to increase with age, tenure and experience. Moreover, wages for managers and professionals are higher than the wages of non-production workers (the omitted occupation group) and of skilled and unskilled production workers in both countries. Finally, those who report having a computer at home, having a bank account and regularly using the internet also report higher wages. The same happens for those individuals who report having received training with their previous employer. Perhaps surprisingly, in Malaysia we find that returns to past training are higher than returns to more recent training events (6.8% vs. 4.3%), since we would expect training to depreciate with time.

Yes
Randomization: control group composed of people who were planning to engage in a training activity but did not because of some random event.
Yes
They use the regression discontinuity (RD) design method (see Campbell, 1969). They exploit the discontinuity introduced by a new tax law that allows a tax deduction for firms' expenditures on training for workers over 40 years of age. The decision to train workers around age 40 is therefore influenced by an exogenous factor (the law).
(1984-1996); log hourly wage; industry-aggregated incidence of training in the previous 4 weeks; log capital per worker, log hours per worker, log of R&D over sales, region, time and tenure dummies, and the proportions of men, age groups, occupations, qualified workers and small firms.

Yes
Raising training incidence by 5% increases wages and productivity by 1.6% and 4% respectively.
Tan and Lopez-Acevedo (2003) Firm Level Data for Mexico (1992, 1999)
Source: Authors' calculations based on the Enterprise Surveys (World Bank). Dependent variable is a dummy variable that takes the value 1 if the firm offered formal on-the-job training to its employees. Table reports the marginal effects (at mean values) on the firm's propensity to train from probit regressions. Robust standard errors are in brackets. * significant at 10%, ** significant at 5%, *** significant at 1%. All variables are defined in Table A2. Micro firms (with fewer than 10 employees) are the omitted size group. Age squared is also included in the regressions (not reported). Industry fixed effects refer to 2-digit industry or service.

Malaysia Thailand
Note: Table reports balancing tests between the sample means of the variables listed. We contrast the means of the subsamples of treated and untreated individuals. The t-test reported in columns (3) and (6), for Malaysia and Thailand respectively, verifies whether the difference between the means of the variables reported is, for each country, statistically different from zero across the two samples. Treated individuals are those who participated in training and untreated individuals are those who report not having participated in training.