Longitudinal Survey: Fostering Research on Labor Markets in China

This paper describes the Longitudinal Survey on Rural Urban Migration in China (RUMiC), a unique data source in terms of spatial coverage and panel dimension for research on labor markets in China. The survey is a collaboration project between the Australian National University, Beijing Normal University and the Institute for the Study of Labor (IZA), which makes data publicly available to the scientific community by producing Scientific Use Files. The paper illustrates the structure, sampling frame and tracking method of the survey, and provides an overview of the topics covered by the dataset and a review of the existing studies based on RUMiC data.
JEL Codes: C81, J01, P36, R23


Introduction
China has witnessed rapid demographic and socio-economic changes during the last three decades. Starting from the end of the 1970s and accompanied by the rise in foreign investment, the economic reforms introduced by the government led to a sudden expansion in the demand for unskilled labor in urban areas. At the same time, productivity growth in the primary sector generated excess labor in many rural areas. The household registration system that the government used to control and restrict the internal movement of labor (the hukou) was progressively relaxed, albeit not eliminated. These rapid transformations set the background for the largest movement of labor within a country, referred to as the Great Migration in China throughout this paper. According to the National Bureau of Statistics of China, more than 260 million Chinese have left their hometown for at least 6 months, including more than 160 million migrant workers moving from rural to urban areas (NBS, 2013). This large-scale movement of labor is one of the driving forces of economic growth in China.
Mass internal migration in China has important reverberations for the global economy. On the one hand, a substantial part of global demand has been indirectly supported by the shift of labor from the primary sector to export-oriented industries and services located in urban areas. On the other hand, China's growing influence in world trade and the massive upsurge in its domestic consumption have led the country to affect the world prices of many commodities, including food and energy.
The Great Migration brought along unprecedented levels of urbanization. To put it in perspective, while the rate of urbanization in Europe increased from 30% to 50% in the first half of the 20th century, its fastest pace, urbanization in China is at least twice as fast (Frijters and Meng, 2009). Nowadays, over 700 million individuals live in urban areas, more than double the number of just twenty years ago. Projections indicate that this number could be as large as 900 million by 2030 (Kamal-Chaoui et al, 2009), with most of the increase expected to be fuelled by rural-to-urban migration.
The persistence of the hukou system means that migrants generally cannot permanently settle in cities, making most migrations temporary. Nonetheless, the frequency, circularity and volume of the flows are such that rural-to-urban migrants hold a constant presence in urban areas, while many villages face an increasing lack of working-age population. As a consequence, besides affecting migrants themselves, migration also has socio-economic repercussions on family members left behind in rural villages, as well as on urban residents.
With the goal of understanding the relationship between the Great Migration and changing labor markets in China, a group of international researchers has established the Rural-Urban Migration in China (RUMiC) project. The project's main output has been the design and implementation of a large-scale longitudinal household survey covering individuals in rural and urban areas, as well as temporary migrants working in cities. This paper documents the structure, sampling frame and tracking method of the first two waves of the RUMiC survey, and provides an overview of the topics covered and a review of the existing studies based on these data.

Overview
The Longitudinal Survey on Rural Urban Migration in China (RUMiC) consists of three independent surveys: the Urban Household Survey (UHS), the Rural Household Survey (RHS) and the Migrant Household Survey (MHS). It was initiated by a group of researchers at the Australian National University, the University of Queensland and the Beijing Normal University and supported by IZA, which provides the Scientific Use Files through IDSC, its data bank center. Financial support for RUMiC was obtained from the Australian Research Council, the Australian Agency for International Development (AusAID), the Ford Foundation, IZA and the Chinese Foundation of Social Sciences.
The fieldwork started in 2008, and since then four waves of the UHS and RHS and five waves of the MHS have been collected. The RHS and UHS have been conducted in collaboration with the National Bureau of Statistics of China (NBS), while the MHS has been conducted in partnership with a professional survey company. The project is designed to track households as long as they remain in the surveyed cities and villages. A systematic tracking strategy, especially relevant for migrant households, is adopted to follow individuals over the project's lifespan.
Its large scale, in-depth topics and longitudinal aspect make RUMiC a unique tool to explore migration and labor markets in China. The RHS comprises around 8,000 households, while the UHS and MHS each involve around 5,000 households. Urban (rural) residents are individuals who possess urban (rural) hukou. A migrant is defined as an individual who has rural hukou, but is living in a city at the time of the survey. The availability of three surveys allows multiple "control groups" to investigate the effects of rural-urban migration (Kong, 2010). For instance, when analyzing the impact of migration on rural areas, non-migrant individuals serve as a control group for those who are currently migrating in cities or those who have returned. Similarly, urban residents can be used as a control group when investigating the economic situation of migrant workers.
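The group definitions above can be written down as a small classification rule. The following is a minimal sketch, assuming a hypothetical coding in which hukou type is stored as the string "rural" or "urban" and residence as a boolean; the class and function names are illustrative and do not correspond to variables in the Scientific Use Files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Respondent:
    hukou: str           # "rural" or "urban" (hypothetical coding)
    lives_in_city: bool  # residence at the time of the survey


def population_group(person: Respondent) -> str:
    """Map hukou type and current residence to the group definitions in the text."""
    if person.hukou == "urban":
        return "urban resident"
    if person.hukou == "rural":
        # Rural hukou holders count as migrants only while living in a city.
        return "migrant" if person.lives_in_city else "rural resident"
    raise ValueError(f"unknown hukou type: {person.hukou!r}")


groups = [
    population_group(Respondent("urban", True)),
    population_group(Respondent("rural", True)),
    population_group(Respondent("rural", False)),
]
print(groups)
```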
Each of the three surveys includes comprehensive information on household and personal characteristics, detailed health status, employment, income, training and education of adults and children, social networks, family and social relationships, life events, and mental health measures of the individuals. The MHS additionally includes questions related to migration history.
In addition to its rich set of information at such a large scale, one of the key features of RUMiC lies in its longitudinal structure, which allows researchers to incorporate dynamic aspects in their empirical analyses. Each of the three surveys can be utilized as a repeated cross-section, which is particularly useful when analyzing trends in the economic outcomes of certain populations of interest, or as panel data, which allows the use of fixed effects estimators.
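To illustrate why the panel dimension matters, the sketch below implements a one-regressor fixed effects (within) estimator on a toy panel. It is an illustration under stated assumptions, not code from the RUMiC project: the data are invented, the function name `within_estimator` is hypothetical, and the toy outcome is built without noise so that the within slope equals the true coefficient of 2.

```python
import numpy as np


def within_estimator(ids, x, y):
    """Fixed-effects (within) slope for a single regressor.

    Each variable is demeaned by individual, which removes any
    time-invariant individual effect before running OLS.
    """
    ids = np.asarray(ids)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, inverse = np.unique(ids, return_inverse=True)
    counts = np.bincount(inverse)
    x_dm = x - (np.bincount(inverse, weights=x) / counts)[inverse]
    y_dm = y - (np.bincount(inverse, weights=y) / counts)[inverse]
    return float(x_dm @ y_dm / (x_dm @ x_dm))


# Toy panel: three individuals observed in two waves (invented data).
ids = np.array([1, 1, 2, 2, 3, 3])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
alpha = np.array([0.0, 0.0, 10.0, 10.0, 20.0, 20.0])  # individual effects correlated with x
y = 2.0 * x + alpha                                    # true slope is 2

fe_slope = within_estimator(ids, x, y)
pooled_slope = float(np.polyfit(x, y, 1)[0])  # pooled OLS ignores the individual effects
print(fe_slope, pooled_slope)
```

Because the individual effects rise with the regressor in this toy example, pooled OLS overstates the slope, while the within transformation removes the bias. With only a repeated cross-section, this correction would not be available.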
RUMiC complements existing surveys that have recently been conducted in China, such as the Chinese Household Income Project (CHIP) survey, which collects data on rural and urban households yet largely excludes migrant workers, and the China Health and Retirement Longitudinal Study (CHARLS), a biennial survey which aims to support research on the elderly, and thus only collects data about individuals aged 45 and older. RUMiC also complements other surveys that IZA has made or is planning to make publicly available to the research community, such as the IZA Evaluation Dataset (see Caliendo et al, 2011; Arni et al, 2013) and the Ukrainian Longitudinal Monitoring Survey (see Lehmann et al, 2012). IZA currently provides the Scientific Use Files for the first two RUMiC waves, and the IZA team is processing the remaining waves, which will be made available soon.

Listing and Sampling Frame
The RHS and UHS were conducted using random samples from the annual household income and expenditure surveys carried out in cities and rural villages (Kong, 2010). During the sampling process, efforts were made to cover rural and urban areas with representative income and population levels.
Due to the mobile and temporary nature of internal migration in China, there was no existing sampling frame to conduct the MHS. Migrants are typically clustered in dormitories near factories and construction sites, often without a registered address. Therefore, the sampling frame from the UHS would not be representative of migrant workers. To cope with this issue and ensure reasonable representativeness of the migrant worker population, the RUMiC team devised an innovative sampling frame. The first step involved creating a listing of migrants and workplaces to estimate the total size of the migrant population in each city. This was based on a pre-survey census collected across randomly selected blocks into which the cities were divided (see Kong, 2009, for details). Using the listing data, a sampling frame based on workplaces (rather than residences) was created. Furthermore, all businesses, including street vendors, in randomly selected enumeration areas within defined city boundaries were included. For each city, a sample of migrant workers was randomly selected within each workplace, based on their birth month. The enumerators subsequently conducted face-to-face interviews with migrant workers and their families.
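The within-workplace step can be sketched as follows. The text does not spell out the exact birth-month rule, so this example assumes one possible version: a set of calendar months is drawn at random, and every listed worker born in one of those months is selected. The listing layout, the function name `sample_by_birth_month` and the parameter `n_months` are all hypothetical.

```python
import random


def sample_by_birth_month(workplaces, n_months, seed=0):
    """Select, in every listed workplace, the workers born in randomly drawn months.

    workplaces: dict mapping a workplace name to a list of worker dicts,
                each with "id" and "birth_month" (1-12) keys (hypothetical layout).
    n_months:   number of calendar months to draw; controls the sampling fraction.
    """
    rng = random.Random(seed)
    drawn_months = set(rng.sample(range(1, 13), n_months))
    selected = {
        name: [w["id"] for w in workers if w["birth_month"] in drawn_months]
        for name, workers in workplaces.items()
    }
    return drawn_months, selected


# Invented listing of workplaces from a pre-survey census.
listing = {
    "factory_A": [{"id": 1, "birth_month": 3}, {"id": 2, "birth_month": 7}],
    "street_stall_B": [{"id": 3, "birth_month": 7}],
    "construction_site_C": [{"id": 4, "birth_month": 11}, {"id": 5, "birth_month": 12}],
}
drawn_months, selected = sample_by_birth_month(listing, n_months=4, seed=42)
print(sorted(drawn_months), selected)
```

Selecting on birth month is attractive in the field because it is easy for enumerators to apply and is plausibly unrelated to labor market outcomes, so it behaves like a random draw within each workplace.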

Coverage
The RUMiC survey covers principal migrant sending and receiving regions. The RHS was conducted in villages across nine provinces, while the UHS and MHS were carried out in nineteen and fifteen cities, respectively (see Table 1).

Tracking Method
A systematic tracking strategy is adopted to follow individuals over the lifespan of the project, as long as they remain in the surveyed cities and villages. For the RHS and UHS, enumerators track individuals using their permanent addresses. For the MHS, due to the higher mobility of migrants and the fact that they usually do not have a permanent address in cities, a more complex tracking strategy was adopted. The survey team recorded the individuals' work and home addresses, as well as other contact details in both cities and home villages. They also recorded the phone numbers of three close relatives or friends to be contacted in case households moved (Kong, 2010). As an incentive to improve tracking, the team designed three lotteries for each year, with prizes assigned to survey participants ranging from 50 to 2000 Yuan, as well as a yearly dispatch of small presents before the Chinese New Year (Kong et al, 2009).

Sample Size and Panel Attrition
Table 2 provides the sample size for each survey and wave. For wave 2, the table includes figures concerning households and individuals who are tracked over time, the attrition rate, the additional household members who were not surveyed in wave 1, and for the MHS, the size of the new sample that was collected. Panel attrition was essentially nonexistent for individuals in the RHS (0.4%) and rather low for those in the UHS (5.8%). In contrast, attrition was substantial for the MHS (58.4%), despite considerable efforts to track individuals over time. This is partly due to the mobile nature of migrant workers, and partly to the consequences of the global financial crisis, which also hit China's economy in 2009, especially the export-oriented sectors in which many migrant workers are concentrated.
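Attrition figures of this kind can be computed from the respondent identifiers of two waves. The sketch below assumes that attrition is the share of wave 1 respondents who are not re-interviewed in wave 2, which is the usual definition but is not stated explicitly in the text. The identifiers are invented and the function name `panel_attrition` is hypothetical.

```python
def panel_attrition(wave1_ids, wave2_ids):
    """Summarize tracking between two waves from respondent identifiers."""
    wave1, wave2 = set(wave1_ids), set(wave2_ids)
    if not wave1:
        raise ValueError("wave 1 must contain at least one respondent")
    lost = wave1 - wave2
    return {
        "tracked": len(wave1 & wave2),   # interviewed in both waves
        "lost": len(lost),               # interviewed in wave 1 only
        "new": len(wave2 - wave1),       # new members or refreshment sample
        "attrition_rate": len(lost) / len(wave1),
    }


# Invented identifiers: ten respondents in wave 1, four of them re-interviewed.
summary = panel_attrition(range(1, 11), [1, 2, 3, 4, 11, 12])
print(summary)
```

The "new" count is kept separate because, as Table 2 indicates, wave 2 also contains household members who were not surveyed in wave 1 and, for the MHS, a newly collected sample.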
To counter the substantial attrition of the MHS and maintain the original sample size, a resampling based on the pre-survey census was conducted (see Kong et al, 2009 and Meng, 2013, for details). This implies that, starting from the second wave, the MHS consists of two separate samples: the "old sample", which tracks migrants from the first wave, and the "new sample", a fresh randomly drawn sample that is followed in the subsequent waves. The representativeness of the new sample is discussed in detail in Kong et al (2009).

Questionnaire Modules and Variable Content
The RUMiC survey provides a rich set of variables. The questionnaires cover detailed standard demographic and socioeconomic characteristics of household heads and members, and also include questions on physical and mental health status, life events, social networks, household consumption, assets and expenditure. This information offers a significant opportunity to investigate interesting yet under-researched topics concerning migration and labor markets in China. The modules about employment provide information on the type of employment (e.g. wage work, self-employment, domestic work without pay), hours of work, earnings, job search, firm ownership, as well as occupation and industry codes (which have been harmonized across the two waves). Several questions are asked about entrepreneurship and self-employment, including aspects related to borrowing and the existence of credit constraints to starting a business. The

Summary Statistics
We provide a simple yet effective picture of RUMiC data in Table 4, reporting statistics for selected variables including age, gender, marital status, number of children, work status, and an indicator for self-employment. The general health status is a self-reported categorical variable indicating "very poor", "poor", "average", "good" or "excellent" health. Happiness refers to a question asking whether individuals are happy when considering all aspects of their life, with responses including "much less than usual", "less so than usual", "same as usual" and "more so than usual". Individuals in rural areas report a number of children slightly above 2; in contrast, the number for urban residents is slightly above 1. This difference is a consequence of the one-child policy, which was strictly enforced in cities but was less binding in rural areas, with a second child allowed in many provinces if the first was a girl. On average, individuals in all three surveys report good levels of health and happiness.
With regard to employment, there is much variation across the three surveys. Only 40% of individuals in rural areas are employed in non-farm work, reflecting the continuing importance of the agricultural sector in villages. About 60% of individuals in urban areas are in wage work, while 5% engage in self-employment activities. Approximately two-thirds of migrants in the first wave were working as employees; however, as many as 20% were in self-employment, reflecting that many migrants decide to start their own activity once in the city, often in the informal sector. One remarkable aspect is the sharp decrease in the share of wage workers in the tracked sample of migrants, and the increase in the share of individuals engaging in self-employment. One likely explanation behind such a change is that the economic crisis had a relatively larger impact on export-oriented sectors, inducing migrants to change city or return to their home village and thus drop out from the survey.
Source: RUMiC Waves 1 and 2. Notes: standard deviation in parentheses. * statistics refer to individuals older than 16 present during the survey. ** statistics refer to individuals 16-64. Health status is measured on a scale 1 to 5, where 1 is "Excellent" and 5 "Very poor". Happiness is measured on a scale 1 to 4, where 1 is "Happy more than usual" and 4 is "Happy much less than usual".

Access to RUMiC
To date, IZA has made the Scientific Use Files for the first two waves publicly available. Access is granted for scientific purposes to researchers from universities and research institutes. Data application forms can be downloaded from the website of the International Data Service Center (IDSC) of IZA (http://idsc.iza.org/rumic) and should be completed with a description of the research project and submitted to idsc@iza.org. Data files are provided in formats that are compatible with standard statistical software.

A Review of Literature Based on RUMiC
Although the survey is relatively recent, a significant amount of research has already been produced using RUMiC data. In this section, we provide an overview of selected studies. Table 5 lists the research topics and main findings of these papers.

| Study | Topic | Survey | Main findings |
| | | UHS, MHS | Discrimination in salaried jobs leads to migrants choosing self-employment, which offsets the negative effects of credit constraints on self-employment. |
| Ge and Lehmann (2013) | Worker displacement; labor market segmentation | UHS, MHS | Re-employment outcomes after displacement differ between migrants and urban workers. |
| Giulietti et al (2012) | Self-employment; wage differentials | RHS, MHS | Wage differential is a key determinant in the choice between self-employment and wage work. |
| Giulietti et al (2013) | Entrepreneurship; left-behind; return migration | RHS | Return migration promotes self-employment among non-migrants, while current migration reduces it. |
| Qu and Zhao (2013) | Wage inequality | UHS, MHS | During the 2000s, wage inequality decreased among migrant workers while it increased among urban workers. |
| Zhang and Zhao (2011) | Entrepreneurship; social networks | MHS | Migrants with larger social-family network are more likely to choose self-employment. |
| Zhang and Zhao (2013) | Migration decision | MHS | Rural-urban migrants who move further away need to be compensated with larger income gains. |
It is possible to group the studies in Table 5 into the main topics covered. These include occupational choice and entrepreneurship, subjective well-being, wage inequality and labor market segmentation, and the determinants and consequences of migration.

Entrepreneurship and occupational choice
A strand of the literature focuses on the determinants of entrepreneurship among migrant workers. Frijters et al (2011) analyze the link between self-employment and credit constraints. In order to explore the effect of credit constraints, they use information from the second wave of the UHS and MHS, classifying workers into self-employed, involuntary self-employed, want-to-be self-employed and happy-to-be-salaried. Their descriptive analysis shows that being discriminated against in salaried jobs leads migrants to become self-employed, thereby offsetting the negative effects of credit constraints on entrepreneurship. Giulietti et al (2012) model migrant workers' choice between wage work and self-employment using the first wave of the MHS. Their key explanatory variable is the wage differential between the two sectors, estimated through an endogenous switching model. Migration selectivity is taken into account using information from the RHS. The authors document that self-employed migrants have higher earnings than wage workers, and that they are positively selected in terms of unobservable characteristics. Zhang and Zhao (2011)

Subjective well-being
Another line of research focuses on the determinants of migrants' subjective well-being (SWB). Akay et al (2012)

Wage inequality and labor market segmentation
Another group of studies explores wage inequality and labor market segmentation in urban areas. Frijters et al (2010)
In a recent contribution, Ge and Lehmann (2013) investigate the cost of job loss and the consequences of worker displacement in urban labor markets, using the second wave of the UHS and MHS. They first distinguish between displaced workers, quitters and stayers, and then analyze these groups upon re-employment, assessing the length of the unemployment spell, earnings, happiness and health. Their results indicate that displaced migrant workers do not face relatively long unemployment spells or wage penalties, unlike displaced urban workers. The authors interpret these results as evidence of segmented labor markets in urban areas.
Zhang and Zhao (2013) investigate the determinants of migration choice by focusing on the role of distance. They develop a framework in which the distance between the home village and destination city is included in the utility function. For the empirical exercise, the authors first calculate a variable measuring individuals' expected income in each potential destination, which is included as an explanatory variable in their OLS and discrete choice analyses. Their results, based on the first wave of the MHS, suggest that rural-to-urban migrants do not find it convenient to move far away from their home villages. In particular, their estimates imply that in order to induce migrants to move 10% further away from home, their income has to increase by at least 15%.
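The 10% and 15% figures reported for Zhang and Zhao (2013) can be read as an implied elasticity of compensating income with respect to distance. The sketch below only restates that arithmetic; it does not reproduce the authors' model. The function names are hypothetical, and the linear extrapolation to other distance changes is an assumption that is reasonable only for small changes. Since the text says "at least 15%", the resulting ratio should be read as a lower bound.

```python
def implied_elasticity(distance_change: float, income_change: float) -> float:
    """Ratio of the proportional income change to the proportional distance change."""
    if distance_change == 0:
        raise ValueError("distance_change must be non-zero")
    return income_change / distance_change


def required_income_change(distance_change: float, elasticity: float) -> float:
    """Linear (small-change) approximation of the compensating income change."""
    return elasticity * distance_change


# Figures quoted in the text: 10% further away requires at least 15% more income.
elasticity = implied_elasticity(0.10, 0.15)

# Hypothetical extrapolation: compensation for moving 20% further away.
print(elasticity, required_income_change(0.20, elasticity))
```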

A recent paper by Biavaschi et al (2013) focuses on the effects of parental migration on the educational outcomes of children left behind, highlighting the importance of sibling interactions in such a context. Exploiting the panel dimension of the RHS, the authors estimate the relationship between older and younger siblings' school performance using OLS and fixed effects methods. Their main findings suggest that sibling influence is stronger among left-behind children. In particular, older sisters primarily have a positive influence on their younger siblings. The authors interpret their results as evidence that sibling effects are a mechanism that shapes children's educational outcomes and that adjustments within the family left behind can reduce the hardship caused by parental migration.

Conclusions
This paper provides a brief description of the RUMiC Survey, which has been created with the aim to foster research on migration and labor markets in China. The survey comprises the Rural Household Survey, the Urban Household Survey and the Migrant Household Survey.
Scientific Use Files for the first two waves are made publicly available by IDSC, the data bank center of IZA. Further waves will be made available in the near future.
The large scale, in-depth topics and longitudinal aspect are the core features of RUMiC. The surveys follow more than 18,000 households over time and include comprehensive information on household and personal characteristics. With such features, RUMiC data allow studying the consequences of the Great Migration for rural and urban areas in China, as well as for the migrants themselves. Furthermore, it is possible to better understand the evolution of labor markets over the past few years, including the consequences of the financial crisis.
In the paper, we have surveyed a few studies based on RUMiC data, highlighting topical questions concerning entrepreneurship, happiness and wage inequality. Much more research is expected to be produced now that Scientific Use Files for the second wave have been made publicly available, which allows exploiting the panel aspect of the survey. Future lines of research could include a more rigorous study of the consequences of migration for left-behind individuals (especially children and the elderly) and the in-depth analysis of internal migration patterns over time, devoting particular attention to modeling circular and return migration.