We hypothesize that individuals with a larger social-family network are more likely to choose self-employment. We test this hypothesis using data on temporary rural–urban migrants in China. The size of a migrant’s social-family network is measured by the number of relatives and friends this migrant greeted during the past Spring Festival. After controlling for endogeneity using an instrumental variable approach, our results show that a rural–urban migrant with a larger social-family network is more likely to be self-employed. This finding is robust to alternative model specifications and various restrictions on the estimation sample.
For people who choose to become self-employed, a social and family network is often the most reliable source of assistance. This is particularly true in a society like China. For example, when one needs financial capital for an initial investment, one turns to family members, relatives, and close friends. Similarly, when a self-employed individual needs to find customers, the word-of-mouth advertising by friends and relatives is often more effective than advertising through formal channels; when a small business owner needs to hire an employee, he or she also asks friends and relatives for referrals. More importantly, in a developing country where the institutional environment is full of uncertainty and hidden rules, a self-employed individual constantly needs personal connections to facilitate navigating the system (Yueh, 2009). Indeed, an extensive literature has documented that the self-employed rely heavily on the assistance of friends and family members.1
Given that a well-developed social-family network can greatly increase the feasibility of self-employment and enhance the chance of success for the self-employed, one would naturally hypothesize that individuals with a larger social-family network are in a better position to choose self-employment. In this paper, we empirically investigate whether this is indeed true.
Our study brings together two strands of literature. One concerns the effect of personal networks on an individual’s labor market outcomes; the other concerns how various factors influence a person’s decision to become self-employed or engage in entrepreneurship.
There is a vast and growing literature on networks and labor market outcomes, focusing mainly on how social and family connections increase one’s employment opportunities and earnings.2 The bulk of this literature is motivated by the idea that a large social-family network facilitates the job search because social contacts, relatives, and family members can provide information about job openings and referrals. This line of research does not distinguish between wage workers and the self-employed. We want to emphasize here that a well-developed network is more important for the self-employed than for wage workers. Without a supportive network, it is still possible to find a job, but it is extremely difficult to survive in self-employment. Moreover, while “weak ties” and “informal networks” are generally good enough to be helpful when one looks for a wage-earning job (Granovetter, 1973; Bayer et al., 2008), strong ties are often necessary for the kind of assistance needed during self-employment. For example, an acquaintance in your neighborhood may provide you with some useful information about job openings at his company, but it is unlikely that he will lend you money when you are in need of capital as a small business owner. The latter type of help almost always comes from family members, relatives, or close friends. For these reasons, we expect that a large social-family network is crucial for self-employment, and having access to such a network increases the likelihood of being self-employed.
The existing literature on the choice of self-employment or entrepreneurship has mostly focused on factors such as liquidity constraints, human capital, and family background. There has been considerable evidence that higher household wealth increases the probability of entrepreneurship, perhaps by relaxing capital market constraints.3 A person’s human capital matters too. For example, Lazear (2004, 2005) shows that individuals with more balanced skills, acquired through formal education or work experience, are more likely to become entrepreneurs. Others find that family and social backgrounds, such as having a self-employed parent or residing in highly entrepreneurial neighborhoods, also have a positive effect on the choice of self-employment.4 However, the size of social-family network, as a potential determining factor in self-employment decisions, is very much under-researched.5
In this paper, we empirically examine whether individuals with a larger social-family network are more likely to choose self-employment, paying close attention to the issues of endogeneity and measurement error. We use a survey database that was recently constructed in China. This database contains detailed information on rural–urban migrants in China, including many variables on their personal characteristics as well as their social and family networks. Our main reason to focus on a sample of migrants is that they have all changed residential locations substantially and thus experienced major disruptions to their social-family networks. As a result, we expect to see extensive variations in network size among these individuals, which should help us identify the effect of network size on the choice of self-employment.
We take the standard instrumental variable (IV) approach to overcome the endogeneity and measurement error problems. In particular, we use the distance from home province when a migrant first moved to an urban area as an instrument for social-family network size today. As will be shown, the individuals who originally migrated far away from home tend to have smaller social-family networks today, because people who grew up in rural areas have highly local networks and long-distance migration disrupts such previously established networks. Using this distance as an instrument, we are assuming that it does not directly affect a migrant’s self-employment decision through any other uncontrolled channels.
We believe this assumption is plausible for several reasons. First, as we will emphasize below, the unique institutional context of rural–urban migration in China has determined that the first-time migrants face a great deal of uncertainty and almost always consider the move temporary. The decision where to migrate for the first time is particularly uninformed and largely random, depending on where some early movers they happened to know had gone. So the distance of the first-time migration is arguably exogenous. Second, our analysis focuses on a sample of migrants who started as wage workers in urban sectors and changed jobs over time. Since none of them were self-employed originally and none of them are in their first job any longer, it is plausible that whether they are self-employed today is not directly affected by the distance of their first migration. And third, we control for home province fixed effects. By comparing migrants from the same province, we think it is more reasonable to consider the distance of first migration as exogenous to today’s employment status.
We find that migrants with a smaller social-family network, as a result of a longer-distance migration in the past, are less likely to be self-employed today. This finding holds true for various network size measures and it is robust to different model specifications and sample restrictions. We consider these results convincing evidence that the size of social-family network has a positive effect on the choice of self-employment.
2 Institutional setting
2.1 Temporary rural–urban migration in China
Along with its fast economic growth, China has experienced a rapid urbanization during the past three decades. The share of urban population in China has risen from 18 percent in 1978 to 50 percent in 2011. This fast urban growth is achieved primarily through a massive migration from rural to urban areas (Zhang and Song, 2003). According to the National Bureau of Statistics, by the end of 2008, there was a total of 225 million rural–urban migrants in China.
This wave of rural–urban migration in China occurred in a unique institutional context. On the one hand, there is a long-standing residence registration (hukou) system in China designed to control the movement of people within the country (Chan and Zhang, 1999). Each individual is issued a residence permit, a so-called hukou, which gives the person the right to live in a jurisdiction and access local public goods such as public education and health care. If a person with a rural hukou wants to move to a city and work in urban sectors, he or she has to apply through the relevant bureaucracies. Since the mid-1980s, this system has been gradually relaxed and the controls have been weakened, primarily in response to the rapid expansion of the urban economy and the increased demand for cheap labor in urban sectors. However, although people with a rural hukou are now generally allowed to find work in urban areas, jobs in certain urban sectors are still reserved for residents with the local urban hukou, and the migrants from rural areas have very limited access to urban public goods.
On the other hand, a household responsibility system implemented in the countryside in the late 1970s was a key component of economic reform in China (Lin, 1992). In rural areas, land ownership belongs to local economic collectives. Under the household responsibility system, land use rights are contracted to households, with the size of the land for each household determined by the number of household members who have a hukou in the village. As long as farmers fulfill grain procurement obligations, they can retain the surplus for their own use or sell it on the market. Over the years, the central government removed most of the procurement obligations; in 2006, China also repealed all agricultural taxes to lift the burden on farmers. Thus a farmer who does not want to seek urban employment can make a basic living by farming on his family’s land. Similarly, a migrant who has difficulty in finding a job in an urban area, due to a slowdown of the urban economy or any other reason, can always return to his village and resume farm work on his family’s land.
Because of this institutional arrangement, rural–urban migration in China appears to be “temporary.” The migrants, even having lived and worked in a city for many years, tend to consider themselves as outsiders and are reluctant to make an effort to assimilate into the city. They also tend to be footloose and move from one city to another to chase jobs. Partly because they feel unwelcome in the city and partly because they have access to a piece of land in their villages, rural–urban migrants tend to consider their villages as home, and many of them leave their children in their villages together with grandparents. These migrants regularly send money back to pay for their children’s education, build houses, or make other investments (Wei, 2008).
2.2 China as a “guanxi society”
In Chinese, guanxi means connections. China is a “guanxi society” where connections really matter and personal relationships are central in every aspect of the society. Despite a comprehensive economic reform aimed at establishing institutions compatible with a modern market economy, doing business in China is still, to a great extent, about managing interpersonal relationships rather than faceless transactions (Xin and Pearce, 1996; Luo, 2007).
Consider an aspiring entrepreneur who needs to borrow some money from a bank in China. His most important task is not to craft a sound business plan or put up enough collateral. Rather, he will have to find out whether he can get to know one of the loan officers in person through a friend or a relative. Such a personal connection is often more helpful than a good business plan.
The same is true for the self-employed; their business opportunities often come through personal connections. In Xu and Qian (2009), there is a revealing story about a rural–urban migrant who makes a living by sharpening scissors for others. He is very good at his job, but he earns far less money than a local competitor. The local person, not necessarily a better scissor-sharpener, knows the owner of an apparel factory that has thousands of scissors and uses his service every other week. Similarly, while an ordinary scrap metal collector has to dig around at junk yards, a person whose relative is managing a state-owned steel factory can regularly pick up some waste metal at several plants.
The importance of personal connections in China has two implications. First, if the size of the social-family network indeed affects one’s choice of self-employment, we should expect to see this effect in China more than in most other societies. Second, and perhaps more importantly, the usefulness of personal connections in China implies that the self-employed will intentionally build and maintain a large network. Thus, it is necessary to develop an identification strategy to solve the reverse-causation problem.
3 Data and empirical strategies
3.1 Data source and key variables
This study uses a unique survey on Rural–urban Migration in China (RUMiC). The RUMiC database was constructed by a team of researchers from Australia and China, with the goal of studying issues such as the effect of rural–urban migration on income mobility and poverty alleviation, the state of education and health of children in migrant families, and the assimilation of migrant workers into the city (Akgüç et al., 2014).
The first wave of the survey was conducted in 2008, and the data became available in 2009. Three representative samples of households were surveyed, including a sample of 8,000 rural households, a sample of 5,000 rural–urban migrant households, and a sample of 5,000 urban households. In this paper, our empirical analyses use information mainly from the migrant sample. Since the migrants all came from rural areas, 99.4 percent of them have a rural hukou, although they currently live in cities.
The migrants surveyed were randomly chosen from the fifteen cities that are the top rural–urban migration destinations in China (see Figure 1). Eight of these cities are in coastal regions (Shanghai, Nanjing, Wuxi, Hangzhou, Ningbo, Guangzhou, Shenzhen, and Dongguan); five of them are in central inland regions (Zhengzhou, Luoyang, Hefei, Bengbu, and Wuhan); and two of them are in the west (Chengdu and Chongqing). A sampling procedure was very carefully designed to ensure that migrants in the database constituted a representative sample of all the migrants in the fifteen cities.6
The migrant survey was designed to collect information about every household member. It asked detailed questions that generate more than 700 variables. In terms of basic information of a household member, we know the person’s age, gender, education level, current address, home address before migration, etc. For information regarding employment experience, we know whether the person is self-employed or a wage worker, occupation, monthly income, how he/she found the current job, what was his/her first job, how he/she found the first job, etc. For the self-employed, we know why they chose self-employment, the amount and sources of money they borrowed for initial investment, the number of workers they currently hire, etc. Particularly useful for our study, the survey also asked about the migrant’s social and family network. We know who the migrant’s most important social contacts are and whether they live in the same city, whether the migrant’s parents and siblings also live in the same city, how many people the migrant greeted during the past Spring Festival, etc.
In our regression analysis, the dependent variable is whether an individual is currently self-employed or not. Among all of the migrant household heads in the database, 19.6 percent are self-employed.7 These individuals can be restaurant owners, convenience store owners, scrap metal collectors, street vendors, etc., or provide services such as shining shoes and repairing bicycles or electronics. A large proportion of these self-employed migrants simply work alone; only a quarter of them (25.4 percent) also hire other people. Among those who hire other people, the average number of employees is 3.5.
The type of work self-employed migrants do seems to be rather informal. It makes one wonder whether only the truly unemployable people fall into this status. It turns out this is not the case. In response to a survey question asking about the migrant’s reason to choose self-employment, the top three answers are: (1) it brings a higher income (answered by 38 percent of the self-employed migrants); (2) it gives more flexibility and freedom (29 percent); and (3) it allows one to be one’s own boss (19 percent). Only a small fraction (12 percent) report being self-employed because they cannot find wage work.
Consistent with their stated top reason, we indeed find that the self-employed migrants earn more income. The average monthly income is 1,447.7 Chinese yuan for wage workers, 2,331.1 yuan for the self-employed who work alone, and 3,534.7 yuan for the self-employed who hire other people. We regress monthly income on employment status, controlling for gender, age, marital status, years of schooling, number of children, years since the person first migrated out of a rural area, city fixed effects, and home province fixed effects. The results show that the self-employed with no employees earn 964.7 yuan more than wage workers, and those with employees earn an additional 973.5 yuan a month. Thus, for most migrants in our sample, self-employment status seems to be desirable.8
In our regression analysis of whether a migrant is self-employed, the key independent variable is the size of a person’s social-family network. To measure this size, we use the number of friends one greeted during the past Spring Festival, the number of relatives one greeted during the past Spring Festival, or the sum of these two numbers.
Spring Festival, the most important traditional holiday in China, starts on the first day and ends on the fifteenth day of the first month of the Chinese lunar calendar. There are many traditional activities during the festival, which vary widely across different regions of the country. But one tradition is followed throughout the country: during the festival, people greet family members, relatives, and friends, wishing them a happy, healthy, and wealthy new year. We therefore use the self-reported number of friends and relatives an individual greeted during the festival to measure the size of this migrant’s social-family network. Traditionally, greetings during the Spring Festival are mostly delivered through personal visits. In recent years, greetings by phone, post, or even email have also become common, especially among the younger generations. Therefore, the persons greeted (i.e., the social-family network measured this way) are not necessarily local. Indeed, about half of the people greeted are currently not living in urban areas, most of whom are perhaps friends and relatives in their home villages.
This network size measure is behaviorally revealed and is more relevant for our purpose in this study. For example, a person may have a first cousin who is by definition one of his relatives. However, if they have a soured relationship and are not on speaking terms, or if they live far away from each other and have lost contact, then the cousin is in effect out of this person’s network. It is important to discount the cousin for our purpose because it is unlikely the cousin will provide any help when this person needs assistance during self-employment. Our measure will achieve this because if a relative was effectively outside a person’s network, this person would not have greeted him during the Spring Festival. Similarly, we believe that only a friend greeted is truly a friend, and our network size measure only includes such real friends.
3.2 Identification strategy and econometric specification
Despite the good features of this network size measure, it also has its drawbacks. For example, if a person has already chosen self-employment, he may have incentives to greet more friends and relatives simply because he has used or will likely seek their assistance during self-employment. For this reason, a simple correlation between self-employment status and network size cannot be interpreted as a causal effect of network size on the choice of self-employment. It may be a result of reverse causation, which is also interesting in itself but not exactly what we intend to study here.
Another issue with the network size measure is the concern of measurement error. During the survey, a respondent has to recall how many friends and relatives he greeted. Due to imperfect memory or lack of effort to do an accurate count, a respondent tends to report a number that appears to be a best guess. As we can see in Figure 2, most surveyed individuals reported round numbers, numbers that are multiples of five or ten. There is no reason to believe, for example, that a person is so much more likely to have actually greeted twenty than nineteen friends or relatives. Thus the spiky distributions in Figure 2 are almost surely a result of rounding or misreporting. As is well known, classical measurement errors in the independent variable will bias the OLS coefficient toward zero. Therefore, even if a larger social-family network indeed increases the probability of self-employment, a simple OLS regression may fail to identify a statistically significant effect because of errors in the measurement of network size.
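The attenuation argument can be illustrated with a small simulation. This is only a sketch under assumptions made up for the illustration: the network sizes, the true slope of 0.01, and the noise variances are hypothetical values, not estimates from the RUMiC data, and the reporting error is modeled as classical (independent, mean-zero) noise rather than the rounding seen in Figure 2. With the error variance set equal to the variance of the true network size, the OLS slope on the reported measure should shrink to roughly half of the true slope.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_attenuation(n=20000, true_slope=0.01, seed=0):
    """Return (slope using true size, slope using noisily reported size)."""
    rng = random.Random(seed)
    # Hypothetical true network sizes: mean 30, standard deviation 10.
    true_size = [rng.gauss(30.0, 10.0) for _ in range(n)]
    # Linear outcome driven by the true network size.
    outcome = [true_slope * s + rng.gauss(0.0, 0.1) for s in true_size]
    # Classical measurement error with the same variance as the true size.
    reported = [s + rng.gauss(0.0, 10.0) for s in true_size]
    return ols_slope(true_size, outcome), ols_slope(reported, outcome)


clean_slope, noisy_slope = simulate_attenuation()
print(f"slope with true size:     {clean_slope:.4f}")
print(f"slope with reported size: {noisy_slope:.4f}")
```

The shrinkage factor in this setup is the ratio of the signal variance to the total variance of the reported measure, which is why equal variances give a factor of about one half.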
The standard technique to overcome these reverse-causation and measurement-error problems is to instrument for the independent variable, which is the approach we take here. That is, we will use an instrumental variable that is correlated with the network size but does not affect the choice of self-employment through any other unaccounted channels. The particular IV we will use is the distance from home province when a migrant first left his village to work in the urban sector.
More specifically, we construct a distance variable using information about a migrant’s home address and the province he migrated to when he first left his village.9 Since this first migration typically occurred a few years ago (with a median of six years ago) and the RUMiC project focuses on the migrant’s current situation, the survey did not ask about the exact destination of the first migration at the sub-provincial level. So we can only construct a distance variable at the province level. For each migrant, we calculate the log railway distance between the capital of the home province and the capital of the first destination province.10 If the home province is the same as the first destination province, we set the log distance equal to zero.
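The construction of the distance variable described above can be sketched as follows. The function name, the lookup-table layout, and the province labels `P1` and `P2` with a 1,500 km distance are hypothetical placeholders introduced for illustration; the actual railway distances are not reproduced here.

```python
import math


def log_first_migration_distance(home_province, first_province, rail_km):
    """Log railway distance between the two provincial capitals.

    Returns 0.0 when the first destination is in the home province,
    following the convention described in the text.
    """
    if home_province == first_province:
        return 0.0
    # Distances are symmetric, so the pair is stored as an unordered key.
    pair = frozenset((home_province, first_province))
    return math.log(rail_km[pair])


# Hypothetical lookup table (placeholder labels and distance, in km).
rail_km = {frozenset(("P1", "P2")): 1500.0}

print(log_first_migration_distance("P1", "P1", rail_km))
print(log_first_migration_distance("P1", "P2", rail_km))
```

Note that a log distance of zero corresponds to a nominal distance of one kilometer, so within-province moves are treated as essentially zero-distance moves at this level of aggregation.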
We expect, and the data have confirmed, that the distance of the first migration is correlated with the number of friends and relatives greeted during the past Spring Festival. The reason is simple. For people who grew up in rural China, their social and family networks are highly local because they usually interact with and marry other people in the same or nearby villages. A person who migrated far away would have been disconnected from many individuals in his original network for a considerable period of time. This is true even if the migrant later moved to a city closer to his home village. Because of this disruption, he tended to lose contact with some friends and relatives in his network. In the meantime, because he moved far away from home, he tended to know few locals and thus had difficulty in developing a new network.
Our key identifying assumption is that the distance of the first migration does not affect today’s choice of self-employment through any other channels that are not controlled for in our regressions. We cannot test this assumption but believe it is plausible given the specific context of rural–urban migration in China and the particular samples of migrants used for estimation.
In recent years, as rural–urban migration has become an increasingly prominent social phenomenon in China, many field studies have been conducted to document the life experiences of these migrants.11 We have therefore learned a great deal about the process of their migration decisions from both anecdotal and statistical evidence. The key fact to keep in mind is that a typical villager in China has no chance to travel to many places and has very limited information about how the urban economy is organized in different cities. It is clear that the migration is usually triggered by a need or an urge to improve one’s individual or family economic conditions. But the initial migration location is mostly an accidental choice not based on an informed calculation of feasibility and potential returns of different locations.
A migrant almost always chose the first city because he happened to know someone who was already there. It could be a relative, a neighbor, a friend, or simply an acquaintance who already migrated to that city and demonstrated that it might be feasible for this person to do the same thing (Zhao, 1999, 2003).12 Also, because the migration was not meant to be permanent, the first-timers tend to have a trial-and-error attitude: “Let me give it a shot and see what happens.” For this reason, when looking at a random sample of migrants, it seems reasonable to think of their first migration distance as random, especially after controlling for home province fixed effects. That is, given two first-time migrants from the same province, whether one went farther away than the other is likely to be exogenous, driven mostly by whether one happened to know someone who had migrated far away. Note that we do not need this distance to be completely random; we only need it to be exogenous to the choice of self-employment today.
The most serious threat to the credibility of our identification strategy is that the first migration destination and the type of the first job in urban sectors (whether self-employed or not) may be jointly determined. If this is true, it is problematic to think of the distance of first migration as exogenous to a migrant’s self-employment decision, especially for those who are still in their first jobs in cities today. To overcome this problem, in our empirical analysis below, we will focus on the sample of migrants who did not start as self-employed and who are not in their first jobs today. In other words, we will examine the sample of migrants who all moved to urban areas to work for some employers and all changed their jobs over time. Some of them would change from wage workers to self-employment and others would remain as wage workers but have moved to different employers. We then ask the following empirical question: Among the rural–urban migrants who started as wage workers and later changed their jobs, who are more likely to have chosen self-employment today? Because all the migrants in this sample started as wage workers in urban sectors, it is much more plausible to assume that their first migration destinations were not chosen for the purpose of self-employment. It is thus reasonable to exclude the distance of the first migration from the main equation that explains a migrant’s self-employment status today.
Another threat to the credibility of our identification strategy is the possibility that the distance of first migration is correlated with some unobserved characteristics of the migrant that in turn are correlated with the migrant’s choice of self-employment. In that case, the distance is not a valid instrumental variable. The most plausible scenario is perhaps that the more adventurous individuals are more likely to migrate far away from home, and that those people are also more willing to take risks and therefore more likely to choose self-employment. As it turns out, we find that individuals who migrated far away the first time tend to have a smaller social-family network today and are less likely to be self-employed today. Therefore, this concern about unobserved attitudes toward risk actually works against our findings. In particular, if it is indeed true that the less risk-averse individuals tend to migrate a longer distance and are more likely to choose self-employment, then the true effect of network size is even higher than what we find. That is, our IV estimate can be thought of as a lower bound of the true effect.
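The direction of this bias can be made explicit with the textbook formula for a just-identified IV estimator. The following is only a sketch in notation introduced here for illustration: $y$ is the self-employment indicator, $s$ is network size, $d$ is the log distance of the first migration, $\beta$ is the true effect of $s$ on $y$, and $u$ collects the unobserved determinants of $y$, such as risk tolerance. Controls and fixed effects are assumed to have been partialled out.

```latex
\begin{align*}
y &= \beta s + u, \\
\operatorname{plim}\,\hat{\beta}_{IV}
  &= \frac{\operatorname{Cov}(d, y)}{\operatorname{Cov}(d, s)}
   = \beta + \frac{\operatorname{Cov}(d, u)}{\operatorname{Cov}(d, s)}.
\end{align*}
```

If more risk-tolerant migrants travel farther and are also more inclined toward self-employment, then $\operatorname{Cov}(d, u) > 0$. The negative first-stage relationship means $\operatorname{Cov}(d, s) < 0$. The second term is therefore negative, so the IV estimate falls below $\beta$ in the limit, which is the lower-bound interpretation.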
Our main estimating equation is

$$ y_{ji} = \alpha + \beta s_{ji} + X_{ji}\gamma + HP_j + \varepsilon_{ji}, \tag{1} $$

where the outcome variable $y_{ji}$ is a dummy variable taking value 1 if migrant $i$ from province $j$ is self-employed; $s_{ji}$ is the key independent variable that measures the size of the social-family network for this individual; $X_{ji}$ is a vector of control variables including the migrant’s age, gender, years of schooling, marital status, number of children, and years since the person first migrated out of a rural area; $HP_j$ is a home-province fixed effect that captures the effect of all unobserved factors common to migrants from province $j$; and $\varepsilon_{ji}$ is the error term.
When using the IV strategy, we estimate two-stage least squares (2SLS) regressions with the following first-stage equation:

$$ s_{ji} = \pi_0 + \pi_1 d_{ji} + X_{ji}\theta + HP_j + u_{ji}, \tag{2} $$

where $d_{ji}$ is the log distance between the home and destination provinces when individual $i$ from province $j$ first migrated to a city, and $u_{ji}$ is the first-stage error term. The predicted values of $s_{ji}$ from this first-stage regression are then used to estimate equation (1) in the second stage.
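The two-stage procedure can be illustrated on synthetic data. This is a minimal sketch, not the estimation code behind the reported results: it uses a single instrument and no controls or fixed effects, a continuous outcome instead of a self-employment dummy, and made-up parameter values. The unobserved `trait` and the reporting noise are hypothetical devices that make plain OLS inconsistent, while distance affects the outcome only through network size.

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_2sls(n=20000, beta=0.01, seed=1):
    """Return (OLS slope, 2SLS slope) for the effect of network size."""
    rng = random.Random(seed)
    distance, reported_size, outcome = [], [], []
    for _ in range(n):
        d = rng.uniform(0.0, 8.0)        # instrument: log distance
        trait = rng.gauss(0.0, 1.0)      # unobserved confounder
        true_size = 40.0 - 3.0 * d + 5.0 * trait + rng.gauss(0.0, 5.0)
        distance.append(d)
        reported_size.append(true_size + rng.gauss(0.0, 8.0))  # noisy report
        outcome.append(beta * true_size + 0.3 * trait + rng.gauss(0.0, 0.2))

    # Naive OLS of the outcome on reported network size.
    _, ols = fit_line(reported_size, outcome)

    # First stage: project reported network size on distance.
    a0, a1 = fit_line(distance, reported_size)
    predicted_size = [a0 + a1 * d for d in distance]

    # Second stage: regress the outcome on the first-stage prediction.
    _, tsls = fit_line(predicted_size, outcome)
    return ols, tsls


ols_estimate, tsls_estimate = simulate_2sls()
print(f"OLS:  {ols_estimate:.4f}")
print(f"2SLS: {tsls_estimate:.4f}")
```

In this just-identified case the second-stage slope equals the ratio of the covariance between distance and the outcome to the covariance between distance and reported network size. Standard errors from a manually run second stage are not valid, so applied work would rely on a dedicated 2SLS routine.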
4 Empirical results
4.1 Descriptive statistics
The survey of rural–urban migrants was conducted at the household level. Some migrants are married; their spouses, and sometimes their grown-up children, may stay in the same household and also work in the city. In our empirical analysis, we focus on the household heads only and exclude the household heads aged below 16 or above 70. We also drop any observations with a missing dependent, independent, control, or instrumental variable. And finally, we exclude 32 outliers who have greeted more than 200 people and are most likely a result of reporting or recording errors. This procedure leaves us with 4,442 observations, for which the descriptive statistics are shown in Table 1.
One fifth of the household heads are self-employed; 68.9 percent are male; and 53.4 percent are married. Their average age is 30.4, and average years of schooling is 9.3. The average number of children is 0.76, which is so small because close to half (47.9 percent) of the migrant households do not have any children. Among the households that do have children, this average is 1.48. The average household head first migrated out to work in an urban area 7.6 years ago. When they first migrated out of rural areas, 48.2 percent went to a city in the same province and the rest migrated to a different province. The average log distance between the home and destination provinces during the first migration is 3.146, which translates to 23.2 kilometers.13 The distribution of this log distance is highly skewed, with a maximum of 8.313 (equal to 4,077 kilometers).
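The conversions from log distance to kilometers can be checked directly. The sketch below also backs out the average log distance among migrants who left their home province, under the assumption (ours, not stated in the text) that the 48.2 percent same-province share and the 3.146 mean refer to exactly the same sample and that all same-province moves are coded as a log distance of zero.

```python
import math

mean_log_distance = 3.146
max_log_distance = 8.313
same_province_share = 0.482

# Exponentiating a mean of logs gives a geometric mean, not an
# arithmetic mean, of the underlying distances.
mean_km = math.exp(mean_log_distance)
max_km = math.exp(max_log_distance)

# Assumption: same-province migrants contribute zeros to the mean, so the
# mean among cross-province migrants is the overall mean divided by their share.
cross_province_mean_log = mean_log_distance / (1.0 - same_province_share)
cross_province_km = math.exp(cross_province_mean_log)

print(f"{mean_km:.1f} km, {max_km:.0f} km")
print(f"{cross_province_mean_log:.3f} log km, about {cross_province_km:.0f} km")
```

Under that assumption, the low overall figure mostly reflects the large share of zero-distance observations rather than short cross-province moves.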
On average, a household head greeted 28.7 people during the past Spring Festival, about 13 of whom are currently living in urban areas. Of the people greeted, about 16 are identified as friends and 13 as relatives.
4.2 Regression results
We now present the regression results. We use the number of friends, the number of relatives, and the number of friends and relatives as alternative measures of the size of an individual’s social-family network. We run 2SLS regressions to estimate equation (1), which will be the focus of our discussion. For each set of regressions, we construct four different samples as follows:
Sample A — all household heads aged between 16 and 70 years, excluding outliers whose total number of contacts is above 200.
Sample B — all household heads in sample A who are not in their first jobs in urban sectors.
Sample C — all household heads in sample A who initially were not self-employed and who are not in their first jobs in urban sectors.
Sample D — all household heads in sample A who initially were not self-employed and who are not in their first jobs in urban sectors, excluding those who chose to become self-employed at some point only because they could not find any wage-earning jobs.
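The four nested samples can be expressed as successive filters. The following is a minimal sketch in plain Python, not the code used for the paper; the field names (`age`, `contacts`, `in_first_job`, `first_job_self_employed`, `self_employed_no_wage_job`) and the toy records are hypothetical stand-ins for the corresponding survey variables.

```python
from typing import Any, Dict, List

Head = Dict[str, Any]  # one household head; field names are hypothetical


def build_samples(heads: List[Head]) -> Dict[str, List[Head]]:
    """Apply the nested restrictions that define samples A, B, C, and D."""
    # Sample A: aged 16 to 70, with no more than 200 contacts in total.
    a = [h for h in heads if 16 <= h["age"] <= 70 and h["contacts"] <= 200]
    # Sample B: no longer in the first urban job.
    b = [h for h in a if not h["in_first_job"]]
    # Sample C: additionally, the first urban job was not self-employment.
    c = [h for h in b if not h["first_job_self_employed"]]
    # Sample D: additionally, drop those who became self-employed only
    # because they could not find a wage-earning job.
    d = [h for h in c if not h["self_employed_no_wage_job"]]
    return {"A": a, "B": b, "C": c, "D": d}


def make_head(age, contacts, in_first_job, first_se, forced_se) -> Head:
    return {
        "age": age,
        "contacts": contacts,
        "in_first_job": in_first_job,
        "first_job_self_employed": first_se,
        "self_employed_no_wage_job": forced_se,
    }


# Toy records, invented purely to exercise each restriction once.
toy_heads = [
    make_head(30, 25, True, False, False),    # stays in A only
    make_head(41, 40, False, True, False),    # reaches B
    make_head(35, 18, False, False, True),    # reaches C
    make_head(28, 60, False, False, False),   # reaches D
    make_head(45, 350, False, False, False),  # outlier, excluded from A
]

sizes = {name: len(rows) for name, rows in build_samples(toy_heads).items()}
print(sizes)
```

Writing the restrictions as a chain makes it explicit that each sample is a subset of the previous one, so differences in results across samples can only come from the observations dropped at each step.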
Going from sample A to B imposes the most stringent restriction, reducing the number of observations by 44 percent, from 4,442 to 2,474. This implies that many of the migrants are still in their first jobs after they migrated out of rural areas. Among those who have left their first jobs, some have stayed in the same city, yet others moved to different cities or even different provinces. For example, there are 431 household heads who originally moved out of their home provinces but currently work in cities within the home provinces. There are also 224 household heads who originally migrated to cities within their home provinces but currently work in places outside the home provinces. Presumably these individuals moved to different provinces because over time they found better job opportunities in other provinces.
Among the 2,474 household heads who have changed jobs, there are also moves into and out of self-employment. There are 409 individuals who started as wage workers in urban areas but are now self-employed. There are 40 household heads who were initially self-employed and are now wage workers. There are also 55 household heads who started and remain self-employed.
In sample C, we further exclude all of the 95 household heads who started as self-employed when they first moved to cities. As discussed above, this sample restriction makes it more reasonable to assume that the distance of the first migration years ago does not directly affect the choice of self-employment today. To be precise, regressions using Sample C answer the following question: Among all of the individuals who originally migrated to cities only to take wage-earning jobs, which are more likely to be self-employed today?
For descriptive purposes, we divide sample C into two groups. One group initially migrated to cities within the home province and the other initially to cities outside the home province. The first group, migrants who originally stayed within the home province, greeted an average of 33.95 friends and relatives during the past Spring Festival; the other group, those who originally moved out of the home province, greeted an average of 28.58. We show in Figure 3 that in the first group, 823 migrants changed jobs over time but remained wage workers, and 212 migrants (or 20.48 percent of this group) moved from wage work to self-employment. In contrast, in the second group, which initially moved outside of the home province, 1,147 migrants changed jobs but remained wage workers and only 197 (or 14.66 percent of this group) switched from wage work to self-employment. That is, those who originally moved far away from home, and who therefore have smaller social-family networks today, are less likely to become self-employed. These differences between the two groups are the key variations in the data that help us identify the effect of network size on the choice of self-employment.
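The group shares quoted above follow directly from the reported counts. The short check below recomputes them and confirms that the two groups add up to the size of sample C implied earlier (2,474 minus the 95 household heads who started out self-employed); the dictionary keys are labels chosen here for illustration.

```python
# Counts reported in the text for sample C, by destination of the first migration.
groups = {
    "within home province": {"stayed_wage": 823, "became_self_employed": 212},
    "outside home province": {"stayed_wage": 1147, "became_self_employed": 197},
}

rates = {}
for name, g in groups.items():
    total = g["stayed_wage"] + g["became_self_employed"]
    rates[name] = 100 * g["became_self_employed"] / total
    print(f"{name}: {g['became_self_employed']}/{total} = {rates[name]:.2f}%")

# Size of sample C implied by the two groups, and the gap in switching rates.
sample_c_size = sum(
    g["stayed_wage"] + g["became_self_employed"] for g in groups.values()
)
gap = rates["within home province"] - rates["outside home province"]
print(f"sample C size: {sample_c_size}; gap: {gap:.2f} percentage points")
```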
Sample D adds one more restriction to sample C by dropping the household heads who moved from wage work to self-employment at some point because they could not find wage-earning jobs. The idea is that some migrants may be self-employed primarily to avoid unemployment. Indeed, 12 percent of the self-employed migrants indicated that they chose self-employment because they could not find wage work. Since these people did not intend to be entrepreneurs, their choices might not be determined by the same factors as those of other self-employed individuals. We consider results from samples C and D more convincing, and thus our discussion below focuses on the results from these two samples. Results from samples A and B, which are useful for comparison purposes, are available in an earlier working paper (Zhang and Zhao, 2012) but are not presented here.
We first examine how the number of friends affects the choice of self-employment, and the results are in Table 2. Columns (1) and (2) show the results from 2SLS regressions, using samples C and D. In each regression, we include the same set of control variables, a constant, and home province fixed effects. The upper panel shows the results from the first stage and the lower panel the results from the second stage. For comparison purposes, in the very last row at the bottom, we also show the OLS coefficient estimated using each sample.
The results are similar between the two samples. In the first stage, log distance in the first migration has a negative and statistically significant coefficient, confirming our expectation that individuals who initially migrated farther away greet fewer friends today. Male, younger, and more educated migrants tend to have more friends, yet marital status, number of children, and years since first migration are not significantly correlated with the number of friends greeted.
In the second stage, the number of friends has a positive and statistically significant coefficient in both regressions. The IV coefficients from the two samples are of the same order of magnitude. Consider the results in column (2), estimated using sample D. The coefficient suggests that one more friend leads to a 1.17-percentage-point increase in the probability of self-employment. That is, if one’s number of friends increases by one standard deviation, which is 26.8 for sample D, this person’s likelihood of becoming self-employed increases by 31 percentage points. This is clearly a substantial effect.
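As a quick check of the magnitude quoted above, the one-standard-deviation effect is the IV coefficient from column (2) multiplied by the sample D standard deviation of the number of friends. The symbols below are shorthand introduced here for illustration and are not notation from equation (1).

```latex
\Delta \Pr(\text{self-employed})
  \approx \hat{\beta}_{\mathrm{IV}} \times \sigma_{\text{friends}}
  = 0.0117 \times 26.8
  \approx 0.314
```

Under the linear probability model this corresponds to roughly 31 percentage points.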
Comparing these IV coefficients with the corresponding OLS coefficients at the bottom, we see that the IV coefficients are more precisely estimated and are much larger than the OLS coefficients. Again consider column (2), which uses sample D and gives the larger coefficient between the two OLS regressions. The OLS coefficient suggests that one more friend is associated with a 0.041-percentage-point increase in the probability of being self-employed, which is 27 times smaller than the IV coefficient.
Our discussion above suggests that the OLS coefficient of network size may be biased for two reasons. One is reverse causation, which biases the coefficient upward; the other is measurement errors in the key independent variable, which biases the coefficient toward zero. Given that our IV estimates are so much larger, it seems that biases from measurement errors are dominant in the OLS regressions.
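The attenuation argument can be illustrated with a small simulation. This is a stylized sketch under assumptions of our own choosing, not a replication of the paper's estimates: the outcome is continuous rather than binary, there are no control variables or fixed effects, the only source of bias is classical measurement error in network size, and all parameter values are hypothetical. With a single instrument and no controls, the IV estimator reduces to the ratio of two covariances.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


random.seed(0)
n = 20_000
true_effect = 1.0  # hypothetical effect of true network size on the outcome

# Instrument (stands in for log distance); it lowers true network size.
z = [random.gauss(0, 1) for _ in range(n)]
x_true = [-0.5 * zi + random.gauss(0, 1) for zi in z]
# Outcome depends on true network size only.
y = [true_effect * xi + random.gauss(0, 1) for xi in x_true]
# Observed network size is the true size plus classical measurement error.
x_obs = [xi + random.gauss(0, 2) for xi in x_true]

ols = cov(x_obs, y) / cov(x_obs, x_obs)  # slope of y on noisy x: attenuated
iv = cov(z, y) / cov(z, x_obs)           # just-identified IV (ratio form)

print(f"OLS: {ols:.3f}  IV: {iv:.3f}  true effect: {true_effect}")
```

In this setup the OLS slope should settle near 1.25/5.25, or about 0.24 of the true effect (the share of the observed variance that is signal), while the IV estimate should be close to the true effect because the instrument is uncorrelated with the measurement error.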
The coefficients of the control variables show rather consistent patterns between the two samples. Age and being male have small coefficients and are statistically insignificant in both cases. The coefficients of schooling and marital status, in contrast, are always statistically significant. The IV results suggest that one more year of schooling decreases the probability of self-employment by about three percentage points, which is a sizeable effect. One possible explanation is that employers prefer to hire the more educated, and consequently such individuals have better alternatives on the job market. It is also possible that better educated people have become more risk averse and do not want to face the higher uncertainty associated with self-employment. Yet another possible reason is that self-employment opportunities are highly concentrated in low-status services, and the more educated may want to stay away from those occupations either because they have higher aspirations or due to social pressure.
Married migrants are more likely to be self-employed, perhaps because the married couple have complementary skills that make self-employment more feasible or because a spouse provides an extra source of income and serves as a sort of insurance for the self-employed household head. Having more children is also associated with a higher probability of self-employment, possibly because individuals with more children need more income or a more flexible work schedule, especially when their children are still young. The number of years since first migrating to an urban area is positively correlated with the probability of self-employment, which may reflect the fact that self-employment requires more knowledge of the urban economy than wage work, and migrants who arrived in urban areas long ago tend to have acquired more of such knowledge.
Home province fixed effects are included in all regressions. This is important because heterogeneity across home provinces may affect both the dependent variable and the endogenous independent variable. On the one hand, migrants may have different numbers of friends simply because social customs and population densities differ across home provinces. For this reason, home province fixed effects should be included in the first-stage regression. On the other hand, the self-employment rate can also vary across migrants from different home provinces due to unobserved factors. For example, Sichuan cuisine is very popular in many urban areas in China, and as a result, migrants from Sichuan may be disproportionately concentrated in the food services industry and work as self-employed restaurant owners. Therefore, we should also control for home province fixed effects in the second-stage regression.
Table 3 presents results from a similar set of regressions, only now we use the number of relatives as the independent variable. The results are qualitatively similar. The coefficients in 2SLS regressions are positive and statistically significant. They are much larger than the corresponding OLS coefficients. For example, the IV coefficient from sample D is 55 times as large as the OLS coefficient estimated from the same sample. This IV coefficient suggests that one more relative increases the probability of self-employment by 3.2 percentage points.14
A comparison of the results in Tables 2 and 3 suggests that the effect of an extra relative is larger than that of an extra friend. This makes sense. In China, it is generally believed that “blood is thicker than water,” meaning that kinship is more reliable than friendship when one needs substantial assistance from the social-family network. Therefore, it is hardly surprising to find that an extra relative has more influence than an extra friend.
We should point out that the specifications in Tables 2 and 3, including either the number of friends or the number of relatives as an independent variable but not both, are not ideal. As expected, these two variables are positively correlated, with a correlation coefficient of 0.56. Therefore, if both variables have positive effects on the choice of self-employment, then including only one of them in the regression will overestimate the coefficient because it will capture part of the positive effect of the other variable.
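The direction of this bias can be made explicit in a stylized setting. Suppose, as a simplifying assumption that is not taken from equation (1), that the outcome y is linear in the number of friends F and the number of relatives R with coefficients β_F and β_R, that the instrument Z is uncorrelated with the error term, and that control variables are ignored. Using F alone as the endogenous regressor then gives:

```latex
\operatorname{plim}\, \hat{\beta}_F^{\mathrm{IV}}
  = \frac{\operatorname{Cov}(Z, y)}{\operatorname{Cov}(Z, F)}
  = \beta_F + \beta_R \, \frac{\operatorname{Cov}(Z, R)}{\operatorname{Cov}(Z, F)}
```

If distance reduces both the number of friends and the number of relatives, the two covariances share the same sign, so the second term is positive whenever β_R > 0 and the coefficient on friends absorbs part of the effect of relatives.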
Ideally, we want to include both as independent variables in our regression and separately identify the effect of each variable. However, we have only one plausible IV, which does not allow us to deal with two endogenous independent variables simultaneously. As a compromise, we use the sum of these two variables as an alternative measure of network size. This imposes the assumption that the effect of an extra friend is the same as the effect of an extra relative, which may not be true given our discussion above. Nonetheless, this seems to be the most reasonable way to construct a network size measure that incorporates the effects of both friends and relatives. The regression results using this measure are shown in Table 4.
We see again that the 2SLS coefficients are positive and statistically significant. They are more precisely estimated and substantially larger than the corresponding OLS coefficients. Also, the IV coefficients between the two samples are of the same order. Using sample D, the IV coefficient is 26 times as large as the OLS coefficient, again implying that biases from measurement errors dominate endogeneity biases in the OLS regressions. The IV results suggest that an extra friend or relative increases the probability of self-employment by 0.85 percentage points. This is indeed smaller than the effects found in either Table 2 or 3, confirming the suspicion that using the number of friends or relatives only in the regression will overestimate the effect.
For all of the 2SLS regressions in Tables 2–4, we have presented the first-stage F statistics to show the strength of the correlation between the instrumental variable and the endogenous independent variable. The F statistics give us the highest confidence in Table 4, where in one case the test statistic is higher than 10, and in the other case it is very close to 10, the “rule of thumb” critical value for testing weak instruments. For this reason, and because the independent variable takes into account both friends and relatives, we consider the 2SLS regressions in Table 4 our preferred specification.
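For readers less familiar with the weak-instrument diagnostic, the sketch below computes a first-stage F statistic in the simplest possible case: one instrument, a constant, and no other regressors, where the F statistic equals the squared t statistic on the instrument. The first stage in the paper also includes controls and home province fixed effects, which this illustration omits; the function name and the simulated data are hypothetical.

```python
import random


def first_stage_f(z, x):
    """F statistic for instrument z in a regression of x on z and a constant."""
    n = len(z)
    mean_z, mean_x = sum(z) / n, sum(x) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szx = sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x))
    slope = szx / szz
    intercept = mean_x - slope * mean_z
    # Residual sum of squares and the conventional standard error of the slope.
    ssr = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    se_slope = (ssr / (n - 2) / szz) ** 0.5
    # With a single instrument, F equals the squared t statistic.
    return (slope / se_slope) ** 2


random.seed(1)
z = [random.gauss(0, 1) for _ in range(500)]      # simulated instrument
x = [-0.5 * zi + random.gauss(0, 1) for zi in z]  # simulated endogenous regressor
f_stat = first_stage_f(z, x)
print(f"first-stage F = {f_stat:.1f}; above the rule-of-thumb value of 10: {f_stat > 10}")
```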
We subject our main results in Tables 2–4 to a battery of robustness tests, including (1) running IV-Probit instead of 2SLS regressions; (2) distinguishing the effect on the duration of self-employment from the effect on the choice of self-employment; (3) excluding cross-province migrants from the regression analyses; (4) controlling for city characteristics; (5) controlling for origin–destination differences; and (6) experimenting with alternative instrumental variables. We documented the results from these robustness checks in an earlier working paper (Zhang and Zhao, 2012) and will not discuss them in detail here. It suffices to say that our main results are robust to all of these sensitivity analyses.
In sum, we find consistent evidence that more friends or relatives lead to a higher probability of self-employment. Estimates from our preferred specification (in Table 4) show that an extra friend or relative increases the probability of self-employment by 0.85 to 0.99 percentage points. We also find that naïve OLS regressions greatly underestimate the effect, most likely because of measurement errors in the explanatory variables.
We take a standard IV approach to examine the effect of social-family networks on the choice of self-employment among rural–urban migrants in China. In particular, we use the distance from the home province when a migrant first moved to an urban area to instrument for network size today. We believe the exclusion condition is likely to be satisfied in the particular institutional context of rural–urban migration in China, especially for the sample of migrants who first started with wage-earning jobs in urban sectors and have moved on to different jobs over time. We find that the migrants who initially moved farther away, and who therefore have fewer friends and relatives today, are less likely to shift from wage work to self-employment. We interpret this result as evidence that the size of one’s social-family network affects the self-employment decision.
To implement our empirical strategy, we have focused on some highly selective samples. Self-selection exists at various stages of sample construction. For example, some people chose to migrate to urban areas and others did not; some rural–urban migrants changed jobs and others did not. We have set these issues aside in this study. We hope that future work, with higher-quality data and better-designed empirical analyses, will help overcome these and other limitations.
1. See, for example, Birley (1985), Burt (1997), Brüderl and Preisendörfer (1997), Allen (2000), and Greve and Salaff (2003).
2. See Montgomery (1991), Ioannides and Loury (2004), and Jackson (2008) for excellent reviews of this literature. Some recent work has paid careful attention to the problem of endogenous network formation (see, e.g., Munshi, 2003; Luke and Munshi, 2006; Beaman, 2012; and Laschever, 2009). Bian (1994) and Zhang and Li (2003) study networks and labor market outcomes in the context of China, but neither investigates the relationship between social-family networks and the choice of self-employment.
3. Evans and Jovanovic (1989), Evans and Leighton (1989), Holtz-Eakin, Joulfaian, and Rosen (1994a, 1994b), Blanchflower and Oswald (1998), Fairlie (1999), and Parker (2004) all offer some supportive evidence, although Hurst and Lusardi (2004) cast doubt on some of these findings. In the context of China, Wang (2008) finds that the relaxation of constraints on capital (and job mobility), as a result of a wealth shock created by a housing reform, has increased self-employment.
4. See, for example, Lentz and Laband (1990), Dunn and Holtz-Eakin (2000), Hout and Rosen (2000), Davidsson and Honig (2003), and Giannetti and Simonov (2009). Djankov et al. (2006) report that in China immediate and extended family members of entrepreneurs are nearly three times more likely to be entrepreneurs themselves than family members of non-entrepreneurs.
5. Two existing studies are related to ours. Using survey data on 595 residents in the U.S. state of Wisconsin, Allen (2000) finds that the probability of self-employment is positively correlated with the size of the family network, although not with the number of friends. Using survey data on some 9,000 working-age adults in 13 cities in China, Yueh (2009) finds that an individual is more likely to be self-employed when the size of the social network is larger. Our paper is most closely related and complementary to Yueh’s work.
7. This proportion increases to 22.9 percent if, in addition to household heads, we also consider other working members of the households.
8. Using the same RUMiC data, Giulietti et al. (2012) have confirmed that the earnings differential is an important driver in the choice of self-employment among rural–urban migrants in China.
9. We use the word “province” to refer to all provincial-level jurisdictions in China, including 23 provinces, five autonomous regions, and four direct-control municipalities.
10. Only one province, Hainan (which is on an island), is not connected with other provinces by railway. There are only two migrants from Hainan in the database, so we simply dropped those two observations.
11. See, for example, Lü (2009), Wei (2008), and Xu and Qian (2009).
12. The survey asked the first-time migrants the question, “who provided you the information for job hunting in the urban sector.” Relatives (52.7 percent) and other migrants from the same village (28.4 percent) overwhelmingly top the list of answers.
13. This average is so small partly because we have forced the log distance of all within-province migrations to be zero.
14. We must note here that because we are estimating a linear probability model and because the IV coefficient is better understood as a local average treatment effect, it is inappropriate to extrapolate this estimate too far away from the sample mean.
We thank Randy Akee, Evan Due, Shihe Fu, Delia Furtado, Wayne Gray, Xin Meng, and Jiang Qian for stimulating discussions on this topic. We thank Jackline Wahba (managing editor), Sandra Poncet, and an anonymous referee for their thoughtful comments on earlier versions of this paper. We are grateful for comments and suggestions from participants at the RUMiCI conference in Yogyakarta, Indonesia, the 3rd Migration and Development Conference in Paris, and seminars at Clark University and Renmin University of China. Collection of the Rural Urban Migration in China (RUMiC) data used in this paper is financed by IZA, ARC/AusAid, the Ford Foundation, and the Ministry of Labor and Social Security of China. Zhong Zhao would like to acknowledge financial support from the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (project no. 10XNJ016).
Responsible editor: Jackline Wahba
Authors and Affiliations
Department of Economics, Clark University, 950 Main Street, Worcester, MA, 01610, USA
School of Labor and Human Resources, Renmin University of China, 59 Zhongguancun Street, Beijing, 100872, China
The IZA Journal of Labor and Development is committed to the IZA Guiding Principles of Research Integrity. The authors declare that they have observed these principles.
Zhang is an associate professor of economics at Clark University; Zhao is a professor of economics at Renmin University of China. Both are research fellows at the IZA.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.