The China Living Standards Survey (LSS) consists of one household survey and one community (village) survey, conducted in Hebei and Liaoning provinces (northern and northeastern China) in July 1995 and July 1997 respectively. Five villages were selected from each of the three sample counties in each province (six were selected in Liaoyang County, Liaoning Province, because of an administrative change). About 880 farm households were selected from a total of thirty-one sample villages for the household survey. The same thirty-one villages formed the sample for the community survey. This document provides information on the content of the different questionnaires, the survey design and implementation, data processing activities, and the available data sets.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
The scope of the study included:
(a) HOUSEHOLD QUESTIONNAIRE
Section 1: General Household Information
Section 2: Schooling
Section 3: Employment
Section 4: Housing
Section 5: Farmland
Section 6: Agricultural Management
Section 7: Family-Run Non-Farm
Section 8: Household Consumption
Section 9: Gift-Exchange, Remittances, and Other Income
Section 10: Credit and Savings
(b) COMMUNITY QUESTIONNAIRE
Section 1: Village Basic Information and Labour Migration
Section 2: Cultivated Land System
Section 3: Village Stores, Periodic Markets, and Farmers Commercial Activities
Section 4: Village-Run Enterprise
Section 5: Rural Credit Market
Section 6: Chemical Fertilizer Market
Section 7: Village Leadership
Agriculture & Rural Development
Food (production, crisis)
Migration & Remittances
Producers and sponsors
Research Centre for Rural Economy
The World Bank
The World Bank
The China LSS sample is not a rigorous random sample drawn from a well-defined population; it is only a rough approximation of the rural population of Hebei and Liaoning provinces in northern and northeastern China. The reason is that part of the motivation for the survey was to compare current conditions with those that existed in Hebei and Liaoning in the 1930s. For this reason, three counties in Hebei and three counties in Liaoning were selected as "primary sampling units", because the Japanese occupation government had collected data from those six counties in the 1930s. Within each of these six counties (xian), five villages (cun) were selected, for an overall total of 30 villages (in fact, an administrative change in one village led to 31 villages being selected). In each county a "main village" was selected that had itself been surveyed in the 1930s. Because of the interest in these villages, 50 households were selected from each of the six main villages (one in each county). In addition, four other villages were selected in each county. These other villages were not drawn randomly but were chosen to "represent" variation within the county. Within each of these villages, 20 households were selected for interviews. Thus, the intended sample size was 780 households, 130 from each county (50 + 4 × 20).
Unlike the selection of counties and villages, the selection of households within each village followed standard (systematic) sample selection procedures. In each village, a list of all households was obtained from village leaders. An "interval" was calculated as the number of households in the village divided by the number of households desired for the sample (50 for main villages and 20 for other villages). A random number between 1 and the interval was then drawn and used as the starting point on the list. The interval was added to this number to get a second number, the interval was added to the second number to get a third, and so on. The resulting numbers identified the selected households by their position on the list. In fact, the number of households in the sample is 785 rather than 780. Most of this difference is due to one village in which 24 households were interviewed instead of the target of 20.
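The household selection rule above is a systematic sample with a random start. The Python sketch below illustrates it under one stated assumption: the interval is rounded down to a whole number, which the documentation does not specify. The function name `systematic_sample` and the example village list are hypothetical; the block also restates the intended-sample arithmetic.

```python
import random


def systematic_sample(households, n_desired, rng=None):
    """Systematic sample with a random start, as described for the China LSS.

    households: village household list, in the order supplied by village leaders.
    n_desired:  50 for a main village, 20 for any other village.
    """
    if rng is None:
        rng = random.Random()
    # Interval = households in the village / households desired.
    # ASSUMPTION: rounded down; the documentation does not say how
    # fractional intervals were handled.
    interval = max(1, len(households) // n_desired)
    # Random starting number between 1 and the interval (inclusive).
    start = rng.randint(1, interval)
    # start, start + interval, start + 2 * interval, ... up to the list length.
    positions = range(start, len(households) + 1, interval)
    # Positions are 1-based list numbers; Python lists are 0-based.
    return [households[p - 1] for p in positions]


# Intended design per county: one main village of 50 households plus
# four other villages of 20 households each.
per_county = 50 + 4 * 20          # 130 households
intended_total = 6 * per_county   # 780 households across six counties

if __name__ == "__main__":
    village = [f"household {i}" for i in range(1, 101)]  # hypothetical 100-household list
    chosen = systematic_sample(village, 20, random.Random(1995))
    print(len(chosen), chosen[:2])
```

With a 100-household list and a target of 20 the interval is 5 and exactly 20 households are returned. When the list length is not a multiple of the target, a rounded-down interval can return a few more than the target.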
Dates of Data Collection
Data Collection Mode
(a) DATA ENTRY
All responses obtained from the household interviews were recorded in the household questionnaires. These were then entered into the computer in the field, using data entry programs written in BASIC. The data entry program produced household files, i.e. one data file containing all of the data from one household (or community) questionnaire. Thus, for the household survey there were about 880 data files. These data files were processed at the University of Toronto and the World Bank to produce datasets in statistical software formats, each of which contained information for all households for a subset of variables. Because each subset of variables corresponded to a data entry screen, these files are hereafter referred to as "screen files". For the household survey component, 66 data files were created. Members of the survey team checked and corrected the data against the original information recorded in the questionnaires. We would like to emphasize that "correction" here means checking the questionnaires in cases of skip-pattern errors, incorrect values, or outlying values, and changing a value if and only if the data in the computer differed from those in the questionnaire. The personnel in charge of data preparation were given specific instructions not to change data even if values in the questionnaires were clearly incorrect. We have no reason to believe that these instructions were not followed, and every reason to believe that the data resulting from these checks and corrections are accurate and of the highest quality possible.
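The reshaping step described above, from one file per household to one file per data entry screen, can be illustrated with a short Python sketch. It does not reproduce the original BASIC or processing programs, which are not described here. It assumes each household file has been read into a dictionary mapping screen names to lists of rows; the function name and all screen and variable names other than HID and PID are hypothetical.

```python
from collections import defaultdict


def household_files_to_screen_files(household_files):
    """Reshape one-file-per-household data into one table per entry screen.

    household_files maps each HID to {screen_name: [row, ...]}, where each row
    is a dict of variable values. Screens that repeat within a household
    (for example, one row per household member) simply hold several rows.
    Returns {screen_name: [row_with_HID, ...]} covering all households.
    """
    screens = defaultdict(list)
    for hid in sorted(household_files):  # stable, HID-ordered output
        for screen_name, rows in household_files[hid].items():
            for row in rows:
                # Carry HID into every row so screen files can be merged later.
                screens[screen_name].append({"HID": hid, **row})
    return dict(screens)


if __name__ == "__main__":
    # Hypothetical screen and variable names, for illustration only.
    files = {
        10102: {"roster": [{"PID": 1, "age": 35}]},
        10101: {"roster": [{"PID": 1, "age": 40}, {"PID": 2, "age": 12}],
                "housing": [{"rooms": 3}]},
    }
    for name, table in household_files_to_screen_files(files).items():
        print(name, table)
```

Copying HID into every row is what lets the screen files be recombined later, which is the role HID plays in the linking rules described under data editing.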
(b) DATA EDITING
The screen files were then brought to World Bank headquarters in Washington, D.C. and uploaded to a mainframe computer, where they were converted to "standard" LSMS formats by merging datasets to produce separate datasets for each section, with variable names corresponding to the questionnaires. In some cases this has meant a single dataset for a section; in others it has meant retaining the "screen" datasets with only the variable names changed.
Linking Parts of the Household Survey
Each household has a unique identification number, contained in the variable HID. Values of this variable range from 10101 to 60520. The first digit is the code for the county (one of the six counties in which data were collected), the second and third digits identify the village within the county, and the last two digits contain the household number within the village. Data for households from different parts of the survey can be merged using the HID variable, which appears in every dataset of the household survey. To link information for an individual, use both the household identification number, HID, and the person identification number, PID. A child can be linked to his or her parents, if the parents are household members, through the parents' ID codes in Section 01B. For parents who are not household members, information is collected on the parent's schooling, main occupation, and whether he or she is currently alive. Household members can be linked to their non-resident children through the parents' ID codes in Section 01C.
Linking the Household to the Community Data
The community data have a somewhat different set of identifying variables from the household data.
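The household-side identifiers described above can be illustrated with a short Python sketch. It assumes each dataset has been loaded as a list of dictionaries keyed by variable name. Apart from HID and PID, every variable name used (`age`, `grade`, `father_pid`) is hypothetical; in particular, the actual name of the parent ID variable in Section 01B is not given here. `decode_hid` splits HID into its county, village and household parts. `merge_on` is a plain inner join on a tuple of key variables: HID for household-level data, HID plus PID for individuals. `find_parent` looks up a co-resident parent by ID code.

```python
def decode_hid(hid):
    """Split a household ID such as 10101 into county, village and household."""
    digits = str(int(hid))
    if len(digits) != 5 or digits[0] not in "123456":
        raise ValueError(f"not a valid China LSS HID: {hid!r}")
    return {
        "county": int(digits[0]),      # first digit: one of the six counties
        "village": int(digits[1:3]),   # second and third digits: village in county
        "household": int(digits[3:]),  # last two digits: household in village
    }


def merge_on(left, right, keys):
    """Inner-join two lists of row dicts on the named key variables."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    return [
        {**row, **match}
        for row in left
        for match in index.get(tuple(row[k] for k in keys), [])
    ]


def find_parent(roster, hid, parent_pid):
    """Return a co-resident parent's roster row, or None if not in the household."""
    for row in roster:
        if row["HID"] == hid and row["PID"] == parent_pid:
            return row
    return None


if __name__ == "__main__":
    print(decode_hid(60520))  # {'county': 6, 'village': 5, 'household': 20}
    roster = [{"HID": 10101, "PID": 1, "age": 40},
              {"HID": 10101, "PID": 2, "age": 12, "father_pid": 1}]  # hypothetical
    schooling = [{"HID": 10101, "PID": 2, "grade": 5}]               # hypothetical
    # Person-level merge needs both HID and PID; PID alone is not unique.
    print(merge_on(roster, schooling, ("HID", "PID")))
    print(find_parent(roster, 10101, roster[1]["father_pid"]))
```

Keying person-level joins on the (HID, PID) pair matters because PID values restart in every household.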
Each community dataset has four identifying variables: province (code 7 for Hebei and code 8 for Liaoning); county (six two-digit codes, in which the first digit represents the province and the second digit represents one of the three counties in that province); township (a three-digit code: the first two digits are the county code and the third digit is the township); and village (a four-digit code: the first two digits are the county code, the third digit is the township, and the fourth digit is the village).
Constructed Data Set
Researchers at the World Bank and the University of Toronto have created a data set with information on annual household expenditures, region codes, etc. This constructed data set is made available for general use with the understanding that the description below is the only documentation that will be provided. Any manipulation of the data requires assumptions and, as far as possible, those assumptions are explained below. Except where noted, the data set has been created using only the original (raw) data sets. A researcher could construct a somewhat different data set by incorporating different assumptions.
Aggregate Expenditure, TOTEXP
The dataset TOTEXP contains variables for total household annual expenditures (for the year 1994) and for the different components of total household expenditures: food expenditures, non-food expenditures, use value of consumer durables, etc. These components, along with the algorithm used to calculate household expenditures, are detailed in Appendix D. The dataset also contains the variable HID, which can be used to match it to the household-level data sets. Note that all of the expenditure variables are household totals, not per capita values. Researchers will have to divide these variables by household size to obtain per capita figures; the household size variable is included in the data set.
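The community identifiers and the per capita conversion described above can be sketched in Python. The village-code decoder assumes the four-digit layout is province digit, county digit, township digit, village digit. Under that layout the first two digits form the county code and the first three the township code. This is one reading of the documentation and should be checked against the data. In `per_capita`, the names `totexp`, `foodexp` and `hhsize` are placeholders, since the actual variable names in TOTEXP are not given here. The example code 7213 is likewise hypothetical.

```python
PROVINCES = {7: "Hebei", 8: "Liaoning"}


def decode_village_code(code):
    """Split a four-digit community village code into its nested parts.

    ASSUMED layout: [province][county][township][village], one digit each,
    so the first two digits form the county code and the first three the
    township code.
    """
    digits = str(int(code))
    if len(digits) != 4 or int(digits[0]) not in PROVINCES:
        raise ValueError(f"unexpected village code: {code!r}")
    return {
        "province": int(digits[0]),
        "province_name": PROVINCES[int(digits[0])],
        "county": int(digits[:2]),
        "township": int(digits[:3]),
        "village": int(digits[3]),
    }


def per_capita(totexp_rows, expenditure_vars, size_var="hhsize"):
    """Divide household-total expenditure variables by household size.

    size_var and the entries of expenditure_vars are placeholder names;
    substitute the actual names used in the TOTEXP dataset.
    """
    result = []
    for row in totexp_rows:
        size = row[size_var]
        if size is None or size <= 0:
            raise ValueError(f"HID {row['HID']}: invalid household size {size!r}")
        pc = {"HID": row["HID"]}
        for var in expenditure_vars:
            pc[f"{var}_pc"] = row[var] / size
        result.append(pc)
    return result


if __name__ == "__main__":
    print(decode_village_code(7213))  # hypothetical village code
    rows = [{"HID": 10101, "totexp": 12000.0, "foodexp": 6000.0, "hhsize": 4}]
    print(per_capita(rows, ["totexp", "foodexp"]))
```

Keeping HID in each per capita row lets the result be merged back onto any household-level dataset.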
In receiving these data it is recognized that the data are supplied for use within your organization, and you agree to the following stipulations as conditions for the use of the data:
1. The data are supplied solely for the use described in this form and will not be made available to other organizations or individuals. Other organizations or individuals may request the data directly.
2. Three copies of all publications, conference papers, or other research reports based entirely or in part upon the requested data will be supplied to:
The World Bank
Development Economics Research Group
LSMS Database Administrator
1818 H Street,
3. The researcher will refer to the 1995 and 1997 China - Hebei and Liaoning Living Standards Survey as the source of the information in all publications, conference papers, and manuscripts. At the same time, the World Bank is not responsible for the estimates reported by the analyst(s).
4. Users who download the data may not pass the data to third parties.
5. The database cannot be used for commercial ends, nor can it be sold.
Use of the dataset must be acknowledged by including a citation which would include:
- Identification of the Primary Investigator
- Title of the survey (including the country name and year of implementation)
- Survey reference number
- Source and date of download
Example: Research Centre for Rural Economy (China) and the World Bank. China Living Standards Survey 1995-1997. Ref. CHN_1995_LSS_v01_M. Dataset downloaded from the World Bank Microdata Library (www.microdata.worldbank.org/lsms) on [date]
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.