The Project for Statistics on Living Standards and Development was a countrywide World Bank Living Standards Measurement Survey. It covered approximately 9,000 households, drawn from a representative sample of South African households. The fieldwork was undertaken during the nine months leading up to the country's first democratic elections at the end of April 1994. The purpose of the survey was to collect statistical information about the conditions under which South Africans live, in order to provide policymakers with the data necessary for planning strategies. This data would aid the implementation of goals such as those outlined in the Government of National Unity's Reconstruction and Development Programme.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
The scope of the study was:
· Household roster
· Household services
· Food Spending and Consumption
· Non-Food Spending
· Remittances and Marital Maintenance
· Land Access and Use
· Employment Status
· Agricultural Production
Agriculture & Rural Development
Food (production, crisis)
Land (policy, resource management)
Access to Finance
Population & Reproductive Health
Producers and sponsors
Southern Africa Labour and Development Research Unit
The World Bank
Government of Denmark
Government of the Netherlands
Government of Norway
(a) SAMPLE SIZE
Sample size is 9,000 households. The sample design adopted for the study was a two-stage self-weighting design in which the first-stage units were Census Enumerator Subdistricts (ESDs, or their equivalent) and the second-stage units were households. The advantage of using such a design is that it provides a representative sample that need not be based on an accurate census population distribution. In the case of South Africa, the sample will automatically include many poor people, without the need to go beyond this and oversample the poor. Proportionate sampling, as in such a self-weighting sample design, offers the simplest possible data files for further analysis, as weights do not have to be added. However, in the end this advantage could not be retained, and weights had to be added.
The sampling frame was drawn up on the basis of small, clearly demarcated area units, each with a population estimate. The nature of the self-weighting procedure adopted ensured, however, that this population estimate was not important for determining the final sample. For most of the country, census ESDs were used. Where some ESDs comprised relatively large populations, as for instance in some black townships such as Soweto, aerial photographs were used to divide the areas into blocks of approximately equal population size. In other instances, particularly in some of the former homelands, the area units were not ESDs but villages or village groups. In the sample design chosen, the area stage units (generally ESDs) were selected with probability proportional to size, based on the census population.
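The reason the population estimate of an area unit drops out of the final selection probability can be written out in a few lines. This is a sketch under stated assumptions, not a formula taken from the survey documentation: the symbols are introduced here for illustration, with M_i the census population of ESD i, M the total census population, a the number of clusters selected, and c the constant by which an ESD's census population is divided to obtain its household sampling interval.

```latex
\begin{align*}
P(\text{ESD } i \text{ selected}) &= \frac{a\,M_i}{M}, \\
P(\text{household selected} \mid \text{ESD } i \text{ selected}) &= \frac{1}{M_i / c} = \frac{c}{M_i}, \\
P(\text{household selected}) &= \frac{a\,M_i}{M} \cdot \frac{c}{M_i} = \frac{a\,c}{M}.
\end{align*}
```

Because M_i cancels, every household has the same overall chance of selection regardless of how accurate the census count for its own ESD was, which is what makes the design self-weighting in principle.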
(b) SAMPLE DESIGN
Systematic sampling was used throughout, that is, sampling at a fixed interval in a list of ESDs, starting at a randomly selected starting point. Given that sampling was self-weighting, the impact of stratification was expected to be modest. The main objective was to ensure that the racial and geographic breakdown approximated the national population distribution. This was done by listing the area stage units (ESDs) by statistical region and then, within the statistical region, by urban or rural. Within these sub-statistical regions, the ESDs were then listed in order of percentage African. The sampling interval for the selection of the ESDs was obtained by dividing the 1991 census population of 38,120,853 by the 360 clusters to be selected. This yielded an interval of approximately 105,800. Starting at a randomly selected point, every 105,800th person down the cluster list was selected. This ensured both geographic and racial diversity (ESDs were ordered by statistical sub-region and proportion of the population African). In three or four instances, the ESD chosen was judged inaccessible and replaced with a similar one.
In the second sampling stage the unit of analysis was the household. In each selected ESD a listing or enumeration of households was carried out by means of a field operation. From the households listed in an ESD, a sample of households was selected by systematic sampling. Even though the ultimate enumeration unit was the household, in most cases "stands" were used as enumeration units. However, when a stand was chosen as the enumeration unit, all households on that stand had to be interviewed.
Census population data, however, was available only for 1991. An assumption on population growth was thus made to obtain an approximation of the population size for 1993, the year of the survey. The sampling interval at the level of the household was determined in the following way: based on the decision to have a take of 125 individuals on average per cluster (i.e. assuming 5 members per household to give an average cluster size of 25 households), the interval of households to be selected was determined as the census population divided by 118.1, i.e. allowing for population growth since the census. It was subsequently discovered that population growth was slightly over-estimated, but this had little effect on the findings of the survey.
Individuals in hospitals, old age homes, hotels and hostels of educational institutions were not included in the sample. Migrant labour hostels were included. In addition to those that turned up in the selected ESDs, a sample of three hostels was chosen from a national list provided by the Human Sciences Research Council, and within each of these hostels a representative sample was drawn on a similar basis as described above for the households in ESDs.
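The two selection steps described above can be sketched in Python. This is an illustration only, not the procedure or software used by the survey team: the function names `systematic_pps` and `household_interval` are hypothetical, the input list of area-unit populations is assumed to be already sorted in the stratification order described above, and only the divisor 118.1 is taken from the text.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def systematic_pps(sizes, n_clusters, rng=None):
    """Systematic selection with probability proportional to size.

    sizes      -- census populations of the area units, in list order
    n_clusters -- number of area units to select
    Returns the list positions of the selected units. A unit larger than
    the sampling interval can be selected more than once.
    """
    rng = rng or random.Random()
    interval = sum(sizes) / n_clusters        # persons per selection
    start = rng.uniform(0, interval)          # random starting point
    cumulative = list(accumulate(sizes))      # running population total
    picks = []
    for k in range(n_clusters):
        point = start + k * interval
        # First unit whose cumulative population reaches the point.
        idx = bisect_left(cumulative, point)
        picks.append(min(idx, len(sizes) - 1))
    return picks


def household_interval(esd_census_population, divisor=118.1):
    """Household sampling interval within a selected area unit.

    The divisor 118.1 is the figure quoted in the text; it already
    allows for assumed population growth since the census.
    """
    return esd_census_population / divisor


# Toy frame of eight area units (made-up populations, not survey data).
frame = [1200, 800, 2500, 950, 1800, 600, 3100, 1400]
selected = systematic_pps(frame, n_clusters=3, rng=random.Random(42))
for i in selected:
    print(i, frame[i], round(household_interval(frame[i]), 1))
```

Larger area units get a proportionally larger household interval, so the number of households taken per cluster stays roughly constant while the overall selection probability stays the same for every household.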
A self-weighting sample design should in principle eliminate the need for weighting. A number of factors intervened, however, which made it essential to use weights after all. Among these were violence, which prevented survey teams from conducting interviews in two clusters on the East Rand; failure to continue interviewing in a cluster until the required take had been reached; and systematic under-representation of whites in the sample.
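The text does not say how the weights were actually constructed. Purely as an illustration of how the problems listed above are commonly compensated for, the sketch below shows two generic adjustments: scaling up a cluster that fell short of its target take, and a post-stratification factor that corrects a group's share in the sample to its share in the population. Both function names and all numbers are hypothetical and should not be read as the survey's own weighting method.

```python
def take_adjustment(target_take, achieved_take):
    """Factor that scales a cluster's households up to the target take."""
    if achieved_take <= 0:
        raise ValueError("cluster has no completed interviews")
    return target_take / achieved_take


def poststratification_factors(population_shares, sample_counts):
    """Population share divided by sample share, for each group.

    population_shares -- group -> share of the population (sums to 1)
    sample_counts     -- group -> number of sampled households
    """
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


# Made-up example: a cluster with 20 of 25 planned interviews, and a
# group that is under-represented in the sample.
print(take_adjustment(25, 20))
print(poststratification_factors(
    {"group_a": 0.5, "group_b": 0.5},
    {"group_a": 75, "group_b": 25},
))
```

In any analysis of the data, the weight variable supplied with the data files should be used rather than factors recomputed along these lines.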
Dates of Data Collection
Data Collection Mode
All the questionnaires were checked when received. Where information was incomplete or appeared contradictory, the questionnaire was sent back to the relevant survey organization. As soon as the data was available, it was captured using ADE, a local development platform. This was completed in February 1994. Following this, a series of exploratory programs was written to highlight inconsistencies and outliers. For example, all person-level files were linked together to ensure that the same person code reported in different sections of the questionnaire corresponded to the same person. The error reports from these programs were compared to the questionnaires and the necessary alterations made. This was a lengthy process, as several files were checked more than once, and it was completed at the beginning of August 1994. In some cases, questionnaires would contain missing values, or comments that the respondent did not know, or refused to answer a question. These responses are coded in the data files with the following values:
-1 : The data was not available on the questionnaire or form
-2 : The field is not applicable
-3 : Respondent refused to answer
-4 : Respondent did not know answer to question
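Because these codes are stored as ordinary negative numbers, they need to be set aside before computing any statistic. The following is a minimal sketch using only the Python standard library. The names `MISSING_CODES`, `split_missing` and `mean_excluding_missing` are hypothetical, and the sketch assumes that the variable being analysed cannot legitimately take the values -1 to -4.

```python
# Code-to-meaning mapping taken from the list above.
MISSING_CODES = {
    -1: "data not available on the questionnaire or form",
    -2: "field not applicable",
    -3: "respondent refused to answer",
    -4: "respondent did not know answer",
}


def split_missing(value):
    """Return (value, None) for real data, (None, reason) for a code."""
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    return value, None


def mean_excluding_missing(values):
    """Mean of the valid entries, or None if there are none."""
    valid = [v for v in values if v not in MISSING_CODES]
    return sum(valid) / len(valid) if valid else None


# Made-up responses: two valid values and two missing-value codes.
responses = [10, -1, 20, -4]
print(mean_excluding_missing(responses))
print([split_missing(v) for v in responses])
```

Keeping the reason alongside the blanked value makes it possible to report refusals and "don't know" answers separately rather than treating all missing data alike.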
The data collected in clusters 217 and 218 should be viewed as highly unreliable and has therefore been removed from the data set. The data currently available on the web site has been revised to remove the data from these clusters. Researchers who have downloaded the data in the past should revise their data sets. For information on the data in those clusters, contact SALDRU (http://www.saldru.uct.ac.za/).
In receiving these data it is recognized that the data are supplied for use within my organization, and I agree to the following stipulations as conditions for the use of the data:
1. The data are supplied solely for the use described in this form and will not be made available to other organizations or individuals. Other organizations or individuals may request the data directly.
2. Three copies of all publications, conference papers, or other research reports based entirely or in part upon the requested data will be supplied to:
SALDRU School of Economics
University of Cape Town
3. The researcher will refer to the 1993 South Africa Integrated Household Survey as the source of the information in all publications, conference papers, and manuscripts. At the same time, the World Bank is not responsible for the estimations reported by the analyst(s).
4. Users who download the data may not pass the data to third parties.
5. The database cannot be used for commercial ends, nor can it be sold.
Use of the dataset must be acknowledged by including a citation which would include:
- Identification of the Primary Investigator
- Title of the survey (including the year of implementation)
- Survey reference number
- Source and date of download
Example: Southern Africa Labour and Development Research Unit. Integrated Household Survey (IHS) 1993 Ref. ZAF_1993_IHS_v01_M. Dataset downloaded from www.microdata.worldbank.org on [date]
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.