COVID-19 Rapid Response Phone Survey with Households, Wave 1-8, 2020-2022
The World Bank, in collaboration with the Kenya National Bureau of Statistics and the University of California, Berkeley, is conducting the Kenya COVID-19 Rapid Response Phone Survey (RRPS) to track the socioeconomic impacts of the COVID-19 pandemic and of other shocks, as well as the recovery from the pandemic, and to provide timely data to inform policy. This dataset contains information from eight waves of the COVID-19 RRPS, a panel survey that targets Kenyan nationals and started in May 2020. The same households were interviewed every two months during the first year of data collection (five survey rounds) and every four months thereafter, using Computer Assisted Telephone Interviewing (CATI).
The dataset contains information from two samples of Kenyan households. The first sample is a randomly drawn subset of all households that took part in the 2015/16 Kenya Integrated Household Budget Survey (KIHBS) Computer-Assisted Personal Interviewing (CAPI) pilot and provided a phone number. The second was obtained through Random Digit Dialing (RDD), in which active phone numbers created from the 2020 Numbering Frame produced by the Kenya Communications Authority are randomly selected. The samples cover urban and rural areas and are designed to be representative of the population of Kenya using cell phones. Waves 1-7 of the survey include information on household background, service access, employment, food security, income loss, transfers, health, and COVID-19 knowledge and vaccinations. Wave 8 focused on households' exposure to shocks, in particular adverse weather shocks and the increase in food and fuel prices, but also included parts of the previous modules on household background, service access, employment, food security, income loss, and subjective wellbeing.
The data are provided in three files. The first is the household (hh) file, which contains household-level information; the variable ‘hhid’ uniquely identifies each household. The second is the adult-level file, which contains data on adult household members; each adult in a household is uniquely identified by ‘adult_id’. The third is the child-level file, available only for waves 3-7, which contains information on every child in the household; each child in a household is uniquely identified by ‘child_id’.
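To combine the files, adult-level (or child-level) records can be joined to household-level records on ‘hhid’. The sketch below uses plain Python dictionaries so it has no dependencies. The fields other than ‘hhid’ and ‘adult_id’ are hypothetical, and it assumes a single wave is loaded at a time; if several waves are stacked in one file, a wave identifier would also have to be part of every key.

```python
from collections import Counter

# Hypothetical in-memory records for ONE wave. Only 'hhid' and 'adult_id'
# come from the documentation; 'county' and 'age' are illustrative fields.
households = [
    {"hhid": 101, "county": "Nairobi"},
    {"hhid": 102, "county": "Kisumu"},
]
adults = [
    {"hhid": 101, "adult_id": 1, "age": 34},
    {"hhid": 101, "adult_id": 2, "age": 31},
    {"hhid": 102, "adult_id": 1, "age": 52},
]


def check_unique(records, keys):
    """Raise ValueError if any combination of `keys` occurs more than once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    dupes = [key for key, n in counts.items() if n > 1]
    if dupes:
        raise ValueError(f"duplicate {keys}: {dupes}")


def attach_household(adults, households):
    """Many-to-one join: copy household-level fields onto each adult record."""
    check_unique(households, ["hhid"])
    # Keying on (hhid, adult_id) is safe whether adult_id is unique
    # within a household or across the whole file.
    check_unique(adults, ["hhid", "adult_id"])
    by_hhid = {h["hhid"]: h for h in households}
    merged = []
    for a in adults:
        if a["hhid"] not in by_hhid:
            raise KeyError(f"adult record with unknown hhid {a['hhid']}")
        merged.append({**by_hhid[a["hhid"]], **a})
    return merged


merged = attach_household(adults, households)
print(len(merged), merged[2]["county"])  # 3 Kisumu
```

Checking key uniqueness before joining catches duplicated records early, which would otherwise silently inflate adult-level counts.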
The data collection period and sample size for each completed wave were:
Wave 1: May 14 to July 7, 2020; 4,061 Kenyan households
Wave 2: July 16 to September 18, 2020; 4,492 Kenyan households
Wave 3: September 28 to December 2, 2020; 4,979 Kenyan households
Wave 4: January 15 to March 25, 2021; 4,892 Kenyan households
Wave 5: March 29 to June 13, 2021; 5,854 Kenyan households
Wave 6: July 14 to November 3, 2021; 5,765 Kenyan households
Wave 7: November 15, 2021, to March 31, 2022; 5,633 Kenyan households
Wave 8: May 31 to July 8, 2022; 4,550 Kenyan households
The same questionnaire is also administered to refugees in Kenya, with the data available in the UNHCR microdata library:
Cross-Sectional Weights
For the KNBS and RDD samples, we construct weights in two steps to make the combined sample nationally representative of the current population of households with mobile phone access.
Step 1: Construct raw weights combining the two national samples: The current population consists of
(I) households that existed in 2015/16 and did not change phone numbers,
(II) households that existed in 2015/16 but changed phone numbers,
(III) households that did not exist in 2015/16.
Abstracting from differential attrition, the weights from the 2015/16 KIHBS CAPI pilot make the KIHBS sample representative of type (I) households. For RDD households, we ask whether the household existed in 2015/16, when it acquired its phone number, and where it lived in 2015/16; this allows us to classify RDD households into types (I), (II) and (III) and to assign them to KIHBS strata. We adjust the weight of each RDD household to be inversely proportional to the number of mobile phone numbers the household uses, and scale it relative to the average number of mobile phone numbers used by KIHBS households within each stratum. RDD therefore gives us a representative sample of type (II) and (III) households. We then combine RDD and KIHBS type (I) households by adding the RDD households ex post to the 2015/16 sampling frame and adjusting the weights accordingly. Finally, we combine the representative samples of type (I), type (II) and type (III) households using the share of each type within each stratum from RDD (inversely weighted by the number of mobile phone numbers).
Variable: weight_raw
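Two ingredients of Step 1 can be written down directly. The sketch below is one reading of the description, not the survey team's code: it assumes the phone-number adjustment factor is the KIHBS average number of phone numbers in the household's stratum divided by the household's own number of phone numbers, and that type shares are computed among RDD households, each counted with weight 1 / (number of phone numbers). Adding RDD households to the 2015/16 sampling frame is not shown. Function names, the stratum label and all values are hypothetical.

```python
from collections import defaultdict


def phone_adjustment(n_phones, stratum, kihbs_mean_phones):
    """Weight factor for an RDD household: inversely proportional to the number
    of phone numbers it uses, scaled by the KIHBS mean for its stratum."""
    if n_phones < 1:
        raise ValueError("an RDD household uses at least one phone number")
    return kihbs_mean_phones[stratum] / n_phones


def type_shares(rdd_households):
    """Share of household types (I), (II), (III) within each stratum among RDD
    households, each household counted with weight 1 / number of phone numbers."""
    stratum_total = defaultdict(float)
    type_total = defaultdict(float)
    for h in rdd_households:
        w = 1.0 / h["n_phones"]
        stratum_total[h["stratum"]] += w
        type_total[(h["stratum"], h["type"])] += w
    return {(s, t): w / stratum_total[s] for (s, t), w in type_total.items()}


# Hypothetical stratum label, KIHBS mean and RDD records.
kihbs_mean_phones = {"Nairobi-urban": 1.5}
rdd = [
    {"stratum": "Nairobi-urban", "type": "I", "n_phones": 1},
    {"stratum": "Nairobi-urban", "type": "II", "n_phones": 2},
    {"stratum": "Nairobi-urban", "type": "III", "n_phones": 2},
]
print(phone_adjustment(2, "Nairobi-urban", kihbs_mean_phones))  # 0.75
print(type_shares(rdd))
```

The inverse weighting reflects that a household using two numbers is twice as likely to be reached by random dialing as one using a single number.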
Step 2: Scale the weights to population proportions in each county and urban/rural stratum: We use post-stratification to adjust for differential attrition and response rates across counties and rural/urban strata. We scale the raw weights from Step 1 to reflect the population size in each county and rural/urban stratum as recorded in the 2019 Kenya Population and Housing Census conducted by the KNBS (2019 Kenya Population and Housing Census, Volume II: Distribution of Population by Administrative Units, December 2019, Kenya National Bureau of Statistics).
Variable: weight
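Post-stratification of this kind amounts to multiplying every weight in a cell by the ratio of the census population total to the sum of the sample weights in that cell, so that the weights in each county and urban/rural cell add up to the census figure. A minimal sketch, assuming simple ratio scaling within cells; the cell labels and population totals are illustrative, not census numbers.

```python
from collections import defaultdict


def post_stratify(weights, cells, population):
    """Scale weights so they sum to the population total of each cell.

    weights: raw weight per household; cells: cell label per household
    (here a (county, urban/rural) tuple); population: total per cell.
    """
    cell_sums = defaultdict(float)
    for w, c in zip(weights, cells):
        cell_sums[c] += w
    return [w * population[c] / cell_sums[c] for w, c in zip(weights, cells)]


weights_raw = [1.0, 3.0, 2.0]
cells = [("Nairobi", "urban"), ("Nairobi", "urban"), ("Kisumu", "rural")]
population = {("Nairobi", "urban"): 1000.0, ("Kisumu", "rural"): 500.0}
weights = post_stratify(weights_raw, cells, population)
print(weights)  # [250.0, 750.0, 500.0]
```

Relative weights within a cell are preserved; only the cell totals change.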
In addition to being nationally representative, the data are also representative at the weekly level for all waves except wave 8. The variable weight_weekly should be used for weekly estimates.
Panel Weights
To construct panel weights, we follow the approach outlined in Himelein (2014), “Weight Calculations for Panel Surveys with Subsampling and Split-off Tracking”. In each household, we follow one target respondent. Where households split, only the current household of the target respondent is interviewed. The weights for the wave 1-2 balanced panel are constructed by applying the following steps to the full sample of Kenyan nationals:
0. Wave 1 cross-sectional weights after post-stratification adjustment are used as a base. W_1 = W_wave1
1. Attrition adjustment through a propensity score-based method: The probability that a sample household was successfully re-interviewed in the second survey wave is estimated with a propensity score model. The propensity score (PS) is modeled with a linear logistic model at the household level. The dependent variable is a dummy indicating whether a household that completed the survey in wave 1 also completed it in wave 2. The following covariates were used in the linear logistic model: Urban/rural dummy, County dummies, Household head gender, Household head age, Household size, Dependency ratio, Dummy: Is anyone in the household working, Asset ownership: Radio, Asset ownership: Mattress, Asset ownership: Charcoal Jiko, Asset ownership: Fridge, Wall material: 3 dummies, Floor material: 3 dummies, Connection to electricity grid, Number of mobile phone numbers the household uses, Number of phone numbers recorded for follow-up, and Sample dummy for estimation with national samples
2. Rank households by PS and split them into 10 equally sized groups (deciles)
3. Calculate the attrition adjustment factor: ac (attrition correction) = the reciprocal of the mean empirical response rate in the household's propensity score decile
4. Adjust base weights for attrition: W_2 = W_1 * ac
5. Trim the top 1 percent of the weight distribution by replacing the weights in the top 1 percent with the highest weight below the cutoff: W_3 = trim(W_2)
6. Apply post-stratification in the same way as for the cross-sectional weights (Step 2).
Variable: weight_panel_w1_2
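Steps 2 to 5 can be sketched as follows. This is an illustration under stated assumptions rather than the survey team's code: the propensity scores from step 1 are taken as given (for example, fitted values from a logistic regression estimated elsewhere), the ‘mean empirical response rate’ is read as the unweighted share of wave 1 households re-interviewed within each decile, and the ‘top 1 percent’ is taken to be the largest floor(0.01 n) weights. Step 6 is the same cell-by-cell post-stratification used for the cross-sectional weights and is omitted. All function names and data are hypothetical.

```python
def decile_groups(pscores):
    """Step 2: rank households by propensity score, split into 10 equal groups."""
    order = sorted(range(len(pscores)), key=lambda i: pscores[i])
    groups = [0] * len(pscores)
    for rank, i in enumerate(order):
        groups[i] = rank * 10 // len(pscores)
    return groups


def attrition_factors(groups, responded):
    """Step 3: ac = 1 / (unweighted share re-interviewed) in each decile."""
    n = [0] * 10
    r = [0] * 10
    for g, y in zip(groups, responded):
        n[g] += 1
        r[g] += y
    # Deciles with no respondents get no factor; nobody there is kept anyway.
    return {g: n[g] / r[g] for g in range(10) if r[g] > 0}


def trim_top(weights, share=0.01):
    """Step 5: cap the top `share` of weights at the largest weight below them."""
    k = int(len(weights) * share)  # number of weights to trim (rounded down)
    if k == 0:
        return list(weights)
    cap = sorted(weights)[-k - 1]
    return [min(w, cap) for w in weights]


def panel_weights(base, pscores, responded):
    """Steps 2-5 for re-interviewed households; attritors get None.

    base: wave 1 cross-sectional weights (W_1); pscores: fitted propensity
    scores from step 1; responded: 1 if re-interviewed in wave 2, else 0.
    """
    groups = decile_groups(pscores)
    ac = attrition_factors(groups, responded)
    kept = [i for i, y in enumerate(responded) if y]
    w2 = [base[i] * ac[groups[i]] for i in kept]  # Step 4: W_2 = W_1 * ac
    w3 = trim_top(w2)                              # Step 5: W_3 = trim(W_2)
    out = [None] * len(base)
    for i, w in zip(kept, w3):
        out[i] = w
    return out


# Hypothetical example: 20 households with equal base weights.
pscores = [i / 20 for i in range(20)]
responded = [1, 1] + [1, 0] * 9
w3 = panel_weights([1.0] * 20, pscores, responded)
print(w3[:4])  # [1.0, 1.0, 2.0, None]
```

In the example, the lowest decile has full response (ac = 1) while every other decile has a 50 percent response rate (ac = 2), so re-interviewed households in those deciles also stand in for the attritors that resemble them.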
Balanced panel weights that successively add waves 3, 4, 5, 6, 7 and 8 (that is, covering waves 1-3 through 1-8) were constructed using the same procedure. Variables: weight_panel_w1_2_3, weight_panel_w1_2_3_4, weight_panel_w1_2_3_4_5, weight_panel_w1_2_3_4_5_6, weight_panel_w1_2_3_4_5_6_7 and weight_panel_w1_2_3_4_5_6_7_8.