In 2015, the United Nations (UN) adopted 17 Sustainable Development Goals (SDGs) to be achieved by 2030 (General Assembly, 2015). Progress towards the goals is tracked through indicators covering a wide range of socio-economic characteristics (General Assembly, 2015). Reaching the goals requires reliable measurement of these indicators, especially at disaggregated levels. National Statistical Institutes (NSIs)
collect data on various socio-economic indicators by conducting censuses or sample surveys.
Although a census provides data on the entire population, in most countries it is carried out only once every 10 years and requires enormous financial resources. Sample surveys, on the other hand, are commonly used because they are cheaper and quicker to conduct (Särndal et al., 2003; Cochran, 2007). They are, therefore, essential sources of data on a country’s key socio-economic indicators, which are needed for policy-making, resource allocation and the design of interventions. However, surveys are mostly designed to produce reliable estimates at the national level and for specific planned areas or domains. Their drawback is that they are not adequate for disaggregating data further, because the sample sizes at finer levels are small (Rao and Molina, 2015). In this thesis, geographical divisions are called areas, while other subdivisions, such as groups defined by age, sex or ethnicity, are called domains, in line with Pfeffermann (2013) and Rao and Molina (2015).
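The sample-size problem can be made concrete with the direct estimator of a proportion. The following Python sketch assumes simple random sampling (so design effects from clustering, stratification and weighting are ignored) and uses a hypothetical prevalence of 20% and hypothetical sample sizes; the function name is illustrative only.

```python
import math


def direct_proportion_se_cv(p_hat, n):
    """Standard error and coefficient of variation (CV) of a sample proportion.

    Assumes simple random sampling, so the design effects of a real
    household survey (clustering, stratification, weighting) are ignored.
    """
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return se, se / p_hat


# Hypothetical sample sizes: a national sample versus one small area.
for label, n in [("national", 5000), ("small area", 50)]:
    se, cv = direct_proportion_se_cv(0.20, n)
    print(f"{label}: n = {n}, SE = {se:.4f}, CV = {cv:.1%}")
```

Because the standard error scales with 1/√n, reducing the sample from 5,000 to 50 units raises the CV tenfold, from about 2.8% to about 28.3%.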
One way to obtain reliable estimates at disaggregated levels is to use small area estimation (SAE) techniques. SAE increases the precision of survey estimates by combining the survey data with auxiliary data from another source, for example a previous census, administrative records or other passively recorded data such as the mobile phone data used in Schmid et al. (2017). The auxiliary data are covariates related to the response variable of interest (Rao and Molina, 2015). Estimates obtained from the survey data alone are called direct estimates, while those obtained from SAE models are called model-based estimates. According to Rao and Molina (2015), an area or domain is regarded as small if its sample size is not large enough to support direct estimates of adequate precision. The field of SAE has grown substantially over the years, driven mainly by demand from governments and the private sector. It is now possible to estimate a range of target statistics, both linear ones such as the mean and non-linear ones such as the Gini coefficient (Gini, 1912). This thesis contributes to the wide literature on SAE by presenting three important applications using Kenyan data sources.
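The idea of borrowing strength from auxiliary data can be illustrated with the area-level Fay-Herriot model (Fay and Herriot, 1979), which is used in Chapter 3. The Python sketch below computes the empirical best linear unbiased predictor (EBLUP), a weighted average of each area's direct estimate and a regression-synthetic estimate. It is a simplified illustration, not the implementation used in this thesis: the design variances ψ_i are treated as known, the random-effect variance is estimated with the Prasad-Rao moment estimator (which may differ from the estimator used in the thesis), and the data are simulated. All function and variable names are hypothetical.

```python
import numpy as np


def fay_herriot_eblup(y, X, psi):
    """EBLUP under the area-level Fay-Herriot model (illustrative sketch).

    Model: y_i = x_i' beta + u_i + e_i with u_i ~ N(0, sigma2_u) and
    e_i ~ N(0, psi_i), where y_i is the direct estimate for area i and psi_i
    its design variance, treated as known. sigma2_u is estimated with the
    Prasad-Rao moment estimator (an assumption; REML or ML are common choices).
    """
    y, X, psi = (np.asarray(a, dtype=float) for a in (y, X, psi))
    m, p = X.shape

    # Prasad-Rao moment estimator of sigma2_u from OLS residuals, truncated at 0.
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # x_i'(X'X)^{-1} x_i
    sigma2_u = max(0.0, (resid @ resid - np.sum(psi * (1.0 - leverage))) / (m - p))

    # Weighted least squares for beta with weights 1 / (sigma2_u + psi_i).
    w = 1.0 / (sigma2_u + psi)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    # Shrinkage factor gamma_i: near 1 trusts the direct estimate,
    # near 0 trusts the regression-synthetic estimate x_i' beta.
    gamma = sigma2_u / (sigma2_u + psi)
    eblup = gamma * y + (1.0 - gamma) * (X @ beta)
    return eblup, gamma, sigma2_u


# Simulated example with 30 hypothetical areas (not real survey data).
rng = np.random.default_rng(1)
m = 30
X = np.column_stack([np.ones(m), rng.uniform(0.0, 1.0, m)])  # intercept + covariate
psi = rng.uniform(0.01, 0.05, m)                             # design variances
theta = X @ np.array([0.2, 0.5]) + rng.normal(0.0, 0.1, m)   # true area values
y = theta + rng.normal(0.0, np.sqrt(psi))                    # direct estimates
eblup, gamma, sigma2_u = fay_herriot_eblup(y, X, psi)
print(f"sigma2_u = {sigma2_u:.4f}, mean shrinkage gamma = {gamma.mean():.2f}")
```

Areas with large design variances ψ_i receive small shrinkage factors γ_i and are pulled towards the regression fit, which is how the model gains precision where the direct estimate is least reliable.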
Chapter 1 applies the Empirical Best Predictor (EBP) of Molina and Rao (2010) and the M-quantile model of Chambers and Tzavidis (2006) to estimate poverty and inequality in Kenya. Four indicators are estimated, namely the mean, the Head Count Ratio, the Poverty Gap and the Gini coefficient. To mitigate departures from the normality assumption for the model errors, three transformations are explored: the logarithmic, the log-shift and the Box-Cox transformations. The M-quantile model is used as a robust alternative to the EBP. The mean squared errors (MSEs) are estimated using bootstrap procedures.

Chapter 2 estimates health insurance coverage in Kenyan counties for women and men aged 15 to 49 years using a binary M-quantile SAE model (Chambers et al., 2016). This approach avoids specifying the distribution of the random effects, so distributional robustness is achieved automatically. The MSE is estimated analytically using Taylor series linearization.

Chapter 3 estimates the prevalence of overweight at the county level in Kenya. In this application, the Fay-Herriot model (Fay and Herriot, 1979) is combined with an arcsine square-root transformation, which stabilizes the variance and helps meet the normality assumption. Estimates are transformed back to the original scale with a bias-corrected back-transformation. The design variances are smoothed using Generalized Variance Functions, as in Pratesi (2016, Chapter 11), and the MSE is estimated using a bootstrap procedure.

In summary, this thesis contributes to the vast literature on small area estimation from an applied perspective by:
(a) Presenting, for the first time, regionally disaggregated SAE results for selected indicators for Kenya.
(b) Combining data sources to improve the estimation of the selected disaggregated socioeconomic
indicators.
(c) Exploring data-driven transformations to mitigate departures from the normality assumption in linear and linear mixed-effects models.
(d) Presenting a robust approach to small area estimation based on the M-quantile model.
(e) Estimating the mean squared error using bootstrap procedures to assess uncertainty.
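The four indicators estimated in Chapter 1 can be written as functions of the welfare values in an area. A minimal, unweighted Python sketch follows; it assumes the Head Count Ratio and Poverty Gap are the Foster-Greer-Thorbecke measures with α = 0 and α = 1, and the welfare values and poverty line z are hypothetical. Within the EBP, functions like these would be evaluated on each Monte Carlo population generated from the fitted model, and the results averaged per area.

```python
import numpy as np


def head_count_ratio(y, z):
    """FGT(0): share of units whose welfare y is below the poverty line z."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(y < z))


def poverty_gap(y, z):
    """FGT(1): mean relative shortfall (z - y) / z, counting 0 for the non-poor."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.where(y < z, (z - y) / z, 0.0)))


def gini(y):
    """Gini coefficient from sorted data: 2*sum(i*y_(i)) / (n*sum(y)) - (n+1)/n."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * y) / (n * np.sum(y)) - (n + 1.0) / n)


# Hypothetical welfare values for one area and a hypothetical poverty line.
welfare = [1.0, 2.0, 3.0, 4.0]
z = 2.5
print(np.mean(welfare), head_count_ratio(welfare, z), poverty_gap(welfare, z), gini(welfare))
```

The sorted-data formula for the Gini coefficient is algebraically equal to the mean absolute difference between all pairs divided by twice the mean, but needs only one sort instead of a double loop.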
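Chapter 1 explores data-driven transformations to bring the model errors closer to normality. As a simplified sketch, the Python code below chooses the Box-Cox parameter λ by maximizing the normal profile log-likelihood of an ordinary linear model over a grid. This deliberately drops the random effects of the EBP, where λ would instead be estimated within the mixed model (for example by restricted maximum likelihood); the grid range, the function names and the simulated data are illustrative assumptions.

```python
import numpy as np


def box_cox(y, lam):
    """Box-Cox transformation of positive y; lam = 0 gives the logarithm."""
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y**lam - 1.0) / lam


def profile_loglik(y, X, lam):
    """Normal profile log-likelihood of lam in the linear model box_cox(y) = X b + e.

    The Jacobian term (lam - 1) * sum(log y) makes values comparable across lam.
    """
    z = box_cox(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    rss = np.sum((z - X @ beta) ** 2)
    n = y.size
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))


def select_lambda(y, X, grid=None):
    """Data-driven choice of lam: maximize the profile log-likelihood on a grid."""
    y = np.asarray(y, dtype=float)
    grid = np.linspace(-1.0, 2.0, 301) if grid is None else np.asarray(grid, dtype=float)
    loglik = [profile_loglik(y, X, lam) for lam in grid]
    return float(grid[int(np.argmax(loglik))])


# Hypothetical right-skewed "income" whose logarithm is linear in x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
income = np.exp(1.0 + 0.5 * x + rng.normal(0.0, 0.8, n))
print(select_lambda(income, X))  # should be close to 0, i.e. a log transformation
```

The log-shift transformation log(y + s) could be selected in the same way by searching over the shift s instead of λ, with the Jacobian term −Σ log(yᵢ + s) in place of (λ − 1) Σ log yᵢ.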
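For Chapter 3, the arcsine square-root transformation and a bias-corrected back-transformation can be sketched in Python as follows. The transformation stabilizes the variance because, by the delta method, the variance of arcsin(√p̂) is approximately 1/(4n) whatever the value of p. The back-transformation here assumes that the transformed-scale estimate is normally distributed and returns E[sin²(T)] in closed form. Which variance to plug in is an assumption: the example uses the direct-estimate variance with a hypothetical effective sample size of 40, whereas a Fay-Herriot application would typically use a model-based variance. Function names are hypothetical.

```python
import numpy as np


def arcsine_sqrt(p):
    """Variance-stabilizing transformation of a proportion: arcsin(sqrt(p))."""
    return np.arcsin(np.sqrt(p))


def transformed_variance(n_eff):
    """Delta-method variance of arcsin(sqrt(p_hat)), approximately 1 / (4 n_eff)."""
    return 1.0 / (4.0 * np.asarray(n_eff, dtype=float))


def naive_back(eta):
    """Naive back-transformation sin^2(eta); ignores the estimation uncertainty."""
    return np.sin(eta) ** 2


def bias_corrected_back(eta, var):
    """E[sin^2(T)] for T ~ N(eta, var), in closed form.

    Uses sin^2(t) = (1 - cos 2t) / 2 and E[cos 2T] = cos(2 eta) * exp(-2 var).
    """
    return 0.5 * (1.0 - np.cos(2.0 * eta) * np.exp(-2.0 * var))


# Hypothetical county: direct prevalence 0.30 with an effective sample size of 40.
eta = arcsine_sqrt(0.30)
var = transformed_variance(40)
print(naive_back(eta), bias_corrected_back(eta, var))
```

Here the naive back-transformation simply returns 0.30, while the bias-corrected version gives about 0.3025; the correction grows with the variance and pulls the estimate towards 0.5.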