dc.contributor.author
Krennmair, Patrick
dc.date.accessioned
2023-01-16T12:38:44Z
dc.date.available
2023-01-16T12:38:44Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/37542
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-37256
dc.description.abstract
The thesis combines four papers that introduce a coherent framework based on Mixed Effects Random Forests (MERFs) for the estimation of spatially disaggregated economic and inequality indicators and their associated uncertainties. Chapter 1 focuses on flexible domain prediction using MERFs. We discuss characteristics of semi-parametric point and uncertainty estimates for domain-specific means. Extensive model- and design-based simulations highlight the advantages of MERFs over 'traditional' Small Area Estimation (SAE) methods based on linear mixed models (LMMs). Chapter 2 introduces the use of MERFs under limited covariate information. Obtaining access to population-level micro-data for auxiliary information poses barriers for researchers and practitioners. We introduce an approach that, in the absence of unit-level auxiliary data, adaptively incorporates aggregated auxiliary information using calibration weights. We apply the proposed method to German survey data and use aggregated covariate census information from the same year to estimate the average opportunity cost of care work for 96 planning regions in Germany. In Chapter 3, we discuss the estimation of non-linear poverty and inequality indicators. Our proposed method allows the estimation of domain-specific cumulative distribution functions, from which the desired (non-linear) poverty estimators can be obtained. We evaluate the proposed point and uncertainty estimators in a design-based simulation and focus on a case study uncovering spatial patterns of poverty for the Mexican state of Veracruz. Additionally, Chapter 3 informs a methodological discussion on the differences between, and relative advantages of, predictive algorithms and (linear) statistical models in the context of SAE. The final Chapter 4 complements the previous research by implementing the discussed methods for point and uncertainty estimates in the open-source R package SAEforest. The package facilitates the use of the discussed methods and makes MERFs an accessible addition to the existing toolbox for SAE and official statistics.
Overall, this work aims to bring together two statistical spheres ('traditional' parametric models and nonparametric predictive algorithms) by critically discussing and adapting tree-based methods for applications in SAE. From this perspective, the thesis contributes to the existing literature along three dimensions: 1) The methodological development of alternative semi-parametric methods for the estimation of non-linear domain-specific indicators and means under unit-level and aggregated auxiliary covariates. 2) The proposition of a general framework that enables further discussion between 'traditional' and algorithmic approaches for SAE, together with an extensive comparison of LMM-based methods and MERFs in applications and in several model- and design-based simulations. 3) The provision of an open-source software package that facilitates the use of these methods and thus makes MERFs and general SAE methodology accessible for tailored research applications of statistical, institutional and political practitioners.
en
dc.format.extent
135 pages
dc.rights.uri
http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen
dc.subject
Small Area Estimation
en
dc.subject
Mixed Effects Random Forests
en
dc.subject
Official Statistics
en
dc.subject.ddc
300 Social sciences::310 Statistics::310 Collections of general statistics
dc.title
A Framework for the Estimation of Disaggregated Statistical Indicators Using Tree-Based Machine Learning Methods
dc.contributor.gender
male
dc.contributor.firstReferee
Schmid, Timo
dc.contributor.furtherReferee
Tzavidis, Nikos
dc.date.accepted
2022-12-19
dc.identifier.urn
urn:nbn:de:kobv:188-refubium-37542-8
refubium.affiliation
Wirtschaftswissenschaft
dcterms.accessRights.dnb
free
dcterms.accessRights.openaire
open access