This dissertation consists of three papers that share the objective of analyzing how machine learning (ML) methods can help economists and econometricians understand the causal mechanisms operating in the economy. Such causal knowledge is essential when designing policies that help achieve societal goals. ML techniques are increasingly applied in and adapted to practical policy settings. These settings are characterized by the same type of endogeneity problems that make actionable inference from data difficult and that economists have long grappled with. Thus, many potential synergies between ML and economics are surfacing on both the academic and the policy-making agenda. This dissertation contributes to two points of interchange between the two fields: first, ML can be used to improve or extend widely used identification techniques in economics; second, insights into causal modeling from the ML community can be introduced as novel routes to identification in economics. The first paper of this dissertation falls into the former category, the second and third papers into the latter.
In the first paper of this dissertation, we adapt the causal forest methodology proposed by Athey et al. (2019) to estimate heterogeneous treatment effects in difference-in-differences studies, and we analyze heterogeneous effects of the 2015 introduction of the statutory minimum wage in Germany on wage growth. Two contributions are made. First, we show how the causal forest methodology can be applied in difference-in-differences settings. Second, we show that previously documented effect heterogeneities can be explained by interactions of other covariates.
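To fix ideas, the estimand in such a setting can be sketched as a conditional average treatment effect in a two-period difference-in-differences design (the notation here is illustrative and not necessarily that of the paper):

```latex
% Conditional DiD estimand (illustrative notation):
% D = 1 treated group, D = 0 control; t = 0 pre-period, t = 1 post-period.
\tau(x) \;=\; \mathbb{E}\!\left[\,Y_{t=1}-Y_{t=0}\mid D=1,\,X=x\,\right]
        \;-\; \mathbb{E}\!\left[\,Y_{t=1}-Y_{t=0}\mid D=0,\,X=x\,\right],
% identified under a conditional parallel-trends assumption on the
% untreated potential outcomes:
\mathbb{E}\!\left[\,Y_{t=1}(0)-Y_{t=0}(0)\mid D=1,\,X=x\,\right]
 \;=\; \mathbb{E}\!\left[\,Y_{t=1}(0)-Y_{t=0}(0)\mid D=0,\,X=x\,\right].
```

A causal forest can then be understood as a data-adaptive way of estimating the function \(\tau(x)\) over the covariate space.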
The starting point for the second and third papers of this dissertation is the second point of interchange. There is a tendency to argue that the strength of ML techniques lies in their superior predictive capacity. However, above and beyond the idea that superior prediction can be useful in causal inference problems, developments in the ML community call this dictum into question: techniques to model causal relations and to identify them from observational data are emerging (for a survey, see Peters et al. 2017).
A central tenet of causal machine learning is that the observed joint distribution of a number of random variables contains causal information in the form of invariance properties. This causal information can be exploited by appropriate statistical techniques, even in the absence of quasi-experimental variation. In that sense, the causal machine learning literature offers novel pathways to causal understanding that are not yet exploited in economics. The originality of the second and third papers lies in exploring the potential of these novel pathways.
In the second paper, we propose a test for reverse causality that relies on the insight that functional-form assumptions can help identify the causal direction between two observed variables. Two contributions are made. First, we extend existing research from the computer science community on the identifiability of the causal direction by addressing heteroskedastic error structures and the presence of additional control variables. Second, we provide a test for reverse causality that does not rely on instruments.
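The underlying identifiability idea can be sketched informally, following a standard result from the additive-noise-model literature surveyed in Peters et al. (2017):

```latex
% Additive noise model in the causal direction:
Y = f(X) + N, \qquad N \perp\!\!\!\perp X .
% Identifiability (informal): for generic choices of f, the distribution
% of X, and the distribution of N -- the linear-Gaussian case being the
% notable exception -- there exists no backward model
X = g(Y) + \tilde{N}, \qquad \tilde{N} \perp\!\!\!\perp Y ,
% so the direction in which an independent additive noise term can be
% found reveals the causal direction.
```

Testing for independence between the regressor and the residual in each direction thus yields an asymmetry that can be exploited for causal discovery from purely observational data.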
In the third paper, I propose a test for instrument validity that relies on a method proposed by Janzing & Schölkopf (2018) to quantify confounding in multivariate linear models. Given the often controversial identifying assumptions in instrumental variable models, whose justification is rarely statistically grounded, such a method is a valuable addition to the empirical economics toolkit. Two contributions are made. First, I address a limitation of Janzing & Schölkopf (2018), namely that their method provides only an overall degree of confounding for the whole model, and I show how it can be used to estimate the degree of confounding of a single covariate in multivariate linear models. Second, I show how this method can be employed to test for instrument validity in instrumental variable models and provide an empirical application.
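To make the notion of confounding in a multivariate linear model concrete, consider the following textbook derivation (the notation is illustrative and not necessarily that of Janzing & Schölkopf 2018):

```latex
% Linear structural model with a hidden confounder Z:
X = E + b\,Z, \qquad Y = a^{\top}X + c\,Z + F,
% with E, Z, F mutually independent. OLS of Y on X recovers
\hat{a} \;=\; \Sigma_{XX}^{-1}\,\mathrm{Cov}(X,Y)
        \;=\; a + c\,\Sigma_{XX}^{-1}\,\mathrm{Cov}(X,Z),
% so the observed regression vector deviates from the causal vector a
% by a confounding term.
```

Quantifying the relative size of this confounding term from observational data alone, under suitable genericity assumptions, is the aim of the method this paper builds on.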