Project Details

Learning from high-dimensional, heterogeneous data: Machine learning methods in econometrics

Subject Area: Statistics and Econometrics
Term: since 2020
Project identifier: Deutsche Forschungsgemeinschaft (DFG), Project number 431701914
Due to advancing digitalization, large data sets with many potential predictor variables covering different areas of human behaviour are now available. Data from microeconomic applications often combine heterogeneous data sources, involve endogenous variables of interest, and grow in dimension. Analysing such large, high-dimensional data therefore requires tailor-made machine learning methods. In this project, we intend to develop and extend machine learning methods for two settings: heterogeneous treatment effects in randomized experiments, and heterogeneity modelled through random coefficient regression models in consumer demand analysis. Within this framework, we will investigate the least absolute shrinkage and selection operator (LASSO), the adaptive LASSO, the causal variant of random forests, and boosting. A focus will be on the variable selection properties of these methods, partly in combination with the recently introduced knockoff methodology for controlling the false discovery rate. On the methodological side, we will derive theoretical guarantees for the proposed methods and investigate efficient ways of generating knockoff variables. On the applied side, we will implement the methods in freely available software packages and apply them to consumer demand data and to data from randomized experiments.
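To illustrate how knockoffs yield false discovery rate control in variable selection, here is a minimal sketch of the selection step of the knockoff+ filter of Barber and Candès. It assumes that a feature statistic W_j has already been computed for each predictor (for example, the difference between the absolute LASSO coefficients of the original variable and of its knockoff copy); generating the knockoff variables themselves is not shown. The function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical and are not taken from the project's software.

```python
import math
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Knockoff+ threshold for feature statistics w at target FDR level q.

    A large positive w[j] is evidence that variable j matters; for null
    variables the sign of w[j] is assumed to behave like a fair coin flip.
    """
    # Candidate thresholds: the distinct non-zero magnitudes, in increasing order.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Statistics at or below -t estimate how many of the statistics
        # at or above +t are false positives.
        n_neg = sum(1 for x in w if x <= -t)
        n_pos = sum(1 for x in w if x >= t)
        # The "+1" in the numerator distinguishes knockoff+ from the plain
        # knockoff filter; it is what gives finite-sample FDR control.
        if (1 + n_neg) / max(1, n_pos) <= q:
            return t
    # No candidate threshold meets the target level: select nothing.
    return math.inf


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices of the variables selected by the knockoff+ filter."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Hypothetical statistics for eight candidate predictors.
    w = [5.0, 4.0, 3.0, 2.5, 2.0, -0.5, 0.3, 0.1]
    print(knockoff_select(w, q=0.25))  # [0, 1, 2, 3, 4]
```

With q = 0.25 the smallest admissible threshold is 2.0, so the five largest statistics are selected; with a stricter q = 0.1 no threshold qualifies and the filter returns an empty selection rather than risk exceeding the target rate.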
DFG Programme: Research Grants
International Connection: China, United Kingdom, USA
