Project Details
Integration of additional micro data sets in the core data
Subject Area
Empirical Social Research
Term
from 2018 to 2024
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 316511172
The foundation for small-scale segmented microsimulations is a suitable base population that realistically reproduces individual and geographical structures. Subproject 1 (TP 1) concentrates on building a base dataset from register and census data. For this purpose, further variables for the primary school simulation are implemented using additional data sources. TP 2 is concerned with a comprehensive expansion of this database in order to enable research in the fields of health care and migration, which are the research topics of subprojects 4 and 5.
For this data expansion, a toolbox of methods is built and extended, which allows further topic-related sets of variables to be integrated beyond the two topics of care and migration. In doing so, the microstructures, in the form of individual combinations of characteristics as well as household relations, should be preserved as well as possible. Therefore, methods of synthetic data generation in particular are applied.
The data-generating process has to distinguish between micro and macro structures, with special emphasis on the harmonization of both levels. In order to capture these structures, social-scientific surveys such as the microcensus or SOEP as well as further topic-specific datasets are used. In addition, the database should reproduce regional differences, since these are fundamental for carrying out geographically differentiated microsimulations. Because regionally differentiated data in many surveys are of low quality owing to small sample sizes, special statistical procedures from small area estimation are used.
The whole expansion of the base population is conducted in three essential processes. In the modelling process, the structures of the variables are first captured as well as possible; in the prediction process, the variables are then added to the existing database. The estimated models as well as the generated distributions are checked in the validation process.
This validation is done both individually for each added variable and in a multivariate analysis for the entire base population.
To ensure the plausibility of the overall base population, different methods of statistical and logical editing are used. The entire program of methods will be made available to the research community in the spirit of Open and Reproducible Research.
DFG Programme
Research Units