Project Details

Record Matching with Similarity Encoding: A Supervised Learning Recommendation Framework for Linking Survey Data.

Subject Area: Statistics and Econometrics
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Empirical Social Research
Term: since 2024
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 539465691
 
Recent research in economics and the social sciences uses the German Socio-Economic Panel (SOEP) and the Allgemeine Bevölkerungsumfrage der Sozialwissenschaften (ALLBUS, the German General Social Survey) to study how external shocks (e.g., environmental or social) affect people's political preferences. For instance, Sabet et al. (2022) used SOEP data to demonstrate that right-wing terrorist attacks in different regions of the country lead to an increase in right-wing populist voting. Huber & Yendell (2019), in turn, used ALLBUS data to argue that Christian religiosity is negatively associated with right-wing attitudes and voting behavior. However, the impact of such shocks on a region largely depends on the public's resilience, which is influenced by the actions of regional and federal politicians. Surprisingly, previous research has not investigated the actions of politicians representing affected areas in the aftermath of these events. Since the SOEP and ALLBUS surveys lack information on political representatives at various levels, this aspect remains unexplored. Ignoring politicians' reactions in quantitative analyses may bias estimates, because these reactions play a crucial role in shaping public opinion (e.g., Matsubayashi et al., 2014), and may hinder a comprehensive understanding of causal mechanisms. To address this limitation, we propose linking politicians from different levels of Germany's federal structure with survey responses using the geographic identifiers in SOEP and ALLBUS. We source information on elected politicians and their opponents from abgeordnetenwatch.de, covering European, federal, and state elections across multiple legislative periods. Additionally, to facilitate Natural Language Processing (NLP)-based analyses of politicians' reactions, we suggest links to their social media accounts (e.g., Twitter, Facebook) and Wikidata entries.
Note, however, that abgeordnetenwatch.de also includes public questions and the politicians' corresponding answers, which makes the social media links optional; social media data may also introduce noise due to non-political content. Central to this proposal is Entity Matching (EM). Where a common identifier is absent, we intend to employ our recently developed EM framework, MLMATCH (Karapanagiotis & Liebald, 2023). This framework combines state-of-the-art EM techniques based on Artificial Neural Networks (ANN) with a novel similarity encoding derived from economic matching techniques. Its domain-independent EM performance and scalability make it well suited to our proposal. We have already developed an alpha version of the software and plan to release a stable version publicly within the DFG infrastructure Priority Programme "New Data Spaces for the Social Sciences" in the coming years.
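To make the idea of matching without a common identifier more concrete, the following is a minimal illustrative sketch, not MLMATCH itself. It assumes that a candidate record pair can be encoded as a vector of per-field string similarities and that a fixed threshold on the mean similarity stands in for the supervised ANN classifier. The field names (`name`, `party`, `state`), the helper functions, and the placeholder records are all hypothetical and are not taken from the project description or from any actual data source.

```python
from difflib import SequenceMatcher

# Hypothetical field names: the real schemas of abgeordnetenwatch.de,
# social media profiles, and Wikidata are not given in the project text.
FIELDS = ("name", "party", "state")


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def encode_pair(left: dict, right: dict) -> list[float]:
    """Encode a candidate record pair as a vector of per-field similarities."""
    return [field_similarity(left.get(f, ""), right.get(f, "")) for f in FIELDS]


def best_match(record: dict, candidates: list[dict], threshold: float = 0.8):
    """Return the candidate with the highest mean similarity, or None.

    Assumption: a fixed threshold replaces the trained classifier that a
    supervised framework would apply to the similarity vector.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        vec = encode_pair(record, cand)
        score = sum(vec) / len(vec)
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None


# Placeholder records (fictional names, no real persons or accounts).
politician = {"name": "Erika Mustermann", "party": "Partei A", "state": "Hessen"}
profiles = [
    {"name": "Erika Musterman", "party": "Partei A", "state": "Hessen", "handle": "erika"},
    {"name": "Max Mustermann", "party": "Partei B", "state": "Bayern", "handle": "max"},
]

match = best_match(politician, profiles)
print(match["handle"] if match else "no match")
```

In a supervised setting, the vectors returned by `encode_pair` would serve as features for a model trained on labelled matching and non-matching pairs, so the threshold above is only a placeholder for that step.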
DFG Programme: Infrastructure Priority Programmes
 
 

