Project Details

Analyzing Historical and Cross-Cultural Bias through the lens of NLP (HistoBias)

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term since 2025
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 544712340
 
Bias has become a key concept in Natural Language Processing (NLP). One reason is that increasingly mature NLP technologies risk negatively affecting society by reinforcing societal biases hidden in the underlying methods and data. In this project, we concern ourselves with social biases such as migrant or gender bias, where key NLP technologies may make decisions deemed unfair because they discriminate against certain actors and sub-populations. Our main focus is on the detection and analysis of social bias cross-temporally and cross-culturally, an aspect largely neglected in current research, which has instead concentrated on the analysis of bias in modern English. In particular, we will analyze how multidimensional social bias relating to gender and migration status has changed over time and across cultures in four languages: German, English, Italian and Czech.

We will base our analysis on two types of corpora reflecting different subsets of society: political debates (since the 1950s) and social media (over the last 15 years). To enable our analysis, we will first collect corpora in the different languages and preprocess them to handle various types of language variation, in particular historical spelling differences and meaning shifts of individual words. After preprocessing, we will design multilingual bias detection analyses on top of static and contextualized text representations, extending tests such as the Word Embedding Association Test (WEAT) and leveraging varying so-called "templates". We focus on the detection of multidimensional bias (e.g., the intersection of being female and a migrant), an emergent topic in NLP.

To ensure interpretability of our results, we will complement our automatic embedding-based bias detection tests with fine-grained human and automatic annotation aided by social and historical scientists. These annotations will allow interpretable insights into how biases develop over time and across languages. We will further complement and validate our automatic text-based results with classical repeated cross-sectional surveys established in the social sciences, among others survey data from the Microcensus, ALLBUS, the Eurobarometer and the European Social Survey.

Our study will fill the gap in the NLP literature with respect to text-based analysis of bias in a historical and cross-cultural context, which can put current results on bias (for modern English) into perspective. We hope to initiate fruitful new research directions that foster the understanding of how social biases have developed over time and across cultures, and how they are likely to develop in the future, yielding better empirically supported insights into the variability of bias. This better understanding will in turn contribute to greater fairness of machine learning approaches built on textual data.
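To make the embedding-based testing concrete: below is a minimal, illustrative Python sketch of the classical WEAT statistic referenced above, computed over static word embeddings with cosine similarity. It is not part of the project description itself; the embedding lookup "emb" and the example word sets are hypothetical placeholders, and the project's planned extensions (multilingual, multidimensional, template-based) would build on top of such a measure.

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two embedding vectors.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    def association(w, A, B, emb):
        # s(w, A, B): mean similarity of word w to attribute set A
        # minus its mean similarity to attribute set B.
        return (np.mean([cosine(emb[w], emb[a]) for a in A])
                - np.mean([cosine(emb[w], emb[b]) for b in B]))

    def weat_effect_size(X, Y, A, B, emb):
        # Standardized differential association (effect size) between
        # target sets X and Y with respect to attribute sets A and B.
        s_X = [association(x, A, B, emb) for x in X]
        s_Y = [association(y, A, B, emb) for y in Y]
        return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

    # Hypothetical usage: emb maps words to vectors, e.g. loaded from
    # decade-specific embeddings trained on the historical corpora.
    # X, Y could be female vs. male terms; A, B career vs. family terms.
    # A positive effect size indicates that X is more strongly associated
    # with A (and Y with B) than vice versa.

Comparing this effect size across embeddings trained on different decades, languages or corpora is one simple way such a measure could support the cross-temporal and cross-cultural analysis described above.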
DFG Programme Research Grants
International Connection Spain
Cooperation Partner Professor Paolo Rosso
 
 
