Crosslingual Language Varieties: A Multifaceted Investigation
Final Report Abstract
The research project Crosslingual Language Varieties was motivated by the ubiquity of bilingualism and multilingualism in today’s world, and aimed to advance our knowledge of the diversity of language varieties used by speakers of one, two, or more languages. We posed several research questions about the similarities and differences among language varieties (including native language, learner language, advanced nonnative language, and translation), and proposed three research methods that complement and inform each other: computational analysis of large corpora; deep linguistic analysis of smaller, tightly controlled corpora; and psycholinguistic experiments. To address these questions, we placed great emphasis on building infrastructure, namely corpora, annotation schemes, and other language resources. A further focus was the theoretical justification of the predominant native-speaker comparison paradigm. Language resources developed as part of the project include the Hebrew Learner Corpus, with a detailed annotation scheme for target hypotheses, and a subset of the Falko corpus with deep annotation of reflexivity, based on an annotation scheme developed for this project. The main research contributions of the project address the differences between native and learner language, considering possible first language (L1) influence as a potential explanation for such differences. In one study, based on the Hebrew corpus data introduced above, we aimed to identify non-native essay texts, attribute the learners’ L1, and predict their proficiency level. For this, we trained a model that sought the most influential linguistic characteristics, and the most effective combination of them, for performing these classifications.
In another study, we used computational methods to test whether learners of English tend to prefer words stemming from their own L1 language family over etymologically different synonyms. We confirmed that this tendency is indeed evident in learners’ essays, and that it declines with rising proficiency. Yet another contribution uses a small, deeply annotated corpus of German as a foreign language to explore the underuse of reflexives in learner essays, finding that learners cope surprisingly well with patterns that require learning by rote. The insights gained from the project lie mainly in the area of second language acquisition and concern different kinds of L1 influence on an L2. The discussion of different methodological approaches to crosslinguistic influence was especially fruitful for the setup of infrastructure, but also for the evaluation of research results in light of statistical robustness and linguistic adequacy.
Publications
-
A Challenge for Contrastive L1/L2 Corpus Studies: Large Inter- and Intra-Individual Variation Across Morphological, but Not Global Syntactic Categories in Task-Based Corpus Data of a Homogeneous L1 German Group. Frontiers in Psychology, 12.
Shadrova, Anna; Linscheid, Pia; Lukassek, Julia; Lüdeling, Anke & Schneider, Sarah
-
FALKO. Eine Familie vielseitig annotierter Lernerkorpora des Deutschen als Fremdsprache. Korpora Deutsch als Fremdsprache (KorDaF), Vol. 2, Issue 2.
Hagen Hirschmann, Anke Lüdeling, Anna Shadrova, Dominique Bobeck, Martin Klotz, Roodabeh Akbari, Sarah Schneider & Shujun Wan
-
Predicting the Proficiency Level of Nonnative Hebrew Authors. Proceedings of the Language Resources and Evaluation Conference, pp. 5356-5365. Marseille, France.
Isabelle Nguyen & Shuly Wintner
-
The Hebrew Essay Corpus. Proceedings of the Language Resources and Evaluation Conference, pp. 5580-5586. Marseille, France.
Chen Gafni, Anat Prior & Shuly Wintner
-
Annotationsrichtlinien für Reflexive in Falko (1.0.0). Zenodo.
Sarah Schneider & Dominique Bobeck
-
Do more proficient writers use fewer cognates in L2? A computational approach. Bilingualism: Language and Cognition, 27(1), 84-94.
Nativ, Liat; Nov, Yuval; Ordan, Noam; Wintner, Shuly & Prior, Anat
-
Analyzing learner language: the case of the Hebrew Learner Essay Corpus. Language Resources and Evaluation, 59(2), 685-726.
Gafni, Chen; Herzig Sheinfux, Livnat; Klunover, Hadar; Bar Siman Tov, Anat; Prior, Anat & Wintner, Shuly
