Integrating Collaborative and Linguistic Resources for Word Sense Disambiguation and Semantic Role Labeling (InCoRe)
Final Report Abstract
Text analysis tools in natural language processing crucially rely on lexical-semantic resources and labeled datasets. Such resources and datasets are typically created by expert linguists, a time-consuming and expensive effort. Therefore, user-generated resources, such as Wikipedia and Wiktionary, have gained popularity in recent years. Altogether, various lexical resources have been created in many languages; they contain different, complementary types of linguistic and semantic information in heterogeneous formats. The main goal of InCoRe is to comprehensively integrate those lexical resources on the semantic level in a single linked lexical resource, UBY. This entails that all the information that is available in the different linked resources for a specific word sense, for instance for the sense "to understand" of the verb "to get", can be accessed directly from a single database. This will facilitate the use of and further research on resource-based natural language processing applications. The second goal is to show how the integrated resources can be used to benefit and improve resource-based natural language processing applications like word sense disambiguation (WSD) and semantic role labeling (SRL). Presented with a word in context, WSD identifies which word sense is used, for instance the sense Commerce_sell of the verb "sell" in "Daimler sold Chrysler". SRL adds more semantic information by labeling the participants in an event with their semantic functions, e.g., identifying that Chrysler is the sold Goods in the sentence "Daimler sold Chrysler". In InCoRe, we successfully addressed these two goals: first, we created the linked lexical resource UBY, which links ten lexical-semantic resources on the sense level and, for all resources with semantic roles, additionally on the role level, for two languages, English and German. We freely publish ready-to-use UBY databases in agreement with the licenses of the contained resources.
Furthermore, the software to create and access the UBY databases, including a Java API, is continuously extended and published as an open-source project. A series of tutorials on how to create and use UBY databases complements our efforts to support the research community, and the feedback on mailing lists shows that the community of UBY users is growing continuously. We then showed that a linked lexical resource like UBY benefits text analysis tools for the two tasks of word sense disambiguation and semantic role labeling. For word sense disambiguation, we developed an open-source Java library for knowledge-based WSD and successfully employed UBY for the creation of coarse-grained sense inventories, the basis for coarse-grained WSD. Coarse-grained word sense disambiguation is particularly relevant for practical applications, for instance machine translation or question answering. We were also able to show that UBY benefits supervised WSD, which, unlike knowledge-based WSD, makes use of sense-labeled datasets: we developed a novel method for automatically creating sense-labeled data that makes use of UBY in the paradigm of distant supervision. Regarding SRL, we extended the distant supervision-based method to FrameNet semantic role labels, successfully creating role-labeled training data that are complementary to existing manually labeled data. Experiments on different sense inventories, languages (English and German), and test datasets from various domains show that our approach can be successfully used to create large-scale sense- and role-labeled training data, which can in turn be used to train high-quality supervised WSD and SRL systems. The created datasets and software are publicly available on the website of the UKP Lab. We furthermore created a gold-standard dataset of user-generated questions and answers labeled with word senses and roles from FrameNet.
This allows us to evaluate our methods on the new domain of user-generated Web texts, which is increasingly important for practical applications. In a follow-up project, we will study the benefits of linked lexical resources like UBY and of the developed methods for the automatic answering of user-generated questions.
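The knowledge-based WSD task described above (choosing a sense for a word in context using only a lexical resource) can be illustrated with a minimal sketch of the simplified Lesk algorithm, a standard gloss-overlap baseline. This is not the project's implementation and does not use the UBY API: the toy sense inventory, its glosses, the stopword list, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical toy sense inventory. The glosses are illustrative only and
# are NOT taken from UBY or any of the resources it links.
TOY_INVENTORY = {
    "get": {
        "obtain": "come into possession of something; receive or acquire an object",
        "understand": "grasp the meaning of something such as a joke, idea or point",
    }
}

# Small hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {
    "a", "an", "the", "of", "or", "as", "such", "to",
    "i", "it", "did", "not", "something", "into",
}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def simplified_lesk(lemma, context, inventory=TOY_INVENTORY):
    """Return the sense whose gloss shares the most tokens with the context.

    Ties are resolved in favour of the sense listed first in the inventory.
    """
    context_tokens = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory[lemma].items():
        overlap = len(context_tokens & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(simplified_lesk("get", "I did not get the joke"))
    print(simplified_lesk("get", "She will get a new object from the shop"))
```

In this sketch, a linked resource would matter because it could supply several glosses and example sentences per sense, drawn from different resources, which gives the overlap measure more material to work with than a single gloss.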
Publications
-
2013. DKPro WSD: A Generalized UIMA-based Framework for Word Sense Disambiguation. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics – System Demonstrations (ACL 2013), p. 37-42, Sofia, Bulgaria
Tristan Miller, Nicolai Erbs, Hans-Peter Zorn, Torsten Zesch and Iryna Gurevych
-
2013. FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), vol. 1, p. 1363-1373, Sofia, Bulgaria
Silvana Hartmann and Iryna Gurevych
-
2014. Automated Verb Sense Labelling Based on Linked Lexical Resources. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), p. 68-77, Gothenburg, Sweden
Kostadin Cholakov, Judith Eckle-Kohler and Iryna Gurevych
-
2014. Lexical Substitution Dataset for German. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), p. 1406-1411, Reykjavik, Iceland
Kostadin Cholakov, Chris Biemann, Judith Eckle-Kohler and Iryna Gurevych