Infrastructure for Interactive Distributed Exploration and Annotation of Large Corpora and Knowledge Bases
Summary of Project Results
The project developed INCEpTION, an open-source text annotation platform. It has become a popular go-to solution for users who need to annotate text across a wide range of disciplines and use cases. The platform rests on three functional pillars: automatic annotation suggestions using machine learning, knowledge management, and search. These support in particular tasks such as named entity annotation and named entity linking, but as the wide range of use cases highlights, the platform is by no means limited to such tasks. INCEpTION has evolved to fully replace the earlier WebAnno tool while remaining largely backwards compatible and offering WebAnno users an easy upgrade path.
With respect to automatic annotation suggestions, INCEpTION not only ships with several machine learning algorithms out of the box, but also allows users to connect custom external machine learning algorithms or even text analysis services such as the European Language Grid and CLARIN WebLicht. With respect to knowledge management, INCEpTION supports the widely used SPARQL protocol, which allows connecting to many knowledge providers, such as those offered by the ZBW Leibniz Information Centre for Economics, DBpedia, Wikidata, the Food and Agriculture Organization (FAO) of the United Nations, and many more.
In August 2022, approximately 350 active installations in 74 countries were sending us anonymous usage data, with the majority of installations in Germany, the U.S.A., China and France. Considering that up to 70% of installations may have opted out of anonymous data submission, the actual number of installations may exceed 1,000. Over 430 people have starred the INCEpTION project on GitHub. Over 200 people have subscribed to the project's mailing list and generated over 1,300 messages in over 400 conversations. By August 2022, over 160 users had opened over 400 issues with bug reports, feature requests and questions on our GitHub issue tracker, contributing to the total of over 1,900 issues tracked there at the time (1,700 of which had been resolved).
Based on the interaction with the user community during the project, we deviated in several respects from the original ideas presented in the project proposal. The most significant deviation was the switch from a focus on compute clusters for data processing to interoperability with GPU-based machine learning. The advent of GPU-enabled deep learning caused the community to focus strongly on this technology, and artificial neural networks have achieved remarkable new results in recent years. It became clear early in the project that we needed to support these new technologies, in particular through the ability to interface with popular Python-based packages. The architectural changes involved in this switch also enabled us to interoperate with external text analysis services such as those offered by the European Language Grid, CLARIN WebLicht, LAPPS Grid (deprecated) or Hugging Face (experimental).
Project-Related Publications (Selection)
-
A tool for extracting sense-disambiguated example sentences through user feedback. Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 69-72.
Boullosa, Beto; de Castilho, Richard Eckart; Geyken, Alexander; Lemnitzer, Lothar & Gurevych, Iryna
-
INCEpTION - A community-oriented smart semantic annotation platform (poster) at a workshop organized by the Digital Humanities Initiative in the RMU Network (DH-RMU) in Mainz
Beto Boullosa
-
INCEpTION - A Community-Oriented Smart Semantic Annotation Platform (poster) at the Amazon Research Days, Berlin
Richard Eckart de Castilho
-
INCEpTION - Corpus-based Data Science from Scratch. In Digital Infrastructures for Research (DI4R) 2018, Oct. 2018
R. E. de Castilho; J.-C. Klie; N. Kumar; B. Boullosa & I. Gurevych
-
INCEpTION: Interactive Machine-assisted Annotation. In Proceedings of the First Biennial Conference on Design of Experimental Search & Information Retrieval Systems, pages 105–105, July 2018
J.-C. Klie
-
Integrating Knowledge-Supported Search into the INCEpTION Annotation Platform. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 127-132.
Boullosa, Beto; de Castilho, Richard Eckart; Kumar, Naveen; Klie, Jan-Christoph & Gurevych, Iryna
-
Linking Text and Knowledge Using the INCEpTION Annotation Platform. 2018 IEEE 14th International Conference on e-Science (e-Science), 327-328.
de Castilho, Richard Eckart; Klie, Jan-Christoph; Kumar, Naveen; Boullosa, Beto & Gurevych, Iryna
-
The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation. In Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations, pages 5–9. Association for Computational Linguistics, June 2018
J.-C. Klie; M. Bugert; B. Boullosa; R. E. de Castilho & I. Gurevych
-
A Multi-Platform Annotation Ecosystem for Domain Adaptation. Proceedings of the 13th Linguistic Annotation Workshop, 189-194.
de Castilho, Richard Eckart; Ide, Nancy; Kim, Jin-Dong; Klie, Jan-Christoph & Suderman, Keith
-
Beyond WebAnno: The INCEpTION Text Annotation Platform (poster) at the CLARIN Bazaar of the CLARIN Annual Conference 2019, Leipzig
Richard Eckart de Castilho
-
Towards cross-platform interoperability for machine-assisted text annotation. Genomics & Informatics, 17(2), e19.
de Castilho, Richard Eckart; Ide, Nancy; Kim, Jin-Dong; Klie, Jan-Christoph & Suderman, Keith
-
Data collection and annotation pipeline for social good projects. November 2020. AI for Social Good - AAAI Fall Symposium 2020
C. Scheunemann; J. Naumann; M. Eichler; K. Stowe & I. Gurevych
-
From Zero to Hero: Human-In-The-Loop Entity Linking in Low Resource Domains. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
Klie, Jan-Christoph; de Castilho, Richard Eckart & Gurevych, Iryna
-
Improving QA Generalization by Concurrent Modeling of Multiple Biases. Findings of the Association for Computational Linguistics: EMNLP 2020, 839-853.
Wu, Mingzhu; Moosavi, Nafise Sadat; Rücklé, Andreas & Gurevych, Iryna
-
Annotation Curricula to Implicitly Train Non-Expert Annotators. Computational Linguistics, 48(2), 343-373.
Lee, Ji-Ung; Klie, Jan-Christoph & Gurevych, Iryna
