Project Details

Interactive distributed corpus exploration and annotation infrastructure for large corpora and knowledge-bases

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2016 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 315979217
 
Final Report Year 2022

Final Report Abstract

The project has developed INCEpTION, an open source text annotation platform. It has become a popular go-to solution for users who need to annotate text across a wide range of disciplines and use-cases. The INCEpTION annotation platform incorporates three functional pillars: automatic annotation suggestions using machine learning, knowledge management, and search. These support in particular tasks such as named entity annotation and named entity linking, but as the breadth of use-cases indicates, the platform is by no means limited to such tasks. INCEpTION has evolved to fully replace the earlier WebAnno tool, while still remaining largely backwards compatible and offering WebAnno users an easy upgrade path. With respect to automatic annotation suggestions, INCEpTION not only ships with several machine learning algorithms out of the box, but also allows users to connect custom external machine learning algorithms or even text analysis services such as European Language Grid and CLARIN WebLicht. With respect to knowledge management, INCEpTION supports the widely used SPARQL protocol, which allows connecting to many knowledge providers such as the ZBW Leibniz Information Centre for Economics, DBpedia, Wikidata, the Food and Agriculture Organization (FAO) of the United Nations and many more.

In Aug 2022, ca. 350 active installations in 74 countries were sending us anonymous usage data, with the majority of installations in Germany, the U.S.A., China and France. Considering that up to 70% of the installations may have decided to opt out of anonymous data submissions, the actual number of installations may be over 1 000. Over 430 people have starred the INCEpTION project on GitHub. Over 200 people have subscribed to the project's mailing list and generated over 1 300 messages in over 400 conversations. By Aug 2022, over 160 users had opened over 400 issues with bug reports, feature requests and questions on our GitHub issue tracker, contributing to the total of over 1 900 issues tracked there at the time (1 700 of which had been resolved).

Based on the interaction with the user community during the project, we deviated in various aspects from the original ideas presented in the project proposal. The most significant deviation was the switch from focusing on compute clusters for data processing to being interoperable with GPU-based machine learning. The advent of GPU-enabled deep learning caused the community to focus strongly on this technology, and artificial neural networks have achieved remarkable new results over the last years. It became clear early in the project that we needed to support these new technologies, in particular through the ability to interface with popular Python-based packages. The architectural changes involved in this switch also enabled us to interoperate with external text analysis services such as those offered by the European Language Grid, CLARIN WebLicht, LAPPS Grid (deprecated) or Hugging Face (experimental).

