Event-Based Exploration and Analysis of Linked Open Data
Summary of Project Results
Given the continuous and almost exponential increase of textual data on the Web, developing novel and sophisticated methods for the extraction, exploration, and analysis of information from such texts remains a major challenge in many research disciplines. While there is a plethora of "interesting" information to be extracted from text, the focus of this joint project between researchers from the TU Ilmenau and Heidelberg University was on events. Event descriptions, typically composed of a location, a time, and the actors involved in an event, provide an important means to analyze and explore complex phenomena. They are a key ingredient for constructing timelines, detecting geographic or temporal hot spots of activity, or simply providing a chronological summary for a location or actor. In this project, called EventAE (for Event Analysis and Exploration), we contributed to this challenge by providing the community with
• a comprehensive pipeline to extract event information from diverse types of textual documents,
• an event query and analysis framework called STARK (for Spatio-Temporal data Analytics on spaRK), and
• a publicly accessible event data repository containing large-scale data sets related to events as well as locations and actors.
The event extraction pipeline combines standard tools for Named Entity Recognition (NER) with novel methods for constructing event specifications from both English and German texts; its novelty lies in particular in modeling correlations among event components using networks. The STARK framework supports querying and processing spatio-temporal data more comprehensively and efficiently than existing approaches. It also serves as the core backend for querying, analyzing, and exploring the event data extracted using the NER pipeline.
The event repository provides the community with diverse data sets, ranging from gazetteer-like specifications of locations and actors (in combination with Wikidata) to event specifications extracted from diverse text sources such as Wikipedia and German and English news outlets. All components are readily accessible through the project website and can be used for extensions and comparisons. We envision that the results of this project will not only have an impact on future research in event extraction and exploration from text data, but that the framework developed in this project will also provide other research communities with an infrastructure for performing text analysis tasks more effectively. The interest in event-related information is not specific to computer science, computational linguistics, or NLP; it has been and continues to be of central importance in many fields in the social sciences, the humanities, and medicine. These communities continue to struggle with an ever-growing number of text corpora that need to be explored efficiently with respect to domain-specific research questions. We hope that the results, methods, and tools developed in this project provide these communities, and others, with an easy entry into the field of data science and text analysis. Collaborations with researchers from these communities during the project have clearly shown the need for, the utility of, and in particular the interest in the results obtained in this joint project.
Project-Related Publications (Selection)
- A Framework for Scalable Correlation of Spatio-temporal Event Data. 30th British International Conference on Databases, BICOD 2015, LNCS 9147, 9-15
Stefan Hagedorn, Kai-Uwe Sattler, Michael Gertz
(See online at https://doi.org/10.1007/978-3-319-20424-6_2)
- Beyond Friendships and Followers: The Wikipedia Social Network. 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015, ACM, 472-479
Johanna Geiß, Andreas Spitz, Michael Gertz
(See online at https://doi.org/10.1145/2808797.2808840)
- Complex Event Processing on Linked Stream Data. Datenbank-Spektrum 15(2): 119-129 (2015)
Omran Saleh, Stefan Hagedorn, Kai-Uwe Sattler
(See online at https://doi.org/10.1007/s13222-015-0190-5)
- Refining imprecise spatio-temporal events: a network-based approach. Proceedings of the 10th Workshop on Geographic Information Retrieval, GIR 2016, 5:1-5:10
Andreas Spitz, Johanna Geiß, Michael Gertz, Stefan Hagedorn, Kai-Uwe Sattler
(See online at https://doi.org/10.1145/3003464.3003469)
- Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events. Proceedings of the 39th International Conference on Research and Development in Information Retrieval (SIGIR), ACM, 2016, 503-512
Andreas Spitz, Michael Gertz
(See online at https://doi.org/10.1145/2911451.2911529)
- With a Little Help from my Neighbors: Person Name Linking Using the Wikipedia Social Network. Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Companion Volume, ACM, 2016, 985-990
Johanna Geiß, Michael Gertz
(See online at https://doi.org/10.1145/2872518.2891109)
- EVELIN: exploration of event entity links in implicit networks. Proceedings of the 26th International Conference on World Wide Web, Companion Volume, 2017, 273-277
Andreas Spitz, Satya Almasian, Michael Gertz
(See online at https://doi.org/10.1145/3041021.3054721)
- HeidelPlace: An Extensible Framework for Geoparsing. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017 (System Demonstrations), 2017, 85-90
Ludwig Richter, Johanna Geiß, Andreas Spitz, Michael Gertz
(See online at https://doi.org/10.18653/v1/D17-2015)
- The STARK Framework for Spatio-Temporal Data Analytics on Spark. BTW 2017: 123-142
Stefan Hagedorn, Philipp Götze, Kai-Uwe Sattler