Asked and Answered: Intelligent Data Science in Software Projects
Final Report Abstract
Stakeholders of software development projects (e.g. managers and software developers) make their decisions “mostly on the basis of a gut feeling”. A possible explanation is limited time combined with the growing mass of data that accumulates in software and system engineering projects, which makes searching for specific information hard and time-consuming. The goal of this project was therefore to provide a question answering solution that helps stakeholders find the information they need quickly and easily and thereby supports their decision-making. Such a solution can also benefit process improvement, safety analysis, and a myriad of other software engineering tasks. In this project, we applied data management and access techniques to the specific domain of software engineering processes. The joint work of our research groups helped us understand each other's research focus and integrate those perspectives into the effort to achieve our joint research objectives. The following outcomes resulted from the project:
1. Conceptual framework for question formulation on software artifacts:
• Overview of information needs of stakeholders from the software engineering domain
2. Extraction and semantic enrichment of information about software artifacts:
• Knowledge bases – OLAP-like database and RDF triples – containing data addressing information needs of stakeholders from the software engineering domain
• Identification of trace links in the constructed knowledge base
• Enrichment of the constructed knowledge base through machine learning and information retrieval techniques
3. Query posing and question answering:
• Literature review on different Text-to-SQL approaches
• Software engineering dataset to train Text-to-SQL approaches
• Revised Text-to-SQL approach
• Construction of SQL queries for the software engineering domain
• Approach for knowledge base agnostic transformation of natural language to SQL and SPARQL
• Decision support to distinguish different search/answer scenarios based on the detected information need
Publications
- From Natural Language Questions to SPARQL Queries: A Pattern-based Approach. In: Datenbanksysteme für Business, Technologie und Web (BTW 2019), 18. Fachtagung des GI-Fachbereichs Datenbanken und Informationssysteme (DBIS), LNI, GI, Rostock, Germany, March 2019
Nadine Steinmetz, Ann-Katrin Arning, Kai-Uwe Sattler
(See online at https://doi.org/10.18420/btw2019-18)
- Structured information in bug report descriptions - influence on IR-based bug localization and developers, Softw. Qual. J. 27 (2019) 1315–1337
M. Rath, P. Mäder
(See online at https://doi.org/10.1007/s11219-019-09445-6)
- The SEOSS 33 dataset — Requirements, bug reports, code history, and trace links for entire projects, Data in Brief, Volume 25, 2019
Michael Rath, Patrick Mäder
(See online at https://doi.org/10.1016/j.dib.2019.104005)
- Traceability in the wild: Automatically augmenting incomplete trace links, in: SE/SWM, volume P-292 of LNI, GI, 2019, p. 63
M. Rath, J. Rendall, J. L. C. Guo, J. Cleland-Huang, P. Mäder
(See online at https://dx.doi.org/10.18420/se2019-15)
- Question Answering on OLAP-like Data Sources. In: Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference, CEUR Workshop Proceedings, Copenhagen, Denmark, March 2020
Nadine Steinmetz, Samar Shahabi-Ghahfarokhi, Kai-Uwe Sattler
- Conversational Question Answering Using a Shift of Context. In: Proceedings of the Workshops of the EDBT/ICDT 2021 Joint Conference, CEUR Workshop Proceedings, Nicosia, Cyprus, March 2021
Nadine Steinmetz, Bhavya Senthil-Kumar, Kai-Uwe Sattler
(See online at https://doi.org/10.22032/dbt.51535)
- Reaching out for the answer: Answertype prediction. In: Proceedings of the SeMantic Answertype prediction task (SMART) at ISWC 2021 Semantic Web Challenge co-located with the 20th International Semantic Web Conference (ISWC 2021). SMART 2021, CEUR-WS, 2021
Kanchan Shivashankar, Khaoula Benmaarouf, Nadine Steinmetz
- What is in the KGQA benchmark datasets? Survey on Challenges in Datasets for Question Answering on Knowledge Graphs. Journal on Data Semantics, Springer, 2021
Nadine Steinmetz, Kai-Uwe Sattler
(See online at https://doi.org/10.1007/s13740-021-00128-9)
- SEOSS-Queries - a software engineering dataset for text-to-SQL and question answering tasks, Data in Brief, Volume 42, 2022, 108211, ISSN 2352-3409
M.T. Tomova, M. Hofmann, P. Mäder
(See online at https://doi.org/10.1016/j.dib.2022.108211)