Project Details
Using a theoretical simulation framework to analyse and develop predictive machine learning methods on networks
Applicants
Professor Dr. Daniel Memmert; Dr. Fabian Wunderlich
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Theoretical Computer Science
Term
since 2019
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 432919559
The present research project builds upon a theoretical simulation framework for the validation of predictive ratings on networks that was developed in a previous project. The simulation framework generates artificial data that replicate a full predictive process: network generation, creation of predictive ratings, and derivation of percentage forecasts from these ratings. The advantage of artificial data is that, in contrast to real data, all inherent processes can be deliberately controlled and varied. This makes it possible to analyze the exact influence of the network structure on predictive quality, and it also enables improved accuracy and profitability measures for evaluating the models. While classical statistical models were already successfully validated in the previous project, the present research project focuses on the theoretical validation and further development of predictive machine learning (hereafter abbreviated as ML) methods on networks. Data from the sports sector serve as an application example, with complex data sets from football and tennis. With regard to ML models, the project addresses supervised learning methods, which will be specified, implemented, integrated into the existing simulation framework and tested for functionality in the first work package. Four classes of models will be considered: two pure ML model classes based on Random Forests and Graph Neural Networks, and two hybrid model classes that combine ML-based methods with classical statistical methods. In the second work package, the ML-based models are validated using artificial data from the simulation framework. In particular, we aim to determine how the predictive quality of the models is influenced by varying network and data structures. This includes identifying situations in which ML, hybrid or classical models are superior to the others.
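The predictive process described above (network generation, predictive ratings, percentage forecasts, accuracy measure) can be illustrated with a minimal sketch. It is not the project's framework: the round-robin league used as the network, the logistic (Bradley-Terry-style) data-generating process, the Elo-style ratings, the Brier score as accuracy measure, and all function names (`generate_network`, `true_win_probability`, `elo_forecast`, `run_simulation`) are hypothetical choices made for illustration. Because the latent strengths are known in artificial data, the rating model's forecasts can be compared against an oracle forecast that uses the true win probabilities, which is not possible with real data.

```python
import math
import random
from itertools import combinations
from statistics import mean


def generate_network(n_teams, rounds, rng):
    """Hypothetical network: a round-robin league in which every pair of
    teams (nodes) meets `rounds` times (edges), in random order."""
    pairs = list(combinations(range(n_teams), 2))
    matches = [pair for _ in range(rounds) for pair in pairs]
    rng.shuffle(matches)
    return matches


def true_win_probability(s_a, s_b):
    """Assumed data-generating process: logistic (Bradley-Terry-style)
    win probability from latent strengths; draws are ignored."""
    return 1.0 / (1.0 + math.exp(-(s_a - s_b)))


def elo_forecast(r_a, r_b, scale=400.0):
    """Percentage forecast derived from Elo-style ratings (assumed model)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def run_simulation(n_teams=10, rounds=4, k=20.0, seed=1):
    rng = random.Random(seed)
    # Latent strengths are known exactly because the data are artificial.
    strength = [rng.gauss(0.0, 1.0) for _ in range(n_teams)]
    ratings = [1500.0] * n_teams
    model_brier, oracle_brier = [], []
    for a, b in generate_network(n_teams, rounds, rng):
        p_true = true_win_probability(strength[a], strength[b])
        p_model = elo_forecast(ratings[a], ratings[b])
        outcome = 1 if rng.random() < p_true else 0  # 1 = team a wins
        # Brier score (squared error of the forecast probability).
        model_brier.append((p_model - outcome) ** 2)
        oracle_brier.append((p_true - outcome) ** 2)
        # Zero-sum Elo update after the result is observed.
        delta = k * (outcome - p_model)
        ratings[a] += delta
        ratings[b] -= delta
    return strength, ratings, mean(model_brier), mean(oracle_brier)


strength, ratings, model_bs, oracle_bs = run_simulation()
print(f"Elo Brier score:    {model_bs:.3f}")
print(f"Oracle Brier score: {oracle_bs:.3f}")
```

Varying `n_teams`, `rounds` or the spread of the latent strengths changes the network and data structure, so the gap between the two Brier scores can be studied under controlled conditions. In this sketch, an ML or hybrid model would take the place of `elo_forecast`.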
This research question is partly motivated by the fact that in predictive processes (e.g. in economics) ML models do not yet consistently outperform traditional methods. The manipulation of input data and validation of model outputs is closely related to the question of interpretability of ML models. By analyzing model quality and identifying the strengths and weaknesses of the models, we intend to draw conclusions about potential further development of ML-based models, which can then be implemented and revalidated in the third work package. In the last work package, the ML-based models are applied to real data sets in order to ensure the transferability of theoretical insights to real-world applications. Again, the aim is to identify potential for further model improvement by analyzing the strengths and weaknesses of the models.
DFG Programme
Research Grants
Co-Investigator
Professor Dr. Ralph Ewerth