Project Details

Analysis of Dataset Shifts in Mobile Malware

Subject Area: Security and Dependability, Operating-, Communication- and Distributed Systems; Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term: 2021 to 2023
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 456292433
Final Report Year 2024

Final Report Abstract

In recent years, mobile devices have become a popular target for malware authors, resulting in a steady increase in new variants of mobile malware. Unfortunately, traditional solutions for detecting malware do not provide adequate protection against this threat, as they generally rely on manually crafted detection patterns. Researchers have therefore started to explore whether machine learning techniques can be used to derive effective detection patterns automatically. As a result, many learning-based approaches for detecting mobile malware have been proposed in recent years, showing promising results in laboratory settings. However, recent research has shown that the detection performance of learning-based approaches for mobile malware is often overestimated. A main reason for this overestimation is that the evaluation of AI-based approaches commonly assumes that the underlying data distribution does not change over time. This assumption generally does not hold for mobile malware: the distribution continuously changes over time, a phenomenon known as 'dataset shift' in learning theory. As a consequence, the detection performance of current learning approaches decreases drastically in real-world settings. Although some factors contributing to dataset shifts in this domain are already known, the exact causes have so far remained largely unclear. The goal of this research project was to develop novel techniques for analyzing the root causes of dataset shifts in mobile applications and to use the gained knowledge to improve learning-based detection systems. As a result, in collaboration with researchers from University College London, King's College London, and Technische Universität Berlin, a framework has been developed that can identify and provide insights into dataset shifts in evaluation datasets using explainable learning (XAI) techniques.
The insights gained in this way can, in turn, be used to improve the detection performance of learning-based methods and to identify potential biases in evaluation datasets. As a further outcome of this research, we identified and systematized additional pitfalls that can lead to an overestimation of the capabilities of machine learning techniques. The corresponding publication was awarded the Distinguished Paper Award at the renowned USENIX Security conference in 2022.
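The core evaluation pitfall described above can be illustrated with a small, hypothetical sketch (not taken from the project itself): on synthetic data whose "malware" class drifts toward the benign class over time, a conventional random train/test split reports higher accuracy than a temporal split that trains on past samples and tests on future ones, mirroring how a deployed detector is actually used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic two-class data with covariate shift: the "malicious" class
# starts well separated from the "benign" class and drifts toward it
# as (normalized) time t goes from 0 to 1.
n = 2000
t = np.linspace(0.0, 1.0, n)                      # normalized timestamps
y = rng.integers(0, 2, size=n)                    # 0 = benign, 1 = malicious
mal_mean = 2.0 - 2.0 * t                          # malware mean drifts 2 -> 0
X = rng.normal(loc=y[:, None] * mal_mean[:, None],
               scale=1.0, size=(n, 2))

# (a) Random split: the usual i.i.d. evaluation, which ignores time.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)
acc_random = LogisticRegression().fit(Xtr, ytr).score(Xte, yte)

# (b) Temporal split: train on the first (older) half, test on the
# second (newer) half, as a deployed detector would have to.
half = n // 2
acc_temporal = (LogisticRegression()
                .fit(X[:half], y[:half])
                .score(X[half:], y[half:]))

print(f"random split accuracy:   {acc_random:.2f}")
print(f"temporal split accuracy: {acc_temporal:.2f}")
```

Under drift, the random split leaks "future" samples into training and thus overstates performance, while the temporal split exposes the degradation that the report attributes to dataset shift.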

Publications

