Analysis of Dataset Shifts in Mobile Malware
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Final Report Abstract
In recent years, mobile devices have become a popular target for malware authors, resulting in a steady increase in new variants of mobile malware. Unfortunately, traditional solutions for detecting malware do not provide adequate protection against this threat, as they generally rely on manually crafted detection patterns. Researchers have therefore started to explore whether machine learning techniques can be used to derive effective detection patterns automatically. As a result, many learning-based approaches for detecting mobile malware have been proposed in recent years, showing promising results in laboratory settings. Unfortunately, recent research has shown that the detection performance of these learning-based approaches is often overestimated. A key reason for this overestimation is that the evaluation of learning-based approaches commonly assumes that the underlying data distribution does not change over time. However, this assumption generally does not hold for mobile malware. Instead, the distribution changes continuously over time, a phenomenon known as "dataset shift" in learning theory. As a consequence, the detection performance of current learning approaches decreases drastically in real-world settings. Although some factors contributing to dataset shifts in this domain are already known, the exact causes have so far remained largely unclear. The goal of this research project was to develop novel techniques for analyzing the root causes of dataset shifts in mobile applications and to use the gained knowledge to improve learning-based detection systems. As a result, in collaboration with researchers from University College London, King's College London, and Technische Universität Berlin, a framework has been developed that can identify and provide insights into dataset shifts in evaluation datasets using explainable AI (XAI) techniques.
These insights can, in turn, be used to improve the detection performance of learning-based methods and to uncover possible biases in evaluation datasets. As a further outcome of this research, we identified and systematized additional pitfalls that can lead to an overestimation of the capabilities of machine learning techniques. The corresponding publication received the Distinguished Paper Award at the renowned USENIX Security Symposium in 2022.
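The evaluation pitfall described above can be illustrated with a minimal, self-contained sketch. The data below is synthetic and not from the project: the "malicious" feature distribution drifts over (simulated) time, and a classifier evaluated with the common random train/test split appears stronger than the same classifier evaluated with a temporally consistent split, where training samples strictly precede test samples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: sample index corresponds to time. The class-conditional
# distribution of the "malicious" class drifts as time progresses.
n = 4000
t = np.linspace(0.0, 1.0, n)                 # normalized timestamp per sample
y = rng.integers(0, 2, n)                    # 0 = benign, 1 = malicious
X = rng.normal(size=(n, 2))
X[y == 1, 0] += 2.0 - 3.0 * t[y == 1]        # malicious mean drifts from +2 to -1

# (a) Random split: mixes past and future samples (the common pitfall).
Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(X, y, test_size=0.25, random_state=0)
acc_random = LogisticRegression().fit(Xa_tr, ya_tr).score(Xa_te, ya_te)

# (b) Temporal split: train strictly on the past, test on the future.
split = int(0.75 * n)
acc_temporal = LogisticRegression().fit(X[:split], y[:split]).score(X[split:], y[split:])

print(f"random split accuracy:   {acc_random:.2f}")
print(f"temporal split accuracy: {acc_temporal:.2f}")
```

Under drift, the random split leaks future information into training and thus overestimates real-world performance; the temporal split exposes the degradation the abstract refers to.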
Publications
- D. Arp, E. Quiring, F. Pendlebury, A. Warnecke, F. Pierazzi, C. Wressnegger, L. Cavallaro & K. Rieck. Dos and Don'ts of Machine Learning in Computer Security. In Proc. of USENIX Security Symposium, 2022.
- V. Wesselkamp, K. Rieck, D. Arp & E. Quiring. Misleading Deep-Fake Detection with GAN Fingerprints. 2022 IEEE Security and Privacy Workshops (SPW), 59-65. IEEE.
- S. Czybik, D. Arp & K. Rieck. Quantifying the Risk of Wormhole Attacks on Bluetooth Contact Tracing. Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy, 264-275. ACM.
- T. Chow, Z. Kan, L. Linhardt, L. Cavallaro, D. Arp & F. Pierazzi. Drift Forensics of Malware Classifiers. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 197-207. ACM.
- D. Arp, E. Quiring, F. Pendlebury, A. Warnecke, F. Pierazzi, C. Wressnegger, L. Cavallaro & K. Rieck. Lessons Learned on Machine Learning for Computer Security. IEEE Security & Privacy, 21(5), 72-77.
- S. Agarwal & D. Arp. Return of a New Version of Drinik Android Malware Targeting Indian Taxpayers.
- T. Chow, Z. Kan, L. Linhardt, L. Cavallaro, D. Arp & F. Pierazzi. Code repository of the drift forensics project.
