Machine Learning Methods for Information Security
Summary of Project Results
The investigation of security aspects of machine learning is an emerging research topic at the intersection of machine learning and computer security. The motivation for such interdisciplinary research arises from both fields. Security research increasingly depends on machine learning and data analysis due to the growing complexity of attacks. Conversely, the success of machine learning as a versatile tool for Internet-related businesses raises new security concerns: its increasing use opens new possibilities for abuse. At the core of the problem lies the fundamental requirement of machine learning that future data exhibit the same statistical properties as the data observed in the past. Recent evidence has demonstrated that, in many circumstances, an attacker can deceive machine learning methods by manipulating data in ways that remain inconspicuous.

Hence, the main goal of this project was to develop a new methodology for the security analysis and evaluation of machine learning methods, one that enables a quantitative characterization of the security of learning algorithms. The project has delivered both positive and negative results with respect to this goal. Simple learning algorithms such as centroid anomaly detection (sketched below) could be analyzed theoretically: it was shown that, under certain constraints imposed on the attacker, hard security guarantees can be achieved in the form of strict upper bounds on the attacker's progress. On the other hand, for most mainstream learning algorithms, efficient attacks could be developed in practice and even executed in realistic environments in which the attacker has only limited information about the deployed learning method and its models.

For practical information security applications, such as intrusion detection and malware analysis, this vulnerability of learning to adversarial noise appears to be less pronounced. Despite the various possibilities for obfuscation offered by some data formats, these formats exhibit fundamental structural constraints that still enable accurate detection of malicious functionality. New detection methods developed in the project, based on novel representations of structural information about the data, e.g., the "bag-of-paths" features derived from the PDF format (illustrated below), demonstrated excellent discrimination between malicious and benign data. The main conclusion to be drawn from the experimental work carried out in the project is that, in practice, a careful design of the features analyzed by machine learning algorithms makes their evasion a formidable task.
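To make the analyzed model concrete, the following is a minimal sketch of online centroid anomaly detection, assuming Euclidean distances and an average-type update rule; the class and parameter names are illustrative and not taken from the project's code, and the published analysis covers further update rules and attacker models.

```python
import numpy as np

class CentroidAnomalyDetector:
    """Minimal online centroid anomaly detector (illustrative sketch)."""

    def __init__(self, initial_data, radius):
        data = np.asarray(initial_data, dtype=float)
        self.mu = data.mean(axis=0)   # centroid of the training data
        self.n = len(data)            # number of points absorbed so far
        self.radius = radius          # rejection threshold r

    def score(self, x):
        # Anomaly score: Euclidean distance to the current centroid.
        return np.linalg.norm(np.asarray(x, dtype=float) - self.mu)

    def process(self, x):
        # Points outside the hypersphere of radius r are rejected.
        if self.score(x) > self.radius:
            return False              # flagged as anomalous
        # Average update: mu <- mu + (x - mu) / (n + 1).
        self.n += 1
        self.mu = self.mu + (np.asarray(x, dtype=float) - self.mu) / self.n
        return True
```

Under this update rule, each accepted point displaces the centroid by at most radius / (n + 1), which conveys the kind of quantitative bound on attack progress referred to above: an attacker who must pass the rejection test can only drag the centroid toward a target in increments that shrink as more points are absorbed.

Similarly, the idea behind the "bag-of-paths" features can be conveyed by a small sketch that counts root-to-leaf name paths in an already-parsed document tree. Real PDF parsing and the project's exact feature definition are beyond this illustration; the input below is a hypothetical, hand-built hierarchy.

```python
from collections import Counter

def bag_of_paths(node, prefix=""):
    """Count root-to-leaf name paths in a nested-dict document tree."""
    paths = Counter()
    if isinstance(node, dict):
        for name, child in node.items():
            paths.update(bag_of_paths(child, prefix + "/" + name))
    else:
        paths[prefix] += 1  # reached a leaf: record its full path
    return paths

# Toy example of a parsed PDF-like hierarchy (hypothetical input):
doc = {"Root": {"Pages": {"Kids": {"Page": 1}}, "OpenAction": {"JS": "..."}}}
print(bag_of_paths(doc))
# Counter({'/Root/Pages/Kids/Page': 1, '/Root/OpenAction/JS': 1})
```

The resulting path counts serve as a feature vector for a standard classifier; because the paths reflect structural constraints of the format rather than easily mutable content, evading a classifier trained on them requires changing the document's structure itself.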
Project-Related Publications (Selection)
- K. Rieck, G. Schwenk, T. Limmer, T. Holz, and P. Laskov. Botzilla: Detecting the "phoning home" of malicious software. In Proceedings of the ACM Symposium on Applied Computing, pages 1978–1984, 2010.
- M. Kloft and P. Laskov. Online anomaly detection under adversarial impact. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 405–412, 2010.
- T. Krüger, C. Gehl, K. Rieck, and P. Laskov. TokDoc: A self-healing web application firewall. In Proceedings of the ACM Symposium on Applied Computing, pages 1846–1853, 2010.
- P. Laskov and N. Srndic. Static detection of malicious JavaScript-bearing PDF documents. In Proceedings of the Annual Computer Security Applications Conference, 2011.
- B. Biggio, B. Nelson, and P. Laskov. Support vector machines under adversarial label noise. In Proceedings of the Asian Conference on Machine Learning, pages 97–112, 2011.
- B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In Proceedings of the International Conference on Machine Learning, 2012.
- M. Kloft and P. Laskov. Security analysis of online centroid anomaly detection. Journal of Machine Learning Research, 13:3133–3176, 2012.
- N. Srndic and P. Laskov. Detection of malicious PDF files based on hierarchical document structure. In Proceedings of the 20th Network and Distributed System Security Symposium, 2013.
- B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In Proceedings of the European Conference on Machine Learning, pages 387–402, 2013.
- N. Srndic and P. Laskov. Practical evasion of a learning-based classifier: A case study. In Proceedings of the IEEE Symposium on Security and Privacy, pages 197–211, 2014.