Project Details

PRISMA: Efficient Algorithms and Methods for Online Extraction of Performance Models in Virtualized Environments

Subject Area Software Engineering and Programming Languages
Term from 2015 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 251959028
 
Final Report Year 2020

Final Report Abstract

During the PRISMA project, we developed and published efficient algorithms and methods for automatically extracting architecture-level performance models of virtualization platforms and their hosted applications during system operation. Many complex performance effects and influences in virtualized environments (e.g., mutual influences among fine-grained system components and layers such as the OS, virtualization, middleware, application logic, I/O subsystem, caching, and communication protocols) are observable only during system operation, when the system runs in the real production environment under real production workloads, as opposed to a controlled testing environment with artificial workloads or synthetic benchmarks. Therefore, we envisioned a novel class of virtualization platforms and virtual appliances that integrate the developed model extraction algorithms into their architecture. The term virtual appliance (VA) refers to a prepackaged virtual machine image containing a software stack designed to run on a virtualization platform. In pursuit of this goal, we developed a reference architecture for online learning of performance models.

In doing so, we tackled the following two main research challenges:

• How can models be extracted from monitoring data collected at run time, with no possibility of static code analysis of the application, no control over the executed application workloads, and only limited flexibility to vary the system configuration during operation?
• How can parametric dependencies be identified and quantified automatically, given a potentially very large search space of possible dependencies?

We addressed the first research question by proposing an agent-based architecture in which the agents were embedded in the VAs in advance in order to minimize the effort for the developer.
Furthermore, each agent was given a limited scope, because each entity in a virtualized environment has only limited visibility into the rest of the system. On instantiation of a VA, the contained agent starts to monitor the application serving real production workloads and automatically builds a submodel describing the observed performance behavior of the application and platform layers inside the VA. The agent continuously updates the model skeleton to reflect dynamic changes, for instance, in the configuration or in the workload of the application. The virtualization platform then composes the submodels from the different VAs and from agents in the underlying infrastructure layers into an end-to-end performance model. The resulting end-to-end performance model of the virtualized system can then be used for online resource management. This agent-based architecture circumvents the need for static code analysis, while the continuous updates keep the model current when the configuration or the workload changes. This way, the longer the application runs, the better the performance model reflects the real-world system behavior.

The second research question was concerned with the extraction of parametric dependencies. We addressed it by drawing on techniques from machine learning. The first step is to identify which model variables depend on which input parameters. The main problem is the exponentially growing search space of possible dependencies. Our approach tackles this problem with feature selection techniques, since identifying parametric dependencies between different variables of the monitoring stream can be framed as a classic application of feature selection. We therefore proposed a generic algorithm for the automated identification of parametric dependencies on monitoring streams and applied three different heuristics to filter the resulting dependencies.
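To illustrate how dependency identification can be framed as feature selection, the following is a minimal sketch and not the algorithm developed in the project. It assumes a simple filter-style selector based on absolute Pearson correlation, which detects only linear relationships; the three filtering heuristics mentioned above are not reproduced. All names (`identify_dependencies`, `batch_size`, `user_id`, `cpu_demand_ms`) and the threshold value are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if one is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def identify_dependencies(records, targets, candidates, threshold=0.8):
    """Map each target variable to the candidate parameters it appears to depend on.

    A candidate is kept when its absolute correlation with the target reaches
    the (assumed) threshold; kept candidates are sorted strongest first.
    """
    result = {}
    for target in targets:
        ys = [record[target] for record in records]
        scored = []
        for param in candidates:
            xs = [record[param] for record in records]
            score = abs(pearson(xs, ys))
            if score >= threshold:
                scored.append((param, score))
        scored.sort(key=lambda item: item[1], reverse=True)
        result[target] = scored
    return result


# Hypothetical monitoring records: one entry per observed request.
records = [
    {"batch_size": 1, "user_id": 7, "cpu_demand_ms": 2.1},
    {"batch_size": 2, "user_id": 3, "cpu_demand_ms": 4.0},
    {"batch_size": 4, "user_id": 9, "cpu_demand_ms": 8.2},
    {"batch_size": 8, "user_id": 1, "cpu_demand_ms": 15.9},
    {"batch_size": 16, "user_id": 5, "cpu_demand_ms": 32.1},
]

deps = identify_dependencies(records, ["cpu_demand_ms"], ["batch_size", "user_id"])
```

In this toy data set, only `batch_size` passes the filter for `cpu_demand_ms`, while `user_id` is discarded. Scoring each candidate independently keeps the cost linear in the number of candidate parameters per target, which is one way to avoid enumerating the full space of possible dependencies.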
In the second step, we proposed a meta-selector that selects the most appropriate regression technique for characterizing each dependency based on the characteristics of the available data.

In summary, the work conducted during this project forms valuable contributions to the respective research fields. Although targeted at virtualized environments, some aspects, such as the use of agent-based architectures for performance model extraction of distributed software systems, the proposed model merging algorithm, and the introduced techniques for the identification and characterization of parametric dependencies, can be transferred to non-virtualized domains or even to other disciplines of computer science. Therefore, as part of our future work, we will try to further broaden the scope of the contributions in order to accelerate the progress of the entire field.
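The idea of choosing a regression technique per dependency, as described for the second step, can be sketched as follows. This is an illustrative stand-in under stated assumptions, not the project's meta-selector: instead of deciding from characteristics of the data, it simply fits every candidate and picks the one with the lowest leave-one-out cross-validation error. The candidate set (constant, linear, logarithmic) and all function names are hypothetical.

```python
import math


def fit_constant(xs, ys):
    """Predict the mean of the observed values regardless of the input."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return lambda x: mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_logarithmic(xs, ys):
    """Least squares for y = a + b * ln(x); assumes every x is positive."""
    model = fit_linear([math.log(x) for x in xs], ys)
    return lambda x: model(math.log(x))


CANDIDATES = {
    "constant": fit_constant,
    "linear": fit_linear,
    "logarithmic": fit_logarithmic,
}


def loo_error(fit, xs, ys):
    """Mean squared leave-one-out prediction error of one fitting function."""
    total = 0.0
    for i in range(len(xs)):
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (predict(xs[i]) - ys[i]) ** 2
    return total / len(xs)


def select_technique(xs, ys):
    """Return the name of the best candidate and the error of every candidate."""
    errors = {name: loo_error(fit, xs, ys) for name, fit in CANDIDATES.items()}
    return min(errors, key=errors.get), errors


# Hypothetical dependencies: one linear and one logarithmic in the parameter.
xs = [1, 2, 4, 8, 16, 32]
linear_ys = [2.0 * x + 1.0 for x in xs]
log_ys = [3.0 * math.log(x) + 0.5 for x in xs]

best_linear, _ = select_technique(xs, linear_ys)
best_log, _ = select_technique(xs, log_ys)
```

Fitting every candidate is affordable for a handful of simple models. A selector that decides from data characteristics alone, as the abstract describes, would avoid this cost when the candidate techniques are expensive to train.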

