Project Details

Profiling Toolkit for High Performance Computing

Subject Area Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics; Software Engineering and Programming Languages
Term from 2016 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 320897507
 
High-performance computing (HPC) has become a standard research tool in many scientific disciplines. In the natural and engineering sciences, research without at least supporting HPC calculations is becoming increasingly rare. In addition, new disciplines are discovering HPC as an asset to their research, for example in bioinformatics and the social sciences. As a result, more and more scientists are starting to use HPC resources without a good understanding of how such systems work. At the same time, the complexity of HPC resources is increasing, which widens this knowledge gap. This applies especially to the performance parameters of HPC jobs and the importance of performance engineering. Scientists with beginner or intermediate HPC knowledge are often content once their research problem can be solved on an available system in an acceptable time frame, even if that means compromising on accuracy or on the number of questions addressed. The situation is exacerbated by the fact that these users mostly rely on their local Tier-3 compute center, which typically lacks the staff to work with them individually on application performance. We also need to concede that, at least at the beginning of their scientific research, most users do not concern themselves with optimal use of system resources or application performance because, understandably, their primary objective is to generate scientific output as fast as possible. This also leads to a lock-in: researchers are unable to transfer their work to Tier-2 or Tier-1 compute resources, even when that would be required, simply because they cannot scale their calculations sufficiently. With the deployment of heterogeneous and more complex systems at Tier-3 centers, the need for awareness of performance aspects is seen as a challenge for the optimal use of compute and storage resources at the Tier-3 and Tier-2 levels, as outlined by the call.
We aim to raise awareness of performance parameters and issues across all HPC user communities and to enable HPC users at all levels of experience to obtain and understand information on the performance of their workloads. The resulting information is then suitable for further investigation and for performance engineering measures, thereby also lowering the barriers to Tier-2 and Tier-1 resources that arise from insufficient scaling. To achieve these goals, we propose to implement a profiling tool set, based on existing profiling solutions, that automatically collects per-job performance metrics and presents them to researchers in an understandable summary.
DFG Programme Research Grants
 
 
