Project Details
SFB 1404: FONDA - Foundations of Workflows for Large-Scale Scientific Data Analysis
Subject Area
Computer Science, Systems and Electrical Engineering
Biology
Geosciences
Medicine
Physics
Term
since 2020
Website
Homepage
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 414984028
Essentially all scientific disciplines are generating an ever-increasing amount of data. To derive scientific discoveries, these data sets are analyzed by complex data analysis workflows (DAWs), which are series of discrete analysis programs arranged in (often non-linear) pipelines. Because they usually deal with very large data sets, DAWs must be executed on distributed and/or parallel computational infrastructures. Traditionally, DAWs are optimized for speed, which leads to solutions that are hard to reproduce and share and that are tightly bound to exactly one type of input. However, as the summary of a recent NSF/DOE workshop that brought together the workflow and HPC communities put it, “… human productivity arguably still is the most expensive resource, trumping power, performance, and other factors …”.
The proposed CRC FONDA (“Foundations of Workflows for Large-Scale Scientific Data Analysis”) will take up this observation and investigate methods for increasing productivity in the development, execution, and maintenance of DAWs for large scientific data sets. Our long-term goal is to develop methods and tools that achieve substantial reductions in the development time and development cost of DAWs. We will approach this goal from a fundamental perspective, i.e., we aim at finding new abstractions, models, and algorithms that can eventually form the basis of a new class of future DAW infrastructures. Toward these goals, FONDA in its first phase will focus on three critical properties of DAWs and of DAW engines, namely portability, adaptability, and dependability (PAD). We want to investigate answers to questions such as: How can we build DAWs and DAW engines that enable portability of analysis across different infrastructures? How must DAWs be designed to adapt to changing input data or slightly changing requirements? How can we build dependable DAW systems that are aware of and control their own limitations and preconditions?
DAWs are bridges between two worlds: first, the specific scientific discipline using a DAW, and second, Computer Science, which builds the infrastructures necessary for developing and executing DAWs. Developing novel foundations for scientific DAWs thus requires close interaction between these two worlds. FONDA implements this idea by building on an interdisciplinary group of PIs from Computer Science, Materials Science, Geosciences, and the Life Sciences. Through these cooperations, FONDA’s research results will be continuously validated using relevant and current scientific problems from different fields of the natural sciences.
DFG Programme
Collaborative Research Centres
Current projects
- A01 - Foundations of Data Analysis Workflow Validation (Project Heads Schweikardt, Nicole; Weidlich, Matthias)
- A02 - Adapting Genomic Data Analysis Workflows for Different Data Access Patterns (Project Heads Leser, Ulf; Reinert, Knut)
- A03 - Deriving Trust Levels for Multi-Choice Data Analysis Workflows (Project Heads Draxl, Claudia; Grunske, Lars)
- A05 - Dependability, Adaptability and Uncertainty Quantification for Data Analysis Workflows in Large-Scale Biomedical Image Analysis (Project Heads Kainmüller, Dagmar; Ritter, Kerstin)
- B01 - Scheduling and Adaptive Execution of Data Analysis Workflows across Heterogeneous Infrastructures (Project Heads Kao, Odej; Meyerhenke, Henning)
- B04 - Exploiting Software-Defined Networks for Efficient Data Management in Next-Generation Data Analysis Workflows (Project Heads Reinefeld, Alexander; Scheuermann, Björn)
- B05 - Adaptive, Distributed and Scalable Analysis of Massive Satellite Data (Project Heads Hostert, Patrick; Leser, Ulf)
- B06 - Distributed Run-Time Monitoring and Control of Data Analysis Workflows (Project Heads Grunske, Lars; Rabl, Tilmann)
- C01 - Data Analysis Workflows for Interactive Scientific Exploration (Project Heads Kehr, Birte; Weidlich, Matthias)
- MGKS02 - Integrated Research Training Group (Project Heads Grunske, Lars; Reinert, Knut)
- S01 - Testbeds and Repositories (Project Heads Kao, Odej; Leser, Ulf)
- Z - Central Administrative Project (Project Head Leser, Ulf)
Applicant Institution
Humboldt-Universität zu Berlin
Participating University
Freie Universität Berlin; Technische Universität Berlin; Universität Osnabrück
Participating Institution
Hasso-Plattner-Institut für Digital Engineering gGmbH; Max-Delbrück-Centrum für Molekulare Medizin (MDC)
Spokesperson
Professor Dr. Ulf Leser