Project Details
Network and infrastructure resource management for high-volume EO data processing
Applicant
Professor Dr. Tobias Hoßfeld
Subject Area
Security and Dependability, Operating-, Communication- and Distributed Systems
Term
since 2024
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 522760169
Sustainability researchers and geoscientists currently have to integrate and manage data sources and computing infrastructure for large-scale data processing tasks manually. This involves organizing data sources and computing clusters, networking them, and efficiently initiating analysis tasks. The goal of subproject CN1 is to automate these processes by applying modern orchestration principles from cloud and distributed computing, with a focus on a network infrastructure that supports automated decision-making for task execution. This requires a network overlay that combines compute, storage, and network nodes into a control plane; monitoring mechanisms for resource management; strategic placement of computing jobs; and an adaptive data plane that ensures efficient exchange of large data volumes among data sources, compute instances, and storage. CN1 is therefore an integral part of the SOS Research Unit’s comprehensive strategy.
The overall goal of CN1 is to develop a data-centric and compute-centric networking architecture that provides the networking infrastructure through (1) infrastructure resource management and (2) consolidated management of the network and the serverless platform. We introduce a novel data-centric and compute-centric networking (DCN) paradigm designed to eventually be integrated directly into the network, compute, and storage layers to facilitate the execution of scientific workflows. CN1 investigates network functions, their integration with upper layers, and dynamic infrastructure configuration with on-demand instantiation per job, so that compute resources receive the necessary connectivity between data source, compute instance, and storage. On the data plane, CN1 explores DCN functions for data transmission in WANs; these functions will be defined and disaggregated for data-intensive computing tasks.
On the control plane, a key research area of CN1 is the design of an infrastructure abstraction layer for scientific data processing, which uses a decentralized network to coordinate, monitor, manage, and orchestrate infrastructure nodes. Task requests will specify resource needs, execution constraints (e.g., execute the task in a Docker container, in an HPC environment, or on CO2-neutral servers), and other metadata, such as data source locations. Memory and CPU demands are described through the annotated physical execution graph of the RU’s framework. CN1 uses this existing information and contributes to the collective knowledge base (e.g., with monitoring data), enhancing the shared resources of the RU’s blackboard. A key challenge of CN1 is to place each task on the most suitable network, compute, and storage nodes and to establish efficient connectivity between them. Finally, CN1 investigates and implements options for improving the performance and scalability of the control and data plane (e.g., through caching).
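The placement problem described above can be illustrated with a minimal sketch. It is not CN1's actual design: the class names (TaskRequest, Node), the field names, and the selection rule (filter out nodes that violate resource needs or execution constraints, then pick the node with the shortest estimated transfer time from the data source) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TaskRequest:
    """Hypothetical task request: resource needs, constraints, and metadata."""
    name: str
    cpu_cores: int
    memory_gb: float
    constraints: frozenset[str]  # e.g. {"docker"}, {"hpc"}, {"co2-neutral"}
    data_source: str             # location identifier of the input data
    data_size_gb: float


@dataclass
class Node:
    """Hypothetical compute node as seen by the control plane."""
    name: str
    free_cpu_cores: int
    free_memory_gb: float
    capabilities: frozenset[str]
    # Monitored bandwidth (Gbit/s) from each known data source to this node.
    bandwidth_gbps: dict[str, float] = field(default_factory=dict)


def is_feasible(task: TaskRequest, node: Node) -> bool:
    """A node qualifies if it has the resources, satisfies every execution
    constraint, and has a known, non-zero path to the data source."""
    return (
        node.free_cpu_cores >= task.cpu_cores
        and node.free_memory_gb >= task.memory_gb
        and task.constraints <= node.capabilities
        and node.bandwidth_gbps.get(task.data_source, 0.0) > 0.0
    )


def transfer_seconds(task: TaskRequest, node: Node) -> float:
    """Estimated time to move the input data to the node (GB -> Gbit)."""
    return task.data_size_gb * 8.0 / node.bandwidth_gbps[task.data_source]


def place(task: TaskRequest, nodes: list[Node]) -> Optional[Node]:
    """Return the feasible node with the shortest estimated transfer time,
    or None if no node satisfies the request."""
    candidates = [n for n in nodes if is_feasible(task, n)]
    if not candidates:
        return None
    return min(candidates, key=lambda n: transfer_seconds(task, n))
```

In this sketch the execution constraints act as hard filters, while the network (here reduced to one bandwidth figure per data source) decides among the remaining nodes. A real placement strategy would likely weigh further factors such as storage locality, current load, or cached data.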
DFG Programme
Research Units