Project Details
Mastering Offloading and Correlated failures for resilient CommunicAtion networks (MOCCA)
Subject Area
Security and Dependability, Operating-, Communication- and Distributed Systems
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 503231190
In this project, the resilience of communication infrastructures will be improved. We facilitate the implementation of advanced resilience mechanisms by offloading them with efficient packet processing frameworks (EPPFs), and we design networks so that they can better survive correlated failures. EPPFs process packets more efficiently than traditional user-space programs because they avoid some of the overhead introduced by the networking stack and the kernel. Examples of EPPFs are eBPF, DPDK, and Snabb. They are suitable for programming high-performance virtual network functions (VNFs), which can then be used to offload device functionality to the CPU of a server or a smartNIC.

Since performance is of utmost importance for offloading, building blocks for VNFs, e.g., header rewriting and encapsulation/decapsulation, are implemented with different EPPFs, and their throughput is compared on both server and smartNIC CPUs in a comprehensive performance study. Moreover, various signalling options for offloading are investigated. Next, novel network functions that are too complex to implement on programmable hardware are implemented and their performance is evaluated. These functions improve availability or security; examples are the Packet Replication, Elimination, and Ordering Function (PREOF) and Network Attestation for Secure Routing (NASR). Furthermore, offloading will be studied as a means to extend legacy hardware with new functionality, such as MPLS Network Actions (MNAs), which define new mechanisms for resilient forwarding. These activities are embedded in the standardization work of the IETF.

Correlated failures of many similar components can be caused by security incidents, software bugs, or update problems, to name just a few. If many devices share a vulnerable property, massive outages are possible.
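A back-of-the-envelope model shows why shared vulnerable properties are so dangerous. The following is a minimal Python sketch, assuming n replicas of a function that each fail independently with probability p, plus a common-mode event with probability q (for example, an exploited vulnerability in software that all replicas run) that takes every replica down at once. The function name and the numbers are illustrative assumptions, not values from the project.

```python
def p_all_replicas_fail(p: float, n: int, q: float = 0.0) -> float:
    """Probability that all n replicas are down at the same time.

    p: independent failure probability of each replica (assumed value)
    q: probability of a common-mode event that takes all replicas down
       together, e.g. a vulnerability shared by all of them (assumed value)
    Assumes the common-mode event is independent of the individual faults.
    """
    # Either the common-mode event occurs, or it does not and all n
    # replicas happen to fail independently.
    return q + (1 - q) * p ** n


if __name__ == "__main__":
    p = 0.01  # hypothetical per-replica failure probability
    independent = p_all_replicas_fail(p, n=2)       # replicas share no risk
    shared = p_all_replicas_fail(p, n=2, q=0.001)   # replicas share one vulnerable property
    print(f"independent: {independent:.2e}, shared: {shared:.2e}")
```

With the assumed values, two independent replicas are both down with probability 1e-4, but about 1.1e-3 once they share the vulnerable property. Adding further identical replicas cannot push the result below q, so redundancy alone does not protect against correlated failures.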
We define generalized shared risk groups (GSRGs) as sets of network components (hardware or software) that share a common technical property such as the CPU, operating system, or application software. Unlike conventional shared risk groups, they do not depend on a common infrastructure. GSRGs from past incidents will be classified and modeled for existing networks, and their impact on network operation and service provisioning will be evaluated as a resilience metric. Networks with heterogeneous components are likely to have more but smaller GSRGs, which may improve network resilience if the heterogeneous components are appropriately organized. Thus, there is optimization potential in the placement of heterogeneous components within a network, as well as in the placement of resilience mechanisms, including their offloading. The project tackles this challenge both for greenfield deployments and for brownfield deployments in which a number of components are replaced by new ones.
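The GSRG idea and the placement question can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the project: components are described by a hypothetical dictionary of properties, every (property, value) pair forms one GSRG, and the impact metric (share of node pairs that lose connectivity when all members of the worst GSRG fail at once, counting pairs with a failed node as lost) is an illustrative choice rather than the project's resilience metric.

```python
from collections import defaultdict, deque


def gsrgs(components):
    """Group components by shared property: (property, value) -> set of names."""
    groups = defaultdict(set)
    for name, props in components.items():
        for prop, value in props.items():
            groups[(prop, value)].add(name)
    return dict(groups)


def connected_pairs(nodes, links):
    """Count unordered pairs of surviving nodes that can still reach each other."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        if a in adj and b in adj:  # ignore links that touch failed nodes
            adj[a].add(b)
            adj[b].add(a)
    seen, total = set(), 0
    for start in nodes:
        if start in seen:
            continue
        # breadth-first search collects one connected component
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        total += len(comp) * (len(comp) - 1) // 2
    return total


def worst_gsrg_impact(components, links):
    """Fraction of node pairs disconnected when the worst GSRG fails entirely.

    Pairs involving a failed node count as disconnected (assumption).
    """
    nodes = set(components)
    all_pairs = len(nodes) * (len(nodes) - 1) // 2
    if all_pairs == 0:
        return 0.0
    worst = 0.0
    for members in gsrgs(components).values():
        lost = 1 - connected_pairs(nodes - members, links) / all_pairs
        worst = max(worst, lost)
    return worst


if __name__ == "__main__":
    # Hypothetical four-node ring with two operating systems X and Y.
    ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
    homogeneous = {n: {"os": "X"} for n in "ABCD"}
    alternating = {"A": {"os": "X"}, "B": {"os": "Y"}, "C": {"os": "X"}, "D": {"os": "Y"}}
    clustered = {"A": {"os": "X"}, "B": {"os": "X"}, "C": {"os": "Y"}, "D": {"os": "Y"}}
    for label, comps in [("homogeneous", homogeneous), ("alternating", alternating), ("clustered", clustered)]:
        print(label, round(worst_gsrg_impact(comps, ring), 3))
```

In this toy ring, the homogeneous network and the alternating placement both lose every node pair in the worst case, whereas placing each operating system on two adjacent nodes keeps one pair connected (impact 5/6). Under this metric, heterogeneity helps only when the heterogeneous components are placed appropriately, which is the optimization potential described above.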
DFG Programme
Priority Programmes
