Project Details

Dynamic Redundancy for Many-core Systems

Subject Area Computer Architecture, Embedded and Massively Parallel Systems
Term from 2017 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 337312426
 
Safety-critical systems require redundancy for fault detection and fault tolerance. Depending on the application mode or execution state, different types of redundancy are required: Dual Modular Redundancy (DMR) for fail-safe modes and Triple Modular Redundancy (TMR) or even higher redundancy for fail-operational modes. Future safety-critical systems will switch between applications with different criticality and fault-tolerance demands, which requires redundancy modes to be more dynamic. An automotive example is switching from a parking assistant to piloted driving, which has much higher safety demands but runs on the same embedded multi-core processor. This project investigates dynamic switching between redundancy modes in response to external causes. We call this dynamic redundancy. We investigate dynamic redundancy switching between hardware modes (no redundancy, DMR, and TMR), between the corresponding software modes, and between combinations of both. Our target hardware platforms are tile-based Multi-Processor Systems on Chip (MPSoCs) that are enhanced with fault-tolerant hardware to form an Adaptively Redundant Processor. Dynamic redundancy switching is controlled by a Redundancy Management Unit per tile and by a Network-on-Chip (NoC) managed redundancy enhancement that enables scalable designs by introducing NoC voting capabilities.

On the software side, we start with an actor-based dataflow execution model, which executes tasks, called dataflow actors, in parallel as soon as their input data is available. Such a model is common in current parallel programming environments, for example the task model in OpenMP. We enhance the dataflow actor model to run actors redundantly to reach DMR or TMR modes. Because dataflow actors are free of side effects, they can easily be re-executed in case of failure.
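
The idea of running side-effect-free actors redundantly can be illustrated with a minimal Python sketch. It is not the project's implementation: the names `Mode`, `RedundancyError`, and `run_redundant` are hypothetical, replicas run sequentially rather than on separate tiles, and results are assumed to be hashable so that they can be compared by a simple majority vote.

```python
from collections import Counter
from enum import Enum


class Mode(Enum):
    """Hypothetical redundancy modes; the value is the replica count."""
    NONE = 1  # single execution, no fault detection
    DMR = 2   # two replicas: a mismatch is detected but cannot be corrected
    TMR = 3   # three replicas: a majority vote masks a single faulty result


class RedundancyError(Exception):
    """Raised when the replicas never reach a strict majority."""


def run_redundant(actor, inputs, mode, max_retries=1):
    """Run a side-effect-free actor mode.value times and vote on the results.

    If no strict majority exists (for example a DMR mismatch), the whole
    replica set is re-executed, which is safe only because the actor is
    assumed to have no side effects.
    """
    for _ in range(max_retries + 1):
        results = [actor(*inputs) for _ in range(mode.value)]
        value, votes = Counter(results).most_common(1)[0]
        if votes * 2 > len(results):  # strict majority of replicas agree
            return value
    raise RedundancyError(f"no majority in {mode.name} after retries")
```

In this sketch, switching the redundancy mode amounts to passing a different `mode` per call, for example `run_redundant(step, (x,), Mode.DMR)` for a fail-safe phase and `Mode.TMR` for a fail-operational one. A hardware-supported design would instead place the comparison and voting in dedicated units rather than in software.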
We investigate dynamic redundancy switching between software and hardware modes as well as timing behavior and real-time schedulability in case of failures.

To perform the evaluations, we develop a cross-layer system architecture, called Adaptively Redundant Many-core Architecture (ARMA), that incorporates dynamic redundancy at the software and hardware levels. ARMA is intended to combine high-performance parallel execution with safety mechanisms such as fault tolerance and timing predictability in order to cover the different requirements of a broad range of embedded domains, such as automotive, avionics, and space. The dynamic redundancy switching concept is to be demonstrated on an FPGA-based ARMA multi-core processor, yet the insights gained and techniques developed in this project should be applicable to future MPSoC architectures in general.
DFG Programme Research Grants
 
 

