Project Details

ASTEROID - An Analyzable, Resilient, Embedded Real-Time Operating System Design

Subject Area: Computer Architecture, Embedded and Massively Parallel Systems; Security and Dependability, Operating-, Communication- and Distributed Systems
Term: from 2010 to 2019
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 181374413
 
The operating system (OS) plays a key role in any complex computing system. A current OS that supports software integration with memory management and resource virtualization contains several core functions that depend on error-free hardware (HW). Undetected errors in these functions propagate quickly and irreversibly through the system, making it virtually impossible to recover from a function failure. Other OS functions can recover from failures given appropriate mechanisms; these functions inherit the dependability requirements of the applications that use them. The project idea is to develop OS and HW mechanisms that use the HW and communication resources of a many-core system to provide the required dependability efficiently.

In the first two project phases, we identified critical core functionality of the operating system and the processor pipeline. We developed fault detection and correction mechanisms that exploit the inherent redundancy of state-of-the-art and future multi- and many-core architectures. As a result, our system supports modern multithreaded applications using redundant multithreading and provides real-time guarantees to these applications. We extended our research to errors arising in the cooperation of multiple processors, for instance through failures in the network-on-chip (NoC). We identified the Reliable Computing Base, that is, the software and hardware components that must function correctly in order to provide real-time and reliability guarantees, and investigated how failures in these components affect the system.

The goal of the third project phase is to integrate the mechanisms and methods developed in the previous phases into a system that achieves full cross-layer protection against transient hardware faults. For this purpose, we will integrate existing results, optimize their coverage, resource, and performance overheads, and research the remaining gaps that were not addressed in the previous phases, including end-to-end protection of the NoC and OS-assisted replication for device drivers.
DFG Programme: Priority Programmes
 
 

