Project Details
Projekt Print View

PEP: Precise and Efficient Prediction of Good Worst-case Performance for Contemporary and Future Architectures

Subject Area Computer Architecture, Embedded and Massively Parallel Systems
Term from 2016 to 2019
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 289264719
 
Final Report Year 2020

Final Report Abstract

Hard real-time systems are safety-critical embedded systems that have to satisfy stringent timing constraints that are dictated by their interactions with the physical environment. Examples for such systems can be found in avionic control systems, automotive electronics, industrial automation, and in robotics. Such systems need reliable guarantees for timeliness. To provide such timing guarantees has become increasingly challenging since complex processor cores are being deployed that consist of a variety of performance-enhancing microarchitectural features, such as caches, branch predictors, and complex pipelines. This challenge has been exacerbated by the introduction of multi-core processors, where interference on shared resources, such as buses, DRAM controllers, and last-level caches needs to be accounted for. One promising approach to deal with this challenge is compositional timing analysis. Rather than analyzing the system as a whole, in compositional timing analysis, the timing contributions of different system components are analyzed separately and then composed to obtain a timing bound for the system as a whole. As the state space of each individual component is much smaller than the state space of the entire system, which can be thought of as the product of its component state spaces, such compositional approaches promise to be much more efficient. However, due to intricate interactions between the different components of a system it is not at all obvious whether the system’s timing behavior can be reasonably decomposed into contributions by individual components to enable sound and precise compositional timing analysis. In fact, as our preliminary work demonstrated, so-called amplifying timing anomalies render essentially all existing compositional timing analyses unsound on any modern commercial processor. In this project, we studied how to enable sound and precise compositional timing analysis by careful hardware design choices. A key scientific result is that monotonicity of a processor’s cycle-by-cycle behavior with respect to a natural progress order enables timing predictability, and in particular, sound compositional timing analysis. Amplifying timing anomalies, as they are found in conventional microarchitectures, are a consequence of non-monotonicity. To achieve monotonicity we introduced the notion of strictly in-order pipelines. By carefully eliminating dependencies between consecutive instructions strictly in-order pipelines are provably monotonic. We developed a proof-of-concept processor with a strictly in-order pipeline in the hardware description language Verilog and synthesized it onto an FPGA target for experimental evaluation. The strictly in-order design incurs only a moderate slowdown ranging between 5% and 7% compared with a conventional design and thus preserves most of the benefits of pipelining. Due to the small number of additional gates required to turn a conventional in-order pipeline into a strictly in-order pipeline, it is conceivable to implement a “strictly in-order mode” in future commercial processors that could be switched on and off based on an application’s predictability requirements. Manual pen-and-paper timing-predictability proofs for complex systems such as the strictly in-order pipeline are lengthy and therefore prone to errors. To increase the confidence in the predictability results, and to prepare for even more complex future processor designs, we explored the use of SMT (satisfiability-modulo-theories) solvers to automate these proofs. Encoding the processor’s cycle-by-cycle behavior and a small number of key invariants using logical formulas allows to carry out the timing-predictability proofs via satisfiable queries to the Z3 SMT solver. Another major outcome of this project are new precise and efficient timing analysis techniques that exploit compositionality where it is beneficial. One of these new analyses is the first precise worst-case execution time (WCET) analysis that can deal with write-back caches. Write-back caches are favored over write-through caches in industry, because they usually offer higher performance, but prior to this project there was a lack of precise timing analyses for write-back caches. By taking a new “store-focussed” perspective on the problem, our new analysis often results in significantly better WCET bounds than previous approaches, which approached the problem solely from an “eviction-focussed” perspective. This new WCET analysis is complemented by a new compositional response-time analysis framework that soundly captures the effects of write-back caches in a fixed-priority scheduling setting. Together, the WCET and the response-time analysis, for the first time, enable the timing verification of systems with write-back caches, exploiting the benefits of write-back caches over write-through caches. Early on during the project we realized that compositional task-level cache analyses would be too imprecise to be of practical use. Thus we focussed on improving the precision of integrated task-level cache analyses. The outcome of this endeavor are the first exact local and global cache analysis algorithms. Unlike all of the prior work, which sacrifices precision for analysis efficiency, these new analyses are exact, i.e., they achieve the maximal possible precision, while remaining sufficiently efficient. Analysis efficiency is achieved by exploiting a monotonicity property of caches with least-recently-used replacement.

Publications

 
 

Additional Information

Textvergrößerung und Kontrastanpassung