Project Details
Projekt Print View

SCAD: Synchronous Control Asynchronous Dataflow (SCAD) Architectures

Subject Area Computer Architecture, Embedded and Massively Parallel Systems
Term since 2020
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 424386388
 
Many processor architectures expose their internal datapaths to the compiler, so that the compiler can (1) allocate processing units (PUs) for the instructions, (2) schedule the instructions on the allocated PUs, and (3) even organize the communication of intermediate results between the PUs. Thus, the idea of shifting more work to the compilers, as introduced by VLIW and other statically scheduled architectures, is extended here to the allocation of PUs and the communication of intermediate results. Solving these synthesis problems statically by the compiler can significantly increase performance by exploiting more instruction-level parallelism (ILP) while reducing processor power consumption. The decentralization of hardware components eliminates performance bottlenecks in the hardware and leads to a linear growth of chip size with respect to the number of PUs, allowing these processors to scale to a large number of PUs. In the first funding period, we formalized the code generation problem of buffered exposed datapath (BED) architectures using SAT and SMT constraints to understand the problem precisely. To this end, we developed a translation from sequential programs to data flow process networks (DPNs), whose nodes are mapped to the processing units of the BED architecture. Using SAT and SMT solvers, we are able to generate optimal code in this way. We also formalized the allocation and scheduling problems in isolation to understand their relationship and impact on the overall problem. On the hardware design side, we developed all the basic components, i.e. the buffers, processing units, and various interconnects, and evaluated preliminary prototypes. We also developed the concept of virtual buffers to achieve a design with linear circuit complexity. We also developed simulation-based tools for cycle-accurate simulation of the prototypes. Since generating optimal code with the constraints is an NP-complete problem, in the second funding period we will consider heuristics that scale polynomially with the size of the programs. To this end, we have already started to study the allocation and scheduling problems in isolation, and can reduce the allocation problem to a graph coloring problem that can be solved with polynomial-time heuristics. The complexity of the scheduling problem is currently unclear, but we have found that a large part of it consists of 2SAT clauses that can be solved in linear time. Therefore, we want to develop heuristics based on these observations and apply results from the rich theory of linear graph layouts to our code generation problem. On the hardware side, we want to remove the move instruction bus, which is currently the bottleneck of our design, and add multiple load/store units for parallel memory access. We will also explore the use of application-specific PUs and the potential of multithreaded BED architectures.
DFG Programme Research Grants
 
 

Additional Information

Textvergrößerung und Kontrastanpassung