Project Details
Projekt Print View

A Method for Integrating Allocation and Binding in Modulo Scheduling using Complex Sub-circuits

Subject Area Computer Architecture, Embedded and Massively Parallel Systems
Term from 2019 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 420658696
 
Final Report Year 2023

Final Report Abstract

In this project existing procedures for the generation of circuits on register transfer level from behavior descriptions (e.g. C/C++) should be improved, and/or new procedures should be developed. The following procedures served as a starting point: 1. Iterative resource allocation: The resources available in the hardware implementation (e.g., number of multipliers, block RAM instances, or memory interfaces) are iteratively changed to allow for different throughput/hardware tradeoffs, requiring the subsequent steps of the synthesis process to be traversed for each resource allocation. 2. Modulo scheduling for a given resource allocation: Successive computations of the algorithm to be implemented are interleaved, in order to utilize the available hardware as efficiently as possible, by exploiting parallelism in the input description. Thereby the start times of the operations to be executed must be chosen in such a way that only the resources made available in the allocation are used. 3. Binding for the given modulo schedule: The assignment, which operation gets executed on which hardware unit directly influences the additional overhead in the data path that is necessary to provide the correct data to the compute units. Therefore, binding algorithms are used to decide which operation is executed on which hardware unit for a given modulo schedule so that the additional implementation overhead is minimized. This isolated solution of the individual steps, the complexity of the modulo scheduling problem, and the size of real-world problems means that typically only a fraction of the entire design space can be explored and often only sub-optimal solutions can be obtained with respect to the throughput/hardware tradeoff. Therefore, the following goals were defined: 1. Provide throughput-optimal hardware implementations by combining allocation and modulo scheduling. 2. Reducing HLS runtime through a novel hardware-aware subgraph mining heuristic. 3. Reduction of computation time for modulo scheduling. The implementation of Goal 1 has the consequence that Pareto-optimal circuits with respect to hardware effort/throughput can be specifically identified and that dominated solutions can partly be skipped without having to be explicitly computed. The implementation of the second goal results in a reduction of the time required for subgraph mining due to problem-specific constraints on the search space. The use of the results from Goal 2 to solve the third goal causes a systematic reduction of the resulting modulo scheduling problems, which results in a significant reduction of runtime.

Publications

 
 

Additional Information

Textvergrößerung und Kontrastanpassung