Quantifying the Trade-Off between Energy and Computational Accuracy in Computer Vision Processor Architectures Extended with Stochastic Computing Mechanisms

Applicants Professor Dr.-Ing. Holger Blume; Professor Dr. Alberto Garcia-Ortiz; Professor Guillermo Paya Vaya, Ph.D.

Subject Area Electronic Semiconductors, Components and Circuits, Integrated Systems, Sensor Technology, Theoretical Electrical Engineering
Computer Architecture, Embedded and Massively Parallel Systems

Funding Funded from 2015 to 2019

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 279180031

Year of creation 2019

Summary of Project Results

In this project, the energy-accuracy trade-offs of two different processor architecture organizations, a horizontal micro-SIMD processor and a vertical vector SIMD processor, are explored quantitatively. These processor architectures are not only optimized for a specific computer vision application, but also enhanced with approximate and stochastic computing mechanisms. A complete analysis of stochastic and approximate techniques in vision processors has been performed, and the potential of the different techniques has been compared depending on the employed micro-architecture. The proposed architectures are specialized for computer vision applications, e.g., image feature extraction, by exploiting inherent data-level parallelism. In the case studies performed in this project, the vertical architectures required up to 2.1x larger silicon area than horizontal architectures with identical ALU resources. This is because the level of parallelism is increased by replicating vertical vector unit resources, including distributed memories, which results in a large circuit area. In contrast, the centralized memories of the horizontal architecture do not need to grow in size. For SIFT image feature extraction, however, the vertical vector SIMD processors achieve up to 2.7x higher performance, since the horizontal micro-SIMD architecture is limited by the instruction issue throughput of the scalar main processor and by data reordering overhead, whereas the vertical vector SIMD processor allows a high clock frequency and flexible data memory accesses. When comparing performance-area-energy efficiency metrics, the vertical architectures achieve up to 3.9x more efficient SIFT execution than the horizontal architectures. These processor architectures can be enhanced with approximate and stochastic computing units, such as arithmetic units. During the project, different adder and multiplier structures for approximate and stochastic processors were evaluated.
The VHDL code of the implemented and evaluated designs will be made available to the public as an open-source library (www.ids.uni-bremen.de/repostoch/). A newly proposed approximate adder, the Optimized Lower-Part Constant-OR Adder (OLOCA), improves the mean squared error by 58% at a 13.8% lower area-delay product compared to a state-of-the-art Lower-Part OR Adder (LOA). Since current error metrics were found to be misleading when applied to differing classes of errors, emphasis has been put on creating a combined Saturated Mean Squared Error (SMSE) metric that allows a fair comparison of both approximate and stochastic error effects throughout the evaluated design library. To avoid slow gate-level simulations when characterizing the stochastic behavior of an arithmetic circuit, an FPGA-based timing analysis framework (FLINT+) is proposed, which accelerates the analysis of stochastic mechanisms by a factor of up to 476x. The processor architectures enhanced with approximate ALUs have been analyzed regarding their energy-accuracy trade-off for an egomotion estimation application. For this application, SIFT image features are matched and traced in stereoscopic camera video sequences from a vehicle to obtain an estimate of the vehicle movement. Early results demonstrate that the use of an approximate multiplier, i.e., an accuracy-configurable Broken-Array Multiplier (BAM), can reduce the datapath power consumption by up to 23.3% for the horizontal processor datapath and by up to 8.9% for the vertical processor datapath, while maintaining a similar estimation accuracy compared to GPS reference data. The results highlight the advantages of stochastic and approximate techniques for the improvement of vision processors. However, an integral optimization of all processor components, including the interconnect architecture, memory, and control logic, will be necessary to exploit the full potential.
Moreover, it is shown that different sections of computer vision algorithms have varying accuracy requirements. For this reason, approximate and stochastic processors will require mechanisms for accuracy reconfiguration to access the full energy-accuracy trade-off potential of an application.
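
To make the adder comparison above more concrete, the following is a minimal behavioral sketch of a Lower-Part OR Adder (LOA), the baseline that OLOCA is compared against, together with an exhaustive mean squared error computation. It is not taken from the project's VHDL library. It assumes the commonly described LOA structure (the lower k bits are a bitwise OR, the upper bits are added exactly, and the carry into the upper part is the AND of the two most significant lower-part bits). The function names `loa_add` and `loa_mse` and the chosen bit widths are illustrative assumptions; OLOCA and the SMSE metric are not modeled here.

```python
def loa_add(a: int, b: int, k: int) -> int:
    """Behavioral model of a Lower-Part OR Adder (assumed structure).

    The k least significant bits are approximated by a bitwise OR.
    The remaining upper bits are added exactly, with a carry-in that is
    the AND of the most significant bits of the two lower parts.
    """
    if k == 0:
        return a + b  # no approximate part: exact addition
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask                      # approximate lower part
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1   # predicted carry into upper part
    high = (a >> k) + (b >> k) + carry            # exact upper part
    return (high << k) | low


def loa_mse(width: int, k: int) -> float:
    """Mean squared error of loa_add over all unsigned input pairs of the given width."""
    n = 1 << width
    total = 0
    for a in range(n):
        for b in range(n):
            err = loa_add(a, b, k) - (a + b)
            total += err * err
    return total / (n * n)


if __name__ == "__main__":
    # Sweep the size of the approximate lower part for an 8-bit adder.
    for k in range(0, 5):
        print(f"k={k}: MSE={loa_mse(8, k)}")
```

Sweeping `k` in this way illustrates the kind of accuracy knob the summary refers to: a larger approximate part is expected to cost less hardware but introduces more error, which is why a reconfigurable accuracy level can be attractive.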

Project-Related Publications (Selection)

A fair comparison of adders in stochastic regime. In: 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS). 2017, pp. 1-6
A. Najafi, M. Weissbrich, G. Paya Vaya, and A. Garcia-Ortiz
FLINT+: A runtime-configurable emulation-based stochastic timing analysis framework. In: 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS). 2017, pp. 1-8
M. Weissbrich, G. Paya-Vaya, L. Gerlach, H. Blume, A. Najafi, and A. García-Ortiz
FPGA Emulation Methodology for Fast and Accurate Power Estimation of Embedded Processors. In: Journal of Systems Architecture 77 (2017), pp. 14-25
Hesselbarth, S.; Schewior, G. & Blume, H.
ATE-Accuracy Trade-Offs for Approximate Adders and Multipliers in Pipelined Processor Datapaths. In: 2018 Third Workshop on Approximate Computing
M. Weissbrich, A. Najafi, A. Garcia-Ortiz, and G. Paya-Vaya
Coherent Design of Hybrid Approximate Adders: Unified Design Framework and Metrics. In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems 8.4 (2018), pp. 736-745
Najafi, Ardalan; Weißbrich, Moritz; Paya-Vaya, Guillermo & Garcia-Ortiz, Alberto
Misalignment-aware delay modeling of narrow on-chip interconnects considering variability. In: 2018 7th International Conference on Modern Circuits and Systems Technologies (MOCAST). 2018, pp. 1-4
Najafi, Amir; Bamberg, Lennart; Najafi, Ardalan & Garcia-Ortiz, Alberto
Systematic Design of an Approximate Adder: The Optimized Lower Part Constant-OR Adder. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26.8 (2018), pp. 1595-1599
Dalloo, Ayad; Najafi, Ardalan & Garcia-Ortiz, Alberto
A Coding Approach to Improve the Energy Efficiency of Approximate NoCs. In: 2019 14th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC). 2019, pp. 1-4
Najafi, Amir; Bamberg, Lennart; Vaya, Guillermo Paya & Garcia-Ortiz, Alberto
FLINT+: A Runtime-Configurable Emulation-Based Stochastic Timing Analysis Framework. In: Integration (2019)
Weißbrich, M.; Gerlach, L.; Blume, H.; Najafi, A.; García-Ortiz, A. & Payá-Vayá, G.
