
Quantification of the Trade-off between Energy and Exactness in Computer Vision Processor Architectures Enhanced with Stochastic Computing Mechanisms

Subject Area: Electronic Semiconductors, Components and Circuits, Integrated Systems, Sensor Technology, Theoretical Electrical Engineering; Computer Architecture, Embedded and Massively Parallel Systems
Term: from 2015 to 2019
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 279180031

Final Report Year: 2019

Final Report Abstract

In this project, the energy-accuracy trade-offs of two different processor architecture organizations, a horizontal micro-SIMD processor and a vertical vector SIMD processor, are explored quantitatively. These processor architectures are not only optimized for a specific computer vision application, but also enhanced with approximate and stochastic computing mechanisms. A complete analysis of stochastic and approximate techniques in vision processors has been performed, and the potential of the different techniques has been compared depending on the employed micro-architecture. The proposed architectures are specialized for computer vision applications, e.g., image feature extraction, by exploiting inherent data-level parallelism. In the case studies performed in this project, the vertical architectures required up to 2.1x larger silicon area than horizontal architectures with identical ALU resources. This is because the level of parallelism is increased by replicating vertical vector unit resources, including distributed memories, which results in a large circuit area. In contrast, the centralized memories of the horizontal architecture do not need to be increased in size. For SIFT image feature extraction, however, the vertical vector SIMD processors achieve up to 2.7x higher performance, since the horizontal micro-SIMD architecture is limited by the instruction issue throughput of the scalar main processor and by data reordering overhead, whereas the vertical vector SIMD processor allows a high clock frequency and flexible data memory accesses. When comparing performance-area-energy efficiency metrics, the vertical architectures achieve up to 3.9x more efficient SIFT execution than the horizontal architectures. These processor architectures can be enhanced with approximate and stochastic computing units, such as arithmetic units. During the project, different adder and multiplier structures for approximate and stochastic processors were evaluated.
The VHDL code of the implemented and evaluated designs will be made publicly available as an open-source library (www.ids.uni-bremen.de/repostoch/). A newly proposed approximate adder, the Optimized Lower-Part Constant-OR Adder (OLOCA), improves the mean squared error by 58% at a 13.8% lower area-delay product compared to a state-of-the-art Lower-Part OR Adder (LOA). Since current error metrics were found to be misleading when applied to differing classes of errors, emphasis has been put on creating a combined Saturated Mean Squared Error (SMSE) metric that allows a fair comparison of both approximate and stochastic error effects throughout the evaluated design library. To avoid slow gate-level simulations when characterizing the stochastic behavior of an arithmetic circuit, an FPGA-based timing analysis framework (FLINT+) is proposed that accelerates the analysis of stochastic mechanisms by a factor of up to 476x. The processor architectures enhanced with approximate ALUs have been analyzed regarding their energy-accuracy trade-off for an egomotion estimation application. For this application, SIFT image features are matched and traced in stereoscopic camera video sequences from a vehicle to obtain an estimate of the vehicle movement. Early results demonstrate that the use of an approximate multiplier, i.e., an accuracy-configurable Broken-Array Multiplier (BAM), can reduce the datapath power consumption by up to 23.3% for the horizontal processor datapath and by up to 8.9% for the vertical processor datapath, while maintaining a similar estimation accuracy compared to GPS reference data. The results highlight the advantages of stochastic and approximate techniques for the improvement of vision processors. However, an integral optimization of all processor components, including the interconnect architecture, memory, and control logic, will be necessary to exploit the full potential.
Moreover, it is shown that different sections of computer vision algorithms have varying accuracy requirements. Consequently, approximate and stochastic processors will require mechanisms for accuracy reconfiguration to access the full energy-accuracy trade-off potential of an application.
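
To make the adder comparison in the abstract more concrete, the following is a minimal behavioral sketch of a Lower-Part OR Adder (LOA), the baseline that OLOCA is compared against, together with an exhaustive mean squared error measurement. It assumes the commonly described LOA formulation: the lower bits are combined with a bitwise OR, the upper bits are added exactly, and the carry into the upper part is the AND of the most significant lower-part bits. The function names `loa_add` and `mean_squared_error` and the parameters `width` and `lower_bits` are hypothetical and chosen for illustration only. This is not the project's VHDL code, it does not model OLOCA, and it computes the plain mean squared error rather than the SMSE metric, whose definition is not given here.

```python
def loa_add(a: int, b: int, width: int = 8, lower_bits: int = 4) -> int:
    """Behavioral model of a Lower-Part OR Adder for unsigned operands.

    Assumption: `a` and `b` fit in `width` bits and 0 <= lower_bits <= width.
    """
    low_mask = (1 << lower_bits) - 1
    # Lower part: a bitwise OR replaces the carry chain (the approximate part).
    low = (a | b) & low_mask
    # Carry into the upper part: AND of the most significant lower-part bits.
    carry_in = ((a & b) >> (lower_bits - 1)) & 1 if lower_bits > 0 else 0
    # Upper part: exact addition of the remaining bits plus the estimated carry.
    high = (a >> lower_bits) + (b >> lower_bits) + carry_in
    return (high << lower_bits) | low


def mean_squared_error(width: int = 8, lower_bits: int = 4) -> float:
    """Plain MSE of loa_add against exact addition over all operand pairs."""
    n = 1 << width
    total = 0
    for a in range(n):
        for b in range(n):
            err = loa_add(a, b, width, lower_bits) - (a + b)
            total += err * err
    return total / (n * n)


if __name__ == "__main__":
    # Sweep the number of approximated bits to see the accuracy cost grow.
    for k in range(0, 6):
        print(f"lower_bits={k}: MSE={mean_squared_error(8, k)}")
```

Sweeping `lower_bits` in this way gives the accuracy axis of an energy-accuracy trade-off for one adder; the area, delay, and energy figures cited in the abstract would have to come from synthesis of the actual hardware designs and cannot be derived from a behavioral model like this one.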
