Project Details
Energy-Efficient Hardware Acceleration of Transformer Models Using Left-to-Right Arithmetic
Applicant
Dr. Muhammad Usman, Ph.D.
Subject Area
Methods in Artificial Intelligence and Machine Learning
Computer Architecture, Embedded and Massively Parallel Systems
Term
since 2026
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 573796083
This project aims to develop energy-efficient techniques for accelerating Vision Transformer (ViT) models, an emerging class of neural networks used in computer vision tasks such as object detection, medical imaging, and autonomous systems. While ViT models offer high accuracy, they demand substantial computational resources and energy, which makes them difficult to deploy on mobile or edge devices and raises sustainability concerns as AI adoption scales.
To address these challenges, the proposed research introduces a novel approach to hardware acceleration based on left-to-right (LR) arithmetic, a technique that computes results incrementally, most significant digits first, and can terminate early once the desired precision is reached. This early-termination mechanism can reduce power consumption by up to 50% compared to conventional full-precision methods. By avoiding unnecessary computation and reducing switching activity and memory accesses, it contributes to ecological sustainability.
The project will integrate LR arithmetic into key computational blocks of the Vision Transformer architecture, including matrix multiplication units and non-linear operations such as ReLU, GELU, Softmax, and LayerNorm. By designing low-latency, resource-aware accelerator units in hardware description languages (HDL), the project aims to support scalable and energy-efficient inference, both in the cloud and on constrained edge devices.
Another central objective is to optimize the ViT model architecture using advanced compression and quantization techniques. These include structured pruning, mixed-precision arithmetic, and integer-based approximations of non-linear operations. Such techniques reduce model size and computational complexity while preserving accuracy, making it feasible to run ViT models efficiently in real-time scenarios.
The proposed hardware will be implemented on Field Programmable Gate Arrays (FPGAs) and evaluated against modern CPU and GPU platforms. Benchmarking will focus on latency, throughput, energy efficiency, and memory footprint. The goal is to demonstrate that LR-based acceleration outperforms traditional architectures in both speed and energy consumption. Additionally, the project will explore bandwidth-aware memory architectures and optimized dataflow scheduling to reduce off-chip data transfers and increase on-chip data reuse. The expected result is higher throughput and lower energy use, both critical for real-time processing and low-power AI deployments.
Finally, the project aims to generalize these techniques across various ViT models (e.g., DeiT, Swin), demonstrating their flexibility and reuse potential in both cloud and edge contexts. By integrating sustainability at the architectural level, this work goes beyond technical advancement: it addresses the broader responsibility of building greener, more efficient AI systems for future generations.
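The early-termination idea behind left-to-right arithmetic can be illustrated in software. The sketch below is not the project's hardware design; it is a minimal Python model, assuming unsigned fixed-point weights and non-negative integer activations, of a dot product evaluated one weight bit plane at a time, most significant bit first. The function name `lr_dot` and the parameters `bits` and `tol` are hypothetical and chosen only for this illustration. Actual LR (online) arithmetic units typically use redundant signed-digit representations, which this sketch does not attempt to model.

```python
def lr_dot(weights, activations, bits=8, tol=0):
    """MSB-first, bit-serial dot product with early termination.

    Illustrative assumptions: weights are unsigned integers below 2**bits,
    activations are non-negative integers. Returns (estimate, planes_used).
    """
    # Each remaining weight bit can contribute at most this much per unit weight.
    act_sum = sum(activations)
    acc = 0
    for step, k in enumerate(range(bits - 1, -1, -1), start=1):
        # Add the contribution of weight bit plane k (value 2**k).
        plane = sum(a for w, a in zip(weights, activations) if (w >> k) & 1)
        acc += plane << k
        # Bit planes k-1 .. 0 can add at most (2**k - 1) * act_sum in total.
        remaining = ((1 << k) - 1) * act_sum
        if remaining <= tol:
            # The result is already within `tol` of the exact value: stop early.
            return acc, step
    return acc, bits


weights = [200, 100, 50]
activations = [1, 2, 3]
exact, full_steps = lr_dot(weights, activations)              # all 8 bit planes
approx, early_steps = lr_dot(weights, activations, tol=100)   # stops early
print(exact, full_steps, approx, early_steps)
```

With `tol=0` the loop processes every bit plane and returns the exact result. With a non-zero tolerance it stops as soon as the worst-case contribution of the unprocessed planes falls within the tolerance, which is the software analogue of skipping the switching activity those planes would otherwise cause in hardware.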
DFG Programme
Position
