Project Details
LEAP: Locality-Driven High-Performance and Energy-Efficient GPUs in the Post-Dennard Era
Applicant
Dr.-Ing. Sohan Lal
Subject Area
Computer Architecture, Embedded and Massively Parallel Systems
Term
since 2026
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 553209396
Graphics Processing Units (GPUs) were initially designed for graphics applications, but their massive computational power has made them highly effective for general-purpose computing tasks such as scientific simulations and machine learning (ML). Today, GPU-accelerated systems are integral to many advances, including the success of Generative AI. While GPU-accelerated systems with higher computational power and energy efficiency are desirable, the semiconductor industry faces significant challenges in scaling performance and energy efficiency due to the end of Dennard scaling and the slowdown of Moore's Law. Until new technologies such as quantum computing become practical, system architects and programmers must optimize every aspect of GPU performance for sustainable computing. This project, LEAP (Locality-driven high-pErformance And energy-efficient GPUs), aims to enhance GPU performance and energy efficiency by optimizing the memory hierarchy to better exploit data locality, particularly spatial locality, which is key to accessing data with lower latency, lower energy, and higher bandwidth.
Caches are a crucial part of the memory hierarchy in modern processors and operate on the principle of data locality. Despite the benefits of caches, exploiting data locality in GPUs is challenging due to issues such as memory divergence, which leads to over-fetching and inefficient cache use. Modern GPUs use sector caches, which fetch only the requested sectors of a cache line rather than the whole line, to mitigate over-fetching, but this conservative design misses opportunities to leverage higher spatial locality. Our initial work shows that there is immense potential to improve sector cache design by employing an ML-based spatial locality predictor. LEAP pursues the following objectives:
1) Classical Predictor Integration: Evaluate the feasibility of augmenting the sector caches of GPUs with a classical, history-based spatial locality predictor to reduce under-fetching. This involves addressing the unique challenges and opportunities presented by the massively parallel nature of GPUs.
2) ML-Based Predictor Design: Develop and implement an ML-based spatial locality predictor for GPUs. GPUs have accelerated the adoption of ML; in turn, ML can also improve GPU design. Our initial results show that an ML predictor can improve fetch accuracy by up to 74% and reduce execution time by 28%.
3) Shared L1 Cache Adaptation: Evaluate and adapt the spatial locality predictor for shared L1 caches, together with other orthogonal approaches, to exploit data locality more effectively.
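To make the idea of a classical, history-based spatial locality predictor for sector caches more concrete, the following Python sketch models one under explicitly hypothetical assumptions; it is an illustration, not the LEAP design. The 128-byte line and 32-byte sector geometry, the PC-indexed `FootprintPredictor` table (which records the sectors a line actually used before eviction and fetches that footprint on the next miss by the same load), the small fully associative LRU cache, and the `simulate` helper are all assumptions introduced here. A PC-indexed footprint table is just one common classical approach.

```python
from collections import OrderedDict

# Assumed geometry (hypothetical): 128-byte lines split into four 32-byte sectors.
LINE_BYTES = 128
SECTOR_BYTES = 32


def sector_bit(addr: int) -> int:
    """One-hot mask of the sector that a byte address falls into."""
    return 1 << ((addr % LINE_BYTES) // SECTOR_BYTES)


class SectorOnlyPolicy:
    """Baseline sector cache: on a miss, fetch only the demanded sector."""

    def predict(self, pc: int, demand: int) -> int:
        return demand

    def train(self, pc: int, used: int) -> None:
        pass


class FootprintPredictor:
    """Hypothetical history-based predictor: a PC-indexed table of sector footprints."""

    def __init__(self, entries: int = 64):
        self.entries = entries
        self.table = {}  # table index -> sector mask used by the last evicted line

    def predict(self, pc: int, demand: int) -> int:
        # Always fetch the demanded sector, plus whatever this load used last time.
        return demand | self.table.get(pc % self.entries, 0)

    def train(self, pc: int, used: int) -> None:
        # Remember which sectors were actually touched before eviction.
        self.table[pc % self.entries] = used


def simulate(trace, policy, num_lines: int = 4) -> dict:
    """Run (pc, addr) accesses through a tiny fully associative LRU sector cache.

    "misses" counts both line misses and sector misses (under-fetched sectors).
    """
    cache = OrderedDict()  # line address -> [trigger_pc, fetched_mask, used_mask]
    stats = {"misses": 0, "fetched": 0, "useful": 0}

    def evict(entry):
        pc, fetched, used = entry
        policy.train(pc, used)
        stats["useful"] += bin(fetched & used).count("1")

    for pc, addr in trace:
        line, bit = addr // LINE_BYTES, sector_bit(addr)
        entry = cache.get(line)
        if entry is None:  # line miss: allocate and fetch the predicted sectors
            if len(cache) >= num_lines:
                evict(cache.popitem(last=False)[1])
            fetched = policy.predict(pc, bit)
            stats["misses"] += 1
            stats["fetched"] += bin(fetched).count("1")
            cache[line] = [pc, fetched, bit]
        else:
            cache.move_to_end(line)
            if not entry[1] & bit:  # sector miss: the sector was under-fetched
                stats["misses"] += 1
                stats["fetched"] += 1
                entry[1] |= bit
            entry[2] |= bit

    for entry in cache.values():  # drain so every line's usage is counted
        evict(entry)
    return stats


if __name__ == "__main__":
    # One load (pc 0x10) streams through all four sectors of 16 consecutive lines.
    trace = [(0x10, line * LINE_BYTES + s * SECTOR_BYTES)
             for line in range(16) for s in range(4)]
    print(simulate(trace, SectorOnlyPolicy()))    # {'misses': 64, 'fetched': 64, 'useful': 64}
    print(simulate(trace, FootprintPredictor()))  # {'misses': 28, 'fetched': 64, 'useful': 64}
```

On this regular streaming trace, the sketch's predictor reduces sector misses from 64 to 28 while fetching the same 64 sectors, i.e. it reduces under-fetching without over-fetching. If a load's pattern changes (for example, from touching whole lines to touching only the first sector), the stale history over-fetches until the table is retrained, which illustrates why prediction accuracy matters.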
DFG Programme
Research Grants
International Connection
Greece
Cooperation Partner
Professor Georgios Keramidas, Ph.D.
