Project Details
Projekt Print View

DSI2: Learning Data Structure Behaviour from Executions of Pointer Programs

Subject Area Software Engineering and Programming Languages
Term from 2014 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 242347041
 
Final Report Year 2024

Final Report Abstract

Software development in industry focuses on evolution and maintenance rather than creating new software. While information about a software component's behaviour is usually available and enough for developers to reuse the component, it is often insufficient for conducting maintenance tasks: developers need a detailed understanding of the component's internals, including how data is organized. Indeed, the choice of data structures is a key implementation aspect and significantly contributes to software complexity. For example, when non-functional properties like low memory usage are crucial, developers tend to implement new data structures. Understanding these (non-standard) structures is vital for comprehending the software as a whole. This project focused on analyzing dynamic data structures, with an emphasis on their state, i.e., how data is organized in memory. State information falls into two categories: Structural aspects concern the shape created by linkage structure, e.g., whether that structure forms a binary tree. Semantic aspects impose constraints, e.g., whether a tree meets the binary search tree criteria. We were especially interested in how a data structure's formal specification can be derived automatically from a program and enhance program comprehension. Program heap visualization can also benefit from formal characterizations, and software verification requires a formal specification to prove correctness, too. The results of the project’s first half demonstrated via our novel dynamic analyzer for extracting, identifying and aggregating pointer linkages, so called strands, that dynamic data structures can be identified reliably from source code executions, even when nonstandard implementations are involved. To some extent, it is also possible to obtain data structure information from compiled binaries, e.g., from malware. However, analysis quality heavily depends on tools, which are not publicly available, for extracting type information from binaries. This led us to focus in the project’s second half on programs in byte code, like those compiled from Java, where type information is explicit. We first developed a state-level analysis that generates formal shape predicates to describe dynamic data structures. To enhance program comprehension, we showed how this shape information can be used to automatically create data structure state abstractions for visualization. We also designed a lightweight language for encoding similar abstractions when no formal predicate is available. In addition, we devised a new analysis that considers invalid data structure states, too, which is needed to obtain precise constraints describing also a structure’s semantic aspects. When dealing with classes exhibiting complex linkage structures, our obtained constraints effectively provide a shape predicate specification that can serve as a starting point for program verification.

Publications

 
 

Additional Information

Textvergrößerung und Kontrastanpassung