Project Details
Projekt Print View

DSI2: Learning Data Structure Behaviour from Executions of Pointer Programs

Subject Area Software Engineering and Programming Languages
Term from 2014 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 242347041
 
This project shall develop novel techniques and tools to automatically and reliably identify dynamic data structures in a given pointer program, by analysing traces of its execution. This is a challenging task due to the variety and complexity of coding styles seen in real-world software, such as exhibited by the cyclic doubly-linked list implementation of the Linux kernel. The targeted use cases for such an advanced analysis are program comprehension, e.g., for understanding legacy code; formal verification, e.g., for automatically generating verification conditions needed by verification tools; and reverse engineering, e.g., for analysing malware binaries.Since June 2014 we have developed and evaluated "Data Structure Investigator" (DSI), a software tool that advances upon the state-of-the-art in two ways. Firstly, DSI employs a novel memory abstraction to deal with the many non-standard data structure implementations in C code, including custom memory allocators and architecture-specific optimizations. Secondly, to address the challenge that data structure manipulations temporarily obscure a structure´s true form, DSI implements a new evidence-based approach that is tolerant of such degenerate shapes. Further, we have successfully employed DSI to prototype the above use cases. For program comprehension, we have implemented a visualization tool for making complex list-based data structures legible. For program verification, we have written a verification condition generator that utilises DSI´s output to automatically infer simple verification conditions for C programs, such as shape invariance properties, for use by the VeriFast program verifier. For reverse engineering, we have demonstrated how DSI can be interfaced to an external type inference tool, Howard, and Intel´s PIN instrumentation framework for binaries, to analyse simple malware components.On the basis of our prototypes, the follow-up project proposed here shall significantly enhance DSI´s scope and usability along three axes. Firstly, our approach shall be deepened so as to be able to analyse richer data structures, such as complex nestings and balanced trees, and payload data, such as sorted lists and trees. Secondly, DSI shall be advanced for the reverse engineering and analysis of complex malware. This involves tackling the type inference problem in binaries, producing a DSI plug-in for the popular IDA Pro disassembler and debugger, and investigating the utility of DSI´s analysis information for malware signaturing and classification. Thirdly, the VeriFast plug-in shall be redesigned conceptually and practically, so as to vastly enhance automation by addressing the many coding styles used in C programs. The improved DSI tool shall be evaluated by large, real-world case studies and made available to the public.
DFG Programme Research Grants
 
 

Additional Information

Textvergrößerung und Kontrastanpassung