Project Details
Generation and Analysis of Synthetic data for TRansport Applications (GASTRA)
Applicant
Professor Dr. Constantinos Antoniou
Subject Area
Traffic and Transport Systems, Intelligent and Automated Traffic
Term
since 2025
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 561061973
Recently, the generation and use of synthetic data, that is, artificial data designed to mimic real data, has emerged as a useful approach to overcoming some of the limitations and difficulties of working with real data. Data quality and completeness may be improved by imputing missing values and extending the range of possible values. Real observations may be augmented to reduce the risk that individuals can be identified, or purely synthetic data may be generated that is not directly associated with actual individuals. Data from different sources may be combined to generate a unified, complete dataset with consistent variable definitions and resolutions that meets the modeling needs. A larger share of rare-event cases may be generated using a controlled mechanism that artificially produces more balanced datasets while still allowing this manipulation to be accounted for in the modeling task. This research aims to explore various methods to generate synthetic data that meets certain requirements, to evaluate its quality, and to use it, alone or integrated with real data, for training and testing transportation-related models and for applying them to predict the outcomes of “what-if” scenarios. To promote the use of the methods that will be developed, they will be implemented as a software toolbox that will be made available to the research and practitioner community as open-source code.
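The idea of generating more rare-event cases through a controlled mechanism, while still accounting for that manipulation during modeling, can be illustrated with a minimal sketch. It is not the project's method: the function names `oversample_rare` and `weighted_share`, the `event` field, and the use of plain duplication as a stand-in for a real synthetic-data generator are all assumptions made for illustration. Each rare record is replicated a known number of times, and every copy carries a weight equal to the inverse of that number, so weighted statistics recover the original proportions.

```python
def oversample_rare(records, is_rare, factor):
    """Replicate each rare record `factor` times (hypothetical helper).

    Returns the augmented records and one weight per record. Each copy of a
    rare record gets weight 1/factor; common records keep weight 1. Plain
    duplication stands in for a real synthetic-data generator here.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    augmented, weights = [], []
    for rec in records:
        copies = factor if is_rare(rec) else 1
        for _ in range(copies):
            augmented.append(rec)
            weights.append(1.0 / copies)
    return augmented, weights


def weighted_share(records, weights, is_rare):
    """Weighted share of rare records; undoes the oversampling."""
    total = sum(weights)
    rare = sum(w for r, w in zip(records, weights) if is_rare(r))
    return rare / total


def is_crash(rec):
    # Assumed record layout: a dict with an "event" field.
    return rec["event"] == "crash"


# Toy data: 2 rare events among 100 observations (invented for illustration).
observations = [{"event": "normal"}] * 98 + [{"event": "crash"}] * 2
augmented, weights = oversample_rare(observations, is_crash, factor=10)

raw_share = sum(is_crash(r) for r in augmented) / len(augmented)
corrected_share = weighted_share(augmented, weights, is_crash)
```

In this toy case the augmented dataset is more balanced (`raw_share` is well above the original 2 percent), while `corrected_share` returns to the original proportion, which is the sense in which the manipulation remains accountable in a later modeling step.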
Objective 1: Consolidate the state of the art in traffic data analysis, identifying the gaps in data collection, gathering, and integration in support of its utilization for analysis and prediction.
Objective 2: Create a methodology for imputation, generation, and integration of available datasets, for various classes of data types, using processes of learning from the existing data.
Objective 3: Develop an open-source toolbox for applying the methodology, which can be easily further utilized, extended, and deployed by the research community and practitioners.
Objective 4: Demonstrate the applicability of the toolbox to different domains, in particular, (i) naturalistic driving data and (ii) prediction of traffic flow characteristics.
DFG Programme
Research Grants
International Connection
Israel
Partner Organisation
The Israel Science Foundation
Cooperation Partner
Professor Dr. Tomer Toledo
