Project Details
Integrating machine learning in combinatorial dynamic optimization for urban transportation services
Applicant
Professor Dr. Marlin Ulmer
Subject Area
Management and Marketing
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 510629371
The goal of this project is to provide effective decision support for stochastic dynamic pickup-and-delivery problems (SDPDPs) by combining the strengths of mixed-integer linear programming (MILP) and reinforcement learning (RL).

SDPDPs play an increasingly important role in urban logistics. They are characterized by the often time-critical transport of goods or passengers within a city; common examples are same-day delivery, ridesharing, and restaurant meal delivery. These problems share the feature that a sequence of decision problems must be solved under future uncertainty, where the full value of a decision becomes apparent only later in the service horizon. Searching the combinatorial decision space of each subproblem for efficient and feasible tours is already a complex task, typically formulated as a MILP. This complexity is multiplied by the challenge of evaluating decisions with respect to their effectiveness under future dynamism and uncertainty, an ideal case for RL. Both capabilities are crucial to fully meet operational requirements, so a direct combination of both methods is needed. A seamless integration, however, has not yet been established, for several reasons, and achieving it is the aim of this research project.

We propose using RL to manipulate the MILP itself so that it yields not only efficient but also effective decisions. This manipulation may change the objective function or the constraints: incentive or penalty terms can be added to the objective function to enforce or prohibit the selection of certain decisions, or the constraints may be adapted to reserve fleet resources.

The challenge is to decide where and how the manipulation takes place. SDPDPs have constraints with respect to routing, vehicle capacities, and time windows; some constraints may be irrelevant for the fleet's flexibility while others are binding. The first part of the research project focuses on identifying the "interesting" parts of the MILP via (un-)supervised learning. Once these parts are identified, the second challenge is to find the right parametrization. Here, we will apply RL methods to learn the state-dependent manipulation of the MILP components, as illustrated in the sketch below.
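To make the idea concrete, the following minimal sketch shows one possible form of such an objective-function manipulation: a toy request-to-vehicle assignment MILP, written with the open-source PuLP library, whose myopic routing cost is augmented with penalty terms. The data, the penalty weights theta, and the model itself are illustrative assumptions rather than the project's actual formulation; in the envisioned approach, an RL policy would output theta based on the current state.

    # Minimal illustrative sketch, not the project's actual model: a toy
    # request-to-vehicle assignment MILP whose myopic routing cost is
    # augmented with learned penalty terms. Requires: pip install pulp
    import pulp

    requests = ["r1", "r2"]
    vehicles = ["v1", "v2"]
    travel_cost = {("r1", "v1"): 4.0, ("r1", "v2"): 6.0,   # toy data
                   ("r2", "v1"): 5.0, ("r2", "v2"): 3.0}

    # Hypothetical RL output: a penalty per assignment, large when the
    # assignment is expected to reduce the fleet's future flexibility
    # (here, using v2 for r2 is assumed to hurt future coverage).
    theta = {("r1", "v1"): 0.0, ("r1", "v2"): 0.0,
             ("r2", "v1"): 0.0, ("r2", "v2"): 5.0}

    prob = pulp.LpProblem("penalized_assignment", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", list(travel_cost), cat="Binary")

    # Manipulated objective: efficiency (travel cost) plus effectiveness
    # (penalties steering the solver away from short-sighted choices).
    prob += pulp.lpSum((travel_cost[k] + theta[k]) * x[k] for k in travel_cost)

    for r in requests:                    # serve every request exactly once
        prob += pulp.lpSum(x[(r, v)] for v in vehicles) == 1
    for v in vehicles:                    # at most one request per vehicle
        prob += pulp.lpSum(x[(r, v)] for r in requests) <= 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    for k in sorted(travel_cost):
        if x[k].value() > 0.5:
            print("assign request", k[0], "to vehicle", k[1])

In this toy instance the penalty flips the myopically optimal plan (r1 to v1, r2 to v2, cost 7) to the more "effective" plan (r1 to v2, r2 to v1), illustrating how learned terms can steer the MILP. The same mechanism extends to the constraint side, for example tightening a capacity constraint to reserve fleet resources for anticipated future requests.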
DFG Programme
Research Grants
International Connection
Netherlands
Cooperation Partner
Professor Roberto Roberti, Ph.D.