Self-optimizing decentralized production control
Final Report Abstract
The results of this project demonstrate the considerable potential of production control that is, on the one hand, decentralised and thus locally bounded and, on the other hand, optimised from experience through reinforcement learning. This approach significantly reduces the complexity of decision optimisation in modern manufacturing systems while exploiting the technological potential of increasing digitalisation, networking and the availability of decentralised computing capacity.

To apply the Deep Q-Learning algorithm efficiently in this context, both decentrally for decision-making and centrally for optimisation with respect to global manufacturing key figures, decision-making was decoupled from optimisation in location and time, in contrast to the original algorithm [MNI15]. For this purpose, an architecture for a multi-agent system was developed in the sense of a cyber-physical production system, which enables fully decentralised data acquisition and management as well as communication between the agents. These agents represent the orders and machines within the respective production system and cover the production-control subtasks of availability checking, work distribution and sequencing. In addition, an administration agent was introduced that provides the interface to all peripheral systems (ERP, simulation model).

As a second essential sub-method, a central DQN module was designed. In this module, the experience data generated by the order and machine agents are managed in a global database and used for optimisation. A neural network is continuously updated with the network parameters optimised on central, high-performance computing capacities. This network is stored centrally, can be retrieved by the decentralised agents during initialisation and is then integrated for real-time decision-making. In addition, an evaluation system was developed that uses both the local system status and global production key figures as the basis for evaluation. With this approach, a relative delivery date deviation between approximately -10 % and 0 % was achieved consistently.
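The decoupling of decentralised decision-making from central optimisation described above can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only: QNet, ExperienceDB and DecentralizedAgent are hypothetical stand-ins for the project's actual components, PyTorch is used merely as an example framework, and the state/action sizes are placeholders.

```python
# Minimal sketch of the decoupled DQN workflow: central training on a global
# experience database, decentralised agents loading the trained network once
# at initialisation. All names and sizes are illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 4  # assumed sizes of an agent's state/action space


class QNet(nn.Module):
    """Small fully connected Q-network mapping a state to action values."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, x):
        return self.layers(x)


class ExperienceDB:
    """Stand-in for the global database fed by all order/machine agents."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def insert(self, transition):  # transition = (s, a, r, s')
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)


def central_training_step(qnet, target_net, db, optimizer, gamma=0.99):
    """One optimisation step on the central, high-performance side."""
    batch = db.sample(32)
    s, a, r, s2 = (torch.tensor(x, dtype=torch.float32) for x in zip(*batch))
    q = qnet(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


class DecentralizedAgent:
    """Order/machine agent: loads the centrally trained network once at
    initialisation, decides in real time, and reports experiences back."""
    def __init__(self, param_store_path, db):
        self.qnet = QNet()
        self.qnet.load_state_dict(torch.load(param_store_path))
        self.qnet.eval()
        self.db = db

    def decide(self, state):
        with torch.no_grad():
            return int(self.qnet(torch.tensor(state, dtype=torch.float32)).argmax())

    def report(self, s, a, r, s2):
        self.db.insert((s, a, r, s2))


# Central side (sketch): train, then publish the parameters for the agents.
db = ExperienceDB()
qnet, target_net = QNet(), QNet()
target_net.load_state_dict(qnet.state_dict())
optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
# ... fill db with agent experiences, call central_training_step(...) repeatedly ...
torch.save(qnet.state_dict(), "central_params.pt")  # retrieved by agents at init
```

The key design point this sketch mirrors is that inference and training never block each other: agents only read a published parameter snapshot and write experiences, while the expensive gradient updates run entirely on the central side.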
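The evaluation system mentioned above combines the local system status with global production key figures. A minimal sketch of such a combined reward signal, with purely illustrative figure names and weights, might look as follows:

```python
# Hedged sketch of an evaluation (reward) signal combining a local system
# status indicator with a global production key figure; the chosen figures
# and weights are assumptions for illustration only.
def reward(local_queue_length, relative_due_date_deviation,
           w_local=0.3, w_global=0.7):
    """Penalise long local queues and global due-date deviation."""
    return -(w_local * local_queue_length
             + w_global * abs(relative_due_date_deviation))
```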
Publications
- Dittrich, M.-A. & Fohlmeister, S.: A deep q-learning-based optimization of the inventory control in a linear process chain. Production Engineering, 15(1), 35-43.
- Dittrich, M.-A. & Fohlmeister, S.: Cooperative multi-agent system for production control using reinforcement learning. CIRP Annals, 69(1), 389-392.
- Denkena, B.; Dittrich, M.-A.; Keunecke, L. & Fohlmeister, S.: Dezentrale Produktionssteuerung in der Werkstattfertigung [Decentralised production control in job shop manufacturing]. VDI-Z, 162(05-06), 63-65.
- Fohlmeister, S.; Palmer, G. & Kemp, D.: fms_marl - Scalable cooperative Multi-Agent-Reinforcement-Learning for order-controlled on schedule manufacturing in flexible manufacturing systems.
- Fohlmeister, S.; Denkena, B. & Dittrich, M.-A.: Selbstoptimierende Reihenfolgebildung in der Fertigung/Intelligent order sequencing in manufacturing. wt Werkstattstechnik online, 111(04), 212-216.
