Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/79358
Title: Development of deep reinforcement learning method for production scheduling in a Two-stage flow production system with parallel machines and sequence-dependent setup times
Other Titles: Development of a deep reinforcement learning method for production scheduling in a two-stage production system with parallel machines and sequence-dependent setup times
Authors: Gerpott, Falk Torsten
Authors: Poti Chaopaisarn
Zadek, Hartmut
Issue Date: 10-Nov-2021
Publisher: Chiang Mai : Graduate School, Chiang Mai University
Abstract: Production scheduling covers the allocation and sequencing of jobs in a production system. Traditional solution methods for this optimization problem often face a tradeoff between acceptable solution quality and short computation time. In addition, current trends in production and logistics amplify the need for real-time solutions that can also adapt to changes in demand, products, or the production system. Reinforcement learning, a machine learning method, offers a promising alternative approach for coping with this well-known tradeoff and with these new trends. In a reinforcement learning framework, a production system is modeled as an environment in which an agent makes decisions regarding job allocation and sequencing. This agent can be represented by a (deep) neural network, which learns through a reward signal which decisions lead to good results in terms of certain goal criteria. Different algorithms can be used for learning to achieve stable and efficient training progress. The present work applies the Advantage Actor Critic (A2C) algorithm, which combines two learning approaches and parallelizes its learning process. The A2C is investigated for the first time for a scheduling problem with sequence-dependent setup times in a hybrid flow shop system. A pre-implemented A2C algorithm from the Python library Stable Baselines3 is used. As a specific application, a real production system is modeled in a salabim simulation model, with the agent making decisions via the OpenAI Gym interface. The results indicate that deep reinforcement learning matches or even outperforms previous solution approaches in terms of both solution quality and computational efficiency.
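The environment/agent framing described in the abstract can be illustrated with a minimal sketch of a scheduling environment exposing the Gym-style reset()/step() interface. The job data, setup-time matrix, and reward shaping below are illustrative assumptions, not taken from the thesis, and the toy model uses a single machine per stage rather than the parallel machines of the real hybrid flow shop; in the thesis, the environment wraps a salabim simulation model and the agent is trained with the A2C implementation from Stable Baselines3.

```python
class TwoStageSchedulingEnv:
    """Toy two-stage flow shop: each job runs on stage 1 then stage 2;
    the setup time on stage 1 depends on the previously processed job.
    Cumulative reward equals the negative final makespan."""

    def __init__(self, proc_times, setup):
        self.proc_times = proc_times  # {job: (t_stage1, t_stage2)}
        self.setup = setup            # {(prev_job, job): setup time on stage 1}
        self.reset()

    def reset(self):
        self.remaining = set(self.proc_times)
        self.prev_job = None
        self.stage1_free = 0.0        # time at which machine 1 becomes idle
        self.stage2_free = 0.0        # time at which machine 2 becomes idle
        return self._obs()

    def _obs(self):
        # Observation: last processed job and the set of jobs still waiting.
        return (self.prev_job, tuple(sorted(self.remaining)))

    def step(self, job):
        # Action: dispatch one of the remaining jobs to stage 1.
        assert job in self.remaining
        self.remaining.discard(job)
        t1, t2 = self.proc_times[job]
        setup = self.setup.get((self.prev_job, job), 0.0)
        end1 = self.stage1_free + setup + t1
        end2 = max(end1, self.stage2_free) + t2
        # Reward: negative increase of the makespan, so rewards telescope
        # to minus the final makespan over a full episode.
        reward = -(end2 - self.stage2_free)
        self.stage1_free, self.stage2_free = end1, end2
        self.prev_job = job
        done = not self.remaining
        return self._obs(), reward, done, {}
```

With two hypothetical jobs A (2 + 3 time units) and B (4 + 1) and a setup of 1 when B follows A, dispatching A then B yields a makespan of 8 and a cumulative reward of -8. In the actual framework, the same reset()/step() contract is what lets an off-the-shelf A2C agent interact with the simulated production system.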
URI: http://cmuir.cmu.ac.th/jspui/handle/6653943832/79358
Appears in Collections:ENG: Theses

Files in This Item:
File: 640631140 FALK TORSTEN GERPOTT.pdf
Size: 28.81 MB
Format: Adobe PDF


Items in CMUIR are protected by copyright, with all rights reserved, unless otherwise indicated.