A recent study published in Engineering presents a significant advancement in manufacturing scheduling. Researchers Xueyan Sun, Weiming Shen, Jiaxin Fan, and their colleagues from Huazhong University of Science and Technology and the Technical University of Munich have developed an improved proximal policy optimization (IPPO) method to address the distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP).
The DHHBFSP is a complex optimization challenge in manufacturing. In distributed manufacturing settings, jobs with diverse requirements arrive randomly at different hybrid flow shops. These shops have varying numbers of machines and processing times, and blocking constraints further complicate the scheduling process. The researchers aimed to minimize both total tardiness and total energy consumption simultaneously, two crucial factors in enhancing production efficiency and reducing costs.
To tackle this problem, the team formulated a multi-objective Markov decision process (MOMDP) model for the DHHBFSP. They defined state features, a vector-based reward function, and an end-to-end action space. In their proposed IPPO method, a factory agent (FA) was assigned to each factory. Multiple FAs worked asynchronously to select unscheduled jobs. This approach allowed for real-time decision-making in response to the dynamic arrival of jobs.
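The paper does not publish its exact reward formulation in this summary, but the idea of a vector-based reward over two objectives can be sketched as follows. This is a hypothetical illustration: the function name and the use of per-step changes in tardiness and energy are assumptions, not the authors' definition.

```python
# Hypothetical sketch of a vector-based reward for one scheduling decision.
# Each component rewards a reduction (penalizes an increase) in one objective:
# total tardiness and total energy consumption. Names are illustrative only.

def vector_reward(prev_tardiness, new_tardiness, prev_energy, new_energy):
    """Return a two-component reward: negated increases in each objective."""
    return (-(new_tardiness - prev_tardiness),
            -(new_energy - prev_energy))

r = vector_reward(prev_tardiness=10.0, new_tardiness=14.0,
                  prev_energy=50.0, new_energy=57.5)
# Both objectives worsened at this step, so both components are negative.
```

Keeping the reward as a vector, rather than collapsing it to a single number up front, lets different policies weight the objectives differently during training.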
The IPPO method incorporated a two-stage training strategy. By learning from both single-policy and dual-policy data, it improved data utilization. The researchers trained two PPO networks within a single agent with different weight distributions for the objectives. This enabled the exploration of additional Pareto solutions and broadened the Pareto front, leading to better scheduling solutions.
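The effect of training two networks with different weight distributions can be illustrated with a simple weighted scalarization of the two-component reward. The specific weight values below are illustrative assumptions, not those used in the paper.

```python
# Hypothetical illustration: two policies scalarize the same
# (tardiness, energy) reward vector with different weights, biasing each
# policy toward a different region of the Pareto front.

def scalarize(reward_vec, weights):
    """Collapse a vector reward into a scalar via a weighted sum."""
    return sum(r * w for r, w in zip(reward_vec, weights))

policy_a_weights = (0.8, 0.2)  # emphasizes total tardiness
policy_b_weights = (0.2, 0.8)  # emphasizes total energy consumption

r_vec = (-4.0, -7.5)
ra = scalarize(r_vec, policy_a_weights)  # -4.7
rb = scalarize(r_vec, policy_b_weights)  # -6.8
```

Because each policy optimizes a differently weighted objective, the pair of policies together covers more of the trade-off curve than a single fixed weighting would.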
The researchers tested the IPPO method on randomly generated instances and compared it with various other methods, including variants of the basic PPO, dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. The experimental results were promising. The IPPO method outperformed the other methods in terms of convergence and solution quality. It achieved better inverted generational distance (IGD) and purity (P) values, indicating that it could obtain non-dominated solutions closer to the true Pareto front and had a higher proportion of non-dominated solutions.
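The IGD metric mentioned above measures how close an obtained set of solutions lies to a reference Pareto front: it is the mean distance from each reference point to its nearest obtained solution, with lower values being better. A minimal sketch of the standard definition (not code from the paper):

```python
# Minimal sketch of inverted generational distance (IGD) for comparing
# Pareto-front approximations. Lower IGD means the obtained solutions lie
# closer to (and cover more of) the reference front.
import math

def igd(reference_front, obtained_front):
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    # Average, over reference points, of the distance to the nearest
    # obtained solution.
    return sum(min(dist(ref, sol) for sol in obtained_front)
               for ref in reference_front) / len(reference_front)

ref = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
obt = [(0.0, 1.0), (1.0, 0.0)]  # misses the middle of the front
print(igd(ref, obt))
```

An approximation that matches the reference front exactly scores 0; missing regions of the front (as in the example, where the middle point is uncovered) raises the value.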
This research has important implications for the manufacturing industry. The IPPO method provides a more efficient way to schedule jobs in distributed heterogeneous hybrid flow shops, which can lead to reduced production times and energy consumption. In the future, the researchers plan to optimize the training settings of the IPPO algorithm to ensure consistent performance across different instances. They also aim to explore its applicability to other types of distributed scheduling problems, such as distributed job shop scheduling and distributed flexible job shop scheduling. Additionally, they will investigate a new DRL method that combines metaheuristics for multi-objective problems.
The paper, "Deep Reinforcement Learning-based Multi-Objective Scheduling for Distributed Heterogeneous Hybrid Flow Shops with Blocking Constraints," was authored by Xueyan Sun, Weiming Shen, Jiaxin Fan, Birgit Vogel-Heuser, Fandi Bi, and Chunjiang Zhang. The full text of the open-access paper is available at https://doi.org/10.1016/j.eng.2024.11.033