Alexandre Jesus , Arthur Corrêa , Miguel Vieira , Catarina Marques , Cristóvão Silva , Samuel Moniz
{"title":"通过约束规划增强柔性作业车间调度的多智能体深度强化学习","authors":"Alexandre Jesus , Arthur Corrêa , Miguel Vieira , Catarina Marques , Cristóvão Silva , Samuel Moniz","doi":"10.1016/j.cor.2026.107428","DOIUrl":null,"url":null,"abstract":"<div><div>This paper introduces <em>PRISMA</em>, a hybrid multi-agent Deep Reinforcement Learning (DRL) framework for solving the Flexible Job-shop Scheduling Problem (FJSP). It uses Constraint Programming (CP) solutions to pretrain decentralized policies and to guide exploration during training. Although DRL can generate fast solutions for large combinatorial problems, it often fails to match the quality of optimization methods, motivating the integration with hybrid frameworks. The growing interest in embedding domain knowledge into learning algorithms has produced several hybrid formulations, yet their potential remains underexplored, particularly in multi-agent settings. <em>PRISMA</em> combines supervised and reinforcement learning within a multi-agent framework, where CP solutions are used to (i) learn expert decisions through imitation learning, and (ii) train an auxiliary network that guides DRL training via reward shaping. A shared graph network is adopted for transferring system-level knowledge into machine-level observations, enabling fast and consistent inference from enriched local embeddings. To the best of our knowledge, <em>PRISMA</em> introduces the first expert-derived guidance mechanism for the FJSP and is among the earliest to apply imitation learning within a multi-agent formulation. By combining both modules, it strengthens the bridge between optimization and learning-based methods, where such dual integrations remain scarce. Experimental results show faster convergence and higher solution quality than state-of-the-art DRL models. <em>PRISMA</em> achieves an average optimality gap of 6.74%, corresponding to a 50% relative improvement over the single-agent baseline, while reducing inference time. 
These findings reinforce the value of merging optimization accuracy with the flexibility of multi-agent DRL for efficient scheduling.</div></div>","PeriodicalId":10542,"journal":{"name":"Computers & Operations Research","volume":"190 ","pages":"Article 107428"},"PeriodicalIF":4.3000,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancing multi-agent deep reinforcement learning for flexible job-shop scheduling through constraint programming\",\"authors\":\"Alexandre Jesus , Arthur Corrêa , Miguel Vieira , Catarina Marques , Cristóvão Silva , Samuel Moniz\",\"doi\":\"10.1016/j.cor.2026.107428\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This paper introduces <em>PRISMA</em>, a hybrid multi-agent Deep Reinforcement Learning (DRL) framework for solving the Flexible Job-shop Scheduling Problem (FJSP). It uses Constraint Programming (CP) solutions to pretrain decentralized policies and to guide exploration during training. Although DRL can generate fast solutions for large combinatorial problems, it often fails to match the quality of optimization methods, motivating the integration with hybrid frameworks. The growing interest in embedding domain knowledge into learning algorithms has produced several hybrid formulations, yet their potential remains underexplored, particularly in multi-agent settings. <em>PRISMA</em> combines supervised and reinforcement learning within a multi-agent framework, where CP solutions are used to (i) learn expert decisions through imitation learning, and (ii) train an auxiliary network that guides DRL training via reward shaping. A shared graph network is adopted for transferring system-level knowledge into machine-level observations, enabling fast and consistent inference from enriched local embeddings. 
To the best of our knowledge, <em>PRISMA</em> introduces the first expert-derived guidance mechanism for the FJSP and is among the earliest to apply imitation learning within a multi-agent formulation. By combining both modules, it strengthens the bridge between optimization and learning-based methods, where such dual integrations remain scarce. Experimental results show faster convergence and higher solution quality than state-of-the-art DRL models. <em>PRISMA</em> achieves an average optimality gap of 6.74%, corresponding to a 50% relative improvement over the single-agent baseline, while reducing inference time. These findings reinforce the value of merging optimization accuracy with the flexibility of multi-agent DRL for efficient scheduling.</div></div>\",\"PeriodicalId\":10542,\"journal\":{\"name\":\"Computers & Operations Research\",\"volume\":\"190 \",\"pages\":\"Article 107428\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2026-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Operations Research\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0305054826000468\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2026/2/11 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Operations Research","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0305054826000468","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2026/2/11 
0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Enhancing multi-agent deep reinforcement learning for flexible job-shop scheduling through constraint programming
This paper introduces PRISMA, a hybrid multi-agent Deep Reinforcement Learning (DRL) framework for solving the Flexible Job-shop Scheduling Problem (FJSP). It uses Constraint Programming (CP) solutions to pretrain decentralized policies and to guide exploration during training. Although DRL can generate fast solutions for large combinatorial problems, it often fails to match the solution quality of optimization methods, motivating its integration into hybrid frameworks. Growing interest in embedding domain knowledge into learning algorithms has produced several hybrid formulations, yet their potential remains underexplored, particularly in multi-agent settings. PRISMA combines supervised and reinforcement learning within a multi-agent framework, where CP solutions are used to (i) learn expert decisions through imitation learning, and (ii) train an auxiliary network that guides DRL training via reward shaping. A shared graph network transfers system-level knowledge into machine-level observations, enabling fast and consistent inference from enriched local embeddings. To the best of our knowledge, PRISMA introduces the first expert-derived guidance mechanism for the FJSP and is among the earliest approaches to apply imitation learning within a multi-agent formulation. By combining both modules, it strengthens the bridge between optimization and learning-based methods, where such dual integrations remain scarce. Experimental results show faster convergence and higher solution quality than state-of-the-art DRL models: PRISMA achieves an average optimality gap of 6.74%, a 50% relative improvement over the single-agent baseline, while reducing inference time. These findings reinforce the value of merging the accuracy of optimization with the flexibility of multi-agent DRL for efficient scheduling.
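The two CP-derived training signals described in the abstract, imitation of expert decisions and reward shaping from an auxiliary estimate, can be sketched in miniature. The sketch below is illustrative only: the linear policy, the toy observations, the synthetic "expert", and the potential function are hypothetical stand-ins and do not reflect the paper's actual architecture. It shows (i) behavior cloning toward expert (CP-style) actions via softmax cross-entropy, and (ii) a potential-based shaping bonus added to the environment reward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: 4-dim machine-level observations, 3 candidate actions.
OBS_DIM, N_ACTIONS = 4, 3

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# (i) Imitation learning: fit a linear policy to expert (CP-derived) actions
# with full-batch cross-entropy gradient descent, a stand-in for pretraining.
def pretrain_policy(obs, expert_actions, lr=0.5, epochs=200):
    W = np.zeros((OBS_DIM, N_ACTIONS))
    onehot = np.eye(N_ACTIONS)[expert_actions]
    for _ in range(epochs):
        probs = softmax(obs @ W)
        # Gradient of mean cross-entropy w.r.t. W for linear logits.
        W -= lr * obs.T @ (probs - onehot) / len(obs)
    return W

# (ii) Reward shaping: add a potential-based bonus F = gamma*phi(s') - phi(s),
# which is known to leave the set of optimal policies unchanged.
def shaped_reward(r, phi_s, phi_next, gamma=0.99):
    return r + gamma * phi_next - phi_s

# Synthetic expert: action index = argmax of the observation, folded mod 3.
obs = rng.normal(size=(256, OBS_DIM))
expert = obs.argmax(axis=1) % N_ACTIONS

W = pretrain_policy(obs, expert)
acc = (softmax(obs @ W).argmax(axis=1) == expert).mean()
print(f"imitation accuracy on expert actions: {acc:.2f}")
print(f"shaped reward example: {shaped_reward(1.0, 0.5, 0.8):.3f}")
```

In PRISMA itself these two signals operate on graph-based embeddings and decentralized machine-level policies rather than a linear model; the sketch only conveys how an expert dataset and a potential term can be combined into one training loop.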
Journal overview:
Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.