Enhancing multi-agent deep reinforcement learning for flexible job-shop scheduling through constraint programming

IF 4.3 · CAS Tier 2 (Engineering) · JCR Q2 · COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Computers &amp; Operations Research · Pub Date: 2026-06-01 · Epub Date: 2026-02-11 · DOI: 10.1016/j.cor.2026.107428
Alexandre Jesus , Arthur Corrêa , Miguel Vieira , Catarina Marques , Cristóvão Silva , Samuel Moniz
Citations: 0

Abstract

This paper introduces PRISMA, a hybrid multi-agent Deep Reinforcement Learning (DRL) framework for solving the Flexible Job-shop Scheduling Problem (FJSP). It uses Constraint Programming (CP) solutions to pretrain decentralized policies and to guide exploration during training. Although DRL can generate fast solutions for large combinatorial problems, it often fails to match the quality of optimization methods, motivating the integration with hybrid frameworks. The growing interest in embedding domain knowledge into learning algorithms has produced several hybrid formulations, yet their potential remains underexplored, particularly in multi-agent settings. PRISMA combines supervised and reinforcement learning within a multi-agent framework, where CP solutions are used to (i) learn expert decisions through imitation learning, and (ii) train an auxiliary network that guides DRL training via reward shaping. A shared graph network is adopted for transferring system-level knowledge into machine-level observations, enabling fast and consistent inference from enriched local embeddings. To the best of our knowledge, PRISMA introduces the first expert-derived guidance mechanism for the FJSP and is among the earliest to apply imitation learning within a multi-agent formulation. By combining both modules, it strengthens the bridge between optimization and learning-based methods, where such dual integrations remain scarce. Experimental results show faster convergence and higher solution quality than state-of-the-art DRL models. PRISMA achieves an average optimality gap of 6.74%, corresponding to a 50% relative improvement over the single-agent baseline, while reducing inference time. These findings reinforce the value of merging optimization accuracy with the flexibility of multi-agent DRL for efficient scheduling.
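The abstract names two distinct roles for the CP solutions: (i) an imitation-learning signal that pulls the agent's policy toward the expert's decisions, and (ii) reward shaping driven by an auxiliary network. As a rough, hypothetical illustration of those two mechanisms (not the paper's actual architecture or API — the function names, the `beta` weight, and the toy action space are all assumptions for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def imitation_loss(logits, expert_action):
    """Behavior-cloning loss: cross-entropy between the agent's policy
    and the action chosen by the CP (expert) solution."""
    p = softmax(logits)
    return -np.log(p[expert_action] + 1e-12)

def shaped_reward(base_reward, agent_action, guide_probs, beta=0.1):
    """Reward shaping: add a bonus proportional to the auxiliary
    network's confidence that the chosen action matches expert behavior."""
    return base_reward + beta * guide_probs[agent_action]

# Toy machine-agent choosing among 4 candidate operations.
logits = rng.normal(size=4)            # policy logits for one machine agent
expert_action = 2                      # operation the CP schedule assigned
loss = imitation_loss(logits, expert_action)

# Auxiliary network's distribution over actions (random stand-in here).
guide_probs = softmax(rng.normal(size=4))
r = shaped_reward(base_reward=-1.0, agent_action=2, guide_probs=guide_probs)
```

In PRISMA these two signals are combined within a multi-agent setup with a shared graph network producing each machine's observations; this sketch only shows the shape of the loss and shaping terms, not that architecture.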
Source Journal: Computers &amp; Operations Research (Engineering: Industrial)
CiteScore: 8.60
Self-citation rate: 8.70%
Articles per year: 292
Review time: 8.5 months
Journal Description: Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers &amp; Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.