Quality Inspection Scheduling Problem Based on Reinforcement Learning Environment

Tao Xu, Kai Xu, Jiangming Zhang, Si-qing Yang, Jun-Heng Huang
Published in: 2023 6th International Conference on Energy, Electrical and Power Engineering (CEEPE)
Publication date: 2023-05-12
DOI: 10.1109/CEEPE58418.2023.10165918

Abstract

Quality inspection plays an important role in power metering. With the ongoing digital construction of quality inspection laboratories, the scheduling of quality inspection tasks demands higher efficiency and accuracy to meet the diverse needs of practical applications. Unlike the traditional job-shop scheduling problem (JSP), the quality inspection scheduling problem (QISP) has no fixed correspondence between samples and tasks, which allows a higher degree of scheduling freedom. At the same time, quality inspection tasks carry more complex constraints, such as serial, parallel, and mutual-exclusion relations, so existing scheduling algorithms cannot be applied directly. This paper builds a reinforcement learning (RL) based method for the QISP. A new scheduling feature representation is proposed to fully describe the state of quality inspection tasks and sample-device utilization. To address the problem of sparse rewards, we present a reward function that integrates the scheduling environment's utilization rate and empty (idle) time. Considering the non-repetitive and complex constraints of quality inspection tasks, a set of action selection rules is proposed to replace the agent's direct learning of action decisions. These heuristic decision rules improve the convergence speed of the algorithm and enhance the interpretability of the model's action selection. Compared with the traditional MWKR, GA, and PSO algorithms, the RL-based method shows clear advantages in solution quality and efficiency on a real data set from a quality inspection laboratory of a state grid corporation.
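The abstract names two ingredients of the method: a dense reward combining device utilization rate with empty (idle) time, and a fixed set of action selection rules that filter feasible scheduling choices instead of letting the agent learn raw actions. The paper's actual formulas are not given here, so the following is only a minimal illustrative sketch under assumed definitions; the names `DeviceState`, `reward`, `select_action`, and the weights `alpha` and `beta` are hypothetical, not taken from the paper.

```python
# Illustrative sketch only -- a simple linear reward over utilization and idle
# time, plus rule-based action filtering. Not the paper's implementation.
from dataclasses import dataclass
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class DeviceState:
    busy_time: float = 0.0  # accumulated time the device spent running tasks
    idle_time: float = 0.0  # accumulated empty (idle) time


def reward(devices: List[DeviceState],
           alpha: float = 1.0, beta: float = 0.5) -> float:
    """Dense reward: reward utilization, penalize idle time (assumed form)."""
    total = sum(d.busy_time + d.idle_time for d in devices)
    if total == 0:
        return 0.0
    utilization = sum(d.busy_time for d in devices) / total
    idle_fraction = sum(d.idle_time for d in devices) / total
    return alpha * utilization - beta * idle_fraction


def select_action(candidates: List[T],
                  rules: List[Callable[[T], bool]]) -> Optional[T]:
    """Apply a fixed priority list of feasibility rules to narrow the action
    set, falling back to the unfiltered set when a rule matches nothing."""
    for rule in rules:
        feasible = [c for c in candidates if rule(c)]
        if feasible:
            candidates = feasible
    return candidates[0] if candidates else None
```

In this sketch the rules do the constraint handling (serial, parallel, mutual exclusion would each become one predicate), which is what makes the action choice interpretable: each picked action can be traced back to the rules that kept it.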