强化学习在采矿系统中的应用

Aline Xavier Fidencio, T. Glasmachers, Daniel Naro
{"title":"强化学习在采矿系统中的应用","authors":"Aline Xavier Fidencio, T. Glasmachers, Daniel Naro","doi":"10.1109/SAMI50585.2021.9378663","DOIUrl":null,"url":null,"abstract":"Automation techniques have been widely applied in different industry segments, among others, to increase both productivity and safety. In the mining industry, with the usage of such systems, the operator can be removed from hazardous environments without compromising task execution and it is possible to achieve more efficient and standardized operation. In this work a study case on the application of machine learning algorithms to a mining system example is presented, in which reinforcement learning algorithms were used to solve a control problem. As an example, a machine chain consisting of a Bucket Wheel Excavator, a Belt Wagon and a Hopper Car was used. This system has two material transfer points that need to remain aligned during operation in order to allow continuous material flow. To keep the alignment, the controller makes use of seven degrees of freedom given by slewing, luffing and crawler drives. Experimental tests were done in a simulated environment with two state-of-the-art algorithms, namely Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). The trained agents were evaluated in terms of episode return and length, as well as alignment quality and action values used. Results show that, for the given task, the PPO agent performs quantitatively and qualitatively better than the SAC agent. However, none of the agents were able to completely solve the proposed testing task.","PeriodicalId":402414,"journal":{"name":"2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Application of Reinforcement Learning to a Mining System\",\"authors\":\"Aline Xavier Fidencio, T. Glasmachers, Daniel Naro\",\"doi\":\"10.1109/SAMI50585.2021.9378663\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automation techniques have been widely applied in different industry segments, among others, to increase both productivity and safety. In the mining industry, with the usage of such systems, the operator can be removed from hazardous environments without compromising task execution and it is possible to achieve more efficient and standardized operation. In this work a study case on the application of machine learning algorithms to a mining system example is presented, in which reinforcement learning algorithms were used to solve a control problem. As an example, a machine chain consisting of a Bucket Wheel Excavator, a Belt Wagon and a Hopper Car was used. This system has two material transfer points that need to remain aligned during operation in order to allow continuous material flow. To keep the alignment, the controller makes use of seven degrees of freedom given by slewing, luffing and crawler drives. Experimental tests were done in a simulated environment with two state-of-the-art algorithms, namely Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). The trained agents were evaluated in terms of episode return and length, as well as alignment quality and action values used. Results show that, for the given task, the PPO agent performs quantitatively and qualitatively better than the SAC agent. However, none of the agents were able to completely solve the proposed testing task.\",\"PeriodicalId\":402414,\"journal\":{\"name\":\"2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SAMI50585.2021.9378663\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SAMI50585.2021.9378663","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

自动化技术已广泛应用于不同的工业领域,其中包括提高生产率和安全性。在采矿业中,使用这种系统,操作员可以在不影响任务执行的情况下离开危险环境,并且可以实现更有效和标准化的操作。本文提出了一个将机器学习算法应用于采矿系统的研究案例,其中使用强化学习算法来解决控制问题。以斗轮挖掘机、皮带车和斗斗车组成的机械链为例。该系统有两个物料传递点,在操作过程中需要保持对齐,以允许连续的物料流动。为了保持对齐,控制器利用了由回转、变幅和履带驱动给出的七个自由度。实验测试在模拟环境中进行了两种最先进的算法,即近端策略优化(PPO)和软行为者批评(SAC)。对训练过的代理人进行评估,包括事件的复发和长度,以及对齐质量和所使用的行动价值。结果表明,对于给定的任务,PPO代理在数量和质量上都优于SAC代理。然而,没有一个代理能够完全解决提议的测试任务。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Application of Reinforcement Learning to a Mining System
Automation techniques have been widely applied in different industry segments, among others, to increase both productivity and safety. In the mining industry, with the usage of such systems, the operator can be removed from hazardous environments without compromising task execution and it is possible to achieve more efficient and standardized operation. In this work a study case on the application of machine learning algorithms to a mining system example is presented, in which reinforcement learning algorithms were used to solve a control problem. As an example, a machine chain consisting of a Bucket Wheel Excavator, a Belt Wagon and a Hopper Car was used. This system has two material transfer points that need to remain aligned during operation in order to allow continuous material flow. To keep the alignment, the controller makes use of seven degrees of freedom given by slewing, luffing and crawler drives. Experimental tests were done in a simulated environment with two state-of-the-art algorithms, namely Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). The trained agents were evaluated in terms of episode return and length, as well as alignment quality and action values used. Results show that, for the given task, the PPO agent performs quantitatively and qualitatively better than the SAC agent. However, none of the agents were able to completely solve the proposed testing task.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信