Contrastive-Learning-Based Decision Making for Dynamic Time-Linkage Optimization

IF 8.7 | Zone 1, Computer Science | Q1 AUTOMATION & CONTROL SYSTEMS

IEEE Transactions on Systems Man Cybernetics-Systems Pub Date : 2025-09-24 DOI:10.1109/TSMC.2025.3611797

Xiao-Fang Liu;Meng Gao;Yongchun Fang;Zhi-Hui Zhan;Jun Zhang

Vol. 55, No. 11, pp. 8661-8674. Journal article, not open access. https://ieeexplore.ieee.org/document/11176976/

Citations: 0

Abstract

In dynamic time-linkage optimization, current decisions influence the future state of the environment. To make decisions that benefit future states, existing methods usually build a model that predicts the future rewards of candidate solutions. However, these prediction models have low accuracy because the available decision data are too few to train such a complex model. To address this issue, this article proposes a contrastive-learning-based decision making (CLDM) method, which builds a contrastive model that learns the relative relationship between solutions rather than their absolute rewards, and adopts a quick decision strategy to select solutions. In CLDM, a clustering-based time-linkage detection (CD) strategy measures the intensity of the time linkage, which determines whether decisions should be based on future rewards. To represent the relative relationship between solutions, a large number of contrastive samples are constructed from the limited historical decisions. A contrastive model is then trained to compare solutions in terms of a combination of current fitness and future rewards. Candidate solutions are clustered into multiple groups to filter out poor ones, and the few solutions that remain are ranked using the contrastive model. The winner is taken as the decision solution. Integrating CLDM into particle swarm optimization (PSO) yields a new algorithm named contrastive-learning-based PSO (CL-PSO). Experimental results on multiple dynamic time-linkage optimization instances demonstrate that CL-PSO outperforms state-of-the-art algorithms in terms of solution quality. CL-PSO also solves the mobile robot path planning problem well.
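The abstract does not say how the contrastive samples are built or how the final ranking is carried out, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that each historical decision is stored as a (solution, current fitness, future reward) triple, that the two values are combined by a hypothetical weighted sum with larger scores being better, and that every ordered pair of decisions becomes one labeled sample. The names `build_contrastive_samples`, `pick_winner`, `weight`, and `prefers` are all invented for this example.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# Hypothetical record of one past decision:
# (solution vector, current fitness, observed future reward).
Decision = Tuple[Sequence[float], float, float]
Pair = Tuple[Tuple[float, ...], Tuple[float, ...], int]


def build_contrastive_samples(history: List[Decision], weight: float = 0.5) -> List[Pair]:
    """Turn n historical decisions into n * (n - 1) ordered pairs.

    Each sample is (solution_a, solution_b, label), where label is 1 when
    a's combined score is strictly higher than b's and 0 otherwise.
    The weighted sum below is an assumption made for this sketch.
    """
    def score(decision: Decision) -> float:
        _, fitness, future = decision
        return (1.0 - weight) * fitness + weight * future

    samples = []
    for a, b in permutations(history, 2):
        label = 1 if score(a) > score(b) else 0
        samples.append((tuple(a[0]), tuple(b[0]), label))
    return samples


def pick_winner(candidates: List[Tuple[float, ...]],
                prefers: Callable[[Tuple[float, ...], Tuple[float, ...]], bool]):
    """Round-robin ranking: return the candidate with the most pairwise wins.

    `prefers(a, b)` stands in for a trained contrastive model that
    answers "is a better than b?".
    """
    wins = [
        sum(1 for j, b in enumerate(candidates) if j != i and prefers(a, b))
        for i, a in enumerate(candidates)
    ]
    best_index = max(range(len(candidates)), key=wins.__getitem__)
    return candidates[best_index]


if __name__ == "__main__":
    history = [
        ([0.0, 1.0], 3.0, 1.0),
        ([1.0, 1.0], 2.0, 4.0),
        ([2.0, 0.5], 1.0, 1.0),
    ]
    samples = build_contrastive_samples(history)
    print(len(samples))  # 3 decisions give 6 ordered pairs

    # Toy comparator used in place of a learned model.
    winner = pick_winner([(0.0, 1.0), (1.0, 1.0), (2.0, 0.5)],
                         lambda a, b: sum(a) > sum(b))
    print(winner)
```

The sketch shows why pairwise construction helps when data are scarce: n recorded decisions yield n(n - 1) ordered training pairs, whereas a reward-prediction model would see only n labeled points.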




Source journal

IEEE Transactions on Systems Man Cybernetics-Systems AUTOMATION & CONTROL SYSTEMS-COMPUTER SCIENCE, CYBERNETICS

CiteScore

18.50

Self-citation rate

11.50%

Articles published

812

Review time

6 months

About the journal: The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.