Towards Generalizable Meta-Deep Reinforcement Learning Algorithm for Multiobjective Traveling Salesman Problems

Xiaoyu Fu;Shenshen Gu;Chee-Meng Chew;Tengfei Li
{"title":"Towards Generalizable Meta-Deep Reinforcement Learning Algorithm for Multiobjective Traveling Salesman Problems","authors":"Xiaoyu Fu;Shenshen Gu;Chee-Meng Chew;Tengfei Li","doi":"10.1109/TAI.2025.3614210","DOIUrl":null,"url":null,"abstract":"The multiobjective traveling salesman problem (MOTSP) is a representative class of multiobjective combinatorial optimization problems, with significant implications for both theoretical research and practical applications. Although deep reinforcement learning (DRL) has shown promise in solving MOTSPs, existing approaches often struggle with generalization to large-scale problem instances. To address this challenge, we propose a novel meta-deep reinforcement learning framework with preference-fused attention networks (MDRL-PFAN). This framework integrates a preference-fused mechanism to jointly encode problem instances and weight preferences into a unified feature space. Moreover, an ensemble meta-learning strategy is adopted to train the meta-model across tasks with varying scales, equipping MDRL-PFAN with robust solving and strong cross-scale generalization capabilities. During inference, a lightweight fine-tuning process on small-batch adaptation tasks is employed to further enhance optimization performance. 
Extensive experiments on diverse MOTSP instances demonstrate that MDRL-PFAN achieves superior performance compared to classic evolutionary algorithms and state-of-the-art DRL algorithms in terms of training efficiency, solution quality, and cross-scale generalization capability.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 4","pages":"2238-2252"},"PeriodicalIF":0.0000,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11181152/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/9/26 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

The multiobjective traveling salesman problem (MOTSP) is a representative class of multiobjective combinatorial optimization problems, with significant implications for both theoretical research and practical applications. Although deep reinforcement learning (DRL) has shown promise in solving MOTSPs, existing approaches often struggle with generalization to large-scale problem instances. To address this challenge, we propose a novel meta-deep reinforcement learning framework with preference-fused attention networks (MDRL-PFAN). This framework integrates a preference-fused mechanism to jointly encode problem instances and weight preferences into a unified feature space. Moreover, an ensemble meta-learning strategy is adopted to train the meta-model across tasks with varying scales, equipping MDRL-PFAN with robust solving and strong cross-scale generalization capabilities. During inference, a lightweight fine-tuning process on small-batch adaptation tasks is employed to further enhance optimization performance. Extensive experiments on diverse MOTSP instances demonstrate that MDRL-PFAN achieves superior performance compared to classic evolutionary algorithms and state-of-the-art DRL algorithms in terms of training efficiency, solution quality, and cross-scale generalization capability.
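The abstract describes encoding "weight preferences" alongside problem instances, which reflects the standard decomposition view of multiobjective optimization: a preference (weight) vector turns the vector of tour costs into a single scalar objective. As background only (this is not the paper's MDRL-PFAN method), a minimal sketch of weighted-sum scalarization for a bi-objective TSP tour, with made-up cost matrices:

```python
import numpy as np

def tour_costs(tour, cost_matrices):
    """Length of a closed tour under each objective's cost matrix."""
    totals = np.zeros(len(cost_matrices))
    for a, b in zip(tour, tour[1:] + tour[:1]):  # wrap around to close the cycle
        for m, C in enumerate(cost_matrices):
            totals[m] += C[a][b]
    return totals

def scalarize(costs, weights):
    """Weighted-sum scalarization: one preference vector -> one scalar objective."""
    return float(np.dot(weights, costs))

rng = np.random.default_rng(0)
n = 5
# Two independent cost matrices = two objectives (e.g. distance and travel time).
C1, C2 = rng.random((n, n)), rng.random((n, n))
tour = [0, 2, 4, 1, 3]
costs = tour_costs(tour, [C1, C2])
# Sweeping the weight vector traces out different trade-offs on the Pareto front.
for w in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    print(w, scalarize(costs, w))
```

Solving the scalarized problem for many different weight vectors yields an approximation of the Pareto front; the paper's contribution is learning a single model that handles all preferences and instance scales rather than solving each scalarization from scratch.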