Towards Generalizable Meta-Deep Reinforcement Learning Algorithm for Multiobjective Traveling Salesman Problems

Xiaoyu Fu;Shenshen Gu;Chee-Meng Chew;Tengfei Li
{"title":"Towards Generalizable Meta-Deep Reinforcement Learning Algorithm for Multiobjective Traveling Salesman Problems","authors":"Xiaoyu Fu;Shenshen Gu;Chee-Meng Chew;Tengfei Li","doi":"10.1109/TAI.2025.3614210","DOIUrl":null,"url":null,"abstract":"The multiobjective traveling salesman problem (MOTSP) is a representative class of multiobjective combinatorial optimization problems, with significant implications for both theoretical research and practical applications. Although deep reinforcement learning (DRL) has shown promise in solving MOTSPs, existing approaches often struggle with generalization to large-scale problem instances. To address this challenge, we propose a novel meta-deep reinforcement learning framework with preference-fused attention networks (MDRL-PFAN). This framework integrates a preference-fused mechanism to jointly encode problem instances and weight preferences into a unified feature space. Moreover, an ensemble meta-learning strategy is adopted to train the meta-model across tasks with varying scales, equipping MDRL-PFAN with robust solving and strong cross-scale generalization capabilities. During inference, a lightweight fine-tuning process on small-batch adaptation tasks is employed to further enhance optimization performance. 
Extensive experiments on diverse MOTSP instances demonstrate that MDRL-PFAN achieves superior performance compared to classic evolutionary algorithms and state-of-the-art DRL algorithms in terms of training efficiency, solution quality, and cross-scale generalization capability.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 4","pages":"2238-2252"},"PeriodicalIF":0.0000,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11181152/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/9/26 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

The multiobjective traveling salesman problem (MOTSP) is a representative class of multiobjective combinatorial optimization problems, with significant implications for both theoretical research and practical applications. Although deep reinforcement learning (DRL) has shown promise in solving MOTSPs, existing approaches often struggle with generalization to large-scale problem instances. To address this challenge, we propose a novel meta-deep reinforcement learning framework with preference-fused attention networks (MDRL-PFAN). This framework integrates a preference-fused mechanism to jointly encode problem instances and weight preferences into a unified feature space. Moreover, an ensemble meta-learning strategy is adopted to train the meta-model across tasks with varying scales, equipping MDRL-PFAN with robust solving and strong cross-scale generalization capabilities. During inference, a lightweight fine-tuning process on small-batch adaptation tasks is employed to further enhance optimization performance. Extensive experiments on diverse MOTSP instances demonstrate that MDRL-PFAN achieves superior performance compared to classic evolutionary algorithms and state-of-the-art DRL algorithms in terms of training efficiency, solution quality, and cross-scale generalization capability.
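The abstract describes encoding "weight preferences" alongside problem instances, which reflects the standard decomposition view of multiobjective optimization: a preference (weight) vector turns the vector of tour costs into a single scalar objective. As background only (this is not the paper's MDRL-PFAN method), a minimal sketch of weighted-sum scalarization for a bi-objective TSP tour, with made-up cost matrices:

```python
import numpy as np

def tour_costs(tour, cost_matrices):
    """Length of a closed tour under each objective's cost matrix."""
    totals = np.zeros(len(cost_matrices))
    for a, b in zip(tour, tour[1:] + tour[:1]):  # wrap around to close the cycle
        for m, C in enumerate(cost_matrices):
            totals[m] += C[a][b]
    return totals

def scalarize(costs, weights):
    """Weighted-sum scalarization: one preference vector -> one scalar objective."""
    return float(np.dot(weights, costs))

rng = np.random.default_rng(0)
n = 5
# Two independent cost matrices = two objectives (e.g. distance and travel time).
C1, C2 = rng.random((n, n)), rng.random((n, n))
tour = [0, 2, 4, 1, 3]
costs = tour_costs(tour, [C1, C2])
# Sweeping the weight vector traces out different trade-offs on the Pareto front.
for w in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    print(w, scalarize(costs, w))
```

Solving the scalarized problem for many different weight vectors yields an approximation of the Pareto front; the paper's contribution is learning a single model that handles all preferences and instance scales rather than solving each scalarization from scratch.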