基于不完全标记演示的弱监督多模态模仿学习。

IF 6.3 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks Pub Date : 2025-09-10 DOI:10.1016/j.neunet.2025.108098

Sijia Gu, Fei Zhu

{"title":"基于不完全标记演示的弱监督多模态模仿学习。","authors":"Sijia Gu, Fei Zhu","doi":"10.1016/j.neunet.2025.108098","DOIUrl":null,"url":null,"abstract":"<div><div>Multi-modal imitation learning enables the agent to learn demonstrations of multiple modes at the same time. However, as expert demonstrations in practice tend to have incomplete labels for behavior modes, most methods are inefficient. To address this issue, an approach capable of imitation learning from incompletely labeled expert demonstrations, referred to as Weakly Supervised Multi-modal Imitation Learning (WSMIL), is proposed. WSMIL incorporates weakly supervised learning into multi-modal imitation learning by adding a behavior mode classifier to the adversarial network, thus forming adversaries among three players (generator, classifier and discriminator). Both labeled and unlabeled data are fully utilized in this adversarial process where fake state-action-label pairs generated by the generator and the classifier try to deceive the discriminator that tries to identify them and limited labeled expert demonstrations. Additionally, in order to ensure the data distribution of classifier and generator individually to converge to the expert’s real distribution, three extra losses are employed, where simulated annealing behavioral cloning is also added to the generator network to improve the generalization of policy. Experiments show that WSMIL accurately distinguishes modes with incomplete modal labels in demonstrations, learns close to the expert standard for each mode, and is more stable than other multi-modal methods.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"194 ","pages":"Article 108098"},"PeriodicalIF":6.3000,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Weakly supervised multi-modal imitation learning from incompletely labeled demonstrations\",\"authors\":\"Sijia Gu, Fei Zhu\",\"doi\":\"10.1016/j.neunet.2025.108098\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Multi-modal imitation learning enables the agent to learn demonstrations of multiple modes at the same time. However, as expert demonstrations in practice tend to have incomplete labels for behavior modes, most methods are inefficient. To address this issue, an approach capable of imitation learning from incompletely labeled expert demonstrations, referred to as Weakly Supervised Multi-modal Imitation Learning (WSMIL), is proposed. WSMIL incorporates weakly supervised learning into multi-modal imitation learning by adding a behavior mode classifier to the adversarial network, thus forming adversaries among three players (generator, classifier and discriminator). Both labeled and unlabeled data are fully utilized in this adversarial process where fake state-action-label pairs generated by the generator and the classifier try to deceive the discriminator that tries to identify them and limited labeled expert demonstrations. Additionally, in order to ensure the data distribution of classifier and generator individually to converge to the expert’s real distribution, three extra losses are employed, where simulated annealing behavioral cloning is also added to the generator network to improve the generalization of policy. Experiments show that WSMIL accurately distinguishes modes with incomplete modal labels in demonstrations, learns close to the expert standard for each mode, and is more stable than other multi-modal methods.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"194 \",\"pages\":\"Article 108098\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2025-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608025009785\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025009785","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

多模态模仿学习使智能体能够同时学习多个模态的演示。然而，由于实践中的专家演示往往对行为模式的标签不完整，大多数方法效率低下。为了解决这个问题，提出了一种能够从不完全标记的专家演示中进行模仿学习的方法，称为弱监督多模态模仿学习（WSMIL）。WSMIL通过在对抗网络中加入行为模式分类器，将弱监督学习融入到多模态模仿学习中，从而在三个参与者（生成器、分类器和鉴别器）之间形成对手。在这个对抗过程中，标记和未标记的数据都被充分利用，其中由生成器和分类器生成的假状态-动作-标签对试图欺骗试图识别它们的鉴别器和有限的标记专家演示。此外，为了保证分类器和生成器各自的数据分布收敛于专家的真实分布，在生成器网络中增加了三个额外的损失，其中在生成器网络中加入了模拟退火行为克隆，以提高策略的泛化性。实验表明，在演示中，WSMIL能够准确识别模态标签不完整的模态，每个模态的学习都接近专家标准，并且比其他多模态方法更加稳定。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Weakly supervised multi-modal imitation learning from incompletely labeled demonstrations

Multi-modal imitation learning enables the agent to learn demonstrations of multiple modes at the same time. However, as expert demonstrations in practice tend to have incomplete labels for behavior modes, most methods are inefficient. To address this issue, an approach capable of imitation learning from incompletely labeled expert demonstrations, referred to as Weakly Supervised Multi-modal Imitation Learning (WSMIL), is proposed. WSMIL incorporates weakly supervised learning into multi-modal imitation learning by adding a behavior mode classifier to the adversarial network, thus forming adversaries among three players (generator, classifier and discriminator). Both labeled and unlabeled data are fully utilized in this adversarial process where fake state-action-label pairs generated by the generator and the classifier try to deceive the discriminator that tries to identify them and limited labeled expert demonstrations. Additionally, in order to ensure the data distribution of classifier and generator individually to converge to the expert’s real distribution, three extra losses are employed, where simulated annealing behavioral cloning is also added to the generator network to improve the generalization of policy. Experiments show that WSMIL accurately distinguishes modes with incomplete modal labels in demonstrations, learns close to the expert standard for each mode, and is more stable than other multi-modal methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Neural Networks 工程技术-计算机：人工智能

CiteScore

13.90

自引率

7.70%

发文量

425

审稿时长

67 days

期刊介绍： Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.