MGSGNet-S*: Multilayer Guided Semantic Graph Network via Knowledge Distillation for RGB-Thermal Urban Scene Parsing

IF 14.3 | Region 1 (Engineering & Technology) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Wujie Zhou;Hongping Wu;Qiuping Jiang
{"title":"MGSGNet-S*:基于知识蒸馏的多层引导语义图网络rgb -热城市场景分析","authors":"Wujie Zhou;Hongping Wu;Qiuping Jiang","doi":"10.1109/TIV.2024.3456437","DOIUrl":null,"url":null,"abstract":"Owing to rapid developments in driverless technologies, vision tasks for unmanned vehicles have gained considerable attention, particularly in multimodal-based urban scene parsing. Although deep-learning algorithms have outperformed traditional models in such tasks, they cannot operate on mobile devices and edge networks owing to the coarse-grained cross-modal complementary information alignment, inadequate modeling of semantic-category relations, overabundance of parameters, and high computational complexity. To address these issues, a multilayer guided semantic graph network via knowledge distillation (MGSGNet-S<sup>*</sup>) is proposed for red-green-blue-thermal urban scene parsing. First, a new cross-modal adaptive fusion module adjusts pixel-level adaptive modal complementary information by incorporating additional deep modal information and residual cross-modal matrix fine-grained attention. Second, a novel semantic graph module overcomes the misclassification problems of objects of the same semantic class during low-level encoding by incorporating high-level information in the Euclidean space and modeling semantic graph relationships in the non-Euclidean space. Finally, to strike the balance between accuracy and efficiency, a tailored framework optimally utilizes effective knowledge of pixel intra- and inter-class similarity, fusion features, and cross-modal correlation. Experimental results indicate that MGSGNet-S<sup>*</sup> considerably outperforms relevant state-of-the-art methods with fewer parameters and lower computational costs. The numbers of parameters and floating-point operations were reduced by 95.69% and 93.34%, respectively, relative to those for the teacher model, thus demonstrating stronger inferencing capabilities at 28.65 frames per second.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 5","pages":"3543-3559"},"PeriodicalIF":14.3000,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MGSGNet-S*: Multilayer Guided Semantic Graph Network via Knowledge Distillation for RGB-Thermal Urban Scene Parsing\",\"authors\":\"Wujie Zhou;Hongping Wu;Qiuping Jiang\",\"doi\":\"10.1109/TIV.2024.3456437\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Owing to rapid developments in driverless technologies, vision tasks for unmanned vehicles have gained considerable attention, particularly in multimodal-based urban scene parsing. Although deep-learning algorithms have outperformed traditional models in such tasks, they cannot operate on mobile devices and edge networks owing to the coarse-grained cross-modal complementary information alignment, inadequate modeling of semantic-category relations, overabundance of parameters, and high computational complexity. To address these issues, a multilayer guided semantic graph network via knowledge distillation (MGSGNet-S<sup>*</sup>) is proposed for red-green-blue-thermal urban scene parsing. First, a new cross-modal adaptive fusion module adjusts pixel-level adaptive modal complementary information by incorporating additional deep modal information and residual cross-modal matrix fine-grained attention. 
Second, a novel semantic graph module overcomes the misclassification problems of objects of the same semantic class during low-level encoding by incorporating high-level information in the Euclidean space and modeling semantic graph relationships in the non-Euclidean space. Finally, to strike the balance between accuracy and efficiency, a tailored framework optimally utilizes effective knowledge of pixel intra- and inter-class similarity, fusion features, and cross-modal correlation. Experimental results indicate that MGSGNet-S<sup>*</sup> considerably outperforms relevant state-of-the-art methods with fewer parameters and lower computational costs. The numbers of parameters and floating-point operations were reduced by 95.69% and 93.34%, respectively, relative to those for the teacher model, thus demonstrating stronger inferencing capabilities at 28.65 frames per second.\",\"PeriodicalId\":36532,\"journal\":{\"name\":\"IEEE Transactions on Intelligent Vehicles\",\"volume\":\"10 5\",\"pages\":\"3543-3559\"},\"PeriodicalIF\":14.3000,\"publicationDate\":\"2024-09-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Intelligent Vehicles\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10669814/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Intelligent Vehicles","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10669814/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Owing to rapid developments in driverless technologies, vision tasks for unmanned vehicles have gained considerable attention, particularly multimodal urban scene parsing. Although deep-learning algorithms outperform traditional models in such tasks, they cannot operate on mobile devices and edge networks owing to coarse-grained cross-modal complementary information alignment, inadequate modeling of semantic-category relations, an overabundance of parameters, and high computational complexity. To address these issues, a multilayer guided semantic graph network via knowledge distillation (MGSGNet-S*) is proposed for red-green-blue-thermal urban scene parsing. First, a new cross-modal adaptive fusion module adjusts pixel-level adaptive modal complementary information by incorporating additional deep modal information and residual cross-modal matrix fine-grained attention. Second, a novel semantic graph module overcomes the misclassification of objects of the same semantic class during low-level encoding by incorporating high-level information in Euclidean space and modeling semantic graph relationships in non-Euclidean space. Finally, to strike a balance between accuracy and efficiency, a tailored distillation framework optimally utilizes effective knowledge of pixel intra- and inter-class similarity, fusion features, and cross-modal correlation. Experimental results indicate that MGSGNet-S* considerably outperforms relevant state-of-the-art methods with fewer parameters and lower computational costs: the numbers of parameters and floating-point operations are reduced by 95.69% and 93.34%, respectively, relative to the teacher model, and the network demonstrates strong inference capability at 28.65 frames per second.
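No reference implementation accompanies this listing, so the following is only a minimal PyTorch sketch of two of the ideas named in the abstract: a gated, pixel-level residual fusion of RGB and thermal feature maps, and a plain response-based distillation loss between teacher and student segmentation logits. Every name here (CrossModalFusion, distillation_loss, the temperature T, the tensor shapes) is hypothetical and simplified; the actual MGSGNet-S* modules additionally exploit deep modal information, semantic graph reasoning, and similarity/correlation distillation terms that this sketch omits.

```python
# Hypothetical sketch only -- not the MGSGNet-S* implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFusion(nn.Module):
    """Gate-based pixel-level fusion of RGB and thermal feature maps."""
    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 convolution turns the concatenated modalities into a
        # per-pixel, per-channel attention gate in [0, 1].
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        attn = torch.sigmoid(self.gate(torch.cat([rgb, thermal], dim=1)))
        # Residual form: the RGB stream is kept intact and the thermal
        # stream contributes where the learned gate opens.
        return rgb + attn * thermal

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      T: float = 4.0) -> torch.Tensor:
    """Response-based distillation: match softened per-pixel class maps."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

if __name__ == "__main__":
    rgb = torch.randn(2, 64, 120, 160)      # batch of RGB feature maps
    thermal = torch.randn(2, 64, 120, 160)  # aligned thermal feature maps
    fused = CrossModalFusion(64)(rgb, thermal)

    student = torch.randn(2, 9, 120, 160)   # 9-class segmentation logits
    teacher = torch.randn(2, 9, 120, 160)
    print(fused.shape, distillation_loss(student, teacher).item())
```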
Source journal
IEEE Transactions on Intelligent Vehicles
Category: Mathematics (Control and Optimization)
CiteScore: 12.10
Self-citation rate: 13.40%
Annual publications: 177
About the journal: The IEEE Transactions on Intelligent Vehicles (T-IV) is a premier platform for publishing peer-reviewed articles that present innovative research concepts, application results, significant theoretical findings, and application case studies in the field of intelligent vehicles. With a particular emphasis on automated vehicles within roadway environments, T-IV aims to raise awareness of pressing research and application challenges. Our focus is on providing critical information to the intelligent vehicle community, serving as a dissemination vehicle for IEEE ITS Society members and others interested in learning about the state-of-the-art developments and progress in research and applications related to intelligent vehicles. Join us in advancing knowledge and innovation in this dynamic field.