Authors: Wujie Zhou; Han Zhang; Weiwei Qiu
DOI: 10.1109/TBDATA.2024.3505057
Journal: IEEE Transactions on Big Data, vol. 11, no. 4, pp. 1959-1969 (Impact Factor: 5.7)
Publication date: 2024-11-22 (Journal Article)
JCR: Q1, Computer Science, Information Systems; CAS Region 3, Computer Science
URL: https://ieeexplore.ieee.org/document/10764777/
Differential Modal Multistage Adaptive Fusion Networks via Knowledge Distillation for RGB-D Mirror Segmentation
Mirrors play a significant role in our daily lives and are ubiquitous. However, deep learning computer vision models find them challenging owing to the negative impact of reflected information on scene understanding. This study addresses two key challenges faced by multimodal models. First, the cross-modal variability of features at different stages is generally overlooked by contemporary backbone networks. Second, good performance has only been achieved at an unacceptable computational expense, owing to the numerous parameters used. To address the first challenge, we propose a differential-mode multistage adaptive fusion network (differential mode refers to images generated by different sensors that are differentiated to complement each other) that incorporates two-step fusion in the coding stage to account for the degrees of difference among the cross-modal features. In the first stage, wherein considerable differences in modal features exist, multi-angle fusion is performed. In the second stage, wherein the differences are smaller, a hierarchical adaptive fusion strategy is employed. Regarding the second challenge, we introduce a companion training framework for mirror segmentation that combines knowledge distillation and contrastive learning. Our proposed scheme achieves state-of-the-art performance on an available mirror segmentation dataset without requiring numerous parameters.
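The companion training framework pairs knowledge distillation with contrastive learning so a compact student network can approach the teacher's accuracy without its parameter count. The paper's exact losses are not reproduced here, but the standard logit-distillation term (temperature-softened teacher probabilities matched by the student via KL divergence) can be sketched in plain Python; the function names and the temperature value below are illustrative assumptions, not taken from the paper:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In practice this term is weighted against the student's own segmentation loss; a higher temperature exposes more of the teacher's inter-class similarity structure, which is the signal the smaller student cannot easily learn from hard labels alone.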
About the Journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.