Authors: Wujie Zhou; Han Zhang; Weiwei Qiu
DOI: 10.1109/TBDATA.2024.3505057
Journal: IEEE Transactions on Big Data, vol. 11, no. 4, pp. 1959-1969 (Impact Factor: 5.7)
Publication date: 2024-11-22 (Journal Article)
JCR: Q1, Computer Science, Information Systems; CAS Region 3, Computer Science
URL: https://ieeexplore.ieee.org/document/10764777/
Differential Modal Multistage Adaptive Fusion Networks via Knowledge Distillation for RGB-D Mirror Segmentation
Mirrors play a significant role in our daily lives and are ubiquitous. However, deep learning computer vision models find them challenging owing to the negative impact of reflected information on scene understanding. This study addresses two key challenges faced by multimodal models. First, the cross-modal variability of features at different stages is generally overlooked by contemporary backbone networks. Second, good performance has only been achieved at an unacceptable computational expense, owing to the numerous parameters used. To address the first challenge, we propose a differential-mode multistage adaptive fusion network (differential mode refers to images generated by different sensors that are differentiated to complement each other) that incorporates two-step fusion in the coding stage to account for the degrees of difference among the cross-modal features. In the first stage, wherein considerable differences in modal features exist, multi-angle fusion is performed. In the second stage, wherein the differences are smaller, a hierarchical adaptive fusion strategy is employed. Regarding the second challenge, we introduce a companion training framework for mirror segmentation that combines knowledge distillation and contrastive learning. Our proposed scheme achieves state-of-the-art performance on an available mirror segmentation dataset without requiring numerous parameters.
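The companion training framework pairs knowledge distillation with contrastive learning so a compact student network can approach the teacher's accuracy without its parameter count. The paper's exact losses are not reproduced here, but the standard logit-distillation term (temperature-softened teacher probabilities matched by the student via KL divergence) can be sketched in plain Python; the function names and the temperature value below are illustrative assumptions, not taken from the paper:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In practice this term is weighted against the student's own segmentation loss; a higher temperature exposes more of the teacher's inter-class similarity structure, which is the signal the smaller student cannot easily learn from hard labels alone.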
About the Journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.