FIMKD: Feature-Implicit Mapping Knowledge Distillation for RGB-D Indoor Scene Semantic Segmentation

IEEE Transactions on Artificial Intelligence, vol. 5, no. 12, pp. 6488–6499. Published 2024-08-30. DOI: 10.1109/TAI.2024.3452052

Wujie Zhou, Yuxiang Xiao, Yuanyuan Liu, Qiuping Jiang

IEEE Xplore: https://ieeexplore.ieee.org/document/10659736/

Abstract

Depth images are often used to improve the geometric understanding of scenes because they directly encode distance. Although semantic segmentation with red–green–blue-depth (RGB-D) images has advanced significantly, existing methods remain highly complex. Furthermore, requiring high-quality depth images increases model inference time, which limits the practicality of these methods. To address this issue, we propose a feature-implicit mapping knowledge distillation (FIMKD) method and a cross-modal knowledge distillation (KD) architecture. These use depth-modality information during training and reduce the model's dependence on it during inference. The approach comprises two networks: FIMKD-T, a teacher network that uses RGB-D data, and FIMKD-S, a student network that uses only RGB data. FIMKD-T extracts high-frequency information from the depth modality. Through its high-frequency feature enhancement module, it compensates for the RGB detail lost as resolution decreases during feature extraction, which strengthens the geometric perception of semantic features. In contrast, FIMKD-S does not use a learned module for this purpose; instead, it extracts high-frequency information with a nonlearning approach. To enable FIMKD-S to learn depth-related features, we propose feature-implicit mapping KD for feature distillation. This technique maps channel and spatial features to a low-dimensional hidden layer, which helps the student avoid inefficient, single-pattern learning. We evaluated the proposed FIMKD-S* (FIMKD-S trained with KD) on the NYUv2 and SUN-RGBD datasets. The results demonstrate that both FIMKD-T and FIMKD-S* achieve state-of-the-art performance, and that FIMKD-S* provides the best balance between performance and efficiency.
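The abstract states that FIMKD-S extracts high-frequency information with a nonlearning approach, but it does not name the operator. As a hedged illustration only, the following sketch assumes a fixed 3×3 Laplacian high-pass filter applied to a grayscale version of the RGB input. The function names `high_frequency` and `rgb_to_gray`, the BT.601 grayscale weights, and the choice of a Laplacian kernel are assumptions and are not taken from the paper.

```python
# Hypothetical sketch: non-learned high-frequency extraction with a fixed
# 3x3 Laplacian kernel. The paper's actual operator is not specified in the
# abstract; the Laplacian is an assumption chosen for illustration.

LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def rgb_to_gray(rgb):
    """Convert an H x W nested list of (r, g, b) tuples to grayscale (BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def high_frequency(image):
    """Return the Laplacian response of a 2-D grayscale image (list of rows).

    Borders are handled by replicating edge pixels, so flat regions map to 0
    and only intensity changes (edges, fine texture) produce a response.
    """
    h, w = len(image), len(image[0])

    def px(y, x):
        # Clamp coordinates to replicate the border pixels.
        y = min(max(y, 0), h - 1)
        x = min(max(x, 0), w - 1)
        return image[y][x]

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(3):
                for dx in range(3):
                    acc += LAPLACIAN[dy][dx] * px(y + dy - 1, x + dx - 1)
            out[y][x] = acc
    return out


if __name__ == "__main__":
    # A vertical step edge: the response is nonzero only next to the edge.
    edge = [[0, 0, 1, 1] for _ in range(3)]
    print(high_frequency(edge)[1])  # [0.0, 1.0, -1.0, 0.0]
```

Because the kernel is fixed, this step adds no trainable parameters to the student. That is one plausible reason a nonlearning extractor suits a lightweight RGB-only network, although the abstract does not state the authors' rationale.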

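The abstract says that feature-implicit mapping KD maps channel and spatial features to a low-dimensional hidden layer, but it does not give the formulation. The following sketch is one hypothetical reading, written in plain Python so it runs without a deep-learning framework. Teacher and student feature maps (C × H × W) are projected along the channel axis and along the flattened spatial axis into k-dimensional spaces, and the student is penalized by the mean squared difference in those spaces. The names `fim_loss`, `channel_embed`, and `spatial_embed`, the projection matrices `w_c` (k × C) and `w_s` (k × H·W), and the weight `alpha` are all illustrative assumptions rather than the authors' definitions. The sketch also assumes that student and teacher features have the same shape.

```python
# Hypothetical sketch of a "feature-implicit mapping" distillation loss.
# In a real training loop w_c and w_s would be learned, and the teacher
# features would be detached (no gradient). Here they are plain arguments.


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def channel_embed(feat, w_c):
    """Map each spatial position's C-dim channel vector to k dims via w_c (k x C)."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [matvec(w_c, [feat[ch][y][x] for ch in range(c)])
            for y in range(h) for x in range(w)]  # (H*W) x k


def spatial_embed(feat, w_s):
    """Map each channel's flattened H*W map to k dims via w_s (k x H*W)."""
    return [matvec(w_s, [v for row in ch for v in row]) for ch in feat]  # C x k


def mse(a, b):
    """Mean squared error between two equally shaped 2-D nested lists."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def fim_loss(f_student, f_teacher, w_c, w_s, alpha=0.5):
    """Weighted sum of channel- and spatial-embedding mismatches."""
    l_c = mse(channel_embed(f_student, w_c), channel_embed(f_teacher, w_c))
    l_s = mse(spatial_embed(f_student, w_s), spatial_embed(f_teacher, w_s))
    return alpha * l_c + (1 - alpha) * l_s


if __name__ == "__main__":
    teacher = [[[1, 0], [0, 0]]]  # C=1, H=2, W=2
    student = [[[0, 0], [0, 0]]]
    print(fim_loss(student, teacher, w_c=[[1]], w_s=[[1, 1, 1, 1]]))  # 0.625
```

Comparing both networks through shared low-dimensional maps matches compressed summaries of the features instead of forcing element-by-element agreement on raw activations. This may be what the abstract means by avoiding single-pattern student learning, but that interpretation is an assumption.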