Two-modal multiscale feature cross fusion for hyperspectral unmixing

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing · Pub Date: 2025-02-15 · DOI: 10.1016/j.imavis.2025.105445

Senlong Qin, Yuqi Hao, Minghui Chu, Xiaodong Yu

Image and Vision Computing, volume 155, Article 105445. Source: https://www.sciencedirect.com/science/article/pii/S0262885625000332

Citations: 0

Abstract

Hyperspectral images (HSI) possess rich spectral characteristics but suffer from low spatial resolution, which has led many methods to focus on extracting more spatial information from HSI. However, the spatial information that can be extracted from a single HSI is limited, making it difficult to distinguish objects with similar materials. To address this issue, we propose a multimodal unmixing network called MSFF-Net. This network enhances unmixing performance by integrating the spatial information from light detection and ranging (LiDAR) data into the unmixing process. To ensure a more comprehensive fusion of features from the two modalities, we introduce a multi-scale cross-fusion method, providing a new approach to multimodal data fusion. Additionally, the network employs attention mechanisms to enhance channel-wise and spatial features, boosting the model's representational capacity. Our proposed model effectively consolidates multimodal information, significantly improving its unmixing capability, especially in complex environments, leading to more accurate unmixing results and facilitating further analysis of HSI. We evaluate our method using two real-world datasets. Experimental results demonstrate that our proposed approach outperforms other state-of-the-art methods in terms of both stability and effectiveness.
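
The multi-scale cross-fusion idea described in the abstract can be illustrated with a toy sketch. The abstract does not specify the layers of MSFF-Net, so everything below is an assumption made for illustration only: the helper names `avg_pool`, `upsample`, and `cross_fuse` are hypothetical, sigmoid gating stands in for whatever fusion operator the authors use, and the scales (1 and 2) are arbitrary. It shows one plausible reading of the concept: features from each modality are summarised at several spatial scales, and each modality is weighted by a gate derived from the other.

```python
import math


def avg_pool(grid, k):
    """Downsample a 2-D grid by averaging non-overlapping k x k blocks."""
    h, w = len(grid), len(grid[0])
    return [
        [
            sum(grid[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w - k + 1, k)
        ]
        for i in range(0, h - k + 1, k)
    ]


def upsample(grid, k):
    """Nearest-neighbour upsampling of a 2-D grid by an integer factor k."""
    return [
        [grid[i // k][j // k] for j in range(len(grid[0]) * k)]
        for i in range(len(grid) * k)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_fuse(hsi_feat, lidar_feat, scales=(1, 2)):
    """Toy cross fusion (hypothetical, not the MSFF-Net design).

    At each scale, the HSI feature is weighted by a gate computed from the
    LiDAR feature and vice versa; results are averaged over all scales.
    Grid height and width are assumed to be divisible by every scale.
    """
    h, w = len(hsi_feat), len(hsi_feat[0])
    fused = [[0.0] * w for _ in range(h)]
    for k in scales:
        a = upsample(avg_pool(hsi_feat, k), k)    # HSI summary at scale k
        b = upsample(avg_pool(lidar_feat, k), k)  # LiDAR summary at scale k
        for i in range(h):
            for j in range(w):
                fused[i][j] += a[i][j] * sigmoid(b[i][j]) + b[i][j] * sigmoid(a[i][j])
    return [[v / len(scales) for v in row] for row in fused]
```

In a real network the two inputs would be multi-channel feature maps produced by learned convolutional branches, and the gates would be learned attention weights rather than a fixed sigmoid; the sketch only conveys the data flow between modalities and scales.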

Source journal

Image and Vision Computing · Engineering & Technology / Engineering: Electrical & Electronic

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months

About the journal: Image and Vision Computing aims primarily to provide an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to strengthen understanding in the discipline by encouraging quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.