Lightweight and Efficient Multimodal Prompt Injection Network for Scene Parsing of Remote Sensing Scene Images
Authors: Yangzhen Li; Wujie Zhou; Jiajun Meng; Weiqing Yan
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-9 (IF 7.5, JCR Q1, Engineering, Electrical & Electronic)
Published: 2024-11-27
DOI: 10.1109/TGRS.2024.3507784
URL: https://ieeexplore.ieee.org/document/10770250/
Citations: 0
Abstract
Scene parsing of high-resolution remote sensing images with complex backgrounds has received extensive attention in recent years. Because unimodal networks are significantly affected by weather conditions, it is difficult for them to reflect complex ground conditions fully and accurately; multimodal scene analysis is therefore particularly important. Current multimodal scene-parsing networks often employ a dual-coding architecture to achieve high-performance segmentation. Because prompt learning allows models to understand and capture contextual information more effectively, the proposed prompt injection module (PIM) extracts relevant information from frozen normalized digital surface model (nDSM) features and integrates it into the infrared, red, and green (IRRG) branches through a modal embedding block. To efficiently extract the contextual semantic relationships between the local and global features in the image, we also design a dynamic filter block for feature enhancement. This design facilitates the mutual complementarity and guidance of information between the two modalities and optimizes fusion. The experimental results demonstrate that the proposed lightweight and effective multimodal prompt injection network (LENet) outperforms most current state-of-the-art lightweight methods on two public datasets, achieving accuracy comparable to that of traditional methods. It has only 10.81 M parameters and requires 2.72 GFLOPs. Our code and results are available at https://github.com/LYZ00918/LENet.
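The prompt-injection idea described in the abstract can be illustrated with a toy, per-pixel sketch. This is not the authors' implementation: the function names `matvec` and `inject_prompt`, the scalar `gate`, and the use of a single linear projection in place of the modal embedding block are all assumptions made for illustration. The only point shown is that nDSM features are treated as fixed (frozen) inputs, projected to the IRRG feature size, and added to the IRRG features.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weight: Matrix, vec: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def inject_prompt(irrg_feat: Vector, ndsm_feat: Vector,
                  embed_weight: Matrix, gate: float) -> Vector:
    """Add a projected nDSM 'prompt' to an IRRG feature vector.

    Hypothetical simplification: the modal embedding block is replaced
    by one linear projection followed by a scalar gate.
    """
    prompt = matvec(embed_weight, ndsm_feat)
    if len(prompt) != len(irrg_feat):
        raise ValueError("projected prompt must match IRRG feature size")
    return [x + gate * p for x, p in zip(irrg_feat, prompt)]


# Toy per-pixel features: 3 IRRG channels, 2 nDSM channels.
irrg = [1.0, 2.0, 3.0]
ndsm = [0.5, -1.0]            # frozen: read but never updated here
embed = [[1.0, 0.0],
         [0.0, 1.0],
         [2.0, 2.0]]          # would be trainable in a real model
fused = inject_prompt(irrg, ndsm, embed, gate=0.5)
print(fused)
```

With `gate=0.0` the IRRG features pass through unchanged, which is one common reason for gating an injected signal: the model can start from the unimodal behavior and learn how much of the second modality to admit.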
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.