Toward Multitask Perception for Remote Sensing Imagery via Compression and Prompt Tuning
Authors: Yongqiang Wang; Feng Liang; Hang Chen; Haisheng Fu; Jiro Katto
Journal: IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Published: 2025-07-14
DOI: 10.1109/LGRS.2025.3589030
URL: https://ieeexplore.ieee.org/document/11080011/
Recent advances in satellite technology have greatly increased the availability of high-resolution remote sensing images. Concurrently, learning-based image compression (LIC) has significantly improved the efficiency of transmitting and storing such images. As machine recognition tasks increasingly depend on transmitting visual data across devices, compressed images play a key role in both human and machine perception during downstream tasks. However, most LIC approaches are not optimized for machine recognition tasks. To address this limitation, we propose a remote sensing image compression network called RSIC, which integrates multitask perception and supports downstream tasks such as object detection. Specifically, we introduce a wavelet-based frequency-spatial block (WFSB) that separates frequency components and processes them with transformer and convolutional neural network (CNN) blocks to capture frequency-specific features. Within the WFSB, the prompting Swin-Transformer block (PSTB) extracts spatial information while enabling prompt tuning. In addition, after the primary codec is trained, instance prompts are applied during encoding and task prompts during decoding, which facilitates machine perception without full fine-tuning. Extensive experimental results show that our model achieves better rate–distortion (R–D) performance for image compression on the Aerial Image Dataset (AID) test set, surpassing the traditional Versatile Video Coding (VVC) codec and several recent LIC methods. Furthermore, our method demonstrates superior rate–accuracy performance for machine perception on two remote sensing datasets: the Northwestern Polytechnical University Very-High-Resolution 10-Class Dataset (NWPU VHR-10) and the High-Resolution SAR Images Dataset (HRSID).
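The frequency separation that the WFSB relies on can be illustrated with a one-level 2D wavelet transform. The abstract does not say which wavelet is used or how the subbands are routed to the transformer and CNN branches, so the following is only a minimal sketch under stated assumptions: it uses the orthonormal Haar wavelet, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One-level 2D Haar transform (assumed wavelet, not from the paper).

    Returns four subbands, each half the input size along both axes:
    a low-frequency approximation, then row, column, and diagonal details.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, d_row, d_col, d_diag = [], [], [], []
    for i in range(0, h, 2):
        r_a, r_r, r_c, r_d = [], [], [], []
        for j in range(0, w, 2):
            # 2x2 neighbourhood: a b / c d
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2.0)  # low-pass on both axes
            r_r.append((a + b - c - d) / 2.0)  # change between the two rows
            r_c.append((a - b + c - d) / 2.0)  # change between the two columns
            r_d.append((a - b - c + d) / 2.0)  # diagonal change
        approx.append(r_a)
        d_row.append(r_r)
        d_col.append(r_c)
        d_diag.append(r_d)
    return approx, d_row, d_col, d_diag


def haar_idwt2(approx: Matrix, d_row: Matrix, d_col: Matrix, d_diag: Matrix) -> Matrix:
    """Invert haar_dwt2, rebuilding the full-resolution image."""
    out: Matrix = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            s, r, c, g = approx[i][j], d_row[i][j], d_col[i][j], d_diag[i][j]
            top += [(s + r + c + g) / 2.0, (s + r - c - g) / 2.0]
            bottom += [(s - r + c - g) / 2.0, (s - r - c + g) / 2.0]
        out.append(top)
        out.append(bottom)
    return out
```

In a block of this kind, the low-frequency approximation and the high-frequency details could be handed to different branches and merged again with the inverse transform. Which branch handles which subband in RSIC is not stated in the abstract.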