Rotation-Robust Remote Sensing Image Classification Method Based on Paired Image Description of Vision-Language Model
Shen Liu, Qi Liu, Shiyan Lu, Jing Zhang, Tiecheng Song
Electronics Letters, vol. 61, no. 1. Published 2025-09-03. DOI: 10.1049/ell2.70407
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ell2.70407
Vision-language models such as contrastive language-image pretraining (CLIP) have shown promising results for image classification. However, existing CLIP-based methods fail to describe the fine-grained rotational relationships of paired images and cannot effectively align rotation-associated features in image-text space, limiting their performance in rotation-robust image classification. To address this challenge, we propose a rotation-robust CLIP (RoRoCLIP) model for remote sensing image classification. RoRoCLIP contains two key components: a dual image feature extraction (DIFE) module and a rotation awareness (RoA) module. The DIFE module extracts features from both the original and rotated images via the pretrained encoder of CLIP, and performs image-level feature interactions via a learnable transformer layer. The RoA module associates the textual prompt 'differences caused by rotation' with the differential visual features extracted by DIFE, and aligns rotation-associated features in image-text space. Based on these two modules, we construct a classification loss and an RoA loss to optimise the model, enabling RoRoCLIP to perceive rotational variations and learn discriminative features for image classification. Experimental results on the NWPU-VHR-10 and RSOD remote sensing datasets demonstrate that the proposed model enhances CLIP's robustness against image rotation and outperforms state-of-the-art approaches in classification accuracy.
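The abstract does not give the exact form of the RoA loss. As a minimal sketch of one plausible reading, assuming the RoA loss is a cosine-alignment term between the differential visual feature (original minus rotated view) and the text embedding of the prompt 'differences caused by rotation', it could look like the following; all function names and the vector inputs here are hypothetical stand-ins for the CLIP encoder outputs:

```python
import math


def l2norm(v):
    """Normalize a feature vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def roa_loss(feat_orig, feat_rot, text_emb):
    """Hypothetical RoA alignment loss (sketch, not the paper's code).

    feat_orig, feat_rot: image features of the original and rotated
    views (e.g. from the pretrained CLIP image encoder).
    text_emb: embedding of the prompt 'differences caused by rotation'.

    Returns the cosine distance between the differential visual
    feature and the prompt embedding, so minimizing it pulls the
    rotation-induced feature difference toward the text direction.
    """
    diff = l2norm([a - b for a, b in zip(feat_orig, feat_rot)])
    txt = l2norm(text_emb)
    cos_sim = sum(d * t for d, t in zip(diff, txt))
    return 1.0 - cos_sim
```

Under this sketch, the loss is 0 when the differential feature points exactly along the prompt embedding and 2 when it points in the opposite direction; the paper's full objective additionally combines this with a standard classification loss.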
Journal introduction:
Electronics Letters is an internationally renowned peer-reviewed rapid-communication journal that publishes short original research papers every two weeks. Its broad and interdisciplinary scope covers the latest developments in all fields related to electronic engineering, including communication, biomedical, optical, and device technologies. Electronics Letters also provides further insight into some of the latest developments through special features and interviews.
Scope
As a journal at the forefront of its field, Electronics Letters publishes papers covering all themes of electronic and electrical engineering. The major themes of the journal are listed below.
Antennas and Propagation
Biomedical and Bioinspired Technologies, Signal Processing and Applications
Control Engineering
Electromagnetism: Theory, Materials and Devices
Electronic Circuits and Systems
Image, Video and Vision Processing and Applications
Information, Computing and Communications
Instrumentation and Measurement
Microwave Technology
Optical Communications
Photonics and Opto-Electronics
Power Electronics, Energy and Sustainability
Radar, Sonar and Navigation
Semiconductor Technology
Signal Processing