Module-Pruning-Based Neural Architectural Search for Remote Sensing Image Captioning
Yogendra Rao Musunuri; Changwon Kim; Oh-Seol Kwon; Sun-Yuan Kung
{"title":"基于模块剪枝的遥感图像标题神经结构搜索","authors":"Yogendra Rao Musunuri;Changwon Kim;Oh-Seol Kwon;Sun-Yuan Kung","doi":"10.1109/LGRS.2025.3593475","DOIUrl":null,"url":null,"abstract":"Remote sensing image captioning (RSIC) has garnered significant attention for enhancing the interpretability of aerial imagery through textual descriptions. Conventional approaches employ convolutional neural networks (CNNs) for visual feature extraction paired with recurrent neural networks (RNNs) or transformers for caption generation. However, these architectures suffer from high complexity and computational costs. While neural architecture search (NAS) via network pruning has been extensively studied, module-based pruning for RSIC systems remains largely unexplored. We propose a novel dedicated decoder pruning methodology for sequential caption generators—a module-based pruning method for end-to-end encoder–decoder architectural adaptation. It features two key innovations: 1) structured pruning of a pre-trained ResNet encoder and transformer encoder–decoder components and 2) a cross-entropy-based caption matching strategy replacing conventional prediction training in the decoder’s final layer. The proposed method enables simultaneously enhancing inference efficiency and reducing storage requirements without compromising performance. As evaluated on the RSICD dataset using CIDEr, ROUGE, METEOR, bilingual evaluation understudy (BLEU), and Sm metrics, our method achieves 42.8% model size reduction while improving accuracy, establishing new benchmarks in efficient RSIC.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4000,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Module-Pruning-Based Neural Architectural Search for Remote Sensing Image Captioning\",\"authors\":\"Yogendra Rao Musunuri;Changwon Kim;Oh-Seol Kwon;Sun-Yuan Kung\",\"doi\":\"10.1109/LGRS.2025.3593475\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Remote sensing image captioning (RSIC) has garnered significant attention for enhancing the interpretability of aerial imagery through textual descriptions. Conventional approaches employ convolutional neural networks (CNNs) for visual feature extraction paired with recurrent neural networks (RNNs) or transformers for caption generation. However, these architectures suffer from high complexity and computational costs. While neural architecture search (NAS) via network pruning has been extensively studied, module-based pruning for RSIC systems remains largely unexplored. We propose a novel dedicated decoder pruning methodology for sequential caption generators—a module-based pruning method for end-to-end encoder–decoder architectural adaptation. It features two key innovations: 1) structured pruning of a pre-trained ResNet encoder and transformer encoder–decoder components and 2) a cross-entropy-based caption matching strategy replacing conventional prediction training in the decoder’s final layer. The proposed method enables simultaneously enhancing inference efficiency and reducing storage requirements without compromising performance. 
As evaluated on the RSICD dataset using CIDEr, ROUGE, METEOR, bilingual evaluation understudy (BLEU), and Sm metrics, our method achieves 42.8% model size reduction while improving accuracy, establishing new benchmarks in efficient RSIC.\",\"PeriodicalId\":91017,\"journal\":{\"name\":\"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society\",\"volume\":\"22 \",\"pages\":\"1-5\"},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2025-08-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11098904/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11098904/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Remote sensing image captioning (RSIC) has garnered significant attention for enhancing the interpretability of aerial imagery through textual descriptions. Conventional approaches employ convolutional neural networks (CNNs) for visual feature extraction paired with recurrent neural networks (RNNs) or transformers for caption generation. However, these architectures suffer from high complexity and computational costs. While neural architecture search (NAS) via network pruning has been extensively studied, module-based pruning for RSIC systems remains largely unexplored. We propose a novel dedicated decoder pruning methodology for sequential caption generators: a module-based pruning method for end-to-end encoder–decoder architectural adaptation. It features two key innovations: 1) structured pruning of a pre-trained ResNet encoder and transformer encoder–decoder components and 2) a cross-entropy-based caption matching strategy replacing conventional prediction training in the decoder's final layer. The proposed method simultaneously improves inference efficiency and reduces storage requirements without compromising performance. As evaluated on the RSICD dataset using CIDEr, ROUGE, METEOR, bilingual evaluation understudy (BLEU), and Sm metrics, our method achieves a 42.8% model size reduction while improving accuracy, establishing new benchmarks in efficient RSIC.
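The abstract names structured pruning of the pre-trained ResNet encoder but gives no implementation details. The following is a minimal PyTorch sketch of what filter-level structured pruning of a convolutional encoder can look like; the L1-norm ranking criterion and the 50% keep ratio are common illustrative assumptions, not details confirmed by the paper.

```python
import torch
import torchvision.models as models

def l1_filter_scores(conv: torch.nn.Conv2d) -> torch.Tensor:
    # Rank each output filter by the L1 norm of its weights.
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def prune_conv_filters(conv: torch.nn.Conv2d, keep_ratio: float = 0.5) -> torch.nn.Conv2d:
    # Build a smaller Conv2d that keeps only the highest-scoring filters.
    scores = l1_filter_scores(conv)
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep_idx = torch.argsort(scores, descending=True)[:n_keep]
    pruned = torch.nn.Conv2d(
        conv.in_channels, n_keep,
        kernel_size=conv.kernel_size, stride=conv.stride,
        padding=conv.padding, bias=conv.bias is not None,
    )
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()
    return pruned

# Demonstrate on one convolution of a pre-trained ResNet-50 encoder.
# A full pipeline would also rewire the following BatchNorm and the next
# layer's in_channels to match keep_idx before fine-tuning.
encoder = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
conv = encoder.layer4[0].conv1
pruned = prune_conv_filters(conv, keep_ratio=0.5)
print(conv.out_channels, "->", pruned.out_channels)  # 512 -> 256
```

Filter-level pruning of this kind removes whole output channels rather than individual weights, which is what lets the pruned model shrink in actual storage and inference cost rather than just in nonzero-parameter count.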
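Likewise, the cross-entropy-based caption matching applied at the decoder's final layer can be read as a standard token-level cross-entropy against reference captions. The sketch below is an assumption-laden illustration: the vocabulary size, sequence length, and padding index are placeholders, and the paper's exact matching formulation is not specified in the abstract.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 1000, 20, 4  # illustrative sizes, not from the paper
pad_idx = 0  # assumed padding token id

# logits: decoder scores over the vocabulary at each time step
logits = torch.randn(batch, seq_len, vocab_size, requires_grad=True)
# refs: reference caption token ids (pad_idx marks padding)
refs = torch.randint(1, vocab_size, (batch, seq_len))

# Flatten batch and time, and ignore padded positions so the loss
# matches captions only over real tokens.
loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),
    refs.reshape(-1),
    ignore_index=pad_idx,
)
loss.backward()
print(float(loss))
```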