Latest Articles in IET Computer Vision

Angle Metric Learning for Discriminative Features on Vehicle Re-Identification
IF 1.5 | CAS Zone 4 | Computer Science
IET Computer Vision | Pub Date: 2025-05-04 | DOI: 10.1049/cvi2.70015
Authors: Yutong Xie, Shuoqi Zhang, Lide Guo, Yuming Liu, Rukai Wei, Yanzhao Xie, Yangtao Wang, Maobin Tang, Lisheng Fan
Abstract: Vehicle re-identification (Re-ID) facilitates the recognition and distinction of vehicles based on their visual characteristics in images or videos. However, accurately identifying a vehicle poses great challenges due to (i) the pronounced intra-instance variations encountered under varying lighting conditions such as day and night and (ii) the subtle inter-instance differences observed among similar vehicles. To address these challenges, the authors propose Angle Metric learning for Discriminative Features on vehicle Re-ID (AMDF), which aims to maximise the variance between visual features of different classes while minimising the variance within the same class. AMDF comprehensively measures both the angle and distance discrepancies between features. First, to mitigate the impact of lighting conditions on intra-class variation, the authors employ CycleGAN to generate images that simulate consistent lighting (either day or night), thereby standardising the conditions for distance measurement. Second, a Swin Transformer is integrated to help generate more detailed features. Finally, a novel angle metric loss based on cosine distance is proposed, which organically integrates an angular metric and a 2-norm metric, effectively maximising the decision boundary in angular space. Extensive experimental evaluations on three public datasets (VERI-776, VERI-Wild and VEHICLEID) indicate that the method achieves state-of-the-art performance. The code of this project is released at https://github.com/ZnCu-0906/AMDF.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70015
Citations: 0
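The abstract states only that the loss combines an angular (cosine) metric with a 2-norm metric; the exact formulation lives in the released code, not here. A minimal PyTorch sketch of one plausible pairwise reading (the function name angle_metric_loss, the 0.3 margin and the 0.1 norm weight are illustrative assumptions, not the paper's values) could look like this:

```python
import torch
import torch.nn.functional as F

def angle_metric_loss(feats, labels, margin=0.3, norm_weight=0.1):
    """Hypothetical angle-based metric loss: pulls same-class features together
    in angular space, pushes different-class features apart by a cosine margin,
    and regularises feature 2-norms towards a fixed target."""
    feats_n = F.normalize(feats, dim=1)              # unit-length features
    cos = feats_n @ feats_n.t()                      # pairwise cosine similarity
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    pos = same & ~eye                                # same class, excluding self-pairs
    neg = ~same

    # Angular terms: minimise (1 - cos) for positives, hinge on negatives.
    pos_loss = (1.0 - cos[pos]).mean() if pos.any() else feats.new_zeros(())
    neg_loss = F.relu(cos[neg] - margin).mean() if neg.any() else feats.new_zeros(())

    # 2-norm term: keep feature magnitudes close to a target norm of 1.
    norm_loss = (feats.norm(dim=1) - 1.0).pow(2).mean()

    return pos_loss + neg_loss + norm_weight * norm_loss

# Toy usage: 8 embeddings of 4 vehicle identities.
feats = torch.randn(8, 256, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
angle_metric_loss(feats, labels).backward()
```

The hinge on negative-pair cosine similarity is one way "maximising the decision boundary in angular space" could be realised; the GitHub repository linked above is the authoritative reference.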
Tran-GCN: A Transformer-Enhanced Graph Convolutional Network for Person Re-Identification in Monitoring Videos
IF 1.5 | CAS Zone 4 | Computer Science
IET Computer Vision | Pub Date: 2025-04-29 | DOI: 10.1049/cvi2.70025
Authors: Xiaobin Hong, Tarmizi Adam, Masitah Ghazali
Abstract: Person re-identification (Re-ID) has gained popularity in computer vision, enabling cross-camera pedestrian recognition. Although the development of deep learning has provided a robust technical foundation for person Re-ID research, most existing methods overlook the potential relationships among local person features and fail to adequately address the impact of pedestrian pose variations and occlusion of local body parts. The authors therefore propose a Transformer-enhanced Graph Convolutional Network (Tran-GCN) to improve person re-identification performance in monitoring videos. The model comprises four key components: (1) a pose estimation learning branch estimates pedestrian pose information and inherent skeletal structure data, extracting key point information; (2) a transformer learning branch learns the global dependencies between fine-grained and semantically meaningful local person features; (3) a convolution learning branch uses the basic ResNet architecture to extract the person's fine-grained local features; and (4) a graph convolutional module (GCM) integrates local feature information, global feature information and body information for more effective person identification after fusion. Quantitative and qualitative experiments on three datasets (Market-1501, DukeMTMC-ReID and MSMT17) demonstrate that Tran-GCN captures discriminative person features in monitoring videos more accurately, significantly improving identification accuracy.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70025
Citations: 0
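The abstract does not spell out the GCM's internals, so the sketch below only illustrates the graph-propagation step that any GCN over skeleton keypoints performs; the layer class, the five-keypoint chain and the feature sizes are assumptions for illustration, and Tran-GCN's fusion of transformer and ResNet features is omitted.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution layer over pose keypoints:
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).  This shows only the
    graph-propagation step, not Tran-GCN's full fusion module."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        a_hat = adjacency + torch.eye(adjacency.size(0))      # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        self.register_buffer("a_norm", d_inv_sqrt @ a_hat @ d_inv_sqrt)
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                  # x: (batch, num_keypoints, in_dim)
        return torch.relu(self.a_norm @ self.linear(x))

# Toy skeleton with 5 keypoints connected in a chain.
adj = torch.zeros(5, 5)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
layer = SimpleGCNLayer(in_dim=64, out_dim=128, adjacency=adj)
out = layer(torch.randn(2, 5, 64))         # -> (2, 5, 128)
```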
CNN-Based Flank Predictor for Quadruped Animal Species
IF 1.5 | CAS Zone 4 | Computer Science
IET Computer Vision | Pub Date: 2025-04-29 | DOI: 10.1049/cvi2.70024
Authors: Vanessa Suessle, Marco Heurich, Colleen T. Downs, Andreas Weinmann, Elke Hergenroether
Abstract: The bilateral asymmetry of flanks, where the sides of an animal with unique visual markings are independently patterned, complicates tasks such as individual identification. Automatically generating additional information on the visible side of the animal would improve the accuracy of individual identification. In this study, we used transfer learning on popular convolutional neural network (CNN) image classification architectures to train a flank predictor that predicts the visible flank of quadruped mammalian species in images. We automatically derived the data labels from existing datasets originally labelled for animal pose estimation. The developed models were evaluated across various scenarios involving unseen quadruped species in familiar and unfamiliar habitats. As a real-world scenario, we used a dataset of manually labelled Eurasian lynx (Lynx lynx) from camera traps in the Bavarian Forest National Park, Germany, to evaluate the model. The best model on field data was trained on a MobileNetV2 architecture and achieved an accuracy of 91.7% for the unseen/untrained species lynx in a complex unseen/untrained habitat with challenging light conditions. The flank predictor was designed to be embedded as a preprocessing step in the automated analysis of camera trap datasets to enhance tasks such as individual identification.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70024
Citations: 0
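A minimal transfer-learning sketch in the spirit of the best-performing setup (MobileNetV2 backbone with a replaced classification head) is shown below, assuming torchvision ≥ 0.13; the FLANK_CLASSES label set is a placeholder, since the exact label scheme derived from the pose-estimation datasets is not given in the abstract.

```python
import torch.nn as nn
from torchvision import models

# Assumed label set; the paper derives flank labels from pose-estimation datasets.
FLANK_CLASSES = ["left", "right", "front", "back"]

def build_flank_predictor(num_classes=len(FLANK_CLASSES)):
    """MobileNetV2 backbone pre-trained on ImageNet, with the final
    classifier replaced by a flank-prediction head."""
    model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
    for p in model.features.parameters():       # optionally freeze the backbone
        p.requires_grad = False
    in_feats = model.classifier[1].in_features  # 1280 for MobileNetV2
    model.classifier[1] = nn.Linear(in_feats, num_classes)
    return model

model = build_flank_predictor()
```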
The Generated-bbox Guided Interactive Image Segmentation With Vision Transformers
IF 1.5 | CAS Zone 4 | Computer Science
IET Computer Vision | Pub Date: 2025-04-24 | DOI: 10.1049/cvi2.70019
Authors: Shiyin Zhang, Yafei Dong, Shuang Qiu
Abstract: Existing click-based interactive image segmentation methods typically initiate object extraction with the first click and iteratively refine the coarse segmentation through subsequent interactions. Unlike box-based methods, click-based approaches mitigate ambiguity when multiple targets are present within a single bounding box, but they lack precise location and outline information. Inspired by instance segmentation, the authors propose a Generated-bbox Guided method that provides location and outline information using an automatically generated bounding box rather than a manually labelled one, minimising the need for extensive user interaction. Building on the success of vision transformers, the authors adopt them as the network architecture to enhance the model's performance. A click-based interactive image segmentation network, the Generated-bbox Guided Coarse-to-Fine Network (GCFN), is proposed. GCFN is a two-stage cascade network comprising two sub-networks, Coarsenet and Finenet. A transformer-based Box Detector is introduced to generate an initial bounding box from an inside click, providing location and outline information. Additionally, two feature enhancement modules guided by foreground and background information are designed: the Foreground-Background Feature Enhancement Module (FFEM) and the Pixel Enhancement Module (PEM). The authors evaluate GCFN on five popular benchmark datasets and demonstrate its generalisation capability on three medical image datasets.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70019
Citations: 0
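The abstract does not describe how the click and the generated box are fed to the network; a common convention in click- and box-guided segmentation is to concatenate them with the image as extra input channels. The hedged sketch below shows only that generic encoding (encode_click_and_bbox and the Gaussian radius are assumptions, not GCFN's actual scheme).

```python
import torch

def encode_click_and_bbox(image, click_xy, bbox, sigma=10.0):
    """Generic input encoding for click/box-guided segmentation (not
    necessarily GCFN's exact scheme): the image is concatenated with a
    Gaussian click map and a binary box mask as extra channels."""
    _, h, w = image.shape
    ys = torch.arange(h).view(h, 1).float()
    xs = torch.arange(w).view(1, w).float()
    cx, cy = click_xy
    click_map = torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

    x0, y0, x1, y1 = bbox                    # e.g. produced by the Box Detector
    box_mask = torch.zeros(h, w)
    box_mask[y0:y1, x0:x1] = 1.0

    return torch.cat([image, click_map[None], box_mask[None]], dim=0)

# Toy usage: 3x256x256 image, click at (120, 90), box around the object.
inp = encode_click_and_bbox(torch.rand(3, 256, 256), (120, 90), (80, 60, 180, 160))
print(inp.shape)                             # torch.Size([5, 256, 256])
```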
Structure-Based Uncertainty Estimation for Source-Free Active Domain Adaptation
IF 1.5 | CAS Zone 4 | Computer Science
IET Computer Vision | Pub Date: 2025-04-16 | DOI: 10.1049/cvi2.70020
Authors: Jihong Ouyang, Zhengjie Zhang, Qingyi Meng, Jinjin Chi
Abstract: Active domain adaptation (active DA) provides an effective solution by selectively labelling a limited number of target samples to significantly enhance adaptation performance. However, existing active DA methods often struggle in real-world scenarios where, due to data privacy concerns, only a pre-trained source model is available rather than the source samples. To address this issue, we propose a novel method called the structure-based uncertainty estimation model (SUEM) for source-free active domain adaptation (SFADA). Specifically, we introduce an active sample selection strategy that combines uncertainty and diversity sampling to identify the most informative samples. We assess the uncertainty in target samples using structure-wise probabilities and apply a diversity selection method to minimise redundancy. For the selected samples, we not only apply a standard supervised loss but also conduct interpolation consistency training to further explore the structural information of the target domain. Extensive experiments across four widely used datasets demonstrate that our method is comparable to or outperforms current unsupervised domain adaptation (UDA) and active DA methods.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70020
Citations: 0
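The selection strategy combines uncertainty with diversity; since the paper's structure-wise probabilities are not defined in the abstract, the sketch below substitutes plain predictive entropy for the uncertainty term and a greedy farthest-point pass for the diversity term (select_active_samples, the candidate ratio and the budget are illustrative assumptions).

```python
import torch
import torch.nn.functional as F

def select_active_samples(logits, feats, budget, cand_ratio=0.5):
    """Hedged sketch of uncertainty + diversity active selection.
    Step 1: keep the most uncertain candidates (entropy stand-in).
    Step 2: greedily pick a diverse subset (farthest-point / k-center style)."""
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

    n_cand = max(budget, int(cand_ratio * len(logits)))
    cand = entropy.topk(n_cand).indices                  # most uncertain samples

    cand_feats = F.normalize(feats[cand], dim=1)
    chosen = [0]                                         # seed with the most uncertain
    min_dist = (cand_feats - cand_feats[0]).norm(dim=1)
    while len(chosen) < budget:
        nxt = int(min_dist.argmax())                     # farthest from the chosen set
        chosen.append(nxt)
        min_dist = torch.minimum(min_dist, (cand_feats - cand_feats[nxt]).norm(dim=1))
    return cand[torch.tensor(chosen)]

# Toy usage: 1000 unlabelled target samples, 10-class logits, pick 20 to label.
idx = select_active_samples(torch.randn(1000, 10), torch.randn(1000, 128), budget=20)
```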
Synchronised and Fine-Grained Head for Skeleton-Based Ambiguous Action Recognition
IF 1.5 | CAS Zone 4 | Computer Science
IET Computer Vision | Pub Date: 2025-04-15 | DOI: 10.1049/cvi2.70016
Authors: Hao Huang, Yujie Lin, Siyu Chen, Haiyang Liu
Abstract: Skeleton-based action recognition using Graph Convolutional Networks (GCNs) has achieved remarkable performance, but recognising ambiguous actions, such as 'waving' and 'saluting', remains a significant challenge. Existing methods typically rely on a serial combination of GCNs and Temporal Convolutional Networks (TCNs), where spatial and temporal features are extracted independently, leading to unbalanced spatial-temporal information that hinders accurate action recognition. Moreover, existing methods for ambiguous actions often overemphasise local details, resulting in the loss of crucial global context, which further complicates differentiating ambiguous actions. To address these challenges, the authors propose a lightweight plug-and-play module called the Synchronised and Fine-grained Head (SF-Head), inserted between GCN and TCN layers. SF-Head first conducts Synchronised Spatial-Temporal Extraction (SSTE) with a Feature Redundancy Loss (F-RL), ensuring a balanced interaction between the two types of features. It then performs Adaptive Cross-dimensional Feature Aggregation (AC-FA) with a Feature Consistency Loss (F-CL), which aligns the aggregated features with their original spatial-temporal features. This aggregation step effectively combines global context and local details, enhancing the model's ability to classify ambiguous actions. Experimental results on the NTU RGB+D 60, NTU RGB+D 120, NW-UCLA and PKU-MMD I datasets demonstrate significant improvements in distinguishing ambiguous actions. The code will be made available at https://github.com/HaoHuang2003/SFHead.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70016
Citations: 0
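Neither loss is written out in the abstract; the sketch below shows one plausible reading, with F-RL as a cross-correlation (decorrelation) penalty between the spatial and temporal branches and F-CL as a cosine alignment between aggregated and original features. Both function bodies are assumptions, not SF-Head's published definitions.

```python
import torch
import torch.nn.functional as F

def feature_redundancy_loss(spatial_feat, temporal_feat):
    """Hypothetical F-RL: penalise cross-correlation between the spatial and
    temporal branches so they encode complementary, non-redundant information."""
    s = (spatial_feat - spatial_feat.mean(0)) / (spatial_feat.std(0) + 1e-6)
    t = (temporal_feat - temporal_feat.mean(0)) / (temporal_feat.std(0) + 1e-6)
    cross_corr = (s.t() @ t) / s.size(0)      # (d, d) cross-correlation matrix
    return cross_corr.pow(2).mean()

def feature_consistency_loss(aggregated, original):
    """Hypothetical F-CL: align the aggregated feature with the original
    spatial-temporal feature via cosine similarity."""
    return 1.0 - F.cosine_similarity(aggregated, original, dim=1).mean()

# Toy usage with a batch of 32 and 256-d branch features.
s, t = torch.randn(32, 256), torch.randn(32, 256)
loss = feature_redundancy_loss(s, t) + feature_consistency_loss(s + t, torch.randn(32, 256))
```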
EDG-CDM: A New Encoder-Guided Conditional Diffusion Model-Based Image Synthesis Method for Limited Data
IF 1.5 | CAS Zone 4 | Computer Science
IET Computer Vision | Pub Date: 2025-04-08 | DOI: 10.1049/cvi2.70018
Authors: Haopeng Lei, Hao Yin, Kaijun Liang, Mingwen Wang, Jinshan Zeng, Guoliang Luo
Abstract: The Diffusion Probabilistic Model (DM) has emerged as a powerful generative model in the field of image synthesis, capable of producing high-quality and realistic images. However, training a DM requires a large and diverse dataset, which can be challenging to obtain; this limitation weakens the model's generalisation and robustness when training data is limited. To address this issue, the authors propose EDG-CDM, an encoder-guided conditional diffusion model for image synthesis with limited data. First, the authors pre-train the encoder by introducing noise to capture the distribution of image features and generate the condition vector through contrastive learning and KL divergence. Next, the encoder undergoes further training with classification to integrate image class information, providing more favourable and versatile conditions for the diffusion model. Subsequently, the encoder is connected to the diffusion model, which is trained using all available data with encoder-provided conditions. Finally, the authors evaluate EDG-CDM on various public datasets with limited data, conducting extensive experiments and comparing the results with state-of-the-art methods using metrics such as Fréchet Inception Distance (FID) and Inception Score (IS). The experiments demonstrate that EDG-CDM outperforms existing models, consistently achieving the lowest FID and the highest IS, highlighting its effectiveness in generating high-quality and diverse images with limited training data. These results underscore the significance of EDG-CDM in advancing image synthesis techniques under data-constrained scenarios.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70018
Citations: 0
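The first training stage (noise injection, contrastive learning and KL divergence on the condition vector) can be sketched under a VAE-style reading as below; the encoder interface returning (mu, logvar), the noise level, the KL weight and the ToyEncoder are all assumptions, since the abstract gives neither the architecture nor the weighting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kl_to_standard_normal(mu, logvar):
    """KL(q(z|x) || N(0, I)) for a diagonal-Gaussian condition vector."""
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()

def info_nce(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE) loss between condition vectors of two noisy views
    of the same image; other images in the batch act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    return F.cross_entropy(logits, torch.arange(len(z1), device=z1.device))

def encoder_pretrain_loss(encoder, images, noise_std=0.1, kl_weight=1e-3):
    """Hedged sketch of the first pre-training stage: perturb the image with
    noise, encode two views, and combine contrastive and KL terms."""
    view1 = images + noise_std * torch.randn_like(images)
    view2 = images + noise_std * torch.randn_like(images)
    mu1, logvar1 = encoder(view1)
    mu2, logvar2 = encoder(view2)
    z1 = mu1 + torch.randn_like(mu1) * (0.5 * logvar1).exp()   # reparameterise
    z2 = mu2 + torch.randn_like(mu2) * (0.5 * logvar2).exp()
    kl = kl_to_standard_normal(mu1, logvar1) + kl_to_standard_normal(mu2, logvar2)
    return info_nce(z1, z2) + kl_weight * kl

class ToyEncoder(nn.Module):
    """Placeholder encoder over 32x32 RGB images, returning (mu, logvar)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Linear(3 * 32 * 32, 2 * dim)
    def forward(self, x):
        mu, logvar = self.net(x.flatten(1)).chunk(2, dim=1)
        return mu, logvar

loss = encoder_pretrain_loss(ToyEncoder(), torch.randn(16, 3, 32, 32))
```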
Performance of Computer Vision Algorithms for Fine-Grained Classification Using Crowdsourced Insect Images
IF 1.5 | CAS Zone 4 | Computer Science
IET Computer Vision | Pub Date: 2025-04-04 | DOI: 10.1049/cvi2.70006
Authors: Rita Pucci, Vincent J. Kalkman, Dan Stowell
Abstract: In fine-grained classification, we identify unique characteristics that distinguish among classes of the same super-class. We focus on species recognition in Insecta, as insects are critical for biodiversity monitoring and form the base of many ecosystems. Through citizen science campaigns, billions of images are collected in the wild; once labelled, experts can use them to create distribution maps. However, the labelling process is time-consuming, which is where computer vision comes in. The field of computer vision offers a wide range of algorithms, each with its strengths and weaknesses; how do we identify the algorithm that is in line with our application? To answer this question, we provide a full and detailed evaluation of nine algorithms spanning deep convolutional networks (CNNs), vision transformers (ViTs) and locality-based vision transformers (LBVTs) on four different aspects: classification performance, embedding quality, computational cost and gradient activity. We offer insights not previously available in this domain, showing to what extent these algorithms solve fine-grained tasks in Insecta. We found that ViTs perform best on inference speed and computational cost, whereas LBVTs outperform the others on classification performance and embedding quality; CNNs provide a trade-off among the metrics.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70006
Citations: 0
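Of the four evaluation aspects, computational cost is the easiest to reproduce in a few lines; the sketch below times inference and counts parameters for a stand-in ResNet-50 (the paper's nine CNN/ViT/LBVT architectures and its insect datasets are not reproduced here).

```python
import time
import torch
from torchvision import models

def benchmark(model, input_size=(1, 3, 224, 224), n_runs=50, device="cpu"):
    """Sketch of the 'computational cost' aspect: parameter count and average
    inference time.  The paper additionally measures classification
    performance, embedding quality and gradient activity."""
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    with torch.no_grad():
        for _ in range(5):                      # warm-up runs
            model(x)
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        elapsed = (time.perf_counter() - start) / n_runs
    n_params = sum(p.numel() for p in model.parameters())
    return n_params, elapsed

params, secs = benchmark(models.resnet50(weights=None))
print(f"ResNet-50: {params / 1e6:.1f}M parameters, {secs * 1000:.1f} ms/image")
```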
Foundation Model Based Camouflaged Object Detection
IF 1.5 | CAS Zone 4 | Computer Science
IET Computer Vision | Pub Date: 2025-04-01 | DOI: 10.1049/cvi2.70009
Authors: Zefeng Chen, Zhijiang Li, Yunqi Xue, Li Zhang
Abstract: Camouflaged object detection (COD) aims to identify and segment objects that closely resemble and are seamlessly integrated into their surrounding environments, making it a challenging task in computer vision. COD is constrained by the limited availability of training data and annotated samples, and most carefully designed COD models exhibit diminished performance under low-data conditions. In recent years, there has been increasing interest in leveraging foundation models, which have demonstrated robust general capabilities and superior generalisation performance, to address COD challenges. This work proposes a knowledge-guided domain adaptation (KGDA) approach to tackle the data scarcity problem in COD. The method utilises knowledge descriptions generated by multimodal large language models (MLLMs) for camouflaged images, aiming to enhance the model's comprehension of semantic objects and camouflaged scenes through highly abstract and generalised knowledge representations. To resolve ambiguities and errors in the generated text descriptions, a multi-level knowledge aggregation (MLKG) module is devised; this module consolidates consistent semantic knowledge and forms multi-level semantic knowledge features. To incorporate semantic knowledge into the visual foundation model, the authors introduce a knowledge-guided semantic enhancement adaptor (KSEA) that integrates the semantic knowledge of camouflaged objects while preserving the original knowledge of the foundation model. Extensive experiments demonstrate that the method surpasses 19 state-of-the-art approaches and exhibits strong generalisation capabilities even with limited annotated data.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70009
Citations: 0
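KSEA is described only at a high level; the sketch below shows one generic way a frozen backbone's tokens could be modulated by a knowledge vector through a residual bottleneck adaptor. The class KnowledgeAdapter, its dimensions and the pooled-text-embedding comment are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class KnowledgeAdapter(nn.Module):
    """Hypothetical bottleneck adaptor in the spirit of KSEA: a frozen
    foundation-model feature map is modulated by a semantic knowledge vector
    through a residual bottleneck, so the original backbone weights stay
    untouched."""
    def __init__(self, feat_dim, know_dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(feat_dim + know_dim, bottleneck)
        self.up = nn.Linear(bottleneck, feat_dim)

    def forward(self, visual_tokens, knowledge):
        # visual_tokens: (batch, tokens, feat_dim); knowledge: (batch, know_dim)
        k = knowledge.unsqueeze(1).expand(-1, visual_tokens.size(1), -1)
        delta = self.up(torch.relu(self.down(torch.cat([visual_tokens, k], dim=-1))))
        return visual_tokens + delta             # residual preserves original knowledge

tokens = torch.randn(2, 196, 768)                # e.g. ViT patch tokens
knowledge = torch.randn(2, 512)                  # e.g. pooled embedding of the MLLM description
out = KnowledgeAdapter(768, 512)(tokens, knowledge)
```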
Temporal Optimisation of Satellite Image-Based Crop Mapping: A Comparison of Deep Time Series and Semi-Supervised Time Warping Strategies
IF 1.5 | CAS Zone 4 | Computer Science
IET Computer Vision | Pub Date: 2025-03-26 | DOI: 10.1049/cvi2.70014
Authors: Rosie Finnegan, Joseph Metcalfe, Sara Sharifzadeh, Fabio Caraffini, Xianghua Xie, Alberto Hornero, Nicholas W. Synes
Abstract: This study presents a novel approach to crop mapping using remotely sensed satellite images. It addresses two significant classification modelling challenges: (1) the requirement for extensive labelled data and (2) the complex optimisation problem of selecting appropriate temporal windows in the absence of prior knowledge of cultivation calendars. We compare the lightweight Dynamic Time Warping (DTW) classification method with the heavily supervised Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM) using high-resolution multispectral optical satellite imagery (3 m/pixel). Our approach integrates effective practical preprocessing steps, including data augmentation and a data-driven optimisation strategy for the temporal window, even in the presence of numerous crop classes. Our findings demonstrate that DTW, despite its lower data demands, can match the performance of CNN-LSTM through these preprocessing steps while significantly improving runtime. These results show that both CNN-LSTM and DTW can achieve deployment-level accuracy and underscore the potential of DTW as a viable alternative to more resource-intensive models. The results also demonstrate the effectiveness of temporal windowing for improving the runtime and accuracy of a crop classification study, even with no prior knowledge of planting timeframes.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70014
Citations: 0
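DTW itself is standard and easy to sketch; the code below implements the classic dynamic-programming recurrence and a 1-nearest-neighbour classifier restricted to a temporal window (the crop names and window bounds are placeholders, and the paper's data-driven window optimisation and preprocessing are not reproduced).

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D, NDVI-like
    time series (no warping-window constraint, absolute-difference local cost)."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]

def classify_1nn_dtw(series, references, labels, window=slice(None)):
    """1-nearest-neighbour DTW classification restricted to a temporal window
    (the data-driven selection of that window is not shown here)."""
    dists = [dtw_distance(series[window], ref[window]) for ref in references]
    return labels[int(np.argmin(dists))]

# Toy usage: two reference crop profiles over 20 dates, classify a noisy copy.
refs = [np.sin(np.linspace(0, 3, 20)), np.cos(np.linspace(0, 3, 20))]
labels = ["wheat", "maize"]                      # hypothetical crop classes
query = refs[0] + 0.05 * np.random.randn(20)
print(classify_1nn_dtw(query, refs, labels, window=slice(5, 18)))
```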