IET Computer Vision最新文献_第5页

Advancements in smart agriculture: A systematic literature review on state-of-the-art plant disease detection with computer vision 智能农业的进展：用计算机视觉进行最先进的植物病害检测的系统文献综述

IF 1.3 4区计算机科学

IET Computer Vision Pub Date : 2025-02-14 DOI: 10.1049/cvi2.70004

Esra Yilmaz, Sevim Ceylan Bocekci, Cengiz Safak, Kazim Yildiz

{"title":"Advancements in smart agriculture: A systematic literature review on state-of-the-art plant disease detection with computer vision","authors":"Esra Yilmaz, Sevim Ceylan Bocekci, Cengiz Safak, Kazim Yildiz","doi":"10.1049/cvi2.70004","DOIUrl":"10.1049/cvi2.70004","url":null,"abstract":"In an era of rapid digital transformation, ensuring sustainable and traceable food production is more crucial than ever. Plant diseases, a major threat to agriculture, lead to significant losses in crops and financial damage. Standard techniques for detecting diseases, though widespread, are lengthy and intensive work, especially in extensive agricultural settings. This systematic literature review examines the cutting-edge technologies in smart agriculture specifically computer vision, robotics, deep learning (DL), and Internet of Things (IoT) that are reshaping plant disease detection and management. By analysing 198 studies published between 2021 and 2023, from an initial pool of 19,838 papers, the authors reveal the dominance of DL, particularly with datasets such as PlantVillage, and highlight critical challenges, including dataset limitations, lack of geographical diversity, and the scarcity of real-world field data. Moreover, the authors explore the promising role of IoT, robotics, and drones in enhancing early disease detection, although the high costs and technological gaps present significant barriers for small-scale farmers, especially in developing countries. Through the preferred reporting items for systematic reviews and meta-analyses methodology, this review synthesises these findings, identifying key trends, uncovering research gaps, and offering actionable insights for the future of plant disease management in smart agriculture.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70004","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Egocentric action anticipation from untrimmed videos 以自我为中心的动作预期从未修剪的视频

IF 1.3 4区计算机科学

IET Computer Vision Pub Date : 2025-02-14 DOI: 10.1049/cvi2.12342

Ivan Rodin, Antonino Furnari, Giovanni Maria Farinella

{"title":"Egocentric action anticipation from untrimmed videos","authors":"Ivan Rodin, Antonino Furnari, Giovanni Maria Farinella","doi":"10.1049/cvi2.12342","DOIUrl":"10.1049/cvi2.12342","url":null,"abstract":"Egocentric action anticipation involves predicting future actions performed by the camera wearer from egocentric video. Although the task has recently gained attention in the research community, current approaches often assume that input videos are ‘trimmed’, meaning that a short video sequence is sampled a fixed time before the beginning of the action. However, trimmed action anticipation has limited applicability in real-world scenarios, where it is crucial to deal with ‘untrimmed’ video inputs and the exact moment of action initiation cannot be assumed at test time. To address these limitations, an untrimmed action anticipation task is proposed, which, akin to temporal action detection, assumes that the input video is untrimmed at test time, while still requiring predictions to be made before actions take place. The authors introduce a benchmark evaluation procedure for methods designed to address this novel task and compare several baselines on the EPIC-KITCHENS-100 dataset. Through our experimental evaluation, testing a variety of models, the authors aim to better understand their performance in untrimmed action anticipation. Our results reveal that the performance of current models designed for trimmed action anticipation is limited, emphasising the need for further research in this area.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12342","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Controlling semantics of diffusion-augmented data for unsupervised domain adaptation 无监督域自适应扩散增强数据的控制语义

IF 1.3 4区计算机科学

IET Computer Vision Pub Date : 2025-02-07 DOI: 10.1049/cvi2.70002

Henrietta Ridley, Roberto Alcover-Couso, Juan C. SanMiguel

{"title":"Controlling semantics of diffusion-augmented data for unsupervised domain adaptation","authors":"Henrietta Ridley, Roberto Alcover-Couso, Juan C. SanMiguel","doi":"10.1049/cvi2.70002","DOIUrl":"10.1049/cvi2.70002","url":null,"abstract":"Unsupervised domain adaptation (UDA) offers a compelling solution to bridge the gap between labelled synthetic data and unlabelled real-world data for training semantic segmentation models, given the high costs associated with manual annotation. However, the visual differences between the synthetic and real images pose significant challenges to their practical applications. This work addresses these challenges through synthetic-to-real style transfer leveraging diffusion models. The authors’ proposal incorporates semantic controllers to guide the diffusion process and low-rank adaptations (LoRAs) to ensure that style-transferred images align with real-world aesthetics while preserving semantic layout. Moreover, the authors introduce quality metrics to rank the utility of generated images, enabling the selective use of high-quality images for training. To further enhance reliability, the authors propose a novel loss function that mitigates artefacts from the style transfer process by incorporating only pixels aligned with the original semantic labels. Experimental results demonstrate that the authors’ proposal outperforms selected state-of-the-art methods for image generation and UDA training, achieving optimal performance even with a smaller set of high-quality generated images. The authors’ code and models are available at http://www-vpu.eps.uam.es/ControllingSem4UDA/.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

TomoSAR 3D reconstruction: Cascading adversarial strategy with sparse observation trajectory TomoSAR三维重建：稀疏观测轨迹级联对抗策略

IF 1.3 4区计算机科学

IET Computer Vision Pub Date : 2025-02-04 DOI: 10.1049/cvi2.70001

Xian Zhu, Xiaoqin Zeng, Yuhua Cong, Yanhao Huang, Ziyan Zhu, Yantao Luo

{"title":"TomoSAR 3D reconstruction: Cascading adversarial strategy with sparse observation trajectory","authors":"Xian Zhu, Xiaoqin Zeng, Yuhua Cong, Yanhao Huang, Ziyan Zhu, Yantao Luo","doi":"10.1049/cvi2.70001","DOIUrl":"10.1049/cvi2.70001","url":null,"abstract":"Synthetic aperture radar tomography (TomoSAR) has shown significant potential for the 3D Reconstruction of buildings, especially in critical areas such as topographic mapping, urban planning, and disaster monitoring. In practical applications, the constraints of observation trajectories frequently lead to the acquisition of a limited dataset of sparse SAR images, presenting challenges for TomoSAR 3D Reconstruction and affecting its signal-to-noise ratio and elevation resolution performance. The study introduces a cascade adversarial strategy based on the Conditional Generative Adversarial Network (CGAN), optimised explicitly for sparse observation trajectories. In the preliminary phase of the CGAN, the U-Net architecture was employed to capture more global information and enhance image detail recovery capability, which is subsequently utilised in the cascade refinement network. The ResNet34 residual network in the advanced network stage was adopted to bolster feature extraction and image generation capabilities further. Based on experimental validation performed on the curated TomoSAR 3D super-resolution dataset tailored for buildings, the findings reveal that the methodology yields a notable enhancement in image quality and accuracy compared to other techniques.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143111862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Human activity recognition: A review of deep learning-based methods 人类活动识别：基于深度学习的方法综述

IF 1.3 4区计算机科学

IET Computer Vision Pub Date : 2025-02-01 DOI: 10.1049/cvi2.70003

Sanjay Jyoti Dutta, Tossapon Boongoen, Reyer Zwiggelaar

引用次数: 0

A principal direction-guided local voxelisation structural feature approach for point cloud registration 一种主要的方向导向局部体素化结构特征点云配准方法

IF 1.3 4区计算机科学

IET Computer Vision Pub Date : 2025-01-29 DOI: 10.1049/cvi2.70000

Chenyang Li, Yansong Duan

{"title":"A principal direction-guided local voxelisation structural feature approach for point cloud registration","authors":"Chenyang Li, Yansong Duan","doi":"10.1049/cvi2.70000","DOIUrl":"10.1049/cvi2.70000","url":null,"abstract":"Point cloud registration is a crucial aspect of computer vision and 3D reconstruction. Traditional registration methods often depend on global features or iterative optimisation, leading to inefficiencies and imprecise outcomes when processing complex scene point cloud data. To address these challenges, the authors introduce a principal direction-guided local voxelisation structural feature (PDLVSF) approach for point cloud registration. This method reliably identifies feature points regardless of initial positioning. Approach begins with the 3D Harris algorithm to extract feature points, followed by determining the principal direction within the feature points' radius neighbourhood to ensure rotational invariance. For scale invariance, voxel grid normalisation is utilised to maximise the point cloud's geometric resolution and make it scale-independent. Cosine similarity is then employed for effective feature matching, identifying corresponding feature point pairs and determining transformation parameters between point clouds. Experimental validations on various datasets, including the real terrain dataset, demonstrate the effectiveness of our method. Results indicate superior performance in root mean square error (RMSE) and registration accuracy compared to state-of-the-art methods, particularly in scenarios with high noise, limited overlap, and significant initial pose rotation. The real terrain dataset is publicly available at https://github.com/black-2000/Real-terrain-data.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70000","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143120630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

NBCDC-YOLOv8: A new framework to improve blood cell detection and classification based on YOLOv8 NBCDC-YOLOv8：基于YOLOv8改进血细胞检测和分类的新框架

IF 1.3 4区计算机科学

IET Computer Vision Pub Date : 2025-01-22 DOI: 10.1049/cvi2.12341

Xuan Chen, Linxuan Li, Xiaoyu Liu, Fengjuan Yin, Xue Liu, Xiaoxiao Zhu, Yufeng Wang, Fanbin Meng

{"title":"NBCDC-YOLOv8: A new framework to improve blood cell detection and classification based on YOLOv8","authors":"Xuan Chen, Linxuan Li, Xiaoyu Liu, Fengjuan Yin, Xue Liu, Xiaoxiao Zhu, Yufeng Wang, Fanbin Meng","doi":"10.1049/cvi2.12341","DOIUrl":"10.1049/cvi2.12341","url":null,"abstract":"In recent years, computer technology has successfully permeated all areas of medicine and its management, and it now offers doctors an accurate and rapid means of diagnosis. Existing blood cell detection methods suffer from low accuracy, which is caused by the uneven distribution, high density, and mutual occlusion of different blood cell types in blood microscope images, this article introduces NBCDC-YOLOv8: a new framework to improve blood cell detection and classification based on YOLOv8. Our framework innovates on several fronts: it uses Mosaic data augmentation to enrich the dataset and add small targets, incorporates a space to depth convolution (SPD-Conv) tailored for cells that are small and have low resolution, and introduces the Multi-Separated and Enhancement Attention Module (MultiSEAM) to enhance feature map resolution. Additionally, it integrates a bidirectional feature pyramid network (BiFPN) for effective multi-scale feature fusion and includes four detection heads to improve recognition accuracy of various cell sizes, especially small target platelets. Evaluated on the Blood Cell Classification Dataset (BCCD), NBCDC-YOLOv8 obtains a mean average precision (mAP) of 94.7%, and thus surpasses the original YOLOv8n by 2.3%.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143363050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Re-identification of patterned animals by multi-image feature aggregation and geometric similarity 基于多图像特征聚合和几何相似性的图案动物再识别

IF 1.3 4区计算机科学

IET Computer Vision Pub Date : 2025-01-08 DOI: 10.1049/cvi2.12337

Ekaterina Nepovinnykh, Veikka Immonen, Tuomas Eerola, Charles V. Stewart, Heikki Kälviäinen

{"title":"Re-identification of patterned animals by multi-image feature aggregation and geometric similarity","authors":"Ekaterina Nepovinnykh, Veikka Immonen, Tuomas Eerola, Charles V. Stewart, Heikki Kälviäinen","doi":"10.1049/cvi2.12337","DOIUrl":"10.1049/cvi2.12337","url":null,"abstract":"Image-based re-identification of animal individuals allows gathering of information such as population size and migration patterns of the animals over time. This, together with large image volumes collected using camera traps and crowdsourcing, opens novel possibilities to study animal populations. For many species, the re-identification can be done by analysing the permanent fur, feather, or skin patterns that are unique to each individual. In this paper, the authors study pattern feature aggregation based re-identification and consider two ways of improving accuracy: (1) aggregating pattern image features over multiple images and (2) combining the pattern appearance similarity obtained by feature aggregation and geometric pattern similarity. Aggregation over multiple database images of the same individual allows to obtain more comprehensive and robust descriptors while reducing the computation time. On the other hand, combining the two similarity measures allows to efficiently utilise both the local and global pattern features, providing a general re-identification approach that can be applied to a wide variety of different pattern types. In the experimental part of the work, the authors demonstrate that the proposed method achieves promising re-identification accuracies for Saimaa ringed seals and whale sharks without species-specific training or fine-tuning.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12337","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

MMF-Net: A novel multi-feature and multi-level fusion network for 3D human pose estimation MMF-Net：一种用于三维人体姿态估计的新型多特征多层次融合网络

IF 1.3 4区计算机科学

IET Computer Vision Pub Date : 2025-01-07 DOI: 10.1049/cvi2.12336

Qianxing Li, Dehui Kong, Jinghua Li, Baocai Yin

{"title":"MMF-Net: A novel multi-feature and multi-level fusion network for 3D human pose estimation","authors":"Qianxing Li, Dehui Kong, Jinghua Li, Baocai Yin","doi":"10.1049/cvi2.12336","DOIUrl":"10.1049/cvi2.12336","url":null,"abstract":"Human pose estimation based on monocular video has always been the focus of research in the human computer interaction community, which suffers mainly from depth ambiguity and self-occlusion challenges. While the recently proposed learning-based approaches have demonstrated promising performance, they do not fully explore the complementarity of features. In this paper, the authors propose a novel multi-feature and multi-level fusion network (MMF-Net), which extracts and combines joint features, bone features and trajectory features at multiple levels to estimate 3D human pose. In MMF-Net, firstly, the bone length estimation module and the trajectory multi-level fusion module are used to extract the geometric size information of the human body and multi-level trajectory information of human motion, respectively. Then, the fusion attention-based combination (FABC) module is used to extract multi-level topological structure information of the human body, and effectively fuse topological structure information, geometric size information and trajectory information. Extensive experiments show that MMF-Net achieves competitive results on Human3.6M, HumanEva-I and MPI-INF-3DHP datasets.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12336","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A robust few-shot classifier with image as set of points 以图像为点集的鲁棒少射分类器

IF 1.3 4区计算机科学

IET Computer Vision Pub Date : 2025-01-02 DOI: 10.1049/cvi2.12340

Suhua Peng, Zongliang Zhang, Xingwang Huang, Zongyue Wang, Shubing Su, Guorong Cai

引用次数: 0