{"title":"Advancements in smart agriculture: A systematic literature review on state-of-the-art plant disease detection with computer vision","authors":"Esra Yilmaz, Sevim Ceylan Bocekci, Cengiz Safak, Kazim Yildiz","doi":"10.1049/cvi2.70004","DOIUrl":"10.1049/cvi2.70004","url":null,"abstract":"<p>In an era of rapid digital transformation, ensuring sustainable and traceable food production is more crucial than ever. Plant diseases, a major threat to agriculture, lead to significant losses in crops and financial damage. Standard techniques for detecting diseases, though widespread, are lengthy and intensive work, especially in extensive agricultural settings. This systematic literature review examines the cutting-edge technologies in smart agriculture specifically computer vision, robotics, deep learning (DL), and Internet of Things (IoT) that are reshaping plant disease detection and management. By analysing 198 studies published between 2021 and 2023, from an initial pool of 19,838 papers, the authors reveal the dominance of DL, particularly with datasets such as PlantVillage, and highlight critical challenges, including dataset limitations, lack of geographical diversity, and the scarcity of real-world field data. Moreover, the authors explore the promising role of IoT, robotics, and drones in enhancing early disease detection, although the high costs and technological gaps present significant barriers for small-scale farmers, especially in developing countries. Through the preferred reporting items for systematic reviews and meta-analyses methodology, this review synthesises these findings, identifying key trends, uncovering research gaps, and offering actionable insights for the future of plant disease management in smart agriculture.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70004","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Egocentric action anticipation from untrimmed videos","authors":"Ivan Rodin, Antonino Furnari, Giovanni Maria Farinella","doi":"10.1049/cvi2.12342","DOIUrl":"10.1049/cvi2.12342","url":null,"abstract":"<p>Egocentric action anticipation involves predicting future actions performed by the camera wearer from egocentric video. Although the task has recently gained attention in the research community, current approaches often assume that input videos are ‘trimmed’, meaning that a short video sequence is sampled a fixed time before the beginning of the action. However, trimmed action anticipation has limited applicability in real-world scenarios, where it is crucial to deal with ‘untrimmed’ video inputs and the exact moment of action initiation cannot be assumed at test time. To address these limitations, an untrimmed action anticipation task is proposed, which, akin to temporal action detection, assumes that the input video is untrimmed at test time, while still requiring predictions to be made before actions take place. The authors introduce a benchmark evaluation procedure for methods designed to address this novel task and compare several baselines on the EPIC-KITCHENS-100 dataset. Through our experimental evaluation, testing a variety of models, the authors aim to better understand their performance in untrimmed action anticipation. Our results reveal that the performance of current models designed for trimmed action anticipation is limited, emphasising the need for further research in this area.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12342","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Controlling semantics of diffusion-augmented data for unsupervised domain adaptation","authors":"Henrietta Ridley, Roberto Alcover-Couso, Juan C. SanMiguel","doi":"10.1049/cvi2.70002","DOIUrl":"10.1049/cvi2.70002","url":null,"abstract":"<p>Unsupervised domain adaptation (UDA) offers a compelling solution to bridge the gap between labelled synthetic data and unlabelled real-world data for training semantic segmentation models, given the high costs associated with manual annotation. However, the visual differences between the synthetic and real images pose significant challenges to their practical applications. This work addresses these challenges through synthetic-to-real style transfer leveraging diffusion models. The authors’ proposal incorporates semantic controllers to guide the diffusion process and low-rank adaptations (LoRAs) to ensure that style-transferred images align with real-world aesthetics while preserving semantic layout. Moreover, the authors introduce quality metrics to rank the utility of generated images, enabling the selective use of high-quality images for training. To further enhance reliability, the authors propose a novel loss function that mitigates artefacts from the style transfer process by incorporating only pixels aligned with the original semantic labels. Experimental results demonstrate that the authors’ proposal outperforms selected state-of-the-art methods for image generation and UDA training, achieving optimal performance even with a smaller set of high-quality generated images. The authors’ code and models are available at http://www-vpu.eps.uam.es/ControllingSem4UDA/.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TomoSAR 3D reconstruction: Cascading adversarial strategy with sparse observation trajectory","authors":"Xian Zhu, Xiaoqin Zeng, Yuhua Cong, Yanhao Huang, Ziyan Zhu, Yantao Luo","doi":"10.1049/cvi2.70001","DOIUrl":"10.1049/cvi2.70001","url":null,"abstract":"<p>Synthetic aperture radar tomography (TomoSAR) has shown significant potential for the 3D Reconstruction of buildings, especially in critical areas such as topographic mapping, urban planning, and disaster monitoring. In practical applications, the constraints of observation trajectories frequently lead to the acquisition of a limited dataset of sparse SAR images, presenting challenges for TomoSAR 3D Reconstruction and affecting its signal-to-noise ratio and elevation resolution performance. The study introduces a cascade adversarial strategy based on the Conditional Generative Adversarial Network (CGAN), optimised explicitly for sparse observation trajectories. In the preliminary phase of the CGAN, the U-Net architecture was employed to capture more global information and enhance image detail recovery capability, which is subsequently utilised in the cascade refinement network. The ResNet34 residual network in the advanced network stage was adopted to bolster feature extraction and image generation capabilities further. Based on experimental validation performed on the curated TomoSAR 3D super-resolution dataset tailored for buildings, the findings reveal that the methodology yields a notable enhancement in image quality and accuracy compared to other techniques.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143111862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human activity recognition: A review of deep learning-based methods","authors":"Sanjay Jyoti Dutta, Tossapon Boongoen, Reyer Zwiggelaar","doi":"10.1049/cvi2.70003","DOIUrl":"10.1049/cvi2.70003","url":null,"abstract":"<p>Human Activity Recognition (HAR) covers methods for automatically identifying human activities from a stream of data. End-users of HAR methods cover a range of sectors, including health, self-care, amusement, safety and monitoring. In this survey, the authors provide a thorough overview of deep learning based and detailed analysis of work that was performed between 2018 and 2023 in a variety of fields related to HAR with a focus on device-free solutions. It also presents the categorisation and taxonomy of the covered publication and an overview of publicly available datasets. To complete this review, the limitations of existing approaches and potential future research directions are discussed.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70003","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143110673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A principal direction-guided local voxelisation structural feature approach for point cloud registration","authors":"Chenyang Li, Yansong Duan","doi":"10.1049/cvi2.70000","DOIUrl":"10.1049/cvi2.70000","url":null,"abstract":"<p>Point cloud registration is a crucial aspect of computer vision and 3D reconstruction. Traditional registration methods often depend on global features or iterative optimisation, leading to inefficiencies and imprecise outcomes when processing complex scene point cloud data. To address these challenges, the authors introduce a principal direction-guided local voxelisation structural feature (PDLVSF) approach for point cloud registration. This method reliably identifies feature points regardless of initial positioning. Approach begins with the 3D Harris algorithm to extract feature points, followed by determining the principal direction within the feature points' radius neighbourhood to ensure rotational invariance. For scale invariance, voxel grid normalisation is utilised to maximise the point cloud's geometric resolution and make it scale-independent. Cosine similarity is then employed for effective feature matching, identifying corresponding feature point pairs and determining transformation parameters between point clouds. Experimental validations on various datasets, including the real terrain dataset, demonstrate the effectiveness of our method. Results indicate superior performance in root mean square error (RMSE) and registration accuracy compared to state-of-the-art methods, particularly in scenarios with high noise, limited overlap, and significant initial pose rotation. The real terrain dataset is publicly available at https://github.com/black-2000/Real-terrain-data.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70000","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143120630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NBCDC-YOLOv8: A new framework to improve blood cell detection and classification based on YOLOv8","authors":"Xuan Chen, Linxuan Li, Xiaoyu Liu, Fengjuan Yin, Xue Liu, Xiaoxiao Zhu, Yufeng Wang, Fanbin Meng","doi":"10.1049/cvi2.12341","DOIUrl":"10.1049/cvi2.12341","url":null,"abstract":"<p>In recent years, computer technology has successfully permeated all areas of medicine and its management, and it now offers doctors an accurate and rapid means of diagnosis. Existing blood cell detection methods suffer from low accuracy, which is caused by the uneven distribution, high density, and mutual occlusion of different blood cell types in blood microscope images, this article introduces NBCDC-YOLOv8: a new framework to improve blood cell detection and classification based on YOLOv8. Our framework innovates on several fronts: it uses Mosaic data augmentation to enrich the dataset and add small targets, incorporates a space to depth convolution (SPD-Conv) tailored for cells that are small and have low resolution, and introduces the Multi-Separated and Enhancement Attention Module (MultiSEAM) to enhance feature map resolution. Additionally, it integrates a bidirectional feature pyramid network (BiFPN) for effective multi-scale feature fusion and includes four detection heads to improve recognition accuracy of various cell sizes, especially small target platelets. Evaluated on the Blood Cell Classification Dataset (BCCD), NBCDC-YOLOv8 obtains a mean average precision (mAP) of 94.7%, and thus surpasses the original YOLOv8n by 2.3%.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143363050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Re-identification of patterned animals by multi-image feature aggregation and geometric similarity","authors":"Ekaterina Nepovinnykh, Veikka Immonen, Tuomas Eerola, Charles V. Stewart, Heikki Kälviäinen","doi":"10.1049/cvi2.12337","DOIUrl":"10.1049/cvi2.12337","url":null,"abstract":"<p>Image-based re-identification of animal individuals allows gathering of information such as population size and migration patterns of the animals over time. This, together with large image volumes collected using camera traps and crowdsourcing, opens novel possibilities to study animal populations. For many species, the re-identification can be done by analysing the permanent fur, feather, or skin patterns that are unique to each individual. In this paper, the authors study pattern feature aggregation based re-identification and consider two ways of improving accuracy: (1) aggregating pattern image features over multiple images and (2) combining the pattern appearance similarity obtained by feature aggregation and geometric pattern similarity. Aggregation over multiple database images of the same individual allows to obtain more comprehensive and robust descriptors while reducing the computation time. On the other hand, combining the two similarity measures allows to efficiently utilise both the local and global pattern features, providing a general re-identification approach that can be applied to a wide variety of different pattern types. In the experimental part of the work, the authors demonstrate that the proposed method achieves promising re-identification accuracies for Saimaa ringed seals and whale sharks without species-specific training or fine-tuning.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12337","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMF-Net: A novel multi-feature and multi-level fusion network for 3D human pose estimation","authors":"Qianxing Li, Dehui Kong, Jinghua Li, Baocai Yin","doi":"10.1049/cvi2.12336","DOIUrl":"10.1049/cvi2.12336","url":null,"abstract":"<p>Human pose estimation based on monocular video has always been the focus of research in the human computer interaction community, which suffers mainly from depth ambiguity and self-occlusion challenges. While the recently proposed learning-based approaches have demonstrated promising performance, they do not fully explore the complementarity of features. In this paper, the authors propose a novel multi-feature and multi-level fusion network (MMF-Net), which extracts and combines joint features, bone features and trajectory features at multiple levels to estimate 3D human pose. In MMF-Net, firstly, the bone length estimation module and the trajectory multi-level fusion module are used to extract the geometric size information of the human body and multi-level trajectory information of human motion, respectively. Then, the fusion attention-based combination (FABC) module is used to extract multi-level topological structure information of the human body, and effectively fuse topological structure information, geometric size information and trajectory information. Extensive experiments show that MMF-Net achieves competitive results on Human3.6M, HumanEva-I and MPI-INF-3DHP datasets.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12336","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A robust few-shot classifier with image as set of points","authors":"Suhua Peng, Zongliang Zhang, Xingwang Huang, Zongyue Wang, Shubing Su, Guorong Cai","doi":"10.1049/cvi2.12340","DOIUrl":"10.1049/cvi2.12340","url":null,"abstract":"<p>In recent years, many few-shot classification methods have been proposed. However, only a few of them have explored robust classification, which is an important aspect of human visual intelligence. Humans can effortlessly recognise visual patterns, including lines, circles, and even characters, from image data that has been corrupted or degraded. In this paper, the authors investigate a robust classification method that extends the classical paradigm of robust geometric model fitting. The method views an image as a set of points in a low-dimensional space and analyses each image through low-dimensional geometric model fitting. In contrast, the majority of other methods, such as deep learning methods, treat an image as a single point in a high-dimensional space. The authors evaluate the performance of the method using a noisy Omniglot dataset. The experimental results demonstrate that the proposed method is significantly more robust than other methods. The source code and data for this paper are available at https://github.com/pengsuhua/PMF_OMNIGLOT.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12340","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}