IET Computer Vision最新文献_第3页

TomoSAR 3D reconstruction: Cascading adversarial strategy with sparse observation trajectory TomoSAR三维重建：稀疏观测轨迹级联对抗策略

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2025-02-04 DOI: 10.1049/cvi2.70001

Xian Zhu, Xiaoqin Zeng, Yuhua Cong, Yanhao Huang, Ziyan Zhu, Yantao Luo

{"title":"TomoSAR 3D reconstruction: Cascading adversarial strategy with sparse observation trajectory","authors":"Xian Zhu, Xiaoqin Zeng, Yuhua Cong, Yanhao Huang, Ziyan Zhu, Yantao Luo","doi":"10.1049/cvi2.70001","DOIUrl":"https://doi.org/10.1049/cvi2.70001","url":null,"abstract":"Synthetic aperture radar tomography (TomoSAR) has shown significant potential for the 3D Reconstruction of buildings, especially in critical areas such as topographic mapping, urban planning, and disaster monitoring. In practical applications, the constraints of observation trajectories frequently lead to the acquisition of a limited dataset of sparse SAR images, presenting challenges for TomoSAR 3D Reconstruction and affecting its signal-to-noise ratio and elevation resolution performance. The study introduces a cascade adversarial strategy based on the Conditional Generative Adversarial Network (CGAN), optimised explicitly for sparse observation trajectories. In the preliminary phase of the CGAN, the U-Net architecture was employed to capture more global information and enhance image detail recovery capability, which is subsequently utilised in the cascade refinement network. The ResNet34 residual network in the advanced network stage was adopted to bolster feature extraction and image generation capabilities further. Based on experimental validation performed on the curated TomoSAR 3D super-resolution dataset tailored for buildings, the findings reveal that the methodology yields a notable enhancement in image quality and accuracy compared to other techniques.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143111862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Human activity recognition: A review of deep learning-based methods 人类活动识别：基于深度学习的方法综述

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2025-02-01 DOI: 10.1049/cvi2.70003

Sanjay Jyoti Dutta, Tossapon Boongoen, Reyer Zwiggelaar

引用次数: 0

A principal direction-guided local voxelisation structural feature approach for point cloud registration 一种主要的方向导向局部体素化结构特征点云配准方法

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2025-01-29 DOI: 10.1049/cvi2.70000

Chenyang Li, Yansong Duan

{"title":"A principal direction-guided local voxelisation structural feature approach for point cloud registration","authors":"Chenyang Li, Yansong Duan","doi":"10.1049/cvi2.70000","DOIUrl":"https://doi.org/10.1049/cvi2.70000","url":null,"abstract":"Point cloud registration is a crucial aspect of computer vision and 3D reconstruction. Traditional registration methods often depend on global features or iterative optimisation, leading to inefficiencies and imprecise outcomes when processing complex scene point cloud data. To address these challenges, the authors introduce a principal direction-guided local voxelisation structural feature (PDLVSF) approach for point cloud registration. This method reliably identifies feature points regardless of initial positioning. Approach begins with the 3D Harris algorithm to extract feature points, followed by determining the principal direction within the feature points' radius neighbourhood to ensure rotational invariance. For scale invariance, voxel grid normalisation is utilised to maximise the point cloud's geometric resolution and make it scale-independent. Cosine similarity is then employed for effective feature matching, identifying corresponding feature point pairs and determining transformation parameters between point clouds. Experimental validations on various datasets, including the real terrain dataset, demonstrate the effectiveness of our method. Results indicate superior performance in root mean square error (RMSE) and registration accuracy compared to state-of-the-art methods, particularly in scenarios with high noise, limited overlap, and significant initial pose rotation. The real terrain dataset is publicly available at https://github.com/black-2000/Real-terrain-data.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70000","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143120630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

NBCDC-YOLOv8: A new framework to improve blood cell detection and classification based on YOLOv8 NBCDC-YOLOv8：基于YOLOv8改进血细胞检测和分类的新框架

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2025-01-22 DOI: 10.1049/cvi2.12341

Xuan Chen, Linxuan Li, Xiaoyu Liu, Fengjuan Yin, Xue Liu, Xiaoxiao Zhu, Yufeng Wang, Fanbin Meng

{"title":"NBCDC-YOLOv8: A new framework to improve blood cell detection and classification based on YOLOv8","authors":"Xuan Chen, Linxuan Li, Xiaoyu Liu, Fengjuan Yin, Xue Liu, Xiaoxiao Zhu, Yufeng Wang, Fanbin Meng","doi":"10.1049/cvi2.12341","DOIUrl":"https://doi.org/10.1049/cvi2.12341","url":null,"abstract":"In recent years, computer technology has successfully permeated all areas of medicine and its management, and it now offers doctors an accurate and rapid means of diagnosis. Existing blood cell detection methods suffer from low accuracy, which is caused by the uneven distribution, high density, and mutual occlusion of different blood cell types in blood microscope images, this article introduces NBCDC-YOLOv8: a new framework to improve blood cell detection and classification based on YOLOv8. Our framework innovates on several fronts: it uses Mosaic data augmentation to enrich the dataset and add small targets, incorporates a space to depth convolution (SPD-Conv) tailored for cells that are small and have low resolution, and introduces the Multi-Separated and Enhancement Attention Module (MultiSEAM) to enhance feature map resolution. Additionally, it integrates a bidirectional feature pyramid network (BiFPN) for effective multi-scale feature fusion and includes four detection heads to improve recognition accuracy of various cell sizes, especially small target platelets. Evaluated on the Blood Cell Classification Dataset (BCCD), NBCDC-YOLOv8 obtains a mean average precision (mAP) of 94.7%, and thus surpasses the original YOLOv8n by 2.3%.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143363050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Re-identification of patterned animals by multi-image feature aggregation and geometric similarity 基于多图像特征聚合和几何相似性的图案动物再识别

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2025-01-08 DOI: 10.1049/cvi2.12337

Ekaterina Nepovinnykh, Veikka Immonen, Tuomas Eerola, Charles V. Stewart, Heikki Kälviäinen

{"title":"Re-identification of patterned animals by multi-image feature aggregation and geometric similarity","authors":"Ekaterina Nepovinnykh, Veikka Immonen, Tuomas Eerola, Charles V. Stewart, Heikki Kälviäinen","doi":"10.1049/cvi2.12337","DOIUrl":"https://doi.org/10.1049/cvi2.12337","url":null,"abstract":"Image-based re-identification of animal individuals allows gathering of information such as population size and migration patterns of the animals over time. This, together with large image volumes collected using camera traps and crowdsourcing, opens novel possibilities to study animal populations. For many species, the re-identification can be done by analysing the permanent fur, feather, or skin patterns that are unique to each individual. In this paper, the authors study pattern feature aggregation based re-identification and consider two ways of improving accuracy: (1) aggregating pattern image features over multiple images and (2) combining the pattern appearance similarity obtained by feature aggregation and geometric pattern similarity. Aggregation over multiple database images of the same individual allows to obtain more comprehensive and robust descriptors while reducing the computation time. On the other hand, combining the two similarity measures allows to efficiently utilise both the local and global pattern features, providing a general re-identification approach that can be applied to a wide variety of different pattern types. In the experimental part of the work, the authors demonstrate that the proposed method achieves promising re-identification accuracies for Saimaa ringed seals and whale sharks without species-specific training or fine-tuning.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12337","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

MMF-Net: A novel multi-feature and multi-level fusion network for 3D human pose estimation MMF-Net：一种用于三维人体姿态估计的新型多特征多层次融合网络

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2025-01-07 DOI: 10.1049/cvi2.12336

Qianxing Li, Dehui Kong, Jinghua Li, Baocai Yin

{"title":"MMF-Net: A novel multi-feature and multi-level fusion network for 3D human pose estimation","authors":"Qianxing Li, Dehui Kong, Jinghua Li, Baocai Yin","doi":"10.1049/cvi2.12336","DOIUrl":"https://doi.org/10.1049/cvi2.12336","url":null,"abstract":"Human pose estimation based on monocular video has always been the focus of research in the human computer interaction community, which suffers mainly from depth ambiguity and self-occlusion challenges. While the recently proposed learning-based approaches have demonstrated promising performance, they do not fully explore the complementarity of features. In this paper, the authors propose a novel multi-feature and multi-level fusion network (MMF-Net), which extracts and combines joint features, bone features and trajectory features at multiple levels to estimate 3D human pose. In MMF-Net, firstly, the bone length estimation module and the trajectory multi-level fusion module are used to extract the geometric size information of the human body and multi-level trajectory information of human motion, respectively. Then, the fusion attention-based combination (FABC) module is used to extract multi-level topological structure information of the human body, and effectively fuse topological structure information, geometric size information and trajectory information. Extensive experiments show that MMF-Net achieves competitive results on Human3.6M, HumanEva-I and MPI-INF-3DHP datasets.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12336","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A robust few-shot classifier with image as set of points 以图像为点集的鲁棒少射分类器

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2025-01-02 DOI: 10.1049/cvi2.12340

Suhua Peng, Zongliang Zhang, Xingwang Huang, Zongyue Wang, Shubing Su, Guorong Cai

{"title":"A robust few-shot classifier with image as set of points","authors":"Suhua Peng, Zongliang Zhang, Xingwang Huang, Zongyue Wang, Shubing Su, Guorong Cai","doi":"10.1049/cvi2.12340","DOIUrl":"https://doi.org/10.1049/cvi2.12340","url":null,"abstract":"In recent years, many few-shot classification methods have been proposed. However, only a few of them have explored robust classification, which is an important aspect of human visual intelligence. Humans can effortlessly recognise visual patterns, including lines, circles, and even characters, from image data that has been corrupted or degraded. In this paper, the authors investigate a robust classification method that extends the classical paradigm of robust geometric model fitting. The method views an image as a set of points in a low-dimensional space and analyses each image through low-dimensional geometric model fitting. In contrast, the majority of other methods, such as deep learning methods, treat an image as a single point in a high-dimensional space. The authors evaluate the performance of the method using a noisy Omniglot dataset. The experimental results demonstrate that the proposed method is significantly more robust than other methods. The source code and data for this paper are available at https://github.com/pengsuhua/PMF_OMNIGLOT.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12340","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

SMGNFORMER: Fusion Mamba-graph transformer network for human pose estimation SMGNFORMER：用于人体姿态估计的融合曼巴图变压器网络

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-31 DOI: 10.1049/cvi2.12339

Yi Li, Zan Wang, Weiran Niu

引用次数: 0

LLFormer4D: LiDAR-based lane detection method by temporal feature fusion and sparse transformer LLFormer4D：基于时间特征融合和稀疏变压器的激光雷达车道检测方法

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-30 DOI: 10.1049/cvi2.12338

Jun Hu, Chaolu Feng, Haoxiang Jie, Zuotao Ning, Xinyi Zuo, Wei Liu, Xiangyu Wei

{"title":"LLFormer4D: LiDAR-based lane detection method by temporal feature fusion and sparse transformer","authors":"Jun Hu, Chaolu Feng, Haoxiang Jie, Zuotao Ning, Xinyi Zuo, Wei Liu, Xiangyu Wei","doi":"10.1049/cvi2.12338","DOIUrl":"https://doi.org/10.1049/cvi2.12338","url":null,"abstract":"Lane detection is a fundamental problem in autonomous driving, which provides vehicles with essential road information. Despite the attention from scholars and engineers, lane detection based on LiDAR meets challenges such as unsatisfactory detection accuracy and significant computation overhead. In this paper, the authors propose LLFormer4D to overcome these technical challenges by leveraging the strengths of both Convolutional Neural Network and Transformer networks. Specifically, the Temporal Feature Fusion module is introduced to enhance accuracy and robustness by integrating features from multi-frame point clouds. In addition, a sparse Transformer decoder based on Lane Key-point Query is designed, which introduces key-point supervision for each lane line to streamline the post-processing. The authors conduct experiments and evaluate the proposed method on the K-Lane and nuScenes map datasets respectively. The results demonstrate the effectiveness of the presented method, achieving second place with an F1 score of 82.39 and a processing speed of 16.03 Frames Per Seconds on the K-Lane dataset. Furthermore, this algorithm attains the best mAP of 70.66 for lane detection on the nuScenes map dataset.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12338","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

HMSFU: A hierarchical multi-scale fusion unit for video prediction and beyond HMSFU：用于视频预测及其他领域的分层多尺度融合单元

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-29 DOI: 10.1049/cvi2.12312

Hongchang Zhu, Faming Fang

{"title":"HMSFU: A hierarchical multi-scale fusion unit for video prediction and beyond","authors":"Hongchang Zhu, Faming Fang","doi":"10.1049/cvi2.12312","DOIUrl":"https://doi.org/10.1049/cvi2.12312","url":null,"abstract":"Video prediction is the process of learning necessary information from historical frames to predict future video frames. Learning features from historical frames is a crucial step in this process. However, most current methods have a relatively single-scale learning approach, even if they learn features at different scales, they cannot fully integrate and utilise them, resulting in unsatisfactory prediction results. To address this issue, a hierarchical multi-scale fusion unit (HMSFU) is proposed. By using a hierarchical multi-scale architecture, each layer predicts future frames at different granularities using different convolutional scales. The abstract features from different layers can be fused, enabling the model not only to capture rich contextual information but also to expand the model's receptive field, enhance its expressive power, and improve its applicability to complex prediction scenarios. To fully utilise the expanded receptive field, HMSFU incorporates three fusion modules. The first module is the single-layer historical attention fusion module, which uses an attention mechanism to fuse the features from historical frames into the current frame at each layer. The second module is the single-layer spatiotemporal fusion module, which fuses complementary temporal and spatial features at each layer. The third module is the multi-layer spatiotemporal fusion module, which fuses spatiotemporal features from different layers. Additionally, the authors not only focus on the frame-level error using mean squared error loss, but also introduce the novel use of Kullback–Leibler (KL) divergence to consider inter-frame variations. Experimental results demonstrate that our proposed HMSFU model achieves the best performance on popular video prediction datasets, showcasing its remarkable competitiveness in the field.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12312","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143424260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0