IET Computer Vision: Latest Publications

2D human skeleton action recognition with spatial constraints
IF 1.5 | Q4 | Computer Science
IET Computer Vision, vol. 18, no. 7, pp. 968-981 | Pub Date: 2024-07-11 | DOI: 10.1049/cvi2.12296
Lei Wang, Jianwei Zhang, Wenbing Yang, Song Gu, Shanmin Yang
Abstract: Human actions are predominantly presented in 2D format in video surveillance scenarios, which hinders the accurate determination of action details not apparent in 2D data. Depth estimation can aid human action recognition tasks, enhancing accuracy with neural networks. However, relying on images for depth estimation requires extensive computational resources and cannot exploit the connectivity between human body structures. Besides, the estimated depth may not accurately reflect actual depth ranges, necessitating improved reliability. Therefore, a 2D human skeleton action recognition method with spatial constraints (2D-SCHAR) is introduced. 2D-SCHAR employs graph convolutional networks to process graph-structured human skeleton data and comprises three parts: depth estimation, spatial transformation, and action recognition. The first two components, which infer 3D information from 2D skeleton actions and generate spatial transformation parameters to correct abnormal deviations in action data, support the third to enhance the accuracy of action recognition. The model is designed in an end-to-end, multitasking manner, allowing parameter sharing among these three components to boost performance. The experimental results validate the model's effectiveness and superiority in human skeleton action recognition.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12296
Citations: 0

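The 2D-SCHAR pipeline is only described at a high level in the abstract; the following is a minimal sketch of the kind of graph-convolution building block such skeleton models rest on, where joint features are aggregated over a normalised adjacency matrix before a learned projection. The joint count, edge list, and feature sizes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SkeletonGraphConv(nn.Module):
    """One graph-convolution layer over skeleton joints: X' = relu(A_hat @ X @ W)."""
    def __init__(self, in_features, out_features, edges, num_joints):
        super().__init__()
        # Build a symmetrically normalised adjacency matrix with self-loops.
        adj = torch.eye(num_joints)
        for i, j in edges:
            adj[i, j] = adj[j, i] = 1.0
        deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        a_hat = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
        self.register_buffer("a_hat", a_hat)
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        # x: (batch, num_joints, in_features) -- e.g. 2D joint coordinates.
        x = torch.einsum("ij,bjf->bif", self.a_hat, x)  # aggregate over neighbours
        return torch.relu(self.linear(x))

# Toy example: 5 joints connected in a chain, 2D coordinates as input features.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
layer = SkeletonGraphConv(in_features=2, out_features=16, edges=edges, num_joints=5)
out = layer(torch.randn(8, 5, 2))   # batch of 8 skeletons
print(out.shape)                    # torch.Size([8, 5, 16])
```
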
Centre-loss—A preferred class verification approach over sample-to-sample in self-checkout products datasets
IF 1.5 | Q4 | Computer Science
IET Computer Vision, vol. 18, no. 7, pp. 1004-1016 | Pub Date: 2024-07-11 | DOI: 10.1049/cvi2.12302
Bernardas Ciapas, Povilas Treigys
Abstract: Siamese networks excel at comparing two images, serving as an effective class verification technique when there is a single reference image per class. However, when multiple reference images are present, Siamese verification necessitates multiple comparisons and aggregation, which is often impractical at inference. The Centre-Loss approach proposed in this research solves the class verification task more efficiently than sample-to-sample approaches, using a single forward pass during inference. Optimising a Centre-Loss function learns class centres and minimises intra-class distances in latent space. The authors compared verification accuracy using Centre-Loss against aggregated Siamese when other hyperparameters (such as neural network backbone and distance type) are the same. Experiments contrasted the ubiquitous Euclidean distance against other distance types to discover the optimum Centre-Loss layer, its size, and the Centre-Loss weight. In the optimal architecture, the Centre-Loss layer is connected to the penultimate layer, calculates Euclidean distance, and its size depends on the distance type. The Centre-Loss method was validated on the Self-Checkout products and Fruits 360 image datasets. Comparable accuracy and lower complexity make Centre-Loss a preferred approach over sample-to-sample for the class verification task when the number of reference images per class is high and inference speed is a factor, such as in self-checkouts.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12302
Citations: 0

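As a rough illustration of the idea (not the authors' exact layer sizes or training setup), a centre-loss term keeps one learnable centre per class and penalises the Euclidean distance between each embedding and its class centre; at inference, a sample can then be verified against a stored centre in a single forward pass. The feature dimension, class count, and threshold below are placeholders.

```python
import torch
import torch.nn as nn

class CentreLoss(nn.Module):
    """Learnable per-class centres; penalises intra-class distance in latent space."""
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centres = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        # features: (batch, feat_dim), labels: (batch,) class indices
        return ((features - self.centres[labels]) ** 2).sum(dim=1).mean()

    @torch.no_grad()
    def verify(self, feature, class_id, threshold):
        # Single-pass verification: accept if the embedding is close enough to the class centre.
        dist = torch.norm(feature - self.centres[class_id])
        return bool(dist < threshold)

# Toy usage: embeddings from some backbone; joint training with cross-entropy is typical.
centre_loss = CentreLoss(num_classes=10, feat_dim=64)
feats = torch.randn(32, 64)
labels = torch.randint(0, 10, (32,))
loss = centre_loss(feats, labels)
print(float(loss))
```
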
GR-Former: Graph-reinforcement transformer for skeleton-based driver action recognition
IF 1.5 | Q4 | Computer Science
IET Computer Vision, vol. 18, no. 7, pp. 982-991 | Pub Date: 2024-07-10 | DOI: 10.1049/cvi2.12298
Zhuoyan Xu, Jingke Xu
Abstract: In in-vehicle driving scenarios, composite action recognition is crucial for improving safety and understanding the driver's intention. Due to spatial constraints and occlusion, the driver's range of motion is limited, resulting in similar action patterns that are difficult to differentiate. Additionally, collecting skeleton data that characterise the full human posture is difficult, posing further challenges for action recognition. To address these problems, a novel Graph-Reinforcement Transformer (GR-Former) model is proposed. Using limited skeleton data as input, the model introduces graph-structure information to directionally reinforce the self-attention mechanism and dynamically learns and aggregates features between joints at multiple levels, constructing a richer feature vector space and enhancing expressiveness and recognition accuracy. On the Drive & Act dataset for composite action recognition, the authors' work uses only human upper-body skeleton data and achieves state-of-the-art performance compared to existing methods. With complete human skeleton data, it also attains excellent recognition accuracy on the NTU RGB+D and NTU RGB+D 120 datasets, demonstrating the strong generalisability of GR-Former. Overall, this work provides a new and effective solution for driver action recognition in in-vehicle scenarios.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12298
Citations: 0

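The exact reinforcement scheme is specific to the paper, but one common way to inject graph structure into self-attention is to add an adjacency-derived bias to the attention logits so that physically connected joints attend to each other more strongly. The sketch below illustrates that generic idea with illustrative sizes; it is not the GR-Former block itself.

```python
import torch
import torch.nn as nn

class GraphBiasedAttention(nn.Module):
    """Single-head self-attention whose logits are biased by a joint adjacency matrix."""
    def __init__(self, dim, adjacency, bias_strength=1.0):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5
        # Connected joints receive a positive additive bias on their attention logits.
        self.register_buffer("bias", bias_strength * adjacency)

    def forward(self, x):
        # x: (batch, num_joints, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = torch.matmul(q, k.transpose(-2, -1)) * self.scale + self.bias
        attn = logits.softmax(dim=-1)
        return self.proj(torch.matmul(attn, v))

# Toy example: 5 upper-body joints connected in a chain.
adj = torch.zeros(5, 5)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
block = GraphBiasedAttention(dim=32, adjacency=adj)
print(block(torch.randn(4, 5, 32)).shape)  # torch.Size([4, 5, 32])
```
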
Multi-scale skeleton simplification graph convolutional network for skeleton-based action recognition
IF 1.5 | Q4 | Computer Science
IET Computer Vision, vol. 18, no. 7, pp. 992-1003 | Pub Date: 2024-07-08 | DOI: 10.1049/cvi2.12300
Fan Zhang, Ding Chongyang, Kai Liu, Liu Hongjin
Abstract: Human action recognition based on graph convolutional networks (GCNs) is one of the hotspots in computer vision. However, previous methods generally rely on a handcrafted graph, which limits the model's ability to characterise the connections between indirectly connected joints; these connections weaken when joints are separated by long distances. To address this issue, the authors propose a skeleton simplification method that reduces the number of joints and the distance between joints by merging adjacent joints into simplified joints. A group convolutional block is devised to extract the internal features of the simplified joints. Additionally, the authors enhance the method with multi-scale modelling, which maps inputs into sequences across various levels of simplification. Combined with spatial-temporal graph convolution, a multi-scale skeleton simplification GCN for skeleton-based action recognition (M3S-GCN) is proposed for fusing multi-scale skeleton sequences and modelling the connections between joints. Finally, M3S-GCN is evaluated on five benchmarks from the NTU RGB+D 60 (C-Sub, C-View), NTU RGB+D 120 (X-Sub, X-Set), and NW-UCLA datasets. Experimental results show that M3S-GCN achieves state-of-the-art performance with accuracies of 93.0%, 97.0%, and 91.2% on the C-Sub, C-View, and X-Set benchmarks, which validates the effectiveness of the method.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12300
Citations: 0

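The paper's merging rule is its own contribution; as a generic illustration, skeleton simplification can be expressed as averaging groups of adjacent joints into single "simplified" joints, producing a coarser sequence that a further GCN stage can consume. The joint groups below are arbitrary examples, not the paper's grouping.

```python
import torch

def simplify_skeleton(joints, groups):
    """Merge groups of adjacent joints by averaging their coordinates.

    joints: (batch, num_joints, coord_dim)
    groups: list of lists of joint indices; each group becomes one simplified joint.
    """
    merged = [joints[:, idx, :].mean(dim=1) for idx in groups]
    return torch.stack(merged, dim=1)   # (batch, len(groups), coord_dim)

# Toy example: 25 joints reduced to 5 body parts (indices are illustrative only).
groups = [
    list(range(0, 5)),    # head / neck
    list(range(5, 10)),   # left arm
    list(range(10, 15)),  # right arm
    list(range(15, 20)),  # left leg
    list(range(20, 25)),  # right leg
]
skeleton = torch.randn(2, 25, 2)                  # batch of 2, 25 joints, 2D coordinates
print(simplify_skeleton(skeleton, groups).shape)  # torch.Size([2, 5, 2])
```
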
Recognition of European mammals and birds in camera trap images using deep neural networks
IF 1.5 | Q4 | Computer Science
IET Computer Vision, vol. 18, no. 8, pp. 1162-1192 | Pub Date: 2024-07-03 | DOI: 10.1049/cvi2.12294
Daniel Schneider, Kim Lindner, Markus Vogelbacher, Hicham Bellafkir, Nina Farwig, Bernd Freisleben
Abstract: Most machine learning methods for animal recognition in camera trap images are limited to mammal identification and group birds into a single class. Machine learning methods for visually discriminating birds, in turn, cannot discriminate between mammals and are not designed for camera trap images. The authors present deep neural network models to recognise both mammals and bird species in camera trap images. They train neural network models for species classification as well as for predicting the animal taxonomy, that is, genus, family, order, group, and class names. Different neural network architectures, including ResNet, EfficientNetV2, Vision Transformer, Swin Transformer, and ConvNeXt, are compared for these tasks. Furthermore, the authors investigate approaches to overcome various challenges associated with camera trap image analysis. The authors' best species classification models achieve a mean average precision (mAP) of 97.91% on a validation data set and mAPs of 90.39% and 82.77% on test data sets recorded in forests in Germany and Poland, respectively. Their best taxonomic classification models reach a validation mAP of 97.18% and mAPs of 94.23% and 79.92% on the two test data sets, respectively.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12294
Citations: 0

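The paper's models and data are not reproduced here; the sketch below simply shows one common way to predict species together with higher taxonomic ranks, namely a shared image backbone with one classification head per rank. The backbone choice and class counts are placeholders, and the paper may well train separate models instead.

```python
import torch
import torch.nn as nn
from torchvision import models

class SpeciesTaxonomyClassifier(nn.Module):
    """Shared image backbone with separate heads for species, genus, family, and order."""
    def __init__(self, num_classes_per_rank):
        super().__init__()
        backbone = models.resnet18(weights=None)   # any CNN/ViT backbone would do
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # keep only the pooled features
        self.backbone = backbone
        self.heads = nn.ModuleDict(
            {rank: nn.Linear(feat_dim, n) for rank, n in num_classes_per_rank.items()}
        )

    def forward(self, images):
        feats = self.backbone(images)
        return {rank: head(feats) for rank, head in self.heads.items()}

# Placeholder class counts, not the paper's label set.
model = SpeciesTaxonomyClassifier({"species": 50, "genus": 30, "family": 15, "order": 8})
logits = model(torch.randn(4, 3, 224, 224))
print({k: v.shape for k, v in logits.items()})
```
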
Self-supervised multi-view clustering in computer vision: A survey
IF 1.5 | Q4 | Computer Science
IET Computer Vision, vol. 18, no. 6, pp. 709-734 | Pub Date: 2024-07-02 | DOI: 10.1049/cvi2.12299
Jiatai Wang, Zhiwei Xu, Xuewen Yang, Hailong Li, Bo Li, Xuying Meng
Abstract: In recent years, multi-view clustering (MVC) has had significant implications in the fields of cross-modal representation learning and data-driven decision-making. Its main objective is to cluster samples into distinct groups by leveraging consistency and complementary information among multiple views. However, the field of computer vision has witnessed the evolution of contrastive learning, and self-supervised learning has made substantial research progress. Consequently, self-supervised learning is progressively becoming dominant in MVC methods. It involves designing proxy tasks to extract supervisory information from image and video data, thereby guiding the clustering process. Despite the rapid development of self-supervised MVC, there is currently no comprehensive survey analysing and summarising the current state of research progress. Hence, the authors aim to explore the emergence of self-supervised MVC by discussing the reasons and advantages behind it. Additionally, the internal connections and classifications of common datasets, data issues, representation learning methods, and self-supervised learning methods are investigated. The authors not only introduce the mechanisms for each category of methods, but also provide illustrative examples of their applications. Finally, some open problems are identified for further investigation and development.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12299
Citations: 0

Fusing crops representation into snippet via mutual learning for weakly supervised surveillance anomaly detection
IF 1.5 | Q4 | Computer Science
IET Computer Vision, vol. 18, no. 8, pp. 1112-1126 | Pub Date: 2024-07-02 | DOI: 10.1049/cvi2.12289
Bohua Zhang, Jianru Xue
Abstract: In recent years, the challenge of detecting anomalies in real-world surveillance videos using weakly supervised data has emerged. Traditional methods, utilising multi-instance learning (MIL) with video snippets, struggle with background noise and tend to overlook subtle anomalies. To tackle this, the authors propose a novel approach that crops snippets to create multiple instances with less noise, evaluates them separately, and then fuses these evaluations for more precise anomaly detection. This, however, leads to higher computational demands, especially during inference. Addressing this, the authors' solution employs mutual learning to guide snippet feature training using these low-noise crops. Multiple instance learning (MIL) is used for the primary task with snippets as inputs, and multiple-multiple instance learning (MMIL) for an auxiliary task with crops during training. The approach ensures consistent multi-instance results in both tasks and incorporates a temporal activation mutual learning module (TAML) for aligning temporal anomaly activations between snippets and crops, improving the overall quality of snippet representations. Additionally, a snippet feature discrimination enhancement module (SFDE) refines the snippet features further. Tested across various datasets, the method shows remarkable performance, notably achieving a frame-level AUC of 85.78% on the UCF-Crime dataset while reducing computational costs.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12289
Citations: 0

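The crop generation and mutual-learning losses are the paper's own; as background, weakly supervised video anomaly detection typically scores every snippet and trains with a video-level multi-instance objective, for example on the top-k snippet scores. A minimal top-k MIL loss of that generic kind (illustrative feature and snippet dimensions) could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SnippetScorer(nn.Module):
    """Maps per-snippet features to anomaly scores in [0, 1]."""
    def __init__(self, feat_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, snippets):                              # (batch, num_snippets, feat_dim)
        return torch.sigmoid(self.mlp(snippets)).squeeze(-1)  # (batch, num_snippets)

def topk_mil_loss(scores, video_labels, k=3):
    """Video-level BCE on the mean of the k highest snippet scores."""
    topk = scores.topk(k, dim=1).values.mean(dim=1)           # (batch,)
    return F.binary_cross_entropy(topk, video_labels.float())

scorer = SnippetScorer(feat_dim=1024)
scores = scorer(torch.randn(8, 32, 1024))        # 8 videos, 32 snippets each
loss = topk_mil_loss(scores, torch.randint(0, 2, (8,)))
print(float(loss))
```
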
FastFaceCLIP: A lightweight text-driven high-quality face image manipulation
IF 1.5 | Q4 | Computer Science
IET Computer Vision, vol. 18, no. 7, pp. 950-967 | Pub Date: 2024-07-02 | DOI: 10.1049/cvi2.12295
Jiaqi Ren, Junping Qin, Qianli Ma, Yin Cao
Abstract: Although many new methods have emerged for text-driven image manipulation, the large computational power required for model training makes training slow. Additionally, these methods consume considerable video random access memory (VRAM) during training, which is often insufficient for generating high-resolution images. Nevertheless, recent advancements in Vision Transformers (ViTs) have demonstrated strong image classification and recognition capabilities. Unlike traditional Convolutional Neural Network-based methods, ViTs leverage attention mechanisms and inherent long-range dependencies to capture comprehensive global information about images, extracting more robust features and achieving comparable results with a reduced computational load. The adaptability of ViTs to text-driven image manipulation was investigated. Specifically, existing image generation methods were refined, and the FastFaceCLIP method was proposed by combining the image-text semantic alignment of the pre-trained CLIP model with the high-resolution image generation of the proposed FastFace. Additionally, the Multi-Axis Nested Transformer module was incorporated for advanced feature extraction from the latent space, generating higher-resolution images that are further enhanced using the Real-ESRGAN algorithm. Finally, extensive face-manipulation tests on the CelebA-HQ dataset compare the proposed method with related schemes, demonstrating that FastFaceCLIP effectively generates semantically accurate, visually realistic, and clear images using fewer parameters and less time.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12295
Citations: 0

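FastFace itself is not described in enough detail here to reproduce; as a generic illustration of the CLIP-guided part, text-driven editing methods commonly optimise a latent code so that the CLIP embedding of the generated image moves towards the embedding of a text prompt. The sketch below is a bare-bones CLIP similarity loss assuming OpenAI's clip package; the generator interface and prompt in the commented usage are hypothetical.

```python
import torch
import clip   # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()

def clip_text_loss(images, prompt):
    """1 - cosine similarity between image embeddings and a text prompt embedding.

    images: (batch, 3, 224, 224), already resized and normalised for CLIP.
    """
    text = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        text_emb = clip_model.encode_text(text).float()
    img_emb = clip_model.encode_image(images).float()
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (1.0 - img_emb @ text_emb.t()).mean()

# Hypothetical usage: `generator(latent)` stands in for whatever face generator is being steered.
# latent = torch.randn(1, 512, device=device, requires_grad=True)
# loss = clip_text_loss(generator(latent), "a smiling face with blond hair")
# loss.backward()
```
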
DualAD: Dual adversarial network for image anomaly detection
IF 1.5 | Q4 | Computer Science
IET Computer Vision, vol. 18, no. 8, pp. 1138-1148 | Pub Date: 2024-06-25 | DOI: 10.1049/cvi2.12297
Yonghao Wan, Aimin Feng
Abstract: Anomaly detection, also known as outlier detection, is critical in domains such as network security, intrusion detection, and fraud detection. One popular approach is to use autoencoders, which are trained to reconstruct the input by minimising the reconstruction error of a neural network. However, these methods usually suffer from a trade-off between normal reconstruction fidelity and abnormal reconstruction distinguishability, which damages performance. The authors find that this trade-off can be better mitigated by imposing constraints on the latent space of images. To this end, they propose a new Dual Adversarial Network (DualAD) that consists of a Feature Constraint (FC) module and a reconstruction module. The method incorporates the FC module during reconstruction training to impose constraints on the latent space of images, yielding feature representations more conducive to anomaly detection. Additionally, dual adversarial learning is employed to model the distribution of normal data. On the one hand, adversarial learning during the reconstruction process yields higher-quality reconstruction samples, preventing blurred reconstructions from degrading model performance. On the other hand, adversarial training of the FC module and the reconstruction module achieves superior feature representation, making anomalies more distinguishable at the feature level. During inference, anomaly detection is performed simultaneously in the pixel and latent spaces to identify abnormal patterns more comprehensively. Experiments on three datasets, CIFAR10, MNIST, and FashionMNIST, demonstrate the validity of the work. Results show that constraints on the latent space and adversarial learning can improve detection performance.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12297
Citations: 0

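The dual adversarial constraints are the paper's contribution; the baseline they build on is reconstruction-based scoring, where an autoencoder trained on normal data is expected to reconstruct anomalies poorly. A minimal version of that baseline (illustrative layer sizes, no adversarial or latent-space terms) is sketched below.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Small convolutional autoencoder; reconstruction error serves as the anomaly score."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 8x8
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

@torch.no_grad()
def anomaly_score(model, images):
    # Per-image mean squared reconstruction error; higher means more anomalous.
    recon = model(images)
    return ((images - recon) ** 2).flatten(1).mean(dim=1)

model = ConvAutoencoder()
scores = anomaly_score(model, torch.rand(4, 3, 32, 32))
print(scores.shape)   # torch.Size([4])
```
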
SAM-Y: Attention-enhanced hazardous vehicle object detection algorithm
IF 1.5 | Q4 | Computer Science
IET Computer Vision, vol. 18, no. 8, pp. 1149-1161 | Pub Date: 2024-06-17 | DOI: 10.1049/cvi2.12293
Shanshan Wang, Bushi Liu, Pengcheng Zhu, Xianchun Meng, Bolun Chen, Wei Shao, Liqing Chen
Abstract: Vehicles transporting hazardous chemicals are one of the major mobile hazards in modern logistics, posing serious threats to people's lives, property, and the environment. Although current object detection algorithms are already applied to detecting hazardous chemical vehicles, detection remains difficult owing to the complexity of the transportation environment and the small size and low resolution of vehicle targets against cluttered backgrounds. To solve these problems, the authors propose an improved algorithm based on YOLOv5 to enhance the detection accuracy and efficiency for hazardous chemical vehicles. First, to better capture the details and semantic information of such vehicles, a receptive-field expansion block is introduced into the backbone network, resolving the mismatch between the detector's receptive field and the target object. Second, to improve the model's ability to represent vehicle characteristics, a separable attention mechanism is introduced in the multi-scale detection stage, coherently combining the detection head with attention over scale-aware feature layers, spatially aware locations, and task-aware output channels to strengthen prediction. Experimental results show that the improved model significantly surpasses the baseline in detection accuracy while also achieving faster inference.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12293
Citations: 0

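The paper's receptive-field expansion block and separable attention are not reproduced here; as a generic illustration of how a detector's receptive field can be widened cheaply, parallel dilated convolutions at several rates can be concatenated and fused, similar in spirit to RFB/ASPP-style modules. The channel sizes and dilation rates below are assumptions, not the SAM-Y design.

```python
import torch
import torch.nn as nn

class DilatedFusionBlock(nn.Module):
    """Parallel dilated 3x3 convolutions at several rates, concatenated and fused by a 1x1 conv."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(out) + x   # residual connection keeps the original signal

block = DilatedFusionBlock(channels=64)
print(block(torch.randn(1, 64, 40, 40)).shape)   # torch.Size([1, 64, 40, 40])
```
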