Latest Articles in IET Computer Vision

Joint image restoration for object detection in snowy weather
IF 1.5 · CAS Tier 4 · Computer Science
IET Computer Vision · Pub Date: 2024-03-27 · DOI: 10.1049/cvi2.12274
Jing Wang, Meimei Xu, Huazhu Xue, Zhanqiang Huo, Fen Luo
Abstract: Although existing object detectors achieve encouraging detection and localisation performance under ideal conditions, they perform poorly in adverse weather such as snow and cannot cope with detection tasks in such conditions. Existing methods neither handle well the degradation that snow causes to object features, nor exploit the latent information that could improve detection, which is often ignored or even discarded. To this end, the authors propose a novel end-to-end object detection network with joint image restoration. Specifically, to counter the identity degradation of object features caused by snow, a restoration-detection dual-branch structure combined with a Multi-Integrated Attention module is proposed, which mitigates the effect of snow on object features and thereby improves detection performance. To use detection-relevant features more effectively, a Self-Adaptive Feature Fusion module is introduced; it helps the network learn latent features beneficial to detection and, through a dedicated feature-fusion scheme, removes the effect of heavy or large patches of snow in the object area, improving detection in snowy scenes. In addition, the authors construct a large-scale, multi-size snowy dataset, the Synthetic and Real Snowy Dataset (SRSD), a useful and necessary complement to existing snow-related benchmarks. Extensive experiments on a public snowy dataset (Snowy-weather Datasets) and on SRSD show that the method outperforms existing state-of-the-art object detectors.

IET Computer Vision, Vol. 18, Issue 6, pp. 759–771. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12274
Citations: 0
Tag-inferring and tag-guided Transformer for image captioning
IF 1.5 · CAS Tier 4 · Computer Science
IET Computer Vision · Pub Date: 2024-03-22 · DOI: 10.1049/cvi2.12280
Yaohua Yi, Yinkai Liang, Dezhu Kong, Ziwei Tang, Jibing Peng
Abstract: Image captioning is an important task for understanding images. Recently, many studies have used tags to build alignments between visual and linguistic information. However, existing methods overlook the difficulty that simple semantic tags have in expressing detailed semantics for different image contents. The authors therefore propose a tag-inferring and tag-guided Transformer for image captioning that generates fine-grained captions. First, a tag-inferring encoder is proposed that uses the tags extracted by a scene-graph model to infer tags with deeper semantic information. Then, with the obtained deep tag information, a tag-guided decoder is proposed that includes short-term attention, which improves the features of words in the sentence, and gated cross-modal attention, which combines image, tag, and language features to produce informative semantic features. Finally, the word probability distribution at every position in the sequence is computed to generate the description of the image. Experiments demonstrate that the method combines tags to obtain precise captions and achieves competitive performance, with a 40.6% BLEU-4 score and a 135.3% CIDEr score on the MSCOCO dataset.

IET Computer Vision, Vol. 18, Issue 6, pp. 801–812. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12280
Citations: 0
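The gated cross-modal attention described above builds on standard scaled dot-product attention. As a rough illustration only (the paper adds gating and short-term attention on top; this toy function is not the authors' code), a minimal pure-Python version:

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention over toy Python lists.

    Illustrative building block only: the paper's tag-guided decoder layers
    gating and short-term attention on top of this standard mechanism."""
    d = len(query)
    # similarity of the query to each key, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # weighted sum of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# a query aligned with the first key pulls the output towards the first value
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

In a cross-modal setting the query would come from the language stream and the keys/values from image or tag features; the gate then decides how much of this attended feature to admit.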
Learnable fusion mechanisms for multimodal object detection in autonomous vehicles
IF 1.7 · CAS Tier 4 · Computer Science
IET Computer Vision · Pub Date: 2024-03-15 · DOI: 10.1049/cvi2.12259
Yahya Massoud, Robert Laganiere
Abstract: Perception systems in autonomous vehicles must accurately detect and classify objects in their surrounding environments. Numerous types of sensors are deployed on these vehicles, and combining such multimodal data streams can significantly boost performance. The authors introduce a novel sensor-fusion framework using deep convolutional neural networks that employs camera and LiDAR sensors in a multimodal, multiview configuration. Both data types are leveraged through two new fusion mechanisms: element-wise multiplication and multimodal factorised bilinear pooling. These methods improve the bird's-eye-view moderate average precision score by +4.97% and +8.35% on the KITTI dataset compared with traditional fusion operators such as element-wise addition and feature-map concatenation. An in-depth analysis of key design choices affecting performance, such as data augmentation, multi-task learning, and convolutional architecture design, is offered. The study aims to pave the way for more robust multimodal machine vision systems, and the paper concludes with qualitative results discussing both successful and problematic cases, along with potential ways to mitigate the latter.

IET Computer Vision, Vol. 18, Issue 4, pp. 499–511. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12259
Citations: 0
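The fusion operators compared in this entry are simple to state. A minimal sketch contrasting the traditional operators (element-wise addition, concatenation) with element-wise multiplication on toy 1-D feature vectors (illustrative only; the authors' framework applies these to learned feature maps inside a CNN):

```python
def fuse_add(cam, lidar):
    # traditional operator: element-wise addition of the two feature vectors
    return [c + l for c, l in zip(cam, lidar)]

def fuse_mul(cam, lidar):
    # element-wise (Hadamard) multiplication: one modality gates the other,
    # so a feature survives only where both sensors respond
    return [c * l for c, l in zip(cam, lidar)]

def fuse_concat(cam, lidar):
    # traditional operator: concatenation doubles the feature dimension
    return list(cam) + list(lidar)

cam = [0.5, 1.0, 0.0]    # toy camera features
lidar = [2.0, 0.5, 3.0]  # toy LiDAR features
# fuse_add -> [2.5, 1.5, 3.0]; fuse_mul -> [1.0, 0.5, 0.0]
```

Note how multiplication zeroes the third feature because the camera channel is silent there, which is one intuition for why multiplicative fusion can suppress single-modality noise.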
Attentional bias for hands: Cascade dual-decoder transformer for sign language production
IF 1.5 · CAS Tier 4 · Computer Science
IET Computer Vision · Pub Date: 2024-03-08 · DOI: 10.1049/cvi2.12273
Xiaohan Ma, Rize Jin, Jianming Wang, Tae-Sun Chung
Abstract: Sign Language Production (SLP) is the task of translating textual forms of spoken language into corresponding sign-language expressions. Sign languages convey meaning through multiple asynchronous articulators, including manual and non-manual information channels. Recent deep-learning-based SLP models generate the full articulatory sign sequence from text input in an end-to-end manner, but they largely down-weight subtle differences in manual articulation owing to the effect of regression to the mean. To explore these neglected aspects, an efficient cascade dual-decoder Transformer (CasDual-Transformer) for SLP is proposed that successively learns two mappings, SLP_hand: Text → Hand pose and SLP_sign: Text → Sign pose, using an attention-based alignment module that fuses the hand and sign features from previous time steps to predict a more expressive sign pose at the current time step. In addition, to provide more effective guidance, a novel spatio-temporal loss is introduced that penalises shape dissimilarity and temporal distortion in the produced sequences. Experiments on two benchmark sign-language datasets from distinct cultures verify the model: both quantitative and qualitative results show competitive performance compared with state-of-the-art models and, in some cases, considerable improvements over them.

IET Computer Vision, Vol. 18, Issue 5, pp. 696–708. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12273
Citations: 0
ASDNet: A robust involution-based architecture for diagnosis of autism spectrum disorder utilising eye-tracking technology
IF 1.5 · CAS Tier 4 · Computer Science
IET Computer Vision · Pub Date: 2024-02-12 · DOI: 10.1049/cvi2.12271
Nasirul Mumenin, Mohammad Abu Yousuf, Md Asif Nashiry, A. K. M. Azad, Salem A. Alyami, Pietro Lio', Mohammad Ali Moni
Abstract: Autism Spectrum Disorder (ASD) is a chronic condition characterised by impairments in social interaction and communication. Early detection of ASD is desirable, and there is demand for diagnostic aids to facilitate it. A lightweight Involutional Neural Network (INN) architecture is developed to diagnose ASD. The model follows a simpler architectural design and has fewer parameters than state-of-the-art (SOTA) image-classification models, requiring lower computational resources. It is trained to detect ASD from eye-tracking scanpath (SP), heatmap (HM), and fixation-map (FM) images. Monte Carlo Dropout is applied to the model to perform an uncertainty analysis and ensure the reliability of its output. Trained and evaluated on two publicly accessible datasets, the model achieves 98.12%, 96.83%, and 97.61% accuracy on SP, FM, and HM images respectively, outperforming current SOTA image-classification models and other existing work on this topic.

IET Computer Vision, Vol. 18, Issue 5, pp. 666–681. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12271
Citations: 0
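Monte Carlo Dropout, which this entry uses for uncertainty analysis, keeps dropout active at inference time and treats the spread of repeated stochastic forward passes as an uncertainty estimate. A toy sketch on a linear "model" (an illustrative assumption; the paper applies the technique to its INN, not to this stand-in):

```python
import random
import statistics

def mc_dropout_predict(x, weights, p=0.5, runs=200, seed=0):
    """Monte Carlo Dropout on a toy linear model (not the paper's INN).

    Dropout stays active at inference: each of `runs` forward passes zeroes
    every weight with probability p (rescaling survivors by 1/(1-p)), and
    the spread of the outputs serves as an uncertainty estimate."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(runs):
        # keep each weight with probability 1-p, rescale to preserve the mean
        y = sum(w * xi / (1.0 - p)
                for w, xi in zip(weights, x) if rng.random() >= p)
        outputs.append(y)
    return statistics.mean(outputs), statistics.stdev(outputs)

# the mean approximates the deterministic prediction; stdev is the uncertainty
mean, std = mc_dropout_predict([1.0, 2.0, 3.0], [0.2, -0.1, 0.4])
```

A high standard deviation across passes flags inputs on which the network is unsure, which is useful in a diagnostic setting where uncertain predictions should be deferred to a clinician.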
Context-aware relation enhancement and similarity reasoning for image-text retrieval
IF 1.5 · CAS Tier 4 · Computer Science
IET Computer Vision · Pub Date: 2024-01-30 · DOI: 10.1049/cvi2.12270
Zheng Cui, Yongli Hu, Yanfeng Sun, Baocai Yin
Abstract: Image-text retrieval is a fundamental yet challenging task that aims to bridge the semantic gap between heterogeneous data and measure semantic similarity precisely. Fine-grained alignment of cross-modal features plays a key role in many successful methods. Nevertheless, existing methods cannot effectively use intra-modal information to enhance feature representation, and they lack powerful similarity reasoning for producing a precise similarity score. To tackle these issues, a context-aware Relation Enhancement and Similarity Reasoning model, RESR, is proposed, which performs both intra-modal relation enhancement and inter-modal similarity reasoning while considering global-context information. For intra-modal relation enhancement, a novel context-aware graph convolutional network enhances local feature representations using relation and global-context information. For inter-modal similarity reasoning, local and global similarity features are exploited through bidirectional alignment of image and text, and similarity reasoning is performed over multi-granularity similarity features. Finally, refined local and global similarity features are adaptively fused into a precise similarity score. Experimental results show that the model outperforms several state-of-the-art approaches, achieving average R@sum improvements of 2.5% and 6.3% on the Flickr30K and MS-COCO datasets.

IET Computer Vision, Vol. 18, Issue 5, pp. 652–665. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12270
Citations: 0
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network
IF 1.5 · CAS Tier 4 · Computer Science
IET Computer Vision · Pub Date: 2024-01-24 · DOI: 10.1049/cvi2.12268
Tiancheng Zhao, Peng Liu, Kyusong Lee
Abstract: Advancing object detection (OD) in open-vocabulary and open-world scenarios is a critical challenge in computer vision. OmDet is introduced: a novel language-aware object-detection architecture with an innovative training mechanism that harnesses continual learning and multi-dataset vision-language pre-training. Leveraging natural language as a universal knowledge representation, OmDet accumulates "visual vocabularies" from diverse datasets, unifying the task as a language-conditioned detection framework. Its multimodal detection network (MDN) overcomes the challenges of multi-dataset joint training and generalises to numerous training datasets without manual merging of label taxonomies. The authors demonstrate superior performance over strong baselines in object detection in the wild, open-vocabulary detection, and phrase grounding, achieving state-of-the-art results. Ablation studies reveal the impact of scaling the pre-training visual vocabulary, indicating a promising direction for further expansion to larger datasets. The effectiveness of the deep fusion approach is underscored by its ability to learn jointly from multiple datasets, enhancing performance through knowledge sharing.

IET Computer Vision, Vol. 18, Issue 5, pp. 626–639. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12268
Citations: 0
SIANet: 3D object detection with structural information augment network
IF 1.5 · CAS Tier 4 · Computer Science
IET Computer Vision · Pub Date: 2024-01-23 · DOI: 10.1049/cvi2.12272
Jing Zhou, Tengxing Lin, Zixin Gong, Xinhan Huang
Abstract: 3D object detection from point clouds has been widely applied to automatic driving in recent years. In practice, the shape point clouds of some objects are incomplete because of occlusion or distance, so they carry insufficient structural information, which greatly degrades detection performance. To address this challenge, the authors design a Structural Information Augment (SIA) network for 3D object detection, named SIANet. Specifically, a SIA module reconstructs the complete shapes of objects within proposals to enhance their geometric features, which are then fused into each object's spatial feature for box refinement, yielding accurate detection boxes. In addition, a novel UNet-like Context-enhanced Transformer backbone stacks Context-enhanced Transformer modules and an upsampling branch to capture contextual information efficiently and generate high-quality proposals for the SIA module. Extensive experiments show that SIANet effectively improves detection performance, surpassing the baseline network by a 1.04% mean Average Precision (mAP) gain on the KITTI dataset and a 0.75% LEVEL_2 mAP gain on the Waymo dataset.

IET Computer Vision, Vol. 18, Issue 5, pp. 682–695. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12272
Citations: 0
Adversarial catoptric light: An effective, stealthy and robust physical-world attack to DNNs
IF 1.5 · CAS Tier 4 · Computer Science
IET Computer Vision · Pub Date: 2024-01-18 · DOI: 10.1049/cvi2.12264
Chengyin Hu, Weiwen Shi, Ling Tian, Wen Li
Abstract: Recent studies have demonstrated that finely tuned deep neural networks (DNNs) are susceptible to adversarial attacks. Conventional physical attacks employ stickers as perturbations, achieving robust adversarial effects but compromising stealthiness. Recent innovations use light beams, such as lasers and projectors, for perturbation generation, enabling stealthy physical attacks at the expense of robustness. To implement attacks that are both stealthy and robust, the authors present adversarial catoptric light (AdvCL), which leverages the natural phenomenon of catoptric light to generate perturbations that are both natural and stealthy. AdvCL first formalises the physical parameters of catoptric light and then optimises them with a genetic algorithm to derive the most adversarial perturbation; the perturbation is then deployed in the physical scene to execute stealthy and robust attacks. The method is evaluated along three dimensions: effectiveness, stealthiness, and robustness. Quantitative results in simulated environments demonstrate its efficacy, achieving an attack success rate of 83.5% and surpassing the baseline. Using common catoptric light as the perturbation enhances stealthiness, making physical samples appear more natural. Robustness is confirmed by successfully attacking advanced DNNs with a success rate exceeding 80% in all cases. Additionally, the authors discuss defence strategies against AdvCL and introduce some further light-based physical attacks.

IET Computer Vision, Vol. 18, Issue 5, pp. 557–573. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12264
Citations: 0
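The genetic-algorithm search over physical parameters can be sketched generically. The following toy optimiser is an illustration under stated assumptions only: parameters are normalised to [0, 1], and a stand-in fitness function replaces the real step of rendering the catoptric-light perturbation and querying the attacked DNN. It shows the selection/crossover/mutation loop, not the authors' implementation:

```python
import random

def genetic_optimise(fitness, n_params, pop_size=20, generations=30, seed=1):
    """Toy elitist genetic algorithm in the spirit of the paper's search.

    Candidates are parameter vectors in [0, 1] (an assumed normalisation);
    `fitness` stands in for measuring how adversarial a rendered
    perturbation is."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # elitism: keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params)    # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_params)         # Gaussian mutation, clamped
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# stand-in fitness peaking at (0.7, 0.7, 0.7); the GA should approach it
best = genetic_optimise(lambda v: -sum((x - 0.7) ** 2 for x in v), n_params=3)
```

Because the DNN is only queried through the fitness function, a search of this kind needs no gradients, which is what makes it practical for physical-world parameters such as light position and intensity.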
A novel multi-model 3D object detection framework with adaptive voxel-image feature fusion
IF 1.5 · CAS Tier 4 · Computer Science
IET Computer Vision · Pub Date: 2024-01-17 · DOI: 10.1049/cvi2.12269
Zhao Liu, Zhongliang Fu, Gang Li, Shengyuan Zhang
Abstract: The multifaceted nature of sensor data has long been a hurdle to harnessing its full potential for 3D object detection. Although using point clouds as input has yielded exceptional results, effectively combining the complementary properties of multi-sensor data remains a major challenge. This work presents a new approach to multi-model 3D object detection: adaptive voxel-image feature fusion (AVIFF), an end-to-end single-shot framework that dynamically and adaptively fuses point-cloud and image features, resulting in a more comprehensive and integrated analysis of camera and LiDAR sensor data. With the adaptive feature-fusion module, spatialised image features are fused with voxel-based point-cloud features, while the dense-fusion module preserves the distinctive characteristics of 3D point-cloud data through a heterogeneous architecture. Notably, the framework features a novel generalised intersection-over-union loss function that improves the perception of object localisation and rotation in 3D space. Comprehensive experiments validate the efficacy of the proposed modules, establishing AVIFF as a novel framework for 3D object detection.

IET Computer Vision, Vol. 18, Issue 5, pp. 640–651. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12269
Citations: 0
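Generalised IoU, the basis of the loss mentioned above, augments plain IoU with a penalty based on the smallest enclosing box, so the loss stays informative even when two boxes do not overlap. A 2-D axis-aligned simplification (the paper's loss applies to 3-D boxes with rotation; this sketch is not the authors' formulation):

```python
def giou(box_a, box_b):
    """Generalised IoU for axis-aligned 2-D boxes given as (x1, y1, x2, y2).

    GIoU = IoU - (area of C not covered by A or B) / area of C, where C is
    the smallest box enclosing both A and B. Disjoint boxes therefore still
    produce a gradient, unlike plain IoU, which is flat at zero."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection rectangle (clamped to zero when the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    # smallest enclosing box C
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (c_area - union) / c_area

# identical boxes score 1.0; disjoint boxes go negative (unlike plain IoU)
```

The corresponding loss is typically 1 - GIoU, minimised during box refinement.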