Latest Articles from IET Computer Vision

Predicting Fire Heat Release Rate Using Deep Perceptual and Detail-Aware Hybrid Feature Fusion From Early Smoke Signals
IF 1.3 | Q4 (Computer Science)
IET Computer Vision Pub Date: 2026-01-04 DOI: 10.1049/cvi2.70054
Tianliang Liu, Jinkai Wang, Xu Zhou, Jun Wan, Xiaogang Cheng, Xiubin Dai
Abstract: With urbanisation accelerating, predicting the heat release rate (HRR) of building fires using visual data has emerged as a pivotal research focus in the field of fire rescue. However, existing approaches face challenges such as limited training data and complex models, which lead to suboptimal performance and slow inference. To address these issues and adapt to the rapid morphological changes of smoke in dynamic fire environments, we propose a lightweight neural network prediction model based on adaptive pooling with channel information interaction (APCI). The model achieves high precision while maintaining fast inference. Our approach employs simplified dense connections to propagate shallow smoke features, effectively capturing the relationship between smoke textures and multiscale features to accommodate variations in smoke morphology. To mitigate the loss of smoke features caused by spatial misalignment and ventilation disturbances during downsampling, we introduce an adaptive weighted pooling mechanism that fully leverages the detailed information contained in the smoke. Additionally, an enhanced channel shuffle operation for channel information interaction ensures effective cross-level, detail-aware information exchange during sudden escalations in fire intensity within the hybrid feature fusion framework. Experiments on the smoke-heat release rate dataset we created demonstrate that the proposed method achieves a coefficient of determination (R²) of 0.937, a root mean square error (RMSE) of 23.0 kW, a mean absolute error (MAE) of 17.4 kW and an inference time of 4.13 ms per image.
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70054
Citations: 0
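The reported figures are the standard regression metrics. As a reference point, here is a minimal NumPy sketch of how R², RMSE and MAE are computed; the toy arrays below are hypothetical, not values from the paper:

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Coefficient of determination, root mean square error and mean absolute error."""
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(residuals ** 2))),  # in kW for HRR
        "MAE": float(np.mean(np.abs(residuals))),          # in kW for HRR
    }

# Hypothetical HRR values (kW), purely for illustration
y_true = np.array([120.0, 250.0, 310.0, 90.0])
y_pred = np.array([130.0, 240.0, 300.0, 110.0])
print(regression_metrics(y_true, y_pred))
```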
TaiChi-AQA: A Dataset and Framework for Action Quality Assessment and Visual Analysis
IF 1.3 | Q4 (Computer Science)
IET Computer Vision Pub Date: 2025-12-29 DOI: 10.1049/cvi2.70053
Dejin Wang, Fengyan Lin, Kexin Zhu, Zhide Chen
Abstract: Action Quality Assessment (AQA) has become an advanced technology applied in various domains. However, most existing datasets focus on sports events, such as the Olympics, whereas datasets tailored for daily exercise activities remain scarce. Additionally, many of these datasets are unsuitable for direct application to AQA tasks. To address these limitations, we constructed a new AQA dataset, TaiChi-AQA, which includes detailed scoring annotations. The dataset comprises 1313 Tai Chi action videos and features a comprehensive set of fine-grained labels, including action labels, action descriptions and frame-level perspective information. To validate the effectiveness of TaiChi-AQA, we systematically evaluated it using a variety of popular AQA methods. We also propose a straightforward yet effective module that integrates a multi-head attention mechanism with a gated multilayer perceptron (gMLP). This module is combined with the distributed autoencoder (DAE) framework. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the TaiChi-AQA dataset. The dataset is publicly available at https://github.com/mlxger/TaiChi-AQA.
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70053
Citations: 0
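The paper pairs a multi-head attention mechanism with a gMLP inside the DAE framework; its exact design is not reproduced here. The following is only an illustrative PyTorch sketch of one way such a block could be wired, with module names, dimensions and the gating form chosen for the example:

```python
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Simplified gated MLP: one projection branch gates the other."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj_in = nn.Linear(dim, hidden * 2)
        self.proj_out = nn.Linear(hidden, dim)

    def forward(self, x):
        u, v = self.proj_in(self.norm(x)).chunk(2, dim=-1)
        return self.proj_out(u * torch.sigmoid(v)) + x  # gated fusion + residual

class AttnGMLPBlock(nn.Module):
    """Multi-head self-attention followed by a gated MLP, both with residuals."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gmlp = GatedMLP(dim, hidden=dim * 2)

    def forward(self, x):              # x: (batch, frames, dim) clip features
        h = self.norm(x)
        a, _ = self.attn(h, h, h)
        return self.gmlp(x + a)

feats = torch.randn(2, 16, 256)        # 2 clips, 16 temporal tokens each
print(AttnGMLPBlock()(feats).shape)    # torch.Size([2, 16, 256])
```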
Prior Matters: Contribution- and Semantics-Aware Prior Estimation for Few-Shot Learning
IF 1.3 | Q4 (Computer Science)
IET Computer Vision Pub Date: 2025-12-14 DOI: 10.1049/cvi2.70051
Yanling Tian, Jiaying Wu, Jinglu Hu
Abstract: Few-shot learning (FSL) aims to classify novel categories using only a few labelled examples, which poses significant challenges for generalisation. Among existing approaches, distribution-based methods have shown promise by constructing class distributions for novel categories using statistical priors transferred from base classes. However, these methods often rely on nearest-neighbour visual similarity and assume equal contributions from the selected base classes, which can lead to inaccurate priors. In this paper, we propose CAPE (contribution-aware prior estimation), a method that addresses this issue from two complementary perspectives. On the one hand, CAPE assigns adaptive weights to base class prototypes based on their relevance to the novel support set, mitigating the limitations of the equal-contribution assumption. On the other hand, to compensate for the ambiguity of visual features, especially in the 1-shot scenario, we incorporate semantic information from category labels to enhance prior selection. By jointly leveraging visual and semantic information, CAPE constructs more accurate and robust priors for the feature distributions of novel classes. Extensive experiments on four widely used FSL benchmarks (miniImageNet, tieredImageNet, CIFAR-FS and CUB) demonstrate that our method consistently outperforms existing approaches, highlighting the effectiveness of contribution- and semantics-aware prior estimation.
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70051
Citations: 0
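CAPE's central idea, weighting base-class priors by their relevance to the novel support set instead of assuming equal contributions, can be illustrated with a small PyTorch sketch. The weighting scheme below (a softmax over summed visual and semantic cosine similarities) is an assumption made for illustration, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def weighted_class_prior(support_feat, base_prototypes, base_label_emb,
                         novel_label_emb, tau: float = 10.0):
    """Estimate a prior mean for a novel class as a weighted mix of base-class
    prototypes, weighting each base class by visual + semantic relevance."""
    visual_sim = F.cosine_similarity(support_feat.mean(0, keepdim=True),
                                     base_prototypes)              # (num_base,)
    semantic_sim = F.cosine_similarity(novel_label_emb.unsqueeze(0),
                                       base_label_emb)             # (num_base,)
    weights = torch.softmax(tau * (visual_sim + semantic_sim), dim=0)
    return (weights.unsqueeze(1) * base_prototypes).sum(0)         # prior mean

support_feat = torch.randn(1, 64)          # 1-shot support feature (illustrative)
base_prototypes = torch.randn(60, 64)      # 60 base-class prototypes
base_label_emb = torch.randn(60, 64)       # label embeddings of base classes
novel_label_emb = torch.randn(64)          # label embedding of the novel class
prior = weighted_class_prior(support_feat, base_prototypes,
                             base_label_emb, novel_label_emb)
print(prior.shape)                         # torch.Size([64])
```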
End-To-End Multiple Object Detection and Tracking With Spatio-Temporal Transformers
IF 1.3 | Q4 (Computer Science)
IET Computer Vision Pub Date: 2025-12-14 DOI: 10.1049/cvi2.70052
Qi Lei, Xiangyu Song, Shijie Sun, Huansheng Song, Lichen Liu, Zhaoyang Zhang
Abstract: Optimising both trajectory position information and identity information is a key challenge in multiple object tracking. Mainstream approaches ensure ID consistency by combining detection data with various kinds of additional information. However, many methods overlook the inherent spatio-temporal correlation of trajectory position information. We argue that additional modules are redundant and that forecasting trajectories directly, without inter-frame association, by exploiting motion constraints is sufficient. In this study, we introduce a novel end-to-end network, spatio-temporal multiple object tracking with transformers (STMOTR), which employs motion constraints to establish bipartite matching within a reconstructed Deformable-DETR network, heuristically learning object trajectories from a Video Swin backbone. This subtly constrained matching rule not only keeps detection IDs consistent but also significantly reduces the potential for tracking ID switches. We evaluated STMOTR on UA-DETRAC and our proposed tunnel multiple object tracking dataset (T-MOT), achieving state-of-the-art performance with 39.8% PR-MOTA on UA-DETRAC and 79.6% MOTA on T-MOT. The source code is available at https://github.com/Jade-Ray/STMOTR.
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70052
Citations: 0
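STMOTR's motion-constrained matching is DETR-style one-to-one assignment; the exact cost terms are not given here. Below is a minimal SciPy sketch of bipartite matching between predicted and ground-truth boxes, with an illustrative IoU-plus-centre-distance cost standing in for the paper's motion constraint:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU between two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match_tracks(pred_boxes, gt_boxes, motion_weight=0.5):
    """One-to-one (Hungarian) matching with an IoU + centre-distance cost.
    The cost definition is illustrative, not the paper's exact formulation."""
    cost = np.zeros((len(pred_boxes), len(gt_boxes)))
    for i, p in enumerate(pred_boxes):
        for j, g in enumerate(gt_boxes):
            centre_dist = np.linalg.norm(
                (np.array(p[:2]) + p[2:]) / 2 - (np.array(g[:2]) + g[2:]) / 2)
            cost[i, j] = (1.0 - iou(p, g)) + motion_weight * centre_dist / 100.0
    return linear_sum_assignment(cost)     # (pred indices, gt indices)

pred = [[10, 10, 50, 60], [200, 80, 260, 150]]
gt = [[205, 85, 258, 148], [12, 8, 52, 58]]
print(match_tracks(pred, gt))              # (array([0, 1]), array([1, 0]))
```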
A Lightweight Dual-Branch Meta-Learner for Few-Shot HSI Classification With Cross-Domain Adaptation
IF 1.3 | Q4 (Computer Science)
IET Computer Vision Pub Date: 2025-12-12 DOI: 10.1049/cvi2.70050
Junqi Yao, Yonghui Yang, Ou Yang, Qingtian Wu
Abstract: Hyperspectral imaging (HSI) plays a crucial role in urban area analysis from satellite data and supports the continuous advancement of intelligent cities. However, its practical deployment is hindered by two major challenges: the scarcity of reliable training annotations and the high spectral similarity among different land-cover classes. To address these issues, this paper introduces a novel meta-learning framework that combines cross-domain knowledge transfer with a dual-adjustment mode comprising intracorrection (IC) and interalignment (IA), while remaining end-to-end trainable. Our contributions are twofold. (1) We refine the 3D attention network TGAN into TGAN2 (3D ghost attention network v2) by replacing the original ghost blocks with ghost-V2 modules and enlarging the receptive field to capture global context. (2) We propose the dual-adjustment mode (IC and IA) to generate robust class prototypes and mitigate domain shift. These innovations are integrated into our overarching framework, DMCM2 (dual-adjustment cross-domain meta-learning framework v2), which is unified by its end-to-end trainability and efficiency. The code and models will be publicly available at https://github.com/YAO-JQ/DMCM2.
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70050
Citations: 0
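The abstract does not detail the intracorrection and interalignment steps, so the sketch below is only one plausible reading: IC refines episode prototypes with soft-assigned query features, and IA performs a first-order (mean-shift) alignment between source- and target-domain features. Both function bodies are assumptions made for illustration:

```python
import torch
import torch.nn.functional as F

def intracorrection(prototypes, query_feats, temperature=5.0):
    """Refine each class prototype with soft-assigned query features
    (one possible reading of 'intracorrection'; purely illustrative)."""
    sim = temperature * F.cosine_similarity(
        query_feats.unsqueeze(1), prototypes.unsqueeze(0), dim=-1)   # (Q, C)
    assign = torch.softmax(sim, dim=1)                               # soft labels
    refined = assign.t() @ query_feats / (assign.sum(0).unsqueeze(1) + 1e-6)
    return 0.5 * prototypes + 0.5 * refined

def interalignment(source_feats, target_feats):
    """First-order alignment: shift target features so their mean matches
    the source mean (a simple stand-in for cross-domain alignment)."""
    return target_feats - target_feats.mean(0) + source_feats.mean(0)

protos = torch.randn(5, 128)     # 5-way episode prototypes
queries = torch.randn(75, 128)   # query features of the episode
print(intracorrection(protos, queries).shape)            # torch.Size([5, 128])
print(interalignment(torch.randn(100, 128), queries).shape)  # torch.Size([75, 128])
```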
SwapDiffusion: Flexible Swapping Disentangled Content-Style Embeddings in P+ Space for Diffusion Models
IF 1.3 | Q4 (Computer Science)
IET Computer Vision Pub Date: 2025-12-11 DOI: 10.1049/cvi2.70048
Yongxing He, Zejian Li, Wei Li, Xinlong Zhang, Jia Wei, Yongchuan Tang
Abstract: This paper introduces SwapDiffusion, a novel framework for content-style disentanglement in diffusion-based image generation. We advance the understanding of the extended textual conditioning (P+) space in SDXL by identifying the 4th and 7th transformer-block layers as primarily responsible for content and style, respectively. Building on this insight, we introduce a novel q-transformer architecture. It features a self-attention layer with a block-diagonal attention mask that effectively isolates content and style embeddings by reducing inter-query interference. This design not only enhances disentanglement but also improves training efficiency. Crucially, the learnt image embeddings align well with textual ones, enabling flexible content and style control via images, text or their combinations. SwapDiffusion supports diverse applications such as style transfer (image- or text-driven), image variation, stylised text-to-image generation and multimodal-prompted image synthesis. Experimental results demonstrate that, by aligning learnt image embeddings with the U-Net's pre-identified functional layers for content and style, SwapDiffusion achieves superior content-style separation and image quality while offering greater adaptability than existing approaches. The implementation code and pre-trained models will be released at https://github.com/lioo717/SwapDiffusion.
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70048
Citations: 0
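The q-transformer itself is not reproduced here, but the core mechanism named in the abstract, a block-diagonal self-attention mask that keeps content queries and style queries from attending to each other, is easy to sketch in PyTorch. Token counts and the embedding dimension below are placeholders:

```python
import torch
import torch.nn as nn

def block_diagonal_mask(n_content: int, n_style: int) -> torch.Tensor:
    """Boolean attention mask (True = blocked): content queries attend only to
    content tokens, style queries only to style tokens."""
    n = n_content + n_style
    mask = torch.ones(n, n, dtype=torch.bool)
    mask[:n_content, :n_content] = False     # content block
    mask[n_content:, n_content:] = False     # style block
    return mask

dim, n_content, n_style = 256, 4, 4
attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
tokens = torch.randn(2, n_content + n_style, dim)   # content + style query tokens
out, _ = attn(tokens, tokens, tokens,
              attn_mask=block_diagonal_mask(n_content, n_style))
print(out.shape)                                     # torch.Size([2, 8, 256])
```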
Refining Vision-Based Video Captioning via Object Semantic Prior
IF 1.3 | Q4 (Computer Science)
IET Computer Vision Pub Date: 2025-12-07 DOI: 10.1049/cvi2.70049
Wei-Teng Xu, Hong-Bo Zhang, Qing Lei, Jing-Hua Liu, Ji-Xiang Du
Abstract: This paper presents a novel video captioning method guided by object semantic priors, aimed at improving the performance of vision-based video captioning models. The proposed approach leverages an object detection model to extract semantic representations of the objects within image sequences, which serve as prior information to enhance the visual features of the video. During the encoding stage, this prior information is integrated with the video content, enabling a more comprehensive understanding of the visual context. In the decoding stage, the prior information guides the generation of more accurate and contextually appropriate captions. Extensive experiments on the MSVD and MSR-VTT datasets show that the proposed method significantly outperforms existing vision-based video captioning approaches in terms of caption accuracy and relevance. The results validate the effectiveness of incorporating object semantic priors into vision-based models for generating high-quality video captions.
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70049
Citations: 0
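How an object semantic prior can be injected into the visual stream is sketched below: detected object class indices are embedded, pooled and fused with frame features before encoding. This is a generic illustration under assumed dimensions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class ObjectPriorFusion(nn.Module):
    """Fuse pooled object-label embeddings (the 'semantic prior') with frame
    features before the caption encoder; a generic sketch only."""
    def __init__(self, visual_dim=1024, obj_vocab=80, obj_dim=256, out_dim=512):
        super().__init__()
        self.obj_embed = nn.Embedding(obj_vocab, obj_dim)
        self.fuse = nn.Linear(visual_dim + obj_dim, out_dim)

    def forward(self, frame_feats, obj_ids):
        # frame_feats: (B, T, visual_dim); obj_ids: (B, K) detected class indices
        prior = self.obj_embed(obj_ids).mean(dim=1)                 # (B, obj_dim)
        prior = prior.unsqueeze(1).expand(-1, frame_feats.size(1), -1)
        return self.fuse(torch.cat([frame_feats, prior], dim=-1))

frames = torch.randn(2, 20, 1024)                  # 20 sampled frames per video
objects = torch.randint(0, 80, (2, 5))             # 5 detected objects per video
print(ObjectPriorFusion()(frames, objects).shape)  # torch.Size([2, 20, 512])
```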
Facial Forgery Detection Based on Mask and Frequency Diffusion Reconstruction
IF 1.3 | Q4 (Computer Science)
IET Computer Vision Pub Date: 2025-11-22 DOI: 10.1049/cvi2.70046
Yanhan Peng, Xin Liu, Fengbiao Zan, Jian Yu
Abstract: Face forgery detection continues to face significant challenges in generalisation, and the rapid advancement of generative models, particularly diffusion models, has further intensified the problem. To tackle these challenges, we propose a novel detection framework that integrates spatial central masking and frequency-enriched diffusion reconstruction (MFDR), enhancing both local detail reconstruction accuracy and global structural recovery. Specifically, during data preprocessing we apply central masking and reconstruct the original image. The detector learns pixel-level discrepancies between the reconstructed masked regions and the corresponding original regions, which improves sensitivity to reconstruction errors and guides the model to focus more effectively on localised artefact detection. At the frequency-domain level, the proposed frequency-enhanced diffusion module explicitly optimises residual reconstruction in both the low- and high-frequency subbands, improving global structural recovery and preserving high-frequency detail fidelity; this, in turn, strengthens the model's capacity to capture forgery traces. Furthermore, during training we introduce a contrastive learning strategy in which real images processed through masked diffusion and frequency reconstruction are used as positive samples. This design enables the detector to jointly perceive spatial reconstruction errors and preserve frequency-domain texture fidelity, significantly enhancing its ability to detect subtle forgery artefacts. Experimental results show that our method achieves superior performance in detecting face images generated by various diffusion models (e.g., DDPM, LDM) and surpasses the diffusion reconstruction contrastive training (DRCT) baseline.
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70046
Citations: 0
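Two of the ingredients named above, spatial central masking and a low-/high-frequency subband split, can be sketched directly in PyTorch. The mask ratio, the square low-pass window and the cutoff are illustrative choices, not the paper's settings:

```python
import torch

def central_mask(images: torch.Tensor, ratio: float = 0.5) -> torch.Tensor:
    """Zero out a central square covering `ratio` of each side (spatial masking)."""
    b, c, h, w = images.shape
    mh, mw = int(h * ratio), int(w * ratio)
    top, left = (h - mh) // 2, (w - mw) // 2
    masked = images.clone()
    masked[:, :, top:top + mh, left:left + mw] = 0.0
    return masked

def frequency_split(images: torch.Tensor, cutoff: int = 16):
    """Split images into low-/high-frequency components with a square low-pass
    window in the centred 2D FFT spectrum (illustrative subband split)."""
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    h, w = images.shape[-2:]
    lp = torch.zeros_like(spec)
    lp[..., h // 2 - cutoff:h // 2 + cutoff, w // 2 - cutoff:w // 2 + cutoff] = 1
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * lp, dim=(-2, -1))).real
    return low, images - low

imgs = torch.rand(4, 3, 224, 224)
low, high = frequency_split(central_mask(imgs))
print(low.shape, high.shape)   # torch.Size([4, 3, 224, 224]) for both
```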
Robust 2D/3D Alignment With Enhanced NeRF 3D Reconstruction and Causal Feature Fusion
IF 1.3 | Q4 (Computer Science)
IET Computer Vision Pub Date: 2025-11-19 DOI: 10.1049/cvi2.70045
Jie Lin, Yi Bai, Yupei Deng, Bing Hu, Lifan Zhang
Abstract: This paper proposes a unified framework integrating enhanced neural radiance fields (NeRF) with causal feature fusion to tackle 3D reconstruction and 2D/3D alignment challenges in complex scenes. In 3D reconstruction, explicit representations yield low reconstruction quality whereas implicit ones reconstruct slowly; 2D/3D matching lacks effective information fusion; and existing 3D reconstruction methods fail to provide complementary information for alignment, while relying solely on 2D alignment is susceptible to background interference. To improve 2D/3D alignment accuracy, we propose a holistic alignment architecture that includes a combined implicit-explicit 3D reconstruction method capable of constructing higher-quality 3D scenes and, crucially, generating richer features, such as voxel density and colour information, to provide complementary cues for background robustness. Meanwhile, we construct 2D causal features, use them for fusion and achieve more robust alignment through multidimensional anti-interference feature computation. Extensive experiments validate our framework on both public benchmarks and specialised domains. In medical endoscopy, the system assists surgeons by providing real-time 3D contextual guidance, reducing procedural risks. Quantitative results show superior performance over state-of-the-art methods. The proposed technology demonstrates broad applicability in scenarios demanding robust scene understanding.
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70045
Citations: 0
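The voxel density and colour features mentioned above come from the NeRF representation. For context, the standard NeRF volume-rendering rule that turns per-sample densities and colours into a rendered ray colour is sketched below (the standard formulation, not the paper's enhanced variant):

```python
import torch

def volume_render(densities, colours, deltas):
    """Standard NeRF alpha compositing along one ray: densities (N,),
    colours (N, 3), deltas (N,) are sample spacings. Returns the rendered RGB."""
    alpha = 1.0 - torch.exp(-densities * deltas)                  # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]  # transmittance
    weights = alpha * trans                                       # compositing weights
    return (weights.unsqueeze(-1) * colours).sum(dim=0)

sigma = torch.rand(64)           # sampled volume densities along one ray
rgb = torch.rand(64, 3)          # predicted colours at the samples
delta = torch.full((64,), 0.02)  # spacing between consecutive samples
print(volume_render(sigma, rgb, delta))   # rendered pixel colour, shape (3,)
```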
CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction
IF 1.3 | Q4 (Computer Science)
IET Computer Vision Pub Date: 2025-11-18 DOI: 10.1049/cvi2.70047
Wenhao Lan, Yijun Yang, Haihua Shen, Shan Li
Abstract: The increasing adoption of 3D point cloud data in applications such as autonomous vehicles, robotics and virtual reality has brought significant advances in object recognition and scene understanding. This progress, however, is accompanied by new security challenges, particularly backdoor attacks, which insert malicious information into the training data of machine learning models and can compromise model behaviour. In this paper, we propose CloudFort, a novel defence mechanism designed to enhance the robustness of 3D point cloud classifiers against backdoor attacks. CloudFort leverages spatial partitioning and ensemble prediction to mitigate the impact of backdoor triggers while preserving the model's performance on clean data. We evaluate the effectiveness of CloudFort through extensive experiments, demonstrating strong resilience against the point cloud backdoor attack (PCBA). Our results show that CloudFort significantly enhances the security of 3D point cloud classification models without compromising their accuracy on benign samples. We also explore the limitations of CloudFort and discuss potential avenues for future research in 3D point cloud security. The proposed defence mechanism represents a significant step towards ensuring the trustworthiness and reliability of point-cloud-based systems in real-world applications.
Open access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70047
Citations: 0
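The defence's two named components, spatial partitioning and ensemble prediction, are sketched below under one possible reading: the cloud is split into octants around its centroid, the classifier is run on copies of the cloud with each octant removed, and the final label is a majority vote. CloudFort's actual partitioning and voting scheme may differ; the classifier here is a toy stand-in:

```python
import numpy as np
from collections import Counter

def octant_partitions(points: np.ndarray):
    """Split a point cloud (N, 3) into 8 octants around its centroid and return,
    for each octant, the cloud with that octant removed."""
    centre = points.mean(axis=0)
    codes = ((points > centre) * np.array([1, 2, 4])).sum(axis=1)  # octant id 0..7
    return [points[codes != k] for k in range(8)]

def ensemble_predict(points, classify):
    """Majority vote over per-partition predictions; `classify` is any
    point-cloud classifier callable (a stand-in, not CloudFort's actual model)."""
    votes = [classify(part) for part in octant_partitions(points)]
    return Counter(votes).most_common(1)[0][0]

cloud = np.random.rand(1024, 3)
dummy_classifier = lambda pts: int(pts[:, 2].mean() > 0.5)   # toy classifier
print(ensemble_predict(cloud, dummy_classifier))
```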