{"title":"Predicting Fire Heat Release Rate Using Deep Perceptual and Detail-Aware Hybrid Feature Fusion From Early Smoke Signals","authors":"Tianliang Liu, Jinkai Wang, Xu Zhou, Jun Wan, Xiaogang Cheng, Xiubin Dai","doi":"10.1049/cvi2.70054","DOIUrl":"https://doi.org/10.1049/cvi2.70054","url":null,"abstract":"<p>With urbanisation accelerating, predicting the heat release rate (HRR) of building fires using visual data has emerged as a pivotal research focus in the field of fire rescue. However, existing approaches face challenges, such as limited training data and complex models, which lead to suboptimal performance and slow inference speeds. To address these issues and adapt to the rapid morphological changes of smoke in dynamic fire environments, we propose a lightweight neural network prediction model based on adaptive pooling with channel information interaction (APCI). This model can achieve high precision while maintaining faster inference speed. Our approach employs simplified dense connections to propagate shallow smoke features, thereby effectively capturing the relationship between smoke textures and multiscale features to accommodate the variations of smoke morphologies. To mitigate the loss of smoke features caused by spatial misalignment and ventilation disturbances during downsampling, we introduce an adaptive weighted pooling mechanism that fully leverages the detailed information contained in the invoked smoke. Additionally, an enhanced channel shuffle operation in channel information interaction ensures effective cross-level transfer to detail-aware information exchange during sudden escalations in fire intensity in the hybrid feature fusion framework. Experiments on the smoke-heat release rate dataset we created demonstrate that the proposed method can achieve a coefficient of determination <span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <mfenced>\u0000 <msup>\u0000 <mi>R</mi>\u0000 <mn>2</mn>\u0000 </msup>\u0000 </mfenced>\u0000 </mrow>\u0000 <annotation> $left({R}^{2}right)$</annotation>\u0000 </semantics></math> of 0.937, a root mean square error (RMSE) of 23.0 kW, a mean absolute error (MAE) of 17.4 kW and with an inference time of 4.13 ms per image.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"20 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2026-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70054","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145904635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TaiChi-AQA: A Dataset and Framework for Action Quality Assessment and Visual Analysis","authors":"Dejin Wang, Fengyan Lin, Kexin Zhu, Zhide Chen","doi":"10.1049/cvi2.70053","DOIUrl":"https://doi.org/10.1049/cvi2.70053","url":null,"abstract":"<p>Action Quality Assessment (AQA) has become an advanced technology applied in various domains. However, most existing datasets focus on sports events, such as the Olympics, whereas datasets tailored for daily exercise activities remain scarce. Additionally, many of these datasets are unsuitable for direct application in AQA tasks. To address these limitations, we constructed a new AQA dataset, TaiChi-AQA, which includes detailed scoring annotations. Our dataset comprises 1313 Tai Chi action videos and features a comprehensive set of fine-grained labels, including action labels, action descriptions and frame-level perspective information. To validate the effectiveness of TaiChi-AQA, we systematically evaluated it using a variety of popular AQA methods. We also propose a straightforward yet effective module that integrates a multi-head attention mechanism with a gated multilayer perceptron (gMLP). This module is combined with the distributed autoencoder (DAE) framework. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the TaiChi-AQA dataset. The dataset are publicly available at https://github.com/mlxger/TaiChi-AQA.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"20 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70053","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145887814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prior Matters: Contribution- and Semantics-Aware Prior Estimation for Few-Shot Learning","authors":"Yanling Tian, Jiaying Wu, Jinglu Hu","doi":"10.1049/cvi2.70051","DOIUrl":"10.1049/cvi2.70051","url":null,"abstract":"<p>Few-shot learning (FSL) aims to classify novel categories using only a few labelled examples, which poses significant challenges for generalisation. Among existing approaches, distribution-based methods have shown promise by constructing class distributions for novel categories using statistical priors transferred from base classes. However, these methods often rely on nearest-neighbour visual similarity and assume equal contributions from selected base classes, which can lead to inaccurate priors. In this paper, we propose CAPE (contribution-aware prior estimation), a method that addresses this issue from two complementary perspectives. On the one hand, CAPE assigns adaptive weights to base class prototypes based on their relevance to the novel support set, mitigating the limitations of equal-contribution assumptions. On the other hand, to compensate for the ambiguity of visual features, especially in the 1-shot scenario, we incorporate semantic information from category labels to enhance prior selection. By jointly leveraging visual and semantic information, CAPE constructs more accurate and robust priors for the feature distributions of novel classes. Extensive experiments on four widely used FSL benchmarks, including <i>mini</i> ImageNet, tieredImageNet, CIFAR-FS and CUB datasets, demonstrate that our method consistently outperforms existing approaches, highlighting the effectiveness of contribution- and semantics-aware prior estimation.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70051","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145824488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"End-To-End Multiple Object Detection and Tracking With Spatio-Temporal Transformers","authors":"Qi Lei, Xiangyu Song, Shijie Sun, Huansheng Song, Lichen Liu, Zhaoyang Zhang","doi":"10.1049/cvi2.70052","DOIUrl":"10.1049/cvi2.70052","url":null,"abstract":"<p>Optimising both trajectory position information and identity information is a key challenge in multiple object tracking. Mainstream approaches ensure ID consistency by combining detection data with various additional information. However, many methods overlook the inherent spatio-temporal correlation of trajectory position information. We argue that additional modules are redundant, and that forecasting trajectories directly without the need for interframe association by utilising motion constraints is adequate. In this study, we introduce a novel end-to-end network called the spatio-temporal multiple object tracking with transformer (STMOTR), which employs motion constraints to establish binary matching within the reconstructed deformable-DETR network, heuristically learning object trajectories from the Video Swin backbone. This subtly constrained matching rule not only keeps the detection ID consistency but also significantly reduces the potential for tracking ID switch. We evaluated STMOTR on the UA-DETRAC and our proposed tunnel multiple object tracking dataset (T-MOT), achieving state-of-the-art performance with 39.8% PR-MOTA on the UA-DETRAC and 79.6% MOTA on the T-MOT. The source code is also available at https://github.com/Jade-Ray/STMOTR.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70052","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145824461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Lightweight Dual-Branch Meta-Learner for Few-Shot HSI Classification With Cross-Domain Adaptation","authors":"Junqi Yao, Yonghui Yang, Ou Yang, Qingtian Wu","doi":"10.1049/cvi2.70050","DOIUrl":"10.1049/cvi2.70050","url":null,"abstract":"<p>Hyperspectral imaging (HSI) plays a crucial role in urban area analysis from satellite data and supports the continuous advancement of intelligent cities. However, its practical deployment is hindered by two major challenges: the scarcity of reliable training annotations and the high spectral similarity among different land-cover classes. To address these issues, this paper introduces a novel meta-learning framework that synergistically combines knowledge transfer across domains with a dual-adjustment mode (comprising intracorrection (IC) and interalignment (IA)), while ensuring end-to-end trainability. Our contributions are twofold. (1) We refine the 3D attention network TGAN into TGAN2 (3D ghost attention network v2) by replacing the original ghost blocks with ghost-V2 modules and enlarging the receptive field to capture global context. (2) We propose a dual-adjustment mode (comprising intracorrection (IC) and interalignment (IA)) to generate robust class prototypes and mitigate domain shift. These innovations are integrated into our overarching framework, DMCM2 (dual-adjustment cross-domain meta-learning framework v2), which is unified by its end-to-end trainability and efficiency. The code and models will be publicly available at: https://github.com/YAO-JQ/DMCM2.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70050","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145739919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SwapDiffusion: Flexible Swapping Disentangled Content-Style Embeddings in \u0000 \u0000 \u0000 P\u0000 +\u0000 \u0000 $mathcal{P}+$\u0000 Space for Diffusion Models","authors":"Yongxing He, Zejian Li, Wei Li, Xinlong Zhang, Jia Wei, Yongchuan Tang","doi":"10.1049/cvi2.70048","DOIUrl":"10.1049/cvi2.70048","url":null,"abstract":"<p>This paper introduces SwapDiffusion, a novel framework for content-style disentanglement in diffusion-based image generation. We advance the understanding of the extended textual conditioning (<span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <mi>P</mi>\u0000 <mo>+</mo>\u0000 </mrow>\u0000 <annotation> $mathcal{P}+$</annotation>\u0000 </semantics></math>) space in SDXL by identifying the <span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <msup>\u0000 <mn>4</mn>\u0000 <mtext>th</mtext>\u0000 </msup>\u0000 </mrow>\u0000 <annotation> ${4}^{text{th}}$</annotation>\u0000 </semantics></math> and <span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <msup>\u0000 <mn>7</mn>\u0000 <mtext>th</mtext>\u0000 </msup>\u0000 </mrow>\u0000 <annotation> ${7}^{text{th}}$</annotation>\u0000 </semantics></math> transformer block layers as primarily responsible for content and style, respectively. Building on this insight, we introduce a novel q-transformer architecture. It features a block-diagonal matrix masked self-attention layer that effectively isolates content and style embeddings by reducing inter-query interference. This design not only enhances disentanglement but also improves training efficiency. Crucially, the learnt image embeddings align well with textual ones, enabling flexible content and style control via images, text or their combinations. SwapDiffusion supports diverse applications such as style transfer (image- or text-driven), image variation, stylised text-to-image generation and multimodal-prompted image synthesis. Experimental results demonstrate that by aligning learnt image embeddings with the U-Net's pre-identified functional layers for content and style, SwapDiffusion achieves superior content-style separation and image quality while offering greater adaptability than existing approaches. The implementation code and pre-trained models will be released at https://github.com/lioo717/SwapDiffusion.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70048","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145739422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}