Computer Vision and Image Understanding: Latest Articles

Mandala simplification: Sacred symmetry meets minimalism
IF 4.3, CAS Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date: 2025-03-01 DOI: 10.1016/j.cviu.2025.104319
Tusita Sarkar, Preetam Chayan Chatterjee, Partha Bhowmick
{"title":"Mandala simplification: Sacred symmetry meets minimalism","authors":"Tusita Sarkar,&nbsp;Preetam Chayan Chatterjee,&nbsp;Partha Bhowmick","doi":"10.1016/j.cviu.2025.104319","DOIUrl":"10.1016/j.cviu.2025.104319","url":null,"abstract":"<div><div>Mandalas, intricate artistic designs with radial symmetry, are imbued with a timeless allure that transcends cultural boundaries. Found in various cultures and spiritual traditions worldwide, mandalas hold profound significance as symbols of unity, wholeness, and spiritual transformation. At the heart of mandalas lies the concept of sacred symmetry, a timeless principle that resonates with the deepest realms of human consciousness. However, in handcrafted mandalas, symmetry often falls short of perfection, necessitating refinement to evoke harmony and balance. With this in mind, we introduce a computational approach aimed at capturing the all-round symmetry of mandalas through minimalist principles. By leveraging innovative geometric and graph-theoretic tools and an interactive twin atlas, this approach streamlines parameter domains to achieve the revered state of sacred symmetry, epitomizing harmonious balance. This is especially beneficial when dealing with handcrafted mandalas of subpar quality, necessitating concise representations for tasks like mandala editing, recreation, atlas building, and referencing. Experimental findings and related results demonstrate the effectiveness of the proposed methodology.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"254 ","pages":"Article 104319"},"PeriodicalIF":4.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0

Navigating social contexts: A transformer approach to relationship recognition
IF 4.3, CAS Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date: 2025-03-01 DOI: 10.1016/j.cviu.2025.104327
Lorenzo Berlincioni, Luca Cultrera, Marco Bertini, Alberto Del Bimbo
{"title":"Navigating social contexts: A transformer approach to relationship recognition","authors":"Lorenzo Berlincioni,&nbsp;Luca Cultrera,&nbsp;Marco Bertini,&nbsp;Alberto Del Bimbo","doi":"10.1016/j.cviu.2025.104327","DOIUrl":"10.1016/j.cviu.2025.104327","url":null,"abstract":"<div><div>Recognizing interpersonal relationships is essential for enabling human–computer systems to understand and engage effectively with social contexts. Compared to other computer vision tasks, Interpersonal relation recognition requires an higher semantic understanding of the scene, ranging from large background context to finer clues. We propose a transformer based model that attends to each person pair relation in an image reaching state of the art performances on a classical benchmark dataset People in Social Context (PISC). Our solution differs from others as it makes no use of a separate GNN but relies instead on transformers alone. Additionally, we explore the impact of incorporating additional supervision from occupation labels on relationship recognition performance and we extensively ablate different architectural parameters and loss choices. Furthermore, we compare our model with a recent Large Multimodal Model (LMM) to precisely assess the zero-shot capabilities of such general models over highly specific tasks. Our study contributes to advancing the state of the art in social relationship recognition and highlights the potential of transformer-based models in capturing complex social dynamics from visual data.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"254 ","pages":"Article 104327"},"PeriodicalIF":4.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143519148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0

Brain tumor image segmentation based on shuffle transformer-dynamic convolution and inception dilated convolution
IF 4.3, CAS Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date: 2025-03-01 DOI: 10.1016/j.cviu.2025.104324
Lifang Zhou, Ya Wang
{"title":"Brain tumor image segmentation based on shuffle transformer-dynamic convolution and inception dilated convolution","authors":"Lifang Zhou ,&nbsp;Ya Wang","doi":"10.1016/j.cviu.2025.104324","DOIUrl":"10.1016/j.cviu.2025.104324","url":null,"abstract":"<div><div>Accurate segmentation of brain tumors is essential for accurate clinical diagnosis and effective treatment. Convolutional neural networks (CNNs) have improved brain tumor segmentation with their excellent performance in local feature modeling. However, they still face the challenge of unpredictable changes in tumor size and location, because it cannot be effectively matched by CNN-based methods with local and regular receptive fields. To overcome these obstacles, we propose brain tumor image segmentation based on shuffle transformer-dynamic convolution and inception dilated convolution that captures and adapts different features of tumors through multi-scale feature extraction. Our model combines Shuffle Transformer-Dynamic Convolution (STDC) to capture both fine-grained and contextual image details so that it helps improve localization accuracy. In addition, the Inception Dilated Convolution(IDConv) module solves the problem of significant changes in the size of brain tumors, and then captures the information of different size of object. The multi-scale feature aggregation(MSFA) module integrates features from different encoder levels, which contributes to enriching the scale diversity of input patches and enhancing the robustness of segmentation. The experimental results conducted on the BraTS 2019, BraTS 2020, BraTS 2021, and MSD BTS datasets indicate that our model outperforms other state-of-the-art methods in terms of accuracy.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"254 ","pages":"Article 104324"},"PeriodicalIF":4.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0

Efficient feature selection for pre-trained vision transformers
IF 4.3, CAS Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date: 2025-03-01 DOI: 10.1016/j.cviu.2025.104326
Lan Huang, Jia Zeng, Mengqiang Yu, Weiping Ding, Xingyu Bai, Kangping Wang
{"title":"Efficient feature selection for pre-trained vision transformers","authors":"Lan Huang ,&nbsp;Jia Zeng ,&nbsp;Mengqiang Yu ,&nbsp;Weiping Ding ,&nbsp;Xingyu Bai ,&nbsp;Kangping Wang","doi":"10.1016/j.cviu.2025.104326","DOIUrl":"10.1016/j.cviu.2025.104326","url":null,"abstract":"<div><div>Handcrafted layer-wise vision transformers have demonstrated remarkable performance in image classification. However, their high computational cost limits their practical applications. In this paper, we first identify and highlight the data-independent feature redundancy in pre-trained Vision Transformer (ViT) models. Based on this observation, we explore the feasibility of searching for the best substructure within the original pre-trained model. To this end, we propose EffiSelecViT, a novel pruning method aimed at reducing the computational cost of ViTs while preserving their accuracy. EffiSelecViT introduces importance scores for both self-attention heads and Multi-Layer Perceptron (MLP) neurons in pre-trained ViT models. L1 regularization is applied to constrain and learn these scores. In this simple way, components that are crucial for model performance are assigned higher scores, while those with lower scores are identified as less important and subsequently pruned. Experimental results demonstrate that EffiSelecViT can prune DeiT-B to retain only 64% of FLOPs while maintaining accuracy. This efficiency-accuracy trade-off is consistent across various ViT architectures. Furthermore, qualitative analysis reveals enhanced information expression in the pruned models, affirming the effectiveness and practicality of EffiSelecViT. The code is available at <span><span>https://github.com/ZJ6789/EffiSelecViT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"254 ","pages":"Article 104326"},"PeriodicalIF":4.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143549732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0

Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network
IF 4.3, CAS Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date: 2025-03-01 DOI: 10.1016/j.cviu.2025.104328
Xianyu Zhu, Guoqiang Xiao, Michael S. Lew, Song Wu
{"title":"Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network","authors":"Xianyu Zhu ,&nbsp;Guoqiang Xiao ,&nbsp;Michael S. Lew ,&nbsp;Song Wu","doi":"10.1016/j.cviu.2025.104328","DOIUrl":"10.1016/j.cviu.2025.104328","url":null,"abstract":"<div><div>Adapting statically-trained models to the incessant influx of data streams poses a pivotal research challenge. Concurrently, visible and infrared person re-identification (VI-ReID) offers an all-day surveillance mode to advance intelligent surveillance and elevate public safety precautions. Hence, we are pioneering a more fine-grained exploration of the lifelong VI-ReID task at the camera level, aiming to imbue the learned models with the capabilities of lifelong learning and memory within the continuous data streams. This task confronts dual challenges of cross-modality and cross-domain variations. Thus, in this paper, we proposed a Domain-Modality-Mix (DMM) based replay samples reconstruction strategy and Cross-domain Cognitive Network (CDCN) to address those challenges. Firstly, we establish an effective and expandable baseline model based on residual neural networks. Secondly, capitalizing on the unexploited potential knowledge of a memory bank that archives diverse replay samples, we enhance the anti-forgetting ability of our model by the Domain-Modality-Mix strategy, which devising a cross-domain, cross-modal image-level replay sample reconstruction, effectively alleviating catastrophic forgetting induced by modality and domain variations. Finally, guided by the Chunking Theory in cognitive psychology, we designed a Cross-domain Cognitive Network, which incorporates a camera-aware, expandable graph convolutional cognitive network to facilitate adaptive learning of intra-modal consistencies and cross-modal similarities within continuous cross-domain data streams. Extensive experiments demonstrate that our proposed method has remarkable adaptability and robust resistance to forgetting and outperforms multiple state-of-the-art methods in comparative assessments of the performance of LVI-ReID. The source code of our designed method is at <span><span>https://github.com/SWU-CS-MediaLab/DMM-CDCN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"254 ","pages":"Article 104328"},"PeriodicalIF":4.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143561781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0

Spatial and temporal beliefs for mistake detection in assembly tasks
IF 4.3, CAS Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date: 2025-03-01 DOI: 10.1016/j.cviu.2025.104338
Guodong Ding, Fadime Sener, Shugao Ma, Angela Yao
{"title":"Spatial and temporal beliefs for mistake detection in assembly tasks","authors":"Guodong Ding ,&nbsp;Fadime Sener ,&nbsp;Shugao Ma ,&nbsp;Angela Yao","doi":"10.1016/j.cviu.2025.104338","DOIUrl":"10.1016/j.cviu.2025.104338","url":null,"abstract":"<div><div>Assembly tasks, as an integral part of daily routines and activities, involve a series of sequential steps that are prone to error. This paper proposes a novel method for identifying ordering mistakes in assembly tasks based on knowledge-grounded beliefs. The beliefs comprise spatial and temporal aspects, each serving a unique role. Spatial beliefs capture the structural relationships among assembly components and indicate their topological feasibility. Temporal beliefs model the action preconditions and enforce sequencing constraints. Furthermore, we introduce a learning algorithm that dynamically updates and augments the belief sets online. To evaluate, we first test our approach in deducing predefined rules on synthetic data based on industry assembly. We also verify our approach on the real-world Assembly101 dataset, enhanced with annotations of component information. Our framework achieves superior performance in detecting ordering mistakes under both synthetic and real-world settings, highlighting the effectiveness of our approach.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"254 ","pages":"Article 104338"},"PeriodicalIF":4.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143579726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0

View-to-label: Multi-view consistency for self-supervised monocular 3D object detection
IF 4.3, CAS Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date: 2025-03-01 DOI: 10.1016/j.cviu.2025.104320
Issa Mouawad, Nikolas Brasch, Fabian Manhardt, Federico Tombari, Francesca Odone
{"title":"View-to-label: Multi-view consistency for self-supervised monocular 3D object detection","authors":"Issa Mouawad ,&nbsp;Nikolas Brasch ,&nbsp;Fabian Manhardt ,&nbsp;Federico Tombari ,&nbsp;Francesca Odone","doi":"10.1016/j.cviu.2025.104320","DOIUrl":"10.1016/j.cviu.2025.104320","url":null,"abstract":"<div><div>For autonomous vehicles, driving safely is highly dependent on the capability to correctly perceive the environment in the 3D space, hence the task of 3D object detection represents a fundamental aspect of perception. While 3D sensors deliver accurate metric perception, monocular approaches enjoy cost and availability advantages that are valuable in a wide range of applications. Unfortunately, training monocular methods requires a vast amount of annotated data. To compensate for this need, we propose a novel approach to self-supervise 3D object detection purely from RGB video sequences, leveraging geometric constraints and weak labels. Unlike other approaches that exploit additional sensors during training, <em>our method relies on the temporal continuity of video sequences.</em> A supervised pre-training on synthetic data produces initial plausible 3D boxes, then our geometric and photometrically grounded losses provide a strong self-supervision signal that allows the model to be fine-tuned on real data without labels.</div><div>Our experiments on Autonomous Driving benchmark datasets showcase the effectiveness and generality of our approach and the competitive performance compared to other self-supervised approaches.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"254 ","pages":"Article 104320"},"PeriodicalIF":4.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143519149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0

Incremental few-shot instance segmentation without fine-tuning on novel classes
IF 4.3, CAS Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date: 2025-03-01 DOI: 10.1016/j.cviu.2025.104323
Luofeng Zhang, Libo Weng, Yuanming Zhang, Fei Gao
{"title":"Incremental few-shot instance segmentation without fine-tuning on novel classes","authors":"Luofeng Zhang,&nbsp;Libo Weng,&nbsp;Yuanming Zhang,&nbsp;Fei Gao","doi":"10.1016/j.cviu.2025.104323","DOIUrl":"10.1016/j.cviu.2025.104323","url":null,"abstract":"<div><div>Many current incremental few-shot object detection and instance segmentation methods necessitate fine-tuning on novel classes, which presents difficulties when training newly emerged classes on devices with limited computational power. In this paper, a finetune-free incremental few-shot instance segmentation method is proposed. Firstly, a novel weight generator (NWG) is proposed to map the embeddings of novel classes to their respective true centers. Then, the limitations of cosine similarity on novel classes with few samples are analyzed, and a simple yet effective improvement called the piecewise function for similarity calculation (PFSC) is proposed. Lastly, a probability dependency method (PD) is designed to mitigate the impact on the performance of base classes after registering novel classes. The comparative experimental results show that the proposed model outperforms existing finetune-free methods much more on MS COCO and VOC datasets, and registration of novel classes has almost no negative impact on the base classes. Therefore, the model exhibits excellent performance and the proposed finetune-free idea enables it to learn novel classes directly through inference on devices with limited computational power.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"254 ","pages":"Article 104323"},"PeriodicalIF":4.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143519147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0

When super-resolution meets camouflaged object detection: A comparison study
IF 4.3, CAS Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date: 2025-02-21 DOI: 10.1016/j.cviu.2025.104321
Juan Wen, Shupeng Cheng, Weiyan Hou, Luc Van Gool, Radu Timofte
{"title":"When super-resolution meets camouflaged object detection: A comparison study","authors":"Juan Wen ,&nbsp;Shupeng Cheng ,&nbsp;Weiyan Hou ,&nbsp;Luc Van Gool ,&nbsp;Radu Timofte","doi":"10.1016/j.cviu.2025.104321","DOIUrl":"10.1016/j.cviu.2025.104321","url":null,"abstract":"<div><div>Super-resolution (SR) and camouflage object detection (COD) are two prominent topics in the field of computer vision, with various joint applications. However, in previous work, these two areas were often studied in isolation. In this paper, we conduct a comprehensive comparative evaluation of both for the first time. Specifically, we benchmark different super-resolution methods on commonly used COD datasets while also evaluating the robustness of different COD models using COD data processed by SR methods. Experiments reveal challenges in preserving semantic information due to differences in targets and features between the two domains. COD relies on extracting semantic information from low-resolution images to identify camouflage targets. There is a risk of losing or distorting important semantic details during the application of SR techniques. Balancing the enhancement of spatial resolution with the preservation of semantic information is crucial for maintaining the accuracy of COD algorithms. Therefore, we propose a new SR model called Dilated Super-resolution (DISR) to enhance SR performance on COD, achieving state-of-the-art results on five commonly used SR datasets. The Urban100 x4 dataset task improved by 0.38 dB. Using low-resolution images processed by DISR for COD tasks can enhance target visibility and significantly improve the performance of COD tasks. Our goal is to leverage the synergies between these two domains, draw insights from the complementarity of techniques in both fields, and provide insights and inspiration for future research in both communities.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"253 ","pages":"Article 104321"},"PeriodicalIF":4.3,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143479345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0

MultiFire20K: A semi-supervised enhanced large-scale UAV-based benchmark for advancing multi-task learning in fire monitoring
IF 4.3, CAS Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date: 2025-02-19 DOI: 10.1016/j.cviu.2025.104318
Demetris Shianios, Panayiotis Kolios, Christos Kyrkou
{"title":"MultiFire20K: A semi-supervised enhanced large-scale UAV-based benchmark for advancing multi-task learning in fire monitoring","authors":"Demetris Shianios,&nbsp;Panayiotis Kolios,&nbsp;Christos Kyrkou","doi":"10.1016/j.cviu.2025.104318","DOIUrl":"10.1016/j.cviu.2025.104318","url":null,"abstract":"<div><div>Effective fire detection and response are crucial to minimizing the widespread damage and loss caused by fires in both urban and natural environments. While advancements in Computer Vision have enhanced fire detection and response, progress in UAV-based monitoring remains limited due to the lack of comprehensive datasets. This study introduces the <em>MultiFire20K</em> dataset, comprising 20,500 diverse aerial fire images with annotations for fire classification, environment classification, and separate segmentation masks for both fire and smoke, specifically designed to support multi-task learning. Due to limited labeled data in remote sensing, a semi-supervised approach for generating pseudo-labels for fire and smoke masks is explored which takes into consideration the environment of the event. We experimented with various segmentation architectures backbone models to generate reliable pseudo-label masks. Benchmarks were established by evaluating models on fire classification, environment classification, and the segmentation of both fire and smoke, and comparing these results to those obtained from multi-task models. Our study highlights the substantial advantages of a multi-task approach in fire monitoring, particularly in improving fire and smoke segmentation through shared knowledge during training. This enhanced efficiency, combined with the conservation of memory and computational resources, makes the multi-task framework superior for real-time applications, especially when compared to using separate models for each individual task. We anticipate that our dataset and benchmark results will encourage further research in fire surveillance, advancing fire detection and prevention methods.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"254 ","pages":"Article 104318"},"PeriodicalIF":4.3,"publicationDate":"2025-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143487381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0