Image and Vision Computing: Latest Articles

GDM-depth: Leveraging global dependency modelling for self-supervised indoor depth estimation
IF 4.2, Tier 3, Computer Science
Image and Vision Computing Pub Date : 2024-07-06 DOI: 10.1016/j.imavis.2024.105160
Chen Lv , Chenggong Han , Jochen Lang , He Jiang , Deqiang Cheng , Jiansheng Qian
{"title":"GDM-depth: Leveraging global dependency modelling for self-supervised indoor depth estimation","authors":"Chen Lv ,&nbsp;Chenggong Han ,&nbsp;Jochen Lang ,&nbsp;He Jiang ,&nbsp;Deqiang Cheng ,&nbsp;Jiansheng Qian","doi":"10.1016/j.imavis.2024.105160","DOIUrl":"https://doi.org/10.1016/j.imavis.2024.105160","url":null,"abstract":"<div><p>Self-supervised depth estimation algorithms eschew depth ground truth and employ the convolutional U-Net with a fixed receptive field which confines its focus primarily to nearby spatial distances. These factors obscure adequate supervision during image reconstruction, consequently hindering accurate depth estimation, particularly in complex indoor scenes. The pure transformer framework can perform global modelling to provide more semantic information. However, the cost is significant. To tackle these challenges, we introduce GDM-Depth, which utilizes global dependency modelling to offer more precise depth guidance from the network itself. Initially, we propose integrating learnable tree filters with unary terms, leveraging the structural properties of spanning trees to facilitate efficient long-range interactions. Subsequently, instead of replacing the convolutional framework entirely, we employ the transformer to design a scale-aware global feature extractor, establishing global relationships among local features at various scales, achieving both efficiency and cost-effectiveness. Furthermore, inter-class disparities between depth global and local features are observed. To address this issue, we introduce the global feature injector to further enhance the representation. GDM-Depth's effectiveness is demonstrated on the NYUv2, ScanNet, and InteriorNet depth datasets, achieving impressive test set performances of 87.2%, 83.1%, and 76.1% in key indicators <span><math><mi>δ</mi><mo>&lt;</mo><mn>0.125</mn></math></span>, respectively.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141605867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
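The headline numbers above are threshold-accuracy scores: a prediction counts as correct at a pixel when the ratio between predicted and ground-truth depth stays within a threshold. A minimal NumPy sketch of the conventional form of this indicator (the threshold is left as a parameter, commonly 1.25; this is illustrative and not taken from the GDM-Depth code):

```python
import numpy as np

def delta_accuracy(pred: np.ndarray, gt: np.ndarray, threshold: float = 1.25) -> float:
    """Fraction of valid pixels whose ratio max(pred/gt, gt/pred) is below `threshold`."""
    valid = gt > 0                                   # ignore pixels without ground truth
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float((ratio < threshold).mean())

# Example: two 4x4 depth maps in metres
gt = np.full((4, 4), 2.0)
pred = np.full((4, 4), 2.3)
print(delta_accuracy(pred, gt))   # 1.0, since 2.3 / 2.0 = 1.15 < 1.25
```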
Hybrid attention transformer with re-parameterized large kernel convolution for image super-resolution
IF 4.2, Tier 3, Computer Science
Image and Vision Computing Pub Date : 2024-07-05 DOI: 10.1016/j.imavis.2024.105162
Zhicheng Ma , Zhaoxiang Liu , Kai Wang , Shiguo Lian
{"title":"Hybrid attention transformer with re-parameterized large kernel convolution for image super-resolution","authors":"Zhicheng Ma ,&nbsp;Zhaoxiang Liu ,&nbsp;Kai Wang ,&nbsp;Shiguo Lian","doi":"10.1016/j.imavis.2024.105162","DOIUrl":"https://doi.org/10.1016/j.imavis.2024.105162","url":null,"abstract":"<div><p>Single image super-resolution is a well-established low-level vision task that aims to reconstruct high-resolution images from low-resolution images. Methods based on Transformer have shown remarkable success and achieved outstanding performance in SISR tasks. While Transformer effectively models global information, it is less effective at capturing high frequencies such as stripes that primarily provide local information. Additionally, it has the potential to further enhance the capture of global information. To tackle this, we propose a novel Large Kernel Hybrid Attention Transformer using re-parameterization. It combines different kernel sizes and different steps re-parameterized convolution layers with Transformer to effectively capture global and local information to learn comprehensive features with low-frequency and high-frequency information. Moreover, in order to solve the problem of using batch normalization layer to introduce artifacts in SISR, we propose a new training strategy which is fusing convolution layer and batch normalization layer after certain training epochs. This strategy can enjoy the acceleration convergence effect of batch normalization layer in training and effectively eliminate the problem of artifacts in the inference stage. For re-parameterization of multiple parallel branch convolution layers, adopting this strategy can further reduce the amount of calculation of training. By coupling these core improvements, our LKHAT achieves state-of-the-art performance for single image super-resolution task.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141582165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
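The fusion step described above folds a batch-normalization layer into the preceding convolution, a standard re-parameterization: with BN parameters γ, β and statistics μ, σ, the fused layer has weight (γ/σ)·W and bias (γ/σ)·(b − μ) + β. A minimal PyTorch sketch of that folding (the epoch at which LKHAT applies it, and its multi-branch variant, are not reproduced here):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm2d statistics and affine parameters into the preceding Conv2d."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)          # gamma / sigma
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused

conv, bn = nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32)
bn.eval()                                    # compare against BN in inference mode
x = torch.randn(1, 16, 8, 8)
print(torch.allclose(fuse_conv_bn(conv, bn)(x), bn(conv(x)), atol=1e-5))  # True
```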
AI-powered trustable and explainable fall detection system using transfer learning
IF 4.2, Tier 3, Computer Science
Image and Vision Computing Pub Date : 2024-07-04 DOI: 10.1016/j.imavis.2024.105164
Aryan Nikul Patel , Ramalingam Murugan , Praveen Kumar Reddy Maddikunta , Gokul Yenduri , Rutvij H. Jhaveri , Yaodong Zhu , Thippa Reddy Gadekallu
{"title":"AI-powered trustable and explainable fall detection system using transfer learning","authors":"Aryan Nikul Patel ,&nbsp;Ramalingam Murugan ,&nbsp;Praveen Kumar Reddy Maddikunta ,&nbsp;Gokul Yenduri ,&nbsp;Rutvij H. Jhaveri ,&nbsp;Yaodong Zhu ,&nbsp;Thippa Reddy Gadekallu","doi":"10.1016/j.imavis.2024.105164","DOIUrl":"https://doi.org/10.1016/j.imavis.2024.105164","url":null,"abstract":"<div><p>Accidental falls pose a significant public health challenge, especially among vulnerable populations. To address this issue, comprehensive research on fall detection and rescue systems is essential. Vision-based technologies, with their promising potential, offer an effective means to detect falls. This research paper presents a cutting-edge fall detection methodology aimed at enhancing individual safety and well-being. The proposed methodology utilizes deep neural networks, leveraging their capabilities to drive advancements in fall detection. To overcome data limitations and computational efficiency concerns, this study employ transfer learning by fine-tuning pre-trained models on large-scale image datasets for fall detection. This approach significantly enhances model performance, enabling better generalization and accuracy, especially in real-time applications with constrained resources. Notably, the methodology achieved an impressive test accuracy of 98.15%. Additionally, the incorporation of Explainable Artificial Intelligence (XAI) techniques is used to ensure transparent and trustworthy decision-making in fall detection using deep learning models, especially in critical healthcare contexts for vulnerable individuals. XAI provides valuable insights into complex model architectures and parameters, enabling a deeper understanding of fall identification patterns. To evaluate the effectiveness of this approach, a rigorous experimentation was conducted using a diverse dataset containing real-world fall and non-fall scenarios. The results demonstrate substantial improvements in both accuracy and interpretability, confirming the superiority of this method over conventional fall detection approaches.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141582083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
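Fine-tuning a pre-trained backbone for a two-class fall / no-fall problem typically amounts to freezing the feature extractor and training a new classification head. A hedged PyTorch sketch of that setup (torchvision 0.13+); the ResNet-50 backbone and the hyperparameters are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

# Assumed backbone: ImageNet-pretrained ResNet-50 (the paper's backbone is not given here).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False                      # freeze the pretrained feature extractor
model.fc = nn.Linear(model.fc.in_features, 2)    # new trainable head: fall vs. no-fall

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)             # dummy batch of frames
labels = torch.randint(0, 2, (4,))
loss = criterion(model(images), labels)
loss.backward()                                  # only the new head receives gradients
optimizer.step()
```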
PMANet: Progressive multi-stage attention networks for skin disease classification
IF 4.2, Tier 3, Computer Science
Image and Vision Computing Pub Date : 2024-07-04 DOI: 10.1016/j.imavis.2024.105166
Guangzhe Zhao, Chen Zhang, Xueping Wang, Benwang Lin, Feihu Yan
{"title":"PMANet: Progressive multi-stage attention networks for skin disease classification","authors":"Guangzhe Zhao,&nbsp;Chen Zhang,&nbsp;Xueping Wang,&nbsp;Benwang Lin,&nbsp;Feihu Yan","doi":"10.1016/j.imavis.2024.105166","DOIUrl":"https://doi.org/10.1016/j.imavis.2024.105166","url":null,"abstract":"<div><p>Automated skin disease classification is crucial for the timely diagnosis of skin lesions. However, accurate skin disease classification presents a challenge, given the significant intra-class variation and inter-class similarity among different kinds of skin diseases. Previous studies have attempted to address this issue by identifying the most discriminative part of a lesion, but they tend to overlook the interactions between multi-scale features. In this paper, we propose a Progressive Multi-stage Attention Network (PMANet) to enhance the learning of multi-scale discriminative features, so that the model can gradually localize from stable fine-grained to coarse-grained regions in order to improve the accuracy of disease classification. Specifically, we utilize a progressive multi-stage network to supervise feature and classification, thereby fostering multi-scale information and improving the model's ability to learn intra-class consistent information. Additionally, we propose an enhanced region proposal block that highlights key discriminative features and suppresses background noise of lesions, reinforcing the learning of inter-class discriminative features. Furthermore, we propose a multi-branch feature fusion block that effectively fuses multi-scale lesion features from different stages. Comprehensive experiments conducted on two datasets substantiate the effectiveness and superiority of the proposed method in accurately classifying skin disease.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141595170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
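Progressive multi-stage supervision, as described above, attaches a classification loss to features from several backbone stages so that coarse and fine-grained cues are trained jointly. A schematic PyTorch sketch of that general pattern (the stage dimensions and linear heads are placeholders, not PMANet's actual attention modules):

```python
import torch
import torch.nn as nn

class MultiStageHeads(nn.Module):
    """One linear classifier per backbone stage; all stage losses are summed."""
    def __init__(self, stage_dims=(64, 128, 256), num_classes=7):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d, num_classes) for d in stage_dims])
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, stage_feats, labels):
        logits = [head(f) for head, f in zip(self.heads, stage_feats)]
        loss = sum(self.criterion(l, labels) for l in logits)   # multi-stage supervision
        return logits[-1], loss                                 # final-stage prediction

feats = [torch.randn(4, d) for d in (64, 128, 256)]             # pooled per-stage features
labels = torch.randint(0, 7, (4,))
pred, loss = MultiStageHeads()(feats, labels)
```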
A semi-parallel CNN-transformer fusion network for semantic change detection
IF 4.2, Tier 3, Computer Science
Image and Vision Computing Pub Date : 2024-07-04 DOI: 10.1016/j.imavis.2024.105157
Changzhong Zou, Ziyuan Wang
{"title":"A semi-parallel CNN-transformer fusion network for semantic change detection","authors":"Changzhong Zou,&nbsp;Ziyuan Wang","doi":"10.1016/j.imavis.2024.105157","DOIUrl":"https://doi.org/10.1016/j.imavis.2024.105157","url":null,"abstract":"<div><p>Semantic change detection (SCD) can recognize the region and the type of changes in remote sensing images. Existing methods are either based on transformer or convolutional neural network (CNN), but due to the size of various ground objects is different, it is necessary to have global modeling ability and local information extraction ability at the same time. Therefore, in this paper we propose a fusion semantic change detection network (FSCD) with both global modeling ability and local information extraction ability by fusing transformer and CNN. A semi-parallel fusion block has also been proposed to construct FSCD. It can not only have global and local features in parallel, but also fuse them as deeply as serial. To better adaptively decide which mechanism is applied to which pixel, we design a self-attention and convolution selection module (ACSM). ACSM is a self-attention mechanism used to selectively combine transformer and CNN. Specifically, the importance of each mechanism is automatically obtained by learning. According to the importance, the mechanism suitable for a pixel is selected, which is better than using either mechanism alone. We evaluate the proposed FSCD on two datasets, and the proposed network has a significant improvement compared with the state-of-the-art network.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141605868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
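One simple way to realize per-pixel selection between a convolution branch and an attention branch is a learned gate that blends the two feature maps. The sketch below is a generic stand-in for that idea, not the actual ACSM design:

```python
import torch
import torch.nn as nn

class GatedBranchFusion(nn.Module):
    """Learned per-pixel gate mixing a convolution branch with an attention branch."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, conv_feat: torch.Tensor, attn_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([conv_feat, attn_feat], dim=1))   # (B, 1, H, W) in [0, 1]
        return w * conv_feat + (1.0 - w) * attn_feat              # per-pixel weighted blend

conv_feat = torch.randn(2, 64, 32, 32)
attn_feat = torch.randn(2, 64, 32, 32)
fused = GatedBranchFusion(64)(conv_feat, attn_feat)               # -> (2, 64, 32, 32)
```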
FTAN: Frame-to-frame temporal alignment network with contrastive learning for few-shot action recognition
IF 4.2, Tier 3, Computer Science
Image and Vision Computing Pub Date : 2024-07-04 DOI: 10.1016/j.imavis.2024.105159
Bin Yu , Yonghong Hou , Zihui Guo , Zhiyi Gao , Yueyang Li
{"title":"FTAN: Frame-to-frame temporal alignment network with contrastive learning for few-shot action recognition","authors":"Bin Yu ,&nbsp;Yonghong Hou ,&nbsp;Zihui Guo ,&nbsp;Zhiyi Gao ,&nbsp;Yueyang Li","doi":"10.1016/j.imavis.2024.105159","DOIUrl":"https://doi.org/10.1016/j.imavis.2024.105159","url":null,"abstract":"<div><p>Most current few-shot action recognition approaches follow the metric learning paradigm, measuring the distance of any sub-sequences (frames, any frame combinations or clips) between different actions for classification. However, this disordered distance metric between action sub-sequences ignores the long-term temporal relations of actions, which may result in significant metric deviations. What's more, the distance metric suffers from the distinctive temporal distribution of different actions, including intra-class temporal offsets and inter-class local similarity. In this paper, a novel few-shot action recognition framework, Frame-to-frame Temporal Alignment Network (<strong>FTAN</strong>), is proposed to address the above challenges. Specifically, an attention-based temporal alignment (<strong>ATA</strong>) module is devised to calculate the distance between corresponding frames of different actions along the temporal dimension to achieve frame-to-frame temporal alignment. Meanwhile, the Temporal Context module (<strong>TCM</strong>) is proposed to increase inter-class diversity by enriching the frame-level feature representation, and the Frames Cyclic Shift Module (<strong>FCSM</strong>) performs frame-level temporal cyclic shift to reduce intra-class inconsistency. In addition, we present temporal and global contrastive objectives to assist in learning discriminative and class-agnostic visual features. Experimental results show that the proposed architecture achieves state-of-the-art on HMDB51, UCF101, Something-Something V2 and Kinetics-100 datasets.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141582081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
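At its core, frame-to-frame alignment compares the t-th frame of the query with the t-th frame of the support clip instead of searching over all frame pairs. A minimal sketch of that aligned distance (the attention weighting and cyclic shifts of the actual ATA and FCSM modules are omitted):

```python
import torch

def frame_aligned_distance(query: torch.Tensor, support: torch.Tensor) -> torch.Tensor:
    """Sum of per-frame Euclidean distances between two (T, D) feature sequences,
    pairing frame t of the query with frame t of the support clip."""
    assert query.shape == support.shape
    return torch.norm(query - support, dim=1).sum()

query = torch.randn(8, 256)     # 8 frames, 256-dim frame features
support = torch.randn(8, 256)
print(frame_aligned_distance(query, support))
```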
Two-dimensional hybrid incremental learning (2DHIL) framework for semantic segmentation of skin tissues
IF 4.2, Tier 3, Computer Science
Image and Vision Computing Pub Date : 2024-07-03 DOI: 10.1016/j.imavis.2024.105147
{"title":"Two-dimensional hybrid incremental learning (2DHIL) framework for semantic segmentation of skin tissues","authors":"","doi":"10.1016/j.imavis.2024.105147","DOIUrl":"10.1016/j.imavis.2024.105147","url":null,"abstract":"<div><p>This study aims to enhance the robustness and generalization capability of a deep learning transformer model used for segmenting skin carcinomas and tissues through the introduction of incremental learning. Deep learning AI models demonstrate their claimed performance only for tasks and data types for which they are specifically trained. Their performance is severely challenged for the test cases which are not similar to training data thus questioning their robustness and ability to generalize. Moreover, these models require an enormous amount of annotated data for training to achieve desired performance. The availability of large annotated data, particularly for medical applications, is itself a challenge. Despite efforts to alleviate this limitation through techniques like data augmentation, transfer learning, and few-shot training, the challenge persists. To address this, we propose refining the models incrementally as new classes are discovered and more data becomes available, emulating the human learning process. However, deep learning models face the challenge of catastrophic forgetting during incremental training. Therefore, we introduce a two-dimensional hybrid incremental learning framework for segmenting non-melanoma skin cancers and tissues from histopathology images. Our approach involves progressively adding new classes and introducing data of varying specifications to introduce adaptability in the models. We also employ a combination of loss functions to facilitate new learning and mitigate catastrophic forgetting. Our extended experiments demonstrate significant improvements, with an F1 score reaching 91.78, mIoU of 93.00, and an average accuracy of 95%. These findings highlight the effectiveness of our incremental learning strategy in enhancing the robustness and generalization of deep learning segmentation models while mitigating catastrophic forgetting.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0262885624002518/pdfft?md5=d44cd642beec8e071716f174c3ad2a5f&pid=1-s2.0-S0262885624002518-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141623389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
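The abstract mentions combining loss functions to balance new learning against catastrophic forgetting without spelling them out; a commonly used recipe pairs cross-entropy on the current classes with a distillation term that keeps old-class predictions close to the frozen previous model. The sketch below shows that assumed combination, not the paper's actual loss:

```python
import torch
import torch.nn.functional as F

def incremental_loss(new_logits, labels, old_logits, num_old, alpha=0.5, temperature=2.0):
    """Cross-entropy on current classes plus knowledge distillation on the old
    classes' logits. new_logits: (B, C, H, W), labels: (B, H, W),
    old_logits: output of the frozen previous model with `num_old` classes."""
    ce = F.cross_entropy(new_logits, labels)
    kd = F.kl_div(
        F.log_softmax(new_logits[:, :num_old] / temperature, dim=1),
        F.softmax(old_logits[:, :num_old] / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return ce + alpha * kd

new_logits = torch.randn(2, 6, 64, 64)               # 6 classes now (4 old + 2 new)
old_logits = torch.randn(2, 4, 64, 64)                # frozen model knows 4 classes
labels = torch.randint(0, 6, (2, 64, 64))
print(incremental_loss(new_logits, labels, old_logits, num_old=4))
```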
A novel infrared and visible image fusion algorithm based on global information-enhanced attention network
IF 4.2, Tier 3, Computer Science
Image and Vision Computing Pub Date : 2024-07-03 DOI: 10.1016/j.imavis.2024.105161
Jia Tian, Dong Sun, Qingwei Gao, Yixiang Lu, Muxi Bao, De Zhu, Dawei Zhao
{"title":"A novel infrared and visible image fusion algorithm based on global information-enhanced attention network","authors":"Jia Tian,&nbsp;Dong Sun,&nbsp;Qingwei Gao,&nbsp;Yixiang Lu,&nbsp;Muxi Bao,&nbsp;De Zhu,&nbsp;Dawei Zhao","doi":"10.1016/j.imavis.2024.105161","DOIUrl":"https://doi.org/10.1016/j.imavis.2024.105161","url":null,"abstract":"<div><p>The fusion of infrared and visible images aims to extract and fuse thermal target information and texture details to the fullest extent possible, enhancing the visual understanding capabilities of images for both humans and computers in complex scenes. However, existing methods have difficulties in preserving the comprehensiveness of source image feature information and enhancing the saliency of image texture information. Therefore, we put forward a novel infrared and visible image fusion algorithm based on global information-enhanced attention network (GIEA). Specifically, we develop an attention-guided Transformer module (AGTM) to make sure the fused images have enough global information. This module combines the convolutional neural network and Transformer to perform adequate feature extraction from shallow to deep layers, and utilize the attention network for multi-level feature-guided learning. Then, we build the contrast enhancement module (CENM), which enhances the feature representation and contrast of the image so that the fused image contains significant texture information. Furthermore, our network is driven to fully preserve the texture and structure details of the source images with a loss function that consists of content loss and total variance loss. Numerous experiments demonstrate that our fusion approach outperforms other fusion approaches in both subjective and objective assessments.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141582166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
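The total variation term named above penalizes differences between neighbouring pixels so that the fused image stays smooth while keeping structure. A minimal sketch of the anisotropic form of this penalty (the content-loss half of the objective and its weighting are not reproduced here):

```python
import torch

def total_variation_loss(img: torch.Tensor) -> torch.Tensor:
    """Anisotropic total variation of a (B, C, H, W) image: mean absolute
    difference between vertically and horizontally adjacent pixels."""
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw

fused = torch.rand(1, 1, 128, 128, requires_grad=True)
total_variation_loss(fused).backward()      # gradients flow back to the fused image
```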
Artificial immune systems for data augmentation
IF 4.2, Tier 3, Computer Science
Image and Vision Computing Pub Date : 2024-07-03 DOI: 10.1016/j.imavis.2024.105163
{"title":"Artificial immune systems for data augmentation","authors":"","doi":"10.1016/j.imavis.2024.105163","DOIUrl":"10.1016/j.imavis.2024.105163","url":null,"abstract":"<div><p>We study object detection models and observe that their respective architectures are vulnerable to image distortions such as noise, compression, blur, or snow. We propose alleviating this problem by training the models with antibodies generated using Artificial Immune Systems (AIS) from original training samples (antigens). These antibodies are AIS-distorted antigens at the pixel level through cycles of “select, clone, mutate, select” until an affinity to the antigen is achieved. We then add the antibodies to the antigens, train the models, validate and test them under 15 distortions, and show that our data augmentation approach (AISbod) significantly improved their accuracy without altering their architecture or inference speed. For example, the DINO object detector under the COCO dataset improves by 4% under clean samples, by 6.50% on average over all 15 distortions, by 2.15% under snow, and by 27.60% under impulse noise. Our simulations show that our method performs better under distortions and clean samples than related defense methods and is more consistent across datasets and object detection models. For instance, our method is, on average, 70% better than the closest related method across 15 distortions for the evaluated models under COCO. Moreover, we show that our approach to image classification and object tracking models significantly improves accuracy under distortions. We provide the code of our method and the DINO model trained using our method at <span><span><span>https://github.com/moforio/AISbod</span></span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141699521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
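As a rough illustration of the "select, clone, mutate, select" cycle described above, the toy loop below repeatedly clones an image, mutates the clones at the pixel level, and keeps the most distorted clone whose affinity (negative mean squared difference to the antigen) remains acceptable. This is only one possible reading of the procedure with made-up parameters; the authors' actual operators are in the linked repository:

```python
import numpy as np

def generate_antibody(antigen, clones=8, rounds=20, mutation_std=0.05,
                      min_affinity=-0.01, seed=0):
    """Toy clonal-selection loop: accumulate pixel-level mutations while the
    antibody's affinity (negative MSE to the antigen, values in [0, 1]) stays
    above `min_affinity`."""
    rng = np.random.default_rng(seed)
    antibody = antigen.copy()
    for _ in range(rounds):
        pool = [np.clip(antibody + rng.normal(0.0, mutation_std, antigen.shape), 0.0, 1.0)
                for _ in range(clones)]                       # clone + mutate
        affinities = [-np.mean((c - antigen) ** 2) for c in pool]
        best = int(np.argmin(affinities))                     # most distorted clone
        if affinities[best] >= min_affinity:                  # select only if affinity holds
            antibody = pool[best]
    return antibody

antigen = np.random.default_rng(1).random((32, 32, 3))        # a dummy training image
antibody = generate_antibody(antigen)
print(float(np.mean((antibody - antigen) ** 2)))              # distortion actually applied
```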
Video object segmentation based on dynamic perception update and feature fusion
IF 4.2, Tier 3, Computer Science
Image and Vision Computing Pub Date : 2024-07-03 DOI: 10.1016/j.imavis.2024.105156
{"title":"Video object segmentation based on dynamic perception update and feature fusion","authors":"","doi":"10.1016/j.imavis.2024.105156","DOIUrl":"10.1016/j.imavis.2024.105156","url":null,"abstract":"<div><p>The current popular video object segmentation algorithms based on memory network indiscriminately update the frame information to the memory pool, fails to make reasonable use of the historical frame information, causing frame information redundancy in the memory pool, resulting in the increase of the computation amount. At the same time, the mask refinement method is relatively rough, resulting in blurred edges of the generated mask. To solve these problems, This paper proposes a video object segmentation algorithm based on dynamic perception update and feature fusion. In order to reasonably utilize the historical frame information, a dynamic perception update module is proposed to selectively update the segmentation frame mask. Meanwhile, a mask refinement module is established to enhance the detail information of the shallow features of the backbone network. This module uses a double kernels fusion block to fuse the different scale information of the features, and finally uses the Laplacian operator to sharpen the edges of the mask. The experimental results show that on the public datasets DAVIS2016, DAVIS2017 and YouTube-VOS<sub>18</sub>, the comprehensive performance of the algorithm in this paper reaches 86.9%, 79.3% and 71.6%, respectively, and the segmentation speed reaches 15FPS on the DAVIS2016 dataset. Compared with many mainstream algorithms in recent years, it has obvious advantages in performance.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141715214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
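Laplacian sharpening, as used for the mask edges above, subtracts a Laplacian response from the image so that boundaries are emphasized. A minimal PyTorch sketch applied to a soft segmentation mask (the kernel and strength are conventional choices, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def laplacian_sharpen(mask: torch.Tensor, strength: float = 1.0) -> torch.Tensor:
    """Sharpen a soft mask of shape (B, 1, H, W) by subtracting its Laplacian response."""
    kernel = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]]).reshape(1, 1, 3, 3)
    lap = F.conv2d(mask, kernel, padding=1)
    return (mask - strength * lap).clamp(0.0, 1.0)

mask = torch.rand(1, 1, 64, 64)
sharp = laplacian_sharpen(mask)      # same shape, edges boosted
```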