Image and Vision Computing: Latest Articles

W-shaped network combined with dual transformers and edge protection for multi-focus image fusion
IF 4.2 · CAS Tier 3 · Computer Science
Image and Vision Computing · Pub Date: 2024-08-13 · DOI: 10.1016/j.imavis.2024.105210
Hao Zhai, Yun Chen, Yao Wang, Yuncan Ouyang, Zhi Zeng
{"title":"W-shaped network combined with dual transformers and edge protection for multi-focus image fusion","authors":"Hao Zhai,&nbsp;Yun Chen,&nbsp;Yao Wang,&nbsp;Yuncan Ouyang,&nbsp;Zhi Zeng","doi":"10.1016/j.imavis.2024.105210","DOIUrl":"10.1016/j.imavis.2024.105210","url":null,"abstract":"<div><p>In this paper, a W-shaped network combined with dual transformers and edge protection is proposed for multi-focus image fusion. Different from the traditional Convolutional Neural Network (CNN) fusion method, a heterogeneous encoder network framework is designed for feature extraction, and a decoder is used for feature reconstruction. The purpose of this design is to preserve the local details and edge information of the source image to the maximum extent possible. Specifically, the first encoder uses adaptive average pooling to downsample the source image and extract important features from it. The source image pair for edge detection using the Gaussian Modified Laplace Operator (GMLO) is used as input for the second encoder, and adaptive maximum pooling is employed for downsampling. In addition, the encoder part of the network combines CNN and Transformer to extract both local and global features. By reconstructing the extracted feature information, the final fusion image is obtained. To evaluate the performance of this method, we compared 16 recent multi-focus image fusion methods and conducted qualitative and quantitative analyses. Experimental results on public datasets such as Lytro, MFFW, MFI-WHU, and the real scene dataset HBU-CVMDSP demonstrate that our method can accurately identify the focused and defocused regions of source images. It also preserves the edge details of the source images while extracting the focused regions.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105210"},"PeriodicalIF":4.2,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141993030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
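The second encoder above takes an edge-detected version of the source pair as input. As a rough illustration only (the paper's exact Gaussian Modified Laplace Operator is not spelled out in the abstract), the sketch below builds an edge map with classic Gaussian-then-Laplacian filtering from SciPy as a stand-in; the function name `log_edge_map` and the `sigma` value are assumptions.

```python
# Hypothetical sketch: an edge map in the spirit of the GMLO input branch above.
# The exact GMLO formulation is not given in the abstract; this uses the classic
# Laplacian-of-Gaussian as a stand-in, with an arbitrarily chosen sigma.
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def log_edge_map(image: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Smooth with a Gaussian, apply the Laplacian, return |response| scaled to [0, 1]."""
    smoothed = gaussian_filter(image.astype(np.float64), sigma=sigma)
    response = np.abs(laplace(smoothed))
    return response / (response.max() + 1e-12)

if __name__ == "__main__":
    img = np.random.rand(128, 128)          # stand-in for one source image
    edges = log_edge_map(img)
    print(edges.shape, float(edges.min()), float(edges.max()))
```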
OCUCFormer: An Over-Complete Under-Complete Transformer Network for accelerated MRI reconstruction
IF 4.2 · CAS Tier 3 · Computer Science
Image and Vision Computing · Pub Date: 2024-08-13 · DOI: 10.1016/j.imavis.2024.105228
Mohammad Al Fahim, Sriprabha Ramanarayanan, G.S. Rahul, Matcha Naga Gayathri, Arunima Sarkar, Keerthi Ram, Mohanasankar Sivaprakasam
{"title":"OCUCFormer: An Over-Complete Under-Complete Transformer Network for accelerated MRI reconstruction","authors":"Mohammad Al Fahim ,&nbsp;Sriprabha Ramanarayanan ,&nbsp;G.S. Rahul ,&nbsp;Matcha Naga Gayathri ,&nbsp;Arunima Sarkar ,&nbsp;Keerthi Ram ,&nbsp;Mohanasankar Sivaprakasam","doi":"10.1016/j.imavis.2024.105228","DOIUrl":"10.1016/j.imavis.2024.105228","url":null,"abstract":"<div><p>Many deep learning-based architectures have been proposed for accelerated Magnetic Resonance Imaging (MRI) reconstruction. However, existing encoder-decoder-based popular networks have a few shortcomings: (1) They focus on the anatomy structure at the expense of fine details, hindering their performance in generating faithful reconstructions; (2) Lack of long-range dependencies yields sub-optimal recovery of fine structural details. In this work, we propose an Over-Complete Under-Complete Transformer network (OCUCFormer) which focuses on better capturing fine edges and details in the image and can extract the long-range relations between these features for improved single-coil (SC) and multi-coil (MC) MRI reconstruction. Our model computes long-range relations in the highest resolutions using Restormer modules for improved acquisition and restoration of fine anatomical details. Towards learning in the absence of fully sampled ground truth for supervision, we show that our model trained with under-sampled data in a self-supervised fashion shows a superior recovery of fine structures compared to other works. We have extensively evaluated our network for SC and MC MRI reconstruction on brain, cardiac, and knee anatomies for <span><math><mn>4</mn><mo>×</mo></math></span> and <span><math><mn>5</mn><mo>×</mo></math></span> acceleration factors. We report significant improvements over popular deep learning-based methods when trained in supervised and self-supervised modes. We have also performed experiments demonstrating the strengths of extracting fine details and the anatomical structure and computing long-range relations within over-complete representations. Code for our proposed method is available at: <span><span><span>https://github.com/alfahimmohammad/OCUCFormer-main</span></span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105228"},"PeriodicalIF":4.2,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141997841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
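For readers unfamiliar with the accelerated-MRI setting the abstract assumes, the sketch below shows what 4× under-sampling of single-coil k-space and the resulting zero-filled input look like; the mask pattern, center fraction, and names are illustrative and not taken from the paper.

```python
# Hypothetical sketch of the accelerated-MRI setting: retrospectively under-sample
# k-space at a 4x acceleration factor and form the zero-filled reconstruction that
# networks such as OCUCFormer take as input. Mask pattern and names are illustrative.
import numpy as np

def undersample(image: np.ndarray, acceleration: int = 4, center_fraction: float = 0.08):
    kspace = np.fft.fftshift(np.fft.fft2(image))
    h, w = kspace.shape
    mask = np.zeros(w, dtype=bool)
    mask[::acceleration] = True                               # keep every R-th phase-encode line
    center = int(w * center_fraction)
    mask[w // 2 - center // 2 : w // 2 + center // 2] = True  # fully sampled low frequencies
    masked = kspace * mask[None, :]
    zero_filled = np.abs(np.fft.ifft2(np.fft.ifftshift(masked)))
    return zero_filled, mask

if __name__ == "__main__":
    phantom = np.random.rand(256, 256)                        # stand-in for a single-coil MR slice
    recon, mask = undersample(phantom, acceleration=4)
    print("sampled fraction:", float(mask.mean()))
```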
Coplane-constrained sparse depth sampling and local depth propagation for depth estimation
IF 4.2 · CAS Tier 3 · Computer Science
Image and Vision Computing · Pub Date: 2024-08-12 · DOI: 10.1016/j.imavis.2024.105227
Jiehua Zhang, Zhiwen Yang, Chuqiao Chen, Hongkui Wang, Tingyu Wang, Chenggang Yan, Yihong Gong
{"title":"Coplane-constrained sparse depth sampling and local depth propagation for depth estimation","authors":"Jiehua Zhang ,&nbsp;Zhiwen Yang ,&nbsp;Chuqiao Chen ,&nbsp;Hongkui Wang ,&nbsp;Tingyu Wang ,&nbsp;Chenggang Yan ,&nbsp;Yihong Gong","doi":"10.1016/j.imavis.2024.105227","DOIUrl":"10.1016/j.imavis.2024.105227","url":null,"abstract":"<div><p>Depth estimation with sparse reference has emerged recently, and predicts depth map from a monocular image and a set of depth reference samples. Previous works randomly select reference samples by sensors, leading to severe depth bias as this sampling is independent to image semantic and neglects the unbalance of depth distribution in regions. This paper proposes a Coplane-Constrained sparse Depth (CCD) sampling to explore representative reference samples, and design a Local Depth Propagation (LDP) network for complete the sparse point cloud map. This can capture diverse depth information and diffuse the valid points to neighbors with geometry prior. Specifically, we first construct the surface normal map and detect coplane pixels by superpixel segmenting for sampling references, whose depth can be represented by that of superpixel centroid. Then, we introduce local depth propagation to obtain coarse-level depth map with geometric information, which dynamically diffuses the depth from the reference to neighbors based on local planar assumption. Further, we generate the fine-level depth map by devising a pixel-wise focal loss, which imposes the semantic and geometry calibration on pixels with low confidence in coarse-level prediction. Extensive experiments on public datasets demonstrate that our model outperforms SOTA depth estimation and completion methods.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105227"},"PeriodicalIF":4.2,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141990524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
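The local depth propagation step above relies on a local planar assumption. A minimal sketch of that idea, assuming a closed-form least-squares plane fit rather than the paper's learned LDP network, is given below; the coordinates and names are hypothetical.

```python
# Hypothetical sketch of the "local planar assumption": fit z = a*x + b*y + c to a few
# sparse depth samples in a local window and use the plane to fill in neighbouring pixels.
# The paper's LDP network learns this diffusion rather than solving it in closed form.
import numpy as np

def propagate_depth_plane(coords: np.ndarray, depths: np.ndarray, query: np.ndarray) -> np.ndarray:
    """coords: (N, 2) pixel positions with known depth; depths: (N,); query: (M, 2)."""
    A = np.column_stack([coords, np.ones(len(coords))])        # design matrix [x, y, 1]
    (a, b, c), *_ = np.linalg.lstsq(A, depths, rcond=None)     # plane parameters
    return query @ np.array([a, b]) + c

if __name__ == "__main__":
    ref_xy = np.array([[10.0, 10.0], [20.0, 12.0], [15.0, 25.0], [30.0, 30.0]])
    ref_z = np.array([2.0, 2.1, 2.4, 2.9])                     # sparse reference depths
    neighbours = np.array([[12.0, 14.0], [25.0, 20.0]])
    print(propagate_depth_plane(ref_xy, ref_z, neighbours))
```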
SAM-RSP: A new few-shot segmentation method based on segment anything model and rough segmentation prompts
IF 4.2 · CAS Tier 3 · Computer Science
Image and Vision Computing · Pub Date: 2024-08-12 · DOI: 10.1016/j.imavis.2024.105214
Jiaguang Li, Ying Wei, Wei Zhang, Zhenrui Shi
{"title":"SAM-RSP: A new few-shot segmentation method based on segment anything model and rough segmentation prompts","authors":"Jiaguang Li,&nbsp;Ying Wei,&nbsp;Wei Zhang,&nbsp;Zhenrui Shi","doi":"10.1016/j.imavis.2024.105214","DOIUrl":"10.1016/j.imavis.2024.105214","url":null,"abstract":"<div><p>Few-shot segmentation (FSS) aims to segment novel classes with a few labeled images. The backbones used in existing methods are pre-trained through classification tasks on the ImageNet dataset. Although these backbones can effectively perceive the semantic categories of images, they cannot accurately perceive the regional boundaries within one image, which limits the model performance. Recently, Segment Anything Model (SAM) has achieved precise image segmentation based on point or box prompts, thanks to its excellent perception of region boundaries within one image. However, it cannot effectively provide semantic information of images. This paper proposes a new few-shot segmentation method that can effectively perceive both semantic categories and regional boundaries. This method first utilizes the SAM encoder to perceive regions and obtain the query embedding. Then the support and query images are input into a backbone pre-trained on ImageNet to perceive semantics and generate a rough segmentation prompt (RSP). This query embedding is combined with the prompt to generate a pixel-level query prototype, which can better match the query embedding. Finally, the query embedding, prompt, and prototype are combined and input into the designed multi-layer prompt transformer decoder, which is more efficient and lightweight, and can provide a more accurate segmentation result. In addition, other methods can be easily combined with our framework to improve their performance. Plenty of experiments on PASCAL-5<sup><em>i</em></sup> and COCO-20<sup><em>i</em></sup> under 1-shot and 5-shot settings prove the effectiveness of our method. Our method also achieves new state-of-the-art. Codes are available at <span><span>https://github.com/Jiaguang-NEU/SAM-RSP</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105214"},"PeriodicalIF":4.2,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141991203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
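Prototype matching underlies many FSS pipelines, including the pixel-level prototype described above. The sketch below shows the basic masked-average-pooling prototype and cosine-similarity scoring under assumed feature shapes; it is not the paper's decoder, which further combines the prompt with a multi-layer prompt transformer.

```python
# Hypothetical sketch of the prototype idea behind few-shot segmentation: pool support
# features inside the support mask into a prototype, then score query pixels by cosine
# similarity. Shapes and names are illustrative only.
import numpy as np

def masked_average_prototype(support_feat: np.ndarray, support_mask: np.ndarray) -> np.ndarray:
    """support_feat: (C, H, W); support_mask: (H, W) in {0, 1}. Returns a (C,) prototype."""
    weights = support_mask / (support_mask.sum() + 1e-12)
    return (support_feat * weights[None]).sum(axis=(1, 2))

def cosine_similarity_map(query_feat: np.ndarray, prototype: np.ndarray) -> np.ndarray:
    """query_feat: (C, H, W). Returns an (H, W) similarity map in [-1, 1]."""
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + 1e-12)
    p = prototype / (np.linalg.norm(prototype) + 1e-12)
    return np.tensordot(p, q, axes=([0], [0]))

if __name__ == "__main__":
    feat_s, feat_q = np.random.rand(64, 32, 32), np.random.rand(64, 32, 32)
    mask_s = (np.random.rand(32, 32) > 0.7).astype(float)
    proto = masked_average_prototype(feat_s, mask_s)
    print(cosine_similarity_map(feat_q, proto).shape)
```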
Synthetic lidar point cloud generation using deep generative models for improved driving scene object recognition
IF 4.2 · CAS Tier 3 · Computer Science
Image and Vision Computing · Pub Date: 2024-08-11 · DOI: 10.1016/j.imavis.2024.105207
Zhengkang Xiang, Zexian Huang, Kourosh Khoshelham
{"title":"Synthetic lidar point cloud generation using deep generative models for improved driving scene object recognition","authors":"Zhengkang Xiang,&nbsp;Zexian Huang,&nbsp;Kourosh Khoshelham","doi":"10.1016/j.imavis.2024.105207","DOIUrl":"10.1016/j.imavis.2024.105207","url":null,"abstract":"<div><p>The imbalanced distribution of different object categories poses a challenge for training accurate object recognition models in driving scenes. Supervised machine learning models trained on imbalanced data are biased and easily overfit the majority classes, such as vehicles and pedestrians, which appear more frequently in driving scenes. We propose a novel data augmentation approach for object recognition in lidar point cloud of driving scenes, which leverages probabilistic generative models to produce synthetic point clouds for the minority classes and complement the original imbalanced dataset. We evaluate five generative models based on different statistical principles, including Gaussian mixture model, variational autoencoder, generative adversarial network, adversarial autoencoder and the diffusion model. Experiments with a real-world autonomous driving dataset show that the synthetic point clouds generated for the minority classes by the Latent Generative Adversarial Network result in significant improvement of object recognition performance for both minority and majority classes. The codes are available at <span><span>https://github.com/AAAALEX-XIANG/Synthetic-Lidar-Generation</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105207"},"PeriodicalIF":4.2,"publicationDate":"2024-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0262885624003123/pdfft?md5=f149d58b78f107538ca14bc730d87d86&pid=1-s2.0-S0262885624003123-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141991204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
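Among the five generators evaluated, the Gaussian mixture model is the simplest to illustrate. The sketch below fits a GMM to points of one minority class and samples synthetic points to rebalance the training data; the component count and toy data are placeholders, and the paper's best results come from a latent GAN rather than this baseline.

```python
# Hypothetical sketch of the simplest generator compared above: fit a Gaussian mixture
# model to lidar points of a minority-class object and sample synthetic points to
# complement the imbalanced dataset. Component count and data are placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

def oversample_minority(points: np.ndarray, n_new: int, n_components: int = 8) -> np.ndarray:
    """points: (N, 3) x/y/z coordinates of one minority-class scan; returns (n_new, 3)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full", random_state=0)
    gmm.fit(points)
    synthetic, _ = gmm.sample(n_new)
    return synthetic

if __name__ == "__main__":
    real = np.random.randn(500, 3) * [2.0, 1.0, 0.5]    # stand-in for a sparse cyclist scan
    fake = oversample_minority(real, n_new=300)
    print(fake.shape)
```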
Dual-branch teacher-student with noise-tolerant learning for domain adaptive nighttime segmentation
IF 4.2 · CAS Tier 3 · Computer Science
Image and Vision Computing · Pub Date: 2024-08-10 · DOI: 10.1016/j.imavis.2024.105211
Ruiying Chen, Yunan Liu, Yuming Bo, Mingyu Lu
{"title":"Dual-branch teacher-student with noise-tolerant learning for domain adaptive nighttime segmentation","authors":"Ruiying Chen ,&nbsp;Yunan Liu ,&nbsp;Yuming Bo ,&nbsp;Mingyu Lu","doi":"10.1016/j.imavis.2024.105211","DOIUrl":"10.1016/j.imavis.2024.105211","url":null,"abstract":"<div><p>While significant progress has been achieved in the field of image semantic segmentation, the majority of research has been primarily concentrated on daytime scenes. Semantic segmentation of nighttime images is equally critical for autonomous driving; however, this task presents greater challenges due to inadequate lighting and difficulties associated with obtaining accurate manual annotations. In this paper, we introduce a novel method called the Dual-Branch Teacher-Student (DBTS) framework for unsupervised nighttime semantic segmentation. Our approach combines domain alignment and knowledge distillation in a mutually reinforcing manner. Firstly, we employ a photometric alignment module to dynamically generate target-like latent images, bridging the appearance gap between the source domain (daytime) and the target domain (nighttime). Secondly, we establish a dual-branch framework, where each branch enhances collaboration between the teacher and student networks. The student network utilizes adversarial learning to align the target domain with another domain (i.e., source or latent domain), while the teacher network generates reliable pseudo-labels by distilling knowledge from the latent domain. Furthermore, recognizing the potential noise present in pseudo-labels, we propose a noise-tolerant learning method to mitigate the risks associated with overreliance on pseudo-labels during domain adaptation. When evaluated on benchmark datasets, the proposed DBTS achieves state-of-the-art performance. Specifically, DBTS, using different backbones, outperforms established baseline models by approximately 25% in mIoU on the Zurich dataset and by over 26% in mIoU on the ACDC dataset, demonstrating the effectiveness of our method in addressing the challenges of domain-adaptive nighttime segmentation.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105211"},"PeriodicalIF":4.2,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142048422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
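Teacher-student frameworks of this kind typically pair an EMA-updated teacher with confidence-thresholded pseudo-labels. The sketch below shows those two generic ingredients under assumed values for the momentum and threshold; it omits the paper's photometric alignment, dual branches, adversarial alignment, and noise-tolerant loss.

```python
# Hypothetical sketch of two generic teacher-student ingredients: (1) an exponential
# moving average (EMA) teacher update and (2) confidence-thresholded pseudo-labels for
# unlabeled nighttime images. Momentum, threshold, and the toy model are placeholders.
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, momentum: float = 0.999) -> None:
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

@torch.no_grad()
def pseudo_labels(teacher: nn.Module, images: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    """Returns per-pixel labels, with low-confidence pixels set to the ignore index 255."""
    probs = torch.softmax(teacher(images), dim=1)      # (B, C, H, W)
    confidence, labels = probs.max(dim=1)
    labels[confidence < threshold] = 255
    return labels

if __name__ == "__main__":
    student = nn.Conv2d(3, 19, kernel_size=1)          # toy stand-in for a segmentation network
    teacher = copy.deepcopy(student)
    night = torch.rand(2, 3, 64, 64)
    print(pseudo_labels(teacher, night).shape)
    ema_update(teacher, student)
```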
An improved attentive residue multi-dilated network for thermal noise removal in magnetic resonance images
IF 4.2 · CAS Tier 3 · Computer Science
Image and Vision Computing · Pub Date: 2024-08-10 · DOI: 10.1016/j.imavis.2024.105213
Bowen Jiang, Tao Yue, Xuemei Hu
{"title":"An improved attentive residue multi-dilated network for thermal noise removal in magnetic resonance images","authors":"Bowen Jiang,&nbsp;Tao Yue,&nbsp;Xuemei Hu","doi":"10.1016/j.imavis.2024.105213","DOIUrl":"10.1016/j.imavis.2024.105213","url":null,"abstract":"<div><p>Magnetic resonance imaging (MRI) technology is crucial in the medical field, but the thermal noise in the reconstructed MR images may interfere with the clinical diagnosis. Removing the thermal noise in MR images mainly contains two challenges. First, thermal noise in an MR image obeys Rician distribution, where the statistical features are not consistent in different regions of the image. In this case, conventional denoising methods like spatial convolutional filtering will not be appropriate to deal with it. Second, details and edge information in the image may get damaged while smoothing the noise. This paper proposes a novel deep-learning model to denoise MR images. First, the model learns a binary mask to separate the background and signal regions of the noised image, making the noise left in the signal region obey a unified statistical distribution. Second, the model is designed as an attentive residual multi-dilated network (ARM-Net), composed of a multi-branch structure, and supplemented with a frequency-domain-optimizable discrete cosine transform module. In this way, the deep-learning model will be more effective in removing the noise while maintaining the details of the original image. Furthermore, we have also made improvements on the original ARM-Net baseline to establish a new model called ARM-Net v2, which is more efficient and effective. Experimental results illustrate that over the BraTS 2018 dataset, our method achieves the PSNR of 39.7087 and 32.6005 at noise levels of 5% and 20%, which realizes the state-of-the-art performance among existing MR image denoising methods.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105213"},"PeriodicalIF":4.2,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142048444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
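The Rician noise model mentioned above can be simulated by perturbing the real and imaginary channels of a magnitude image with Gaussian noise and taking the magnitude. A minimal sketch, with the noise level expressed as a fraction of peak intensity to mirror the 5%/20% settings reported above, is given below; the function name and defaults are illustrative.

```python
# Hypothetical sketch of Rician noise on a magnitude MR image: add independent Gaussian
# noise to the real and imaginary channels, then take the magnitude. The noise level is
# given as a fraction of the maximum intensity (e.g. 0.05 for the 5% setting).
import numpy as np

def add_rician_noise(image: np.ndarray, level: float = 0.05, rng=None) -> np.ndarray:
    """image: clean magnitude image; level: noise std as a fraction of max intensity."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = level * image.max()
    real = image + rng.normal(0.0, sigma, image.shape)
    imag = rng.normal(0.0, sigma, image.shape)
    return np.sqrt(real**2 + imag**2)        # Rician-distributed magnitude

if __name__ == "__main__":
    clean = np.random.rand(128, 128)
    noisy = add_rician_noise(clean, level=0.20)
    print("MSE:", float(np.mean((noisy - clean) ** 2)))
```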
Multi-scale feature correspondence and pseudo label retraining strategy for weakly supervised semantic segmentation
IF 4.2 · CAS Tier 3 · Computer Science
Image and Vision Computing · Pub Date: 2024-08-10 · DOI: 10.1016/j.imavis.2024.105215
Weizheng Wang, Lei Zhou, Haonan Wang
{"title":"Multi-scale feature correspondence and pseudo label retraining strategy for weakly supervised semantic segmentation","authors":"Weizheng Wang,&nbsp;Lei Zhou,&nbsp;Haonan Wang","doi":"10.1016/j.imavis.2024.105215","DOIUrl":"10.1016/j.imavis.2024.105215","url":null,"abstract":"<div><p>Recently, the performance of semantic segmentation using weakly supervised learning has significantly improved. Weakly supervised semantic segmentation (WSSS) that uses only image-level labels has received widespread attention, it employs Class Activation Maps (CAM) to generate pseudo labels. Compared to traditional use of pixel-level labels, this technique greatly reduces annotation costs by utilizing simpler and more readily available image-level annotations. Besides, due to the local perceptual ability of Convolutional Neural Networks (CNN), the generated CAM cannot activate the entire object area. Researchers have found that this CNN limitation can be compensated for by using Vision Transformer (ViT). However, ViT also introduces an over-smoothing problem. Recent research has made good progress in solving this issue, but when discussing CAM and its related segmentation predictions, it is easy to overlook their intrinsic information and the interrelationships between them. In this paper, we propose a Multi-Scale Feature Correspondence (MSFC) method. Our MSFC can obtain the feature correspondence of CAM and segmentation predictions at different scales, re-extract useful semantic information from them, enhancing the network's learning of feature information and improving the quality of CAM. Moreover, to further improve the segmentation precision, we design a Pseudo Label Retraining Strategy (PLRS). This strategy refines the accuracy in local regions, elevates the quality of pseudo labels, and aims to enhance segmentation precision. Experimental results on the PASCAL VOC 2012 and MS COCO 2014 datasets show that our method achieves impressive performance among end-to-end WSSS methods.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105215"},"PeriodicalIF":4.2,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142041055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
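The CAMs that this method refines are usually computed by weighting the last convolutional feature maps with the classifier weights of the target class. The sketch below shows that standard step under assumed shapes; it does not reproduce the paper's MSFC or PLRS modules.

```python
# Hypothetical sketch of the standard Class Activation Map (CAM) computation that
# image-level WSSS methods start from: weight the final feature maps by the classifier
# weights of the target class, apply ReLU, and normalise to [0, 1]. Shapes are illustrative.
import numpy as np

def class_activation_map(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """features: (C, H, W) from the last conv layer; fc_weights: (num_classes, C)."""
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))   # (H, W)
    cam = np.maximum(cam, 0.0)                                             # ReLU
    return cam / (cam.max() + 1e-12)

if __name__ == "__main__":
    feats = np.random.rand(512, 28, 28)
    weights = np.random.randn(20, 512)          # e.g. 20 PASCAL VOC foreground classes
    print(class_activation_map(feats, weights, class_idx=3).shape)
```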
Underwater image restoration based on light attenuation prior and color-contrast adaptive correction
IF 4.2 · CAS Tier 3 · Computer Science
Image and Vision Computing · Pub Date: 2024-08-10 · DOI: 10.1016/j.imavis.2024.105217
Jianru Li, Xu Zhu, Yuchao Zheng, Huimin Lu, Yujie Li
{"title":"Underwater image restoration based on light attenuation prior and color-contrast adaptive correction","authors":"Jianru Li ,&nbsp;Xu Zhu ,&nbsp;Yuchao Zheng ,&nbsp;Huimin Lu ,&nbsp;Yujie Li","doi":"10.1016/j.imavis.2024.105217","DOIUrl":"10.1016/j.imavis.2024.105217","url":null,"abstract":"<div><p>Underwater imaging is uniquely beset by issues such as color distortion and diminished contrast due to the intricate behavior of light as it traverses water, being attenuated by processes of absorption and scattering. Distinct from traditional underwater image restoration techniques, our methodology uniquely accommodates attenuation coefficients pertinent to diverse water conditions. We endeavor to recover the pristine image by approximating decay rates, focusing particularly on the blue-red and blue-green color channels. Recognizing the inherent ambiguities surrounding water type classifications, we meticulously assess attenuation coefficient ratios for an array of predefined aquatic categories. Each classification results in a uniquely restored image, and an automated selection algorithm is employed to determine the most optimal output, rooted in its color distribution. In tandem, we've innovated a color-contrast adaptive correction technique, purposefully crafted to remedy color anomalies in underwater images while simultaneously amplifying contrast and detail fidelity. Extensive trials on benchmark datasets unambiguously highlight our method's preeminence over six other renowned strategies. Impressively, our methodology exhibits exceptional resilience and adaptability, particularly in scenarios dominated by green background imagery.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105217"},"PeriodicalIF":4.2,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142097494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
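Restoration methods built on a light attenuation prior generally assume that each color channel decays exponentially with the light path length. The sketch below inverts that simple model with placeholder attenuation coefficients and a uniform depth; the paper's per-water-type coefficient ratios and color-contrast adaptive correction are not reproduced here.

```python
# Hypothetical sketch of the channel-wise attenuation model underlying many underwater
# restoration methods: each channel decays as exp(-beta_c * d), so a coarse correction
# rescales channels by the inverse decay. The beta values and depth are placeholders.
import numpy as np

def compensate_attenuation(image: np.ndarray, betas=(0.20, 0.08, 0.05), depth: float = 3.0) -> np.ndarray:
    """image: (H, W, 3) RGB in [0, 1]; betas: per-channel attenuation coefficients (1/m)."""
    gains = np.exp(np.asarray(betas) * depth)        # invert I = J * exp(-beta * d)
    return np.clip(image * gains[None, None, :], 0.0, 1.0)

if __name__ == "__main__":
    underwater = np.random.rand(64, 64, 3) * np.array([0.3, 0.6, 0.9])   # red attenuated most
    restored = compensate_attenuation(underwater)
    print(restored.mean(axis=(0, 1)))
```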
DynaSeg: A deep dynamic fusion method for unsupervised image segmentation incorporating feature similarity and spatial continuity
IF 4.2 · CAS Tier 3 · Computer Science
Image and Vision Computing · Pub Date: 2024-08-10 · DOI: 10.1016/j.imavis.2024.105206
Boujemaa Guermazi, Riadh Ksantini, Naimul Khan
{"title":"DynaSeg: A deep dynamic fusion method for unsupervised image segmentation incorporating feature similarity and spatial continuity","authors":"Boujemaa Guermazi ,&nbsp;Riadh Ksantini ,&nbsp;Naimul Khan","doi":"10.1016/j.imavis.2024.105206","DOIUrl":"10.1016/j.imavis.2024.105206","url":null,"abstract":"<div><p>Our work tackles the fundamental challenge of image segmentation in computer vision, which is crucial for diverse applications. While supervised methods demonstrate proficiency, their reliance on extensive pixel-level annotations limits scalability. We introduce DynaSeg, an innovative unsupervised image segmentation approach that overcomes the challenge of balancing feature similarity and spatial continuity without relying on extensive hyperparameter tuning. Unlike traditional methods, DynaSeg employs a dynamic weighting scheme that automates parameter tuning, adapts flexibly to image characteristics, and facilitates easy integration with other segmentation networks. By incorporating a Silhouette Score Phase, DynaSeg prevents undersegmentation failures where the number of predicted clusters might converge to one. DynaSeg uses CNN-based and pre-trained ResNet feature extraction, making it computationally efficient and more straightforward than other complex models. Experimental results showcase state-of-the-art performance, achieving a 12.2% and 14.12% mIOU improvement over current unsupervised segmentation approaches on COCO-All and COCO-Stuff datasets, respectively. We provide qualitative and quantitative results on five benchmark datasets, demonstrating the efficacy of the proposed approach. Code available at url{<span><span>https://github.com/RyersonMultimediaLab/DynaSeg</span><svg><path></path></svg></span>}</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105206"},"PeriodicalIF":4.2,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0262885624003111/pdfft?md5=da5c387758372711e4b28912d6fd15cc&pid=1-s2.0-S0262885624003111-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141993029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
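The Silhouette Score Phase mentioned above guards against the cluster count collapsing to one. The sketch below shows how a silhouette score can select a cluster count over per-pixel features using scikit-learn; the feature source and candidate counts are assumptions, and DynaSeg's dynamic weighting scheme is not reproduced.

```python
# Hypothetical sketch of using a silhouette score in unsupervised segmentation: cluster
# per-pixel features for several candidate counts and keep the best-scoring count, which
# avoids collapsing to a single cluster. Feature source and candidates are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_cluster_count(pixel_features: np.ndarray, candidates=(2, 3, 4, 5, 6)) -> int:
    """pixel_features: (num_pixels, feature_dim); returns the candidate with the best silhouette."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixel_features)
        scores[k] = silhouette_score(pixel_features, labels)
    return max(scores, key=scores.get)

if __name__ == "__main__":
    feats = np.random.rand(2000, 16)            # stand-in for CNN features of 2000 pixels
    print("selected number of segments:", best_cluster_count(feats))
```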