W-shaped network combined with dual transformers and edge protection for multi-focus image fusion
Hao Zhai, Yun Chen, Yao Wang, Yuncan Ouyang, Zhi Zeng
Image and Vision Computing, vol. 150, Article 105210. DOI: 10.1016/j.imavis.2024.105210. Published 13 August 2024.
Abstract: In this paper, a W-shaped network combined with dual transformers and edge protection is proposed for multi-focus image fusion. Unlike traditional Convolutional Neural Network (CNN) fusion methods, a heterogeneous encoder framework is designed for feature extraction and a decoder is used for feature reconstruction, with the aim of preserving the local details and edge information of the source images as fully as possible. Specifically, the first encoder uses adaptive average pooling to downsample the source image and extract its important features. The second encoder takes as input the source image pair after edge detection with a Gaussian Modified Laplace Operator (GMLO) and uses adaptive maximum pooling for downsampling. In addition, the encoder part of the network combines CNN and Transformer components to extract both local and global features, and the final fused image is obtained by reconstructing the extracted feature information. To evaluate the performance of this method, we compared it with 16 recent multi-focus image fusion methods in both qualitative and quantitative analyses. Experimental results on the public datasets Lytro, MFFW, and MFI-WHU and on the real-scene dataset HBU-CVMDSP demonstrate that our method accurately identifies the focused and defocused regions of the source images, and preserves their edge details while extracting the focused regions.
{"title":"OCUCFormer: An Over-Complete Under-Complete Transformer Network for accelerated MRI reconstruction","authors":"Mohammad Al Fahim , Sriprabha Ramanarayanan , G.S. Rahul , Matcha Naga Gayathri , Arunima Sarkar , Keerthi Ram , Mohanasankar Sivaprakasam","doi":"10.1016/j.imavis.2024.105228","DOIUrl":"10.1016/j.imavis.2024.105228","url":null,"abstract":"<div><p>Many deep learning-based architectures have been proposed for accelerated Magnetic Resonance Imaging (MRI) reconstruction. However, existing encoder-decoder-based popular networks have a few shortcomings: (1) They focus on the anatomy structure at the expense of fine details, hindering their performance in generating faithful reconstructions; (2) Lack of long-range dependencies yields sub-optimal recovery of fine structural details. In this work, we propose an Over-Complete Under-Complete Transformer network (OCUCFormer) which focuses on better capturing fine edges and details in the image and can extract the long-range relations between these features for improved single-coil (SC) and multi-coil (MC) MRI reconstruction. Our model computes long-range relations in the highest resolutions using Restormer modules for improved acquisition and restoration of fine anatomical details. Towards learning in the absence of fully sampled ground truth for supervision, we show that our model trained with under-sampled data in a self-supervised fashion shows a superior recovery of fine structures compared to other works. We have extensively evaluated our network for SC and MC MRI reconstruction on brain, cardiac, and knee anatomies for <span><math><mn>4</mn><mo>×</mo></math></span> and <span><math><mn>5</mn><mo>×</mo></math></span> acceleration factors. We report significant improvements over popular deep learning-based methods when trained in supervised and self-supervised modes. We have also performed experiments demonstrating the strengths of extracting fine details and the anatomical structure and computing long-range relations within over-complete representations. Code for our proposed method is available at: <span><span><span>https://github.com/alfahimmohammad/OCUCFormer-main</span></span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105228"},"PeriodicalIF":4.2,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141997841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coplane-constrained sparse depth sampling and local depth propagation for depth estimation
Jiehua Zhang, Zhiwen Yang, Chuqiao Chen, Hongkui Wang, Tingyu Wang, Chenggang Yan, Yihong Gong
Image and Vision Computing, vol. 150, Article 105227. DOI: 10.1016/j.imavis.2024.105227. Published 12 August 2024.
Abstract: Depth estimation with sparse reference has emerged recently; it predicts a depth map from a monocular image and a set of depth reference samples. Previous works randomly select reference samples with sensors, leading to severe depth bias because this sampling is independent of image semantics and neglects the imbalance of the depth distribution across regions. This paper proposes Coplane-Constrained sparse Depth (CCD) sampling to explore representative reference samples, and designs a Local Depth Propagation (LDP) network to complete the sparse point cloud map. This captures diverse depth information and diffuses valid points to their neighbors using a geometric prior. Specifically, we first construct the surface normal map and detect coplanar pixels by superpixel segmentation to sample references, whose depth can be represented by that of the superpixel centroid. Then, we introduce local depth propagation to obtain a coarse-level depth map with geometric information, which dynamically diffuses depth from the references to their neighbors based on a local planar assumption. Further, we generate the fine-level depth map by devising a pixel-wise focal loss, which imposes semantic and geometric calibration on pixels with low confidence in the coarse-level prediction. Extensive experiments on public datasets demonstrate that our model outperforms state-of-the-art depth estimation and completion methods.
{"title":"SAM-RSP: A new few-shot segmentation method based on segment anything model and rough segmentation prompts","authors":"Jiaguang Li, Ying Wei, Wei Zhang, Zhenrui Shi","doi":"10.1016/j.imavis.2024.105214","DOIUrl":"10.1016/j.imavis.2024.105214","url":null,"abstract":"<div><p>Few-shot segmentation (FSS) aims to segment novel classes with a few labeled images. The backbones used in existing methods are pre-trained through classification tasks on the ImageNet dataset. Although these backbones can effectively perceive the semantic categories of images, they cannot accurately perceive the regional boundaries within one image, which limits the model performance. Recently, Segment Anything Model (SAM) has achieved precise image segmentation based on point or box prompts, thanks to its excellent perception of region boundaries within one image. However, it cannot effectively provide semantic information of images. This paper proposes a new few-shot segmentation method that can effectively perceive both semantic categories and regional boundaries. This method first utilizes the SAM encoder to perceive regions and obtain the query embedding. Then the support and query images are input into a backbone pre-trained on ImageNet to perceive semantics and generate a rough segmentation prompt (RSP). This query embedding is combined with the prompt to generate a pixel-level query prototype, which can better match the query embedding. Finally, the query embedding, prompt, and prototype are combined and input into the designed multi-layer prompt transformer decoder, which is more efficient and lightweight, and can provide a more accurate segmentation result. In addition, other methods can be easily combined with our framework to improve their performance. Plenty of experiments on PASCAL-5<sup><em>i</em></sup> and COCO-20<sup><em>i</em></sup> under 1-shot and 5-shot settings prove the effectiveness of our method. Our method also achieves new state-of-the-art. Codes are available at <span><span>https://github.com/Jiaguang-NEU/SAM-RSP</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105214"},"PeriodicalIF":4.2,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141991203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthetic lidar point cloud generation using deep generative models for improved driving scene object recognition","authors":"Zhengkang Xiang, Zexian Huang, Kourosh Khoshelham","doi":"10.1016/j.imavis.2024.105207","DOIUrl":"10.1016/j.imavis.2024.105207","url":null,"abstract":"<div><p>The imbalanced distribution of different object categories poses a challenge for training accurate object recognition models in driving scenes. Supervised machine learning models trained on imbalanced data are biased and easily overfit the majority classes, such as vehicles and pedestrians, which appear more frequently in driving scenes. We propose a novel data augmentation approach for object recognition in lidar point cloud of driving scenes, which leverages probabilistic generative models to produce synthetic point clouds for the minority classes and complement the original imbalanced dataset. We evaluate five generative models based on different statistical principles, including Gaussian mixture model, variational autoencoder, generative adversarial network, adversarial autoencoder and the diffusion model. Experiments with a real-world autonomous driving dataset show that the synthetic point clouds generated for the minority classes by the Latent Generative Adversarial Network result in significant improvement of object recognition performance for both minority and majority classes. The codes are available at <span><span>https://github.com/AAAALEX-XIANG/Synthetic-Lidar-Generation</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105207"},"PeriodicalIF":4.2,"publicationDate":"2024-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0262885624003123/pdfft?md5=f149d58b78f107538ca14bc730d87d86&pid=1-s2.0-S0262885624003123-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141991204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-branch teacher-student with noise-tolerant learning for domain adaptive nighttime segmentation","authors":"Ruiying Chen , Yunan Liu , Yuming Bo , Mingyu Lu","doi":"10.1016/j.imavis.2024.105211","DOIUrl":"10.1016/j.imavis.2024.105211","url":null,"abstract":"<div><p>While significant progress has been achieved in the field of image semantic segmentation, the majority of research has been primarily concentrated on daytime scenes. Semantic segmentation of nighttime images is equally critical for autonomous driving; however, this task presents greater challenges due to inadequate lighting and difficulties associated with obtaining accurate manual annotations. In this paper, we introduce a novel method called the Dual-Branch Teacher-Student (DBTS) framework for unsupervised nighttime semantic segmentation. Our approach combines domain alignment and knowledge distillation in a mutually reinforcing manner. Firstly, we employ a photometric alignment module to dynamically generate target-like latent images, bridging the appearance gap between the source domain (daytime) and the target domain (nighttime). Secondly, we establish a dual-branch framework, where each branch enhances collaboration between the teacher and student networks. The student network utilizes adversarial learning to align the target domain with another domain (i.e., source or latent domain), while the teacher network generates reliable pseudo-labels by distilling knowledge from the latent domain. Furthermore, recognizing the potential noise present in pseudo-labels, we propose a noise-tolerant learning method to mitigate the risks associated with overreliance on pseudo-labels during domain adaptation. When evaluated on benchmark datasets, the proposed DBTS achieves state-of-the-art performance. Specifically, DBTS, using different backbones, outperforms established baseline models by approximately 25% in mIoU on the Zurich dataset and by over 26% in mIoU on the ACDC dataset, demonstrating the effectiveness of our method in addressing the challenges of domain-adaptive nighttime segmentation.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105211"},"PeriodicalIF":4.2,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142048422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved attentive residue multi-dilated network for thermal noise removal in magnetic resonance images","authors":"Bowen Jiang, Tao Yue, Xuemei Hu","doi":"10.1016/j.imavis.2024.105213","DOIUrl":"10.1016/j.imavis.2024.105213","url":null,"abstract":"<div><p>Magnetic resonance imaging (MRI) technology is crucial in the medical field, but the thermal noise in the reconstructed MR images may interfere with the clinical diagnosis. Removing the thermal noise in MR images mainly contains two challenges. First, thermal noise in an MR image obeys Rician distribution, where the statistical features are not consistent in different regions of the image. In this case, conventional denoising methods like spatial convolutional filtering will not be appropriate to deal with it. Second, details and edge information in the image may get damaged while smoothing the noise. This paper proposes a novel deep-learning model to denoise MR images. First, the model learns a binary mask to separate the background and signal regions of the noised image, making the noise left in the signal region obey a unified statistical distribution. Second, the model is designed as an attentive residual multi-dilated network (ARM-Net), composed of a multi-branch structure, and supplemented with a frequency-domain-optimizable discrete cosine transform module. In this way, the deep-learning model will be more effective in removing the noise while maintaining the details of the original image. Furthermore, we have also made improvements on the original ARM-Net baseline to establish a new model called ARM-Net v2, which is more efficient and effective. Experimental results illustrate that over the BraTS 2018 dataset, our method achieves the PSNR of 39.7087 and 32.6005 at noise levels of 5% and 20%, which realizes the state-of-the-art performance among existing MR image denoising methods.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105213"},"PeriodicalIF":4.2,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142048444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-scale feature correspondence and pseudo label retraining strategy for weakly supervised semantic segmentation","authors":"Weizheng Wang, Lei Zhou, Haonan Wang","doi":"10.1016/j.imavis.2024.105215","DOIUrl":"10.1016/j.imavis.2024.105215","url":null,"abstract":"<div><p>Recently, the performance of semantic segmentation using weakly supervised learning has significantly improved. Weakly supervised semantic segmentation (WSSS) that uses only image-level labels has received widespread attention, it employs Class Activation Maps (CAM) to generate pseudo labels. Compared to traditional use of pixel-level labels, this technique greatly reduces annotation costs by utilizing simpler and more readily available image-level annotations. Besides, due to the local perceptual ability of Convolutional Neural Networks (CNN), the generated CAM cannot activate the entire object area. Researchers have found that this CNN limitation can be compensated for by using Vision Transformer (ViT). However, ViT also introduces an over-smoothing problem. Recent research has made good progress in solving this issue, but when discussing CAM and its related segmentation predictions, it is easy to overlook their intrinsic information and the interrelationships between them. In this paper, we propose a Multi-Scale Feature Correspondence (MSFC) method. Our MSFC can obtain the feature correspondence of CAM and segmentation predictions at different scales, re-extract useful semantic information from them, enhancing the network's learning of feature information and improving the quality of CAM. Moreover, to further improve the segmentation precision, we design a Pseudo Label Retraining Strategy (PLRS). This strategy refines the accuracy in local regions, elevates the quality of pseudo labels, and aims to enhance segmentation precision. Experimental results on the PASCAL VOC 2012 and MS COCO 2014 datasets show that our method achieves impressive performance among end-to-end WSSS methods.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105215"},"PeriodicalIF":4.2,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142041055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Underwater image restoration based on light attenuation prior and color-contrast adaptive correction
Jianru Li, Xu Zhu, Yuchao Zheng, Huimin Lu, Yujie Li
Image and Vision Computing, vol. 150, Article 105217. DOI: 10.1016/j.imavis.2024.105217. Published 10 August 2024.
Abstract: Underwater imaging suffers from color distortion and diminished contrast because light is attenuated by absorption and scattering as it travels through water. Unlike traditional underwater image restoration techniques, our method accommodates attenuation coefficients for diverse water conditions. We recover the original image by estimating decay rates, focusing on the blue-red and blue-green color channel ratios. Because the water type is generally unknown, we evaluate attenuation coefficient ratios for an array of predefined water categories; each category yields a restored image, and an automated selection algorithm chooses the optimal output based on its color distribution. In parallel, we develop a color-contrast adaptive correction technique designed to remedy color anomalies in underwater images while enhancing contrast and detail fidelity. Extensive experiments on benchmark datasets show that our method outperforms six other well-known approaches, and it remains robust and adaptable in scenarios dominated by green background imagery.
{"title":"DynaSeg: A deep dynamic fusion method for unsupervised image segmentation incorporating feature similarity and spatial continuity","authors":"Boujemaa Guermazi , Riadh Ksantini , Naimul Khan","doi":"10.1016/j.imavis.2024.105206","DOIUrl":"10.1016/j.imavis.2024.105206","url":null,"abstract":"<div><p>Our work tackles the fundamental challenge of image segmentation in computer vision, which is crucial for diverse applications. While supervised methods demonstrate proficiency, their reliance on extensive pixel-level annotations limits scalability. We introduce DynaSeg, an innovative unsupervised image segmentation approach that overcomes the challenge of balancing feature similarity and spatial continuity without relying on extensive hyperparameter tuning. Unlike traditional methods, DynaSeg employs a dynamic weighting scheme that automates parameter tuning, adapts flexibly to image characteristics, and facilitates easy integration with other segmentation networks. By incorporating a Silhouette Score Phase, DynaSeg prevents undersegmentation failures where the number of predicted clusters might converge to one. DynaSeg uses CNN-based and pre-trained ResNet feature extraction, making it computationally efficient and more straightforward than other complex models. Experimental results showcase state-of-the-art performance, achieving a 12.2% and 14.12% mIOU improvement over current unsupervised segmentation approaches on COCO-All and COCO-Stuff datasets, respectively. We provide qualitative and quantitative results on five benchmark datasets, demonstrating the efficacy of the proposed approach. Code available at url{<span><span>https://github.com/RyersonMultimediaLab/DynaSeg</span><svg><path></path></svg></span>}</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105206"},"PeriodicalIF":4.2,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0262885624003111/pdfft?md5=da5c387758372711e4b28912d6fd15cc&pid=1-s2.0-S0262885624003111-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141993029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}