"Improved multi-focus image fusion using online convolutional sparse coding based on sample-dependent dictionary"
Sidi He, Chengfang Zhang, Haoyue Li, Ziliang Feng
Signal Processing: Image Communication, vol. 130, Article 117213 (2024-09-19). DOI: 10.1016/j.image.2024.117213

Abstract: Multi-focus image fusion merges multiple images of a scene, each focused on a different region, to create a fully focused image. Convolutional sparse coding (CSC) methods are commonly employed to extract focused regions accurately, but they often disregard computational cost. Online convolutional sparse coding (OCSC) was introduced to overcome this, yet it remains constrained by the number of filters it can afford. To address these limitations, Sample-Dependent Dictionary-based Online Convolutional Sparse Coding (SCSC) was proposed, which supports additional filters while keeping time and space complexity low when processing high-dimensional or large data. Leveraging the computational efficiency and effective global feature extraction of SCSC, we propose a new multi-focus image fusion method. Each source image undergoes a two-layer decomposition, yielding a base layer that captures the predominant features and a detail layer that contains finer details. The fused base and detail layers are then combined to reconstruct the final image. The proposed method significantly mitigates artifacts, preserves fine details at the focus boundary, and delivers notable improvements in both visual quality and objective evaluation of multi-focus image fusion.
"SynFlowMap: A synchronized optical flow remapping for video motion magnification"
Jonathan A.S. Lima, Cristiano J. Miosso, Mylène C.Q. Farias
Signal Processing: Image Communication, vol. 130, Article 117203 (2024-09-18). DOI: 10.1016/j.image.2024.117203

Abstract: Motion magnification is the process of spatially amplifying small movements in a video to reveal important information about a scene. Several motion magnification methods have been proposed, but most introduce perceptible and annoying visual artifacts. In this paper, we propose a method that first estimates the optical flow between each original frame and the corresponding frame motion-magnified by another method. It then uses the resulting optical flow map together with the original video to synthesize a combined motion-magnified video. The method can amplify motion by larger factors, invert the direction of motion, and combine filtered motion from multiple frequency bands and Eulerian methods. Among other advantages, the proposed approach eliminates artifacts caused by Eulerian motion-magnification methods. We present an extensive qualitative and quantitative comparison with the main Eulerian approaches. A final contribution of this work is a new video database that enables quantitative evaluation of motion magnification.
"Distributed virtual selective-forwarding units and SDN-assisted edge computing for optimization of multi-party WebRTC videoconferencing"
R. Arda Kırmızıoğlu, A. Murat Tekalp, Burak Görkemli
Signal Processing: Image Communication, vol. 130, Article 117173 (2024-09-12). DOI: 10.1016/j.image.2024.117173

Abstract: Network service providers (NSPs) have a growing interest in placing network intelligence and services at network edges by deploying software-defined networking (SDN) and network function virtualization infrastructure. In multi-party WebRTC videoconferencing using scalable video coding, a selective forwarding unit (SFU) provides connectivity between peers with heterogeneous bandwidth and terminals. An important question is where in the network to place the SFU service so as to minimize end-to-end delay between all pairs of peers. Clearly, there is no single optimal location for a cloud SFU that suits all possible peer locations. We propose placing virtual SFUs at network edges, leveraging NSP edge datacenters, to optimize end-to-end delay and the usage of overall network resources. The main advantage of the distributed edge-SFU framework is that each peer's video stream travels the shortest path to reach other peers, similar to the mesh connection model, while each peer uploads only a single stream to its edge SFU, avoiding the upload bottleneck. While the proposed distributed edge-SFU framework applies to both best-effort and managed service models, this paper proposes a premium managed, edge-integrated multi-party WebRTC service architecture with bandwidth and delay guarantees within access networks, achieved by SDN-assisted slicing of edge networks. The performance of the proposed distributed edge-SFU service architecture is demonstrated by means of experimental results.
"Modulated deformable convolution based on graph convolution network for rail surface crack detection"
Shuzhen Tong, Qing Wang, Xuan Wei, Cheng Lu, Xiaobo Lu
Signal Processing: Image Communication, vol. 130, Article 117202 (2024-09-10). DOI: 10.1016/j.image.2024.117202

Abstract: Accurate detection of rail surface cracks is essential but difficult because of noise, low contrast, and density inhomogeneity. In this paper, to deal with the complex conditions of rail surface crack detection, we propose a modulated deformable convolution based on a graph convolution network, named MDCGCN. The MDCGCN is a novel convolution that calculates the offsets and modulation scalars of the modulated deformable convolution by running a graph convolution network on the feature map. The MDCGCN improves the performance of different networks in rail surface crack detection with only a slight cost in inference speed. Finally, we demonstrate our method's numerical accuracy, computational efficiency, and effectiveness on the public segmentation dataset RSDD and our self-built detection dataset SEU-RSCD, and explore an appropriate network structure within the UNet baseline using the MDCGCN.
{"title":"A global reweighting approach for cross-domain semantic segmentation","authors":"Yuhang Zhang , Shishun Tian , Muxin Liao , Guoguang Hua , Wenbin Zou , Chen Xu","doi":"10.1016/j.image.2024.117197","DOIUrl":"10.1016/j.image.2024.117197","url":null,"abstract":"<div><div>Unsupervised domain adaptation semantic segmentation attracts much research attention due to the expensive pixel-level annotation cost. Since the adaptation difficulty of samples is different, the weight of samples should be set independently, which is called reweighting. However, existing reweighting methods only calculate local reweighting information from predicted results or context information in batch images of two domains, which may lead to over-alignment or under-alignment problems. To handle this issue, we propose a global reweighting approach. Specifically, we first define the target centroid distance, which describes the distance between the source batch data and the target centroid. Then, we employ a Fréchet Inception Distance metric to evaluate the domain divergence and embed it into the target centroid distance. Finally, a global reweighting strategy is proposed to enhance the knowledge transferability in the source domain supervision. Extensive experiments demonstrate that our approach achieves competitive performance and helps to improve performance in other methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"130 ","pages":"Article 117197"},"PeriodicalIF":3.4,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142359027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Memory positional encoding for image captioning"
Xiaobao Yang, Shuai He, Jie Zhang, Sugang Ma, Zhiqiang Hou, Wei Sun
Signal Processing: Image Communication, vol. 130, Article 117201 (2024-09-07). DOI: 10.1016/j.image.2024.117201

Abstract: Transformer-based architectures represent the state of the art in image captioning. Because of its inherently parallel internal structure, the Transformer is not aware of the order of input tokens, so positional encoding becomes an indispensable component of Transformer-based models. However, most existing absolute positional encodings (APE) have limitations for image captioning: their spatial positional features are predefined and do not generalize well to other forms of data, such as visual data. Moreover, the positional features are decoupled from one another and lack internal correlation, which affects the accuracy of the spatial position context representation of visual or textual semantics to a certain extent. We therefore propose a memory positional encoding (MPE) that generalizes to both the visual encoder and the sequence decoder of image captioning models. In MPE, each positional feature is generated recursively by a learnable network with a memory function, so that the currently generated positional feature effectively inherits information from the previous n positions. In addition, existing positional encodings provide positional features with fixed values and scales; that is, they provide the same positional encoding for different inputs, which is unreasonable. To address these issues of scale and value in practical applications, we further explore dynamic memory positional encoding (DMPE) based on MPE. DMPE dynamically adjusts and generates positional features according to the input, giving each input a unique positional representation. Extensive experiments on MSCOCO validate the effectiveness of MPE and DMPE.
{"title":"Style Optimization Networks for real-time semantic segmentation of rainy and foggy weather","authors":"Yifang Huang, Haitao He, Hongdou He, Guyu Zhao, Peng Shi, Pengpeng Fu","doi":"10.1016/j.image.2024.117199","DOIUrl":"10.1016/j.image.2024.117199","url":null,"abstract":"<div><div>Semantic segmentation is an essential task in the field of computer vision. Existing semantic segmentation models can achieve good results under good weather and lighting conditions. However, when the external environment changes, the effectiveness of these models are seriously affected. Therefore, we focus on the task of semantic segmentation in rainy and foggy weather. Fog is a common phenomenon in rainy weather conditions and has a negative impact on image visibility. Besides, to make the algorithm satisfy the application requirements of mobile devices, the computational cost and the real-time requirement of the model have become one of the major points of our research. In this paper, we propose a novel Style Optimization Network (SONet) architecture, containing a Style Optimization Module (SOM) that can dynamically learn style information, and a Key information Extraction Module (KEM) that extracts important spatial and contextual information. This can improve the learning ability and robustness of the model for rainy and foggy conditions. Meanwhile, we achieve real-time performance by using lightweight modules and a backbone network with low computational complexity. To validate the effectiveness of our SONet, we synthesized CityScapes dataset for rainy and foggy weather and evaluated the accuracy and complexity of our model. Our model achieves a segmentation accuracy of 75.29% MIoU and 83.62% MPA on a NVIDIA TITAN Xp GPU. Several comparative experiments have shown that our SONet can achieve good performance in semantic segmentation tasks under rainy and foggy weather, and due to the lightweight design of the model we have a good advantage in both accuracy and model complexity.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"130 ","pages":"Article 117199"},"PeriodicalIF":3.4,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142359025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"A novel theoretical analysis on optimal pipeline of multi-frame image super-resolution using sparse coding"
Mohammad Mahdi Afrasiabi, Reshad Hosseini, Aliazam Abbasfar
Signal Processing: Image Communication, vol. 130, Article 117198 (2024-09-07). DOI: 10.1016/j.image.2024.117198

Abstract: Super-resolution is the process of obtaining a high-resolution (HR) image from one or more low-resolution (LR) images. Single-image super-resolution (SISR) deals with one LR image, while multi-frame super-resolution (MFSR) employs several LR images to reach the HR output. The MFSR pipeline consists of alignment, fusion, and reconstruction. We conduct a theoretical analysis using sparse coding (SC) and the iterative shrinkage-thresholding algorithm to fill the gap in mathematical justification for the execution order of the optimal MFSR pipeline. Our analysis recommends executing alignment and fusion before the reconstruction stage (whether reconstruction is performed by deconvolution or by SISR techniques). The suggested order yields better performance in terms of peak signal-to-noise ratio and structural similarity index, and the optimal pipeline also reduces computational complexity compared to intuitive approaches that apply SISR to each input LR image. We also demonstrate the usefulness of SC in analyzing computer vision tasks such as MFSR by leveraging the sparsity assumption for natural images. Simulation results support the findings of our theoretical analysis, both quantitatively and qualitatively.
{"title":"Underwater image enhancement via brightness mask-guided multi-attention embedding","authors":"Yuanyuan Li, Zetian Mi, Peng Lin, Xianping Fu","doi":"10.1016/j.image.2024.117200","DOIUrl":"10.1016/j.image.2024.117200","url":null,"abstract":"<div><p>Numerous new underwater image enhancement methods have been proposed to correct color and enhance the contrast. Although these methods have achieved satisfactory enhancement results in some respects, few have taken into account the effect of the raw image illumination distribution on the enhancement results, often leading to oversaturation or undersaturation. To solve these problems, an underwater image enhancement network guided by brightness mask with multi-attention embedding, called BMGMANet, is designed. Specifically, considering that different regions in the underwater images have different degradation degrees, which can be implicitly reflected by a brightness mask characterizing the image illumination distribution, a decoder network guided by a reverse brightness mask is designed to enhance the dark regions while suppressing excessive enhancement of the bright regions. In addition, a triple-attention module is designed to further enhance the contrast of the underwater image and recover more details. Extensive comparative experiments demonstrate that the enhancement results of our network outperform those of existing state-of-the-art methods. Furthermore, additional experiments also prove that our BMGMANet can effectively enhance the non-uniform illumination underwater images and improve the performance of saliency object detection in underwater images.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"130 ","pages":"Article 117200"},"PeriodicalIF":3.4,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142167794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"DJUHNet: A deep representation learning-based scheme for the task of joint image upsampling and hashing"
Alireza Esmaeilzehi, Morteza Mirzaei, Hossein Zaredar, Dimitrios Hatzinakos, M. Omair Ahmad
Signal Processing: Image Communication, vol. 129, Article 117187 (2024-09-06). DOI: 10.1016/j.image.2024.117187

Abstract: In recent years, numerous efficient schemes employing deep neural networks have been developed for image hashing. However, little attention has been paid to enhancing the performance and robustness of these deep hashing networks when the input images lack high spatial resolution and visual quality. This is a critical problem, as access to high-quality, high-resolution images is often not guaranteed in real-life applications. In this paper, we propose a novel method for joint image upsampling and hashing that uses a three-stage design. In the first two stages of the proposed scheme, we obtain two deep neural networks, individually trained for image super-resolution and image hashing, respectively. We then fine-tune the two networks using the ideas of representation learning and an alternating optimization process to produce a set of parameters optimized for the joint task. The effectiveness of the ideas used in designing the proposed method is demonstrated through extensive experiments. The proposed scheme is shown to outperform state-of-the-art image super-resolution and hashing methods, even when the latter are trained simultaneously in a joint end-to-end manner.