A novel theoretical analysis on optimal pipeline of multi-frame image super-resolution using sparse coding
Mohammad Mahdi Afrasiabi, Reshad Hosseini, Aliazam Abbasfar
Signal Processing: Image Communication, vol. 130, Article 117198, published 2024-09-07. DOI: 10.1016/j.image.2024.117198
Abstract: Super-resolution is the process of obtaining a high-resolution (HR) image from one or more low-resolution (LR) images. Single-image super-resolution (SISR) deals with one LR image, while multi-frame super-resolution (MFSR) employs several LR images to reach the HR output. The MFSR pipeline consists of alignment, fusion, and reconstruction. We conduct a theoretical analysis using sparse coding (SC) and the iterative shrinkage-thresholding algorithm to fill the gap in mathematical justification for the execution order of the optimal MFSR pipeline. Our analysis recommends executing alignment and fusion before the reconstruction stage (whether reconstruction is done through deconvolution or SISR techniques). The suggested order yields better performance in terms of peak signal-to-noise ratio and structural similarity index. The optimal pipeline also reduces computational complexity compared to intuitive approaches that apply SISR to each input LR image. We also demonstrate the usefulness of SC in the analysis of computer vision tasks such as MFSR, leveraging the sparsity assumption on natural images. Simulation results support the findings of our theoretical analysis, both quantitatively and qualitatively.
{"title":"Underwater image enhancement via brightness mask-guided multi-attention embedding","authors":"Yuanyuan Li, Zetian Mi, Peng Lin, Xianping Fu","doi":"10.1016/j.image.2024.117200","DOIUrl":"10.1016/j.image.2024.117200","url":null,"abstract":"<div><p>Numerous new underwater image enhancement methods have been proposed to correct color and enhance the contrast. Although these methods have achieved satisfactory enhancement results in some respects, few have taken into account the effect of the raw image illumination distribution on the enhancement results, often leading to oversaturation or undersaturation. To solve these problems, an underwater image enhancement network guided by brightness mask with multi-attention embedding, called BMGMANet, is designed. Specifically, considering that different regions in the underwater images have different degradation degrees, which can be implicitly reflected by a brightness mask characterizing the image illumination distribution, a decoder network guided by a reverse brightness mask is designed to enhance the dark regions while suppressing excessive enhancement of the bright regions. In addition, a triple-attention module is designed to further enhance the contrast of the underwater image and recover more details. Extensive comparative experiments demonstrate that the enhancement results of our network outperform those of existing state-of-the-art methods. Furthermore, additional experiments also prove that our BMGMANet can effectively enhance the non-uniform illumination underwater images and improve the performance of saliency object detection in underwater images.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"130 ","pages":"Article 117200"},"PeriodicalIF":3.4,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142167794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

DJUHNet: A deep representation learning-based scheme for the task of joint image upsampling and hashing
Alireza Esmaeilzehi, Morteza Mirzaei, Hossein Zaredar, Dimitrios Hatzinakos, M. Omair Ahmad
Signal Processing: Image Communication, vol. 129, Article 117187, published 2024-09-06. DOI: 10.1016/j.image.2024.117187
Abstract: In recent years, numerous efficient schemes that employ deep neural networks have been developed for the task of image hashing. However, little attention has been paid to the performance and robustness of these deep hashing networks when the input images lack high spatial resolution and visual quality. This is a critical problem, as access to high-quality, high-resolution images is often not guaranteed in real-life applications. In this paper, we propose a novel method for joint image upsampling and hashing that uses a three-stage design. In the first two stages, we obtain two deep neural networks, individually trained for image super-resolution and image hashing, respectively. We then fine-tune the two networks using representation learning and an alternating optimization process, in order to produce a set of parameters optimized for the joint upsampling-and-hashing task. The effectiveness of the various ideas used in the proposed method is demonstrated through a range of experiments. The proposed scheme outperforms state-of-the-art image super-resolution and hashing methods, even when those are trained simultaneously in a joint end-to-end manner.
{"title":"Globally and locally optimized Pannini projection for high FoV rendering of 360° images","authors":"Falah Jabar, João Ascenso, Maria Paula Queluz","doi":"10.1016/j.image.2024.117190","DOIUrl":"10.1016/j.image.2024.117190","url":null,"abstract":"<div><p>To render a spherical (360° or omnidirectional) image on planar displays, a 2D image - called as viewport - must be obtained by projecting a sphere region on a plane, according to the user's viewing direction and a predefined field of view (FoV). However, any sphere to plan projection introduces geometric distortions, such as object stretching and/or bending of straight lines, which intensity increases with the considered FoV. In this paper, a fully automatic content-aware projection is proposed, aiming to reduce the geometric distortions when high FoVs are used. This new projection is based on the Pannini projection, whose parameters are firstly globally optimized according to the image content, followed by a local conformality improvement of relevant viewport objects. A crowdsourcing subjective test showed that the proposed projection is the most preferred solution among the considered state-of-the-art sphere to plan projections, producing viewports with a more pleasant visual quality.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117190"},"PeriodicalIF":3.4,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0923596524000912/pdfft?md5=1ff2da4c676f5e3a19cdbe6c4c5f6989&pid=1-s2.0-S0923596524000912-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142136394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prototype-wise self-knowledge distillation for few-shot segmentation","authors":"Yadang Chen , Xinyu Xu , Chenchen Wei , Chuhan Lu","doi":"10.1016/j.image.2024.117186","DOIUrl":"10.1016/j.image.2024.117186","url":null,"abstract":"<div><p>Few-shot segmentation was proposed to obtain segmentation results for a image with an unseen class by referring to a few labeled samples. However, due to the limited number of samples, many few-shot segmentation models suffer from poor generalization. Prototypical network-based few-shot segmentation still has issues with spatial inconsistency and prototype bias. Since the target class has different appearance in each image, some specific features in the prototypes generated from the support image and its mask do not accurately reflect the generalized features of the target class. To address the support prototype consistency issue, we put forward two modules: Data Augmentation Self-knowledge Distillation (DASKD) and Prototype-wise Regularization (PWR). The DASKD module focuses on enhancing spatial consistency by using data augmentation and self-knowledge distillation. Self-knowledge distillation helps the model acquire generalized features of the target class and learn hidden knowledge from the support images. The PWR module focuses on obtaining a more representative support prototype by conducting prototype-level loss to obtain support prototypes closer to the category center. Broad evaluation experiments on PASCAL-<span><math><msup><mrow><mn>5</mn></mrow><mrow><mi>i</mi></mrow></msup></math></span> and COCO-<span><math><mrow><mn>2</mn><msup><mrow><mn>0</mn></mrow><mrow><mi>i</mi></mrow></msup></mrow></math></span> demonstrate that our model outperforms the prior works on few-shot segmentation. Our approach surpasses the state of the art by 7.5% in PASCAL-<span><math><msup><mrow><mn>5</mn></mrow><mrow><mi>i</mi></mrow></msup></math></span> and 4.2% in COCO-<span><math><mrow><mn>2</mn><msup><mrow><mn>0</mn></mrow><mrow><mi>i</mi></mrow></msup></mrow></math></span>.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117186"},"PeriodicalIF":3.4,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142077049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Transformer-CNN for small image object detection
Yan-Lin Chen, Chun-Liang Lin, Yu-Chen Lin, Tzu-Chun Chen
Signal Processing: Image Communication, vol. 129, Article 117194, published 2024-08-21. DOI: 10.1016/j.image.2024.117194
Abstract: Object recognition has been a popular research field in computer vision in recent years. Although detection of regular-sized objects has achieved impressive results, small object detection (SOD) remains a challenging problem. In the Microsoft Common Objects in Context (MS COCO) public dataset, the detection rate of small objects is typically half that of regular-sized objects. The main reason is that small objects are affected by multi-layer convolution and pooling, leaving insufficient detail to distinguish them from the background or from similar objects, and resulting in poor recognition rates or no detections at all. This paper presents a network architecture, Transformer-CNN, that combines a self-attention-based transformer and a convolutional neural network (CNN) to improve the recognition rate of SOD. It captures global information through the transformer and exploits the translation invariance and translation equivariance of the CNN to retain global and local features while improving the reliability and robustness of SOD. Our experiments show that the proposed model improves the small object recognition rate by 2-5% compared with general transformer architectures.
{"title":"Feature extractor optimization for discriminative representations in Generalized Category Discovery","authors":"Zhonghao Chang, Xiao Li, Zihao Zhao","doi":"10.1016/j.image.2024.117195","DOIUrl":"10.1016/j.image.2024.117195","url":null,"abstract":"<div><p>Generalized Category Discovery (GCD) task involves transferring knowledge from labeled known categories to recognize both known and novel categories within an unlabeled dataset. A significant challenge arises from the lack of prior information for novel categories. To address this, we develop a feature extractor that can learn discriminative features for both known and novel categories. Our approach leverages the observation that similar samples often belong to the same class. We construct a similarity matrix and employ similarity contrastive loss to increase the similarity between similar samples in the feature space. Additionally, we incorporate cluster labels to further refine the feature extractor, utilizing K-means clustering to assign these labels to unlabeled data, providing valuable supervision. Our feature extractor is optimized through the utilization of instance-level contrastive learning and class-level contrastive learning constraints. These constraints promote similarity maximization in both the instance space and the label space for instances sharing the same pseudo-labels. These three components complement each other, facilitating the learning of discriminative representations for both known and novel categories. Through comprehensive evaluations of generic image recognition datasets and challenging fine-grained datasets, we demonstrate that our proposed method achieves state-of-the-art performance.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117195"},"PeriodicalIF":3.4,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142020716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image-based virtual try-on: Fidelity and simplification","authors":"Tasin Islam, Alina Miron, Xiaohui Liu, Yongmin Li","doi":"10.1016/j.image.2024.117189","DOIUrl":"10.1016/j.image.2024.117189","url":null,"abstract":"<div><p>We introduce a novel image-based virtual try-on model designed to replace a candidate’s garment with a desired target item. The proposed model comprises three modules: segmentation, garment warping, and candidate-clothing fusion. Previous methods have shown limitations in cases involving significant differences between the original and target clothing, as well as substantial overlapping of body parts. Our model addresses these limitations by employing two key strategies. Firstly, it utilises a candidate representation based on an RGB skeleton image to enhance spatial relationships among body parts, resulting in robust segmentation and improved occlusion handling. Secondly, truncated U-Net is employed in both the segmentation and warping modules, enhancing segmentation performance and accelerating the try-on process. The warping module leverages an efficient affine transform for ease of training. Comparative evaluations against state-of-the-art models demonstrate the competitive performance of our proposed model across various scenarios, particularly excelling in handling occlusion cases and significant differences in clothing cases. This research presents a promising solution for image-based virtual try-on, advancing the field by overcoming key limitations and achieving superior performance.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117189"},"PeriodicalIF":3.4,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0923596524000900/pdfft?md5=d7b74bcca8966cd1d3e0e38fa30c8482&pid=1-s2.0-S0923596524000900-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142049928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Duration-aware and mode-aware micro-expression spotting for long video sequences
Jing Liu, Xin Li, Jiaqi Zhang, Guangtao Zhai, Yuting Su, Yuyi Zhang, Bo Wang
Signal Processing: Image Communication, vol. 129, Article 117192, published 2024-08-10. DOI: 10.1016/j.image.2024.117192
Abstract: Micro-expressions (MEs) are unconscious, instant and slight facial movements that reveal people's true emotions. Locating MEs is a prerequisite for classifying them, yet only a few studies focus on this task; among them, sliding-window-based methods are the most prevalent. Due to differences in individual physiological and psychological mechanisms, and to uncontrollable factors, the durations and transition modes of different MEs fluctuate greatly. Limited to a fixed window scale and mode, traditional sliding-window-based ME spotting methods fail to capture the motion changes of all MEs exactly, degrading performance. In this paper, an ensemble-learning-based duration- and mode-aware (DMA) ME spotting framework is proposed. Specifically, we exploit multiple sliding windows of different scales and modes to generate multiple weak detectors, each of which accommodates MEs with a certain duration and transition mode. To obtain a more comprehensive strong detector, we integrate the analysis results of the weak detectors using a voting-based aggregation module. Furthermore, a novel interval generation scheme is designed to merge close peaks and their neighboring frames into a complete ME interval. Experimental results on two long-video databases show the promising performance of the proposed DMA framework compared with state-of-the-art methods. The code is available at https://github.com/TJUMMG/DMA-ME-Spotting.

Low-rank tensor completion based on tensor train rank with partially overlapped sub-blocks and total variation
Jingfei He, Zezhong Yang, Xunan Zheng, Xiaoyue Zhang, Ao Li
Signal Processing: Image Communication, vol. 129, Article 117193, published 2024-08-10. DOI: 10.1016/j.image.2024.117193
Abstract: Recently, low-rank tensor completion methods based on tensor train (TT) rank have achieved promising performance. Ket augmentation (KA) is commonly used in TT rank-based methods to improve performance by converting low-dimensional tensors into higher-dimensional ones. However, KA destroys the structure and image continuity of the original low-dimensional tensor, causing block artifacts. To tackle this issue, this paper proposes a low-rank tensor completion method based on TT rank with tensor augmentation by partially overlapped sub-blocks (TAPOS) and total variation (TV). The proposed TAPOS preserves the image continuity of the original tensor and enhances the low-rankness of the generated higher-dimensional tensors, and a weighted de-augmentation step assigns different weights to the elements of the sub-blocks to further reduce block artifacts. To further alleviate block artifacts and improve reconstruction accuracy, TV is introduced into the TAPOS-based model to add a piecewise-smooth prior. A parallel matrix decomposition method is introduced to estimate the TT rank and reduce the computational cost. Numerical experiments show that the proposed method outperforms existing state-of-the-art methods.