{"title":"Iterative decoupling deconvolution network for image restoration","authors":"","doi":"10.1016/j.jvcir.2024.104288","DOIUrl":"10.1016/j.jvcir.2024.104288","url":null,"abstract":"<div><p>The iterative decoupled deblurring BM3D (IDDBM3D) (Danielyan et al., 2011) combines the analysis representation and the synthesis representation, decoupling the deblurring and denoising operations so that both problems can be solved easily. However, the IDDBM3D has some limitations. First, the analysis and synthesis transformations are analytical and thus have limited representation ability. Second, it is difficult for the threshold transformation to effectively remove image noise. Third, there exist hyper-parameters that must be tuned manually, which is difficult and time-consuming. In this work, we propose an iterative decoupling deconvolution network (IDDNet) by unrolling the iterative decoupling algorithm of the IDDBM3D. In the proposed IDDNet, the analysis/synthesis transformations are implemented by encoder/decoder modules, the denoising is implemented by a convolutional neural network-based denoiser, and the hyper-parameters are estimated by a hyper-parameter module. We apply our model to image deblurring and super-resolution. Experimental results show that the IDDNet significantly outperforms state-of-the-art unfolding networks.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142230612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LG-AKD: Application of a lightweight GCN model based on adversarial knowledge distillation to skeleton action recognition","authors":"","doi":"10.1016/j.jvcir.2024.104286","DOIUrl":"10.1016/j.jvcir.2024.104286","url":null,"abstract":"<div><p>Human action recognition, a pivotal topic in computer vision, is a highly complex and challenging task. It requires the analysis of not only spatial dependencies of targets but also temporal changes in these targets. In recent decades, the advancement of deep learning has led to the development of numerous action recognition methods based on deep neural networks. Given that the skeleton points of the human body can be treated as a graph structure, graph neural networks (GNNs) have emerged as an effective tool for modeling such data, garnering significant interest from researchers. This paper aims to address the issue of low test speed caused by overly complicated deep graph convolutional models. To achieve this, we compress the network structure using knowledge distillation with a teacher-student architecture, leading to a compact and lightweight student GNN. To enhance the model’s robustness and generalization capabilities, we introduce a data augmentation mechanism that generates diverse action sequences while maintaining consistent behavior labels, thereby providing a more comprehensive learning basis for the model. The proposed model integrates three distinct knowledge learning paths: teacher networks, original datasets, and derived data. The fusion of knowledge distillation and data augmentation enables lightweight student networks to outperform their teacher networks in terms of both performance and efficiency. Experimental results demonstrate the efficacy of our approach in the context of skeleton-based human action recognition, highlighting its potential to simplify state-of-the-art models while enhancing their performance.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142167738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-capacity reversible data hiding in encrypted images based on adaptive block coding selection","authors":"","doi":"10.1016/j.jvcir.2024.104291","DOIUrl":"10.1016/j.jvcir.2024.104291","url":null,"abstract":"<div><p>Recently, data hiding techniques have flourished and addressed various challenges. However, reversible data hiding for encrypted images (RDHEI) using the vacating-room-after-encryption (VRAE) framework often falls short in terms of data embedding performance. To address this issue, this paper proposes a novel, high-capacity data hiding method based on adaptive block coding selection. Specifically, iterative encryption and block permutation are applied during image encryption to maintain high pixel correlation within blocks. For each block in the encrypted image, both entropy coding and zero-valued high bit-plane compression coding are pre-applied; the coding method that vacates the most space is then selected, leveraging the strengths of both coding techniques to maximize the effective embeddable room of each encrypted block. This adaptive block coding selection mechanism is suitable for images with varying characteristics. Extensive experiments demonstrate that the proposed VRAE-based method outperforms state-of-the-art RDHEI methods in data embedding capacity. The average embedding rates (ERs) of the proposed method on three public datasets (BOSSbase, BOWS-2, and UCID) are 4.041 bpp, 3.929 bpp, and 3.181 bpp, respectively.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142239469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved semantic-guided network for skeleton-based action recognition","authors":"","doi":"10.1016/j.jvcir.2024.104281","DOIUrl":"10.1016/j.jvcir.2024.104281","url":null,"abstract":"<div><p>A fundamental issue in skeleton-based action recognition is the extraction of useful features from skeleton joints. Unfortunately, the current state-of-the-art models for this task tend to be overly complex and heavily parameterized, which results in low training and inference efficiency on large-scale datasets. In this work, we develop a simple yet efficient baseline for skeleton-based Human Action Recognition (HAR). The architecture is based on adaptive GCNs (Graph Convolutional Networks), which automatically capture the complex interconnections within skeletal structures without the need for a predefined topology. The GCNs are followed by an attention mechanism to learn more informative representations. The model achieves strong accuracy on the large-scale NTU-RGB+D 60 dataset: 89.7% and 95.0% on the Cross-Subject and Cross-View benchmarks, respectively, and 84.6% and 85.8% on the NTU-RGB+D 120 Cross-Subject and Cross-Setup settings, respectively. This work improves on the existing SGN (Semantic-Guided Neural Networks) model by extracting more discriminative spatial and temporal features.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1047320324002372/pdfft?md5=13e62eaf463a376574412ad44a346dd4&pid=1-s2.0-S1047320324002372-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel approach for long-term secure storage of domain independent videos","authors":"","doi":"10.1016/j.jvcir.2024.104279","DOIUrl":"10.1016/j.jvcir.2024.104279","url":null,"abstract":"<div><p>Long-term protection of multimedia content is a complex task, especially when the video contains critical elements, and it demands sophisticated technology to ensure confidentiality. In this paper, we propose a blended approach that uses a proactive visual cryptography scheme along with video summarization techniques to circumvent these issues. Proactive visual cryptography protects digital data by periodically updating or renewing the shares, which are stored on different servers. Video summarization schemes, in turn, are useful in scenarios where memory is a major concern. We use a domain-independent scheme for summarizing videos that is applicable to both edited and unedited videos. In our scheme, the visual continuity of the raw video is preserved even after summarization. The original video can be reconstructed from the shares using auxiliary data generated during the video summarization phase. Mathematical analysis and experimental results demonstrate the applicability of our proposed method.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142163278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VTPL: Visual and text prompt learning for visual-language models","authors":"","doi":"10.1016/j.jvcir.2024.104280","DOIUrl":"10.1016/j.jvcir.2024.104280","url":null,"abstract":"<div><p>Visual-language (V-L) models have achieved remarkable success in learning combined visual–textual representations from large web datasets. Prompt learning, as a solution for downstream tasks, can mitigate the forgetting of knowledge associated with fine-tuning. However, current methods focus on a single modality and fail to fully exploit multimodal information. This paper addresses these limitations by proposing visual and text prompt learning (VTPL), a novel approach that trains the model while enhancing both visual and text prompts. Visual prompts align visual features with text features, whereas text prompts enrich the semantic information of the text. Additionally, this paper introduces a poly-1 information noise contrastive estimation (InfoNCE) loss and a center loss to increase the interclass distance and decrease the intraclass distance. Experiments on 11 image datasets show that VTPL outperforms state-of-the-art methods, achieving 1.61%, 1.63%, 1.99%, 2.42%, and 2.87% performance boosts over CoOp for 1, 2, 4, 8, and 16 shots, respectively.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAFLFusionGait: Gait recognition network with separate attention and different granularity feature learnability fusion","authors":"","doi":"10.1016/j.jvcir.2024.104284","DOIUrl":"10.1016/j.jvcir.2024.104284","url":null,"abstract":"<div><p>Gait recognition, an essential branch of biometric identification, uses walking patterns to identify individuals. Despite its effectiveness, gait recognition faces challenges such as vulnerability to changes in appearance caused by factors like viewing angles and clothing conditions. Recent progress in deep learning has greatly enhanced gait recognition, especially through methods like deep convolutional neural networks, which demonstrate impressive performance. However, current approaches often overlook the connection between coarse-grained and fine-grained features, thereby restricting their overall effectiveness. To address this limitation, we propose a new gait recognition framework that combines deep-supervised fine-grained separation with coarse-grained feature learnability. Our framework includes the LFF module, which consists of the SSeg module for fine-grained information extraction and a mechanism for fusing coarse-grained features. Furthermore, we introduce the F-LCM module to extract local disparity features more effectively with learnable weights. Evaluation on the CASIA-B and OU-MVLP datasets shows superior performance compared to classical networks.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind image deblurring with a difference of the mixed anisotropic and mixed isotropic total variation regularization","authors":"","doi":"10.1016/j.jvcir.2024.104285","DOIUrl":"10.1016/j.jvcir.2024.104285","url":null,"abstract":"<div><p>This paper proposes a simple model for image deblurring with a new total variation regularization. Classically, the <em>L</em><sub>1-21</sub> regularizer represents the difference of the anisotropic (i.e. <em>L</em><sub>1</sub>) and isotropic (i.e. <em>L</em><sub>21</sub>) total variation. Accordingly, we define a new regularization, <em>L</em><sub>e-2e</sub>, as the weighted difference of the mixed anisotropic (i.e. <em>L</em><sub>0</sub> + <em>L</em><sub>1</sub> = <em>L</em><sub>e</sub>) and mixed isotropic (i.e. <em>L</em><sub>0</sub> + <em>L</em><sub>21</sub> = <em>L</em><sub>2e</sub>) total variation, which is characterized by sparsity promotion and robustness in image deblurring. Then, we merge the <em>L</em><sub>0</sub>-gradient into the model for edge preservation and detail removal. The union of the <em>L</em><sub>e-2e</sub> regularization and the <em>L</em><sub>0</sub>-gradient improves deblurring performance and yields high-quality blur kernel estimates. Finally, we design a new solution scheme that alternately iterates the difference-of-convex algorithm, the split Bregman method, and half-quadratic splitting to optimize the proposed model. Experimental results on quantitative datasets and real-world images show that the proposed method obtains results comparable to state-of-the-art works.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secret image sharing with distinct covers based on improved Cycling-XOR","authors":"","doi":"10.1016/j.jvcir.2024.104282","DOIUrl":"10.1016/j.jvcir.2024.104282","url":null,"abstract":"<div><p>Secret image sharing (SIS) is a technique used to distribute confidential data by dividing it into multiple image shadows. Most existing approaches protect confidential data by encryption with secret keys. This paper proposes a novel SIS scheme that requires no secret key. The secret images are first quantized and encrypted by self-encryption into noisy ones. Then, the encrypted images are mixed into secret shares by cross-encryption. The image shadows are generated by replacing the lower bit-planes of the cover images with the secret shares. In the extraction phase, the receiver can restore the quantized secret images by combinatorial operations on the extracted secret shares. Experimental results show that our method delivers a large data payload with satisfactory cover image quality. Moreover, the computational load is very low, since the whole scheme is mostly based on cycling-XOR operations.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142129509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Background adaptive PosMarker based on online generation and detection for locating watermarked regions in photographs","authors":"","doi":"10.1016/j.jvcir.2024.104269","DOIUrl":"10.1016/j.jvcir.2024.104269","url":null,"abstract":"<div><p>Robust watermarking technology can embed invisible messages in screens to trace the source of unauthorized screen photographs. Locating the four vertices of the embedded region in the photograph is necessary, as existing watermarking methods require geometric correction of the embedded region before revealing the message. Existing localization methods suffer from a trade-off: they either degrade visual quality by embedding visible markers or achieve poor localization precision, leading to message extraction failure. To address this issue, we propose a background adaptive position marker, PosMarker, based on the gray level co-occurrence matrix and the noise visibility function. In addition, we propose an online generation scheme that employs a learnable generator to cooperate with the detector, allowing joint optimization between the two. This improves both visual quality and detection precision simultaneously. Extensive experiments demonstrate the superior localization precision of our PosMarker-based method compared to others.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}