{"title":"Deep video steganography using temporal-attention-based frame selection and spatial sparse adversarial attack","authors":"Beijing Chen , Yuting Hong , Yuxin Nie","doi":"10.1016/j.jvcir.2024.104311","DOIUrl":"10.1016/j.jvcir.2024.104311","url":null,"abstract":"<div><div>With the development of deep learning-based steganalysis, video steganography is facing great challenges. To address the insufficient security against steganalysis of existing deep video steganography, given that the video has both spatial and temporal dimensions, this paper proposes a deep video steganography method using temporal frame selection and spatial sparse adversarial attack. In the temporal dimension, a stego frame selection module based on temporal attention is designed to calculate the weight of each frame and select frames with high weights for message and sparse perturbation embedding. In the spatial dimension, sparse adversarial perturbations are applied to the selected frames to improve resistance to steganalysis. Moreover, to control the adversarial perturbations’ sparsity flexibly, an intra-frame dynamic sparsity threshold mechanism is designed using percentiles. 
Experimental results demonstrate that the proposed method effectively enhances the visual quality and security against steganalysis of video steganography and has controllable sparsity of adversarial perturbations.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104311"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A robust coverless image-synthesized video steganography based on asymmetric structure","authors":"Yueshuang Jiao , Zhenzhen Zhang , Zhenzhen Li , Zichen Li , Xiaolong Li , Jiaoyun Liu","doi":"10.1016/j.jvcir.2024.104303","DOIUrl":"10.1016/j.jvcir.2024.104303","url":null,"abstract":"<div><div>Due to its ability to hide secret information without modifying image content, coverless image steganography has achieved a higher level of security and become a research hot spot. However, in existing methods, the issue of image order disruption during network transmission is overlooked. In this paper, the image-synthesized video carrier is proposed for the first time. The selected images which represent secret information are synthesized into a video in order, thus the image order will not be disrupted during transmission and the effective capacity is greatly increased. Additionally, an asymmetric structure is designed to improve the robustness, in which only the receiver utilizes a robust image retrieval algorithm to restore secret information. Specifically, certain images are randomly selected from a public image database to create multiple coverless image datasets (MCIDs), with each image in a CID mapped to a hash sequence. Images are indexed based on secret segments and synthesized into videos. After that, the synthesized videos are sent to the receiver. The receiver decodes the video into frames, identifies the corresponding CID of each frame, retrieves the original image, and restores the secret information with the same mapping rule. 
Experimental results indicate that the proposed method outperforms existing methods in terms of capacity, robustness, and security.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104303"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142433307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized deep learning enabled lecture audio video summarization","authors":"Preet Chandan Kaur , Dr. Leena Ragha","doi":"10.1016/j.jvcir.2024.104309","DOIUrl":"10.1016/j.jvcir.2024.104309","url":null,"abstract":"<div><div>Video summarization plays an important role in multiple applications by condensing lengthy video content into a compact representation. The purpose of this work is to present a fine-tuned deep model for lecture audio video summarization. Initially, the input lecture audio-visual video is taken from the dataset. Then, the video shot segmentation (slide segmentation) is performed using the YCbCr colour space. From each video shot, the audio and video within the video shot are segmented using the Honey Badger-based Bald Eagle Algorithm (HBBEA). The HBBEA is obtained by combining the Bald Eagle Search (BES) and Honey Badger Algorithm (HBA). The DRN training is executed by HBBEA to select the best DRN weights. The relevant video frames are merged with the audio. The proposed HBBEA-based DRN achieved an F1-Score of 91.9 %, a negative predictive value (NPV) of 89.6 %, a positive predictive value (PPV) of 90.7 %, an accuracy of 91.8 %, a precision of 91 %, and a recall of 92.8 %.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104309"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Transformer-based invertible neural network for robust image watermarking","authors":"Zhouyan He , Renzhi Hu , Jun Wu , Ting Luo , Haiyong Xu","doi":"10.1016/j.jvcir.2024.104317","DOIUrl":"10.1016/j.jvcir.2024.104317","url":null,"abstract":"<div><div>For the existing encoder-noise-decoder (END) based watermarking models, since the coupling between the encoder and the decoder is weak, the encoder generally embeds certain redundant features into the cover image to enable the decoder to extract watermark completely, which will affect watermarking invisibility. To address this problem, this paper proposes a Transformer-based invertible neural network (INN) for robust image watermarking (IWFormer). In order to effectively reduce redundant features, the INN framework is utilized for the watermark embedding and extracting processes, so that the encoded features are highly consistent with the features required for decoding. For enhancing watermarking robustness, an affine Transformer module is designed by mining the global correlation of the cover image. In addition, considering that the human visual system is sensitive to low-frequency variations, the wavelet low-frequency sub-band loss is deployed to guide watermark to be embedded in middle- and high-frequency components, thus further increasing the quality of the encoded images. 
Experimental results demonstrate that, compared with existing state-of-the-art watermarking models, the proposed IWFormer has remarkable advantages in terms of both watermarking invisibility and robustness.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104317"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142446136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A robust watermarking approach for medical image authentication using dual image and quorum function","authors":"Ashis Dey , Partha Chowdhuri , Pabitra Pal , Utpal Nandi","doi":"10.1016/j.jvcir.2024.104299","DOIUrl":"10.1016/j.jvcir.2024.104299","url":null,"abstract":"<div><div>To safeguard the identity and copyright of a patient’s medical documents, watermarking strategies are widely used. This work provides a new dual image-based watermarking approach using the quorum function (QF) and AD interpolation technique. AD interpolation is used to create the dual images, which helps to increase the embedding capacity. Moreover, the rules for using the QF are designed in such a way that the original bits are least affected after embedding. As a result, it increases the visual quality of the stego images. A shared secret key is employed to protect the information hidden in the medical image and to maintain privacy and confidentiality. The experimental results using PSNR, SSIM, NCC, and EC show that the suggested technique gives an average PSNR of 68.44 dB and an SSIM close to 0.99 after inserting 786432 watermark bits, which demonstrates the superiority of the scheme over other state-of-the-art schemes.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104299"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPIDN: A Hierarchical prior-guided iterative denoising network with global–local fusion for enhancing low-dose CT images","authors":"Xiuya Shi , Yi Yang , Hao Liu , Litai Ma , Zhibo Zhao , Chao Ren","doi":"10.1016/j.jvcir.2024.104297","DOIUrl":"10.1016/j.jvcir.2024.104297","url":null,"abstract":"<div><div>Low-dose computed tomography (LDCT) is an emerging medical diagnostic tool that reduces radiation exposure but suffers from noise retention. Current CNN-based LDCT denoising algorithms struggle to capture comprehensive global representations, impacting diagnostic accuracy. To address this, we propose a novel Hierarchical Prior-guided Iterative Denoising Network (HPIDN) for LDCT images, consisting of two main modules: the Dynamic Feature Extraction and Fusion Module (DFEFM) and the Feature-domain Iterative Denoising Module (FIDM). DFEFM dynamically captures a comprehensive representation, encompassing detailed local features in intra-relationships and complex global features in inter-relationships. It effectively guides the multi-stage iterative denoising process. FIDM hierarchically fuses the prior with image features from DFEFM by using the dual-domain attention fusion sub-network (DAFSN), enhancing denoising robustness and adaptability. This yields higher-quality images with reduced noise artifacts. 
Extensive experiments on the Mayo and ELCAP Datasets demonstrate the superior performance of our method quantitatively and qualitatively, improving diagnostic accuracy of lung diseases.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104297"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lossless medical ultrasound image compression based on frequency domain decomposition","authors":"Yaqi Zhao, Yue Li","doi":"10.1016/j.jvcir.2024.104306","DOIUrl":"10.1016/j.jvcir.2024.104306","url":null,"abstract":"<div><div>Medical ultrasound imaging is a widely used non-invasive method for diagnosing diseases. However, these images contain significant speckle noise, which differs from the characteristics of natural images. This makes effective lossless compression of medical ultrasound images a challenging task. In this paper, we propose a novel hybrid ultrasound image lossless learning compression framework. Firstly, we use the traditional DCT (discrete cosine transform) to transform the original raw pixels of ultrasound images into the frequency domain. Secondly, to effectively compress the numerical values in the frequency domain, we decompose the DCT coefficients into different groups to reduce local and global information redundancy in the frequency domain. Finally, we use learned and non-learned methods to compress the DCT coefficients of different groups separately. The experimental results show that on the Breast ultrasound image dataset, our proposed method achieves a bit rate reduction of 8.6% to 68.9% compared to learned and non-learned methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104306"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local and global mixture network for image inpainting","authors":"Seunggyun Woo , Keunsoo Ko , Chang-Su Kim","doi":"10.1016/j.jvcir.2024.104312","DOIUrl":"10.1016/j.jvcir.2024.104312","url":null,"abstract":"<div><div>In general, CNN-based inpainting can recover local patterns effectively using convolutional filters, but it may not exploit global correlation fully. On the other hand, transformer-based inpainting can fill in large holes faithfully based on global correlation rather than local correlation. In this paper, we propose a novel image inpainting algorithm, called local and global mixture (LGM), to take advantage of the strengths of both approaches and compensate for their weaknesses. The LGM network comprises the local inpainting network (LIN) and the global inpainting network (GIN) in parallel, which are based on convolutional layers and transformer blocks, respectively, and exchange their intermediate results with each other. Furthermore, we develop an error propagation model with a continuous error mask, updated in LIN but used in both LIN and GIN to provide more reliable inpainting results. Extensive experiments demonstrate that the proposed LGM algorithm provides excellent inpainting performance, which indicates the efficacy of the parallel combination of LIN and GIN and the effectiveness of the error propagation model.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104312"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An illumination-guided dual-domain network for image exposure correction","authors":"Jie Yang, Yuantong Zhang, Zhenzhong Chen, Daiqin Yang","doi":"10.1016/j.jvcir.2024.104313","DOIUrl":"10.1016/j.jvcir.2024.104313","url":null,"abstract":"<div><div>Exposure problems, including underexposure and overexposure, can significantly degrade image quality. Poorly exposed images often suffer from coupled illumination degradation and detail degradation, aggravating the difficulty of recovery. These problems necessitate spatially discriminative exposure correction, which makes it challenging to achieve uniformly exposed and visually consistent images. To address these issues, we propose an Illumination-guided Dual-domain Network (IDNet), which employs a Dual-Domain Module (DDM) to simultaneously recover illumination and details from the frequency and spatial domains, respectively. The DDM also integrates a structural re-parameterization technique to enhance the detail-aware capabilities with reduced computational cost. An Illumination Mask Predictor (IMP) is introduced to guide exposure correction by estimating the optimal illumination mask. The comparison with 26 methods on three benchmark datasets shows that IDNet achieves superior performance with fewer parameters and lower computational complexity. 
These results confirm the effectiveness and efficiency of our approach in enhancing image quality across various exposure scenarios.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104313"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142446137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust text watermarking based on average skeleton mass of characters against cross-media attacks","authors":"Xinyi Huang, Hongxia Wang","doi":"10.1016/j.jvcir.2024.104300","DOIUrl":"10.1016/j.jvcir.2024.104300","url":null,"abstract":"<div><div>The widespread circulation of digital documents makes it essential to protect intellectual property and information security. As a key method of digital copyright protection, robust document watermarking technology has attracted much attention in this context. With the rapid development of current electronic devices, the ways of document theft are no longer limited to copying and transmission. Because a camera can conveniently and quickly photograph paper or a screen, current text watermarking methods need to be robust to cross-media transmission. To realize the corresponding robust text watermarking, a text watermarking scheme based on the average skeleton mass of characters is proposed in this paper, and the average skeleton mass of adjacent characters is used to represent the watermark information. In this paper, a watermarking scheme is designed to modify character pixels, which can modify glyphs without loss of transparency and provide high embedding capacity. Compared with the existing manually designed font-based text watermarking schemes, this scheme does not need to accurately segment characters, nor does it rely on stretching characters to the same size for matching, which reduces the need for character segmentation. 
In addition, the experimental results show that the proposed watermarking scheme can be robust to the information transmission modes including print-scan, print-camera and screen-camera.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104300"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}