Jie Yang, Yuantong Zhang, Zhenzhong Chen, Daiqin Yang
{"title":"An illumination-guided dual-domain network for image exposure correction","authors":"Jie Yang, Yuantong Zhang, Zhenzhong Chen, Daiqin Yang","doi":"10.1016/j.jvcir.2024.104313","DOIUrl":"10.1016/j.jvcir.2024.104313","url":null,"abstract":"<div><div>Exposure problems, including underexposure and overexposure, can significantly degrade image quality. Poorly exposed images often suffer from coupled illumination and detail degradation, which makes recovery more difficult. These degradations call for spatially discriminative exposure correction, which makes it challenging to obtain uniformly exposed and visually consistent images. To address these issues, we propose an Illumination-guided Dual-domain Network (IDNet), which employs a Dual-Domain Module (DDM) to simultaneously recover illumination and details from the frequency and spatial domains, respectively. The DDM also integrates a structural re-parameterization technique to enhance detail awareness at reduced computational cost. An Illumination Mask Predictor (IMP) is introduced to guide exposure correction by estimating the optimal illumination mask. Comparisons with 26 methods on three benchmark datasets show that IDNet achieves superior performance with fewer parameters and lower computational complexity.
These results confirm the effectiveness and efficiency of our approach in enhancing image quality across various exposure scenarios.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104313"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142446137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
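The IDNet abstract credits structural re-parameterization for better detail awareness at lower cost but does not describe the DDM's branches. The general idea (popularized by RepVGG) is that parallel linear convolution branches used during training can be folded into one kernel for inference. The following is a minimal single-channel sketch, assuming hypothetical 3x3, 1x1 and identity branches and omitting batch normalization; it is not IDNet's actual DDM.

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 2-D cross-correlation with zero padding ('same' output size)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding by default
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def fuse_branches(k3, k1, with_identity=True):
    """Fold a 3x3 branch, a 1x1 branch and an optional identity branch into one 3x3 kernel."""
    fused = k3.astype(float).copy()
    fused[1, 1] += k1[0, 0]  # a 1x1 kernel equals a 3x3 kernel that is zero except at the centre
    if with_identity:
        fused[1, 1] += 1.0   # the identity branch is a 1x1 kernel with weight 1
    return fused

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal((1, 1))

# Training-time form: three parallel branches whose outputs are summed.
multi_branch = conv2d_same(x, k3) + conv2d_same(x, k1) + x
# Inference-time form: a single 3x3 convolution with the fused kernel.
single_branch = conv2d_same(x, fuse_branches(k3, k1))
print(np.allclose(multi_branch, single_branch))  # True
```

Because every branch is linear and shares the same centre alignment, the sum of branch outputs equals one convolution with the summed kernels; the multi-branch form only costs extra compute during training.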
{"title":"Robust text watermarking based on average skeleton mass of characters against cross-media attacks","authors":"Xinyi Huang, Hongxia Wang","doi":"10.1016/j.jvcir.2024.104300","DOIUrl":"10.1016/j.jvcir.2024.104300","url":null,"abstract":"<div><div>The widespread use of digital documents makes it essential to protect intellectual property and information security. As a key method of digital copyright protection, robust document watermarking has attracted much attention in this context. With the rapid development of electronic devices, document theft is no longer limited to copying and transmission. Because cameras make it quick and convenient to photograph paper or screens, text watermarking methods must be robust to cross-media transmission. To achieve such robustness, this paper proposes a text watermarking scheme based on the average skeleton mass of characters, in which the average skeleton mass of adjacent characters represents the watermark information. The scheme embeds the watermark by modifying character pixels, which alters glyphs without loss of transparency and provides high embedding capacity. Compared with existing manually designed font-based text watermarking schemes, this scheme neither requires accurate character segmentation nor relies on stretching characters to the same size for matching, which reduces its dependence on character segmentation.
In addition, the experimental results show that the proposed watermarking scheme is robust to transmission modes including print-scan, print-camera, and screen-camera.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104300"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective image compression using hybrid DCT and hybrid capsule auto encoder for brain MR images","authors":"Bindu Puthentharayil Vikraman , Jabeena Afthab","doi":"10.1016/j.jvcir.2024.104296","DOIUrl":"10.1016/j.jvcir.2024.104296","url":null,"abstract":"<div><div>Image compression is increasingly used in many fields because it reduces storage and transmission costs. This work introduces a medical image (MI) compression model for brain magnetic resonance images (MRI) to mitigate bandwidth and storage issues. First, the inputs are pre-processed with the Adaptive Linear Smoothing and Histogram Equalization (ALSHE) method to suppress noise. Then, the Region of Interest (ROI) and Non-ROI parts are segmented separately by the Optimized Fuzzy C-Means (OFCM) approach to reduce complexity. Next, a novel Hybrid Discrete Cosine Transform-Improved Zero Wavelet (DCT-IZW) is proposed for lossless compression, and a Hybrid Equilibrium Optimization-Capsule Auto Encoder (EO-CAE) for lossy compression. Finally, the compressed ROI and Non-ROI images are combined, and the inverse of the compression process is applied to obtain the reconstructed image. This study used the BRATS (2015, 2018) datasets for simulation and attained better performance than existing methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104296"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image quilting heuristic compressed sensing video privacy protection coding for abnormal behavior detection in private scenes","authors":"Jixin Liu, Shabo Hu, Haigen Yang, Ning Sun","doi":"10.1016/j.jvcir.2024.104307","DOIUrl":"10.1016/j.jvcir.2024.104307","url":null,"abstract":"<div><div>For video intelligence applications in private scenes such as home environments, traditional image processing methods usually operate on clear raw data and are therefore prone to privacy leakage. Our team therefore proposed multilayer compressed sensing (MCS) encoding, which reduces image quality for visual privacy protection (VPP). However, the way MCS coding is implemented leads to unavoidable information loss. Building on this, and inspired by the image quilting (IQ) algorithm, this paper proposes an image quilting heuristic MCS (IQ-MCS) coding method that mitigates the information loss of the MCS coding process: a similar privacy protection effect is achieved at lower coding layers, which yields better application performance. To evaluate the level of VPP, a VPP evaluation algorithm that agrees more closely with subjective assessment is proposed. Finally, taking the detection of abnormal human behavior in private scenes as an example, a correlation model between the VPP level and the performance of smart applications is established to balance the trade-off between them.
The model can also provide a reference for the evaluation of other privacy protection methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104307"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
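The abstract does not spell out how MCS encoding works. As a rough illustration only, the sketch below assumes that each "layer" applies a random Gaussian block compressed sensing measurement y = Φx to every non-overlapping image block and then forms a naive minimum-norm estimate Φ⁺y (not a sparse-recovery solver). This shows why stacking layers can only discard information. The function names, block size and sampling ratio are hypothetical, and this is not the authors' MCS or IQ-MCS coder.

```python
import numpy as np

def cs_layer(img, block=8, ratio=0.5, rng=None):
    """One hypothetical CS layer: measure each block with a random Gaussian matrix
    (y = phi @ x) and return the minimum-norm estimate pinv(phi) @ y.
    Image height and width are assumed to be multiples of `block`."""
    rng = np.random.default_rng() if rng is None else rng
    n = block * block
    m = max(1, int(ratio * n))                   # measurements per block (m < n)
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    recon_op = np.linalg.pinv(phi) @ phi         # orthogonal projection onto the row space of phi
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for i in range(0, h, block):
        for j in range(0, w, block):
            x = img[i:i + block, j:j + block].reshape(n)
            out[i:i + block, j:j + block] = (recon_op @ x).reshape(block, block)
    return out

def multilayer_cs(img, layers=3, **kwargs):
    """Apply cs_layer repeatedly. Each layer is an orthogonal projection per block,
    so it cannot increase the image energy; it can only remove information."""
    rng = np.random.default_rng(0)
    outputs = [img.astype(float)]
    for _ in range(layers):
        outputs.append(cs_layer(outputs[-1], rng=rng, **kwargs))
    return outputs

img = np.random.default_rng(1).uniform(0, 255, size=(32, 32))
stages = multilayer_cs(img, layers=3)
norms = [float(np.linalg.norm(s)) for s in stages]
print([round(v, 1) for v in norms])  # energy shrinks as layers are added
```

A real privacy coder would store or transmit the measurements y and use a proper sparse reconstruction; the pseudo-inverse is used here only to make the per-layer information loss visible.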
{"title":"PVT2DNet: Polyp segmentation with vision transformer and dual decoder refinement strategy","authors":"Yibiao Hu, Yan Jin, Zhiwei Jiang, Qiufu Zheng","doi":"10.1016/j.jvcir.2024.104304","DOIUrl":"10.1016/j.jvcir.2024.104304","url":null,"abstract":"<div><div>Colorectal carcinoma is a prevalent malignancy worldwide. Accurate polyp segmentation, along with endoscopic resection, can significantly reduce its incidence and mortality. Most polyp segmentation networks are CNN-based and use a single-decoder architecture, which limits the robustness of the representations they learn. In this paper, we propose PVT2DNet, a novel network built on a vision transformer and a dual decoder refinement strategy, to overcome some limitations of current networks and achieve more precise automated polyp segmentation. PVT2DNet adopts a pyramid vision transformer encoder and enhances the multi-level features with a context-enhanced module (CEM). Moreover, instead of feeding features directly into a single decoder, we introduce a dual partial cascaded decoder refinement strategy to extract more informative polyp cues. Extensive experiments on five widely adopted datasets demonstrate that the proposed network outperforms other state-of-the-art methods on most metrics.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104304"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chao Wei , Fei Han , Zizhu Fan , Linrui Shi , Cheng Peng
{"title":"Efficient license plate recognition in unconstrained scenarios","authors":"Chao Wei , Fei Han , Zizhu Fan , Linrui Shi , Cheng Peng","doi":"10.1016/j.jvcir.2024.104314","DOIUrl":"10.1016/j.jvcir.2024.104314","url":null,"abstract":"<div><div>Automatic license plate recognition (ALPR) is a critical technology for intelligent transportation systems. Most existing ALPR methods focus on specific application scenarios, and the few methods that address unconstrained scenarios are very time-consuming. In this work, we propose an efficient ALPR (EALPR) framework that handles license plates (LP) distorted by perspective with high efficiency. We design a lightweight license plate detection (LPD) structure based on efficient object detection methods and use anchor-free strategies for LPD to reduce computational cost. Benefiting from these optimizations and a unified framework structure, the proposed EALPR runs in real time. We evaluate our method on five datasets, and the results show that it achieves state-of-the-art accuracy: 98.15% on OpenALPR(EU), 95.61% on OpenALPR(BR), 99.51% on AOLP(RP), 88.81% on SSIG, and 79.41% on CD-HARD. Additionally, our method runs at 74.9 FPS (frames per second), outperforming existing approaches and demonstrating its efficiency.
Our source code can be accessed at <span><span>https://github.com/wechao18/Efficient-alpr-unconstrained</span></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104314"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UAT:Unsupervised object tracking based on graph attention information embedding","authors":"Lixin Wei , Rongzhe Zhu , Ziyu Hu , Zeyu Xi","doi":"10.1016/j.jvcir.2024.104283","DOIUrl":"10.1016/j.jvcir.2024.104283","url":null,"abstract":"<div><div>A strong unsupervised tracker consists of a powerful base tracker and an effective unsupervised tracking strategy. However, most base trackers lack internal feature representations for the information embedding process, and most unsupervised trackers are not robust enough in complex environments and lack an effective template update strategy. To solve these problems, we propose an unsupervised object tracking method based on graph attention information embedding (UAT). UAT combines a graph attention mechanism with multi-scale features to construct a multi-scale graph attention module (MGA). The MGA module dynamically and efficiently embeds information between the template branch and the search area branch, so the response map obtained by fusing the feature maps of the two branches is more informative about the target's location. An attention-based information reinforcement update module (RUM) improves the robustness of the tracker: it enhances the feature map representation in both the spatial and channel dimensions, and template features are also updated indirectly through information transfer between the two branches. RUM suppresses background interference and improves network perception during tracking.
Experiments on challenging benchmarks such as VOT2018, VOT2019, TrackingNet, OTB100, LaSOT and UAV123 demonstrate that the proposed UAT achieves state-of-the-art performance among unsupervised trackers.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104283"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
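The UAT abstract describes the MGA module only at a high level. A common way to realize graph-attention information embedding in Siamese trackers is to treat every spatial position of the template and search feature maps as a graph node and let each search node aggregate the template nodes with learned attention weights. The following is a minimal single-scale sketch under that assumption. The projection matrices, feature sizes and concatenation-based fusion are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax along one axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention_embed(template, search, w_q, w_k, w_v):
    """template: (Nt, C) template nodes; search: (Ns, C) search-region nodes.
    Returns the fused search features (Ns, C + Cv) and the edge weights (Ns, Nt)."""
    q = search @ w_q      # queries from search-region nodes
    k = template @ w_k    # keys from template nodes
    v = template @ w_v    # values (messages) from template nodes
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=1)  # one weight per template->search edge
    messages = attn @ v   # template information aggregated at each search node
    return np.concatenate([search, messages], axis=1), attn

rng = np.random.default_rng(0)
C = 16
template = rng.standard_normal((7 * 7, C))   # 7x7 template feature map flattened to 49 nodes
search = rng.standard_normal((31 * 31, C))   # 31x31 search feature map flattened to 961 nodes
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
fused, attn = graph_attention_embed(template, search, w_q, w_k, w_v)
print(fused.shape, attn.shape)  # (961, 32) (961, 49)
```

Unlike plain cross-correlation, which slides the whole template as one rigid kernel, this node-to-node formulation lets each search position weight template parts differently. That part-level matching is the usual motivation for graph attention in tracking.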
{"title":"CAAM: A calibrated augmented attention module for masked face recognition","authors":"M. Saad Shakeel","doi":"10.1016/j.jvcir.2024.104315","DOIUrl":"10.1016/j.jvcir.2024.104315","url":null,"abstract":"<div><div>Along with other aspects of daily life, the COVID-19 pandemic has substantially affected the performance of facial recognition (FR) systems installed in various locations for identity verification. To address this issue, we propose an attention-guided masked face recognition (MFR) method, named Calibrated Augmented Attention Module (CAAM), which consists of two core components: a Recursive Attention Gate (RAG) and an Augmented Feature Calibration Block (AFCB). In the first stage, RAG guides the backbone network to attend to non-occluded face regions during feature learning by calibrating multi-layer features, while recursively and progressively reducing the network’s response to mask-occluded regions. In the second stage, a dual-branch AFCB first augments the attention map generated by RAG to incorporate cross-dimensional interactions, which are then calibrated to build spatial and inter-channel dependencies across informative spatial locations for MFR. Experiments conducted on various masked face datasets validate the superior performance of CAAM.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104315"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142446138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distance distributions and runtime analysis of perceptual hashing algorithms","authors":"Shivdutt Sharma","doi":"10.1016/j.jvcir.2024.104310","DOIUrl":"10.1016/j.jvcir.2024.104310","url":null,"abstract":"<div><div>Perceptual image hashing refers to a class of algorithms that produce content-based image hashes. Such systems use specialized perceptual hash algorithms such as pHash, Microsoft’s PhotoDNA, or Facebook’s PDQ to generate a compact digest of an image file that can be approximately matched against a database of digests of known illicit content. We first measured the time each perceptual hashing algorithm takes to generate a hash code. We then evaluated the algorithms on a dataset of two million images: nine variants of each original image were produced, and several distances were calculated. Previous studies exist, but they use small datasets, and very few examine the tradeoff between hash computation time and robustness. This work shows that existing perceptual hashing algorithms are robust to most content-preserving operations and that there is a tradeoff between computation time and robustness.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"104 ","pages":"Article 104310"},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142446135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
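To make "content-based hash" and "distance" concrete, below is a minimal DCT-based hash in the spirit of pHash, compared with the Hamming distance. It is a simplified sketch, not the exact algorithm of any pHash library, nor of PhotoDNA (proprietary) or PDQ. The block-averaging downsampler stands in for proper image resizing, and the function names and parameters are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def phash(gray, hash_size=8, dct_size=32):
    """64-bit DCT hash of a 2-D grayscale array whose sides are multiples of dct_size."""
    h, w = gray.shape
    # Downsample to dct_size x dct_size by block averaging (stand-in for resizing).
    small = gray.reshape(dct_size, h // dct_size, dct_size, w // dct_size).mean(axis=(1, 3))
    d = dct_matrix(dct_size)
    coeffs = d @ small @ d.T                 # separable 2-D DCT-II
    low = coeffs[:hash_size, :hash_size]     # keep the low-frequency corner
    med = np.median(low.flatten()[1:])       # median of the coefficients, excluding DC
    return (low > med).flatten()             # boolean vector of hash_size**2 bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:256, 0:256]
image = 128 + 60 * np.sin(xx / 20.0) * np.cos(yy / 35.0) + rng.normal(0, 5, (256, 256))
brighter = image + 10                        # a simple content-preserving edit
other = rng.uniform(0, 255, (256, 256))      # unrelated content
h0, h1, h2 = phash(image), phash(brighter), phash(other)
print(hamming(h0, h1), hamming(h0, h2))      # typically small for the edit, larger for unrelated content
```

A uniform brightness shift changes only the DC coefficient of an orthonormal DCT, so the AC bits are unchanged. This is one reason DCT hashes tolerate such edits. The abstract's timing-versus-robustness tradeoff arises because heavier transforms such as PDQ's cost more per image than simple hashes like this one.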