{"title":"Advancing white balance correction through deep feature statistics and feature distribution matching","authors":"Furkan Kınlı , Barış Özcan , Furkan Kıraç","doi":"10.1016/j.jvcir.2025.104412","DOIUrl":"10.1016/j.jvcir.2025.104412","url":null,"abstract":"<div><div>Auto-white balance (AWB) correction is a crucial process in digital imaging, ensuring accurate and consistent color correction across varying lighting conditions. This study presents an innovative AWB correction method that conceptualizes lighting conditions as the style factor, allowing for more adaptable and precise color correction. Previous studies predominantly relied on Gaussian distribution assumptions for feature distribution alignment, which can limit the ability to fully exploit the style information as a modifying factor. To address this limitation, we propose a U-shaped Transformer-based architecture, where the learning objective of style factor enforces matching deep feature statistics using the Exact Feature Distribution Matching algorithm. Our proposed method consistently outperforms existing AWB correction techniques, as evidenced by both extensive quantitative and qualitative analyses conducted on the Cube+ and a synthetic mixed-illuminant dataset. Furthermore, a systematic component-wise analysis provides deeper insights into the contributions of each element, further validating the robustness of the proposed approach.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"108 ","pages":"Article 104412"},"PeriodicalIF":2.6,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143520307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SiamTP: A Transformer tracker based on target perception","authors":"Ying Ren , Zhenhai Wang , YiJun Jing , Hui Chen , Lutao Yuan , Hongyu Tian , Xing Wang","doi":"10.1016/j.jvcir.2025.104426","DOIUrl":"10.1016/j.jvcir.2025.104426","url":null,"abstract":"<div><div>Previous trackers based on Siamese network and transformer do not interact with the feature extraction stage during the feature fusion, excessive weight of the target features in the template area when the target deformation is large during feature fusion, causing target loss. This paper proposes a target tracking framework with target perception based on Siamese network and transformer. First, feature extraction was performed on the template area and search area and the extracted features were enhanced. A concatenation operation is used to combine them. Second, we used the feature perception obtained during the final stage of attention enhancement by searching for images to rank them and extracted the features with higher scores to enhance the feature fusion effect. Experimental results showed that the proposed tracker achieves good results on four common and challenging datasets while running at real-time speed with a speed of approximately 50 fps on a GPU.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"108 ","pages":"Article 104426"},"PeriodicalIF":2.6,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143550992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic emotional memory analysis in digital animation via expression recognition and scene atmosphere enhancement","authors":"Pengju Pu , Jianjun Hao , Dingding Ma , Jiangting Yan","doi":"10.1016/j.jvcir.2025.104427","DOIUrl":"10.1016/j.jvcir.2025.104427","url":null,"abstract":"<div><div>Digital animation serves as a crucial medium for conveying emotions and values. To grasp the emotional perspectives within digital animation, this paper introduces an emotional memory analysis approach grounded in dynamic features. Initially, leveraging the bidirectional alignment mechanism, a CNN-based expression recognition system is proposed to extract the expressive information of characters in the animation. Subsequently, a sentiment analysis technique tailored for enhancing single-frame animation scenes is presented, addressing the issue of atmosphere intensification in these scenes. Ultimately, by integrating expression information and atmosphere data, a sentiment analysis method focused on dynamic features is suggested to establish the emotional relationship between frames, thereby deriving the emotional value of digital animation. Experiments can conclude that our method can obtain the excellent results which are better than state-of-the-art and can realize the emotional polarity analysis of digital animation content.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"108 ","pages":"Article 104427"},"PeriodicalIF":2.6,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143550991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel video anomaly detection using hybrid sand cat Swarm optimization with backpropagation neural network by UCSD Ped 1 dataset","authors":"Perumal Pitchandi , Vijaya Bhaskar Sadu , V. Kalaipoonguzhali , M. Arivukarasi","doi":"10.1016/j.jvcir.2025.104414","DOIUrl":"10.1016/j.jvcir.2025.104414","url":null,"abstract":"<div><div>Abnormal symptom detection can reduce the operating costs of power companies, human resource costs, and improve the quality of power grid services. Nonetheless, intrusion detection systems encounter dimensionality problems as next-generation communication networks become increasingly varied and linked. In this study, a backpropagation neural algorithm (BP) is proposed for a Sand Cat Swarm Optimization (SCSO) electrical inspection anomaly detection model. Using genetic algorithms (GA) for feature selection and slope reduction optimization, a novel intrusion detection model was proposed. To substantially enhance the detection capabilities of the proposed model, we initially employed a genetic algorithm-based approach to select highly relevant feature sets from the UCSD Ped 1 dataset. Subsequently, we utilized the SCSO method to train a backpropagation neural network (BPNN). The hybrid SCSO-BPNN approach was employed to address binary and multiclass classification challenges using the UCSD Ped 1 dataset. The effectiveness of the Hybrid Sand Cat Swarm Optimization algorithm and backpropagation model was validated through the application of both the UCSD Ped1 and KDD 99 datasets. In the proposed hybrid algorithm, the SCSO operator plays a crucial role in detecting the globally optimal solution, while simultaneously preventing the BP from becoming trapped in a local optimum. The evaluation of the algorithm effectiveness revealed that SCSO exhibited superior performance compared to PSO, GWO, and GA in terms of both solution quality and consistency. SCSO was employed to identify the optimal weights and thresholds for BP. The proposed model was validated using the electrical data supplied by utility companies during the experimental stage. The findings indicate that the SCSO-BP algorithm consistently achieves a decision-making precision exceeding 99.75%. This algorithm proved to be well suited for power grid surveillance and outperformed other algorithms in terms of accuracy.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"108 ","pages":"Article 104414"},"PeriodicalIF":2.6,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143551100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtual try-on with Pose-Aware diffusion models","authors":"Taenam Park, Seoung Bum Kim","doi":"10.1016/j.jvcir.2025.104424","DOIUrl":"10.1016/j.jvcir.2025.104424","url":null,"abstract":"<div><div>Image-based virtual try-on (VTON) refers to the task of synthesizing realistic images of a person wearing a target garment based on reference images. Existing approaches use diffusion models that demonstrate outstanding performance in image synthesis tasks but often fail in preserving the pose and body features of the reference person in certain cases. To address these limitations, we propose Pose-Aware Virtual Try-ON (PA-VTON), a methodology that uses a pretrained diffusion-based VTON framework and additional modules that specify in preserving the information of a person’s attributes. Our proposed module, PoseNet, adds spatial conditioning controls to the VTON process to enhance pose consistency preservation. Experimental results on two benchmark datasets demonstrate that our proposed method quantitatively improves image synthesis performance while qualitatively resolving issues such as ghosting effects and improper generation of body parts that previous methods struggled with.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"108 ","pages":"Article 104424"},"PeriodicalIF":2.6,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143526725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SG-UNet: Hybrid self-guided transformer and U-Net fusion for CT image segmentation","authors":"Chunjie Lv , Biyuan Li , Gaowei Sun , Xiuwei Wang , Pengfei Cai , Jun Yan","doi":"10.1016/j.jvcir.2025.104416","DOIUrl":"10.1016/j.jvcir.2025.104416","url":null,"abstract":"<div><div>In recent years, transformer-based paradigms have made substantial inroads in the domain of CT image segmentation, The Swin Transformer has garnered praise for its strong performance, but it often struggles with capturing fine-grained details, especially in complex tasks like CT image segmentation, where distinguishing subtle differences in key areas is challenging. Additionally, due to its fixed window attention mechanism, Swin Transformer tends to overemphasize local features while overlooking global context, leading to insufficient understanding of critical information and potential loss of important details. To address the limitations of the Swin Transformer, we introduce an innovative U-shaped Hybrid Self-Guided Transformer network (SG-UNet), specifically tailored for CT image segmentation. Our approach refines the self-attention mechanism by integrating hybrid attention with self-guided attention. The hybrid attention mechanism employs adaptive fine-grained global self-attention to capture low-level details and guide token assignment in salient regions, while the self-guided attention dynamically reallocates tokens, prioritizing target regions and reducing attention computation for non-target areas. This synergy enables the model to autonomously refine saliency maps and reassign tokens based on regional importance. To enhance training dynamics, we incorporate a combination of CELoss and BDLoss, which improves training stability, mitigates gradient instability, and accelerates convergence. Additionally, a dynamic learning rate adjustment strategy is employed to optimize the model’s learning process in real-time, ensuring smoother convergence and enhanced performance. Empirical validation on the Synapse and lung datasets demonstrates the superior segmentation performance of the Hybrid Self-Guided Transformer UNet, achieving DSC and HD scores of 82.91 % and 16.46 mm on the Synapse dataset, and 98.13 % and 6.34 mm on the lung dataset, respectively. These results underscore both the effectiveness and the advanced capabilities of our model in segmentation tasks.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"108 ","pages":"Article 104416"},"PeriodicalIF":2.6,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143488674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic textures modeling and its application in texture structure decomposition","authors":"Samah Khawaled , Yehoshua Y. Zeevi","doi":"10.1016/j.jvcir.2025.104411","DOIUrl":"10.1016/j.jvcir.2025.104411","url":null,"abstract":"<div><div>Natural stochastic textures coexist in images with complementary edge-type structural elements that constitute the cartoon-type skeleton of an image. Separating texture from the structure of natural image is an important inverse problem in image analysis. In this decomposition, the textural layer, which conveys fine details and small-scale variations, is separated from the image macrostructures (edges and contours). We propose a variational texture-structure separation scheme. Our approach involves texture modeling by a stochastic field; The 2D fractional Brownian motion (fBm), a non-stationary Gaussian self-similar process, which is suitable model for pure natural stochastic textures. We use it as a reconstruction prior to extract the corresponding textural element and show that this separation is crucial for improving the execution of various image processing tasks such as image denoising. Lastly, we highlight how manifold-based representation of texture-structure data, can be implemented in extraction of geometric features and construction of a classification space.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104411"},"PeriodicalIF":2.6,"publicationDate":"2025-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RQVR: A multi-exposure image fusion network that optimizes rendering quality and visual realism","authors":"Xiaokang Liu , Enlong Wang , Huizi Man , Shihua Zhou , Yueping Wang","doi":"10.1016/j.jvcir.2025.104410","DOIUrl":"10.1016/j.jvcir.2025.104410","url":null,"abstract":"<div><div>Deep learning has made significant strides in multi-exposure image fusion in recent years. However, it is still challenging to maintain the integrity of texture details and illumination. This paper proposes a novel multi-exposure image fusion method to optimize Rendering Quality and Visual Realism (RQVR), addressing limitations in recovering details lost under extreme lighting conditions. The Contextual and Edge-aware Module (CAM) enhances image quality by balancing global features and local details, ensuring the texture details of fused images. To enhance the realism of visual effects, an Illumination Equalization Module (IEM) is designed to optimize light adjustment. Moreover, a fusion module (FM) is introduced to minimize information loss in the fused images. Comprehensive experiments conducted on two datasets demonstrate that our proposed method surpasses existing state-of-the-art techniques. The results show that our method not only attains substantial improvements in image quality but also outperforms most advanced techniques in terms of computational efficiency.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104410"},"PeriodicalIF":2.6,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143394394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capsule network with using shifted windows for 3D human pose estimation","authors":"Xiufeng Liu , Zhongqiu Zhao , Weidong Tian , Binbin Liu , Hongmei He","doi":"10.1016/j.jvcir.2025.104409","DOIUrl":"10.1016/j.jvcir.2025.104409","url":null,"abstract":"<div><div>3D human pose estimation (HPE) is a vital technology with diverse applications, enhancing precision in tracking, analyzing, and understanding human movements. However, 3D HPE from monocular videos presents significant challenges, primarily due to self-occlusion, which can partially hinder traditional neural networks’ ability to accurately predict these positions. To address this challenge, we propose a novel approach using a capsule network integrated with the shifted windows attention model (SwinCAP). It improves prediction accuracy by effectively capturing the spatial hierarchical relationships between different parts and objects. A Parallel Double Attention mechanism is applied in SwinCAP enhances both computational efficiency and modeling capacity, and a Multi-Attention Collaborative module is introduced to capture a diverse range of information, including both coarse and fine details. Extensive experiments demonstrate that our SwinCAP achieves better or comparable results to state-of-the-art models in the challenging task of viewpoint transfer on two commonly used datasets: Human3.6M and MPI-INF-3DHP.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"108 ","pages":"Article 104409"},"PeriodicalIF":2.6,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143474872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic-guided face inpainting with subspace pyramid aggregation","authors":"Yaqian Li, Xiumin Zhang, Cunjun Xiao","doi":"10.1016/j.jvcir.2025.104408","DOIUrl":"10.1016/j.jvcir.2025.104408","url":null,"abstract":"<div><div>With the recent advancement of Generative Adversarial Networks, image inpainting has been improved, but the complexity of face structure makes face inpainting more challenging. The main reasons are attributed to two points: (1) the lack of geometry relation between facial features to synthesize fine textures, and (2) the difficulty of repairing occluded area based on known pixels at a distance, especially when the face is occluded over a large area. This paper proposes a face inpainting method based on semantic feature guidance and aggregated subspace pyramid module, where we use the semantic features of masked faces as the prior knowledge to guide the inpainting of masked areas. Besides, we propose an ASPM (Aggregated Subspace Pyramid Module), which aggregates contextual information from different receptive fields and allows the of capturing distant information. We do experiments on the CelebAMask-HQ dataset and the FlickrFaces-HQ dataset, qualitative and quantitative studies show that it surpasses state-of-the-art methods. Code is available at <span><span>https://github.com/xiumin123/Face_</span><svg><path></path></svg></span> inpainting.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"108 ","pages":"Article 104408"},"PeriodicalIF":2.6,"publicationDate":"2025-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143444916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}