{"title":"PFFNet: A point cloud based method for 3D face flow estimation","authors":"Dong Li, Yuchen Deng, Zijun Huang","doi":"10.1016/j.jvcir.2024.104382","DOIUrl":"10.1016/j.jvcir.2024.104382","url":null,"abstract":"<div><div>In recent years, the research on 3D facial flow has received more attention, and it is of great significance for related research on 3D faces. Point cloud based 3D face flow estimation is inherently challenging due to non-rigid and large-scale motion. In this paper, we propose a novel method called PFFNet for estimating 3D face flow in a coarse-to-fine network. Specifically, an adaptive sampling module is proposed to learn sampling points, and an effective channel-wise feature extraction module is incorporated to learn facial priors from the point clouds, jointly. Additionally, to accommodate large-scale motion, we also introduce a normal vector angle upsampling module to enhance local semantic consistency, and a context-aware cost volume that learns the correlation between the two point clouds with context information. Experiments conducted on the FaceScape dataset demonstrate that the proposed method outperforms state-of-the-art scene flow methods by a significant margin.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104382"},"PeriodicalIF":2.6,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAFA: Lifelong Person Re-Identification learning by statistics-aware feature alignment","authors":"Qiankun Gao, Mengxi Jia, Jie Chen, Jian Zhang","doi":"10.1016/j.jvcir.2024.104378","DOIUrl":"10.1016/j.jvcir.2024.104378","url":null,"abstract":"<div><div>The goal of Lifelong Person Re-Identification (Re-ID) is to continuously update a model with new data to improve its generalization ability, without forgetting previously learned knowledge. Lifelong Re-ID approaches usually employs classifier-based knowledge distillation to overcome forgetting, where classifier parameters grow with the amount of learning data. In the fine-grained Re-ID task, features contain more valuable information than classifiers. However, due to feature space drift, naive feature distillation can overly suppress model’s plasticity. This paper proposes SAFA with statistics-aware feature alignment and progressive feature distillation. Specifically, we align new and old features based on coefficient of variation and gradually increase the strength of feature distillation. This encourages the model to learn new knowledge in early epochs, punishes it for forgetting in later epochs, and ultimately achieves a better stability–plasticity balance. Experiments on domain-incremental and intra-domain benchmarks demonstrate that our SAFA significantly outperforms counterparts while achieving better memory and computation efficiency.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104378"},"PeriodicalIF":2.6,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dense video captioning using unsupervised semantic information","authors":"Valter Estevam , Rayson Laroca , Helio Pedrini , David Menotti","doi":"10.1016/j.jvcir.2024.104385","DOIUrl":"10.1016/j.jvcir.2024.104385","url":null,"abstract":"<div><div>We introduce a method to learn unsupervised semantic visual information based on the premise that complex events can be decomposed into simpler events and that these simple events are shared across several complex events. We first employ a clustering method to group representations producing a visual codebook. Then, we learn a dense representation by encoding the co-occurrence probability matrix for the codebook entries. This representation leverages the performance of the dense video captioning task in a scenario with only visual features. For example, we replace the audio signal in the BMT method and produce temporal proposals with comparable performance. Furthermore, we concatenate the visual representation with our descriptor in a vanilla transformer method to achieve state-of-the-art performance in the captioning subtask compared to the methods that explore only visual features, as well as a competitive performance with multi-modal methods. Our code is available at <span><span>https://github.com/valterlej/dvcusi</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104385"},"PeriodicalIF":2.6,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality assessment of windowed 6DoF video with viewpoint switching","authors":"Wenhui Zou , Tingyan Tang , Weihua Chen , Gangyi Jiang , Zongju Peng","doi":"10.1016/j.jvcir.2024.104352","DOIUrl":"10.1016/j.jvcir.2024.104352","url":null,"abstract":"<div><div>Windowed six degrees of freedom (6DoF) video systems can provide users with highly interactive experiences by offering three rotational and three translational free movements. Free viewing in immersive scenes requires extensive viewpoint switching, which introduces new distortions (such as jitter and discomfort) to windowed 6DoF videos in addition to traditional compression and rendering distortions. This paper proposes a quality assessment method via spatiotemporal features and view switching smoothness for windowed 6DoF-synthesized videos with a wide field of view. Firstly, the edges are extracted from video frames to obtain local spatial distortion features by measuring their statistical characteristics through a generalized Gaussian distribution. Then, the synthesized videos are decomposed and reassembled in the temporal domain to intuitively describe the horizontal and vertical characteristics of the temporal distortions. A gradient-weighted local binary pattern is used to measure temporal flicker distortions. Next, to assess the impact of viewpoint switching on visual perception, a velocity model for retinal image motion is established. Finally, the objective quality score is predicted by a weighted regression model. The experimental results confirm that the proposed method is highly competitive.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104352"},"PeriodicalIF":2.6,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three-dimension deep model for body mass index estimation from facial image sequences with different poses","authors":"Chenghao Xiang, Boxiang Liu, Liang Zhao, Xiujuan Zheng","doi":"10.1016/j.jvcir.2024.104381","DOIUrl":"10.1016/j.jvcir.2024.104381","url":null,"abstract":"<div><div>Body mass index (BMI), an essential indicator of human health, can be calculated based on height and weight. Previous studies have carried out visual BMI estimation from a frontal facial image. However, these studies have ignored the visual information provided by the different face poses on BMI estimation. Considering the contributions of different face poses, this study applies the perspective transformation to the public facial image dataset to simulate face rotation and collects a video dataset with face rotation in yaw type. A three-dimensional convolutional neural network, which integrates the facial three-dimensional information from an image sequence with different face poses, is proposed for BMI estimation. The proposed methods are validated using the public and private datasets. Ablation experiments demonstrate that the face sequence with different poses can improve the performance of visual BMI estimation. Comparison experiments indicate that the proposed method can increase classification accuracy and reduce visual BMI estimation errors. Code has been released: <span><span>https://github.com/xiangch1910/STNET-BMI</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104381"},"PeriodicalIF":2.6,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancement-suppression driven lightweight fine-grained micro-expression recognition","authors":"Xinmiao Ding , Yuanyuan Li , Yulin Wu , Wen Guo","doi":"10.1016/j.jvcir.2024.104383","DOIUrl":"10.1016/j.jvcir.2024.104383","url":null,"abstract":"<div><div>Micro-expressions are short-lived and authentic emotional expressions used in several fields such as deception detection, criminal analysis, and medical diagnosis. Although deep learning-based approaches have achieved outstanding performance in micro-expression recognition, the recognition performance of lightweight networks for terminal applications is still unsatisfactory. This is mainly because existing models either excessively focus on a single region or lack comprehensiveness in identifying various regions, resulting in insufficient extraction of fine-grained features. To address this problem, this paper proposes a lightweight micro-expression recognition framework –Lightweight Fine-Grained Network (LFGNet). The proposed network adopts EdgeNeXt as the backbone network to effectively combine local and global features, as a result, it greatly reduces the complexity of the model while capturing micro-expression actions. To further enhance the feature extraction ability of the model, the Enhancement-Suppression Module (ESM) is developed where the Feature Suppression Module(FSM) is used to force the model to extract other potential features at deeper layers. Finally, a multi-scale Feature Fusion Module (FFM) is proposed to weigh the fusion of the learned features at different granularity scales for improving the robustness of the model. Experimental results, obtained from four datasets, demonstrate that the proposed method outperforms already existing methods in terms of recognition accuracy and model complexity.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104383"},"PeriodicalIF":2.6,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Register assisted aggregation for visual place recognition","authors":"Xuan Yu, Zhenyong Fu","doi":"10.1016/j.jvcir.2024.104384","DOIUrl":"10.1016/j.jvcir.2024.104384","url":null,"abstract":"<div><div>Visual Place Recognition (VPR) refers to use computer vision to recognize the position of the current query image. Due to the significant changes in appearance caused by season, lighting, and time spans between query and database images, these differences increase the difficulty of place recognition. Previous approaches often discard irrelevant features (such as sky, roads and vehicles) as well as features that can enhance recognition accuracy (such as buildings and trees). To address this, we propose a novel feature aggregation method designed to preserve these critical features. Specifically, we introduce additional registers on top of the original image tokens to facilitate model training, enabling the extraction of both global and local features that contain discriminative place information. Once the attention weights are reallocated, these registers will be discarded. Experimental results demonstrate that our approach effectively separates unstable features from original image representation, and achieves superior performance compared to state-of-the-art methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104384"},"PeriodicalIF":2.6,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Devising a comprehensive synthetic underwater image dataset","authors":"Kuruma Purnima, C.Siva Kumar","doi":"10.1016/j.jvcir.2024.104386","DOIUrl":"10.1016/j.jvcir.2024.104386","url":null,"abstract":"<div><div>The underwater environment is characterized by complex light interactions, including effects such as color loss, contrast loss, water distortion, backscatter, light attenuation, and color cast, which vary depending on water purity, depth, and other factors. While many datasets in the literature contain specific ground-truth images, image pairs, or limited analysis with metrics, there is a need for a comprehensive dataset that covers a wide range of underwater effects with varying severity levels. This paper introduces a dataset consisting of 100 ground-truth images and 15,000 synthetic underwater images. Given the complexity of underwater light variations, simulating these effects is challenging. This study approximates the underwater effects using implementable combinations of color cast, blurring, low-light, and contrast reduction. In addition to generating 15,100 images, the dataset includes a comprehensive analysis with 21 focus metrics, such as the average contrast measure operator and Brenner’s gradient-based metric, as well as 7 statistical measures, including mean intensity and skewness.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104386"},"PeriodicalIF":2.6,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local flow propagation and global multi-scale dilated Transformer for video inpainting","authors":"Yuting Zuo , Jing Chen , Kaixing Wang , Qi Lin , Huanqiang Zeng","doi":"10.1016/j.jvcir.2024.104380","DOIUrl":"10.1016/j.jvcir.2024.104380","url":null,"abstract":"<div><div>In this paper, a video inpainting framework that combines Local Flow Propagation with the Global Multi-scale Dilated Transformer, referred to as LFP-GMDT, is proposed. First, optical flow is utilized to guide the bidirectional propagation of features between adjacent frames for local inpainting. With the introduction of deformable convolutions, optical flow errors are corrected, substantially enhancing the accuracy of both local inpainting and frame alignment. Following the local inpainting stage, a multi-scale dilated Transformer module is designed for global inpainting. This module integrates multi-scale feature representations with an attention mechanism, introducing a multi-scale dilated attention mechanism that balances the modeling capabilities of local details and global structures while reducing computational complexity. Experimental results show that, compared to existing models, LFP-GMDT performs exceptionally well in detail restoration and structural integrity, particularly excelling in the recovery of edge structures, leading to an overall enhancement in visual quality.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104380"},"PeriodicalIF":2.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PMDNet: A multi-stage approach to single image dehazing with contextual and spatial feature preservation","authors":"D. Pushpalatha, P. Prithvi","doi":"10.1016/j.jvcir.2024.104379","DOIUrl":"10.1016/j.jvcir.2024.104379","url":null,"abstract":"<div><div>Hazy images suffer from degraded contrast and visibility due to atmospheric factors, affecting the accuracy of object detection in computer vision tasks. To address this, we propose a novel Progressive Multiscale Dehazing Network (PMDNet) for restoring the original quality of hazy images. Our network aims to balance high-level contextual information and spatial details effectively during the image recovery process. PMDNet employs a multi-stage architecture that gradually learns to remove haze by breaking down the dehazing process into manageable steps. Starting with a U-Net encoder-decoder to capture high-level context, PMDNet integrates a subnetwork to preserve local feature details. A SAN reweights features at each stage, ensuring smooth information transfer and preventing loss through cross-connections. Extensive experiments on datasets like RESIDE, I-HAZE, O-HAZE, D-HAZE, REAL-HAZE48, RTTS and Forest datasets, demonstrate the robustness of PMDNet, achieving strong qualitative and quantitative results.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104379"},"PeriodicalIF":2.6,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}