Feature aware-contrastive learning network for arbitrary-sized image steganalysis
Yan Li, Zhaoyang Li, Jiao Liu, Yongfeng Dong, Jun Zhang
Journal of Visual Communication and Image Representation, Volume 111, Article 104525. Published 2025-07-11. DOI: 10.1016/j.jvcir.2025.104525

Image steganalysis aims to detect whether images contain secret information. In recent years, image steganalysis methods based on deep learning have exhibited remarkable performance in detecting fixed-size images. However, the performance of existing methods degrades when they are applied to images of arbitrary size. In this paper, we propose a Feature Aware-Contrastive Learning Network (FA-CLNet) for arbitrary-sized image steganalysis. FA-CLNet contains two significant modules: a residual focus-enhancement module (RFEM) and an adaptive prototype contrastive learning module (AP-CLM). The RFEM is an essential component of the feature extractor that suppresses the representation of irrelevant features and strengthens the features of the steganographic signal. Furthermore, the AP-CLM is designed to improve the discriminability between steganographic and general signals. It increases the feature difference between steganographic and regular signals through spatial clustering, while promoting the aggregation of features from different steganographic signals. To verify the effectiveness of the proposed algorithm, we conducted experiments with various datasets and steganography algorithms. The experimental results show that our method achieves promising results in arbitrary-sized image steganalysis. Our implementation is publicly available at https://github.com/FACLNet/FA-CLNet.
Degradation removal and detail restoration decomposition network for single image deraining
Jiyu Jin, Xuanyu Qi, Haobo Dong, Qiyuan Guan, Guiyue Jin, Lei Fan
Journal of Visual Communication and Image Representation, Volume 111, Article 104520. Published 2025-07-11. DOI: 10.1016/j.jvcir.2025.104520

Existing deraining methods primarily adopt an encoder–decoder architecture with uniform block settings to eliminate image degradation and reconstruct background details, without considering the functional requirements of different stages of the network. This practice can lead to a mismatch between requirements and model responses, resulting in serious performance bottlenecks. Based on this key insight, we propose a Degradation-Aware Removal Network (DAR-Net) for single image deraining, which structurally decouples the degradation removal of rainy images from the reconstruction of rain-free images. Specifically, we first use a Two-dimensional Bidirectional Long Short-Term Memory (BiLSTM2D) block to model the spatial scale and distribution of rain streaks. Simultaneously, we introduce a Degradation Match Removal Block (DMRB) matched to the specific function of the encoder stage, effectively eliminating the degradation. Furthermore, we design a Prompt Block (PB) in the decoder stage to complement the original underlying features and additional contextual information of the image. Extensive experiments on benchmark datasets demonstrate the effectiveness of our proposed method.
Character feature Alignment-based scene text spotter
Guang Han, Haiquan Huang, Zhuping Wang, Jiajun Sun, Jian Ye
Journal of Visual Communication and Image Representation, Volume 111, Article 104533. Published 2025-07-10. DOI: 10.1016/j.jvcir.2025.104533

In recent years, scene text spotting has received much attention because it allows joint training of scene text detection and recognition. However, existing methods not only struggle with precise localization of densely arranged text instances but also lack dynamic adjustment of receptive fields when handling deformed text, leading to insufficient focus on character-level features. This ultimately results in limited recognition performance and poor cross-scene generalization. In this paper, we propose a novel scene text spotter, called the Character Feature Alignment-based Scene Text Spotter (CFAS). CFAS uses a Swin Transformer to extract scene text image features and then detects text instances using an encoder architecture and a Balanced Interaction Module (BIM). To further improve the recognition features, a Character Alignment (CA) module is proposed to adaptively adjust the receptive field, and the text detection and recognition networks are co-optimized to improve spotting accuracy. This improves both the detection of complex or dense text and the model's ability to recognize text instances with varied character shapes. In addition, the model shows strong generalization and robustness, performing well on text detection and recognition in unseen noisy underwater scenes and occluded scenes. Experimental results on various datasets demonstrate the superiority of the proposed method.
{"title":"Similarity-aware generative adversarial network for facial expression image translation","authors":"Lin-Chieh Huang, Hung-Hsu Tsai","doi":"10.1016/j.jvcir.2025.104530","DOIUrl":"10.1016/j.jvcir.2025.104530","url":null,"abstract":"<div><div>This paper proposes an image translation framework for facial expression, which is called Similarity-aware Generative Adversarial Network (SimaGAN). It can encode an image to have style and content features representing class-related detail information and spatial structure, respectively. Moreover, similarity aggregation (SA) is developed for preserving content features to maintain the structure of the input image. Additionally, SimaGAN exploits SA in maximizing the similarity between a set of style features and its corresponding set of label embeddings to enhance the class-related information of the style features and meanwhile minimizing the relative similarity among false-negative style features to effectively learn the disentangle representation. Here, a co-occurrence discriminator is also developed in the design of SimaGAN to promote the image quality of the translated images due to getting the textures of the source image and preserving its detailed textures during the translation. Experimental results demonstrate that SimaGAN outperforms others existing methods consideration here.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"111 ","pages":"Article 104530"},"PeriodicalIF":2.6,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144655814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TrDPNet: A transformer-based diffusion model for single-image 3D point cloud reconstruction","authors":"Fei Li , Tiansong Li , Ke Xiao , Lin Wang , Li Yu","doi":"10.1016/j.jvcir.2025.104503","DOIUrl":"10.1016/j.jvcir.2025.104503","url":null,"abstract":"<div><div>The conditional diffusion model has shown great promise in 3D point cloud reconstruction from single-view image. Nevertheless, it is extremely challenging to effectively utilize the only image information to conditionally control the diffusion model to generate 3D point clouds. Previous methods heavily relied on projecting image information onto 3D point clouds and using PointNet to extract features from them. However, due to the locality of the projection method, PointNet may insufficiently fuse point clouds and image features. In this paper, we present TrDPNet, a novel Transformer-based diffusion model for single-image 3D point cloud reconstruction. TrDPNet integrates image features and point clouds for conditional control using the Transformer to achieve high-quality 3D reconstruction. Firstly, farthest point sampling is applied to identify key points, a sub-point cloud is established within the specified radius, and then the features are mapped to tokens in the high-dimensional space. Secondly, a series of cascaded Transformer blocks is utilized to fuse the image and point cloud information via attention mechanisms, conditionally guiding the diffusion model. This design not only integrates image information across the entire point cloud but also strengthens connections between point clouds. Finally, multi-layer perceptrons and linear interpolation restore the tokens to the original point cloud size, producing the final noisy prediction. The experimental results show that TrDPNet achieves over a 20% improvement on synthetic benchmarks compared to previous state-of-the-art methods. Our code and weights are available at <span><span>https://github.com/TLab512/TrDPNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"111 ","pages":"Article 104503"},"PeriodicalIF":2.6,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144632457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced monocular depth estimation using novel scale-invariant Error Structure Similarity Index measure optimization in Convolutional Neural network architecture
Emadoddin Hemmati, Sina Jarahizadeh, Amir Aghabalaei, Seyed Babak Haji Seyed Asadollah
Journal of Visual Communication and Image Representation, Volume 111, Article 104531. Published 2025-07-05. DOI: 10.1016/j.jvcir.2025.104531

Monocular Depth Estimation (MDE) is crucial for applications like autonomous driving, medical imaging, and 3D modeling. This paper presents a novel Convolutional Neural Network (CNN) architecture that balances performance and computational cost in MDE tasks. Key components include bottleneck mechanisms, a Modified Convolutional Block Attention Module (MCBAM), Atrous Spatial Pyramid Pooling (ASPP), and Pyramid Scene Parsing (PSP). Leveraging pre-trained backbones and attention mechanisms, our model significantly improves depth estimation accuracy and reduces computational complexity. Validated using the NYU Depth Dataset V2, our model outperforms existing benchmarks in Absolute Relative Error (Abs Rel), Square Relative Error (Sq Rel), Root Mean Square Error (RMSE), and thresholding metrics. A novel loss function incorporating the Structure Similarity Index Measure (SSIM) and Scale-Invariant Error (SIE) enhances training and evaluation. Our study advances MDE techniques, offering a practical solution with wide-ranging applications. Future research will explore attention mechanisms, fusion approaches, and real-time optimization for greater versatility.
Lightweight three-stream encoder–decoder network for multi-modal salient object detection
Junzhe Lu, Tingyu Wang, Bin Wan, Qiang Zhao, Shuai Wang, Yaoqi Sun, Yang Zhou, Chenggang Yan
Journal of Visual Communication and Image Representation, Volume 111, Article 104523. Published 2025-07-04. DOI: 10.1016/j.jvcir.2025.104523

Salient object detection (SOD) techniques identify the most attractive objects in a scene. In recent years, multi-modal SOD has shown promising prospects. However, most existing multi-modal SOD models ignore model size and computational cost in pursuit of comprehensive cross-modality feature fusion. To enhance the feasibility of high-accuracy models in practical applications, we propose a Lightweight Three-stream Encoder–Decoder Network (TENet) for multi-modal salient object detection. Specifically, we design three decoders to explore the saliency clues embedded in different multi-modal features and leverage a hierarchical decoding structure to alleviate the negative effects of low-quality images. To reduce the differences among modalities, we propose a lightweight modal information-guided fusion (MIGF) module to enhance the correlation between the RGB-D and RGB-T modalities, laying the groundwork for triple-modal fusion. Furthermore, to exploit multi-scale information, we propose a semantic interaction (SI) module and a semantic feature enhancement (SFE) module to integrate the hierarchical information embedded in high- and low-level features. Extensive experiments on the VDT-2048 dataset show that TENet has a model size of 37 MB, runs at 38 FPS, and achieves accuracy comparable to 16 state-of-the-art multi-modal methods.
{"title":"CA-VAD: Caption Aware Video Anomaly Detection in surveillance videos","authors":"Debi Prasad Senapati, Santosh Kumar Pani, Santos Kumar Baliarsingh, Prabhu Prasad Dev, Hrudaya Kumar Tripathy","doi":"10.1016/j.jvcir.2025.104521","DOIUrl":"10.1016/j.jvcir.2025.104521","url":null,"abstract":"<div><div>In video anomaly detection, identifying abnormal events using weakly supervised video-level labels is often tackled with multiple instance learning (MIL). However, traditional methods struggle to capture temporal relationships between segments and extract discriminative features for distinguishing normal from anomalous events. To address these challenges, we propose Caption Aware Video Anomaly Detection (CA-VAD), a framework that integrates visual and textual features for enhanced semantic understanding of scenes. Unlike conventional approaches relying solely on visual data, CA-VAD uses a pre-trained video captioning model to generate textual descriptions, transforming them into semantic embeddings that enrich visual features. These textual cues improve the differentiation between normal and abnormal events. CA-VAD incorporates an Attention-based Multi-Scale Temporal Network (A-MTN) to process visual and textual inputs, capturing temporal dynamics effectively. Experiments on CUHK Avenue, ShanghaiTech, UCSD Ped2, and XD-Violence datasets show that CA-VAD outperforms state-of-the-art methods, achieving superior accuracy and robustness.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"111 ","pages":"Article 104521"},"PeriodicalIF":2.6,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144535831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SFEN: Salient feature enhancement network for salient object detection
Yafei Du, Shengbing Che, Yangzhuo Tuo, Wenxin Liu, Wanqin Wang, Zixuan Zhang
Journal of Visual Communication and Image Representation, Volume 111, Article 104522. Published 2025-07-01. DOI: 10.1016/j.jvcir.2025.104522

Salient object detection methods based on multi-scale feature fusion tend to treat all features equally or apply the same fusion modules repeatedly, both of which weaken contextual relevance and can lead to the loss of the main regions of the target. We propose the salient feature enhancement network (SFEN), which incorporates several key innovations. First, we design an edge feature enhancement (EFE) module that focuses on enhancing edge pixels in shallow features to avoid the performance degradation caused by excessive processing. Second, we propose the salient object attention (SOA) module, which efficiently fuses adjacent features at different scales, minimizing the risk of losing the main regions of the target. Finally, the details multi-scale fusion (DMF) module refines local details and generates prediction maps; it also reduces the distributional differences between the encoder and decoder outputs while improving overall accuracy. We evaluated SFEN on six commonly used salient object detection datasets and compared it with advanced methods. Our model achieves an average F-measure of 0.914 on the DUTS dataset, and F-measures of 0.930 and 0.939 on the HKU-IS and ECSSD datasets, respectively.
{"title":"Self2Channel: Self-supervised denoising of different regions using coalition game based channel mask","authors":"Bolin Song, Yuanyuan Si, Ke Li","doi":"10.1016/j.jvcir.2025.104518","DOIUrl":"10.1016/j.jvcir.2025.104518","url":null,"abstract":"<div><div>Denoising approaches using only a single noisy image combined with self-supervised learning of blind spot networks have attracted much attention. However, most existing blind spot denoising strategies use random masking techniques, leading to the loss of complex details in the denoised images. In this paper, we propose a novel technique to generate coalition game-theoretic guided masks that perform non-uniform sampling in different channels to mitigate the loss of complex details and thus improve the denoising quality. Additionally, we introduce a framework called Self2Channel, which combines channel loss with the residual loss to enhance the accuracy of detail location selection, prioritizing the key but easily lost details that need to be preserved. Finally, our framework converges to the sum of the supervision loss and the noise variance while adhering to the expectation property of the measurement space. Extensive experiments validate the superiority of our proposed Self2Channel strategy over state-of-the-art approaches.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"111 ","pages":"Article 104518"},"PeriodicalIF":2.6,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144535830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}