{"title":"PFFNet: A point cloud based method for 3D face flow estimation","authors":"Dong Li, Yuchen Deng, Zijun Huang","doi":"10.1016/j.jvcir.2024.104382","DOIUrl":"10.1016/j.jvcir.2024.104382","url":null,"abstract":"<div><div>In recent years, the research on 3D facial flow has received more attention, and it is of great significance for related research on 3D faces. Point cloud based 3D face flow estimation is inherently challenging due to non-rigid and large-scale motion. In this paper, we propose a novel method called PFFNet for estimating 3D face flow in a coarse-to-fine network. Specifically, an adaptive sampling module is proposed to learn sampling points, and an effective channel-wise feature extraction module is incorporated to learn facial priors from the point clouds, jointly. Additionally, to accommodate large-scale motion, we also introduce a normal vector angle upsampling module to enhance local semantic consistency, and a context-aware cost volume that learns the correlation between the two point clouds with context information. Experiments conducted on the FaceScape dataset demonstrate that the proposed method outperforms state-of-the-art scene flow methods by a significant margin.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104382"},"PeriodicalIF":2.6,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAFA: Lifelong Person Re-Identification learning by statistics-aware feature alignment","authors":"Qiankun Gao, Mengxi Jia, Jie Chen, Jian Zhang","doi":"10.1016/j.jvcir.2024.104378","DOIUrl":"10.1016/j.jvcir.2024.104378","url":null,"abstract":"<div><div>The goal of Lifelong Person Re-Identification (Re-ID) is to continuously update a model with new data to improve its generalization ability, without forgetting previously learned knowledge. Lifelong Re-ID approaches usually employs classifier-based knowledge distillation to overcome forgetting, where classifier parameters grow with the amount of learning data. In the fine-grained Re-ID task, features contain more valuable information than classifiers. However, due to feature space drift, naive feature distillation can overly suppress model’s plasticity. This paper proposes SAFA with statistics-aware feature alignment and progressive feature distillation. Specifically, we align new and old features based on coefficient of variation and gradually increase the strength of feature distillation. This encourages the model to learn new knowledge in early epochs, punishes it for forgetting in later epochs, and ultimately achieves a better stability–plasticity balance. Experiments on domain-incremental and intra-domain benchmarks demonstrate that our SAFA significantly outperforms counterparts while achieving better memory and computation efficiency.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104378"},"PeriodicalIF":2.6,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dense video captioning using unsupervised semantic information","authors":"Valter Estevam , Rayson Laroca , Helio Pedrini , David Menotti","doi":"10.1016/j.jvcir.2024.104385","DOIUrl":"10.1016/j.jvcir.2024.104385","url":null,"abstract":"<div><div>We introduce a method to learn unsupervised semantic visual information based on the premise that complex events can be decomposed into simpler events and that these simple events are shared across several complex events. We first employ a clustering method to group representations producing a visual codebook. Then, we learn a dense representation by encoding the co-occurrence probability matrix for the codebook entries. This representation leverages the performance of the dense video captioning task in a scenario with only visual features. For example, we replace the audio signal in the BMT method and produce temporal proposals with comparable performance. Furthermore, we concatenate the visual representation with our descriptor in a vanilla transformer method to achieve state-of-the-art performance in the captioning subtask compared to the methods that explore only visual features, as well as a competitive performance with multi-modal methods. Our code is available at <span><span>https://github.com/valterlej/dvcusi</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104385"},"PeriodicalIF":2.6,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality assessment of windowed 6DoF video with viewpoint switching","authors":"Wenhui Zou , Tingyan Tang , Weihua Chen , Gangyi Jiang , Zongju Peng","doi":"10.1016/j.jvcir.2024.104352","DOIUrl":"10.1016/j.jvcir.2024.104352","url":null,"abstract":"<div><div>Windowed six degrees of freedom (6DoF) video systems can provide users with highly interactive experiences by offering three rotational and three translational free movements. Free viewing in immersive scenes requires extensive viewpoint switching, which introduces new distortions (such as jitter and discomfort) to windowed 6DoF videos in addition to traditional compression and rendering distortions. This paper proposes a quality assessment method via spatiotemporal features and view switching smoothness for windowed 6DoF-synthesized videos with a wide field of view. Firstly, the edges are extracted from video frames to obtain local spatial distortion features by measuring their statistical characteristics through a generalized Gaussian distribution. Then, the synthesized videos are decomposed and reassembled in the temporal domain to intuitively describe the horizontal and vertical characteristics of the temporal distortions. A gradient-weighted local binary pattern is used to measure temporal flicker distortions. Next, to assess the impact of viewpoint switching on visual perception, a velocity model for retinal image motion is established. Finally, the objective quality score is predicted by a weighted regression model. The experimental results confirm that the proposed method is highly competitive.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104352"},"PeriodicalIF":2.6,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three-dimension deep model for body mass index estimation from facial image sequences with different poses","authors":"Chenghao Xiang, Boxiang Liu, Liang Zhao, Xiujuan Zheng","doi":"10.1016/j.jvcir.2024.104381","DOIUrl":"10.1016/j.jvcir.2024.104381","url":null,"abstract":"<div><div>Body mass index (BMI), an essential indicator of human health, can be calculated based on height and weight. Previous studies have carried out visual BMI estimation from a frontal facial image. However, these studies have ignored the visual information provided by the different face poses on BMI estimation. Considering the contributions of different face poses, this study applies the perspective transformation to the public facial image dataset to simulate face rotation and collects a video dataset with face rotation in yaw type. A three-dimensional convolutional neural network, which integrates the facial three-dimensional information from an image sequence with different face poses, is proposed for BMI estimation. The proposed methods are validated using the public and private datasets. Ablation experiments demonstrate that the face sequence with different poses can improve the performance of visual BMI estimation. Comparison experiments indicate that the proposed method can increase classification accuracy and reduce visual BMI estimation errors. Code has been released: <span><span>https://github.com/xiangch1910/STNET-BMI</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104381"},"PeriodicalIF":2.6,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancement-suppression driven lightweight fine-grained micro-expression recognition","authors":"Xinmiao Ding , Yuanyuan Li , Yulin Wu , Wen Guo","doi":"10.1016/j.jvcir.2024.104383","DOIUrl":"10.1016/j.jvcir.2024.104383","url":null,"abstract":"<div><div>Micro-expressions are short-lived and authentic emotional expressions used in several fields such as deception detection, criminal analysis, and medical diagnosis. Although deep learning-based approaches have achieved outstanding performance in micro-expression recognition, the recognition performance of lightweight networks for terminal applications is still unsatisfactory. This is mainly because existing models either excessively focus on a single region or lack comprehensiveness in identifying various regions, resulting in insufficient extraction of fine-grained features. To address this problem, this paper proposes a lightweight micro-expression recognition framework –Lightweight Fine-Grained Network (LFGNet). The proposed network adopts EdgeNeXt as the backbone network to effectively combine local and global features, as a result, it greatly reduces the complexity of the model while capturing micro-expression actions. To further enhance the feature extraction ability of the model, the Enhancement-Suppression Module (ESM) is developed where the Feature Suppression Module(FSM) is used to force the model to extract other potential features at deeper layers. Finally, a multi-scale Feature Fusion Module (FFM) is proposed to weigh the fusion of the learned features at different granularity scales for improving the robustness of the model. Experimental results, obtained from four datasets, demonstrate that the proposed method outperforms already existing methods in terms of recognition accuracy and model complexity.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104383"},"PeriodicalIF":2.6,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Register assisted aggregation for visual place recognition","authors":"Xuan Yu, Zhenyong Fu","doi":"10.1016/j.jvcir.2024.104384","DOIUrl":"10.1016/j.jvcir.2024.104384","url":null,"abstract":"<div><div>Visual Place Recognition (VPR) refers to use computer vision to recognize the position of the current query image. Due to the significant changes in appearance caused by season, lighting, and time spans between query and database images, these differences increase the difficulty of place recognition. Previous approaches often discard irrelevant features (such as sky, roads and vehicles) as well as features that can enhance recognition accuracy (such as buildings and trees). To address this, we propose a novel feature aggregation method designed to preserve these critical features. Specifically, we introduce additional registers on top of the original image tokens to facilitate model training, enabling the extraction of both global and local features that contain discriminative place information. Once the attention weights are reallocated, these registers will be discarded. Experimental results demonstrate that our approach effectively separates unstable features from original image representation, and achieves superior performance compared to state-of-the-art methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104384"},"PeriodicalIF":2.6,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Devising a comprehensive synthetic underwater image dataset","authors":"Kuruma Purnima, C.Siva Kumar","doi":"10.1016/j.jvcir.2024.104386","DOIUrl":"10.1016/j.jvcir.2024.104386","url":null,"abstract":"<div><div>The underwater environment is characterized by complex light interactions, including effects such as color loss, contrast loss, water distortion, backscatter, light attenuation, and color cast, which vary depending on water purity, depth, and other factors. While many datasets in the literature contain specific ground-truth images, image pairs, or limited analysis with metrics, there is a need for a comprehensive dataset that covers a wide range of underwater effects with varying severity levels. This paper introduces a dataset consisting of 100 ground-truth images and 15,000 synthetic underwater images. Given the complexity of underwater light variations, simulating these effects is challenging. This study approximates the underwater effects using implementable combinations of color cast, blurring, low-light, and contrast reduction. In addition to generating 15,100 images, the dataset includes a comprehensive analysis with 21 focus metrics, such as the average contrast measure operator and Brenner’s gradient-based metric, as well as 7 statistical measures, including mean intensity and skewness.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104386"},"PeriodicalIF":2.6,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local flow propagation and global multi-scale dilated Transformer for video inpainting","authors":"Yuting Zuo , Jing Chen , Kaixing Wang , Qi Lin , Huanqiang Zeng","doi":"10.1016/j.jvcir.2024.104380","DOIUrl":"10.1016/j.jvcir.2024.104380","url":null,"abstract":"<div><div>In this paper, a video inpainting framework that combines Local Flow Propagation with the Global Multi-scale Dilated Transformer, referred to as LFP-GMDT, is proposed. First, optical flow is utilized to guide the bidirectional propagation of features between adjacent frames for local inpainting. With the introduction of deformable convolutions, optical flow errors are corrected, substantially enhancing the accuracy of both local inpainting and frame alignment. Following the local inpainting stage, a multi-scale dilated Transformer module is designed for global inpainting. This module integrates multi-scale feature representations with an attention mechanism, introducing a multi-scale dilated attention mechanism that balances the modeling capabilities of local details and global structures while reducing computational complexity. Experimental results show that, compared to existing models, LFP-GMDT performs exceptionally well in detail restoration and structural integrity, particularly excelling in the recovery of edge structures, leading to an overall enhancement in visual quality.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104380"},"PeriodicalIF":2.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PMDNet: A multi-stage approach to single image dehazing with contextual and spatial feature preservation","authors":"D. Pushpalatha, P. Prithvi","doi":"10.1016/j.jvcir.2024.104379","DOIUrl":"10.1016/j.jvcir.2024.104379","url":null,"abstract":"<div><div>Hazy images suffer from degraded contrast and visibility due to atmospheric factors, affecting the accuracy of object detection in computer vision tasks. To address this, we propose a novel Progressive Multiscale Dehazing Network (PMDNet) for restoring the original quality of hazy images. Our network aims to balance high-level contextual information and spatial details effectively during the image recovery process. PMDNet employs a multi-stage architecture that gradually learns to remove haze by breaking down the dehazing process into manageable steps. Starting with a U-Net encoder-decoder to capture high-level context, PMDNet integrates a subnetwork to preserve local feature details. A SAN reweights features at each stage, ensuring smooth information transfer and preventing loss through cross-connections. Extensive experiments on datasets like RESIDE, I-HAZE, O-HAZE, D-HAZE, REAL-HAZE48, RTTS and Forest datasets, demonstrate the robustness of PMDNet, achieving strong qualitative and quantitative results.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104379"},"PeriodicalIF":2.6,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}