DetailCaptureYOLO: Accurately Detecting Small Targets in UAV Aerial Images
Fengxi Sun, Ning He, Runjie Li, Hongfei Liu, Yuxiang Zou
Journal of Visual Communication and Image Representation, Volume 106, Article 104349 (published 2024-11-28). DOI: 10.1016/j.jvcir.2024.104349

Abstract: Unmanned aerial vehicle (UAV) aerial imagery is dominated by small objects, so obtaining feature maps with more detailed information is crucial for target detection. This paper therefore presents an improved algorithm based on YOLOv9, named DetailCaptureYOLO, which has a strong ability to capture detailed features. First, a dynamic fusion path aggregation network is proposed to dynamically fuse multi-level and multi-scale feature maps, effectively preserving information integrity and producing richer features. Additionally, more flexible dynamic upsampling and wavelet transform-based downsampling operators are used to optimize the sampling operations. Finally, Inner-IoU is incorporated into Powerful-IoU, effectively enhancing the network's ability to detect small targets. The neck improvement proposed in this paper can be transferred to mainstream object detection algorithms. When applied to YOLOv9, AP50, mAP and AP-small improved by 8.5%, 5.5% and 7.2% on the VisDrone dataset; when applied to other algorithms, the improvements in AP50 were 5.1%–6.5%. Experimental results demonstrate that the proposed method excels in detecting small targets and exhibits strong transferability. Code is available at https://github.com/SFXSunFengXi/DetailCaptureYOLO.

Global–local prompts guided image-text embedding, alignment and aggregation for multi-label zero-shot learning
Tiecheng Song, Yu Huang, Feng Yang, Anyong Qin, Yue Zhao, Chenqiang Gao
Journal of Visual Communication and Image Representation, Volume 106, Article 104347 (published 2024-11-28). DOI: 10.1016/j.jvcir.2024.104347

Abstract: Multi-label zero-shot learning (MLZSL) aims to classify images into multiple unseen label classes, which is a practical yet challenging task. Recent methods have used vision-language models (VLMs) for MLZSL, but they do not adequately consider the global and local semantic relationships when aligning images and texts, yielding limited classification performance. In this paper, we propose a novel MLZSL approach, named global–local prompts guided image-text embedding, alignment and aggregation (GLP-EAA), to alleviate this problem. Specifically, based on a parameter-frozen VLM, we divide the image into patches and explore a simple adapter to obtain global and local image embeddings. Meanwhile, we design global-local prompts to obtain text embeddings of different granularities. Then, we introduce global–local alignment losses to establish image-text consistencies at different granularity levels. Finally, we aggregate global and local scores to compute the multi-label classification loss; the aggregated scores are also used for inference. As such, our approach integrates prompt learning, image-text alignment and classification score aggregation into a unified learning framework. Experimental results on the NUS-WIDE and MS-COCO datasets demonstrate the superiority of our approach over state-of-the-art methods on both ZSL and generalized ZSL tasks.

FormerPose: An efficient multi-scale fusion Transformer network based on RGB-D for 6D pose estimation
Pihong Hou, Yongfang Zhang, Yi Wu, Pengyu Yan, Fuqiang Zhang
Journal of Visual Communication and Image Representation, Volume 106, Article 104346 (published 2024-11-28). DOI: 10.1016/j.jvcir.2024.104346

Abstract: 6D pose estimation based on RGB-D data plays a crucial role in object localization and is widely used in robotics. However, traditional CNN-based methods often face limitations, particularly in scenes with complex visuals characterized by minimal features or occlusions. To address these limitations, we propose a novel holistic 6D pose estimation method called FormerPose. It leverages an efficient multi-scale fusion Transformer network based on RGB-D data to directly regress the object's pose. FormerPose can efficiently extract the color and geometric features of objects at different scales and fuse them using self-attention and a dense fusion method, making it suitable for more restricted scenes. The proposed network realizes an improved trade-off between computational efficiency and model performance, achieving superior results on benchmark datasets, including LineMOD, LineMOD-Occlusion, and YCB-Video. In addition, the robustness and practicability of the method are further verified by a series of robot grasping experiments.

Contour-based object forecasting for autonomous driving
Jaeseok Jang, Dahyun Kim, Dongkwon Jin, Chang-Su Kim
Journal of Visual Communication and Image Representation, Volume 106, Article 104343 (published 2024-11-28). DOI: 10.1016/j.jvcir.2024.104343

Abstract: This paper proposes a novel algorithm, called contour-based object forecasting (COF), to simultaneously perform contour-based segmentation and depth estimation of objects in future frames for autonomous driving systems. The proposed algorithm consists of encoding, future forecasting, decoding, and 3D rendering stages. First, we extract the features of observed frames, including past and current frames. Second, from these causal features, we predict the features of future frames using the future forecast module. Third, we decode the predicted features into contour and depth estimates, obtaining object depth maps aligned with segmentation masks via depth completion guided by the predicted contours. Finally, from the prediction results, we render the forecasted objects in 3D space. Experimental results demonstrate that the proposed algorithm reliably forecasts the contours and depths of objects in future frames and that the 3D rendering results intuitively visualize the future locations of the objects.

Person re-identification transformer with patch attention and pruning
Fabrice Ndayishimiye, Gang-Joon Yoon, Joonjae Lee, Sang Min Yoon
Journal of Visual Communication and Image Representation, Volume 106, Article 104348 (published 2024-11-26). DOI: 10.1016/j.jvcir.2024.104348

Abstract: Person re-identification (Re-ID), which is widely used in surveillance and tracking systems, aims to match individuals across different camera views as they move, maintaining identity throughout. Recent advancements have introduced convolutional neural networks (CNNs) and vision transformers (ViTs) as promising solutions. While CNN-based methods excel at local feature extraction, ViTs have emerged as effective alternatives for person Re-ID, offering the ability to capture long-range dependencies through multi-head self-attention without relying on convolution and downsampling. However, Re-ID still faces challenges such as changes in illumination, viewpoint, and pose, low resolution, and partial occlusion. To address the limitations of widely used person Re-ID datasets and improve generalization, we present a novel person Re-ID method that enhances global and local information interactions using self-attention modules within a ViT network. It leverages dynamic pruning to extract and prioritize essential image patches effectively. The designed patch selection and pruning yield a feature extractor that remains robust even under partial occlusion, background clutter, and illumination variation. Empirical validation demonstrates its superior performance compared to previous approaches and its adaptability across various domains.

Illumination-guided dual-branch fusion network for partition-based image exposure correction
Jianming Zhang, Jia Jiang, Mingshuang Wu, Zhijian Feng, Xiangnan Shi
Journal of Visual Communication and Image Representation, Volume 106, Article 104342 (published 2024-11-22). DOI: 10.1016/j.jvcir.2024.104342

Abstract: Images captured in the wild often suffer from under-exposure, over-exposure, or sometimes a combination of both, and tend to lose details and texture due to uneven exposure. Most current image enhancement methods focus on correcting either under-exposure or over-exposure; only a few can effectively handle both problems simultaneously. To address these issues, a novel partition-based exposure correction method is proposed. Firstly, our method computes an illumination map to generate a partition mask that divides the original image into under-exposed and over-exposed areas. Then, we propose a Transformer-based parameter estimation module to estimate the dual gamma values for partition-based exposure correction. Finally, we introduce a dual-branch fusion module that merges the original image with the exposure-corrected image to obtain the final result. It is worth noting that the illumination map plays a guiding role in both the dual gamma parameter estimation and the dual-branch fusion. Extensive experiments demonstrate that the proposed method consistently outperforms state-of-the-art (SOTA) methods on 9 datasets with paired or unpaired samples. Our code is available at https://github.com/csust7zhangjm/ExposureCorrectionWMS.
{"title":"Enhanced soft domain adaptation for object detection in the dark","authors":"Yunfei Bai , Chang Liu , Rui Yang , Xiaomao Li","doi":"10.1016/j.jvcir.2024.104337","DOIUrl":"10.1016/j.jvcir.2024.104337","url":null,"abstract":"<div><div>Unlike foggy conditions, domain adaptation is rarely facilitated in dark detection tasks due to the lack of dark datasets. We generate target low-light images via swapping the ring-shaped frequency spectrum of Exdark with Cityscapes, and surprisingly find the promotion is less satisfactory. The root lies in non-transferable alignment that excessively highlights dark backgrounds. To tackle this issue, we propose an Enhanced Soft Domain Adaptation (ESDA) framework to focus on background misalignment. Specifically, Soft Domain Adaptation (SDA) compensates for over-alignment of backgrounds by providing different soft labels for foreground and background samples. The Highlight Foreground (HF), by introducing center sampling, increases the number of high-quality background samples for training. Suppress Background (SB) weakens non-transferable background alignment by replacing foreground scores with backgrounds. Experimental results show SDA combined with HF and SB is sufficiently strengthened and achieves state-of-the-art performance using multiple cross-domain benchmarks. Note that ESDA yields 11.8% relative improvement on the real-world ExDark dataset.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"106 ","pages":"Article 104337"},"PeriodicalIF":2.6,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142743784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HRGUNet: A novel high-resolution generative adversarial network combined with an improved UNet method for brain tumor segmentation","authors":"Dongmei Zhou, Hao Luo, Xingyang Li, Shengbing Chen","doi":"10.1016/j.jvcir.2024.104345","DOIUrl":"10.1016/j.jvcir.2024.104345","url":null,"abstract":"<div><div>Brain tumor segmentation in MRI images is challenging due to variability in tumor characteristics and low contrast. We propose HRGUNet, which combines a high-resolution generative adversarial network with an improved UNet architecture to enhance segmentation accuracy. Our proposed GAN model uses an innovative discriminator design that is able to process complete tumor labels as input. This approach can better ensure that the generator produces realistic tumor labels compared to some existing GAN models that only use local features. Additionally, we introduce a Multi-Scale Pyramid Fusion (MSPF) module to improve fine-grained feature extraction and a Refined Channel Attention (RCA) module to enhance focus on tumor regions. In comparative experiments, our method was verified on the BraTS2020 and BraTS2019 data sets, and the average Dice coefficient increased by 1.5% and 1.2% respectively, and the Hausdorff distance decreased by 23.9% and 15.2% respectively, showing its robustness and generalization for segmenting complex tumor structures.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104345"},"PeriodicalIF":2.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Panoramic Arbitrary Style Transfer with Deformable Distortion Constraints
Wujian Ye, Yue Wang, Yijun Liu, Wenjie Lin, Xin Xiang
Journal of Visual Communication and Image Representation, Volume 106, Article 104344 (published 2024-11-19). DOI: 10.1016/j.jvcir.2024.104344

Abstract: Neural style transfer is a prominent AI technique for creating captivating visual effects and enhancing user experiences. However, most current methods handle panoramic images inadequately, losing the original visual semantics and emotions because structural features are insufficiently considered. To address this, a novel panoramic arbitrary style transfer method named PAST-Renderer is proposed, integrating deformable convolutions and distortion constraints. The proposed method can dynamically adjust the positions of the convolutional kernels according to the geometric structure of the input image, thereby better adapting to the spatial distortions and deformations in panoramic images. Deformable convolutions enable adaptive transformations on the two-dimensional plane, enhancing content and style feature extraction and fusion in panoramic images. Distortion constraints adjust the content and style losses, ensuring semantic consistency in saliency, edges, and depth of field with the original image. Experimental results show significant improvements: the PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure) of the stylized panoramic images' semantic maps increase by approximately 2–4 dB and 0.1–0.3, respectively. PAST-Renderer performs well in both artistic and realistic style transfer, preserving semantic integrity with natural colors, realistic edge details, and rich thematic content.

Underwater image enhancement method via extreme enhancement and ultimate weakening
Yang Zhou, Qinghua Su, Zhongbo Hu, Shaojie Jiang
Journal of Visual Communication and Image Representation, Volume 105, Article 104341 (published 2024-11-16). DOI: 10.1016/j.jvcir.2024.104341

Abstract: Existing histogram-based methods for underwater image enhancement are prone to over-enhancement, which hampers the analysis of the enhanced images. Balancing contrast by both enhancing and weakening the contrast of an image can address this problem. Therefore, an underwater image enhancement method based on extreme enhancement and ultimate weakening (EEUW) is proposed in this paper. The approach comprises two main steps. First, an image with extreme contrast is obtained by applying the grey prediction evolution algorithm (GPE); this is the first time GPE has been introduced into a dual-histogram thresholding method to find the optimal segmentation threshold for accurate segmentation. Second, a pure gray image is obtained through a fusion strategy based on the gray world assumption to achieve the ultimate weakening. Experiments conducted on three standard underwater image benchmark datasets validate that EEUW outperforms 10 state-of-the-art methods in improving the contrast of underwater images.