{"title":"X-CDNet: A real-time crosswalk detector based on YOLOX","authors":"Xingyuan Lu, Yanbing Xue, Zhigang Wang, Haixia Xu, Xianbin Wen","doi":"10.1016/j.jvcir.2024.104206","DOIUrl":"https://doi.org/10.1016/j.jvcir.2024.104206","url":null,"abstract":"<div><p>As urban traffic safety becomes increasingly important, real-time crosswalk detection plays a critical role in the transportation field. However, existing crosswalk detection algorithms still fall short in both accuracy and speed. This study proposes a real-time crosswalk detector called X-CDNet based on YOLOX. Building on the ConvNeXt basic module, we designed a new basic module called <strong>Rep</strong>arameterizable <strong>S</strong>parse <strong>L</strong>arge-<strong>K</strong>ernel (RepSLK) convolution that expands the model’s receptive field without adding extra inference time. In addition, we created a new crosswalk dataset called CD9K, which is based on realistic driving scenes augmented by techniques such as synthetic rain and fog. The experimental results demonstrate that X-CDNet outperforms YOLOX in both detection accuracy and speed, achieving an AP50 of 93.3 and a real-time detection speed of 123 FPS.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141483922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shift-insensitive perceptual feature of quadratic sum of gradient magnitude and LoG signals for image quality assessment and image classification","authors":"Congmin Chen, Xuanqin Mou","doi":"10.1016/j.jvcir.2024.104215","DOIUrl":"https://doi.org/10.1016/j.jvcir.2024.104215","url":null,"abstract":"<div><p>Most existing full-reference (FR) image quality assessment (IQA) models work on the premise that the two images are well registered. Shifting an image leads to an inaccurate evaluation of image quality, because small spatial shifts are far less noticeable to human observers than structural distortion. In this regard, we propose to study an IQA feature that is shift-insensitive with respect to the basic primitive structure of images, i.e., image edges. According to previous studies, the image gradient magnitude (GM) and the Laplacian of Gaussian (LoG) operator, which depict the edge profiles of natural images, are highly efficient structural features in IQA tasks. In this paper, we find that the quadratic sum of the normalized GM and LoG signals (QGL) has an excellent shift-insensitive property in representing image edges, after theoretically solving the selection of a ratio parameter that balances the GM and LoG signals. Based on the proposed QGL feature, two FR-IQA models can be built directly by measuring the similarity map with mean and standard deviation pooling strategies, named mQGL and sQGL, respectively. Experimental results show that the proposed sQGL and mQGL work robustly on four benchmark IQA databases, and QGL-based models remain highly insensitive to spatial translation and image rotation while judging image quality. In addition, we explore the feasibility of combining the QGL feature with deep neural networks, and verify that it helps promote image pattern recognition in texture classification tasks.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141483920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MCT-VHD: Multi-modal contrastive transformer for video highlight detection","authors":"Yinhui Jiang, Sihui Luo, Lijun Guo, Rong Zhang","doi":"10.1016/j.jvcir.2024.104162","DOIUrl":"https://doi.org/10.1016/j.jvcir.2024.104162","url":null,"abstract":"<div><p>Autonomous highlight detection aims to identify the most captivating moments in a video, which is crucial for enhancing the efficiency of video editing and browsing on social media platforms. However, current efforts primarily focus on visual elements and often overlook other modalities, such as text information, that could provide valuable semantic signals. To overcome this limitation, we propose a Multi-modal Contrastive Transformer for Video Highlight Detection (MCT-VHD). This transformer-based network mainly utilizes video and audio modalities, along with auxiliary text features (when available), for video highlight detection. Specifically, we enhance the temporal connections within the video by integrating a convolution-based local enhancement module into the transformer blocks. Furthermore, we explore three multi-modal fusion strategies to improve highlight inference performance and employ a contrastive objective to facilitate interactions between different modalities. Comprehensive experiments conducted on three benchmark datasets validate the effectiveness of MCT-VHD, and our ablation studies provide valuable insights into its essential components.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140843865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reversible data hiding with automatic contrast enhancement for color images","authors":"Libo Han , Yanzhao Ren , Sha Tao , Xinfeng Zhang , Wanlin Gao","doi":"10.1016/j.jvcir.2024.104181","DOIUrl":"10.1016/j.jvcir.2024.104181","url":null,"abstract":"<div><p>Automatic contrast enhancement (ACE) is a technique that automatically enhances image contrast. Reversible data hiding (RDH) with ACE (ACERDH) can achieve ACE while hiding data. However, some methods that perform well on color images suffer from insufficient enhancement. Therefore, an ACERDH method based on enhancement of the R, G, B, and V channels is proposed. First, histogram shifting with contrast control is proposed to enhance the R, G, and B channels; it prevents contrast degradation and keeps histogram shifting from stopping prematurely. Then, the V channel is enhanced. Since some non-ACE RDH methods that enhance the V channel well offer only a low level of automation, histogram shifting with brightness control, which realizes ACE effectively, is proposed; it avoids over-enhancement by controlling the brightness. Experimental results verify that the proposed method achieves better image quality and embedding capacity than some state-of-the-art methods.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141055379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A self-supervised image aesthetic assessment combining masked image modeling and contrastive learning","authors":"Shuai Yang , Zibei Wang , Guangao Wang , Yongzhen Ke , Fan Qin , Jing Guo , Liming Chen","doi":"10.1016/j.jvcir.2024.104184","DOIUrl":"https://doi.org/10.1016/j.jvcir.2024.104184","url":null,"abstract":"<div><p>Learning richer image features helps improve performance on the image aesthetic assessment task. Masked Image Modeling (MIM), implemented on the Vision Transformer (ViT), learns pixel-level features while reconstructing images. Contrastive learning pulls features of the same image together while pushing features of different images apart in the feature space, learning high-level semantic features. Since contrastive learning and MIM capture different levels of image features, combining these two methods can learn richer feature representations and thus promote aesthetic assessment performance. Therefore, we propose a pretext task that combines contrastive learning and MIM to learn richer image features. In this approach, the original image is randomly masked and reconstructed on the online network. The reconstructed and original images form a positive pair used to calculate the contrastive loss on the target network. In experiments on the AVA dataset, our method obtained better performance than the baseline.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141090670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory-guided representation matching for unsupervised video anomaly detection","authors":"Yiran Tao , Yaosi Hu , Zhenzhong Chen","doi":"10.1016/j.jvcir.2024.104185","DOIUrl":"https://doi.org/10.1016/j.jvcir.2024.104185","url":null,"abstract":"<div><p>Recent works on Video Anomaly Detection (VAD) have made advancements in the unsupervised setting, known as Unsupervised VAD (UVAD), which brings it closer to practical applications. Unlike the classic VAD task that requires a clean training set containing only normal events, UVAD aims to identify abnormal frames without any labeled normal/abnormal training data. Many existing UVAD methods employ handcrafted surrogate tasks, such as frame reconstruction, to address this challenge. However, we argue that these surrogate tasks are sub-optimal solutions, inconsistent with the essence of anomaly detection. In this paper, we propose a novel approach for UVAD that directly detects anomalies based on similarities between events in videos. Our method generates representations for events while simultaneously capturing prototypical normality patterns, and detects anomalies based on whether an event’s representation matches the captured patterns. The proposed model comprises a memory module to capture normality patterns, and a representation learning network to obtain representations matching the memory module for normal events. A pseudo-label generation module and an anomalous event generation module for negative learning are further designed to help the model work under the strictly unsupervised setting. Experimental results demonstrate that the proposed method outperforms existing UVAD methods and achieves competitive performance compared with classic VAD methods.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141095324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few-shot defect classification via feature aggregation based on graph neural network","authors":"Pengcheng Zhang, Peixiao Zheng, Xin Guo, Enqing Chen","doi":"10.1016/j.jvcir.2024.104172","DOIUrl":"https://doi.org/10.1016/j.jvcir.2024.104172","url":null,"abstract":"<div><p>The effectiveness of deep learning models is greatly dependent on the availability of a vast amount of labeled data. However, in the realm of surface defect classification, acquiring and annotating defect samples proves to be quite challenging. Consequently, accurately predicting defect types with only a limited number of labeled samples has emerged as a prominent research focus in recent years. Few-shot learning, which leverages a restricted sample set in the support set, can effectively predict the categories of unlabeled samples in the query set. This approach is particularly well-suited for defect classification scenarios. In this article, we propose a transductive few-shot surface defect classification method that uses both instance-level and distribution-level relations in each few-shot learning task. Furthermore, we calculate class center features in a transductive manner and incorporate them into the feature aggregation operation to rectify the positioning of edge samples in the mapping space. This adjustment minimizes the distance between samples of the same category, thereby mitigating the influence of unlabeled samples at category boundaries on classification accuracy. Experimental results on a public dataset show the outstanding performance of our proposed approach compared to state-of-the-art methods in few-shot learning settings. Our code is available at <span>https://github.com/Harry10459/CIDnet</span>.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140950969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FSRDiff: A fast diffusion-based super-resolution method using GAN","authors":"Ni Tang , Dongxiao Zhang , Juhao Gao , Yanyun Qu","doi":"10.1016/j.jvcir.2024.104164","DOIUrl":"https://doi.org/10.1016/j.jvcir.2024.104164","url":null,"abstract":"<div><p>Single image super-resolution with diffusion probabilistic models (SRDiff) is a successful diffusion model for image super-resolution that produces high-quality images and is stable during training. However, due to its long sampling time, it is slower in the testing phase than other deep learning-based algorithms. Reducing the total number of diffusion steps can accelerate sampling, but it also causes the inverse diffusion process to deviate from the Gaussian distribution and exhibit a multimodal distribution, which violates the diffusion assumption and degrades the results. To overcome this limitation, we propose a fast SRDiff (FSRDiff) algorithm that integrates a generative adversarial network (GAN) with a diffusion model to speed up SRDiff. FSRDiff employs a conditional GAN to approximate the multimodal distribution in the inverse diffusion process of the diffusion model, thus enhancing sampling efficiency when the total number of diffusion steps is reduced. The experimental results show that FSRDiff is nearly 20 times faster than SRDiff in reconstruction while maintaining comparable performance on the DIV2K test set.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140824328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive HEVC video steganography based on PU partition modes","authors":"Shanshan Wang , Dawen Xu , Songhan He","doi":"10.1016/j.jvcir.2024.104176","DOIUrl":"https://doi.org/10.1016/j.jvcir.2024.104176","url":null,"abstract":"<div><p>High Efficiency Video Coding (HEVC)-based steganography has gained attention as a prominent research focus. In particular, block structure-based HEVC video steganography has received increasing attention due to its commendable performance. However, current block structure-based steganography algorithms face challenges such as reduced coding efficiency and limited capacity. To avoid these problems, an adaptive video steganography algorithm based on the Prediction Unit (PU) partition mode in I-frames is proposed, developed through an analysis of the block division process and of the visual distortion resulting from modification of the PU partition mode in HEVC. The PU block structure is utilized as the steganographic cover, and the Rate Distortion Optimization (RDO) technique is introduced to establish an adaptive distortion function for Syndrome-trellis code (STC). Further comparison between the proposed method and state-of-the-art steganography algorithms confirms its advantages in embedding capacity, compression efficiency, visual quality, and resistance to video steganalysis.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140906073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"6-DoF grasp estimation method that fuses RGB-D data based on external attention","authors":"Haosong Ran , Diansheng Chen , Qinshu Chen , Yifei Li , Yazhe Luo , Xiaoyu Zhang , Jiting Li , Xiaochuan Zhang","doi":"10.1016/j.jvcir.2024.104173","DOIUrl":"10.1016/j.jvcir.2024.104173","url":null,"abstract":"<div><p>6-DoF grasp estimation methods based on point clouds have long been a challenge in robotics, because a single input modality limits the robot’s perception of real-world scenarios, thus reducing robustness. In this work, we propose a 6-DoF grasp pose estimation method based on RGB-D data, which leverages ResNet to extract color image features, utilizes the PointNet++ network to extract geometric features, and employs an external attention mechanism to fuse the two. Our method is an end-to-end design, and we validate its performance through benchmark tests on a large-scale dataset and evaluations in a simulated robot environment. Our method outperforms previous state-of-the-art methods on public datasets, achieving 47.75 mAP and 40.08 mAP for seen and unseen objects, respectively. We also test our grasp pose estimation method on multiple objects in a simulated robot environment, demonstrating that our approach exhibits higher grasp accuracy and robustness than previous methods.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141042765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}