{"title":"HRGUNet: A novel high-resolution generative adversarial network combined with an improved UNet method for brain tumor segmentation","authors":"Dongmei Zhou, Hao Luo, Xingyang Li, Shengbing Chen","doi":"10.1016/j.jvcir.2024.104345","DOIUrl":"10.1016/j.jvcir.2024.104345","url":null,"abstract":"<div><div>Brain tumor segmentation in MRI images is challenging due to variability in tumor characteristics and low contrast. We propose HRGUNet, which combines a high-resolution generative adversarial network with an improved UNet architecture to enhance segmentation accuracy. Our proposed GAN model uses an innovative discriminator design that is able to process complete tumor labels as input. This approach can better ensure that the generator produces realistic tumor labels compared to some existing GAN models that only use local features. Additionally, we introduce a Multi-Scale Pyramid Fusion (MSPF) module to improve fine-grained feature extraction and a Refined Channel Attention (RCA) module to enhance focus on tumor regions. In comparative experiments, our method was verified on the BraTS2020 and BraTS2019 data sets, and the average Dice coefficient increased by 1.5% and 1.2% respectively, and the Hausdorff distance decreased by 23.9% and 15.2% respectively, showing its robustness and generalization for segmenting complex tumor structures.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104345"},"PeriodicalIF":2.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Panoramic Arbitrary Style Transfer with Deformable Distortion Constraints","authors":"Wujian Ye , Yue Wang , Yijun Liu , Wenjie Lin , Xin Xiang","doi":"10.1016/j.jvcir.2024.104344","DOIUrl":"10.1016/j.jvcir.2024.104344","url":null,"abstract":"<div><div>Neural style transfer is a prominent AI technique for creating captivating visual effects and enhancing user experiences. However, most current methods inadequately handle panoramic images, leading to a loss of original visual semantics and emotions due to insufficient structural feature consideration. To address this, a novel panorama arbitrary style transfer method named PAST-Renderer is proposed by integrating deformable convolutions and distortion constraints. The proposed method can dynamically adjust the position of the convolutional kernels according to the geometric structure of the input image, thereby better adapting to the spatial distortions and deformations in panoramic images. Deformable convolutions enable adaptive transformations on a two-dimensional plane, enhancing content and style feature extraction and fusion in panoramic images. Distortion constraints adjust content and style losses, ensuring semantic consistency in salience, edge, and depth of field with the original image. Experimental results show significant improvements, with the PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure) of stylized panoramic images’ semantic maps increasing by approximately 2–4 dB and 0.1–0.3, respectively. Our method PAST-Renderer performs better in both artistic and realistic style transfer, preserving semantic integrity with natural colors, realistic edge details, and rich thematic content.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"106 ","pages":"Article 104344"},"PeriodicalIF":2.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142743787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Underwater image enhancement method via extreme enhancement and ultimate weakening","authors":"Yang Zhou , Qinghua Su , Zhongbo Hu , Shaojie Jiang","doi":"10.1016/j.jvcir.2024.104341","DOIUrl":"10.1016/j.jvcir.2024.104341","url":null,"abstract":"<div><div>The existing histogram-based methods for underwater image enhancement are prone to over-enhancement, which will affect the analysis of enhanced images. However, an idea that achieves contrast balance by enhancing and weakening the contrast of an image can address the problem. Therefore, an underwater image enhancement method based on extreme enhancement and ultimate weakening (EEUW) is proposed in this paper. This approach comprises two main steps. Firstly, an image with extreme contrast can be achieved by applying grey prediction evolution algorithm (GPE), which is the first time that GPE is introduced into dual-histogram thresholding method to find the optimal segmentation threshold for accurate segmentation. Secondly, a pure gray image can be obtained through a fusion strategy based on the grayscale world assumption to achieve the ultimate weakening. Experiments conducted on three standard underwater image benchmark datasets validate that EEUW outperforms the 10 state-of-the-art methods in improving the contrast of underwater images.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104341"},"PeriodicalIF":2.6,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-level similarity transfer and adaptive fusion data augmentation for few-shot object detection","authors":"Songhao Zhu, Yi Wang","doi":"10.1016/j.jvcir.2024.104340","DOIUrl":"10.1016/j.jvcir.2024.104340","url":null,"abstract":"<div><div>Few-shot object detection method aims to learn novel classes through a small number of annotated novel class samples without having a catastrophic impact on previously learned knowledge, thereby expanding the trained model’s ability to detect novel classes. For existing few-shot object detection methods, there is a prominent false positive issue for the novel class samples due to the similarity in appearance features and feature distribution between the novel classes and the base classes. That is, the following two issues need to be solved: (1) How to detect these false positive samples in large-scale dataset, and (2) How to utilize the correlations between these false positive samples and other samples to improve the accuracy of the detection model. To address the first issue, an adaptive fusion data augmentation strategy is utilized to enhance the diversity of novel class samples and further alleviate the issue of false positive novel class samples. To address the second issue, a similarity transfer strategy is here proposed to effectively utilize the correlations between different categories. Experimental results demonstrate that the proposed method performs well in various settings of PASCAL VOC and MSCOCO datasets, achieving 48.7 and 11.3 on PASCAL VOC and MSCOCO under few-shot settings (shot = 1) in terms of nAP50 respectively.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104340"},"PeriodicalIF":2.6,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color image watermarking using vector SNCM-HMT","authors":"Hongxin Wang, Runtong Ma, Panpan Niu","doi":"10.1016/j.jvcir.2024.104339","DOIUrl":"10.1016/j.jvcir.2024.104339","url":null,"abstract":"<div><div>An image watermarking scheme is typically evaluated using three main conflicting characteristics: imperceptibility, robustness, and capacity. Developing a good image watermarking method is challenging because it requires a trade-off between these three basic characteristics. In this paper, we proposed a statistical color image watermarking based on robust discrete nonseparable Shearlet transform (DNST)-fast quaternion generic polar complex exponential transform (FQGPCET) magnitude and vector skew-normal-Cauchy mixtures (SNCM)-hidden Markov tree (HMT). The proposed watermarking system consists of two main parts: watermark inserting and watermark extraction. In watermark inserting, we first perform DNST on R, G, and B components of color host image, respectively. We then compute block FQGPCET of DNST domain color components, and embed watermark signal in DNST-FQGPCET magnitudes using multiplicative approach. In watermark extraction, we first analyze the robustness and statistical characteristics of local DNST-FQGPCET magnitudes of color image. We then observe that, vector SNCM-HMT model can capture accurately the marginal distribution and multiple strong dependencies of local DNST-FQGPCET magnitudes. Meanwhile, vector SNCM-HMT parameters can be computed effectively using variational expectation–maximization (VEM) parameter estimation. Motivated by our modeling results, we finally develop a new statistical color image watermark decoder based on vector SNCM-HMT and maximum likelihood (ML) decision rule. Experimental results on extensive test images demonstrate that the proposed statistical color image watermarking provides a performance better than that of most of the state-of-the-art statistical methods and some deep learning approaches recently proposed in the literature.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104339"},"PeriodicalIF":2.6,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A memory access number constraint-based string prediction technique for high throughput SCC implemented in AVS3","authors":"Liping Zhao , Zuge Yan , Keli Hu , Sheng Feng , Jiangda Wang , Xueyan Cao , Tao Lin","doi":"10.1016/j.jvcir.2024.104338","DOIUrl":"10.1016/j.jvcir.2024.104338","url":null,"abstract":"<div><div>String prediction (SP) is a highly efficient screen content coding (SCC) tool that has been adopted in international and Chinese video coding standards. SP exhibits a highly flexible and efficient ability to predict repetitive matching patterns. However, SP also suffers from low throughput of decoded display output pixels per memory access, which is synchronized with the decoder clock, due to the high number of memory accesses required to decode an SP coding unit for display. Even in state-of-the-art (SOTA) SP, the worst-case scenario involves two memory accesses for decoding each 4-pixel basic string unit across two memory access units, resulting in a throughput as low as two pixels per memory access (PPMA). To solve this problem, we are the first to propose a technique called memory access number constraint-based string prediction (MANC-SP) to achieve high throughput in SCC. First, a novel MANC-SP framework is proposed, a well-designed memory access number constraint rule is established on the basis of statistical data, and a constrained RDO-based string searching method is presented. Compared with the existing SOTA SP, the experimental results demonstrate that MANC-SP can improve the throughput from 2 to 2.67 PPMA, achieving a throughput improvement of <strong>33.33%</strong> while maintaining a negligible impact on coding efficiency and complexity.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104338"},"PeriodicalIF":2.6,"publicationDate":"2024-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faster-slow network fused with enhanced fine-grained features for action recognition","authors":"Xuegang Wu , Jiawei Zhu , Liu Yang","doi":"10.1016/j.jvcir.2024.104328","DOIUrl":"10.1016/j.jvcir.2024.104328","url":null,"abstract":"<div><div>Two-stream methods, which separate human actions and backgrounds into temporal and spatial streams visually, have shown promising results in action recognition datasets. However, prior researches emphasize motion modeling but overlook the robust correlation between motion features and spatial information, causing restriction of the model’s ability to recognize behaviors entailing occlusions or rapid changes. Therefore, we introduce Faster-slow, an improved framework for frame-level motion features. It introduces a Behavioural Feature Enhancement (BFE) module based on a novel two-stream network with different temporal resolutions. BFE consists of two components: MM, which incorporates motion-aware attention to capture dependencies between adjacent frames; STC, which enhances spatio-temporal and channel information to generate optimized features. Overall, BFE facilitates the extraction of finer-grained motion information, while ensuring a stable fusion of information across both streams. We evaluate the Faster-slow on the Atomic Visual Actions dataset, and the Faster-AVA dataset constructed in this paper, yielding promising experimental results.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104328"},"PeriodicalIF":2.6,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight macro-pixel quality enhancement network for light field images compressed by versatile video coding","authors":"Hongyue Huang , Chen Cui , Chuanmin Jia , Xinfeng Zhang , Siwei Ma","doi":"10.1016/j.jvcir.2024.104329","DOIUrl":"10.1016/j.jvcir.2024.104329","url":null,"abstract":"<div><div>Previous research demonstrated that filtering Macro-Pixels (MPs) in a decoded Light Field Image (LFI) sequence can effectively enhances the quality of the corresponding Sub-Aperture Images (SAIs). In this paper, we propose a deep-learning-based quality enhancement model following the MP-wise processing approach tailored to LFIs encoded by the Versatile Video Coding (VVC) standard. The proposed novel Res2Net Quality Enhancement Convolutional Neural Network (R2NQE-CNN) architecture is both lightweight and powerful, in which the Res2Net modules are employed to perform LFI filtering for the first time, and are implemented with a novel improved 3D-feature-processing structure. The proposed method incorporates only 205K model parameters and achieves significant Y-BD-rate reductions over VVC of up to 32%, representing a relative improvement of up to 33% compared to the state-of-the-art method, which has more than three times the number of parameters of our proposed model.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104329"},"PeriodicalIF":2.6,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TrMLGAN: Transmission MultiLoss Generative Adversarial Network framework for image dehazing","authors":"Pulkit Dwivedi, Soumendu Chakraborty","doi":"10.1016/j.jvcir.2024.104324","DOIUrl":"10.1016/j.jvcir.2024.104324","url":null,"abstract":"<div><div>Hazy environments significantly degrade image quality, leading to poor contrast and reduced visibility. Existing dehazing methods often struggle to predict the transmission map, which is crucial for accurate dehazing. This study introduces the Transmission MultiLoss Generative Adversarial Network (TrMLGAN), a novel framework designed to enhance transmission map estimation for improved dehazing. The transmission map is initially computed using a dark channel prior-based approach and refined using the TrMLGAN framework, which leverages Generative Adversarial Networks (GANs). By integrating multiple loss functions, such as adversarial, pixel-wise similarity, perceptual similarity, and SSIM losses, our method focuses on various aspects of image quality. This enables robust dehazing performance without direct dependence on ground-truth images. Evaluations using PSNR, SSIM, FADE, NIQE, BRISQUE, and SSEQ metrics show that TrMLGAN significantly outperforms state-of-the-art methods across datasets including D-HAZY, HSTS, SOTS Outdoor, NH-HAZE, and D-Hazy, validating its potential for real-world applications.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104324"},"PeriodicalIF":2.6,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142552678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Question Answering: A survey of the state-of-the-art","authors":"Jeshmol P.J., Binsu C. Kovoor","doi":"10.1016/j.jvcir.2024.104320","DOIUrl":"10.1016/j.jvcir.2024.104320","url":null,"abstract":"<div><div>Video Question Answering (VideoQA) emerges as a prominent trend in the domain of Artificial Intelligence, Computer Vision, and Natural Language Processing. It involves developing systems capable of understanding, analyzing, and responding to questions about the content of videos. The Proposed survey presents an in-depth overview of the current landscape of Question Answering, shedding light on the challenges, methodologies, datasets, and innovative approaches in the domain. The key components of the Video Question Answering (VideoQA) framework include video feature extraction, question processing, reasoning, and response generation. It underscores the importance of datasets in shaping VideoQA research and the diversity of question types, from factual inquiries to spatial and temporal reasoning. The survey highlights the ongoing research directions and future prospects for VideoQA. Finally, the proposed survey gives a road map for future explorations at the intersection of multiple disciplines, emphasizing the ultimate objective of pushing the boundaries of knowledge and innovation.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104320"},"PeriodicalIF":2.6,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}