{"title":"MDLPCC: Misalignment-aware dynamic LiDAR point cloud compression","authors":"Ao Luo , Linxin Song , Keisuke Nonaka , Jinming Liu , Kyohei Unno , Kohei Matsuzaki , Heming Sun , Jiro Katto","doi":"10.1016/j.jvcir.2025.104481","DOIUrl":"10.1016/j.jvcir.2025.104481","url":null,"abstract":"<div><div>LiDAR point cloud plays an important role in various real-world areas. It is usually generated as sequences by LiDAR on moving vehicles. Regarding the large data size of LiDAR point clouds, Dynamic Point Cloud Compression (DPCC) methods are developed to reduce transmission and storage data costs. However, most existing DPCC methods neglect the intrinsic misalignment in LiDAR point cloud sequences, limiting the rate–distortion (RD) performance. This paper proposes a Misalignment-aware Dynamic LiDAR Point Cloud Compression method (MDLPCC), which alleviates the misalignment problem in both macroscope and microscope. MDLPCC exploits a global transformation (GlobTrans) method to eliminate the macroscopic misalignment problem, which is the obvious gap between two continuous point cloud frames. MDLPCC also uses a spatial–temporal mixed structure to alleviate the microscopic misalignment, which still exists in the detailed parts of two point clouds after GlobTrans. The experiments on our MDLPCC show superior performance over existing point cloud compression methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104481"},"PeriodicalIF":2.6,"publicationDate":"2025-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144134004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Star-PMFI: Star-attention and pyramid multi-scale feature integration network for small object detection in drone imagery","authors":"Wenyuan Yang , Zhongxu Li , Qihan He","doi":"10.1016/j.jvcir.2025.104479","DOIUrl":"10.1016/j.jvcir.2025.104479","url":null,"abstract":"<div><div>With their high flexibility and cost-effectiveness, Unmanned Aerial Vehicle (UAV) plays a crucial role in target detection and are widely used in military, rescue, and traffic surveillance scenarios. However, due to its particular aerial viewpoint, UAV images contain many small and densely distributed targets, which poses a severe challenge for accurate detection. In this study, we propose a novel UAV target detection model, Star-PMFI, consisting of the Star-Attention (Star-A) backbone network and the Pyramid Multi-scale Feature Integration (PMFI) neck. The Star-A utilizes the star operation and attention mechanism to extract the rich features, and the PMFI module performs the initial integration of features through the pyramid structure, followed by in-depth feature interaction. First, the model extracts multi-scale features using Star-A, which skillfully combines the star operation and attention mechanism to capture an extensive range of contextual information. Second, PMFI initially integrates the features through the pyramid structure, followed by deep feature interaction to realize cross-scale and cross-level information fusion. Finally, the model employs six detection heads, each responsible for target detection at different scales or features, to enhance small target detection capability. The experimental results show that the Star-PMFI model performs excellently on multiple datasets. On VisDrone and UAVDT datasets, <span><math><mrow><mi>m</mi><mi>A</mi><mi>P</mi><mi>@</mi><mn>0</mn><mo>.</mo><mn>5</mn><mo>:</mo><mn>0</mn><mo>.</mo><mn>95</mn></mrow></math></span> reaches 28.7% and 84.0%, respectively. Our code is available at: <span><span>https://github.com/yangwygithub/PaperCode/tree/main/WenyuanYang_Star-PMFI_UAV</span><svg><path></path></svg></span></div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"111 ","pages":"Article 104479"},"PeriodicalIF":2.6,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144262259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing lightweight object detection models for USV with enhanced maritime surface visible imaging","authors":"Longhui Niu, Yunsheng Fan, Ting Liu, Qi Han","doi":"10.1016/j.jvcir.2025.104477","DOIUrl":"10.1016/j.jvcir.2025.104477","url":null,"abstract":"<div><div>Maritime surface object detection is a key technology for the autonomous navigation of unmanned surface vehicles (USVs). However, Maritime surface object detectors often face challenges such as large parameter sizes, object size variations, and image degradation caused by complex sea environments, severely affecting the deployment and detection accuracy on USVs. To address these challenges, this paper proposes the LightV7-enhancer object detection framework. This framework is based on the CPA-Enhancer image enhancement module and an improved YOLOv7 detection module for joint optimal learning. First, a new lightweight backbone network, GhostOECANet, was designed based on Ghost modules and improved coordinate attention. Second, by integrating ELAN and Efficient Multi-scale attention, an ELAN-EMA module is constructed to enhance the network’s perception and multi-scale feature extraction capabilities. Additionally, to improve the detection accuracy of small objects, multi-scale object detection layers are added based on the YOLOv5 detection head. The paper also introduces CPA-Enhancer in conjunction with the improved YOLOv7 detection module for joint training to adaptively restore degraded Maritime surface images, thereby improving detection accuracy in complex maritime backgrounds. Finally, the SeaShips dataset and Singapore Maritime Dataset are used to evaluate and compare LightV7-enhancer with other mainstream detectors. The results show that LightV7-enhancer supports object detection in various degraded maritime scenarios, achieving a balance between accuracy and computational complexity compared to other mainstream models. Compared to the baseline YOLOv7, LightV7-enhancer improves mAP by 2.7% and 7.5% on the two datasets, respectively, and has only half the number of parameters of YOLOv7, demonstrating robustness in degraded sea surface scenarios.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104477"},"PeriodicalIF":2.6,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144169125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QLight-Net: Quaternion based low light image enhancement network","authors":"Sudeep Kumar Acharjee, Kavinder Singh, Anil Singh Parihar","doi":"10.1016/j.jvcir.2025.104478","DOIUrl":"10.1016/j.jvcir.2025.104478","url":null,"abstract":"<div><div>Images captured at night suffer from various degradations such as color distortion, low contrast, and noise. Many existing methods improve low-light images may sometimes amplify noise, cause color distortion, and lack finer details. The existing methods require larger number of parameters, which limits the adoption of these methods in vision-based applications. In this paper, we proposed a QLight-Net method to achieve a better enhancement with a comparably lower number of parameters. We proposed depth-wise quaternion convolution, and quaternion cross attention to develop the two-branch architecture for low-light image enhancement. The proposed model leverages gradient branch to extract color-aware gradient features. Further, It uses color branch to extract gradient-aware color features. The proposed method achieves an LPIPS score of 0.047, which surpasses the previous best results with lesser parameters, and achieves 0.88 and 29.05 scores of SSIM and PSNR, respectively. Our approach achieves a balance between computational efficiency and better enhancement.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104478"},"PeriodicalIF":2.6,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144169820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AttenScribble: Attention-enhanced scribble supervision for medical image segmentation","authors":"Mu Tian , Qinzhu Yang , Yi Gao","doi":"10.1016/j.jvcir.2025.104476","DOIUrl":"10.1016/j.jvcir.2025.104476","url":null,"abstract":"<div><div>The success of deep networks in medical image segmentation relies heavily on massive labeled training data. However, acquiring dense annotations is a time-consuming process. Weakly supervised methods normally employ less expensive forms of supervision, among which scribbles started to gain popularity lately thanks to their flexibility. However, due to the lack of shape and boundary information, it is extremely challenging to train a deep network on scribbles that generalize on unlabeled pixels. In this paper, we present a straightforward yet effective scribble-supervised learning framework. Inspired by recent advances in transformer-based segmentation, we create a pluggable spatial self-attention module that could be attached on top of any internal feature layers of arbitrary fully convolutional network (FCN) backbone. The module infuses global interaction while keeping the efficiency of convolutions. Descended from this module, we construct a similarity metric based on normalized and symmetrized attention. This attentive similarity leads to a novel regularization loss that imposes consistency between segmentation prediction and visual affinity. This attentive similarity loss optimizes the alignment of FCN encoders, attention mapping and model prediction. Ultimately, the proposed FCN+Attention architecture can be trained end-to-end guided by a combination of three learning objectives: partial segmentation loss, customized masked conditional random fields, and the proposed attentive similarity loss. Extensive experiments on public datasets (ACDC and CHAOS) showed that our framework not only outperforms existing state-of-the-art but also delivers close performance to fully-supervised benchmarks. The code is available at <span><span>https://github.com/YangQinzhu/AttenScribble.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104476"},"PeriodicalIF":2.6,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144147786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image steganography based on wavelet transform and Generative Adversarial Networks","authors":"Yan Zhao, Pei Yao, Liang Xue","doi":"10.1016/j.jvcir.2025.104474","DOIUrl":"10.1016/j.jvcir.2025.104474","url":null,"abstract":"<div><div>For most steganography based on GANs, repeated encoding and decoding operations can easily lead to information loss, making it hampers the generator’s ability to effectively capture essential image features. To address the limitations in the current work, we propose a new generator with U-Net architecture. Introducing the graph network part to process the information of graph structure, and introducing a feature transfer module designed to preserve and transfer critical feature information. In addition, a new generator loss structure is proposed, it contains three parts: the adversarial loss <span><math><msubsup><mrow><mi>l</mi></mrow><mrow><mi>G</mi></mrow><mrow><mn>1</mn></mrow></msubsup></math></span>, which significantly enhances resistance to detection, the entropy loss <span><math><msubsup><mrow><mi>l</mi></mrow><mrow><mi>G</mi></mrow><mrow><mn>2</mn></mrow></msubsup></math></span>, which ensures the embedding capability of steganographic images, and the low-frequency wavelet loss <span><math><msub><mrow><mi>l</mi></mrow><mrow><mi>f</mi></mrow></msub></math></span>, which optimizes the overall steganographic performance of the images. Through a large number of experiments and comparisons, our proposed method effectively improves the steganography detection ability, and verifies the reasonableness of the proposed method.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104474"},"PeriodicalIF":2.6,"publicationDate":"2025-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144114854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DII-FRSA: Diverse image inpainting with multi-scale feature representation and separable attention","authors":"Jixiang Cheng, Yuan Wu, Zhidan Li, Yiluo Zhang","doi":"10.1016/j.jvcir.2025.104472","DOIUrl":"10.1016/j.jvcir.2025.104472","url":null,"abstract":"<div><div>Diverse image inpainting is the process of generating multiple visually realistic completion results. Although previous methods in this area have seen success, they still exhibit some limitations. First, one-stage approaches must make a trade-off between diversity and consistency. Second, while two-stage approaches can overcome such problems, they require autoregressive models to estimate the probability distribution of the structural priors, which has a significant impact on inference speed. This paper introduces DII-FRSA, a method for diverse image inpainting utilizing multi-scale feature representation and separable attention. In the first stage, we build a Gaussian distribution from the dataset to sample multiple coarse results. To enhance the modeling capability of the Variational Auto-Encoder, we propose a multi-scale feature representation module for the encoder and decoder. In the second stage, the coarse results are refined while maintaining overall consistency of appearance. Additionally, we design a refinement network based on the proposed separable attention to further improve the quality of the coarse results and maintain consistency in the appearance of the visible and masked regions. Our method was tested on well-established datasets-Places2, CelebA-HQ, and Paris Street View, and outperformed modern techniques. Our network not only enhances the diversity of the completed results but also enhances their visual realism.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104472"},"PeriodicalIF":2.6,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144105599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D surface reconstruction with enhanced high-frequency details","authors":"Shikun Zhang , Yiqun Wang , Cunjian Chen , Yong Li , Qiuhong Ke","doi":"10.1016/j.jvcir.2025.104475","DOIUrl":"10.1016/j.jvcir.2025.104475","url":null,"abstract":"<div><div>Neural implicit 3D reconstruction can reproduce shapes without the need for 3D supervision, making it a significant advancement in computer vision and graphics. This technique leverages volume rendering methods and neural implicit representations to learn and reconstruct 3D scenes directly from 2D images, enabling the generation of complex geometries and detailed structures with minimal data. The field has gained significant traction in recent years, due to advancements in deep learning, 3D vision, and rendering techniques that allow for more efficient and realistic reconstructions. Current neural surface reconstruction methods tend to randomly sample the entire image, making it difficult to learn high-frequency details on the surface, and thus the reconstruction results tend to be too smooth. We designed a method, termed FreNeuS (Frequency-guided Neural Surface Reconstruction), which leverages high-frequency information to address the problem of insufficient surface detail. Specifically, FreNeuS uses pixel gradient changes to easily acquire high-frequency regions in an image and uses the obtained high-frequency information to guide surface detail reconstruction. High-frequency information is first used to guide the dynamic sampling of rays, applying different sampling strategies according to variations in high-frequency regions. To further enhance the focus on surface details, we have designed a high-frequency weighting method that constrains the representation of high-frequency details during the reconstruction process. Compared to the baseline method, Neus, our approach reduces the reconstruction error by 13% on the DTU dataset. Additionally, on the NeRF-synthetic dataset, our method demonstrates a significant advantage in visualization, producing clearer texture details. In addition, our method is more applicable and can be generalized to any reconstruction method based on NeuS.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104475"},"PeriodicalIF":2.6,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144169821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio temporal 3D skeleton kinematic joint point classification model for human activity recognition","authors":"S. Karthika , Y. Nancy Jane , H. Khanna Nehemiah","doi":"10.1016/j.jvcir.2025.104471","DOIUrl":"10.1016/j.jvcir.2025.104471","url":null,"abstract":"<div><div>Human activity recognition in video data is challenging due to factors like cluttered backgrounds and complex movements. This work introduces the Stacked Ensemble 3D Skeletal Human Activity Recognition (SES-HAR) framework to tackle these issues. The framework utilizes MoveNet Lightning Pose Estimation to generate 2D skeletal kinematic joint points, which are then mapped to 3D using a Gaussian Radial Basis Function Kernel. SES-HAR employs a stacking ensemble approach with two layers: level-0 base learners and a level-1 meta-learner. Base learners include Convolutional Two-Part Long Short-Term Memory Network (Conv2P-LSTM), Spatial Bidirectional Gated Temporal Graph Convolutional Network (SBGTGCN) with attention, and Convolutional eXtreme Gradient Boosting (ConvXGB). Their outputs are pooled and processed by a Logistic Regression (LR) meta-learner in the level-1 layer to generate final predictions. Experimental results show that SES-HAR achieves significant performance improvements on NTU-RGB + D 60, NTU-RGB + D 120, Kinetics-700–2020, and Micro-Action-52 datasets.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104471"},"PeriodicalIF":2.6,"publicationDate":"2025-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143947095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel high-fidelity reversible data hiding scheme based on multi-classification pixel value ordering","authors":"Chen Cui , Li Li , Jianfeng Lu , Shanqing Zhang , Chin-Chen Chang","doi":"10.1016/j.jvcir.2025.104473","DOIUrl":"10.1016/j.jvcir.2025.104473","url":null,"abstract":"<div><div>Pixel value ordering (PVO) is a highly effective technique that employs a pixel block partitioning and sorting for reversible data hiding (RDH). However, its embedding performance is significantly impacted by block size. To address this, an improved pixel-based PVO (IPPVO) was developed adopting a per-pixel approach and adaptive context size. Nevertheless, IPPVO only considers pixels below and to the right for prediction, neglecting other closer neighboring regions, leading to inaccurate predictions. This study presents a RDH strategy using multi-classification embedding to enhance performance. First, pixels are categorized into four classes based on parity coordinates, obtaining higher correlation prediction values using an adaptive nearest neighbor content size. Second, a new complexity calculation method is introduced, the complexity frequency of pixel regions to better differentiate between complex and flat regions. Finally, an effective embedding ratio and index value constraint are introduced to mitigate the challenge of excessive distortion when embedding large capacities. Experimental results indicate that the proposed scheme offers superior embedding capacity with low distortion compared to state-of-the-art PVO-based RDH methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104473"},"PeriodicalIF":2.6,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143947094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}