PFS3F: Probabilistic Fusion of Superpixel-Wise and Semantic-Aware Structural Features for Hyperspectral Image Classification
Ying Zhang; Puhong Duan; Lianhui Liang; Xudong Kang; Jun Li; Antonio Plaza
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 8723-8737, 2025 (published online 31 March 2025). DOI: 10.1109/TCSVT.2025.3556548
Abstract: Processing high-dimensional data cubes and developing high-performance classifiers are core objectives in hyperspectral image classification (HSIC). Superpixel-based methods are widely used in HSIC because they reduce redundant information and enhance local features. However, imprecise segmentation, especially in the complex structures and textures of hyperspectral images (HSIs), can cause the regions extracted by superpixels to disagree with the boundaries between different ground objects, and such inconsistencies significantly degrade classification performance. Likewise, when parameter settings are inaccurate, edge-aware feature extraction methods often introduce sharpening artifacts at image boundaries, reducing classification accuracy. To address these challenges, we propose a novel probabilistic fusion method for HSIC that consists of the following stages. First, spatial information is extracted by a multiscale superpixel segmentation method and then probabilistically optimized by the extended random walk (ERW) method. Next, semantic-aware structural features (S2Fs) are extracted along with edge information of different objects. Finally, a probabilistic framework fuses the class probabilities of the superpixel-based spatial information and the semantic-aware structural features. Experimental results on three real datasets show state-of-the-art classification performance, even with limited training sets.

Point2Quad: Generating Quad Meshes From Point Clouds via Face Prediction
Zezeng Li; Zhihui Qi; Weimin Wang; Ziliang Wang; Junyi Duan; Na Lei
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 8586-8597, 2025 (published online 31 March 2025). DOI: 10.1109/TCSVT.2025.3556130
Abstract: Quad meshes are essential in geometric modeling and computational mechanics. Although learning-based methods for triangle meshes have advanced considerably, quad mesh generation remains less explored because of the difficulty of ensuring coplanarity, convexity, and quad-only connectivity. In this paper, we present Point2Quad, the first learning-based method for quad-only mesh generation from point clouds. The key idea is to learn to identify quad faces from fused pointwise and facewise features. Specifically, Point2Quad begins with k-NN-based candidate generation that accounts for coplanarity and squareness. Two encoders then extract geometric and topological features that address the quad-related constraints, in particular by incorporating in-depth quadrilateral-specific characteristics. The extracted features are fused to train a classifier with a designed compound loss, and the final results are obtained after refinement by quad-specific post-processing. Extensive experiments on both clean and noisy data demonstrate the effectiveness and superiority of Point2Quad over baseline methods under comprehensive metrics. The code and dataset are available at https://github.com/cognaclee/Point2Quad.
{"title":"Multi-Contrast MRI Arbitrary-Scale Super-Resolution via Dynamic Implicit Network","authors":"Jinbao Wei;Gang Yang;Wei Wei;Aiping Liu;Xun Chen","doi":"10.1109/TCSVT.2025.3556210","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3556210","url":null,"abstract":"Multi-contrast MRI super-resolution (SR) aims to restore high-resolution target image from low-resolution one, where reference image from another contrast is used to promote this task. To better meet clinical needs, current studies mainly focus on developing arbitrary-scale MRI SR solutions rather than fixed-scale ones. However, existing arbitrary-scale SR methods still suffer from the following two issues: 1) They typically rely on fixed convolutions to learn multi-contrast features, struggling to handle the feature transformations under varying scales and input image pairs, thus limiting their representation ability. 2) They simply combine the multi-contrast features as prior information, failing to fully exploit the complementary information in the texture-rich reference images. To address these issues, we propose a Dynamic Implicit Network (DINet) for multi-contrast MRI arbitrary-scale SR. DINet offers several key advantages. First, the scale-adaptive dynamic convolution facilitates dynamic feature learning based on scale factors and input image pairs, significantly enhancing the representation ability of multi-contrast features. Second, the dual-branch implicit attention enables arbitrary-scale upsampling of MR images through implicit neural representation. Following this, we propose the modulation-then-fusion block to adaptively align and fuse multi-contrast features, effectively incorporating complementary details from reference images into the target images. By jointly combining the above-mentioned modules, our proposed DINet achieves superior MRI SR performance at arbitrary scales. Extensive experiments on three datasets demonstrate that DINet significantly outperforms state-of-the-art methods, highlighting its potential for clinical applications. The code is available at <uri>https://github.com/weijinbao1998/DINet</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8973-8988"},"PeriodicalIF":11.1,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

LaneTCA: Enhancing Video Lane Detection With Temporal Context Aggregation
Keyi Zhou; Li Li; Wengang Zhou; Yonghui Wang; Hao Feng; Houqiang Li
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 8574-8585, 2025 (published online 26 March 2025). DOI: 10.1109/TCSVT.2025.3554175
Abstract: In video lane detection, rich temporal context exists among successive frames but is under-explored by existing lane detectors. In this work, we propose LaneTCA to bridge individual video frames and explore how to effectively aggregate the temporal context. Technically, we develop an accumulative attention module and an adjacent attention module to abstract the long-term and short-term temporal context, respectively. The accumulative attention module continuously accumulates visual information over the course of a vehicle's journey, while the adjacent attention module propagates lane information from the previous frame to the current frame. Both modules are built on the transformer architecture. Finally, these long- and short-term context features are fused with the current-frame features to predict the lane lines in the current frame. Extensive quantitative and qualitative experiments on two prevalent benchmark datasets demonstrate the effectiveness of our method, which sets several new state-of-the-art records. The code and models are available at https://github.com/Alex-1337/LaneTCA.
{"title":"GAEM: Graph-Driven Attention-Based Entropy Model for LiDAR Point Cloud Compression","authors":"Mingyue Cui;Yuyang Zhong;Mingjian Feng;Junhua Long;Yehua Ling;Jiahao Xu;Kai Huang","doi":"10.1109/TCSVT.2025.3554300","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3554300","url":null,"abstract":"High-quality LiDAR point cloud (LPC) coding is essential for efficiently transmitting and storing the vast amounts of data required for accurate 3D environmental representation. The Octree-based entropy coding framework has emerged as the predominant method, however, previous study usually overly relies on large-scale attention-based context prediction to encode Octree nodes, overlooking the inherent correlational properties of this structure. In this paper, we propose a novel Graph-driven Attention-based Entropy Model (<bold>GAEM</b>), which adopts partitioned graph attention mechanisms to uncover contextual dependencies among neighboring nodes. Different from the Cartesian coordinate-based coding mode with higher redundancy, GAEM uses the multi-level spherical Octree to organize point clouds, improving the quality of LPC reconstruction. GAEM combines graph convolution for node feature embedding and grouped-graph attention for exploiting dependency among contexts, which preserves performance in low-computation using localized nodes. Besides, to further increase the receptive field, we design a high-resolution cross-attention module introducing sibling nodes. Experimental results show that our method achieves state-of-the-art performance on the LiDAR benchmark SemanticKITTI and MPEG-specified dataset Ford, compared to all baselines. Compared to the benchmark GPCC, our method achieves gains of up to 53.9% and 53.6% on SemanticKITTI and Ford while compared to the sibling-introduced methods, we achieve up to 42.3% and 44.7% savings in encoding/decoding time. In particular, our GAEM allows for extension to downstream tasks (<italic>i.e.,</i> vehicle detection and semantic segmentation), further demonstrating the practicality of the method.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9105-9118"},"PeriodicalIF":11.1,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Secret Image Sharing Scheme Based on Polynomial k-Consistency","authors":"Lizhi Xiong;Rui Ding;Ching-Nung Yang;Zhangjie Fu","doi":"10.1109/TCSVT.2025.3554842","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3554842","url":null,"abstract":"The (<italic>k</i>, <italic>n</i>)-threshold Secret Image Sharing (SIS) is a naturally fault-tolerant technique for image privacy protection. A secret image is processed through secret sharing to generate <italic>n</i> shadow images, which are then distributed to <italic>n</i> different recipients. During the recovery phase, the complete secret image can be reconstructed by any <italic>k</i> out of <italic>n</i> shadow images. Although (<italic>k</i>, <italic>n</i>)-threshold SIS itself allows for the loss of up to <inline-formula> <tex-math>$n-k$ </tex-math></inline-formula> shadow images, if there are pixel errors in the remaining <italic>k</i> shadow images, the recovery of the secret image will be declared a failure. Therefore, Robust Secret Image Sharing (RSIS) has been proposed to address the issue. However, the current proposed RSIS schemes only demonstrated limited robustness against noise attacks. This paper presents a novel <italic>k</i>-consistency-based RSIS scheme to resist malicious attacks, including noise, JPEG compression, tampering, and cropping. In the sharing phase, a dual-SIS mechanism is first designed to perform two rounds of secret sharing on the secret image. In the recovery phase, high-quality secret image can be reconstructed based on <italic>k</i>-consistency after attacking. The experimental results demonstrated that our scheme not only provides comprehensive robustness but also allows for flexible adjustment of shadow images’ sizes, ensuring both security and efficiency during image sharing.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8880-8892"},"PeriodicalIF":11.1,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Theoretical and Experimental Study for Dependent Learned Rate-Distortion Optimization
Yingwen Zhang; Meng Wang; Junru Li; Kai Zhang; Li Zhang; Shiqi Wang
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 9414-9427, 2025 (published online 26 March 2025). DOI: 10.1109/TCSVT.2025.3555152
Abstract: Recent advances in learned rate-distortion optimization (RDO) show that by making intra coding decisions with a learned measure, encoding can be significantly accelerated without incurring much coding loss. Despite great progress in complexity reduction, the dependency issue has been largely neglected in current learned RDO research. In this study, aiming to tap the full potential of dependent learned RDO, we first derive a probabilistic RDO framework for theoretical analysis, under which the classic and the learned RDO problems are equivalent to maximum a posteriori (MAP) inference and distribution imitation, respectively. We then probabilistically revisit dependency considerations in intra RDO research. Our key finding is that the existing learned RDO scheme can only produce a measure of the local "goodness" of coding decisions, and we therefore discuss opportunities for learning a dependent measure that is more optimal in the long run. Finally, since learning an accurate measure for the full decision space could be extremely challenging, we take High Efficiency Video Coding (HEVC) intra coding as a case study and experimentally identify that the prediction decision accounts for the majority of the dependent optimization gain and is the most valuable to learn, paving the way for future research on dependent learned RDO.
{"title":"NUC-Net: Non-Uniform Cylindrical Partition Network for Efficient LiDAR Semantic Segmentation","authors":"Xuzhi Wang;Wei Feng;Lingdong Kong;Liang Wan","doi":"10.1109/TCSVT.2025.3554182","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3554182","url":null,"abstract":"LiDAR semantic segmentation plays a vital role in autonomous driving. Existing voxel-based methods for LiDAR semantic segmentation apply uniform partition to the 3D LiDAR point cloud to form a structured representation based on cartesian/cylindrical coordinates. Although these methods show impressive performance, the drawback of existing voxel-based methods remains in two aspects: 1) it requires a large enough input voxel resolution, which brings a large amount of computation cost and memory consumption. 2) it does not well handle the unbalanced point distribution of LiDAR point cloud. In this paper, we propose a non-uniform cylindrical partition network named NUC-Net to tackle the above challenges. Specifically, we propose the Arithmetic Progression of Interval (API) method to non-uniformly partition the radial axis and generate the voxel representation which is representative and efficient. Moreover, we propose a non-uniform multi-scale aggregation method to improve contextual information. Our method achieves state-of-the-art performance on SemanticKITTI and nuScenes datasets with much faster speed and much less training time. And our method can be a general component for LiDAR semantic segmentation, which significantly improves both the accuracy and efficiency of the uniform counterpart by <inline-formula> <tex-math>$4 times $ </tex-math></inline-formula> training faster and <inline-formula> <tex-math>$2 times $ </tex-math></inline-formula> GPU memory reduction and <inline-formula> <tex-math>$3 times $ </tex-math></inline-formula> inference speedup. We further provide theoretical analysis towards understanding why NUC is effective and how point distribution affects performance. Code is available at <uri>https://github.com/alanWXZ/NUC-Net</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9090-9104"},"PeriodicalIF":11.1,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Flow Visualization for Complex Fluid Flows via a Structure-Enhanced Motion Estimator
Jun Chen; He Wang; Zhifeng Hao; Zemin Cai; Ling Mei; Tianshu Liu
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 8559-8573, 2025 (published online 25 March 2025). DOI: 10.1109/TCSVT.2025.3554535
Abstract: Flow visualization through motion estimation from time-sequenced images plays a significant role in analyzing and understanding complex flow phenomena, and it is widely used in meteorology, oceanography, medicine, astronomy, experimental fluid mechanics, and related fields. However, current motion estimators struggle to adapt to illumination changes, remove unstable perturbations, and capture diverse motion patterns. In this paper, a novel flow visualization tool is developed to address these issues by employing a structure-enhanced motion estimator composed of a data term and a regularization term. Specifically, a statistical correlation descriptor is designed for the data term to improve the accuracy of motion estimation by enhancing both illumination robustness and matching discrimination. Inspired by the strong distinguishability of the structure-texture distribution in a local window, a structure-enhanced regularizer that accounts for the physical mechanism of fluid diffusion is introduced to capture different motion patterns, enhance prominent flow structures, and remove unnecessary ripples or textures caused by unstable perturbations or noise. The experimental results demonstrate that our approach significantly outperforms current motion estimators in handling illumination changes and predicting complex fluid flows, and it achieves state-of-the-art results on public fluid flow datasets. Furthermore, the designed flow visualization tool successfully captures diverse motion patterns in Jupiter's White Ovals, which is crucial for understanding the physical mechanisms behind their formation and sustenance.
{"title":"VRAR: Video-Radar Automatic Registration Method Based on Trajectory Spatiotemporal Features and Bidirectional Mapping","authors":"Kong Li;Zhe Dai;Hua Cui;Xuan Wang;Huansheng Song","doi":"10.1109/TCSVT.2025.3554441","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3554441","url":null,"abstract":"Automating video and radar spatial registration without sensor layout constraints is crucial for enhancing the flexibility of perception systems. However, this remains challenging due to the lack of effective approaches for constructing and utilizing matching information between heterogeneous sensors. Existing methods rely on human intervention or prior knowledge, making it difficult to achieve true automation. Consequently, establishing a registration model that automatically extracts matching information from heterogeneous sensor data remains a key challenge. To address these issues, we propose a novel Video-Radar Automatic Registration (VRAR) method based on vehicle trajectory spatiotemporal feature encoding and a bidirectional mapping network. We first establish a unified representation for heterogeneous sensor data by encoding spatiotemporal features of vehicle trajectories. Based on this, we automatically extract a large number of high-quality matching points from synchronized trajectory pairs using a frame synchronization strategy. Subsequently, we utilize the proposed Video-Radar Bidirectional Mapping Network to process these matching points. This network learns the bidirectional mapping between the two sensor modalities, extending the alignment from discrete local observation points to the entire observable space. Experimental results demonstrate that the VRAR method exhibits significant performance advantages in various traffic scenarios, verifying its effectiveness and generalizability. This capability of automated and adaptive registration highlights the method’s potential for broader applications in heterogeneous sensor integration.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8707-8722"},"PeriodicalIF":11.1,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}