{"title":"StreetSurfGS: Scalable Urban Street Surface Reconstruction With Planar-Based Gaussian Splatting","authors":"Xiao Cui;Weicai Ye;Yifan Wang;Guofeng Zhang;Wengang Zhou;Tong He;Houqiang Li","doi":"10.1109/TCSVT.2025.3551719","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551719","url":null,"abstract":"Reconstructing urban street scenes is crucial due to its vital role in applications such as autonomous driving and urban planning. These scenes are characterized by long, narrow camera trajectories, occlusion, complex object relationships, and sparse data across multiple scales. Despite recent advancements, existing surface reconstruction methods, which are primarily designed for object-centric scenarios, struggle to adapt effectively to the unique characteristics of street scenes. To address this challenge, we introduce StreetSurfGS, the first method to employ Gaussian Splatting specifically tailored for scalable urban street scene surface reconstruction. StreetSurfGS utilizes a planar-based octree representation and segmented training to reduce memory costs, accommodate unique camera characteristics, and improve scalability. Additionally, to mitigate depth inaccuracies caused by object overlap, we propose a guided smoothing strategy within regularization to eliminate inaccurate boundary points and outliers. Furthermore, to address sparse views and multi-scale challenges, we use a dual-step matching strategy that leverages adjacent and long-term information. Extensive experiments validate the efficacy of StreetSurfGS in both novel view synthesis and surface reconstruction.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8780-8793"},"PeriodicalIF":11.1,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Task Guided No-Reference Omnidirectional Image Quality Assessment With Feature Interaction","authors":"Yun Liu;Sifan Li;Huiyu Duan;Yu Zhou;Daoxin Fan;Guangtao Zhai","doi":"10.1109/TCSVT.2025.3551723","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551723","url":null,"abstract":"Omnidirectional image quality assessment (OIQA) has become an increasingly vital problem in recent years. Most previous no-reference OIQA methods only extract local features from the distorted viewports, or extract global features from the entire distorted image, lacking the interaction and fusion between local and global features. Moreover, the lack of reference information also limits their performance. Thus, we propose a no-reference OIQA model which consists of three novel modules, including a bidirectional pseudo-reference module, a Mamba-based global feature extraction module, and a multi-scale local-global feature aggregation module. Specifically, by considering the image distortion degradation process, a bidirectional pseudo-reference module capturing the error maps on viewports is first constructed to refine the multi-scale local visual features, which can supply rich quality degradation reference information without the reference image. To well complement the local features, the VMamba module is adopted to extract the representative multi-scale global visual features. Inspired by human hierarchical visual perception characteristics, a novel multi-scale aggregation module is built to strengthen the feature interaction and effective fusion which can extract deep semantic information. Finally, motivated by the multi-task managing mechanism of human brain, a multi-task learning module is introduced to assist the main quality assessment task by digging the hidden information in compression type and distortion degree. Extensive experimental results demonstrate that our proposed method achieves the state-of-the-art performance on the no-reference OIQA task compared to other models.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8794-8806"},"PeriodicalIF":11.1,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LOCAT: Localization-Driven Text Watermarking via Large Language Models","authors":"Liang Ding;Xi Yang;Yang Yang;Weiming Zhang","doi":"10.1109/TCSVT.2025.3570858","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3570858","url":null,"abstract":"The rapid advancement of large language models (LLMs) has raised concerns regarding potential misuse and underscores the importance of verifying text authenticity. Text watermarking, which embeds covert identifiers into generated content, offers a viable means for such verification. Such watermarking can be implemented either by modifying the generation process of an LLM or via post-processing techniques like lexical substitution, with the latter being particularly valuable when access to model parameters is restricted. However, existing lexical substitution-based methods often face a trade-off between maintaining text quality and ensuring robust watermarking. Addressing this limitation, our work focuses on enhancing both the robustness and imperceptibility of text watermarks within the lexical substitution paradigm. We propose a localization-based watermarking method that enhances robustness while maintaining text naturalness. First, a precise localization module identifies optimal substitution targets. Then, we leverage LLMs to generate contextually appropriate synonyms, and the watermark is embedded through binary-encoded substitutions. To address different usage scenarios, we focus on the trade-off between watermark robustness and text quality. Compared to existing methods, our approach significantly enhances watermark robustness while maintaining comparable text quality and achieves similar robustness levels while improving text quality. Even under severe semantic distortions, including word deletion, synonym substitution, polishing, and re-translation, the watermark remains detectable.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 8","pages":"8406-8420"},"PeriodicalIF":11.1,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144781943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Errata to “Local-Global Temporal Difference Learning for Satellite Video Super-Resolution”","authors":"Yi Xiao;Qiangqiang Yuan","doi":"10.1109/TCSVT.2025.3570842","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3570842","url":null,"abstract":"In the above article [1], there is a citation error related to the core technical foundation of the proposed method. Reference [2] was incorrectly cited. The correct citation is [3].","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 10","pages":"10612-10612"},"PeriodicalIF":11.1,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11006141","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CamStegNet: A Robust Image Steganography Method Based on Camouflage Model","authors":"Le Mao;Yun Tan;Jiaohua Qin;Xuyu Xiang","doi":"10.1109/TCSVT.2025.3570725","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3570725","url":null,"abstract":"Deep learning models are increasingly being employed in steganographic schemes for the embedding and extraction of secret information. However, steganographic models themselves are also at risk of detection and attacks. Although there are approaches proposed to hide deep learning models, making these models difficult to detect while achieving high-quality image steganography performance remains a challenging task. In this work, a robust image steganography method based on a camouflage model CamStegNet is proposed. The steganographic model is camouflaged as a routine deep learning model to significantly enhance its concealment. A sparse weight-filling paradigm is designed to enable the model to be flexibly switched among three modes by utilizing different keys: routine machine learning task, secret embedding task and secret recovery task. Furthermore, a residual state-space module and a neighborhood attention mechanism are constructed to improve the performance of image steganography. Experiments conducted on the DIV2K, ImageNet and COCO datasets demonstrate that the stego images generated by CamStegNet are superior to existing methods in terms of visual quality. They also exhibit enhanced resistance to steganalysis and maintain over 95% robustness against noise and scale attacks. Additionally, the model demonstrates high robustness which can achieve excellent performance in machine learning tasks and maintain stability across various weight initialization methods.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 10","pages":"10599-10611"},"PeriodicalIF":11.1,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open-Set Mixed Domain Adaptation via Visual-Linguistic Focal Evolving","authors":"Bangzhen Liu;Yangyang Xu;Cheng Xu;Xuemiao Xu;Shengfeng He","doi":"10.1109/TCSVT.2025.3551234","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551234","url":null,"abstract":"We introduce a new task, Open-set Mixed Domain Adaptation (OSMDA), which considers the potential mixture of multiple distributions in the target domains, thereby better simulating real-world scenarios. To tackle the semantic ambiguity arising from multiple domains, our key idea is that the linguistic representation can serve as a universal descriptor for samples of the same category across various domains. We thus propose a more practical framework for cross-domain recognition via visual-linguistic guidance. On the other hand, the presence of multiple domains also poses a new challenge in classifying both known and unknown categories. To combat this issue, we further introduce a visual-linguistic focal evolving approach to gradually enhance the classification ability of a known/unknown binary classifier from two aspects. Specifically, we start with identifying highly confident focal samples to expand the pool of known samples by incorporating those from different domains. Then, we amplify the feature discrepancy between known and unknown samples through dynamic entropy evolving via an adaptive entropies min/max game, enabling us to accurately identify possible unknown samples in a gradual manner. Extensive experiments demonstrate our method’s superiority against the state-of-the-arts in both open-set and open-set mixed domain adaptation.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8495-8507"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rate-Distortion-Optimized Deep Preprocessing for JPEG Compression","authors":"Fan Ye;Bojun Liu;Li Li;Dong Liu","doi":"10.1109/TCSVT.2025.3550872","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3550872","url":null,"abstract":"JPEG is daily used for compressing natural images, while the compressed images often contain visually annoying artifacts especially at low rates. To reduce the compression artifacts, it has been proposed to preprocess an image before the JPEG compression with the help of deep learning, which maintains the standard compliance. However, the existing methods were not fully justified from the rate-distortion optimization perspective. We address this limitation and propose a truly rate-distortion-optimized deep preprocessing method for JPEG compression. We decompose a rate-distortion cost into three parts: rate, distortion, and Lagrangian multiplier. First, we design a rate estimation network and propose to train the network to estimate the JPEG compression rate. Second, we propose to estimate the actual end-to-end distortion (between original and reconstructed images) with a differentiable JPEG simulator, where we specifically design an adaptive discrete cosine transform (DCT) domain masking algorithm. Third, we propose to estimate the actual content-dependent Lagrangian multipliers to combine rate and distortion into a joint loss function that drives the training of the preprocessing network. Our method makes no change to the JPEG encoder and decoder and supports any differentiable distortion measure (e.g. MSE, MS-SSIM, LPIPS). On the Kodak dataset, our method achieves on average 7.59% BD-rate reduction compared to the JPEG baseline when using MSE. With per-image optimization for LPIPS, our method achieves as high as 38.65% BD-rate reduction, and produces high-quality reconstructed images with much less artifacts.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 8","pages":"8330-8343"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144781986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Class-Aware Prompting for Federated Few-Shot Class-Incremental Learning","authors":"Fang-Yi Liang;Yu-Wei Zhan;Jiale Liu;Chong-Yu Zhang;Zhen-Duo Chen;Xin Luo;Xin-Shun Xu","doi":"10.1109/TCSVT.2025.3551612","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551612","url":null,"abstract":"Few-Shot Class-Incremental Learning (FSCIL) aims to continuously learn new classes from limited samples while preventing catastrophic forgetting. With the increasing distribution of learning data across different clients and privacy concerns, FSCIL faces a more realistic scenario where few learning samples are distributed across different clients, thereby necessitating a Federated Few-Shot Class-Incremental Learning (FedFSCIL) scenario. However, this integration faces challenges from non-IID problem, which affects model generalization and training efficiency. The communication overhead in federated settings also presents a significant challenge. To address these issues, we propose Class-Aware Prompting for Federated Few-Shot Class-Incremental Learning (FedCAP). Our framework leverages pre-trained models enhanced by a class-wise prompt pool, where shared class-wise keys enable clients to utilize global class information during training. This unifies the understanding of base class features across clients and enhances model consistency. We further incorporate a class-level information fusion module to improve class representation and model generalization. Our approach requires very few parameter transmission during model aggregation, ensuring communication efficiency. To our knowledge, this is the first study to explore the scenario of FedFSCIL. Consequently, we designed comprehensive experimental setups and made the code publicly available.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8520-8532"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event-Based Motion Deblurring With Blur-Aware Reconstruction Filter","authors":"Nuo Chen;Chushu Zhang;Wei An;Longguang Wang;Miao Li;Qiang Ling","doi":"10.1109/TCSVT.2025.3551516","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551516","url":null,"abstract":"Event-based motion deblurring aims at reconstructing a sharp image from a single blurry image and its corresponding events triggered during the exposure time. Existing methods learn the spatial distribution of blur from blurred images, then treat events as temporal residuals and learn blurred temporal features from them, and finally restore clear images through spatio-temporal interaction of the two features. However, due to the high coupling of detailed features such as the texture and contour of the scene with blur features, it is difficult to directly learn effective blur spatial distribution from the original blurred image. In this paper, we provide a novel perspective, i.e., employing the blur indication provided by events, to instruct the network in spatially differentiated image reconstruction. Due to the consistency between event spatial distribution and image blur, event spatial indication can learn blur spatial features more simply and directly, and serve as a complement to temporal residual guidance to improve deblurring performance. Based on the above insight, we propose an event-based motion deblurring network consisting of a Multi-Scale Event-based Double Integral (MS-EDI) module designed from temporal residual guidance, and a Blur-Aware Filter Prediction (BAFP) module to conduct filter processing directed by spatial blur indication. The network, after incorporating spatial residual guidance, has significantly enhanced its generalization ability, surpassing the best-performing image-based and event-based methods on both synthetic, semi-synthetic, and real-world datasets. In addition, our method can be extended to blurry image super-resolution and achieves impressive performance. Our code is available at: <uri>https://github.com/ChenYichen9527/MBNet</uri> now.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8508-8519"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generic Objects as Pose Probes for Few-Shot View Synthesis","authors":"Zhirui Gao;Renjiao Yi;Chenyang Zhu;Ke Zhuang;Wei Chen;Kai Xu","doi":"10.1109/TCSVT.2025.3551303","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551303","url":null,"abstract":"Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as input. COLMAP is frequently employed for preprocessing to estimate poses. However, COLMAP necessitates a large number of feature matches to operate effectively, and struggles with scenes characterized by sparse features, large baselines, or few-view images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images, freeing from COLMAP initializations. Inspired by the idea of calibration boards in traditional pose calibration, we propose a novel approach of utilizing everyday objects, commonly found in both images and real life, as “pose probes”. By initializing the probe object as a cube shape, we apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. PnP matching is used to initialize poses between images incrementally, where only a few feature matches are enough. PoseProbe achieves state-of-the-art performance in pose estimation and novel view synthesis across multiple datasets in experiments. We demonstrate its effectiveness, particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance, showing that PoseProbe is robust to the choice of probe objects. Our project page is available at: <uri>https://zhirui-gao.github.io/PoseProbe.github.io/</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9046-9059"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}