{"title":"VMG: Rethinking U-Net Architecture for Video Super-Resolution","authors":"Jun Tang;Lele Niu;Linlin Liu;Hang Dai;Yong Ding","doi":"10.1109/TBC.2024.3486967","DOIUrl":"https://doi.org/10.1109/TBC.2024.3486967","url":null,"abstract":"The U-Net architecture has exhibited significant efficacy across various vision tasks, yet its adaptation for Video Super-Resolution (VSR) remains underexplored. While the Video Restoration Transformer (VRT) introduced U-Net into the VSR domain, it poses challenges due to intricate design and substantial computational overhead. In this paper, we present VMG, a streamlined framework tailored for VSR. Through empirical analysis, we identify the crucial stages of the U-Net architecture contributing to performance enhancement in VSR tasks. Our optimized architecture substantially reduces model parameters and complexity while improving performance. Additionally, we introduce two key modules, namely the Gated MLP-like Mixer (GMM) and the Flow-Guided cross-attention Mixer (FGM), designed to enhance spatial and temporal feature aggregation. GMM dynamically encodes spatial correlations with linear complexity in space and time, and FGM leverages optical flow to capture motion variation and implement sparse attention to efficiently aggregate temporally related information. Extensive experiments demonstrate that VMG achieves nearly 70% reduction in GPU memory usage, 30% fewer parameters, and 10% lower computational complexity (FLOPs) compared to VRT, while yielding highly competitive or superior results across four benchmark datasets. Qualitative assessments reveal VMG’s ability to preserve remarkable details and sharp structures in the reconstructed videos. 
The code and pre-trained models are available at https://github.com/EasyVision-Ton/VMG.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"334-349"},"PeriodicalIF":3.2,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Assessment of Physical Layer Performance: ATSC 3.0 vs. 5G Broadcast in Laboratory and Field Tests","authors":"Sunhyoung Kwon;Seok-Ki Ahn;Sungjun Ahn;Sungho Jeon;Sesh Simha;Mark Aitken;Anindya Saha;Prashant M. Maru;Parag Naik;Sung-Ik Park","doi":"10.1109/TBC.2024.3482183","DOIUrl":"https://doi.org/10.1109/TBC.2024.3482183","url":null,"abstract":"This paper presents a comparative analysis of the physical layer performance of ATSC 3.0 and 3GPP 5G Broadcast through comprehensive laboratory and field tests. The study evaluates various reception scenarios, including fixed and mobile environments and various channel conditions, such as additive white Gaussian noise and mobile channels. Key performance metrics such as threshold of visibility (ToV) and erroneous second ratio (ESR) are measured to assess the reception quality of each standard. The results demonstrate that ATSC 3.0 generally outperforms 5G Broadcast due to its advanced bit-interleaved coded modulation and time interleaving techniques, effectively mitigating burst errors in mobile channels.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"2-10"},"PeriodicalIF":3.2,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised 3D Point Cloud Reconstruction via Exploring Multi-View Consistency and Complementarity","authors":"Jiahui Song;Yonghong Hou;Bo Peng;Tianyi Qin;Qingming Huang;Jianjun Lei","doi":"10.1109/TBC.2024.3484269","DOIUrl":"https://doi.org/10.1109/TBC.2024.3484269","url":null,"abstract":"Unsupervised 3D point cloud reconstruction has increasingly played an important role in 3D multimedia broadcasting, virtual reality, and augmented reality. Considering that multiple views collectively provide abundant object geometry and structure information, this paper proposes a novel Unsupervised Multi-View 3D Point Cloud Reconstruction Network (UMPR-Net) to reconstruct high-quality 3D point clouds by effectively exploring multi-view consistency and complementarity. In particular, by effectively perceiving the consistency of local object information contained in different views, a consistency-aware point cloud reconstruction module is designed to reconstruct 3D point clouds for each individual view. Additionally, a complementarity-oriented point cloud fusion module is presented to aggregate reliable complementary information explored from multiple point clouds corresponding to diverse views, thus ultimately obtaining a refined 3D point cloud. By projecting reconstructed 3D point clouds onto 2D planes and subsequently constraining the consistency between 2D projections and 2D supervision, the proposed UMPR-Net is encouraged to reconstruct high-quality 3D point clouds from multiple views. 
Experimental results on the synthetic and real-world datasets have validated the effectiveness of the proposed UMPR-Net.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"193-202"},"PeriodicalIF":3.2,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perception- and Fidelity-Aware Reduced-Reference Super-Resolution Image Quality Assessment","authors":"Xinying Lin;Xuyang Liu;Hong Yang;Xiaohai He;Honggang Chen","doi":"10.1109/TBC.2024.3475820","DOIUrl":"https://doi.org/10.1109/TBC.2024.3475820","url":null,"abstract":"With the advent of image super-resolution (SR) algorithms, how to evaluate the quality of generated SR images has become an urgent task. Although full-reference methods perform well in SR image quality assessment (SR-IQA), their reliance on high-resolution (HR) images limits their practical applicability. Leveraging available reconstruction information as much as possible for SR-IQA, such as low-resolution (LR) images and the scale factors, is a promising way to enhance assessment performance for SR-IQA without HR for reference. In this paper, we attempt to evaluate the perceptual quality and reconstruction fidelity of SR images considering LR images and scale factors. Specifically, we propose a novel dual-branch reduced-reference SR-IQA network, i.e., Perception- and Fidelity-aware SR-IQA (PFIQA). The perception-aware branch evaluates the perceptual quality of SR images by leveraging the merits of global modeling of Vision Transformer (ViT) and local relation of ResNet, and incorporating the scale factor to enable comprehensive visual perception. Meanwhile, the fidelity-aware branch assesses the reconstruction fidelity between LR and SR images through their visual perception. The combination of the two branches substantially aligns with the human visual system, enabling a comprehensive SR image evaluation. Experimental results indicate that our PFIQA outperforms current state-of-the-art models across three widely used SR-IQA benchmarks. Notably, PFIQA excels in assessing the quality of real-world SR images. 
Our code is available at https://github.com/xinyouu/PFIQA.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"323-333"},"PeriodicalIF":3.2,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No-Reference Point Cloud Quality Assessment Through Structure Sampling and Clustering Based on Graph","authors":"Xinqiang Wu;Zhouyan He;Gangyi Jiang;Mei Yu;Yang Song;Ting Luo","doi":"10.1109/TBC.2024.3482173","DOIUrl":"https://doi.org/10.1109/TBC.2024.3482173","url":null,"abstract":"As a popular multimedia representation, 3D Point Clouds (PCs) inevitably encounter distortion during their acquisition, processing, coding, and transmission, resulting in visual quality degradation. Therefore, it is critical to propose a Point Cloud Quality Assessment (PCQA) method to perceive the visual quality of PCs. In this paper, we propose a no-reference PCQA method through structure sampling and clustering based on graph, which consists of two-stage pre-processing, quality feature extraction, attention-based feature fusion, and feature regression. For pre-processing, considering the tendency of the Human Visual System (HVS) to perceive distortions in both the global structure and local details of PCs, a two-stage sampling strategy is introduced. Specifically, to adapt to the irregular structure of PCs, it uses structural key point sampling and local clustering to capture global and local information, respectively, thereby facilitating more effective learning of distortion features. Then, in quality feature extraction, two modules, Global Feature Extraction (GFE) and Local Feature Extraction (LFE), are designed based on the two-stage pre-processing results to extract global and local quality features, respectively. Additionally, for attention-based feature fusion, a Unified Feature Integrator (UFI) module is proposed. This module enhances quality perception capability by integrating global features and individual local quality features and introduces the Transformer to interact with the integrated quality features. Finally, feature regression is conducted to map the final features into the quality score. 
The proposed method is tested on four publicly available databases, and the experimental results show that it outperforms existing state-of-the-art no-reference PCQA methods in most cases.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"307-322"},"PeriodicalIF":3.2,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Radio Propagation Modeling for a Cost-Effective DAB+ Service Coverage in Tunnels","authors":"Bruno Sacco;Assunta De Vita","doi":"10.1109/TBC.2024.3484268","DOIUrl":"https://doi.org/10.1109/TBC.2024.3484268","url":null,"abstract":"Providing satisfactory coverage of the Digital Audio Broadcasting (DAB+) service inside tunnels in the VHF band is a very challenging task. The classic, but expensive, solution adopted so far is the use of radiating cables (“leaky feeders”) installed on the tunnel’s ceiling over its entire length. An alternative and cheaper solution, investigated in the present paper, is the so-called “direct RF radiation” approach, consisting of antennas placed inside the tunnel or just outside its entrance. A simulation analysis has been carried out both to evaluate the impact of the design parameters and to serve as a tool for estimating the achievable service coverage. In addition, assuming that a tunnel behaves like a lossy waveguide, a mode analysis has been performed on the tunnel cross section, providing a fairly good estimation of the wave propagation attenuation. Interesting outcomes have been obtained from this simulation study: for instance, the behavior of the electric field as a function of distance suggests that, in the absence of geometric perturbations, the slope in the far zone is in good agreement with the attenuation value per unit distance of the main propagation mode. Curved sections cause further attenuation, which depends on the radius of curvature, and the geometric dimensions of the tunnel section have a very strong impact on attenuation. Further results show that, for arched tunnel sections, the fundamental propagation mode is horizontally polarized. As a result, the typical “whip” vehicular receiving antenna is not adequate: a horizontally polarized antenna would provide a much better service inside the tunnels. 
The investigation of these findings has led to a tool that is applicable to every type of tunnel configuration for the verification and optimization of direct RF radiation installations for DAB/DAB+ services.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"52-62"},"PeriodicalIF":3.2,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outage Probability Analysis of Cooperative NOMA With Successive Refinement","authors":"Meng Cheng;Yifan Zhou;Shuang Wei;Shen Qian","doi":"10.1109/TBC.2024.3477000","DOIUrl":"https://doi.org/10.1109/TBC.2024.3477000","url":null,"abstract":"This paper proposes a broadcasting system with cooperative non-orthogonal multiple access (CO-NOMA) and successive refinement (SR) coding. Specifically, signals containing the basic description of the source and the refinement are overlapped at the transmitter and broadcast to user equipment (UE) with different quality-of-service (QoS) requirements. Although the far UEs may only be capable of decoding the basic description allocated with higher transmit power, some of them may still demand a high QoS like the near UE. To address this issue, this work utilizes the near UE to establish a relay transmission, so that the information recovered at the far UE can be refined. Considering three different relaying schemes, the outage probabilities of the proposed system are derived in closed form, assuming all channels suffer from block Rayleigh fading. Based on the optimal power allocations, the best scheme yielding the lowest outage probabilities is found, and the advantages over down-link NOMA with SR (DN-SR) and conventional CO-NOMA are also demonstrated.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"42-51"},"PeriodicalIF":3.2,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rate Control for Geometry-Based LiDAR Point Cloud Compression via Multi-Factor Modeling","authors":"Lizhi Hou;Linyao Gao;Qian Zhang;Yiling Xu;Jenq-Neng Hwang;Dong Wang","doi":"10.1109/TBC.2024.3475808","DOIUrl":"https://doi.org/10.1109/TBC.2024.3475808","url":null,"abstract":"The Geometry-based Point Cloud Compression (G-PCC) standard developed by the Moving Picture Experts Group has shown promise for compressing extremely sparse point clouds captured by Light Detection And Ranging (LiDAR) equipment. However, as an essential functionality for low-delay and limited-bandwidth transmission, rate control for Geometry-based LiDAR Point Cloud Compression (G-LPCC) has not been fully studied. In this paper, we propose a rate control scheme for G-LPCC. We first adopt as the basis the G-PCC configuration with the best Rate-Distortion (R-D) performance for LiDAR point clouds, which is the predictive tree (PT) for geometry compression and the Region Adaptive Haar Transform (RAHT) for attribute compression. The common challenge of designing rate control algorithms for PT and RAHT is that their rates are determined by multiple factors. To address this, we propose an $l$-domain rate control algorithm for PT that unifies the various geometry-influencing factors in the expression of the minimum arc length $\\mathrm{d}l$ to determine the final rate. A power-style geometry rate curve characterized by $\\mathrm{d}l$ has been modeled. By analyzing the distortion behavior of different quantization parameters, an adaptive bitrate control method is proposed to improve the R-D performance. In addition, we borrow the $\\rho$ factor from previous 2D video rate control and successfully apply it to RAHT rate control. 
A simple and clean linear attribute rate curve characterized by $\\rho$ has been modeled, and a corresponding parameter estimation method based on the cumulative distribution function is proposed for bitrate control. The experimental results demonstrate that the proposed rate control algorithm can achieve accurate rate control with additional Bjontegaard-Delta-rate (BD-rate) gains.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"167-179"},"PeriodicalIF":3.2,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized Canceling Signals for PTS Schemes to Improve the PAPR of OFDM Systems Without Side Information","authors":"The Khai Nguyen;Ebrahim Bedeer;Ha H. Nguyen;J. Eric Salt;Colin Howlett","doi":"10.1109/TBC.2024.3475748","DOIUrl":"https://doi.org/10.1109/TBC.2024.3475748","url":null,"abstract":"This paper introduces a novel blind partial transmission sequence (PTS) scheme to lower the peak-to-average-power ratio (PAPR) of orthogonal frequency division multiplexing (OFDM) systems. Unlike existing PTS schemes in which the first sub-block (SB) is preserved as a phase reference for other SBs, we propose to add an optimized canceling signal (CS) to the first SB to further reduce the PAPR. The CS is designed so that it can be reconstructed by the receiver and subtracted from the received signals before demodulation, without requiring side information (SI). Since errors in reproducing the CS at the receiver can degrade the error performance, we design a novel CS protection mechanism specifically to protect the reconstruction of the CS. The proposed method is shown to significantly reduce the PAPR and symbol error rate (SER) without sacrificing data rate to SI, as many existing PTS schemes do.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"360-370"},"PeriodicalIF":3.2,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient and Flexible Complexity Control Method for Versatile Video Coding","authors":"Yan Zhao;Chen Zhu;Jun Xu;Guo Lu;Li Song;Siwei Ma","doi":"10.1109/TBC.2024.3475811","DOIUrl":"https://doi.org/10.1109/TBC.2024.3475811","url":null,"abstract":"Recently, numerous complexity control approaches have been proposed to achieve the target encoding complexity. However, only a few of them were developed for VVC encoders. This paper fills this gap by proposing an efficient and flexible complexity control approach for VVC. The support for both Acceleration Ratio Control (ARC) and Encoding Time Control (ETC) makes our method highly versatile for various applications. First, we introduce a sequence-level complexity estimation model to merge the ARC and ETC tasks. Then, four key modules are involved for complexity control: complexity allocation, complexity estimation, encoding configuration decision, and feedback. Specifically, we hierarchically allocate the complexity budget to three coding levels: GOP, frame, and Basic Unit (BU). Each BU’s allocation weight is decided by its SSIM distortion, whereby the perceptual quality can be ensured. The multi-complexity configurations are established by altering the partition depth and number of reference frames. By tuning each BU’s configuration according to its target acceleration ratio and adaptively updating the control strategies based on the feedback, our scheme can precisely realize any achievable acceleration target within one-pass encoding. Moreover, each BU’s un-accelerated reference encoding time, which is used to calculate its target acceleration ratio, is estimated by SVR models. 
Experiments prove that for both the ARC and ETC tasks, our scheme can precisely achieve a wide range of complexity targets (30% to 100%) with negligible RD loss in PSNR and SSIM, outperforming other state-of-the-art methods.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"96-110"},"PeriodicalIF":3.2,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}