IEEE Journal on Emerging and Selected Topics in Circuits and Systems: Latest Articles (Page 7)

IEEE Circuits and Systems Society

IF 3.7, Zone 2, Engineering and Technology

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-06-01 DOI: 10.1109/JETCAS.2024.3405094

Citations: 0

Guest Editorial Advances in Generative Visual Signal Coding and Processing

IF 3.7, Zone 2, Engineering and Technology

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-06-01 DOI: 10.1109/JETCAS.2024.3403318

Zhibo Chen;Heming Sun;Li Zhang;Fan Zhang

Abstract: This special issue of IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS) is dedicated to demonstrating the latest developments in algorithms, implementations, and applications related to visual signal coding and processing with generative models. In recent years, generative models have emerged as one of the most significant and rapidly developing areas of research in artificial intelligence. They have proved to be an important instrument for advancing research in AI-based visual signal coding and processing. For instance, the variational autoencoder (VAE) has been used as a fundamental framework for end-to-end learned image coding, the autoregressive (AR) model has been extensively studied for efficient entropy coding, and the generative adversarial network (GAN) has been utilized frequently to enhance the subjective quality of coding schemes. Meanwhile, generative models have also been explored in various visual signal processing tasks, including quality assessment, restoration, enhancement, editing, and interpolation.

Vol. 14, No. 2, pp. 145-148. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10579096

Citations: 0

Parameter Reduction of Kernel-Based Video Frame Interpolation Methods Using Multiple Encoders

IF 3.7, Zone 2, Engineering and Technology

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-30 DOI: 10.1109/JETCAS.2024.3395418

Issa Khalifeh;Luka Murn;Ebroul Izquierdo

Abstract: Video frame interpolation synthesises a new frame from existing frames. Several approaches have been devised to handle this core computer vision problem. Kernel-based approaches use an encoder-decoder architecture to extract features from the inputs and generate weights for a local separable convolution operation which is used to warp the input frames. The warped inputs are then combined to obtain the final interpolated frame. The ease of implementation of such an approach and favourable performance have enabled it to become a popular method in the field of interpolation. One downside, however, is that the encoder-decoder feature extractor is large and uses a lot of parameters. We propose a Multi-Encoder Method for Parameter Reduction (MEMPR) that can significantly reduce parameters by up to 85% whilst maintaining a similar level of performance. This is achieved by leveraging multiple encoders to focus on different aspects of the input. The approach can also be used to improve the performance of kernel-based models in a parameter-effective manner. To encourage the adoption of such an approach in potential future kernel-based methods, the approach is designed to be modular, intuitive and easy to implement. It is implemented on some of the most impactful kernel-based works such as SepConvNet, AdaCoFNet and EDSC. Extensive experiments on datasets with varying ranges of motion highlight the effectiveness of the MEMPR approach and its generalisability to different convolutional backbones and kernel-based operators.

Vol. 14, No. 2, pp. 245-260. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10510388
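
The kernel-based pipeline described in the abstract above (per-pixel separable kernels warp each input frame, and the warped frames are then combined) can be illustrated with a toy sketch in plain Python. This is not the MEMPR implementation: the kernels below are written by hand rather than predicted by an encoder-decoder, all function names are hypothetical, borders are handled by edge replication, and the two warped frames are simply averaged.

```python
from typing import List

Image = List[List[float]]          # grayscale image, indexed [row][column]
Kernels = List[List[List[float]]]  # one 1-D kernel of length n per pixel


def separable_local_conv(frame: Image, kv: Kernels, kh: Kernels) -> Image:
    """Warp `frame` with a per-pixel separable kernel.

    The 2-D kernel at pixel (y, x) is the outer product of the vertical
    kernel kv[y][x] and the horizontal kernel kh[y][x], so only 2*n weights
    are needed per pixel instead of n*n.
    """
    h, w = len(frame), len(frame[0])
    n = len(kv[0][0])
    r = n // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(n):
                yy = min(max(y + i - r, 0), h - 1)      # replicate edges
                for j in range(n):
                    xx = min(max(x + j - r, 0), w - 1)  # replicate edges
                    acc += kv[y][x][i] * kh[y][x][j] * frame[yy][xx]
            out[y][x] = acc
    return out


def constant_kernels(h: int, w: int, kernel: List[float]) -> Kernels:
    """Use the same 1-D kernel at every pixel (illustration only)."""
    return [[list(kernel) for _ in range(w)] for _ in range(h)]


def blend(a: Image, b: Image) -> Image:
    """Combine two warped frames by plain averaging (an assumption)."""
    return [[(pa + pb) / 2.0 for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


frame0 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
frame1 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
centre = constant_kernels(2, 3, [0.0, 1.0, 0.0])  # samples the pixel itself
left = constant_kernels(2, 3, [1.0, 0.0, 0.0])    # samples the left neighbour

warped0 = separable_local_conv(frame0, centre, centre)  # unchanged
warped1 = separable_local_conv(frame1, centre, left)    # content moves right
interpolated = blend(warped0, warped1)
print(interpolated)
```

In a real kernel-based interpolator the per-pixel kernels would differ at every location and be produced by the network; the sketch only shows how such kernels are applied once they exist.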

Citations: 0

TM-GAN: A Transformer-Based Multi-Modal Generative Adversarial Network for Guided Depth Image Super-Resolution

IF 3.7, Zone 2, Engineering and Technology

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-29 DOI: 10.1109/JETCAS.2024.3394495

Jiang Zhu;Van Kwan Zhi Koh;Zhiping Lin;Bihan Wen

Abstract: Despite significant strides in deep single image super-resolution (SISR), the development of robust guided depth image super-resolution (GDSR) techniques remains a notable challenge. Effective GDSR methods must not only exploit the properties of the target image but also integrate complementary information from the guidance image. The state of the art in guided image super-resolution has been dominated by convolutional neural network (CNN) based methods. However, CNNs have limitations in capturing global information effectively, and their traditional regression training can make it difficult to generate high-frequency details precisely, whereas transformers have shown remarkable success in deep learning through the self-attention mechanism. Drawing inspiration from the impact of transformers in both language and vision applications, we propose a Transformer-based Multi-modal Generative Adversarial Network dubbed TM-GAN. TM-GAN is designed to process and integrate multi-modal data effectively, leveraging the global contextual understanding and detailed feature extraction capabilities of transformers within a GAN architecture for GDSR. Experimental evaluations of TM-GAN on a variety of RGB-D datasets demonstrate its superiority over state-of-the-art methods, showcasing its effectiveness in leveraging transformer-based techniques for GDSR.

Vol. 14, No. 2, pp. 261-274.

Citations: 0

Compressed-Domain Vision Transformer for Image Classification

IF 3.7, Zone 2, Engineering and Technology

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-29 DOI: 10.1109/JETCAS.2024.3394878

Ruolei Ji;Lina J. Karam

Abstract: Compressed-domain visual task schemes, where visual processing or computer vision is performed directly on compressed-domain representations, have been shown to achieve higher computational efficiency during training and deployment by avoiding the need to decode the compressed visual information, while achieving competitive or even better performance compared to the corresponding spatial-domain visual tasks. This work is concerned with learning-based compressed-domain image classification, where the image classification is performed directly on compressed-domain representations, also known as latent representations, that are obtained using a learning-based visual encoder. In this paper, a compressed-domain Vision Transformer (cViT) is proposed to perform image classification in the learning-based compressed domain. For this purpose, the Vision Transformer (ViT) architecture is adopted and modified to perform classification directly in the compressed domain. As part of this work, a novel feature patch embedding is introduced that leverages the within- and cross-channel information in the compressed domain. Also, an adaptation training strategy is designed to adopt the weights from the pre-trained spatial-domain ViT and adapt these to the compressed-domain classification task. Furthermore, the pre-trained ViT weights are utilized through interpolation for position embedding initialization to further improve the performance of cViT. The experimental results show that the proposed cViT outperforms the existing compressed-domain classification networks in terms of Top-1 and Top-5 classification accuracies. Moreover, the proposed cViT can yield competitive classification accuracies with a significantly higher computational efficiency as compared to pixel-domain approaches.

Vol. 14, No. 2, pp. 299-310.
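
The abstract above says that pre-trained ViT weights are reused "through interpolation" to initialise the position embeddings, but it does not say which interpolation scheme is used. The sketch below assumes bilinear interpolation with aligned corners over a 2-D grid of embedding vectors; the function name and the tiny 2x2 grid are hypothetical and only illustrate the general idea of resizing a position-embedding grid to a different number of patches.

```python
from typing import List

Grid = List[List[List[float]]]  # grid[row][col] is one embedding vector


def resize_pos_embed(grid: Grid, new_h: int, new_w: int) -> Grid:
    """Bilinearly resize a grid of position embeddings (corner-aligned)."""
    old_h, old_w = len(grid), len(grid[0])
    dim = len(grid[0][0])
    out: Grid = []
    for i in range(new_h):
        # Source row coordinate for target row i.
        sy = i * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, old_h - 1)
        fy = sy - y0
        row = []
        for j in range(new_w):
            # Source column coordinate for target column j.
            sx = j * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, old_w - 1)
            fx = sx - x0
            vec = []
            for d in range(dim):
                top = (1 - fx) * grid[y0][x0][d] + fx * grid[y0][x1][d]
                bottom = (1 - fx) * grid[y1][x0][d] + fx * grid[y1][x1][d]
                vec.append((1 - fy) * top + fy * bottom)
            row.append(vec)
        out.append(row)
    return out


# A 2x2 grid of 1-dimensional "embeddings", resized to 3x3.
pretrained = [[[0.0], [1.0]], [[2.0], [3.0]]]
resized = resize_pos_embed(pretrained, 3, 3)
print(resized)
```

The corner embeddings are preserved and the new centre entry is the average of the four originals, which is why this kind of initialisation keeps the spatial layout learned in the pixel domain.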

Citations: 0

FVIFormer: Flow-Guided Global-Local Aggregation Transformer Network for Video Inpainting

IF 3.7, Zone 2, Engineering and Technology

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-25 DOI: 10.1109/JETCAS.2024.3392972

Weiqing Yan;Yiqiu Sun;Guanghui Yue;Wei Zhou;Hantao Liu

Abstract: Video inpainting has been extensively used in recent years. Established works usually utilise the similarity between the missing region and its surrounding features to inpaint the visually damaged content in a multi-stage manner. However, due to the complexity of the video content, this may destroy the structural information of objects within the video. In addition, the presence of moving objects in the damaged regions of the video can further increase the difficulty of the task. To address these issues, we propose a flow-guided global-local aggregation Transformer network for video inpainting. First, we use a pre-trained optical flow completion network to repair the defective optical flow of video frames. Then, we propose a content inpainting module, which uses the completed optical flow as a guide and propagates the global content across the video frames using efficient temporal and spatial Transformers to inpaint the corrupted regions of the video. Finally, we propose a structural rectification module to enhance the coherence of content around the missing regions by combining the extracted local and global features. In addition, considering the efficiency of the overall framework, we also optimize the self-attention mechanism to improve the speed of training and testing via depth-wise separable encoding. We validate the effectiveness of our method on the YouTube-VOS and DAVIS video datasets. Extensive experimental results demonstrate the effectiveness of our approach in edge-complementing video content that has undergone stabilisation algorithms.

Vol. 14, No. 2, pp. 235-244.

Citations: 0

Enhancing Image Quality by Reducing Compression Artifacts Using Dynamic Window Swin Transformer

IF 3.7, Zone 2, Engineering and Technology

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-24 DOI: 10.1109/JETCAS.2024.3392868

Zhenchao Ma;Yixiao Wang;Hamid Reza Tohidypour;Panos Nasiopoulos;Victor C. M. Leung

Abstract: Video/image compression codecs utilize the characteristics of the human visual system and its varying sensitivity to certain frequencies, brightness, contrast, and colors to achieve high compression. Inevitably, compression introduces undesirable visual artifacts. As compression standards improve, restoring image quality becomes more challenging. Recently, deep learning based models, especially transformer-based image restoration models, have emerged as a promising approach for reducing compression artifacts, demonstrating very good restoration performance. However, all the proposed transformer-based restoration methods use the same fixed window size, confining pixel dependencies to fixed areas. In this paper, we propose a new image restoration method that addresses this shortcoming of existing methods by first introducing a content-adaptive dynamic window that is applied to self-attention layers, which in turn are weighted by our channel and spatial attention module utilized in the Swin Transformer to mainly capture long- and medium-range pixel dependencies. In addition, local dependencies are further enhanced by integrating a CNN-based network inside the Swin Transformer Block to process the image augmented by our self-attention module. In performance evaluations on images compressed with Versatile Video Coding (VVC), one of the latest compression standards, our proposed approach achieves an average Peak Signal-to-Noise Ratio (PSNR) gain of 1.32 dB on three different benchmark datasets for VVC compression artifact reduction. Additionally, our proposed approach improves the visual quality of compressed images by an average of 2.7% in terms of Video Multimethod Assessment Fusion (VMAF).

Vol. 14, No. 2, pp. 275-285.
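
The PSNR gain quoted in the abstract above is a difference of two PSNR values, each measured against the uncompressed reference. A minimal sketch of that arithmetic follows, using the standard definition PSNR = 10 log10(MAX^2 / MSE) for 8-bit images. The tiny images and function names are hypothetical and are not taken from the paper's evaluation.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image, indexed [row][column]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images of the same size."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count


def psnr(reference: Image, test: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(reference, test)
    if error == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


def psnr_gain(reference: Image, compressed: Image, restored: Image) -> float:
    """How many dB the restored image gains over the compressed one."""
    return psnr(reference, restored) - psnr(reference, compressed)


reference = [[100.0, 100.0], [100.0, 100.0]]
compressed = [[110.0, 110.0], [110.0, 110.0]]  # MSE = 100
restored = [[105.0, 105.0], [105.0, 105.0]]    # MSE = 25
gain = psnr_gain(reference, compressed, restored)
print(f"PSNR gain: {gain:.2f} dB")
```

Because the peak value cancels in the difference, the gain depends only on the ratio of the two mean squared errors: quartering the MSE, as in this toy case, yields about 6.02 dB.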

Citations: 0

Low Latency Variational Autoencoder on FPGAs

IF 3.7, Zone 2, Engineering and Technology

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-16 DOI: 10.1109/JETCAS.2024.3389660

Zhiqiang Que;Minghao Zhang;Hongxiang Fan;He Li;Ce Guo;Wayne Luk

Citations: 0

CGVC-T: Contextual Generative Video Compression With Transformers

IF 3.7, Zone 2, Engineering and Technology

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-10 DOI: 10.1109/JETCAS.2024.3387301

Pengli Du;Ying Liu;Nam Ling

Abstract: With the high demands for video streaming, recent years have witnessed a growing interest in utilizing deep learning for video compression. Most existing neural video compression approaches adopt the predictive residue coding framework, which is sub-optimal in removing redundancy across frames. In addition, purely minimizing the pixel-wise differences between the raw frame and the decompressed frame is ineffective in improving the perceptual quality of videos. In this paper, we propose a contextual generative video compression method with transformers (CGVC-T), which adopts generative adversarial networks (GAN) for perceptual quality enhancement and applies contextual coding to improve coding efficiency. Besides, we employ a hybrid transformer-convolution structure in the auto-encoders of the CGVC-T, which learns both global and local features within video frames to remove temporal and spatial redundancy. Furthermore, we introduce novel entropy models to estimate the probability distributions of the compressed latent representations, so that the bit rates required for transmitting the compressed video are decreased. The experiments on HEVC, UVG, and MCL-JCV datasets demonstrate that the perceptual quality of our CGVC-T in terms of FID, KID, and LPIPS scores surpasses state-of-the-art learned video codecs, the industrial video codecs x264 and x265, as well as the official reference software JM, HM, and VTM. Our CGVC-T also offers superior DISTS scores among all compared learned video codecs.

Vol. 14, No. 2, pp. 209-223.
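
The abstract above links better entropy models to lower bit rates. The underlying relationship is that an ideal entropy coder spends about -log2 p(s) bits on a symbol s, where p is the probability the model assigns to it, so a model that matches the data more closely yields a shorter code. The sketch below illustrates only this general principle with a hypothetical two-symbol alphabet and hand-picked probabilities; it is not the entropy model proposed for CGVC-T.

```python
import math
from typing import Dict, Sequence


def ideal_code_length_bits(symbols: Sequence[int], model: Dict[int, float]) -> float:
    """Total ideal code length, in bits, of `symbols` under `model`.

    `model` maps each symbol to its estimated probability. Each symbol
    costs -log2(p) bits, which is what an ideal arithmetic coder approaches.
    """
    return sum(-math.log2(model[s]) for s in symbols)


# Hypothetical quantised latent symbols: three zeros and a single one.
latents = [0, 0, 0, 1]

uniform_model = {0: 0.5, 1: 0.5}    # ignores the skew in the data
fitted_model = {0: 0.75, 1: 0.25}   # matches the empirical frequencies

bits_uniform = ideal_code_length_bits(latents, uniform_model)
bits_fitted = ideal_code_length_bits(latents, fitted_model)
print(f"uniform: {bits_uniform:.3f} bits, fitted: {bits_fitted:.3f} bits")
```

The fitted model needs fewer bits for the same symbols, which is the sense in which a more accurate probability estimate lowers the bit rate.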

Citations: 0

Physically Guided Generative Adversarial Network for Holographic 3D Content Generation From Multi-View Light Field

IF 3.7, Zone 2, Engineering and Technology

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-09 DOI: 10.1109/JETCAS.2024.3386672

Yunhui Zeng;Zhenwei Long;Yawen Qiu;Shiyi Wang;Junjie Wei;Xin Jin;Hongkun Cao;Zhiheng Li

Abstract: Realizing high-fidelity three-dimensional (3D) scene representation through holography presents a formidable challenge, primarily due to the unknown mechanism of the optimal hologram and the huge computational load and memory usage. Herein, we propose a Physically Guided Generative Adversarial Network (PGGAN), which is the first generative model to transform the multi-view light field directly to holographic 3D content. PGGAN fuses the fidelity of data-driven learning with the rigor of physical optics principles, ensuring a stable reconstruction quality across a wide field of view, which is unreachable by current central-view-centric approaches. The proposed framework presents an innovative encoder-generator-discriminator, which is informed by a physical optics model. It benefits from the speed and adaptability of data-driven methods to facilitate rapid learning and effectively transfer to novel scenes, while its physics-based guidance ensures that the generated holograms adhere to holographic standards. A unique, differentiable physical model facilitates end-to-end training, which aligns the generative process with the “holographic space”, thereby improving the quality of the reconstructed light fields. Employing an adaptive loss strategy, PGGAN dynamically adjusts the influence of physical guidance in the initial training stages, later optimizing for reconstruction accuracy. Empirical evaluations reveal PGGAN’s ability to generate a detailed hologram in as little as 0.002 seconds, significantly faster than current state-of-the-art techniques, while maintaining superior angular reconstruction fidelity. These results demonstrate PGGAN’s effectiveness in producing high-quality holograms rapidly from multi-view datasets, advancing real-time holographic rendering significantly.

Vol. 14, No. 2, pp. 286-298.

Citations: 0