2021 International Conference on Visual Communications and Image Processing (VCIP) — Latest Publications

Perceptual Evaluation of Pre-processing for Video Transcoding
2021 International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675438
Shiyu Huang, Ziyuan Luo, Jiahua Xu, Wei Zhou, Zhibo Chen
{"title":"Perceptual Evaluation of Pre-processing for Video Transcoding","authors":"Shiyu Huang, Ziyuan Luo, Jiahua Xu, Wei Zhou, Zhibo Chen","doi":"10.1109/VCIP53242.2021.9675438","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675438","url":null,"abstract":"Recently, the pre-processed video transcoding has attracted wide attention and has been increasingly used in practical applications for improving the perceptual experience and saving transmission resources. However, very few works have been conducted to evaluate the performance of pre-processing methods. In this paper, we select the source (SRC) videos and various pre-processing approaches to construct the first Pre-processed and Transcoded Video Database (PTVD). Then, we conduct the subjective experiment, showing that compared with the video sent to the codec directly at the same bitrate, the appropriate pre-processing methods indeed improve the perceptual quality. Finally, existing image/video quality metrics are evaluated on our database. The results indicate that the performance of the existing image/video quality assessment (IQA/VQA) approaches remain to be improved. We will make our database publicly available soon.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122839661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
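The benchmarking step described above — checking how well objective metrics track subjective judgments — is conventionally reported as correlation against mean opinion scores (MOS). A minimal sketch, with hypothetical scores standing in for PTVD data:

```python
# Sketch: correlating an objective quality metric with subjective MOS,
# as when benchmarking IQA/VQA metrics on a database such as PTVD.
# The score arrays below are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr, spearmanr

mos = np.array([3.2, 4.1, 2.5, 4.6, 3.8])          # subjective scores
metric = np.array([0.61, 0.83, 0.44, 0.90, 0.72])  # objective metric output

plcc, _ = pearsonr(metric, mos)    # linear correlation (prediction accuracy)
srocc, _ = spearmanr(metric, mos)  # rank correlation (monotonicity)
print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")
```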
CAESR: Conditional Autoencoder and Super-Resolution for Learned Spatial Scalability
2021 International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675351
Charles Bonnineau, W. Hamidouche, J. Travers, N. Sidaty, Jean-Yves Aubié, O. Déforges
{"title":"CAESR: Conditional Autoencoder and Super-Resolution for Learned Spatial Scalability","authors":"Charles Bonnineau, W. Hamidouche, J. Travers, N. Sidaty, Jean-Yves Aubié, O. Déforges","doi":"10.1109/VCIP53242.2021.9675351","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675351","url":null,"abstract":"In this paper, we present CAESR, an hybrid learning-based coding approach for spatial scalability based on the versatile video coding (VVC) standard. Our framework considers a low-resolution signal encoded with VVC intra-mode as a base-layer (BL), and a deep conditional autoencoder with hyperprior (AE-HP) as an enhancement-layer (EL) model. The EL encoder takes as inputs both the upscaled BL reconstruction and the original image. Our approach relies on conditional coding that learns the optimal mixture of the source and the upscaled BL image, enabling better performance than residual coding. On the decoder side, a super-resolution (SR) module is used to recover high-resolution details and invert the conditional coding process. Experimental results have shown that our solution is competitive with the VVC full-resolution intra coding while being scalable.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115225404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
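To illustrate the conditional-coding idea (as opposed to residual coding), here is a minimal PyTorch sketch of an enhancement-layer encoder that consumes the source image concatenated with the upscaled BL reconstruction; the layer sizes and names are illustrative assumptions, not the authors' architecture:

```python
# Sketch of a conditional enhancement-layer encoder in the spirit of
# CAESR: instead of coding the residual x - upscale(BL), the network
# learns its own mixture of source and upscaled BL.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalELEncoder(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # input: original image (3 ch) concatenated with upscaled BL (3 ch)
        self.net = nn.Sequential(
            nn.Conv2d(6, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2),
        )

    def forward(self, x, bl_recon):
        # upscale the base-layer reconstruction to full resolution
        bl_up = F.interpolate(bl_recon, size=x.shape[-2:], mode='bicubic',
                              align_corners=False)
        return self.net(torch.cat([x, bl_up], dim=1))

x = torch.rand(1, 3, 256, 256)   # original image
bl = torch.rand(1, 3, 128, 128)  # low-resolution BL reconstruction
latent = ConditionalELEncoder()(x, bl)
print(latent.shape)              # torch.Size([1, 64, 32, 32])
```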
Security and Forensics Exploration of Learning-based Image Coding
2021 International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675445
Deepayan Bhowmik, Mohamed Elawady, Keiller Nogueira
{"title":"Security and Forensics Exploration of Learning-based Image Coding","authors":"Deepayan Bhowmik, Mohamed Elawady, Keiller Nogueira","doi":"10.1109/VCIP53242.2021.9675445","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675445","url":null,"abstract":"Advances in media compression indicate significant potential to drive future media coding standards, e.g., Joint Photographic Experts Group's learning-based image coding technologies (JPEG AI) and Joint Video Experts Team's (JVET) deep neural networks (DNN) based video coding. These codecs in fact represent a new type of media format. As a dire consequence, traditional media security and forensic techniques will no longer be of use. This paper proposes an initial study on the effectiveness of traditional watermarking on two state-of-the-art learning based image coding. Results indicate that traditional watermarking methods are no longer effective. We also examine the forensic trails of various DNN architectures in the learning based codecs by proposing a residual noise based source identification algorithm that achieved 79% accuracy.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127250561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
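A rough sketch of the residual-noise idea: suppress image content with a denoiser and keep the noise-like traces, which a classifier can then associate with a specific codec. The denoiser and classifier choices here are assumptions for illustration, not the paper's exact pipeline:

```python
# Sketch of residual-noise extraction for source identification: the
# residual (decoded image minus a denoised version of it) carries
# codec-specific traces.
import numpy as np
from scipy.ndimage import median_filter

def residual_noise(decoded: np.ndarray) -> np.ndarray:
    """Suppress content with a denoiser; keep the noise-like traces."""
    denoised = median_filter(decoded, size=3)
    return decoded.astype(np.float64) - denoised.astype(np.float64)

# A classifier (e.g., a small CNN or an SVM) would then be trained on
# residual patches to predict which DNN codec produced each image:
#   features = [residual_noise(img) for img in decoded_images]
#   clf.fit(features, codec_labels)
```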
Learning-Based Complexity Reduction Scheme for VVC Intra-Frame Prediction
2021 International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675394
Mário Saldanha, G. Sanchez, C. Marcon, L. Agostini
{"title":"Learning-Based Complexity Reduction Scheme for VVC Intra-Frame Prediction","authors":"Mário Saldanha, G. Sanchez, C. Marcon, L. Agostini","doi":"10.1109/VCIP53242.2021.9675394","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675394","url":null,"abstract":"This paper presents a learning-based complexity reduction scheme for Versatile Video Coding (VVC) intra-frame prediction. VVC introduces several novel coding tools to improve the coding efficiency of the intra-frame prediction at the cost of a high computational effort. Thus, we developed an efficient complexity reduction scheme composed of three solutions based on machine learning and statistical analysis to reduce the number of intra prediction modes evaluated in the costly Rate-Distortion Optimization (RDO) process. Experimental results demonstrated that the proposed solution provides 18.32% encoding timesaving with a negligible impact on the coding efficiency.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127353997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
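The mode-pruning idea can be sketched as follows: a lightweight model scores the candidate intra modes from block features, and only the top-k survive into full RDO. The feature vector and ranking model below are hypothetical placeholders, not the paper's three specific solutions:

```python
# Sketch of learning-based intra-mode pruning ahead of RDO. The model
# is assumed to be any sklearn-style classifier whose classes align
# with the candidate mode list.
import numpy as np

def prune_modes(block_features, candidate_modes, model, k=3):
    """Keep the k most promising intra modes for the costly RDO stage."""
    scores = model.predict_proba([block_features])[0]  # one score per mode
    ranked = np.argsort(scores)[::-1]                  # best first
    return [candidate_modes[i] for i in ranked[:k]]

# Full RDO then runs only on the pruned list:
#   best = min(pruned, key=lambda m: rd_cost(block, m))
```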
Faster and Finer Pose Estimation for Object Pool in a Single RGB Image
2021 International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675316
Lee Aing, W. Lie, J. Chiang
{"title":"Faster and Finer Pose Estimation for Object Pool in a Single RGB Image","authors":"Lee Aing, W. Lie, J. Chiang","doi":"10.1109/VCIP53242.2021.9675316","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675316","url":null,"abstract":"Predicting/estimating the 6DoF pose parameters for multi-instance objects accurately in a fast manner is an important issue in robotic and computer vision. Even though some bottom-up methods have been proposed to be able to estimate multiple instance poses simultaneously, their accuracy cannot be considered as good enough when compared to other state-of-the-art top-down methods. Their processing speed still cannot respond to practical applications. In this paper, we present a faster and finer bottom-up approach of deep convolutional neural network to estimate poses of the object pool even multiple instances of the same object category present high occlusion/overlapping. Several techniques such as prediction of semantic segmentation map, multiple keypoint vector field, and 3D coordinate map, and diagonal graph clustering are proposed and combined to achieve the purpose. Experimental results and ablation studies show that the proposed system can achieve comparable accuracy at a speed of 24.7 frames per second for up to 7 objects by evaluation on the well-known Occlusion LINEMOD dataset.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126903455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
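One common bottom-up ingredient the abstract mentions is a keypoint vector field, where pixels vote for a keypoint location via the intersections of their predicted rays (in the style of PVNet-like voting). The following is a generic sketch of one such vote, not the paper's diagonal graph clustering scheme:

```python
# Sketch: two pixels each predict a unit direction toward a keypoint;
# their rays' least-squares intersection is one vote for its location.
import numpy as np

def ray_intersection(p1, d1, p2, d2):
    """Least-squares intersection of 2D rays p + t*d."""
    A = np.stack([d1, -d2], axis=1)
    t, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)
    return p1 + t[0] * d1

p1, d1 = np.array([0., 0.]), np.array([1., 1.]) / np.sqrt(2)
p2, d2 = np.array([4., 0.]), np.array([0., 1.])
print(ray_intersection(p1, d1, p2, d2))  # ~[4., 4.]
```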
HCiT: Deepfake Video Detection Using a Hybrid Model of CNN features and Vision Transformer
2021 International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675402
Bachir Kaddar, Sid Ahmed Fezza, W. Hamidouche, Z. Akhtar, A. Hadid
{"title":"HCiT: Deepfake Video Detection Using a Hybrid Model of CNN features and Vision Transformer","authors":"Bachir Kaddar, Sid Ahmed Fezza, W. Hamidouche, Z. Akhtar, A. Hadid","doi":"10.1109/VCIP53242.2021.9675402","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675402","url":null,"abstract":"The number of new falsified video contents is dramatically increasing, making the need to develop effective deepfake detection methods more urgent than ever. Even though many existing deepfake detection approaches show promising results, the majority of them still suffer from a number of critical limitations. In general, poor generalization results have been obtained under unseen or new deepfake generation methods. Consequently, in this paper, we propose a deepfake detection method called HCiT, which combines Convolutional Neural Network (CNN) with Vision Transformer (ViT). The HCiT hybrid architecture exploits the advantages of CNN to extract local information with the ViT's self-attention mechanism to improve the detection accuracy. In this hybrid architecture, the feature maps extracted from the CNN are feed into ViT model that determines whether a specific video is fake or real. Experiments were performed on Faceforensics++ and DeepFake Detection Challenge preview datasets, and the results show that the proposed method significantly outperforms the state-of-the-art methods. In addition, the HCiT method shows a great capacity for generalization on datasets covering various techniques of deepfake generation. The source code is available at: https://github.com/KADDAR-Bachir/HCiT","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125272892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
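A minimal sketch of the CNN-to-ViT handoff: CNN feature maps are flattened into a token sequence and passed through a Transformer encoder before a binary real/fake head. The backbone, dimensions, and pooling below are illustrative assumptions, not the authors' exact configuration:

```python
# Sketch of a hybrid CNN + Transformer detector: local features from a
# small CNN become tokens for self-attention, then a real/fake head.
import torch
import torch.nn as nn

class HybridDetector(nn.Module):
    def __init__(self, dim=256, heads=8, depth=4):
        super().__init__()
        self.cnn = nn.Sequential(  # local feature extractor
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.vit = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 2)  # real vs. fake

    def forward(self, x):
        f = self.cnn(x)                        # (B, dim, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W/16, dim)
        out = self.vit(tokens).mean(dim=1)     # average over tokens
        return self.head(out)

logits = HybridDetector()(torch.rand(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 2])
```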
LRS-Net: invisible QR Code embedding, detection, and restoration
2021 International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675327
Yiyan Yang, Zhongpai Gao, Guangtao Zhai
{"title":"LRS-Net: invisible QR Code embedding, detection, and restoration","authors":"Yiyan Yang, Zhongpai Gao, Guangtao Zhai","doi":"10.1109/VCIP53242.2021.9675327","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675327","url":null,"abstract":"QR code is a powerful tool to bridge the offline and online worlds. It has been widely used because it can store a large amount of information in a small space. However, the black-and-white style of QR codes is not attractive to the human eyes when embedded in videos, which greatly affects the viewing experience. Invisible QR code has proposed based on temporal psycho-visual modulation (TPVM) to embed invisible hyperlinks in shopping websites, copyright watermarks in movies, etc. However, existing embedding and detection methods are not robust enough. In this paper, we adopt a novel embedding method to greatly improve the visual quality of the embedded video. Furthermore, we build a new dataset of invisible QR codes named 'IQRCodes' to train deep neural networks. At last, we propose localization, refinement, and segmentation neural netowrks (LRS-Net) to efficiently detect and restore invisible QR codes that are captured by mobile phones.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114558838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
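TPVM embedding generally works by splitting each frame into two complementary sub-frames whose temporal average equals the original: human vision integrates over time and sees the plain video, while a camera sampling a single sub-frame can recover the pattern. A minimal sketch with an assumed modulation strength:

```python
# Sketch of TPVM-style complementary-frame embedding of a QR mask.
# delta (modulation strength) is an illustrative assumption.
import numpy as np

def embed_tpvm(frame: np.ndarray, qr_mask: np.ndarray, delta=8.0):
    """frame: HxWx3 float image; qr_mask: HxW array in {0, 1}."""
    mod = delta * (2 * qr_mask - 1)[..., None]  # +delta / -delta per pixel
    f1 = np.clip(frame + mod, 0, 255)
    f2 = np.clip(frame - mod, 0, 255)
    return f1, f2  # displayed back-to-back at a high refresh rate

frame = np.full((4, 4, 3), 128.0)
mask = np.eye(4)
f1, f2 = embed_tpvm(frame, mask)
assert np.allclose((f1 + f2) / 2, frame)  # the eye sees the original
```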
360HRL: Hierarchical Reinforcement Learning Based Rate Adaptation for 360-Degree Video Streaming
2021 International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675439
Jun Fu, Chen Hou, Zhibo Chen
{"title":"360HRL: Hierarchical Reinforcement Learning Based Rate Adaptation for 360-Degree Video Streaming","authors":"Jun Fu, Chen Hou, Zhibo Chen","doi":"10.1109/VCIP53242.2021.9675439","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675439","url":null,"abstract":"Recently, reinforced adaptive bitrate (ABR) algorithms have achieved remarkable success in tile-based 360-degree video streaming. However, they heavily rely on accurate viewport prediction. To alleviate this issue, we propose a hierarchical reinforcement-learning (RL) based ABR algorithm, dubbed 360HRL. Specifically, 360HRL consists of a top agent and a bottom agent. The former is used to decide whether to download a new segment for continuous playback or re-download an old segment for correcting wrong bitrate decisions caused by inaccurate viewport estimation, and the latter is used to select bitrates for tiles in the chosen segment. In addition, 360HRL adopts a two-stage training methodology. In the first stage, the bottom agent is trained under the environment where the top agent always chooses to download a new segment. In the second stage, the bottom agent is fixed and the top agent is optimized with the help of a heuristic decision rule. Experimental results demonstrate that 360HRL outperforms existing RL-based ABR algorithms across a broad of network conditions and quality of experience (QoE) objectives.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124075777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
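The two-level control flow can be sketched as follows; agent internals and the state object are stubbed assumptions, so this only illustrates how the top agent's download/re-download decision gates the bottom agent's per-tile bitrate selection:

```python
# Sketch of the hierarchical decision loop in 360HRL-style streaming.
def streaming_step(top_agent, bottom_agent, state):
    # Top agent: continue playback, or fix an earlier, badly chosen segment.
    action = top_agent.act(state)  # 'download' or 'redownload'
    if action == 'download':
        segment = state.next_segment
    else:
        # earlier segment whose bitrates came from a bad viewport guess
        segment = state.worst_past_segment
    # Bottom agent: per-tile bitrates for the chosen segment.
    tile_bitrates = bottom_agent.act(state, segment)
    return segment, tile_bitrates
```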
Scalable Privacy in Multi-Task Image Compression
2021 International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675357
Saeed Ranjbar Alvar, I. Bajić
{"title":"Scalable Privacy in Multi-Task Image Compression","authors":"Saeed Ranjbar Alvar, I. Bajić","doi":"10.1109/VCIP53242.2021.9675357","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675357","url":null,"abstract":"Learning-based compression systems have shown great potential for multi-task inference from their latent-space representation of the input image. In such systems, the decoder is supposed to be able to perform various analyses of the input image, such as object detection or segmentation, besides decoding the image. At the same time, privacy concerns around visual ana-lytics have grown in response to the increasing capabilities of such systems to reveal private information. In this paper, we propose a method to make latent-space inference more privacy-friendly using mutual information-based criteria. In particular, we show how organizing and compressing the latent representation of the image according to task-specific mutual information can make the model maintain high analytics accuracy while becoming less able to reconstruct the input image and thereby reveal private information.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115486521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
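One way to make the channel organization concrete: rank latent channels by an estimate of their mutual information with the task labels, keeping the most task-relevant channels in the analytics-facing representation. The sketch below uses sklearn's generic MI estimator on random placeholder data, purely for illustration; it is not the paper's criterion:

```python
# Sketch: partition latent channels by estimated task-specific mutual
# information, so analytics-relevant channels can be shared while the
# rest (carrying more purely visual detail) can be withheld.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

latents = np.random.rand(500, 32)           # 500 samples, 32 channels
task_labels = np.random.randint(0, 10, 500)  # placeholder task labels

mi = mutual_info_classif(latents, task_labels)  # MI estimate per channel
order = np.argsort(mi)[::-1]
task_channels = order[:16]      # most task-relevant half
private_channels = order[16:]   # least task-relevant half
```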
Spatio-spectral Image Reconstruction Using Non-local Filtering
2021 International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675421
Frank Sippel, Jürgen Seiler, A. Kaup
{"title":"Spatio-spectral Image Reconstruction Using Non-local Filtering","authors":"Frank Sippel, Jürgen Seiler, A. Kaup","doi":"10.1109/VCIP53242.2021.9675421","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675421","url":null,"abstract":"In many image processing tasks it occurs that pixels or blocks of pixels are missing or lost in only some channels. For example during defective transmissions of RGB images, it may happen that one or more blocks in one color channel are lost. Nearly all modern applications in image processing and transmission use at least three color channels, some of the applications employ even more bands, for example in the infrared and ultraviolet area of the light spectrum. Typically, only some pixels and blocks in a subset of color channels are distorted. Thus, other channels can be used to reconstruct the missing pixels, which is called spatio-spectral reconstruction. Current state-of-the-art methods purely rely on the local neighborhood, which works well for homogeneous regions. However, in high-frequency regions like edges or textures, these methods fail to properly model the relationship between color bands. Hence, this paper introduces non-local filtering for building a linear regression model that describes the inter-band relationship and is used to reconstruct the missing pixels. Our novel method is able to increase the PSNR on average by 2 dB and yields visually much more appealing images in high-frequency regions.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121806766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
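The inter-band regression at the heart of this approach can be sketched in a few lines: fit a linear model from an intact band to the damaged band over intact pixels (gathered non-locally from similar patches, a step elided here), then predict the lost pixels:

```python
# Sketch: reconstruct lost red-channel pixels from the green channel
# via linear regression over co-located intact pixels.
import numpy as np

def reconstruct_pixels(g_samples, r_samples, g_missing):
    """Fit r = a*g + b on intact pixels; predict r at the lost pixels."""
    A = np.stack([g_samples, np.ones_like(g_samples)], axis=1)
    coef, *_ = np.linalg.lstsq(A, r_samples, rcond=None)
    return coef[0] * g_missing + coef[1]

g = np.array([10., 20., 30., 40.])  # green values at intact pixels
r = np.array([12., 24., 33., 45.])  # red values at the same pixels
print(reconstruct_pixels(g, r, np.array([25.])))  # predicted red value
```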