Title: Visually Optimized Two-Pass Rate Control for Video Coding Using the Low-Complexity XPSNR Model
Authors: Christian R. Helmrich, Ivan Zupancic, J. Brandenburg, Valeri George, A. Wieckowski, B. Bross
DOI: https://doi.org/10.1109/VCIP53242.2021.9675364
Venue: 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021
Abstract: Two-pass rate control (RC) schemes have proven useful for generating low-bitrate video-on-demand or streaming catalogs. Visually optimized encoding, particularly using latest-generation coding standards like Versatile Video Coding (VVC), however, is still a subject of intensive study. This paper describes the two-pass RC method integrated into version 1 of VVenC, an open VVC encoding software. The RC design is based on a novel two-step rate-quantization parameter (R-QP) model to derive the second-pass coding parameters, and it uses the low-complexity XPSNR visual distortion measure to provide numerically as well as visually stable, perceptually R-D optimized encoding results. Random-access evaluation experiments confirm the improved objective as well as subjective performance of our RC solution.
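As a rough, self-contained illustration of how a second pass can derive coding parameters from first-pass statistics (a stand-in for VVenC's actual two-step R-QP model, which is not reproduced here), the sketch below uses the common rule of thumb that the quantization step size roughly doubles every 6 QP, so rate behaves approximately as R(QP) = c · 2^(−QP/6):

```python
import math

def second_pass_qp(first_pass_bits, first_pass_qp, target_bits, qp_min=0, qp_max=63):
    """Estimate a second-pass QP from first-pass statistics.

    Assumes the rough relation R(QP) = c * 2^(-QP/6), i.e. the
    quantization step size doubles every 6 QP; solving for the QP
    that hits the target rate gives a log2-based QP offset.
    """
    delta = 6.0 * math.log2(first_pass_bits / target_bits)
    return max(qp_min, min(qp_max, round(first_pass_qp + delta)))
```

Halving the target rate then raises the QP by about 6, and the result is clamped to the valid QP range.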
Title: Probability-based decoder-side intra mode derivation for VVC
Authors: Yang Wang, Li Zhang, Kai Zhang, Yuwen He, Hongbin Liu
DOI: https://doi.org/10.1109/VCIP53242.2021.9675443
Venue: 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021
Abstract: Intra prediction is typically used to exploit spatial redundancy in video coding. In the latest video coding standard, Versatile Video Coding (VVC), 67 intra prediction modes are adopted. The encoder selects the best of the 67 modes and signals it to the decoder, and the bits consumed in signaling the selected mode can limit coding efficiency. To reduce this signaling overhead, a probability-based decoder-side intra mode derivation (P-DIMD) is proposed in this paper. Specifically, an intra prediction mode candidate set is constructed based on the probabilities of the intra prediction modes, estimated in two ways. First, textures are typically continuous within a local region, so the intra prediction modes of neighboring blocks tend to be similar. Second, some intra prediction modes are inherently more likely to be used than others. For each mode in the candidate set, intra prediction is performed on a template to compute a cost; the mode with the minimum cost is selected as the optimal mode and used in the intra prediction of the current block. Experimental results demonstrate that P-DIMD achieves 0.56% BD-rate savings on average compared to VTM-11.0 under the all-intra configuration.
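The template-based selection described above can be sketched as a minimum-cost search over candidate modes. Here `predict` is a hypothetical stand-in for the codec's intra predictor, and SAD is used as the template cost:

```python
import numpy as np

def derive_intra_mode(template, reconstructed_ref, candidate_modes, predict):
    """Pick the intra mode whose prediction of the template area has the
    minimum SAD cost.

    `predict(mode, reconstructed_ref)` is an assumed callable returning a
    predicted template from already-reconstructed reference samples.
    """
    best_mode, best_cost = None, float("inf")
    for mode in candidate_modes:
        cost = np.abs(predict(mode, reconstructed_ref) - template).sum()
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```

Because the template is built from reconstructed samples, the decoder can repeat exactly this search and derive the same mode without any signaled bits.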
Title: Mixed-precision Quantization with Dynamical Hessian Matrix for Object Detection Network
Authors: Zerui Yang, Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, H. Xiong
DOI: https://doi.org/10.1109/VCIP53242.2021.9675341
Venue: 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021
Abstract: Mixed-precision quantization with adaptive bitwidth allocation has achieved high compression rates and accuracy for neural networks in classification tasks, but it has not been well explored for object detection networks. In this paper, we propose a novel mixed-precision quantization scheme with a dynamical Hessian matrix for object detection networks. We iteratively select the layer with the lowest sensitivity based on the Hessian matrix and downgrade its precision until the required compression ratio is reached. The L-BFGS algorithm is used to update the Hessian matrix in each quantization iteration. Moreover, we specifically design the loss function for object detection networks by jointly considering the quantization effects on the classification and regression losses. Experimental results on RetinaNet and Faster R-CNN show that the proposed DHMQ achieves state-of-the-art performance for quantized object detectors.
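A minimal sketch of the iterative downgrade loop, assuming precomputed per-layer sensitivity scores (the paper derives these from a dynamically updated Hessian; here they are just given numbers) and using an average-bitwidth target as a stand-in for the real compression-ratio constraint:

```python
def allocate_bitwidths(sensitivities, start_bits=8, min_bits=2, target_avg=4.0):
    """Greedy mixed-precision allocation: repeatedly downgrade the
    least-sensitive layer by one bit until the average bitwidth meets
    the target.

    `sensitivities` maps layer name -> sensitivity score (higher means
    quantization hurts that layer more).
    """
    bits = {name: start_bits for name in sensitivities}
    while sum(bits.values()) / len(bits) > target_avg:
        # only layers still above the minimum precision are candidates
        cands = [n for n in bits if bits[n] > min_bits]
        if not cands:
            break
        victim = min(cands, key=lambda n: sensitivities[n])
        bits[victim] -= 1
    return bits
```

The greedy rule concentrates the precision loss in layers where, by the sensitivity measure, it costs the least accuracy.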
Title: SalGFCN: Graph Based Fully Convolutional Network for Panoramic Saliency Prediction
Authors: Yiwei Yang, Yucheng Zhu, Zhongpai Gao, Guangtao Zhai
DOI: https://doi.org/10.1109/VCIP53242.2021.9675373
Venue: 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021
Abstract: Saliency prediction for panoramic images is strongly affected by the distortion caused by their non-Euclidean geometry, so traditional CNN-based saliency prediction algorithms for 2D images are no longer suitable for 360-degree images. We propose a graph-based fully convolutional network for saliency prediction of 360-degree images, which maps panoramic pixels to a spherical graph data structure for representation. The saliency prediction network is based on a residual U-Net architecture, with dilated graph convolutions and an attention mechanism in the bottleneck. Furthermore, we design a fully convolutional layer for graph pooling and unpooling operations in spherical graph space to retain node-to-node features. Experimental results show that our proposed method outperforms other state-of-the-art saliency models on a large-scale dataset.
Title: Deep Motion Flow Aided Face Video De-identification
Authors: Yunqian Wen, Bo Liu, Rong Xie, Jingyi Cao, Li Song
DOI: https://doi.org/10.1109/VCIP53242.2021.9675353
Venue: 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021
Abstract: Advances in cameras and web technology have made it easy to capture and share large amounts of face video with an unknown audience and for uncontrollable purposes. This raises increasing concern about identity-relevant computer vision systems invading subjects' privacy. Previous de-identification methods rely on designing novel neural networks and processing face videos frame by frame, which ignores the redundancy and temporal continuity of video data. Moreover, these techniques cannot balance privacy and utility well, and per-frame processing easily causes flicker. In this paper, we present deep motion flow, which creates convincingly de-identified face videos with a good privacy-utility tradeoff. It calculates the relative dense motion flow between every two adjacent original frames and runs high-quality image anonymization only on the first frame. The de-identified video is then obtained from the anonymized first frame via the relative dense motion flow. Extensive experiments demonstrate the effectiveness of our proposed de-identification method.
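The core idea, propagating a single anonymized frame through dense motion flow, can be sketched with a simple backward nearest-neighbor warp (the paper's anonymization network and flow estimation are not reproduced; this only shows how a flow field moves pixels from one frame to the next):

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Warp a frame to the next time step using a dense flow field.

    Backward-warping convention: flow[y, x] = (dy, dx) points from each
    target pixel back to its source location in `frame`; sampling is
    nearest-neighbor with border clamping, for simplicity.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]
```

Applying such a warp repeatedly, frame after frame, carries the single anonymized first frame through the whole clip, which is what keeps the result temporally consistent and flicker-free.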
Title: Analyzing Time Complexity of Practical Learned Image Compression Models
Authors: Xiaohan Pan, Zongyu Guo, Zhibo Chen
DOI: https://doi.org/10.1109/VCIP53242.2021.9675424
Venue: 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021
Abstract: We have witnessed the rapid development of learned image compression (LIC). The latest LIC models outperform almost all traditional image compression standards in terms of rate-distortion (RD) performance. However, the time complexity of LIC models remains underexplored, limiting practical applications in industry. Even with GPU acceleration, LIC models still struggle with long coding times, especially on the decoder side. In this paper, we analyze and test several prevailing and representative LIC models, and compare their complexity with that of traditional codecs, including H.265/HEVC intra and H.266/VVC intra. We provide a comprehensive analysis of every module in the LIC models and investigate how bitrate changes affect coding time. We observe that the time complexity bottleneck mainly lies in entropy coding and context modelling. Although this paper focuses on experimental statistics, our analysis reveals some directions for further acceleration of LIC models, such as model modification for parallel computing, model pruning, and a more parallel context model.
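A minimal sketch of the kind of per-module timing behind such an analysis, assuming the codec decomposes into a sequential pipeline of callables (the module names below are illustrative, not the paper's):

```python
import time

def profile_modules(modules, x, repeats=10):
    """Time each stage of a sequential pipeline.

    `modules` maps stage name -> callable; each stage's output feeds the
    next. Returns average wall-clock seconds per stage, so the bottleneck
    stage (e.g. entropy coding / context modelling) stands out.
    """
    timings = {}
    for name, fn in modules.items():
        start = time.perf_counter()
        for _ in range(repeats):
            y = fn(x)
        timings[name] = (time.perf_counter() - start) / repeats
        x = y  # pass this stage's result to the next stage
    return timings
```

Running the same profile at several bitrates would then reveal which stages scale with the number of coded symbols and which have a fixed cost.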
Title: Nanopore Sequencing Simulator for DNA Data Storage
Authors: Eva Gil San Antonio, T. Heinis, Louis Carteron, Melpomeni Dimopoulou, M. Antonini
DOI: https://doi.org/10.1109/VCIP53242.2021.9675388
Venue: 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021
Abstract: The exponential increase of digital data and the limited capacity of current storage devices have made clear the need to explore new storage solutions. Thanks to its biological properties, DNA has proven to be a potential candidate for this task, allowing information to be stored at high density for hundreds or even thousands of years. With the release of nanopore sequencing technologies, DNA data storage is one step closer to becoming a reality. Many works have proposed solutions for simulating this sequencing step, aiming to ease the development of algorithms that process nanopore-sequenced reads. However, these simulators target the sequencing of complete genomes, whose characteristics differ from those of synthetic DNA. This work presents a nanopore sequencing simulator targeting synthetic DNA in the context of DNA data storage.
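A toy sketch of the kind of read corruption such a simulator produces: each synthetic strand is copied several times with substitution, insertion, and deletion errors. The error rates below are assumptions for illustration, not the paper's measured nanopore statistics:

```python
import random

def simulate_nanopore_reads(strand, n_reads=5, sub=0.05, ins=0.02, dele=0.03, seed=0):
    """Generate noisy copies of a synthetic DNA strand.

    Per-base error model (rates are illustrative assumptions):
    with prob. `dele` the base is dropped, with prob. `ins` a random base
    is inserted before it, and with prob. `sub` it is substituted.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    reads = []
    for _ in range(n_reads):
        out = []
        for b in strand:
            r = rng.random()
            if r < dele:
                continue                       # deletion: skip this base
            if r < dele + ins:
                out.append(rng.choice(bases))  # insertion before the base
            out.append(rng.choice(bases.replace(b, "")) if rng.random() < sub else b)
        reads.append("".join(out))
    return reads
```

Insertions and deletions shift all downstream positions, which is exactly what makes nanopore reads harder to decode than substitution-only channels.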
Title: Face 2D to 3D Reconstruction Network Based on Head Pose and 3D Facial Landmarks
Authors: Yuanquan Xu, Cheolkon Jung
DOI: https://doi.org/10.1109/VCIP53242.2021.9675325
Venue: 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021
Abstract: Most existing methods based on the 3D morphable model (3DMM) require annotated parameters as ground truth for training, but only a few datasets provide them. Moreover, it is difficult to acquire accurate 3D face models aligned with the input images due to the gap in dimensionality. In this paper, we propose a face 2D-to-3D reconstruction network based on head pose and 3D facial landmarks. We build a head-pose-guided face reconstruction network that regresses an accurate 3D face model with the help of 3D facial landmarks. Unlike 3DMM parameters, head pose and 3D facial landmarks can be estimated successfully even on in-the-wild images. Experiments on the 300W-LP, AFLW2000-3D and CelebA-HQ datasets show that the proposed method successfully reconstructs a 3D face model from a single RGB image thanks to the 3D facial landmarks, and achieves state-of-the-art performance in terms of normalized mean error (NME).
Title: A High Accuracy Camera Calibration Method for Sport Videos
Authors: Neng Zhang, E. Izquierdo
DOI: https://doi.org/10.1109/VCIP53242.2021.9675379
Venue: 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021
Abstract: Camera calibration for sport videos enables precise and natural delivery of graphics on video footage and several other special effects. This in turn substantially improves the audience's visual experience and facilitates sports analysis during or after the live show. In this paper, we propose a high-accuracy camera calibration method for sport videos. First, we generate a homography database by uniformly sampling camera parameters; this database contains more than 91 thousand homography matrices. Then, we use a conditional generative adversarial network (cGAN) to perform semantic segmentation, splitting broadcast frames into four classes. In a subsequent processing step, we build an effective feature extraction network to extract features from the semantically segmented images, and search the database for the best-matching homography. Finally, we refine the homography by image alignment. In a comprehensive evaluation on the 2014 World Cup dataset, our method outperforms other state-of-the-art techniques.
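The database lookup step can be sketched as a nearest-neighbor search over stored features; feature extraction and the later alignment-based refinement are omitted, and the array names are illustrative:

```python
import numpy as np

def best_homography(query_feat, db_feats, db_homographies):
    """Return the homography whose stored feature vector is nearest
    (Euclidean distance) to the feature extracted from the segmented frame.

    `db_feats` has one row per database entry; `db_homographies` is the
    parallel list of homography matrices sampled from camera parameters.
    """
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    return db_homographies[int(np.argmin(dists))]
```

With ~91k entries a brute-force linear scan like this is still cheap; the retrieved homography then serves only as an initialization for the image-alignment refinement.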
Title: See SIFT in a Rain: Divide-and-conquer SIFT Key Point Recovery from a Single Rainy Image
Authors: Ping Wang, Wei Wu, Zhu-jun Li, Yong Liu
DOI: https://doi.org/10.1109/VCIP53242.2021.9675434
Venue: 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021
Abstract: Scale-Invariant Feature Transform (SIFT) is one of the best-known image matching methods and has been widely applied in various visual fields. By adopting a difference-of-Gaussians (DoG) pyramid for extrema detection and Gaussian gradient information for description, SIFT extracts accurate key points and achieves excellent matching results, except under adverse weather conditions such as rain. To address this issue, we propose a divide-and-conquer SIFT key point recovery algorithm for a single rainy image. Rather than trying to improve the quality of a derained image, we divide the key point recovery problem into two sub-problems: recovering the DoG pyramid of the derained image, and recovering the gradients of the derained Gaussian images at multiple scales. We propose two separate deep learning networks with different losses and structures to recover each of them. This divide-and-conquer scheme, which sets different objectives for SIFT extrema detection and description, leads to very robust performance. Experimental results show that our proposed algorithm achieves state-of-the-art performance on widely used image datasets in both quantitative and qualitative tests.
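For reference, one octave of the DoG pyramid that the first sub-problem targets can be built with plain NumPy: blur the image at geometrically spaced sigmas and subtract adjacent levels (a standard SIFT construction, not the paper's recovery network):

```python
import numpy as np

def _gauss_blur(img, sigma):
    # separable Gaussian blur with a truncated kernel (NumPy only)
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    tmp = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, tmp)

def dog_pyramid(image, num_scales=4, sigma0=1.6):
    """One octave of a difference-of-Gaussians pyramid: adjacent blur
    levels at sigmas sigma0 * k^i (k = 2^(1/num_scales)) are subtracted."""
    k = 2 ** (1.0 / num_scales)
    blurred = [_gauss_blur(image.astype(float), sigma0 * k**i)
               for i in range(num_scales + 1)]
    return [blurred[i + 1] - blurred[i] for i in range(num_scales)]
```

SIFT key points are local extrema across these DoG levels, which is why recovering the DoG pyramid directly (rather than a visually pleasing derained image) is what matters for detection.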