{"title":"FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning","authors":"Ekrem Çetinkaya, Hadi Amirpour, C. Timmerer, M. Ghanbari","doi":"10.1109/VCIP49819.2020.9301850","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301850","url":null,"abstract":"HTTP Adaptive Streaming (HAS) is the most common approach for delivering video content over the Internet. The requirement to encode the same content at different quality levels (i.e., representations) in HAS is a challenging problem for content providers. Fast multirate encoding approaches try to accelerate this process by reusing information from previously encoded representations. In this paper, we propose to use convolutional neural networks (CNNs) to speed up the encoding of multiple representations with a specific focus on parallel encoding. In parallel encoding, the overall time-complexity is limited to the maximum time-complexity of one of the representations that are encoded in parallel. Therefore, instead of reducing the time-complexity for all representations, the highest time-complexities are reduced. Experimental results show that FaME-ML achieves significant time-complexity savings in parallel encoding scenarios (41% in average) with a slight increase in bitrate and quality degradation compared to the HEVC reference software.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133000090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Bounding Box based Pseudo Annotation Generation Method for Semantic Segmentation","authors":"Xiaolong Xu, Fanman Meng, Hongliang Li, Q. Wu, King Ngi Ngan, Shuai Chen","doi":"10.1109/VCIP49819.2020.9301833","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301833","url":null,"abstract":"This paper proposes a fusion-based method to generate pseudo-annotations from bounding boxes for semantic segmentation. The idea is to first generate diverse foreground masks by multiple bounding box segmentation methods, and then combine these masks to generate pseudo-annotations. Existing methods generate foreground masks from bounding boxes by classical segmentation methods driving by low-level features and own local information, which is hard to generate accurate and diverse results for the fusion. Different from the traditional methods, multiple class-agnostic models are modeled to learn the objectiveness cues by using existing labeled pixel-level annotations and then to fuse. Firstly, the classical Fully Convolutional Network (FCN) that densely predicts the pixels’ labels is used. Then, two new sparse prediction based class-agnostic models are proposed, which simplify the segmentation task as sparsely predicting the boundary points through predicting the distance from the bounding box border to the object boundary in Cartesian Coordinate System and the Polar Coordinate System, respectively. Finally, a voting-based strategy is proposed to combine these segmentation results to form better pseudo-annotations. We conduct experiments on PASCAL VOC 2012 dataset. The mIoU of the proposed method is 68.7%, which outperforms the state-of-the-art method by 1.9%.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130225007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning From Paired and Unpaired Data: Alternately Trained CycleGAN for Near Infrared Image Colorization","authors":"Zaifeng Yang, Zhenghua Chen","doi":"10.1109/VCIP49819.2020.9301791","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301791","url":null,"abstract":"This paper presents a novel near infrared (NIR) image colorization approach for the Grand Challenge held by 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP). A Cycle-Consistent Generative Adversarial Network (CycleGAN) with cross-scale dense connections is developed to learn the color translation from the NIR domain to the RGB domain based on both paired and unpaired data. Due to the limited number of paired NIR-RGB images, data augmentation via cropping, scaling, contrast and mirroring operations have been adopted to increase the variations of the NIR domain. An alternating training strategy has been designed, such that CycleGAN can efficiently and alternately learn the explicit pixel-level mappings from the paired NIR-RGB data, as well as the implicit domain mappings from the unpaired ones. Based on the validation data, we have evaluated our method and compared it with conventional CycleGAN method in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and angular error (AE). The experimental results validate the proposed colorization framework.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128775734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Segmentation of Maxillary Sinus Membrane using Automatic Vertex Screening","authors":"K. Li, Tai-Chiu Hsung, A. Yeung, M. Bornstein","doi":"10.1109/VCIP49819.2020.9301845","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301845","url":null,"abstract":"The purpose of this study is to develop an automatic technique to segment the membrane of the maxillary sinus with morphological changes (e.g. thickened membrane and cysts) for the detection of abnormalities. The first step is to segment the sinus bone cavity in the CBCT image using fuzzy C-mean algorithm. Then, the vertices of inner bone walls of sinus in the mesh model are screened with vertex normal direction and angular based mean-distance filtering. The resulted vertices are then used to generate the bony sinus cavity mesh model by using Poisson surface reconstruction. Finally, the sinus membrane morphological changes are segmented by subtracting the air sinus segmentation from the reconstructed bony sinus cavity. The proposed method has been applied on 5 maxillary sinuses with mucosal thickening and has demonstrated that it can segment thin membrane thickening (< 2 mm) successfully within 4.1% and 3.5% error in volume and surface area respectively. Existing methods have issues of leakages at openings and thin bones, and inaccuracy with irregular contours commonly seen in maxillary sinus. The current method overcomes these shortcomings.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125449946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chain Code-Based Occupancy Map Coding for Video-Based Point Cloud Compression","authors":"Runyu Yang, Ning Yan, Li Li, Dong Liu, Feng Wu","doi":"10.1109/VCIP49819.2020.9301867","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301867","url":null,"abstract":"In video-based point cloud compression (V-PCC), occupancy map video is utilized to indicate whether a 2-D pixel corresponds to a valid 3-D point or not. In the current design of V-PCC, the occupancy map video is directly compressed losslessly with High Efficiency Video Coding (HEVC). However, the coding tools in HEVC are specifically designed for natural images, thus unsuitable for the occupancy map. In this paper, we present a novel quadtree-based scheme for lossless occupancy map coding. In this scheme, the occupancy map is firstly divided into several coding tree units (CTUs). Then, the CTU is divided into coding units (CUs) recursively using a quadtree. The quadtree partition is terminated when one of the three conditions is satisfied. Firstly, all the pixels have the same value. Secondly, the pixels in the CU only have two kinds of values and they can be separated by a continuous edge whose endpoints lie on the side of the CU. The continuous edge is then coded using chain code. Thirdly, the CU reaches the minimum size. This scheme simplifies the design of block partitioning in HEVC and designs simpler yet more effective coding tools. Experimental results show significant reduction of bit-rate and complexity compared with the occupancy map coding scheme in V-PCC. In addition, this scheme is also very efficient to compress the semantic map.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125501290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Geometry Estimation for Phase-coding Structured Light Field","authors":"Li Liu, S. Xiang, Huiping Deng, Jin Wu","doi":"10.1109/VCIP49819.2020.9301777","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301777","url":null,"abstract":"Estimation scene geometry is an important and fundamental task in light field processing. In conventional light field, there exist homogeneous texture surfaces, which brings ambiguity and heavy computation load in estimating the depth. In this paper, we propose phase-coding structured light field (PSLF), which projects sinusoidal waveform patterns and the phase is assigned to every pixel as the code. With the EPI of PSLF, we propose a depth estimation method. To be specific, the cost is convex with respect to the inclination angle of the candidate line in the EPI, and we propose to iterate rotating the candidate line until it converges to the optimal one. In addition, to cope with problem that the candidate samples cover multiple depth layers, we propose a method to reject the outlier samples. Experimental results demonstrate that, compared with conventional LF, the proposed PSLF improves the depth quality with mean absolute error being 0.007 pixels. In addition, the proposed optimization-based depth estimation method improves efficiency obviously with the processing speed being about 2.71 times of the tradition method.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121289868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Scale Video Inverse Tone Mapping with Deformable Alignment","authors":"Jiaqi Zou, Ke Mei, Songlin Sun","doi":"10.1109/VCIP49819.2020.9301780","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301780","url":null,"abstract":"Inverse tone mapping(iTM) is an operation to transform low-dynamic-range (LDR) content to high-dynamic-range (HDR) content, which is an effective technique to improve the visual experience. ITM has developed rapidly with deep learning algorithms in recent years. However, the great majority of deeplearning-based iTM methods are aimed at images and ignore the temporal correlations of consecutive frames in videos. In this paper, we propose a multi-scale video iTM network with deformable alignment, which increases time consistency in videos. We first a lign t he i nput c onsecutive L DR f rames a t t he feature level by deformable convolutions and then simultaneously use multi-frame information to generate the HDR frame. Additionally, we adopt a multi-scale iTM architecture with a pyramid pooling module, which enables our network to reconstruct details as well as global features. The proposed network achieves better performance compared to other iTM methods on quantitative metrics and gain a significant visual improvement.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123023411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disparity compensation of light fields for improved efficiency in 4D transform-based encoders","authors":"João M. Santos, Lucas A. Thomaz, P. Assunção, L. Cruz, Luis M. N. Tavora, S. Faria","doi":"10.1109/VCIP49819.2020.9301829","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301829","url":null,"abstract":"Efficient light field en coders take advantage of the inherent 4D data structures to achieve high compression performance. This is accomplished by exploiting the redundancy of co-located pixels in different sub-aperture images (SAIs) through prediction and/or transform schemes to find a m ore compact representation of the signal. However, in image regions with higher disparity between SAIs, such scheme’s performance tends to decrease, thus reducing the compression efficiency. This paper introduces a reversible pre-processing algorithm for disparity compensation that operates on the SAI domain of light field data. The proposed method contributes to improve the transform efficiency of the encoder, since the disparity-compensated data presents higher correlation between co-located image blocks. The experimental results show significant improvements in the compression performance of 4D light fields, achieving Bjontegaard delta rate gains of about 44% on average for MuLE codec using the 4D discrete cosine transform, when encoding High Density Camera Arrays (HDCA) light field images.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127623329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Graph Topology Representation with Attention Networks","authors":"Yuanyuan Qi, Jiayue Zhang, Weiran Xu, Jun Guo, Honggang Zhang","doi":"10.1109/VCIP49819.2020.9301864","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301864","url":null,"abstract":"Contextualized neural language models have gained much attention in Information Retrieval (IR) with its ability to achieve better word understanding by capturing contextual structure on sentence level. However, to understand a document better, it is necessary to involve contextual structure from document level. Moreover, some words contributes more information to delivering the meaning of a document. Motivated by this, in this paper, we take the advantages of Graph Convolutional Networks (GCN) and Graph Attention Networks (GAN) to model global word-relation structure of a document with attention mechanism to improve context-aware document ranking. We propose to build a graph for a document to model the global contextual structure. The nodes and edges of the graph are constructed from contextual embeddings. We first apply graph convolution on the graph and then use attention networks to explore the influence of more informative words to obtain a new representation. This representation covers both local contextual and global structure information. The experimental results show that our method outperforms the state-of-the-art contextual language models, which demonstrate that incorporating contextual structure is useful for improving document ranking.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128101463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Versatile Video Coding (VVC) Arrives","authors":"G. Sullivan","doi":"10.1109/VCIP49819.2020.9301847","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301847","url":null,"abstract":"Seven years after the development of the first version of the High Efficiency Video Coding (HEVC) standard, the major international organizations in the world of video coding have completed the next major generation, called Versatile Video Coding (VVC). The VVC standard, formally designated as ITU-T H.266 and ISO/IEC 23090-3, promises a major improvement in video compression relative to its predecessors. It can offer roughly double the coding efficiency – i.e., it can be used to encode video content to the same level of visual quality while using about 50% fewer bits than HEVC and thus using about 75% fewer bits than H.264/AVC, today’s most widely used format. Thus it can ease the burden on worldwide networks, where video now comprises about 80% of all internet traffic. Moreover, VVC has enhanced features in its syntax for supporting an unprecedented breadth of applications, giving meaning to the word \"versatility\" used in its title. Completed in July 2020, VVC has begun to emerge in practical implementations and is undergoing testing to characterize its subjective performance.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133908843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}