2020 IEEE International Conference on Visual Communications and Image Processing (VCIP): Latest Publications

FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301850
Ekrem Çetinkaya, Hadi Amirpour, C. Timmerer, M. Ghanbari
{"title":"FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning","authors":"Ekrem Çetinkaya, Hadi Amirpour, C. Timmerer, M. Ghanbari","doi":"10.1109/VCIP49819.2020.9301850","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301850","url":null,"abstract":"HTTP Adaptive Streaming (HAS) is the most common approach for delivering video content over the Internet. The requirement to encode the same content at different quality levels (i.e., representations) in HAS is a challenging problem for content providers. Fast multirate encoding approaches try to accelerate this process by reusing information from previously encoded representations. In this paper, we propose to use convolutional neural networks (CNNs) to speed up the encoding of multiple representations with a specific focus on parallel encoding. In parallel encoding, the overall time-complexity is limited to the maximum time-complexity of one of the representations that are encoded in parallel. Therefore, instead of reducing the time-complexity for all representations, the highest time-complexities are reduced. Experimental results show that FaME-ML achieves significant time-complexity savings in parallel encoding scenarios (41% in average) with a slight increase in bitrate and quality degradation compared to the HEVC reference software.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133000090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A New Bounding Box based Pseudo Annotation Generation Method for Semantic Segmentation
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301833
Xiaolong Xu, Fanman Meng, Hongliang Li, Q. Wu, King Ngi Ngan, Shuai Chen
{"title":"A New Bounding Box based Pseudo Annotation Generation Method for Semantic Segmentation","authors":"Xiaolong Xu, Fanman Meng, Hongliang Li, Q. Wu, King Ngi Ngan, Shuai Chen","doi":"10.1109/VCIP49819.2020.9301833","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301833","url":null,"abstract":"This paper proposes a fusion-based method to generate pseudo-annotations from bounding boxes for semantic segmentation. The idea is to first generate diverse foreground masks by multiple bounding box segmentation methods, and then combine these masks to generate pseudo-annotations. Existing methods generate foreground masks from bounding boxes by classical segmentation methods driving by low-level features and own local information, which is hard to generate accurate and diverse results for the fusion. Different from the traditional methods, multiple class-agnostic models are modeled to learn the objectiveness cues by using existing labeled pixel-level annotations and then to fuse. Firstly, the classical Fully Convolutional Network (FCN) that densely predicts the pixels’ labels is used. Then, two new sparse prediction based class-agnostic models are proposed, which simplify the segmentation task as sparsely predicting the boundary points through predicting the distance from the bounding box border to the object boundary in Cartesian Coordinate System and the Polar Coordinate System, respectively. Finally, a voting-based strategy is proposed to combine these segmentation results to form better pseudo-annotations. We conduct experiments on PASCAL VOC 2012 dataset. The mIoU of the proposed method is 68.7%, which outperforms the state-of-the-art method by 1.9%.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130225007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Learning From Paired and Unpaired Data: Alternately Trained CycleGAN for Near Infrared Image Colorization
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301791
Zaifeng Yang, Zhenghua Chen
{"title":"Learning From Paired and Unpaired Data: Alternately Trained CycleGAN for Near Infrared Image Colorization","authors":"Zaifeng Yang, Zhenghua Chen","doi":"10.1109/VCIP49819.2020.9301791","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301791","url":null,"abstract":"This paper presents a novel near infrared (NIR) image colorization approach for the Grand Challenge held by 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP). A Cycle-Consistent Generative Adversarial Network (CycleGAN) with cross-scale dense connections is developed to learn the color translation from the NIR domain to the RGB domain based on both paired and unpaired data. Due to the limited number of paired NIR-RGB images, data augmentation via cropping, scaling, contrast and mirroring operations have been adopted to increase the variations of the NIR domain. An alternating training strategy has been designed, such that CycleGAN can efficiently and alternately learn the explicit pixel-level mappings from the paired NIR-RGB data, as well as the implicit domain mappings from the unpaired ones. Based on the validation data, we have evaluated our method and compared it with conventional CycleGAN method in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and angular error (AE). The experimental results validate the proposed colorization framework.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128775734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
On Segmentation of Maxillary Sinus Membrane using Automatic Vertex Screening
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301845
K. Li, Tai-Chiu Hsung, A. Yeung, M. Bornstein
{"title":"On Segmentation of Maxillary Sinus Membrane using Automatic Vertex Screening","authors":"K. Li, Tai-Chiu Hsung, A. Yeung, M. Bornstein","doi":"10.1109/VCIP49819.2020.9301845","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301845","url":null,"abstract":"The purpose of this study is to develop an automatic technique to segment the membrane of the maxillary sinus with morphological changes (e.g. thickened membrane and cysts) for the detection of abnormalities. The first step is to segment the sinus bone cavity in the CBCT image using fuzzy C-mean algorithm. Then, the vertices of inner bone walls of sinus in the mesh model are screened with vertex normal direction and angular based mean-distance filtering. The resulted vertices are then used to generate the bony sinus cavity mesh model by using Poisson surface reconstruction. Finally, the sinus membrane morphological changes are segmented by subtracting the air sinus segmentation from the reconstructed bony sinus cavity. The proposed method has been applied on 5 maxillary sinuses with mucosal thickening and has demonstrated that it can segment thin membrane thickening (< 2 mm) successfully within 4.1% and 3.5% error in volume and surface area respectively. Existing methods have issues of leakages at openings and thin bones, and inaccuracy with irregular contours commonly seen in maxillary sinus. The current method overcomes these shortcomings.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125449946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Chain Code-Based Occupancy Map Coding for Video-Based Point Cloud Compression
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301867
Runyu Yang, Ning Yan, Li Li, Dong Liu, Feng Wu
{"title":"Chain Code-Based Occupancy Map Coding for Video-Based Point Cloud Compression","authors":"Runyu Yang, Ning Yan, Li Li, Dong Liu, Feng Wu","doi":"10.1109/VCIP49819.2020.9301867","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301867","url":null,"abstract":"In video-based point cloud compression (V-PCC), occupancy map video is utilized to indicate whether a 2-D pixel corresponds to a valid 3-D point or not. In the current design of V-PCC, the occupancy map video is directly compressed losslessly with High Efficiency Video Coding (HEVC). However, the coding tools in HEVC are specifically designed for natural images, thus unsuitable for the occupancy map. In this paper, we present a novel quadtree-based scheme for lossless occupancy map coding. In this scheme, the occupancy map is firstly divided into several coding tree units (CTUs). Then, the CTU is divided into coding units (CUs) recursively using a quadtree. The quadtree partition is terminated when one of the three conditions is satisfied. Firstly, all the pixels have the same value. Secondly, the pixels in the CU only have two kinds of values and they can be separated by a continuous edge whose endpoints lie on the side of the CU. The continuous edge is then coded using chain code. Thirdly, the CU reaches the minimum size. This scheme simplifies the design of block partitioning in HEVC and designs simpler yet more effective coding tools. Experimental results show significant reduction of bit-rate and complexity compared with the occupancy map coding scheme in V-PCC. In addition, this scheme is also very efficient to compress the semantic map.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125501290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Fast Geometry Estimation for Phase-coding Structured Light Field
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301777
Li Liu, S. Xiang, Huiping Deng, Jin Wu
{"title":"Fast Geometry Estimation for Phase-coding Structured Light Field","authors":"Li Liu, S. Xiang, Huiping Deng, Jin Wu","doi":"10.1109/VCIP49819.2020.9301777","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301777","url":null,"abstract":"Estimation scene geometry is an important and fundamental task in light field processing. In conventional light field, there exist homogeneous texture surfaces, which brings ambiguity and heavy computation load in estimating the depth. In this paper, we propose phase-coding structured light field (PSLF), which projects sinusoidal waveform patterns and the phase is assigned to every pixel as the code. With the EPI of PSLF, we propose a depth estimation method. To be specific, the cost is convex with respect to the inclination angle of the candidate line in the EPI, and we propose to iterate rotating the candidate line until it converges to the optimal one. In addition, to cope with problem that the candidate samples cover multiple depth layers, we propose a method to reject the outlier samples. Experimental results demonstrate that, compared with conventional LF, the proposed PSLF improves the depth quality with mean absolute error being 0.007 pixels. In addition, the proposed optimization-based depth estimation method improves efficiency obviously with the processing speed being about 2.71 times of the tradition method.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121289868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Multi-Scale Video Inverse Tone Mapping with Deformable Alignment
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301780
Jiaqi Zou, Ke Mei, Songlin Sun
{"title":"Multi-Scale Video Inverse Tone Mapping with Deformable Alignment","authors":"Jiaqi Zou, Ke Mei, Songlin Sun","doi":"10.1109/VCIP49819.2020.9301780","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301780","url":null,"abstract":"Inverse tone mapping(iTM) is an operation to transform low-dynamic-range (LDR) content to high-dynamic-range (HDR) content, which is an effective technique to improve the visual experience. ITM has developed rapidly with deep learning algorithms in recent years. However, the great majority of deeplearning-based iTM methods are aimed at images and ignore the temporal correlations of consecutive frames in videos. In this paper, we propose a multi-scale video iTM network with deformable alignment, which increases time consistency in videos. We first a lign t he i nput c onsecutive L DR f rames a t t he feature level by deformable convolutions and then simultaneously use multi-frame information to generate the HDR frame. Additionally, we adopt a multi-scale iTM architecture with a pyramid pooling module, which enables our network to reconstruct details as well as global features. The proposed network achieves better performance compared to other iTM methods on quantitative metrics and gain a significant visual improvement.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123023411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Disparity compensation of light fields for improved efficiency in 4D transform-based encoders
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301829
João M. Santos, Lucas A. Thomaz, P. Assunção, L. Cruz, Luis M. N. Tavora, S. Faria
{"title":"Disparity compensation of light fields for improved efficiency in 4D transform-based encoders","authors":"João M. Santos, Lucas A. Thomaz, P. Assunção, L. Cruz, Luis M. N. Tavora, S. Faria","doi":"10.1109/VCIP49819.2020.9301829","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301829","url":null,"abstract":"Efficient light field en coders take advantage of the inherent 4D data structures to achieve high compression performance. This is accomplished by exploiting the redundancy of co-located pixels in different sub-aperture images (SAIs) through prediction and/or transform schemes to find a m ore compact representation of the signal. However, in image regions with higher disparity between SAIs, such scheme’s performance tends to decrease, thus reducing the compression efficiency. This paper introduces a reversible pre-processing algorithm for disparity compensation that operates on the SAI domain of light field data. The proposed method contributes to improve the transform efficiency of the encoder, since the disparity-compensated data presents higher correlation between co-located image blocks. The experimental results show significant improvements in the compression performance of 4D light fields, achieving Bjontegaard delta rate gains of about 44% on average for MuLE codec using the 4D discrete cosine transform, when encoding High Density Camera Arrays (HDCA) light field images.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127623329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Learning Graph Topology Representation with Attention Networks
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301864
Yuanyuan Qi, Jiayue Zhang, Weiran Xu, Jun Guo, Honggang Zhang
{"title":"Learning Graph Topology Representation with Attention Networks","authors":"Yuanyuan Qi, Jiayue Zhang, Weiran Xu, Jun Guo, Honggang Zhang","doi":"10.1109/VCIP49819.2020.9301864","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301864","url":null,"abstract":"Contextualized neural language models have gained much attention in Information Retrieval (IR) with its ability to achieve better word understanding by capturing contextual structure on sentence level. However, to understand a document better, it is necessary to involve contextual structure from document level. Moreover, some words contributes more information to delivering the meaning of a document. Motivated by this, in this paper, we take the advantages of Graph Convolutional Networks (GCN) and Graph Attention Networks (GAN) to model global word-relation structure of a document with attention mechanism to improve context-aware document ranking. We propose to build a graph for a document to model the global contextual structure. The nodes and edges of the graph are constructed from contextual embeddings. We first apply graph convolution on the graph and then use attention networks to explore the influence of more informative words to obtain a new representation. This representation covers both local contextual and global structure information. The experimental results show that our method outperforms the state-of-the-art contextual language models, which demonstrate that incorporating contextual structure is useful for improving document ranking.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128101463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Versatile Video Coding (VVC) Arrives
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301847
G. Sullivan
{"title":"Versatile Video Coding (VVC) Arrives","authors":"G. Sullivan","doi":"10.1109/VCIP49819.2020.9301847","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301847","url":null,"abstract":"Seven years after the development of the first version of the High Efficiency Video Coding (HEVC) standard, the major international organizations in the world of video coding have completed the next major generation, called Versatile Video Coding (VVC). The VVC standard, formally designated as ITU-T H.266 and ISO/IEC 23090-3, promises a major improvement in video compression relative to its predecessors. It can offer roughly double the coding efficiency – i.e., it can be used to encode video content to the same level of visual quality while using about 50% fewer bits than HEVC and thus using about 75% fewer bits than H.264/AVC, today’s most widely used format. Thus it can ease the burden on worldwide networks, where video now comprises about 80% of all internet traffic. Moreover, VVC has enhanced features in its syntax for supporting an unprecedented breadth of applications, giving meaning to the word \"versatility\" used in its title. Completed in July 2020, VVC has begun to emerge in practical implementations and is undergoing testing to characterize its subjective performance.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133908843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12