2020 IEEE International Conference on Visual Communications and Image Processing (VCIP): Latest Publications

FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301850
Ekrem Çetinkaya, Hadi Amirpour, C. Timmerer, M. Ghanbari
{"title":"FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning","authors":"Ekrem Çetinkaya, Hadi Amirpour, C. Timmerer, M. Ghanbari","doi":"10.1109/VCIP49819.2020.9301850","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301850","url":null,"abstract":"HTTP Adaptive Streaming (HAS) is the most common approach for delivering video content over the Internet. The requirement to encode the same content at different quality levels (i.e., representations) in HAS is a challenging problem for content providers. Fast multirate encoding approaches try to accelerate this process by reusing information from previously encoded representations. In this paper, we propose to use convolutional neural networks (CNNs) to speed up the encoding of multiple representations with a specific focus on parallel encoding. In parallel encoding, the overall time-complexity is limited to the maximum time-complexity of one of the representations that are encoded in parallel. Therefore, instead of reducing the time-complexity for all representations, the highest time-complexities are reduced. Experimental results show that FaME-ML achieves significant time-complexity savings in parallel encoding scenarios (41% in average) with a slight increase in bitrate and quality degradation compared to the HEVC reference software.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133000090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A New Bounding Box based Pseudo Annotation Generation Method for Semantic Segmentation
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301833
Xiaolong Xu, Fanman Meng, Hongliang Li, Q. Wu, King Ngi Ngan, Shuai Chen
{"title":"A New Bounding Box based Pseudo Annotation Generation Method for Semantic Segmentation","authors":"Xiaolong Xu, Fanman Meng, Hongliang Li, Q. Wu, King Ngi Ngan, Shuai Chen","doi":"10.1109/VCIP49819.2020.9301833","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301833","url":null,"abstract":"This paper proposes a fusion-based method to generate pseudo-annotations from bounding boxes for semantic segmentation. The idea is to first generate diverse foreground masks by multiple bounding box segmentation methods, and then combine these masks to generate pseudo-annotations. Existing methods generate foreground masks from bounding boxes by classical segmentation methods driving by low-level features and own local information, which is hard to generate accurate and diverse results for the fusion. Different from the traditional methods, multiple class-agnostic models are modeled to learn the objectiveness cues by using existing labeled pixel-level annotations and then to fuse. Firstly, the classical Fully Convolutional Network (FCN) that densely predicts the pixels’ labels is used. Then, two new sparse prediction based class-agnostic models are proposed, which simplify the segmentation task as sparsely predicting the boundary points through predicting the distance from the bounding box border to the object boundary in Cartesian Coordinate System and the Polar Coordinate System, respectively. Finally, a voting-based strategy is proposed to combine these segmentation results to form better pseudo-annotations. We conduct experiments on PASCAL VOC 2012 dataset. The mIoU of the proposed method is 68.7%, which outperforms the state-of-the-art method by 1.9%.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130225007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Learning From Paired and Unpaired Data: Alternately Trained CycleGAN for Near Infrared Image Colorization
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301791
Zaifeng Yang, Zhenghua Chen
{"title":"Learning From Paired and Unpaired Data: Alternately Trained CycleGAN for Near Infrared Image Colorization","authors":"Zaifeng Yang, Zhenghua Chen","doi":"10.1109/VCIP49819.2020.9301791","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301791","url":null,"abstract":"This paper presents a novel near infrared (NIR) image colorization approach for the Grand Challenge held by 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP). A Cycle-Consistent Generative Adversarial Network (CycleGAN) with cross-scale dense connections is developed to learn the color translation from the NIR domain to the RGB domain based on both paired and unpaired data. Due to the limited number of paired NIR-RGB images, data augmentation via cropping, scaling, contrast and mirroring operations have been adopted to increase the variations of the NIR domain. An alternating training strategy has been designed, such that CycleGAN can efficiently and alternately learn the explicit pixel-level mappings from the paired NIR-RGB data, as well as the implicit domain mappings from the unpaired ones. Based on the validation data, we have evaluated our method and compared it with conventional CycleGAN method in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and angular error (AE). The experimental results validate the proposed colorization framework.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128775734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
On Segmentation of Maxillary Sinus Membrane using Automatic Vertex Screening
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301845
K. Li, Tai-Chiu Hsung, A. Yeung, M. Bornstein
{"title":"On Segmentation of Maxillary Sinus Membrane using Automatic Vertex Screening","authors":"K. Li, Tai-Chiu Hsung, A. Yeung, M. Bornstein","doi":"10.1109/VCIP49819.2020.9301845","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301845","url":null,"abstract":"The purpose of this study is to develop an automatic technique to segment the membrane of the maxillary sinus with morphological changes (e.g. thickened membrane and cysts) for the detection of abnormalities. The first step is to segment the sinus bone cavity in the CBCT image using fuzzy C-mean algorithm. Then, the vertices of inner bone walls of sinus in the mesh model are screened with vertex normal direction and angular based mean-distance filtering. The resulted vertices are then used to generate the bony sinus cavity mesh model by using Poisson surface reconstruction. Finally, the sinus membrane morphological changes are segmented by subtracting the air sinus segmentation from the reconstructed bony sinus cavity. The proposed method has been applied on 5 maxillary sinuses with mucosal thickening and has demonstrated that it can segment thin membrane thickening (< 2 mm) successfully within 4.1% and 3.5% error in volume and surface area respectively. Existing methods have issues of leakages at openings and thin bones, and inaccuracy with irregular contours commonly seen in maxillary sinus. The current method overcomes these shortcomings.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125449946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Chain Code-Based Occupancy Map Coding for Video-Based Point Cloud Compression
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301867
Runyu Yang, Ning Yan, Li Li, Dong Liu, Feng Wu
{"title":"Chain Code-Based Occupancy Map Coding for Video-Based Point Cloud Compression","authors":"Runyu Yang, Ning Yan, Li Li, Dong Liu, Feng Wu","doi":"10.1109/VCIP49819.2020.9301867","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301867","url":null,"abstract":"In video-based point cloud compression (V-PCC), occupancy map video is utilized to indicate whether a 2-D pixel corresponds to a valid 3-D point or not. In the current design of V-PCC, the occupancy map video is directly compressed losslessly with High Efficiency Video Coding (HEVC). However, the coding tools in HEVC are specifically designed for natural images, thus unsuitable for the occupancy map. In this paper, we present a novel quadtree-based scheme for lossless occupancy map coding. In this scheme, the occupancy map is firstly divided into several coding tree units (CTUs). Then, the CTU is divided into coding units (CUs) recursively using a quadtree. The quadtree partition is terminated when one of the three conditions is satisfied. Firstly, all the pixels have the same value. Secondly, the pixels in the CU only have two kinds of values and they can be separated by a continuous edge whose endpoints lie on the side of the CU. The continuous edge is then coded using chain code. Thirdly, the CU reaches the minimum size. This scheme simplifies the design of block partitioning in HEVC and designs simpler yet more effective coding tools. Experimental results show significant reduction of bit-rate and complexity compared with the occupancy map coding scheme in V-PCC. In addition, this scheme is also very efficient to compress the semantic map.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125501290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Fast Geometry Estimation for Phase-coding Structured Light Field
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301777
Li Liu, S. Xiang, Huiping Deng, Jin Wu
{"title":"Fast Geometry Estimation for Phase-coding Structured Light Field","authors":"Li Liu, S. Xiang, Huiping Deng, Jin Wu","doi":"10.1109/VCIP49819.2020.9301777","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301777","url":null,"abstract":"Estimation scene geometry is an important and fundamental task in light field processing. In conventional light field, there exist homogeneous texture surfaces, which brings ambiguity and heavy computation load in estimating the depth. In this paper, we propose phase-coding structured light field (PSLF), which projects sinusoidal waveform patterns and the phase is assigned to every pixel as the code. With the EPI of PSLF, we propose a depth estimation method. To be specific, the cost is convex with respect to the inclination angle of the candidate line in the EPI, and we propose to iterate rotating the candidate line until it converges to the optimal one. In addition, to cope with problem that the candidate samples cover multiple depth layers, we propose a method to reject the outlier samples. Experimental results demonstrate that, compared with conventional LF, the proposed PSLF improves the depth quality with mean absolute error being 0.007 pixels. In addition, the proposed optimization-based depth estimation method improves efficiency obviously with the processing speed being about 2.71 times of the tradition method.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121289868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Multi-Scale Video Inverse Tone Mapping with Deformable Alignment
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301780
Jiaqi Zou, Ke Mei, Songlin Sun
{"title":"Multi-Scale Video Inverse Tone Mapping with Deformable Alignment","authors":"Jiaqi Zou, Ke Mei, Songlin Sun","doi":"10.1109/VCIP49819.2020.9301780","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301780","url":null,"abstract":"Inverse tone mapping(iTM) is an operation to transform low-dynamic-range (LDR) content to high-dynamic-range (HDR) content, which is an effective technique to improve the visual experience. ITM has developed rapidly with deep learning algorithms in recent years. However, the great majority of deeplearning-based iTM methods are aimed at images and ignore the temporal correlations of consecutive frames in videos. In this paper, we propose a multi-scale video iTM network with deformable alignment, which increases time consistency in videos. We first a lign t he i nput c onsecutive L DR f rames a t t he feature level by deformable convolutions and then simultaneously use multi-frame information to generate the HDR frame. Additionally, we adopt a multi-scale iTM architecture with a pyramid pooling module, which enables our network to reconstruct details as well as global features. The proposed network achieves better performance compared to other iTM methods on quantitative metrics and gain a significant visual improvement.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123023411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Disparity compensation of light fields for improved efficiency in 4D transform-based encoders
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301829
João M. Santos, Lucas A. Thomaz, P. Assunção, L. Cruz, Luis M. N. Tavora, S. Faria
{"title":"Disparity compensation of light fields for improved efficiency in 4D transform-based encoders","authors":"João M. Santos, Lucas A. Thomaz, P. Assunção, L. Cruz, Luis M. N. Tavora, S. Faria","doi":"10.1109/VCIP49819.2020.9301829","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301829","url":null,"abstract":"Efficient light field en coders take advantage of the inherent 4D data structures to achieve high compression performance. This is accomplished by exploiting the redundancy of co-located pixels in different sub-aperture images (SAIs) through prediction and/or transform schemes to find a m ore compact representation of the signal. However, in image regions with higher disparity between SAIs, such scheme’s performance tends to decrease, thus reducing the compression efficiency. This paper introduces a reversible pre-processing algorithm for disparity compensation that operates on the SAI domain of light field data. The proposed method contributes to improve the transform efficiency of the encoder, since the disparity-compensated data presents higher correlation between co-located image blocks. The experimental results show significant improvements in the compression performance of 4D light fields, achieving Bjontegaard delta rate gains of about 44% on average for MuLE codec using the 4D discrete cosine transform, when encoding High Density Camera Arrays (HDCA) light field images.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127623329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Learning Graph Topology Representation with Attention Networks
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301864
Yuanyuan Qi, Jiayue Zhang, Weiran Xu, Jun Guo, Honggang Zhang
{"title":"Learning Graph Topology Representation with Attention Networks","authors":"Yuanyuan Qi, Jiayue Zhang, Weiran Xu, Jun Guo, Honggang Zhang","doi":"10.1109/VCIP49819.2020.9301864","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301864","url":null,"abstract":"Contextualized neural language models have gained much attention in Information Retrieval (IR) with its ability to achieve better word understanding by capturing contextual structure on sentence level. However, to understand a document better, it is necessary to involve contextual structure from document level. Moreover, some words contributes more information to delivering the meaning of a document. Motivated by this, in this paper, we take the advantages of Graph Convolutional Networks (GCN) and Graph Attention Networks (GAN) to model global word-relation structure of a document with attention mechanism to improve context-aware document ranking. We propose to build a graph for a document to model the global contextual structure. The nodes and edges of the graph are constructed from contextual embeddings. We first apply graph convolution on the graph and then use attention networks to explore the influence of more informative words to obtain a new representation. This representation covers both local contextual and global structure information. The experimental results show that our method outperforms the state-of-the-art contextual language models, which demonstrate that incorporating contextual structure is useful for improving document ranking.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128101463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Versatile Video Coding (VVC) Arrives
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301847
G. Sullivan
{"title":"Versatile Video Coding (VVC) Arrives","authors":"G. Sullivan","doi":"10.1109/VCIP49819.2020.9301847","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301847","url":null,"abstract":"Seven years after the development of the first version of the High Efficiency Video Coding (HEVC) standard, the major international organizations in the world of video coding have completed the next major generation, called Versatile Video Coding (VVC). The VVC standard, formally designated as ITU-T H.266 and ISO/IEC 23090-3, promises a major improvement in video compression relative to its predecessors. It can offer roughly double the coding efficiency – i.e., it can be used to encode video content to the same level of visual quality while using about 50% fewer bits than HEVC and thus using about 75% fewer bits than H.264/AVC, today’s most widely used format. Thus it can ease the burden on worldwide networks, where video now comprises about 80% of all internet traffic. Moreover, VVC has enhanced features in its syntax for supporting an unprecedented breadth of applications, giving meaning to the word \"versatility\" used in its title. Completed in July 2020, VVC has begun to emerge in practical implementations and is undergoing testing to characterize its subjective performance.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133908843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12