2020 IEEE International Conference on Visual Communications and Image Processing (VCIP): Latest Publications

Learning to encode user-generated short videos with lower bitrate and the same perceptual quality
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) | Pub Date: 2020-12-01 | DOI: 10.1109/VCIP49819.2020.9301835
Shengbin Meng, Yang Li, Yiting Liao, Junlin Li, Shiqi Wang
Abstract: On a platform of user-generated content (UGC), the uploaded videos need to be encoded again before distribution. For this specific encoding scenario, we propose a novel dataset and a corresponding learning-based scheme that is able to achieve significant bitrate saving without decreasing perceptual quality. In the dataset, each video's label indicates whether it can be encoded at a much lower bitrate while still keeping the same perceptual quality. Models trained on this dataset can then be used to classify the input video and adjust its final encoding parameters accordingly. With sufficient classification accuracy, more than 20% average bitrate saving can be obtained through the proposed scheme. The dataset will be further expanded to facilitate the study of this problem.
Citations: 2
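
The classify-then-encode logic the abstract describes can be sketched as follows; the feature extractor, the logistic-model weights, and the 0.5x bitrate factor are hypothetical stand-ins for illustration, not values from the paper:

```python
# Minimal sketch of the classify-then-encode idea described in the abstract.
# The features, trained weights, and the 0.5x bitrate factor are assumptions.
import numpy as np

def extract_features(video_path: str) -> np.ndarray:
    """Hypothetical per-video features (e.g., spatial/temporal complexity)."""
    rng = np.random.default_rng(0)          # stand-in for real video analysis
    return rng.random(8)

def can_lower_bitrate(features: np.ndarray, w: np.ndarray, b: float) -> bool:
    """Binary classifier: True if the video keeps the same perceptual
    quality when encoded at a much lower bitrate."""
    score = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # logistic model
    return score > 0.5

def choose_target_bitrate(video_path: str, default_kbps: int) -> int:
    w, b = np.full(8, 0.3), -1.0            # hypothetical trained weights
    if can_lower_bitrate(extract_features(video_path), w, b):
        return int(default_kbps * 0.5)      # assumed aggressive setting
    return default_kbps

print(choose_target_bitrate("clip.mp4", default_kbps=3000))
```
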
Automatic Sheep Counting by Multi-object Tracking
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) | Pub Date: 2020-12-01 | DOI: 10.1109/VCIP49819.2020.9301868
Jingsong Xu, Litao Yu, Jian Zhang, Qiang Wu
Abstract: Animal counting is a highly skilled yet tedious task in livestock transportation and trading. To effectively free up human labour and provide accurate counts for sheep loading/unloading, we develop an automatic sheep counting system based on multi-object detection, tracking and extrapolation techniques. Our system has demonstrated more than 99.9% accuracy with sheep moving freely in a race under optimal visual conditions.
Citations: 2
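
One common way to turn multi-object tracks into a count is a virtual counting line; the sketch below assumes track centroids and a line position, and does not reproduce the paper's detector, tracker, or extrapolation step:

```python
# Illustrative line-crossing counter on top of a multi-object tracker.
# Track IDs, centroids, and the crossing line are hypothetical.
from typing import Dict, List, Tuple

def count_crossings(tracks: Dict[int, List[Tuple[float, float]]],
                    line_x: float) -> int:
    """Count tracks whose centroid moves from left of line_x to its right."""
    counted = set()
    for track_id, centroids in tracks.items():
        for (x0, _), (x1, _) in zip(centroids, centroids[1:]):
            if x0 < line_x <= x1:           # crossed the virtual line
                counted.add(track_id)
                break
    return len(counted)

# Two sheep cross the line; one turns back before reaching it.
tracks = {
    1: [(10, 5), (48, 5), (60, 6)],
    2: [(12, 9), (30, 9), (55, 8)],
    3: [(20, 3), (40, 4), (35, 4)],
}
print(count_crossings(tracks, line_x=50))   # -> 2
```
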
Content-aware Hybrid Equi-angular Cubemap Projection for Omnidirectional Video Coding
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) | Pub Date: 2020-12-01 | DOI: 10.1109/VCIP49819.2020.9301893
Jinyong Pi, Yun Zhang, Linwei Zhu, Xinju Wu, Xuemei Zhou
Abstract: Omnidirectional video must be projected from the three-dimensional (3D) sphere to a two-dimensional (2D) plane before compression due to its spherical characteristics, and various projection formats have been proposed in recent years. However, existing projection methods suffer from either oversampling or discontinuous boundaries, which penalize coding performance. Among them, Hybrid Equi-angular Cubemap (HEC) projection has achieved significant coding gains by keeping boundary continuity compared with Equi-Angular Cubemap (EAC) projection. However, the parameters of its mapping function are fixed and cannot adapt to the video content, which results in non-uniform sampling in certain regions. To address this limitation, a projection method named Content-aware HEC (CHEC) is presented in this paper. In particular, the parameters of the mapping function are obtained adaptively by minimizing the projection conversion distortion. Additionally, an omnidirectional video coding framework with adaptive mapping-function parameters is proposed to effectively improve coding performance. Experimental results show that the proposed scheme achieves 8.57% and 0.11% average bit-rate reduction in terms of End-to-End Weighted-to-Spherically-uniform Peak Signal-to-Noise Ratio (E2E WS-PSNR) compared with Equi-Rectangular Projection (ERP) and HEC projection, respectively.
Citations: 2
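
The content-adaptive idea (choose mapping-function parameters by minimizing conversion distortion) can be illustrated with a toy parameter search; the mapping family and the distortion proxy below are assumptions for illustration, not the paper's exact functions:

```python
# Toy content-adaptive mapping-parameter selection for a cubemap face.
# g_a blends linear and equi-angular sampling; the distortion proxy is
# a weighted variance of sampling spacing. Both are assumptions.
import numpy as np

def g(x: np.ndarray, a: float) -> np.ndarray:
    """Parametric face mapping: a=0 -> plain cubemap; a=1 -> EAC-style."""
    return (1 - a) * x + a * (4 / np.pi) * np.arctan(x)

def conversion_distortion(a: float, density: np.ndarray,
                          x: np.ndarray) -> float:
    """Proxy: content-weighted variance of sampling spacing after mapping."""
    spacing = np.diff(g(x, a))
    return float(np.sum(density[:-1] * (spacing - spacing.mean()) ** 2))

x = np.linspace(-1, 1, 257)
density = np.ones(257)
density[100:160] = 4.0                       # assumed high-activity region
best_a = min(np.linspace(0, 1, 101),
             key=lambda a: conversion_distortion(a, density, x))
print(f"selected mapping parameter a = {best_a:.2f}")
```
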
HDR Deghosting Using Motion-Registration-Free Fusion in the Luminance Gradient Domain
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) | Pub Date: 2020-12-01 | DOI: 10.1109/VCIP49819.2020.9301844
Cheng-Yeh Liou, Cheng-Yen Chuang, Chia-Han Huang, Yi-Chang Lu
Abstract: Most existing high dynamic range (HDR) deghosting flows require a time-consuming motion registration step to generate ghost-free HDR results. Since the motion registration step usually becomes the bottleneck of the entire flow, in this paper we propose a novel HDR deghosting flow that does not require any motion registration process. By taking channel properties into account, the luminance and chrominance channels are fused differently in the proposed flow. Our motion-registration-free fusion can generate high-quality HDR results swiftly even if the original low dynamic range (LDR) images contain objects with large foreground motions.
Citations: 0
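
A minimal sketch of registration-free fusion in the luminance gradient domain, assuming a Gaussian well-exposedness weight per exposure and leaving out the final Poisson integration step; none of these choices are claimed to be the paper's:

```python
# Sketch: at each pixel, keep the luminance gradient from the exposure
# with the largest well-exposedness-weighted magnitude. No per-pixel
# motion registration is needed. The weighting scheme is an assumption.
import numpy as np

def luminance_gradients(lum: np.ndarray):
    gx = np.diff(lum, axis=1, append=lum[:, -1:])   # forward differences
    gy = np.diff(lum, axis=0, append=lum[-1:, :])
    return gx, gy

def fuse_gradients(lums):
    """lums: list of HxW luminance images ([0,1]) from different exposures."""
    grads = [luminance_gradients(l) for l in lums]
    weights = [np.exp(-((l - 0.5) ** 2) / 0.08) for l in lums]  # mid-tones
    mags = [w * np.hypot(gx, gy) for w, (gx, gy) in zip(weights, grads)]
    pick = np.argmax(np.stack(mags), axis=0)        # per-pixel winner
    gx = np.choose(pick, [g[0] for g in grads])
    gy = np.choose(pick, [g[1] for g in grads])
    return gx, gy   # a Poisson solve would integrate these into HDR luminance

lums = [np.random.rand(4, 4) for _ in range(3)]
gx, gy = fuse_gradients(lums)
print(gx.shape, gy.shape)
```
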
Text-to-Image Generation via Semi-Supervised Training
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) | Pub Date: 2020-12-01 | DOI: 10.1109/VCIP49819.2020.9301888
Zhongyi Ji, Wenmin Wang, Baoyang Chen, Xiao Han
Abstract: Synthesizing images from text is an important problem with various applications. Most existing studies of text-to-image generation utilize supervised methods and rely on a fully-labeled dataset, but detailed and accurate descriptions of images are onerous to obtain. In this paper, we introduce a simple but effective semi-supervised approach that treats the features of unlabeled images as a "Pseudo Text Feature", so that unlabeled data can participate in the subsequent training process. To achieve this, we design a Modality-invariant Semantic-consistent Module which aims to make the image features and the text features indistinguishable while maintaining their semantic information. Extensive qualitative and quantitative experiments on the MNIST and Oxford-102 flower datasets demonstrate the effectiveness of our semi-supervised method in comparison to supervised ones. We also show that the proposed method can be easily plugged into other visual generation models, such as image translation, and performs well.
Citations: 4
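
A minimal PyTorch sketch of the adversarial part of such a modality-invariant module, under assumed feature dimensions; the encoders, discriminator, and losses here are illustrative, not the paper's architecture:

```python
# Sketch of the "pseudo text feature" idea: a modality discriminator is
# trained to separate image features from text features, and the encoders
# are trained to fool it, so unlabeled image features become usable
# stand-ins for text features. Dimensions are assumptions.
import torch
import torch.nn as nn

dim = 128
img_enc = nn.Sequential(nn.Linear(512, dim), nn.ReLU(), nn.Linear(dim, dim))
txt_enc = nn.Sequential(nn.Linear(300, dim), nn.ReLU(), nn.Linear(dim, dim))
modality_disc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

bce = nn.BCEWithLogitsLoss()
img_feat = img_enc(torch.randn(8, 512))      # labeled or unlabeled images
txt_feat = txt_enc(torch.randn(8, 300))      # available text descriptions

# Discriminator step: tell the two modalities apart.
d_loss = bce(modality_disc(img_feat.detach()), torch.ones(8, 1)) + \
         bce(modality_disc(txt_feat.detach()), torch.zeros(8, 1))
# Encoder step: make image features indistinguishable from text features,
# so an unlabeled image feature can act as a "pseudo text feature".
g_loss = bce(modality_disc(img_feat), torch.zeros(8, 1))
print(d_loss.item(), g_loss.item())
```

A semantic-consistency loss (e.g., keeping paired image and text features close) would be added alongside the adversarial terms; it is omitted here for brevity.
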
An Empirical Study of Emotion Recognition from Thermal Video Based on Deep Neural Networks
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) | Pub Date: 2020-12-01 | DOI: 10.1109/VCIP49819.2020.9301883
Herman Prawiro, Tse-Yu Pan, Min-Chun Hu
Abstract: Emotion recognition is a crucial problem in affective computing. Most previous works utilize facial expressions from visible-spectrum data to solve the emotion recognition task. Thermal videos provide temperature measurements of the human body over time, which can be used to recognize affective states by learning their temporal patterns. In this paper, we conduct comparative experiments to study the effectiveness of existing deep neural networks when applied to emotion recognition from thermal video. We analyze the effect of various approaches to frame sampling in video, temporal aggregation between frames, and different convolutional neural network architectures. To the best of our knowledge, we are the first to conduct a study on emotion recognition from thermal video based on deep neural networks. Our work can serve as a preliminary study for designing new methods for emotion recognition in the thermal domain.
Citations: 0
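
Two of the design axes the study compares, frame sampling and temporal aggregation, can be sketched with uniform sampling and mean pooling; the tiny backbone and the four emotion classes below are stand-ins, not models or labels from the paper:

```python
# Sketch of one sampling/aggregation combination for thermal clips:
# uniformly sample n frames, extract per-frame CNN features, mean-pool
# over time, then classify. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

def uniform_sample(video: torch.Tensor, n: int) -> torch.Tensor:
    """video: (T, C, H, W) -> n evenly spaced frames."""
    idx = torch.linspace(0, video.shape[0] - 1, n).long()
    return video[idx]

backbone = nn.Sequential(                     # stand-in 2D CNN per frame
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
classifier = nn.Linear(16, 4)                 # 4 hypothetical emotion classes

video = torch.randn(120, 1, 64, 64)           # thermal clip: single channel
frames = uniform_sample(video, n=8)
feats = backbone(frames)                      # (8, 16) per-frame features
clip_feat = feats.mean(dim=0)                 # temporal aggregation: mean
print(classifier(clip_feat).softmax(-1))
```
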
3D-CNN Autoencoder for Plenoptic Image Compression
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) | Pub Date: 2020-12-01 | DOI: 10.1109/VCIP49819.2020.9301793
Tingting Zhong, Xin Jin, Kedeng Tong
Abstract: Recently, the plenoptic image has attracted great attention because of its applications in various scenarios. However, its high resolution and special pixel distribution structure bring huge challenges to storage and transmission. In order to adapt compression to the structural characteristics of the plenoptic image, in this paper we propose a Data-Structure-Adaptive 3D-convolutional (DSA-3D) autoencoder. The DSA-3D autoencoder enables up-sampling and down-sampling of the sub-aperture sequence along the angular or spatial resolution, thereby avoiding the artifacts caused by directly compressing the plenoptic image and achieving better compression efficiency. In addition, we propose a special and efficient Square rearrangement to generate the sub-aperture sequence. We compare the Square and Zigzag sub-aperture sequence rearrangements, and analyze the compression efficiency of block-wise versus whole-image compression. Compared with the traditional hybrid encoders HEVC, JPEG 2000 and JPEG Pleno (WaSP), the proposed DSA-3D (Square) autoencoder achieves superior performance in terms of PSNR.
Citations: 4
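
A toy 3D-convolutional autoencoder over a sub-aperture sequence, showing down- and up-sampling along the angular axis versus the spatial axes; layer sizes, stride pattern, and the Square rearrangement itself are not reproduced, only the angular/spatial separation is illustrated:

```python
# Toy 3D autoencoder over a sub-aperture view sequence. Stride (2,1,1)
# resamples only the angular (sequence) axis; stride (1,2,2) only the
# spatial axes. All layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class ToySA3DAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(3, 16, 3, stride=(2, 1, 1), padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=(1, 2, 2), padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 3, stride=(1, 2, 2),
                               padding=1, output_padding=(0, 1, 1)), nn.ReLU(),
            nn.ConvTranspose3d(16, 3, 3, stride=(2, 1, 1),
                               padding=1, output_padding=(1, 0, 0)))

    def forward(self, x):        # x: (B, C, angular, H, W)
        return self.dec(self.enc(x))

seq = torch.randn(1, 3, 16, 64, 64)      # 16 sub-aperture views in sequence
print(ToySA3DAutoencoder()(seq).shape)   # torch.Size([1, 3, 16, 64, 64])
```
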
A Theory of Occlusion for Improving Rendering Quality of Views
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) | Pub Date: 2020-12-01 | DOI: 10.1109/VCIP49819.2020.9301887
Yijun Zeng, Weiyan Chen, Mengqin Bai, Yangdong Zeng, Changjian Zhu
Abstract: Occlusion lack compensation (OLC) is a multiplexing-gain-optimized data acquisition and novel-view rendering strategy for light field rendering (LFR). While the achievable OLC is much higher than previously thought possible, the improvement comes at the cost of requiring more scene information: more detailed scene information, including geometric, texture and depth information, can be captured through learning and training methods. In this paper, we develop an occlusion compensation (OCC) model based on the restricted Boltzmann machine (RBM) to compensate for scene information missing due to occlusion. We show that occlusion causes a loss of captured scene information, which leads to a decline in view rendering quality. The OCC model can estimate and compensate for the missing information at occlusion edges by learning. We present experimental results to demonstrate the performance of the OCC model with analog training, verify our theoretical analysis, and extend our conclusions on the optimal rendering quality of light fields.
Citations: 0
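
Since the OCC model builds on the RBM, a minimal Bernoulli RBM trained with one-step contrastive divergence (CD-1) is sketched below; using it to fill in occluded units by clamping the known ones is a simplified illustration of the compensation idea, not the paper's pipeline:

```python
# Minimal Bernoulli RBM with CD-1, then "compensating" missing visible
# units by clamping the known ones and reading the model's reconstruction.
# Sizes, learning rate, and the toy data are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 16, 8, 0.1
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
sigmoid = lambda x: 1 / (1 + np.exp(-x))

def cd1_update(v0):
    """One CD-1 step on a batch of binary visible vectors (B, n_vis)."""
    global W, b_v, b_h
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden units
    v1 = sigmoid(h0 @ W.T + b_v)                      # reconstruction
    ph1 = sigmoid(v1 @ W + b_h)
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    b_v += lr * (v0 - v1).mean(0)
    b_h += lr * (ph0 - ph1).mean(0)

data = (rng.random((32, n_vis)) < 0.5).astype(float)  # toy binary patches
for _ in range(100):
    cd1_update(data)

# Compensate an "occluded" patch: known pixels kept, unknown ones inferred.
v = data[0].copy()
v[10:] = 0.5                                  # last 6 units are occluded
h = sigmoid(v @ W + b_h)
v_filled = sigmoid(h @ W.T + b_v)             # RBM's estimate for all units
print(np.round(v_filled[10:], 2))
```
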
CSCNet: A Shallow Single Column Network for Crowd Counting
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) | Pub Date: 2020-12-01 | DOI: 10.1109/VCIP49819.2020.9301855
Zhida Zhou, Li Su, Guorong Li, Yifan Yang, Qingming Huang
Abstract: Crowd counting in complex scenes is an important but challenging task. The scale variation of crowds makes it hard for shallow networks to extract effective features. In this paper, we propose a shallow single-column network named CSCNet for crowd counting. Its key component is the complementary scale context block (CSCB), designed to capture complementary scale context and obtain high accuracy with limited network depth. As far as we know, CSCNet is the shallowest single-column network among existing works. We demonstrate our method on three challenging benchmarks. Compared to state-of-the-art methods, CSCNet achieves comparable accuracy with much less complexity, providing an alternative that reaches comparable or even better performance with about a 30% reduction in depth and a 50% reduction in width. Besides, CSCNet performs more stably on both sparse and congested crowd scenes.
Citations: 1
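
A sketch of a complementary-scale-context-style block: parallel dilated-convolution branches concatenated so that a shallow network still covers several receptive-field sizes. The branch count and dilation rates are assumptions, not CSCB's exact design:

```python
# Multi-dilation block: each branch sees a different receptive field,
# and the concatenation combines the complementary scale contexts.
import torch
import torch.nn as nn

class ScaleContextBlock(nn.Module):
    def __init__(self, in_ch: int, branch_ch: int, dilations=(1, 2, 4)):
        super().__init__()
        # padding=dilation keeps spatial size for a 3x3 kernel
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True))
            for d in dilations)

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

block = ScaleContextBlock(32, 16)
print(block(torch.randn(1, 32, 96, 96)).shape)  # (1, 48, 96, 96)
```
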
Geometric-visual descriptor for improved image based localization
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) | Pub Date: 2020-12-01 | DOI: 10.1109/VCIP49819.2020.9301831
Achref Ouni, E. Royer, Marc Chevaldonné, M. Dhome
Abstract: This paper addresses the problem of image-based localization. The goal is to quickly and accurately find the relative pose from a query taken with a stereo camera and a map obtained using visual SLAM, which contains poses and 3D points associated with descriptors. In this paper we introduce a new method that leverages stereo vision by adding geometric information to visual descriptors. This method can be used when the vertical direction of the camera is known (for example, on a wheeled robot). The new geometric-visual descriptor can be used with several image-based localization algorithms based on visual words. We test the approach on different datasets (indoor, outdoor) and show experimentally that the new geometric-visual descriptor improves standard image-based localization approaches.
Citations: 0
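
When the vertical direction is known, one gravity-aligned cue that could be appended to a visual descriptor is the elevation angle of the associated 3D point, which is invariant to rotation about the vertical axis; the sketch below makes that concrete, with the cue choice and weighting being assumptions rather than the paper's exact formulation:

```python
# Augment a visual descriptor with a gravity-aligned geometric cue.
# The elevation angle comes from the stereo 3D point and the known
# vertical direction; the weight w balancing the two parts is assumed.
import numpy as np

def geometric_visual_descriptor(visual: np.ndarray, point_cam: np.ndarray,
                                up: np.ndarray, w: float = 0.3) -> np.ndarray:
    """visual: (D,) descriptor; point_cam: 3D point in the camera frame;
    up: unit vertical direction expressed in the camera frame."""
    ray = point_cam / np.linalg.norm(point_cam)
    elevation = np.arcsin(np.clip(ray @ up, -1.0, 1.0))  # yaw-invariant cue
    v = visual / np.linalg.norm(visual)
    return np.concatenate([v, [w * elevation]])

desc = geometric_visual_descriptor(np.random.rand(128),
                                   np.array([0.2, -0.5, 3.0]),
                                   np.array([0.0, -1.0, 0.0]))
print(desc.shape)   # (129,) -- usable in a visual-word pipeline
```
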