{"title":"A Fully Automatic Approach for Fisheye Camera Calibration","authors":"Yen-Chou Tai, Yi-Yu Hsieh, Jen-Hui Chuang","doi":"10.1109/VCIP.2018.8698621","DOIUrl":"https://doi.org/10.1109/VCIP.2018.8698621","url":null,"abstract":"An automatic calibration procedure for a fisheye camera is presented in this paper by employing a flat panel monitor. The procedure does not require precise camera-monitor alignment, and any manual input of data or commands, making it useful for factory automation for mass production of such cameras. The fully automatic calibration procedure, which requires the generation of various test patterns on the display, and analysis of fisheye images of these patterns, consists of the following steps: (i) estimate the image center of the camera, (ii) identify the line on the monitor which intersects optical axis of the camera perpendicularly, and (iii) along the above line, obtain calibration data needed in de-warping the fisheye image. Experimental results demonstrate that the proposed approach performs satisfactorily in terms of effectiveness and accuracy.","PeriodicalId":270457,"journal":{"name":"2018 IEEE Visual Communications and Image Processing (VCIP)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124740443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probability-Based Intra Encoder Optimization in High Efficiency Video Coding","authors":"Hongan Wei, Minghai Wang, Yiwen Xu, Yisang Liu, Tiesong Zhao","doi":"10.1109/VCIP.2018.8698730","DOIUrl":"https://doi.org/10.1109/VCIP.2018.8698730","url":null,"abstract":"The High Efficiency Video Coding (HEVC) adopts increasing number of intra prediction modes and Coding Unit (CU) partitions, in order to diminish its cost at required bitrates. However, its attendant computational complexity is unfavorable in lots of real-world applications. In this paper, we propose a probability-based strategy that benefits a lowcomplexity HEVC intra-encoder and also provides guidance to design low-complexity framework of next generation encoder. The proposed strategy comprises three steps: a candidate mode list initialization based on the estimated winning probabilities of all intra modes, an early termination of intra prediction based on the estimated probability of obtaining the best intra mode, and a pre-decision of Coding Unit (CU) split based on the estimated distributions of the Rate-Distortion (RD) costs. Comprehensive experiments have validated the effectiveness of the proposed algorithm, with a promising simulation performance under Common Test Conditions (CTC).","PeriodicalId":270457,"journal":{"name":"2018 IEEE Visual Communications and Image Processing (VCIP)","volume":"2018 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125018378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Stream Federated Learning: Reduce the Communication Costs","authors":"Xin Yao, C. Huang, Lifeng Sun","doi":"10.1109/VCIP.2018.8698609","DOIUrl":"https://doi.org/10.1109/VCIP.2018.8698609","url":null,"abstract":"Federated learning algorithm solves the problem of training machine learning models over distributed networks that consist of a massive amount of modern smart devices. It overcomes the challenge of privacy preservation, unbalanced and Non-IID data distributions, and does its best to reduce the required communication rounds. However, communication costs are still the principle constraint compared to other factors, such as computation costs. In this paper, we adopt a two-stream model with MMD (Maximum Mean Discrepancy) constraint instead of the single model to be trained on devices in standard federated learning settings. Following experiments show that the proposed model outperforms baseline methods, especially in Non-IID data distributions, and achieves a reduction of more than 20% in required communication rounds.","PeriodicalId":270457,"journal":{"name":"2018 IEEE Visual Communications and Image Processing (VCIP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125421086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Motion Vector Prediction for Omnidirectional Video","authors":"R. G. Youvalari, A. Aminlou","doi":"10.1109/VCIP.2018.8698614","DOIUrl":"https://doi.org/10.1109/VCIP.2018.8698614","url":null,"abstract":"Omnidirectional video is widely used in virtual reality applications in order to create the immersive experience to the user. Such content is projected onto a 2D image plane in order to make it suitable for compression purposes by using current standard codecs. However, the resulted projected video contains deformations mainly due to the oversampling of the projection plane. These deformations are not favorable for the motion models that are used in the recent video compression standards. Hence, omnidirectional video is not efficiently compressible with the current codecs. In this work, an adaptive motion vector prediction method is proposed for efficiently coding the motion information of such content. The proposed method adaptively models the motion vectors of the coding block based on the motion information of the neighboring blocks and calculates a more optimal motion vector predictor for coding the motion information. The experimented results showed that the proposed motion vector prediction method provides up to 2.2% bitrate reduction in the content with high motion and on average 1.1% bitrate reduction for the tested sequences.","PeriodicalId":270457,"journal":{"name":"2018 IEEE Visual Communications and Image Processing (VCIP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115993052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comprehensive Samples Constrain for Person Search","authors":"Liangqi Li, Hua Yang, Lin Chen","doi":"10.1109/VCIP.2018.8698700","DOIUrl":"https://doi.org/10.1109/VCIP.2018.8698700","url":null,"abstract":"In this paper, we propose a method to further improve person search by fully utilizing the combination of pedestrian detection and person re-identification tasks. An improved constrain that utilizes comprehensive samples in the dataset is proposed to fully excavate information for recognition. Besides the label constrain for training the model in traditional classification task, unlabeled identities that do not have specific IDs are utilized as well to constitute a tailored triplet loss for more performance improvement. Meanwhile, a novel large-scale challenging dataset, SJTU318, which uses videos acquired through twelve cameras is proposed to demonstrate the effectiveness of our method. It contains 443 identities and 14,610 frames in which pedestrians are annotated with their bounding box positions and identities. Experiments conducted on a public dataset, CUHK-SYSU and our proposed dataset SJTU318 show that our method outperforms existing state-of-the-art approaches.","PeriodicalId":270457,"journal":{"name":"2018 IEEE Visual Communications and Image Processing (VCIP)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128209892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Statistical-based Rate Adaptation Approach for Short Video Service","authors":"Chao Zhou, Shucheng Zhong, Yufeng Geng, Ting Yu","doi":"10.1109/VCIP.2018.8698706","DOIUrl":"https://doi.org/10.1109/VCIP.2018.8698706","url":null,"abstract":"Dynamic adaptive streaming has been recently widely adopted for providing uninterrupted video streaming services to users with dynamic network conditions and heterogeneous devices in Live and VoD (Video on Demand). However, to the best of our knowledge, no rate adaptation work has been done for the new arisen short video service, where a user generally watches many independent short videos with different contents, quality, bitrate, and length (generally about several seconds). In this work, we are the first to study the rate adaptation problem for this scenario and a Statistical-based Rate Adaptation Approach (SR2A) is proposed. In SR2A, each short video is transcoded into several versions with different bitrate. Then, when a user watches the short videos, the network conditions and player status are collected, and together with the to be requested video’s information, the best video version (bitrate or quality) will be selected and requested. Thus, the user will experience the short videos with the most suitable quality depending on the current network conditions. We have collected the network trace and user behavior data from Kuaishou1, the largest short video community in China. By the collected data set, the users’ watching behavior is analyzed, and a statistical model is designed for bandwidth prediction. Then, combined with the video information derived from the manifest, the maximal video bitrate is selected under the condition that the probability of play interruption is smaller than a predefined threshold during the whole playback process. 
The trace based experiments show that SR2A can greatly improve the user experience in quality and fluency of watching short videos.","PeriodicalId":270457,"journal":{"name":"2018 IEEE Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128502580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FORECAST-CLSTM: A New Convolutional LSTM Network for Cloudage Nowcasting","authors":"Chao Tan, Xin Feng, Jianwu Long, Li Geng","doi":"10.1109/VCIP.2018.8698733","DOIUrl":"https://doi.org/10.1109/VCIP.2018.8698733","url":null,"abstract":"With the highly demand of large-scale and real-time weather service for public, a refinement of short-time cloudage prediction has become an essential part of the weather forecast productions. To provide a weather-service-compliant cloudage nowcasting, in this paper, we propose a novel hierarchical Convolutional Long-Short-Term Memory network based deep learning model, which we term as FORECAST-CLSTM, with a new Forecaster loss function to predict the future satellite cloud images. The model is designed to fuse multi-scale features in the hierarchical network structure to predict the pixel value and the morphological movement of the cloudage simultaneously. We also collect about 40K infrared satellite nephograms and create a large-scale Satellite Cloudage Map Dataset(SCMD). The proposed FORECAST-CLSTM model is shown to achieve better prediction performance compared with the state-of-the-art ConvLSTM model and the proposed Forecaster Loss Function is also demonstrated to retain the uncertainty of the real atmosphere condition better than conventional loss function.","PeriodicalId":270457,"journal":{"name":"2018 IEEE Visual Communications and Image Processing (VCIP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129434452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid one-shot depth measuring for stereo-view structured light systems","authors":"S. Xiang, Huiping Deng, Jin Wu, Lei Zhu, Li Yu","doi":"10.1109/VCIP.2018.8698659","DOIUrl":"https://doi.org/10.1109/VCIP.2018.8698659","url":null,"abstract":"In this paper, we propose a hybrid scheme to measure depth values for a stereo structured light system with only a single shot. We design a dual-frequency monochromatic pattern, based on which depth values are computed in three steps. Firstly, phases and coarse depth maps are computed by following the idea of Fourier transform profilometry, where a novel phase unwrapping method is proposed. Afterward, errors in coarse depth maps are detected according to cross-view geometry consistency. Finally, spatial stereo matching is conducted to refine the detected errors. Experiments demonstrate that the proposed scheme can generate accurate stereo depth maps with only one shot, which can be used in a range of real-time applications.","PeriodicalId":270457,"journal":{"name":"2018 IEEE Visual Communications and Image Processing (VCIP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114554981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stretching Schemes for Coding Frames of Panoramic Videos in Craster Parabolic Projection","authors":"Saiping Zhang, Li Li, Mengpin Qiu, Fuzheng Yang, Shuai Wan","doi":"10.1109/VCIP.2018.8698626","DOIUrl":"https://doi.org/10.1109/VCIP.2018.8698626","url":null,"abstract":"Panoramic videos are spherical in nature, which further brings great challenges to deal with them. Usually they are projected to planar domain and processed as planar perspective videos. Craster parabolic projection (CPP), as a sphere-to-plane projection format, achieves approximately uniform sampling on the sphere. Without redundant pixels, it can store and represent panoramic videos effectively. However, frames in CPP format are no longer rectangular, which further violates the off-the-shelf video coding standards. In this paper, four stretching schemes are proposed for coding frames of panoramic videos in CPP. For introducing as few pixels as possible, strips are regard as the basic units. Strips in frames in different areas are stretched into rectangles in different sizes for coding. Spherical continuity, planar continuity, nearest-neighbour interpolation and Lanczos interpolation are considered in stretching respectively. 
Experimental results demonstrate that, compared with strips in Equi-rectangular projection (ERP) format, the proposed schemes can achieve BD-rate reductions up to 30.68% for Y, 32.68% for U and 34.13% for V, and that different schemes are well adapted for different strips.","PeriodicalId":270457,"journal":{"name":"2018 IEEE Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124325687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Rate Control Method for Logo Insertion Video Coding in HEVC","authors":"Yunchang Li, Qi Jing, Yingfan Zhang, Jun Sun","doi":"10.1109/VCIP.2018.8698657","DOIUrl":"https://doi.org/10.1109/VCIP.2018.8698657","url":null,"abstract":"In order to improve the network transmission stability of logo inserted video, an efficient rate control method for HEVC logo insertion is proposed in this paper. Based on the acceleration algorithm, the proposed mehtod optimizes bit allocation process in region and CTU level. First, each logo inserted video frame is separated into different regions to stop pixel error propagation. Then bits for different regions are allocated according to their coding characteristics. At last, target bit of CTUs are adjusted according to the video content information. Experimental results show that the proposed rate control method can achieve 4.46% BD-Rate saving on average with only 2.26% speed loss compared with the acceleration algorithm.","PeriodicalId":270457,"journal":{"name":"2018 IEEE Visual Communications and Image Processing (VCIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127681449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}