{"title":"Challenges in Autonomous UAV Cinematography: An Overview","authors":"Ioannis Mademlis, V. Mygdalis, N. Nikolaidis, I. Pitas","doi":"10.1109/ICME.2018.8486586","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486586","url":null,"abstract":"Autonomous UAV cinematography is an active research field with exciting potential for the media industry. It bears the promise of greatly facilitating UAV shooting for various applications, while significantly reducing the costs compared to manual shooting. However, the general problem has not been clearly defined and the challenges arising from current legislation and technology restrictions have not been fully charted. A complete overview of issues related to autonomous UAV cinematography is needed, pertaining to the current situation in the field, so as to guide immediate-future research. The purpose of this paper is to lay exactly this groundwork, with the expectation of providing a global perspective to multiple domain-specific research communities. The outlined issues are partitioned into challenges deriving from ethical/legal/safety considerations and from operational/production requirements. A brief survey of current technological solutions, including their limitations, is also provided for each issue.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121500609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Multi-Metric Learning for Person Re-Identification","authors":"Yongxin Ge, Xinqian Gu, Min Chen, Hongxing Wang, Dan Yang","doi":"10.1109/ICME.2018.8486502","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486502","url":null,"abstract":"In this paper, to exploit more discriminative information from the global-body and body-parts features, we present a novel deep multi-metric learning (DMML) network for person re-identification under the triplet framework. The main novelty of our learning framework lies in two aspects: 1) Unlike most existing metric learning-based approaches, which learn only one distance metric for comparison, our DMML method aims to learn different metrics for the global-body and body-parts features respectively by using a convolutional neural network (CNN); 2) A new multi-metric loss function is proposed to train the DMML network, under which the distance of each negative pair is greater than that of each positive pair by a predefined margin, and the correlations of different metrics are maximized. Compared with previous person re-identification methods that have shown state-of-the-art performance, our DMML approach achieves competitive results on the challenging CUHK03, CUHK01, VIPeR and iLIDS datasets.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122545878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feed-Net: Fully End-to-End Dehazing","authors":"S. Zhang, Wenqi Ren, Jian Yao","doi":"10.1109/ICME.2018.8486435","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486435","url":null,"abstract":"This paper proposes an image dehazing model built with a fully convolutional neural network (CNN), called the Fully End-to-End Dehazing Network (FEED-Net). In contrast to most previous deep learning methods, which estimate the transmission map and the atmospheric light separately, FEED-Net recovers the haze-free image directly from a hazy image via a light-weight CNN. In addition, we introduce contextual information into dehazing via dilated convolution and use dense skip connections for feature fusion, which makes end-to-end dehazing possible. Experimental results show our method outperforms state-of-the-art algorithms on both a synthetic dataset and real-world images.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128314799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two Pass Rate Control for Consistent Quality Based on Down-Sampling Video in HEVC","authors":"Yu-Yao Shen, Chih-Hung Kuo","doi":"10.1109/ICME.2018.8486544","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486544","url":null,"abstract":"Rate control plays an important role in video coding and streaming applications with bandwidth constraints. While most research focuses on improving coding efficiency, the fluctuation of video quality is seldom considered. Many rate control schemes suffer from unreliable initialization of coding parameters, which leads to seriously inconsistent quality at the beginning of a video. Besides, the hierarchical structure for frame references introduces more quality fluctuations, although it improves coding efficiency significantly. This paper presents a two-pass rate control method that aims for consistent visual quality. The video is down-sampled by a factor of four and then encoded in the first pass. A fixed Lagrange multiplier (λ) is derived from the information recorded in the first pass and then applied to all frames in the second coding pass. A QP adjustment policy is adopted to maintain a consistent quality and a constant bitrate. Experimental results show that the proposed rate control method reduces the fluctuation of video quality by an average of 94.63% compared to encoding with the HEVC Test Model (HM16.9).","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"60 30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125782013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Exposure Assessment: A Benchmark and a Deep Convolutional Neural Networks Based Model","authors":"Lijun Zhang, Lin Zhang, Xiao Liu, Ying Shen, Dongqing Wang","doi":"10.1109/ICME.2018.8486569","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486569","url":null,"abstract":"In the camera equipment manufacturing industry, exposure calibration is one of the basic steps for manufacturers to consider before launching their products to the market. To this end, a method that can objectively and automatically assess the exposure levels of images taken by a camera is highly desired. However, few studies have been conducted in this area. In this paper, we attempt to solve this issue to some extent and our contributions are twofold. Firstly, in order to facilitate the study of image exposure assessment, an Image Exposure Database (IEpsD) is established. This database contains 15,582 images with various exposure levels, and each image has an associated subjective exposure score reflecting its perceptual exposure level. Secondly, we propose a novel, highly accurate DCNN-based model, namely IEpsM (Image Exposure Metric), to predict the exposure level of a given image.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125962371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Tiny Vehicle Detection in Complex Scenes","authors":"W. Liu, Shengcai Liao, W. Hu, Xuezhi Liang, Yan Zhang","doi":"10.1109/ICME.2018.8486507","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486507","url":null,"abstract":"Vehicle detection is still a challenge in complex traffic scenes, especially for vehicles of tiny scales. Though RCNN-based two-stage detectors have demonstrated considerably good performance, less attention has been paid to the quality of the first stage, where tiny vehicles are very likely to be missed. In this paper, we propose a deep network for accurate vehicle detection, with the main idea of using a relatively large feature map for proposal generation and keeping the spatial layout of ROI features to represent and detect tiny vehicles. However, large feature maps in lower levels of a deep network generally contain limited discriminant information. To address this, we introduce a backward feature enhancement operation, which absorbs higher-level information step by step to enhance the base feature map. By doing so, even with only 100 proposals, the resulting proposal network achieves an encouraging recall of over 99%. Furthermore, unlike the common practice of flattening features after ROI pooling, we argue that for better detection of tiny vehicles, the spatial layout of the ROI features should be preserved and fully integrated. Accordingly, we use a multi-path light-weight processing chain to effectively integrate ROI features while preserving their spatial layouts. Experiments on the challenging DETRAC vehicle detection benchmark show that the proposed method largely improves a competitive baseline (ResNet50-based Faster RCNN) by 16.5% mAP, and it outperforms all previously published and unpublished results.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126284294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dense Reconstruction from Monocular Slam with Fusion of Sparse Map-Points and Cnn-Inferred Depth","authors":"Xiang Ji, Xinchen Ye, Hongcan Xu, Haojie Li","doi":"10.1109/ICME.2018.8486548","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486548","url":null,"abstract":"Real-time monocular visual SLAM approaches, which rely on building sparse correspondences between two or more views of the scene, are capable of accurately tracking camera pose and inferring the structure of the environment. However, these methods share a common problem: the reconstructed 3D map is extremely sparse. Recently, convolutional neural networks (CNNs) have been widely used for estimating scene depth from monocular color images. As we observe, sparse map-points generated from epipolar geometry are locally accurate, while a CNN-inferred depth map contains high-level global context but produces blurry depth boundaries. Therefore, we propose a depth fusion framework that yields a dense monocular reconstruction by fully exploiting both the sparse depth samples and the CNN-inferred depth. Color key-frames are employed to guide the depth reconstruction process, avoiding smoothing over depth boundaries. Experimental results on benchmark datasets show the robustness and accuracy of our method.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127424383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft Clustering Guided Image Smoothing","authors":"Liangkai Li, Xiaojie Guo, Wei Feng, Jiawan Zhang","doi":"10.1109/ICME.2018.8486448","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486448","url":null,"abstract":"Image smoothing, which aims to remove unwanted textures and preserve desired structures, plays an important role in many multimedia and computer vision tasks. The key to image smoothing, despite different applications, is to distinguish the structures from the textures. This paper presents a novel image smoothing method, following the principle that, for a certain pixel, its neighbors in both space and intensity should contribute more to smoothing, while distant ones should be excluded to avoid over-smoothing. Intuitively, clustering is a good candidate for achieving this goal. However, due to rich textures and clutter within images, simply performing clustering on the input is likely to yield inaccurate assignments and thus unsatisfactory smoothing results. In addition, for our task, using traditional hard clustering techniques carries a high risk of generating staircase artifacts. To address these issues, we design a customized algorithm that, on the one hand, adopts soft clustering to assign pixels more faithfully, and on the other hand, iterates between soft clustering and smoothing so that each improves the other. Experiments on several challenging images demonstrate the efficacy of our method and its superiority over other prevailing approaches.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127495067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Adaptation of Multimedia Presentations for Videoconferencing in Application Mobility","authors":"Francisco Javier Velázquez-García, P. Halvorsen, H. Stensland, F. Eliassen","doi":"10.1109/ICME.2018.8486565","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486565","url":null,"abstract":"Application mobility is the paradigm where users can move their running applications to heterogeneous devices in a seamless manner. This mobility involves dynamic context changes of hardware, network resources, user environment, and user preferences. In order to continue multimedia processing under these context changes, applications need to adapt not only the collection of media streams, i.e., the multimedia presentation, but also their internal configuration to work on different hardware. We present a performance analysis of adapting a videoconferencing prototype application within a proposed adaptation control loop that autonomously adapts multimedia pipelines. Results show that the time spent creating an adaptation plan and executing it is on the order of hundreds of milliseconds. Reconfiguring pipelines, compared to building them from scratch, is approximately 1000 times faster when re-utilizing already instantiated hardware-dependent components. Therefore, we conclude that the adaptation of multimedia pipelines is a feasible approach for multimedia applications that adhere to application mobility.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114756223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Magnify-Net for Multi-Person 2D Pose Estimation","authors":"Haoqian Wang, Wangpeng An, Xingzheng Wang, Lu Fang, Jiahui Yuan","doi":"10.1109/ICME.2018.8486591","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486591","url":null,"abstract":"We propose a novel method for multi-person 2D pose estimation. Our model, which we refer to as Magnify-Net, zooms in on the image gradually to address the trade-off between mean average precision (mAP) and pixel error. Moreover, we compress the network efficiently with a design that increases mAP while saving processing time. It is a simple yet robust bottom-up approach consisting of one stage. The architecture is designed to detect part positions and their associations jointly via two branches of the same sequential prediction process, resulting in notable gains in performance and efficiency. Our method outperforms the previous state-of-the-art results on the challenging COCO key-points task and the MPII Multi-Person Dataset.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116894309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}