{"title":"A Multi-View Pedestrian Tracking Framework Based on Graph Matching","authors":"Fanyi Duanmu, Xin Feng, Xiaoqing Zhu, Wai-tian Tan, Yao Wang","doi":"10.1109/MIPR.2018.00072","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00072","abstract":"In applications of video monitoring over large public or private spaces, multiple cameras are required to cover the entire space and resolve problems such as occlusion and object intersection. In this work, a novel multi-view pedestrian tracking framework is proposed to simultaneously detect and associate human objects across views using graph matching techniques, fully exploiting the object features and the spatial/temporal relationships among the objects. Experimental results are provided to demonstrate the accuracy of the proposed framework.","journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)"},"publicationDate":"2018-04-01"}
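The abstract above describes associating detections across views via graph matching. The paper's exact graph formulation is not given in the abstract; the following is a minimal sketch of the core idea, treating cross-view association as a one-to-one assignment that minimizes a joint ground-plane-distance plus appearance-distance cost (the brute-force search and the detection layout are illustrative assumptions, not the paper's method):

```python
from itertools import permutations
from math import dist

def match_views(dets_a, dets_b, w_pos=1.0, w_app=1.0):
    """Associate detections across two views by minimizing a joint
    position + appearance cost over all one-to-one assignments.
    Each detection is (ground_plane_xy, appearance_vector).
    Assumes both views see the same number of pedestrians."""
    def cost(a, b):
        pos_cost = dist(a[0], b[0])   # ground-plane distance between projections
        app_cost = dist(a[1], b[1])   # distance between appearance features
        return w_pos * pos_cost + w_app * app_cost

    best, best_cost = None, float("inf")
    # Brute force over assignments; fine for small n, a real system
    # would use the Hungarian algorithm or a dedicated graph matcher.
    for perm in permutations(range(len(dets_b))):
        c = sum(cost(dets_a[i], dets_b[j]) for i, j in enumerate(perm))
        if c < best_cost:
            best, best_cost = list(enumerate(perm)), c
    return best, best_cost
```

With two pedestrians seen by two cameras, the matcher pairs each detection in view A with the spatially and visually closest detection in view B.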
{"title":"Buffer Management for Synchronous and Low-Latency Playback of Multi-Stream User-Generated Content","authors":"Emmanouil Potetsianakis, J. L. Feuvre","doi":"10.1109/MIPR.2018.00022","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00022","abstract":"User-Generated Content (UGC) platforms, which distribute multimedia streams recorded by users, have become increasingly popular. However, several challenges arise when the content is consumed synchronously, and because of the lack of control at the production end, they must be resolved on the client side. Especially in live scenarios, where the content is consumed as it is produced, low-latency distribution is not straightforward. Moreover, the diversity of available sources and possible produced streams (both in number and in modality) amplifies the distribution issues. In this work, we address the aforementioned challenges of UGC platforms. By prioritizing seamless playback (i.e., eliminating buffering events) and building on existing solutions, we provide a client-side buffer management scheme, based on signaled delay information, for synchronous and low-latency playback of multi-stream multimedia content. Our model supports large delays and out-of-order frame arrival, which may occur when dealing with diverse sources, modalities, and qualities.","journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)"},"publicationDate":"2018-04-01"}
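The buffer scheme above is only outlined in the abstract; one plausible minimal sketch of the idea is a client buffer that derives a common playout offset from the per-stream signaled delays (worst case wins, so streams stay synchronous) and reorders out-of-order frames by timestamp. All names and the scheduling policy here are assumptions for illustration, not the paper's actual design:

```python
import heapq

class SyncBuffer:
    """Minimal client-side buffer sketch: a common playout delay is taken
    from the per-stream signaled delays, and frames are released in
    timestamp order once their presentation time is reached."""
    def __init__(self, signaled_delays):
        # Playing at the worst-case signaled delay keeps all streams
        # synchronous without rebuffering on the slowest one.
        self.playout_delay = max(signaled_delays.values())
        self.heap = []  # min-heap of (timestamp, stream_id, frame)

    def push(self, stream_id, timestamp, frame):
        # Frames may arrive late or out of order; the heap reorders them.
        heapq.heappush(self.heap, (timestamp, stream_id, frame))

    def pop_ready(self, now):
        """Release every buffered frame whose presentation deadline
        (capture timestamp + common playout delay) has passed."""
        ready = []
        while self.heap and self.heap[0][0] + self.playout_delay <= now:
            ready.append(heapq.heappop(self.heap))
        return ready
```

Under this policy a frame captured at t is presented at t + max(signaled delays), which trades a bounded startup latency for synchronous, gap-free playback.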
{"title":"A New TV World for Kids - When ZUI Meets Deep Learning","authors":"Haohong Wang, Yaoyuan Fu, Yang Li, G. Ning, Zhihai He, Mengwen Liu","doi":"10.1109/MIPR.2018.00029","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00029","abstract":"In this work, we propose a novel application of the Zoomable User Interface (ZUI) to the TV world. Using the latest advances in deep learning, visual tags of video titles can be extracted semi-automatically. By creating a cascaded visual tag tree structure, the ZUI representation problem is converted into an optimization problem of choosing tags to minimize user interactions. To the best of our knowledge, this is the first effort to adopt a ZUI on a TV display with an optimized solution for the best user experience. Experimental results indicate that this invention is favored by kids due to its visually rich, intuitive, and user-friendly nature.","journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)"},"publicationDate":"2018-04-01"}
{"title":"COME for No-Reference Video Quality Assessment","authors":"Chunfeng Wang, Li Su, W. Zhang","doi":"10.1109/MIPR.2018.00056","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00056","abstract":"The issue of objective Video Quality Assessment (VQA) has been extensively studied. In this paper, we present an effective general-purpose VQA method named COnvolutional neural network and Multi-regression based Evaluation (COME). It requires no lossless reference video and is not tailored to specific types of distortion. A modified 2D convolutional neural network is introduced to learn spatial features at the frame level. At the same time, motion information is extracted as temporal features at the sequence level. A multi-regression model is then proposed to comprehensively assess the final video quality in line with human psychological perception. The proposed method is tested on two commonly used databases covering numerous kinds of distortion. The experimental results show that the proposed COME method is comparable with the most popular full-reference VQA methods.","journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)"},"publicationDate":"2018-04-01"}
{"title":"Determining the Necessary Frame Rate of Video Data for Object Tracking under Accuracy Constraints","authors":"A. Mohan, Ahmed S. Kaseb, Kent W. Gauen, Yung-Hsiang Lu, A. Reibman, T. Hacker","doi":"10.1109/MIPR.2018.00081","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00081","abstract":"Network cameras, a type of surveillance camera, generate real-time, versatile, and high-quality video content that can be used for applications such as public safety and surveillance. Analyzing high-frame-rate video streams imposes heavy computing needs and significant load on the network. High frame rates may not be essential for meeting the accuracy requirements of the analyses; for example, high frame rates may not be required to track cars inside a garage compared with cars on a highway. In this paper, we study object tracking and propose a method to automatically determine the necessary frame rate for object tracking in network-camera videos and to adapt to run-time conditions. We demonstrate that frame rates can be reduced by up to 80% under accuracy constraints.","journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)"},"publicationDate":"2018-04-01"}
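The frame-rate determination above can be cast as a simple search: evaluate tracking accuracy at successively lower candidate rates and keep the lowest rate that still meets the constraint. The sketch below assumes an `accuracy_at(frame_rate)` callable (e.g., tracking overlap against the full-rate baseline); this interface and the candidate-rate scheme are illustrative assumptions, not the paper's algorithm:

```python
def min_frame_rate(full_rate, accuracy_at, threshold):
    """Return the lowest candidate frame rate whose tracking accuracy
    still meets `threshold`. Candidates are integer subsamplings of the
    full rate (every frame, every 2nd frame, ...), so the decoder can
    realize them by simply dropping frames."""
    candidates = sorted({full_rate // d
                         for d in range(1, full_rate + 1)
                         if full_rate % d == 0})
    for rate in candidates:                 # try the lowest rates first
        if accuracy_at(rate) >= threshold:
            return rate                     # first acceptable = minimum
    return full_rate                        # fall back to full rate
```

For a 30 fps stream where accuracy saturates above ~10 fps, this returns 6 fps at a 0.6 relative-accuracy threshold, i.e., an 80% frame-rate reduction, in line with the figure quoted in the abstract.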
{"title":"Fast Compressed Domain Copy Detection with Motion Vector Imaging","authors":"Yuanyuan Yang, Yixiong Zou, Yemin Shi, Qingsheng Yuan, Yaowei Wang, Yonghong Tian","doi":"10.1109/MIPR.2018.00086","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00086","abstract":"With an increasing number of videos uploaded to the Internet, fast detection of copied videos in the compressed domain has received growing attention. Many researchers have used motion-vector information as the feature; however, in these methods motion vectors are aggregated into histograms, which discard detailed structural information. To address this problem, we propose Motion Vector Imaging (MVI). We first extract motion vectors from a compressed video and then project them onto a canvas to generate an MVI that preserves detailed motion information. Based on these MVIs, a Siamese deep neural network is trained on pairs from the dataset, and one branch of the network is used to extract features. Finally, a cascade system combining the MVI model and I-frames performs fast copy detection. Results on the public CC_WEB_VIDEO dataset show that MVI achieves high recall and precision at high speed.","journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)"},"publicationDate":"2018-04-01"}
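The key contrast in the abstract above is rasterizing motion vectors onto a canvas (preserving where motion happens) instead of pooling them into a histogram (which discards location). A minimal sketch of that projection step, with an assumed `(block_x, block_y, dx, dy)` motion-vector layout that is not taken from the paper:

```python
def motion_vector_image(mvs, width, height):
    """Rasterize block motion vectors into a 2-channel canvas, one
    (dx, dy) pair per block position, so spatial structure survives.
    `mvs` is a list of (block_x, block_y, dx, dy) tuples in block units."""
    # Blocks with no coded motion vector default to zero motion.
    canvas = [[(0, 0)] * width for _ in range(height)]
    for bx, by, dx, dy in mvs:
        if 0 <= bx < width and 0 <= by < height:   # ignore out-of-frame blocks
            canvas[by][bx] = (dx, dy)
    return canvas
```

A histogram of the same vectors would record that a (3, -2) motion occurred somewhere; the canvas additionally records that it occurred at block (1, 0), which is the structural detail a CNN can exploit.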
{"title":"Exploring Facial Differences in European Countries Boundary by Fine-Tuned Neural Networks","authors":"Viet-Duy Nguyen, Minh Tran, Jiebo Luo","doi":"10.1109/MIPR.2018.00062","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00062","abstract":"Travel agents and retailers are always curious about where their customers come from, as this would help them increase their sales and optimize their marketing models. In this study, we build a system to predict which European country people come from by analyzing their faces. The countries chosen for the study are Russia, Italy, Germany, Spain, and France. In the first stage of the study, we apply different neural network classifiers to a dataset of people's faces collected from Twitter. The highest accuracy achieved is 53.2%, while human accuracy is only 26.96%. In the second stage, we analyze 11 facial features that might differentiate people in those five countries. The study lays the groundwork for future work on the differences and similarities of people's faces around the world regardless of their current geographic location.","journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)"},"publicationDate":"2018-04-01"}
{"title":"A Taxonomy of Audiovisual Fake Multimedia Content Creation Technology","authors":"Ali Khodabakhsh, C. Busch, Ramachandra Raghavendra","doi":"10.1109/MIPR.2018.00082","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00082","abstract":"The spread of fake and misleading multimedia content on social media has become commonplace and is affecting society and its decision procedures negatively in many ways. One special case of exploiting fake content is where the deceiver uses the credibility of a trustworthy source as the means of spreading disinformation. Thanks to advances in technology, the creation of such content in audiovisual form is becoming possible with limited technical knowledge and at low cost. The potential harm of circulating such content in the media calls for the development of automated detection methods. This paper offers a categorization of fake content creation technology in an attempt to facilitate further study of generalized countermeasures for its detection.","journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)"},"publicationDate":"2018-04-01"}
{"title":"Affective Visual Question Answering Network","authors":"Nelson Ruwa, Qi-rong Mao, Liangjun Wang, Ming Dong","doi":"10.1109/MIPR.2018.00038","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00038","abstract":"Visual Question Answering (VQA) has recently attracted considerable attention from researchers in the field of deep learning. The need to improve VQA models by focusing on local regions of images has resulted in the development of various attention models. This paper proposes the Affective Visual Question Answering Network (AVQAN), an attention model that combines local image features, the question, and the mood detected in the specific region of the image to produce an affective answer, using a preprocessed image dataset. The experimental results show that AVQAN enriches the analysis and understanding of images by adding affective information to the answer, while maintaining accuracy within the range of recent ordinary VQA baseline models. The proposed model contributes to the development of emotion-aware machines, which are becoming increasingly vital in everyday life.","journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)"},"publicationDate":"2018-04-01"}
{"title":"Analyzing Real-Time Multimedia Content from Network Cameras Using CPUs and GPUs in the Cloud","authors":"Ahmed S. Kaseb, Bo Fu, A. Mohan, Yung-Hsiang Lu, A. Reibman, G. Thiruvathukal","doi":"10.1109/MIPR.2018.00020","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00020","abstract":"Millions of network cameras are streaming real-time multimedia content (images or videos) of various environments (e.g., highways and malls) that can be used for a variety of applications. Analyzing the content from many network cameras requires significant computing resources. Cloud vendors offer resources in the form of cloud instances with different capabilities and hourly costs; some instances include GPUs that can accelerate analysis programs, but using them incurs additional monetary cost because instances with GPUs are more expensive. Reducing the overall monetary cost of using the cloud to analyze real-time multimedia content from network cameras, while meeting the desired analysis frame rates, is a challenging problem. This paper describes a cloud resource manager that solves this problem by estimating the resource requirements of executing analysis programs on CPU or GPU, formulating the resource allocation problem as a multiple-choice vector bin packing problem, and solving it with an existing algorithm. The experiments show that the manager can reduce the cost by up to 61% compared with other allocation strategies.","journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)"},"publicationDate":"2018-02-21"}
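In the multiple-choice vector bin packing formulation named above, each analysis program offers several execution options (e.g., a CPU-only demand vector or a GPU demand vector), each instance type is a bin with a capacity vector and an hourly cost, and the objective is to minimize total cost. The paper solves this with an existing algorithm; the sketch below is only a greedy first-fit illustration of the problem structure, with hypothetical instance names and resource dimensions:

```python
def allocate(programs, instance_types):
    """Greedy first-fit sketch of multiple-choice vector bin packing.
    `programs`: list of option lists, each option = (demand_vector, type_name).
    `instance_types`: {type_name: (capacity_vector, hourly_cost)}.
    Returns the total hourly cost; assumes every program fits some type."""
    bins, total = [], 0.0   # open bins: {"type": name, "free": remaining capacity}
    for options in programs:
        placed = False
        for b in bins:      # first, try to fit any option into an open instance
            for demand, tname in options:
                if tname == b["type"] and all(d <= f for d, f in zip(demand, b["free"])):
                    b["free"] = [f - d for f, d in zip(b["free"], demand)]
                    placed = True
                    break
            if placed:
                break
        if not placed:      # otherwise open the cheapest instance that fits an option
            best = None
            for demand, tname in options:
                cap, cost = instance_types[tname]
                if all(d <= c for d, c in zip(demand, cap)) and (best is None or cost < best[2]):
                    best = (demand, tname, cost, cap)
            demand, tname, cost, cap = best
            bins.append({"type": tname, "free": [c - d for c, d in zip(cap, demand)]})
            total += cost
    return total
```

With a hypothetical $0.10/h CPU instance (4 cores) and a $0.90/h GPU instance (4 cores, 1 GPU), two programs that each need 2 cores on CPU share one CPU instance, so the greedy cost is $0.10/h rather than $0.90/h; the paper's solver makes this CPU-vs-GPU trade across many programs at once.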