{"title":"Multiple Fisheye Camera Tracking via Real-Time Feature Clustering","authors":"Chon-Hou Sio, Hong-Han Shuai, Wen-Huang Cheng","doi":"10.1145/3338533.3366581","DOIUrl":"https://doi.org/10.1145/3338533.3366581","url":null,"abstract":"Recently, Multi-Target Multi-Camera Tracking (MTMC) makes a breakthrough due to the release of DukeMTMC and show the feasibility of related applications. However, most of the existing MTMC methods focus on the batch methods which attempt to find the global optimal solution from the entire image sequence and thus are not suitable for the real-time applications, e.g., customer tracking in unmanned stores. In this paper, we propose a low-cost online tracking algorithm, namely, Deep Multi-Fisheye-Camera Tracking (DeepMFCT) to identify the customers and locate the corresponding positions from multiple overlapping fisheye cameras. Based on any single camera tracking algorithm (e.g., Deep SORT), our proposed algorithm establishes the correlation between different single camera tracks. Owing to the lack of well-annotated multiple overlapping fisheye cameras dataset, the main challenge of this issue is to efficiently overcome the domain gap problem between normal cameras and fisheye cameras based on existed deep learning based model. To address this challenge, we integrate a single camera tracking algorithm with cross camera clustering including location information that achieves great performance on the unmanned store dataset and Hall dataset. Experimental results show that the proposed algorithm improves the baselines by at least 7% in terms of MOTA on the Hall dataset.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126118802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimedia Information Retrieval","authors":"Yangyang Guo","doi":"10.1145/3338533.3372212","DOIUrl":"https://doi.org/10.1145/3338533.3372212","url":null,"abstract":"My main research interests include product search and visual question answering (VQA), lying in the field of information retrieval (IR), which aims to obtain information system resources relevant to an information need from a collection. Product search focuses on the E-commerce domain and aims to retrieve products which are not only relevant to the submitted queries but also fit users' personal preferences; Visual question answering aims to provide a natural language answer for a given image and a free-form, open-ended, natural-language question about this image, which requires semantic understanding on natural language and visual content, as well as knowledge extraction and logic reasoning.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127517181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Distillation Metric Learning","authors":"Jiaxu Han, Tianyu Zhao, Changqing Zhang","doi":"10.1145/3338533.3366560","DOIUrl":"https://doi.org/10.1145/3338533.3366560","url":null,"abstract":"Due to the emergence of large-scale and high-dimensional data, measuring the similarity between data points becomes challenging. In order to obtain effective representations, metric learning has become one of the most active researches in the field of computer vision and pattern recognition. However, models using trained networks for predictions are often cumbersome and difficult to be deployed. Therefore, in this paper, we propose a novel deep distillation metric learning (DDML) for online teaching in the procedure of learning the distance metric. Specifically, we employ model distillation to transfer the knowledge acquired by the larger model to the smaller model. Unlike the 2-step offline and mutual online manners, we propose to train a powerful teacher model, who transfer the knowledge to a lightweight and generalizable student model and iteratively improved by the feedback from the student model. We show that our method has achieved state-of-the-art results on CUB200-2011 and CARS196 while having advantages in computational efficiency.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116846886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Performance-Aware Selection Strategy for Cloud-based Video Services with Micro-Service Architecture","authors":"Zhengjun Xu, Haitao Zhang, Han Huang","doi":"10.1145/3338533.3366609","DOIUrl":"https://doi.org/10.1145/3338533.3366609","url":null,"abstract":"The cloud micro-service architecture provides loosely coupling services and efficient virtual resources, which becomes a promising solution for large-scale video services. It is difficult to efficiently select the optimal services under micro-service architecture, because the large number of micro-services leads to an exponential increase in the number of service selection candidate solutions. In addition, the time sensitivity of video services increases the complexity of service selection, and the video data can affects the service selection results. However, the current video service selection strategies are insufficient under micro-service architecture, because they do not take into account the resource fluctuation of the service instances and the features of the video service comprehensively. In this paper, we focus on the video service selection strategy under micro-service architecture. Firstly, we propose a QoS Prediction (QP) method using explicit factor analysis and linear regression. The QP can accurately predict the QoS values based on the features of video data and service instances. Secondly, we propose a Performance-Aware Video Service Selection (PVSS) method. We prune the candidate services to reduce computational complexity and then efficiently select the optimal solution based on Fruit Fly Optimization (FFO) algorithm. Finally, we conduct extensive experiments to evaluate our strategy, and the results demonstrate the effectiveness of our strategy.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126988022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective Attention Network for Image Dehazing and Deraining","authors":"Xiao Liang, Runde Li, Jinhui Tang","doi":"10.1145/3338533.3366688","DOIUrl":"https://doi.org/10.1145/3338533.3366688","url":null,"abstract":"Image dehazing and deraining are import low-level compute vision tasks. In this paper, we propose a novel method named Selective Attention Network (SAN) to solve these two problems. Due to the density of haze and directions of rain streaks are complex and non-uniform, SAN adopts the channel-wise attention and spatial-channel attention to remove rain streaks and haze both in globally and locally. To better capture various of rain and hazy details, we propose a Selective Attention Module(SAM) to re-scale the channel-wise attention and spatial-channel attention instead of simple element-wise summation. In addition, we conduct ablation studies to validate the effectiveness of the each module of SAN. Extensive experimental results on synthetic and real-world datasets show that SAN performs favorably against state-of-the-art methods.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126790113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Summarization based on Sparse Subspace Clustering with Automatically Estimated Number of Clusters","authors":"Pengyi Hao, Edwin Manhando, Taotao Ye, Cong Bai","doi":"10.1145/3338533.3366593","DOIUrl":"https://doi.org/10.1145/3338533.3366593","url":null,"abstract":"Advancements in technology resulted in a sharp growth in the number of digital cameras at people's disposal all across the world. Consequently, the huge storage space consumed by the videos from these devices on video repositories make the job of video processing and analysis to be time-consuming. Furthermore, this also slows down the video browsing and retrieval. Video summarization plays a very crucial role in solving these issues. Despite the number of video summarization approaches proposed up to the present time, the goal is to take a long video and generate a video summary in form of a short video skim without losing the meaning or the message transmitted by the original lengthy video. This is done by selecting the important frames called key-frames. The approach proposed by this work performs automatic summarization of digital videos based on detected objects' deep features. To this end, we apply sparse subspace clustering with an automatically estimated number of clusters to the objects' deep features. The summary generated from our scheme will store the meta-data for each short video inferred from the clustering results. In this paper, we also suggest a new video dataset for video summarization. We evaluate the performance of our work using the TVSum dataset and our video summarization dataset.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132442556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Perception Network for Salient Object Detection","authors":"Junhang Wei, Shuhui Wang, Liang Li, Qingming Huang","doi":"10.1145/3338533.3366580","DOIUrl":"https://doi.org/10.1145/3338533.3366580","url":null,"abstract":"To get better saliency maps for salient object detection, recent methods fuse features from different levels of convolutional neural networks and have achieved remarkable progress. However, the differences between different feature levels bring difficulties to the fusion process, thus it may lead to unsatisfactory saliency predictions. To address this issue, we propose Active Perception Network (APN) to enhance inter-feature consistency for salient object detection. First, Mutual Projection Module (MPM) is developed to fuse different features, which uses high-level features as guided information to extract complementary components from low-level features, and can suppress background noises and improve semantic consistency. Self Projection Module (SPM) is designed to further refine the fused features, which can be considered as the extended version of residual connection. Features that pass through SPM can produce more accurate saliency maps. Finally, we propose Head Projection Module (HPM) to aggregate global information, which brings strong semantic consistency to the whole network. Comprehensive experiments on five benchmark datasets demonstrate that the proposed method outperforms the state-of-the-art approaches on different evaluation metrics.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134536705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lijian Lin, Haosheng Chen, Yanjie Liang, Y. Yan, Hanzi Wang
{"title":"Robust Visual Tracking via Statistical Positive Sample Generation and Gradient Aware Learning","authors":"Lijian Lin, Haosheng Chen, Yanjie Liang, Y. Yan, Hanzi Wang","doi":"10.1145/3338533.3366556","DOIUrl":"https://doi.org/10.1145/3338533.3366556","url":null,"abstract":"In recent years, Convolutional Neural Network (CNN) based trackers have achieved state-of-the-art performance on multiple benchmark datasets. Most of these trackers train a binary classifier to distinguish the target from its background. However, they suffer from two limitations. Firstly, these trackers cannot effectively handle significant appearance variations due to the limited number of positive samples. Secondly, there exists a significant imbalance of gradient contributions between easy and hard samples, where the easy samples usually dominate the computation of gradient. In this paper, we propose a robust tracking method via Statistical Positive sample generation and Gradient Aware learning (SPGA) to address the above two limitations. To enrich the diversity of positive samples, we present an effective and efficient statistical positive sample generation algorithm to generate positive samples in the feature space. Furthermore, to handle the issue of imbalance between easy and hard samples, we propose a gradient sensitive loss to harmonize the gradient contributions between easy and hard samples. Extensive experiments on three challenging benchmark datasets including OTB50, OTB100 and VOT2016 demonstrate that the proposed SPGA performs favorably against several state-of-the-art trackers.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130307879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning","authors":"Yiyan Chen, Li Tao, Xueting Wang, T. Yamasaki","doi":"10.1145/3338533.3366583","DOIUrl":"https://doi.org/10.1145/3338533.3366583","url":null,"abstract":"Conventional video summarization approaches based on reinforcement learning have the problem that the reward can only be received after the whole summary is generated. Such kind of reward is sparse and it makes reinforcement learning hard to converge. Another problem is that labelling each shot is tedious and costly, which usually prohibits the construction of large-scale datasets. To solve these problems, we propose a weakly supervised hierarchical reinforcement learning framework, which decomposes the whole task into several subtasks to enhance the summarization quality. This framework consists of a manager network and a worker network. For each subtask, the manager is trained to set a subgoal only by a task-level binary label, which requires much fewer labels than conventional approaches. With the guide of the subgoal, the worker predicts the importance scores for video shots in the subtask by policy gradient according to both global reward and innovative defined sub-rewards to overcome the sparse problem. Experiments on two benchmark datasets show that our proposal has achieved the best performance, even better than supervised approaches.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123933017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}