Proceedings of the 26th ACM international conference on Multimedia: Latest Publications

Jaguar
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240561
Wenxiao Zhang, Bo Han, P. Hui
Abstract: In this paper, we present the design, implementation, and evaluation of Jaguar, a mobile Augmented Reality (AR) system that features accurate, low-latency, and large-scale object recognition and flexible, robust, and context-aware tracking. Jaguar pushes the limit of mobile AR's end-to-end latency by leveraging hardware acceleration with GPUs on the edge cloud. Another distinctive aspect of Jaguar is that it seamlessly integrates the marker-less object tracking offered by recently released AR development tools (e.g., ARCore and ARKit) into its design. Some approaches used in Jaguar have been studied before in a standalone manner; for example, it is known that cloud offloading can significantly decrease the computational latency of AR. However, the question of whether the combination of marker-less tracking, cloud offloading, and GPU acceleration can satisfy the desired end-to-end latency of mobile AR (i.e., the interval between camera frames) has not been adequately addressed. We demonstrate via a prototype implementation of our proposed holistic solution that Jaguar reduces the end-to-end latency to ~33 ms. It also achieves accurate six-degrees-of-freedom tracking and 97% recognition accuracy on a dataset of 10,000 images.
Citations: 62
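The ~33 ms target equals one camera-frame interval at 30 fps. A back-of-envelope sketch shows how an offloading pipeline is judged against that budget; the stage timings below are hypothetical illustrations, not measurements from the paper.

```python
# Latency budget check: one camera-frame interval at 30 fps is the
# end-to-end target (~33 ms). All stage timings are hypothetical.

FPS = 30
frame_interval_ms = 1000 / FPS  # ~33.3 ms budget per frame

stages_ms = {
    "capture + preprocess": 5.0,
    "encode + uplink to edge": 8.0,
    "edge GPU recognition": 12.0,
    "downlink + render": 6.0,
}
total_ms = sum(stages_ms.values())
print(f"budget {frame_interval_ms:.1f} ms, pipeline {total_ms:.1f} ms -> "
      f"{'fits' if total_ms <= frame_interval_ms else 'misses'}")
```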
Improving QoE of ABR Streaming Sessions through QUIC Retransmissions
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240664
Divyashri Bhat, R. Deshmukh, M. Zink
Abstract: While adaptive bitrate (ABR) streaming has contributed significantly to reducing video playout stalling, ABR clients continue to suffer from variation in bitrate quality over the duration of a streaming session. Like stalling, these quality variations have a negative impact on users' Quality of Experience (QoE). In this paper, we use a trace from a large-scale CDN to show that such quality changes occur in a significant share of streaming sessions, and we investigate an ABR video segment retransmission approach to reduce the number of such quality changes. As the new HTTP/2 standard becomes increasingly popular, we also see growing use of QUIC as an alternative protocol for the transmission of web traffic, including video streaming. Under various network conditions, we conduct a systematic comparison of existing transport-layer approaches for HTTP/2 to determine which is best suited for ABR segment retransmissions. Since both protocols are well known to provide a series of improvements over HTTP/1.1, we perform experiments both in controlled environments and over transcontinental Internet links, and we find that these benefits also "trickle up" into the application layer for ABR video streaming: QUIC retransmissions can significantly improve the average quality bitrate while simultaneously minimizing bitrate variations over the duration of a streaming session.
Citations: 23
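The retransmission idea can be sketched as a simple deadline test: re-request an already-buffered low-quality segment at a higher bitrate only if the higher-quality copy can arrive before playback reaches it. This is a minimal illustration, not the paper's scheduler; the function name, parameters, and numbers are all assumptions.

```python
def should_retransmit(buffer_ahead_s, seg_duration_s, hi_bitrate_kbps,
                      throughput_kbps, safety=1.2):
    """True if a higher-quality copy of a buffered segment can arrive in time."""
    download_s = seg_duration_s * hi_bitrate_kbps / throughput_kbps
    return download_s * safety < buffer_ahead_s

print(should_retransmit(8.0, 4.0, 3000, 6000))  # 2.4 s < 8.0 s buffer -> True
print(should_retransmit(2.0, 4.0, 3000, 6000))  # 2.4 s > 2.0 s buffer -> False
```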
Dissimilarity Representation Learning for Generalized Zero-Shot Recognition
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240686
Gang Yang, Jinlu Liu, Jieping Xu, Xirong Li
Abstract: Generalized zero-shot learning (GZSL) aims to recognize any test instance, whether it comes from a known class or from a novel class that has no training instances. To synthesize training instances for novel classes, and thus reduce GZSL to a common classification problem, we propose a Dissimilarity Representation Learning (DSS) method. A dissimilarity representation describes a specific instance in terms of its (dis)similarity to other instances in a visual or attribute-based feature space. In the dissimilarity space, instances of the novel classes are synthesized by an end-to-end optimized neural network. The network realizes two-level feature mappings and domain adaptation in the dissimilarity space and the attribute-based feature space. Experimental results on five benchmark datasets (AWA, AWA2, SUN, CUB, and aPY) show that the proposed method improves on the state of the art by a large margin, with approximately a 10% gain in the harmonic mean of top-1 accuracy. Consequently, this paper establishes a new baseline for GZSL.
Citations: 20
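The core construction, representing a sample by its distances to a set of reference instances, can be sketched as follows. The prototypes and features here are random placeholders, not the paper's learned mappings.

```python
import numpy as np

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(5, 64))   # 5 reference instances in a 64-d feature space
samples = rng.normal(size=(10, 64))     # 10 query instances

# Euclidean distance of every sample to every prototype gives the
# dissimilarity representation: one row of 5 distances per sample.
diss = np.linalg.norm(samples[:, None, :] - prototypes[None, :, :], axis=-1)
print(diss.shape)  # (10, 5)
```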
Dynamic Sound Field Synthesis for Speech and Music Optimization
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240644
Zhenyu Tang, Nicolás Morales, Dinesh Manocha
Abstract: We present a novel acoustic optimization algorithm to synthesize dynamic sound fields in a static scene. Our approach places new active loudspeakers or virtual sources in the scene so that the dynamic sound field in a region satisfies optimization criteria that improve speech and music perception. We use a frequency-domain formulation of sound propagation and reduce dynamic sound field synthesis to solving a linear least-squares problem, imposing no constraints on the environment, the loudspeaker type, or the loudspeaker placement. We highlight the performance on complex indoor scenes in terms of speech and music improvements, evaluate the approach with a user study, and highlight its perceptual benefits for virtual reality and multimedia applications.
Citations: 4
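The reduction to linear least squares can be illustrated directly: given a frequency-domain propagation matrix A mapping loudspeaker source strengths to sound pressure at listening points, solve for the strengths that best match a target field b. A and b below are random complex stand-ins, not acoustic simulations.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points, n_speakers = 40, 8

# Complex frequency-domain propagation matrix (loudspeakers -> listening points)
A = rng.normal(size=(n_points, n_speakers)) + 1j * rng.normal(size=(n_points, n_speakers))
# Desired pressure field at the listening points
b = rng.normal(size=n_points) + 1j * rng.normal(size=n_points)

# Least-squares source strengths minimizing ||A x - b||
x, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(x.shape, rank)  # (8,) 8
```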
Online Inter-Camera Trajectory Association Exploiting Person Re-Identification and Camera Topology
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240663
Na Jiang, Si-Yuan Bai, Yue Xu, Chang Xing, Zhong Zhou, Wei Wu
Abstract: Online inter-camera trajectory association is a promising topic in intelligent video surveillance; it focuses on associating trajectories that belong to the same individual across different cameras over time. It remains challenging due to the inconsistent appearance of a person in different cameras and the lack of spatio-temporal constraints between cameras. In addition, orientation variations and partial occlusions significantly increase the difficulty of inter-camera trajectory association. To address these problems, this work proposes orientation-driven person re-identification (ODPR) and an effective appearance-based camera topology estimation for online inter-camera trajectory association. ODPR explicitly leverages orientation cues and stable torso features to learn discriminative feature representations for identifying trajectories across cameras, mitigating pedestrian orientation variations through a designed orientation-driven loss function and orientation-aware weights. The camera topology estimation introduces appearance features to generate correct spatio-temporal constraints that narrow the retrieval range, improving time efficiency and making intelligent inter-camera trajectory association feasible in large-scale surveillance environments. Extensive experimental results demonstrate that our proposed approach significantly outperforms most state-of-the-art methods on popular person re-identification datasets and on the public multi-target, multi-camera tracking benchmark.
Citations: 35
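The kind of spatio-temporal constraint produced by camera topology estimation can be sketched as a gating test: only trajectory pairs whose inter-camera transit time falls inside an estimated window are compared by appearance at all. The window values here are hypothetical, not from the paper.

```python
def gate(exit_time_s, entry_time_s, transit_window_s=(10.0, 60.0)):
    """True if the inter-camera time gap is plausible for this camera pair."""
    lo, hi = transit_window_s
    return lo <= entry_time_s - exit_time_s <= hi

print(gate(100.0, 130.0))  # 30 s gap, inside window -> True
print(gate(100.0, 105.0))  # 5 s gap, implausibly fast -> False
```

Gating in this way shrinks the retrieval range, which is where the time-efficiency gain described above comes from.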
Drawing in a Virtual 3D Space - Introducing VR Drawing in Elementary School Art Education
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240692
W. Bolier, Wolfgang Hürst, G. V. Bommel, Joost Bosman, H. Bosman
Abstract: Drawing is an important part of elementary school education, especially since it contributes to the development of spatial skills. Virtual reality enables us to draw not just on a flat 2D surface but in 3D space. Our research aims to show whether and how this form of 3D drawing can benefit art education. This paper presents first insights into the potential benefits and obstacles of introducing 3D drawing at elementary schools. In an experiment with 18 children, we studied practical aspects, proficiency, and spatial ability development. Our results show improvement in the children's 3D drawing skills but not in their spatial abilities. Their drawing skills do appear to correlate with their mental rotation ability, although further research is needed to confirm this conclusively.
Citations: 11
Fashion Sensitive Clothing Recommendation Using Hierarchical Collocation Model
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240596
Zhengzhong Zhou, Xiu Di, Wei Zhou, Liqing Zhang
Abstract: Automatic clothing recommendation is growing dramatically with the boom in apparel e-commerce. In this paper, we propose a novel clothing recommendation approach that is sensitive to fashion trends. The approach incorporates expert knowledge into multiple dimensions of information, including purchase behaviors, image contents, and product descriptions, so as to recommend clothing in line with the forefront of fashion. Meanwhile, to match human visual aesthetics and users' collocation experience, we integrate a convolutional neural network and a hierarchical collocation model (HCM) into our framework. The former extracts effective visual features and attribute descriptors from clothing items, while the latter embeds them into style topics that interpret the collocation pattern at a higher level of semantic knowledge. This data-driven recommendation approach is able to learn a clothing collocation metric from multi-dimensional clothing information. Experimental results show that our HCM method achieves better performance than other state-of-the-art baselines, while also ensuring the fashion sensitivity of the recommended outfits.
Citations: 15
Multi-Human Parsing Machines
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240515
Jianshu Li, Jian Zhao, Yunpeng Chen, S. Roy, Shuicheng Yan, Jiashi Feng, T. Sim
Abstract: Human parsing is an important task in human-centric analysis. Despite remarkable progress in single-human parsing, the more realistic case of multi-human parsing remains challenging in terms of both data and models. Compared with the considerable number of available single-human parsing datasets, datasets for multi-human parsing are very limited, mainly due to the huge annotation effort required. Beyond the data challenge, people in real-world scenarios are often entangled with each other through close interaction and body occlusion, making it difficult to distinguish body parts belonging to different person instances. In this paper we propose the Multi-Human Parsing Machines (MHPM) system, which contains an MHP Montage model and an MHP Solver, to address both challenges. The MHP Montage model generates realistic images of multiple persons together with their parsing labels: it intelligently composes single persons onto background scene images while maintaining the structural information between persons and the scene, and the generated images can be used to train better multi-human parsing algorithms. The MHP Solver, in turn, tackles the bottleneck of distinguishing multiple entangled persons in close interaction. It employs a Group-Individual Push and Pull (GIPP) loss function, which can effectively separate persons in close interaction. We show experimentally that MHPM achieves state-of-the-art performance on the multi-human parsing benchmark and on the person individualization benchmark, which distinguishes closely entangled person instances.
Citations: 15
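A push-pull objective in the spirit of the GIPP loss can be sketched as follows: pull each pixel embedding toward its own person's mean, and push different persons' means apart with a margin hinge. This is an illustrative formulation under assumed definitions, not the paper's exact loss.

```python
import numpy as np

def push_pull_loss(embeddings, labels, margin=1.0):
    """Pull embeddings toward their own person's mean; push person means apart."""
    means = {p: embeddings[labels == p].mean(axis=0) for p in np.unique(labels)}
    pull = np.mean([np.sum((e - means[l]) ** 2) for e, l in zip(embeddings, labels)])
    ids = list(means)
    push, pairs = 0.0, 0
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            gap = np.linalg.norm(means[ids[i]] - means[ids[j]])
            push += max(0.0, margin - gap) ** 2  # hinge: only penalize close means
            pairs += 1
    return pull + push / max(pairs, 1)

rng = np.random.default_rng(2)
# Two well-separated "persons" in embedding space vs. the same points shuffled
emb = np.concatenate([rng.normal(0.0, 0.1, (20, 8)), rng.normal(3.0, 0.1, (20, 8))])
lab = np.array([0] * 20 + [1] * 20)
loss_sep = push_pull_loss(emb, lab)
loss_mix = push_pull_loss(rng.permutation(emb), lab)
print(loss_sep < loss_mix)  # True: separated instances score a lower loss
```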
Structure Guided Photorealistic Style Transfer
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240637
Yuheng Zhi, Huawei Wei, Bingbing Ni
Abstract: Recent style transfer methods based on deep networks strive to generate more content-matched stylized images by adding semantic guidance to the iterative process. However, these approaches can only guarantee the transfer of overall color and texture distributions between semantically equivalent regions; local variation within these regions is not accurately captured, so the resulting image lacks local plausibility. To this end, we develop a non-parametric, patch-based style transfer framework that synthesizes more content-coherent images. By designing a novel patch matching algorithm that simultaneously takes high-level category information and geometric structure information (e.g., human pose and building structure) into account, our method transfers more detailed distributions and produces more photorealistic stylized images. We show that our approach achieves remarkable style transfer results on content with geometric structure, including human bodies, vehicles, and buildings.
Citations: 6
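The non-parametric core, nearest-neighbor patch matching, can be sketched with cosine similarity; the paper's matcher additionally uses semantic and geometric guidance, which this toy version omits, and the patch data here is random.

```python
import numpy as np

rng = np.random.default_rng(3)
content_patches = rng.normal(size=(6, 27))  # e.g. flattened 3x3 RGB patches
style_patches = rng.normal(size=(9, 27))

# Normalize rows, then match each content patch to its most similar style patch
c = content_patches / np.linalg.norm(content_patches, axis=1, keepdims=True)
s = style_patches / np.linalg.norm(style_patches, axis=1, keepdims=True)
best = np.argmax(c @ s.T, axis=1)  # index of the best style patch per content patch
print(best.shape)  # (6,)
```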
WildFish
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240616
Peiqin Zhuang, Yali Wang, Yu Qiao
Abstract: Fish recognition is an important task for understanding the marine ecosystem and biodiversity. Identifying fish species in the wild is often challenging for several reasons. First, most fish benchmarks are small-scale, which may limit the representation power of machine learning models. Second, the number of fish species is huge, and unknown categories may still exist; traditional classifiers often fail in this open-set scenario. Third, certain fish species are highly confusable, and their subtle differences are often hard to discern from unconstrained images alone. Motivated by these facts, we introduce WildFish, a large-scale benchmark for fish recognition in the wild. We make three contributions in this paper. First, WildFish is, to our best knowledge, the largest image dataset for wild fish recognition: 1,000 fish categories with 54,459 unconstrained images, enabling the training of high-capacity models for automatic fish classification. Second, we propose a novel open-set fish classification task for realistic scenarios and investigate an open-set deep learning framework with a number of practical designs. Third, we propose a novel fine-grained recognition task guided by pairwise textual descriptions; leveraging the comparative knowledge in each sentence, we design a multi-modal fish network that effectively distinguishes the two confusable categories in a pair. Finally, we release WildFish (https://github.com/PeiqinZhuang/WildFish) to benefit further research in multimedia and beyond.
Citations: 30
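The open-set setting can be illustrated with a common baseline: threshold the classifier's softmax confidence so that low-confidence inputs are rejected as "unknown" rather than forced into a known class. This is a generic baseline sketch, not WildFish's specific open-set design.

```python
import numpy as np

def open_set_predict(logits, threshold=0.5):
    """Return the argmax class, or -1 ("unknown") if confidence is below threshold."""
    z = np.exp(logits - logits.max())  # numerically stable softmax
    probs = z / z.sum()
    return int(np.argmax(probs)) if probs.max() >= threshold else -1

print(open_set_predict(np.array([4.0, 0.1, 0.2])))  # confident -> 0
print(open_set_predict(np.array([0.3, 0.2, 0.1])))  # ambiguous -> -1
```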