{"title":"Dictionary Learning based Supervised Discrete Hashing for Cross-Media Retrieval","authors":"Ye Wu, Xin Luo, Xin-Shun Xu, Shanqing Guo, Yuliang Shi","doi":"10.1145/3206025.3206045","DOIUrl":"https://doi.org/10.1145/3206025.3206045","url":null,"abstract":"Hashing techniques have attracted considerable attention for large-scale multimedia retrieval due to their low storage cost and fast query speed, and many hashing models have been proposed for the cross-modal retrieval task. However, some problems still need further consideration. For example, a majority of them directly use a linear projection matrix to project heterogeneous data into a common space, which may lead to large error, as some heterogeneous data with semantic similarity are hard to bring close in the latent space under linear projection. Besides, most existing cross-modal hashing methods use a simple pairwise similarity matrix to preserve label information during learning; this kind of pairwise similarity cannot fully utilize the discriminative property of label information. Furthermore, most existing supervised methods solve a relaxed continuous optimization problem by dropping the discrete constraints, which may lead to large quantization error. To overcome these limitations, in this paper, we propose a novel cross-modal hashing method called Dictionary Learning based Supervised Discrete Hashing (DLSDH). Specifically, it learns dictionaries and generates a sparse representation for every instance, which is more suitable for projection into a latent space. To make full use of label information, it uses cosine similarity to construct a new pairwise similarity matrix that contains more information. Moreover, it directly learns the discrete hash codes instead of relaxing the discrete constraints. Extensive experiments are conducted on three benchmark datasets, and the results demonstrate that it outperforms several state-of-the-art methods for the cross-modal retrieval task.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134230522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Doctoral Symposium Session","authors":"Martha Larson, Takahiro Ogawa","doi":"10.1145/3252933","DOIUrl":"https://doi.org/10.1145/3252933","url":null,"abstract":"","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115561297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Special Session 2: Social-Media Visual Summarization / Large-Scale 3D Multimedia Analysis and Applications","authors":"Joao Magalhaes, Rongrong Ji","doi":"10.1145/3252932","DOIUrl":"https://doi.org/10.1145/3252932","url":null,"abstract":"","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114877771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Pairwise Classification and Ranking for Predicting Media Interestingness","authors":"Jayneel Parekh, Harshvardhan Tibrewal, Sanjeel Parekh","doi":"10.1145/3206025.3206078","DOIUrl":"https://doi.org/10.1145/3206025.3206078","url":null,"abstract":"With the explosive increase in the consumption of multimedia content in recent years, the field of media interestingness analysis has gained a lot of attention. This paper tackles the problem of image interestingness in videos and proposes a novel algorithm based on pairwise comparisons of frames to rank all frames in a video. Experiments performed on the Predicting Media Interestingness dataset affirm its effectiveness over existing solutions. In terms of the official metric, i.e., Mean Average Precision at 10, it outperforms the previous state-of-the-art (to the best of our knowledge) on this dataset. Additional results on video interestingness substantiate the flexibility and performance reliability of our approach.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131106554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Relational Information in Social Networks using Geometric Deep Learning on Hypergraphs","authors":"Devanshu Arya, M. Worring","doi":"10.1145/3206025.3206062","DOIUrl":"https://doi.org/10.1145/3206025.3206062","url":null,"abstract":"Online social networks are constituted by a diverse set of entities, including users, images, and posts, which makes the task of predicting interdependencies between entities challenging. We need a model that transfers information from a given type of relations between entities to predict other types of relations, irrespective of the type of entity. To devise a generic framework, one needs to capture the relational information between entities without any entity-dependent information. However, there are two challenges: (a) a social network has an intrinsic community structure, and within these communities some relations are much more complicated than pairwise relations and thus cannot simply be modeled by a graph; (b) there are different types of entities and relations in a social network, and taking all of them into account makes it difficult to formulate a model. In this paper, we claim that representing social networks using hypergraphs improves the task of predicting missing information about an entity by capturing higher-order relations. We study the behavior of our method by performing experiments on the CLEF dataset, consisting of images from Flickr, an online photo-sharing social network.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127288654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Multilevel Semantic Similarity for Large-Scale Multi-Label Image Retrieval","authors":"Ge Song, Xiaoyang Tan","doi":"10.1145/3206025.3206027","DOIUrl":"https://doi.org/10.1145/3206025.3206027","url":null,"abstract":"We present a novel Deep Supervised Hashing with code operation (DSOH) method for large-scale multi-label image retrieval. This approach is in contrast with existing methods in that we respect both the intention gap and the intrinsic multilevel similarity of multi-labels. Particularly, our method allows a user to simultaneously present multiple query images rather than a single one to better express her intention, and correspondingly a separate sub-network in our architecture is specifically designed to fuse the query intention represented by each single query. Furthermore, as in the training stage, each image is annotated with multiple labels to enrich its semantic representation, we propose a new margin-adaptive triplet loss to learn the fine-grained similarity structure of multi-labels, which is known to be hard to capture. The whole system is trained in an end-to-end manner, and our experimental results demonstrate that the proposed method is not only able to learn useful multilevel semantic similarity-preserving binary codes but also achieves state-of-the-art retrieval performance on three popular datasets.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122400881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Ongoing Evolution of Broadcast Technology","authors":"K. Mitani","doi":"10.1145/3206025.3210489","DOIUrl":"https://doi.org/10.1145/3206025.3210489","url":null,"abstract":"The media environment of program production, content delivery, and viewing has been changing because of progress in broadcasting and communication technologies, as well as other technologies like IoT, cloud computing, and artificial intelligence (AI). In December 2018, 8K and 4K UHDTV satellite broadcasting will start in Japan, which means that viewers will soon be able to enjoy 8K and 4K programs featuring a wide color gamut and high dynamic range characteristics together with 22.2 multi-channel audio at home. Meanwhile, distribution services for sending content to PCs and smartphones through the Internet have been spreading rapidly, and the introduction of the next generation of mobile networks (5G) will accelerate their spread. The coming of such advanced broadcast and broadband technologies, and the consequent changes in lifestyle, will provide broadcasters with a great opportunity for a new stage of development. At NHK Science & Technology Research Laboratories (NHK STRL), we are pursuing a wide range of research with the aim of creating new broadcast services that can provide viewing experiences never before imagined and user experiences more attuned to daily life. To enhance the convenience of television and the value of TV programming, we are developing technology for connecting the TV experience with various activities in everyday life. Extensions to \"Hybridcast Connect\" will drive applications that link TVs, smartphones, and IoT, enabling spontaneous consumption of content during everyday activities through various devices around the user. Establishing a new program production workflow with AI, which we call \"Smart Production\", is one of our most important research topics. We are developing speech and face recognition technologies for making closed captions and metadata efficiently, as well as technologies for automatically converting content into computer-generated sign language, audio descriptions, and simplified Japanese. This presentation introduces these research achievements targeting 2020 and beyond, as well as other broadcasting technology trends including 4K8K UHDTV broadcasting in Japan, 3D imaging, and VR/AR.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121381752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Reconstruction by Laplacian Eigenmaps for Efficient Instance Search","authors":"Bingqing Ke, Jie Shao, Zi Huang, Heng Tao Shen","doi":"10.1145/3206025.3206032","DOIUrl":"https://doi.org/10.1145/3206025.3206032","url":null,"abstract":"Instance search aims at retrieving images containing a particular query instance. Recently, image features derived from pre-trained convolutional neural networks (CNNs) have been shown to provide promising performance for image retrieval. However, the robustness of these features is still limited by hard positives and hard negatives. To address this issue, this work focuses on reconstructing a new representation based on conventional CNN features to capture the intrinsic image manifold in the original feature space. After the feature reconstruction, the Euclidean distance can be applied in the new space to measure the pairwise distance among feature points. The proposed method is highly efficient, which benefits from the linear search complexity and a further optimization for speedup. Experiments demonstrate that our method achieves promising efficiency with highly competitive accuracy. This work succeeds in capturing implicit embedding information in images as well as reducing the computational complexity significantly.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127774687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personal Basketball Coach: Tactic Training through Wireless Virtual Reality","authors":"Wan-Lun Tsai","doi":"10.1145/3206025.3206084","DOIUrl":"https://doi.org/10.1145/3206025.3206084","url":null,"abstract":"In this paper, we present a basketball tactic training framework that uses virtual reality (VR) technology to improve the effectiveness and experience of tactic learning. Our proposal is composed of 1) a wireless VR interaction system with motion capture devices, applicable to the fast-moving scenarios of basketball play; and 2) a computing server that generates three-dimensional virtual players, defenders, and advantageous tactic guides. With the assistance of our VR training system, the user can vividly experience how the tactics are executed by viewing them from a specific player's viewing direction. Moreover, the basketball tactic movement guidance and virtual defenders are rendered in our VR system to make users feel as if they were playing in a real basketball game, which improves the efficiency and effectiveness of tactic training.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132438642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Poster Paper Session","authors":"Keiji Yanai","doi":"10.1145/3252930","DOIUrl":"https://doi.org/10.1145/3252930","url":null,"abstract":"","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134533323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}