{"title":"Multimodal Continuous Prediction of Emotions in Movies using Long Short-Term Memory Networks","authors":"S. Sivaprasad, Tanmayee Joshi, Rishabh Agrawal, N. Pedanekar","doi":"10.1145/3206025.3206076","DOIUrl":"https://doi.org/10.1145/3206025.3206076","url":null,"abstract":"Predicting the emotions that movies are designed to evoke can be useful in entertainment applications such as content personalization, video summarization and ad placement. Multimodal input, primarily audio and video, helps in building the emotional content of a movie. Since the emotion is built over time by audio and video, the temporal context of these modalities is an important aspect in modeling it. In this paper, we use Long Short-Term Memory networks (LSTMs) to model the temporal context in audio-video features of movies. We present continuous emotion prediction results using a multimodal fusion scheme on an annotated dataset of Academy Award winning movies. We report a significant improvement over the state-of-the-art results, wherein the correlation between predicted and annotated values is improved from 0.62 to 0.84 for arousal and from 0.29 to 0.50 for valence.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"295 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121826004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial Expression Synthesis by U-Net Conditional Generative Adversarial Networks","authors":"Xueping Wang, Weixin Li, Guodong Mu, Di Huang, Yunhong Wang","doi":"10.1145/3206025.3206068","DOIUrl":"https://doi.org/10.1145/3206025.3206068","url":null,"abstract":"High-level manipulation of facial expressions in images, such as expression synthesis, is challenging because facial expression changes are highly non-linear and vary depending on the facial appearance. The identity of the person should also be well preserved in the synthesized face. In this paper, we propose a novel U-Net Conditioned Generative Adversarial Network (UC-GAN) for facial expression generation. U-Net helps retain the properties of the input face, including the identity information and facial details. We also propose an identity preserving loss, which further improves the performance of our model. Both qualitative and quantitative experiments are conducted on the Oulu-CASIA and KDEF datasets, and the results show that our method can generate faces with natural and realistic expressions while preserving the identity information. Comparison with state-of-the-art approaches also demonstrates the competency of our method.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114515782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ranking News-Quality Multimedia","authors":"G. Marcelino, Ricardo Pinto, João Magalhães","doi":"10.1145/3206025.3206053","DOIUrl":"https://doi.org/10.1145/3206025.3206053","url":null,"abstract":"News editors need to find the photos that best illustrate a news piece and fulfill news-media quality standards, while being pressed to also find the most recent photos of live events. Recently, it became common to use social-media content in the context of news media for its unique value in terms of immediacy and quality. Consequently, the number of images to be considered and filtered through is now too large to be handled by a person. To aid the news editor in this process, we propose a framework designed to deliver high-quality, news-press type photos to the user. The framework, composed of two parts, is based on a ranking algorithm tuned to rank professional media highly and a visual SPAM detection module designed to filter out low-quality media. The core ranking algorithm leverages aesthetic, social and deep-learning semantic features. Evaluation showed that the proposed framework is effective at finding high-quality photos (true-positive rate), achieving a retrieval MAP of 64.5% and a classification precision of 70%.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131723402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated Scanning and Individual Identification System for Parts without Marking or Tagging","authors":"Kengo Makino, W. Duan, Rui Ishiyama, Toru Takahashi, Yuta Kudo, P. Jonker","doi":"10.1145/3206025.3206088","DOIUrl":"https://doi.org/10.1145/3206025.3206088","url":null,"abstract":"This paper presents a fully automated system for detecting, classifying, microscopic imaging, and individually identifying multiple parts without ID-marking or tagging. The system is beneficial for product assemblers, who handle multiple types of parts simultaneously. They can ensure traceability quite easily by only placing the parts freely on the system platform. The system captures microscopic images of parts as their \"fingerprints,\" which are matched with pre-registered images in a database to identify an individual part's information such as its serial number. We demonstrate a working prototype and interaction scenario.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131742474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Promoting Open Innovations in Real Estate Tech: Provision of the LIFULL HOME'S Data Set and Collaborative Studies","authors":"Yoji Kiyota","doi":"10.1145/3206025.3210494","DOIUrl":"https://doi.org/10.1145/3206025.3210494","url":null,"abstract":"The LIFULL HOME'S Data Set, which has been provided for academic use since November 2015, is being used for research in a variety of fields such as economics, architecture and urban science. In particular, since it contains 83 million property images and 5.1 million floor plan images, its use in the computer vision and multimedia fields is thriving, and papers using the dataset have been accepted at top venues such as ICCV 2017. This presentation summarizes the results that have been obtained through the provision of the dataset, and presents plans to promote open innovation in the field of real estate technology.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132177692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instance Image Retrieval by Aggregating Sample-based Discriminative Characteristics","authors":"Zhongyang Zhang, Lei Wang, Yang Wang, Luping Zhou, Jianjia Zhang, Fangxiao Chen","doi":"10.1145/3206025.3206069","DOIUrl":"https://doi.org/10.1145/3206025.3206069","url":null,"abstract":"Identifying the discriminative characteristic of a query is important for image retrieval. For retrieval without human interaction, such a characteristic is usually obtained by average query expansion (AQE) or its discriminative variant (DQE) learned from pseudo-examples online, among others. In this paper, we propose a new query expansion method to further improve the above ones. The key idea is to learn a 'unique' discriminative characteristic for each database image, in an offline manner. During retrieval, the characteristic of a query is obtained by aggregating the unique characteristics of the query-relevant images collected from an initial retrieval result. Compared with AQE, which works in the original feature space, our method works in the space of the unique characteristics of database images, significantly enhancing the discriminative power of the characteristic identified for a query. Compared with DQE, our method needs neither pseudo-labeled negatives nor the online learning process, leading to more efficient retrieval and even better performance. The experimental study conducted on seven benchmark datasets verifies the considerable improvement achieved by the proposed method, and also demonstrates its application to state-of-the-art diffusion-based image retrieval.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"86 26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126140667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Annotating, Understanding, and Predicting Long-term Video Memorability","authors":"Romain Cohendet, Karthik Yadati, Ngoc Q. K. Duong, C. Demarty","doi":"10.1145/3206025.3206056","DOIUrl":"https://doi.org/10.1145/3206025.3206056","url":null,"abstract":"Memorability can be regarded as a useful metric of video importance to help make a choice between competing videos. Research on computational understanding of video memorability is, however, in its early stages. There is no available dataset for modelling purposes, and the few previous attempts provided protocols to collect video memorability data that would be difficult to generalize. Furthermore, the computational features needed to build a robust memorability predictor remain largely undiscovered. In this article, we propose a new protocol to collect long-term video memorability annotations. We measure the memory performance of 104 participants from weeks to years after memorization to build a dataset of 660 videos for video memorability prediction. This dataset is made available for the research community. We then analyze the collected data in order to better understand video memorability, in particular the effects of response time, duration of memory retention and repetition of viewing on video memorability. We finally investigate the use of various types of audio and visual features and build a computational model for video memorability prediction. We conclude that high-level visual semantics help better predict the memorability of videos.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122482958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Perceptual Embeddings with Two Related Tasks for Joint Predictions of Media Interestingness and Emotions","authors":"Yang Liu, Zhonglei Gu, Tobey H. Ko, K. Hua","doi":"10.1145/3206025.3206071","DOIUrl":"https://doi.org/10.1145/3206025.3206071","url":null,"abstract":"By integrating media elements of various media, multimedia is capable of expressing complex information in a neat and compact way. Early studies have linked different sensory presentations in multimedia with the perception of human-like concepts. Yet, the richness of information in multimedia makes understanding and predicting user perceptions of multimedia content a challenging task both to the machine and the human mind. This paper presents a novel multi-task feature extraction method for accurate prediction of user perceptions of multimedia content. Differentiating from conventional feature extraction algorithms, which focus on perfecting a single task, the proposed model recognizes the commonality between different perceptions (e.g., interestingness and emotional impact) and attempts to jointly optimize the performance of all the tasks through uncovered commonality features. Using both a media interestingness dataset and a media emotion dataset for user perception prediction, the proposed model simultaneously characterizes the individualities of each task and captures the commonalities shared by both tasks, and achieves better prediction accuracy than other competing algorithms on real-world datasets of two related tasks: the MediaEval 2017 Predicting Media Interestingness Task and the MediaEval 2017 Emotional Impact of Movies Task.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124165120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recommendation Technologies for Multimedia Content","authors":"Xiangnan He, Hanwang Zhang, Tat-Seng Chua","doi":"10.1145/3206025.3210497","DOIUrl":"https://doi.org/10.1145/3206025.3210497","url":null,"abstract":"Recommendation systems play a vital role in online information systems and have become a major monetization tool for user-oriented platforms. In recent years, there has been increasing research interest in recommendation technologies in the information retrieval and data mining community, and significant progress has been made owing to the fast development of deep learning. However, in the multimedia community, there has been relatively less attention paid to the development of multimedia recommendation technologies. In this tutorial, we summarize existing research efforts on multimedia recommendation. We first provide an overview on fundamental techniques and recent advances on personalized recommendation for general items. We then summarize existing developments on recommendation technologies for multimedia content. Lastly, we present insight into the challenges and future directions in this emerging and promising area.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124219466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 4: Video Analysis","authors":"K. Shinoda","doi":"10.1145/3252929","DOIUrl":"https://doi.org/10.1145/3252929","url":null,"abstract":"","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132047503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}