{"title":"Are Theme Songs Usable for Anime Retrieval?","authors":"Naoto Homma, Aiko Uemura, Tetsuro Kitahara","doi":"10.1109/MIPR51284.2021.00042","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00042","url":null,"abstract":"Japanese anime is well known as a multimedia-related popular culture but its retrieval techniques have not been fully developed. In this paper, we made an attempt of similarity-based retrieval of anime works using their theme songs. We hypothesized that similar anime works have similar theme songs because the atmosphere of anime works may be reflected in their theme songs. Under this hypothesis, we measured the audio-based and lyrics-based similarity among theme songs to search for anime works similar to a given query work. Experimental results show that the anime retrieval with audio-based and lyrics-based theme song similarity succeeded with average accuracy of 63% and 66%, respectively. This accuracy is higher than random selection, even though it is lower than an upper-bound accuracy based on manually prepared summary texts.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122102556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPX-G: First Person Exploration for Graph","authors":"Takahiro Komamizu, Shoi Ito, Yasuhiro Ogawa, K. Toyama","doi":"10.1109/MIPR51284.2021.00018","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00018","url":null,"abstract":"Data exploration is a fundamental user task in the information seeking process. In data exploration, users have ambiguous information needs, and they traverse across the data for gathering information. In this paper, a novel data exploration system, called FPX-G, is proposed that uses virtual reality (VR) technology. VR-based data exploration (or immersive analytics) is a recent trend in data analytics, and the existing work approaches involve aggregated information in an interactive and 3D manner. However, exploration for individual pieces of data scarcely has been approached. Traditional data exploration is done on 2D displays, therefore space is limited, and there is no depth. FPX-G fully utilizes 3D space to make individual piece of data visible in the user’s line of sight. In this paper, the data structure in FPX-G is designed as a graph, and the data exploration process is modeled as graph traversal. To utilize the capability of VR, FPX-G provides a first person view-based interface from which users can look at individual pieces of data and can walk through the data (like walking in a library). In addition to the walking mechanism, to deal with limited physical space in a room, FPX-G introduces eye-tracking technology for traversing data through a graph. A simulation-based evaluation reveals that FPX-G provides a significantly efficient interface for exploring data compared with the traditional 2D interface.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130351851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic Detection for Video Stream based on Geographical Relationships and its Interactive Viewing System","authors":"Itsuki Hashimoto, Yuanyuan Wang, Yukiko Kawai, K. Sumiya","doi":"10.1109/MIPR51284.2021.00012","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00012","url":null,"abstract":"During the recent years of Internet TV spread, researches on recommending relevant information for TV programs have been actively conducted. The NHK’s Hybridcast provides a service that recommends relevant information on the same screen during the broadcast of a TV program. However, there is currently no service for recommending supplementary information based on the users’ viewing behavior. Based on this research background, we first extract geographic words (location names) and topics of each scene using closed captions of TV programs. Next, we analyze the user’s viewing behavior to extract the scenes selected by the user in the sequence. After that, we can detect the topics of the user’s selected scenes. Therefore, the supplementary information is recommended by generating queries based on geographical relationships using geographical words and topics. In this paper, we discuss our proposed system for supporting interactive viewing of TV programs, which is based on the viewing behavior of users and geographic relationships.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128080820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the Spatial-Visual Locality of Geo-tagged Urban Street Images","authors":"Abdullah Alfarrarjeh, Xiao Yang, A. A. Jabal, S. H. Kim, C. Shahabi","doi":"10.1109/MIPR51284.2021.00023","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00023","url":null,"abstract":"Urban street images have a unique property as they capture visual scenes that are distinctive to their geo-graphical regions. Such images are similar to their neighboring ones while dissimilar to faraway images. We refer to this characteristic of images as the spatial visual locality or the spatial locality of similar visual features. This study focuses on geo-tagged urban street images and hypothesizes that those images demonstrate a local similarity in a certain region but a dissimilarity across different regions, and provides different analysis methods to validate the hypothesis. The paper also evaluates the correctness of the hypothesis using three real geo-tagged street images collected from the Google Street View. Our experimental results demonstrate a high locality of similar visual features among urban street images.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115635499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dance to Music: Generative Choreography with Music using Mixture Density Networks","authors":"Rongfeng Li, Meng Zhao, Xianlin Zhang, Xueming Li","doi":"10.1109/MIPR51284.2021.00065","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00065","url":null,"abstract":"Choreography is usually done by professional choreographers, while the development of motion capture technology and artificial intelligence has made it possible for computers to choreograph with music. There are two main challenges in choreography: 1) how to get real and novel dance moves without relying on motion capture and manual production, and 2) how to use the appropriate music and motion features and matching algorithms to enhance the synchronization of music and dance. Focusing on these two targets above, we propose a framework based on Mixture Density Network (MDN) to synthesis dances that match the target music. The framework includes three steps: motion generation, motion screening and feature matching. In order to make the dance movements generated by the model applicable for choreography with music, we propose a parameter control algorithm and a coherence-based motion screening algorithm to improve the consistency of dance movements. Moreover, to achieve better unity of music and motions, we propose a multi-level music and motion feature matching algorithm, which combines global feature matching with local feature matching. Finally, our framework proved to be able to synthesis more coherent and creative choreography with music.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115766441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmented Tai-Chi Chuan Practice Tool with Pose Evaluation","authors":"Y. Jan, Kuan-Wei Tseng, Peng-Yuan Kao, Y. Hung","doi":"10.1109/MIPR51284.2021.00013","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00013","url":null,"abstract":"Tai Chi Chuan (TCC) is a well-known Chinese martial art that promotes health. In addition to learning TCC from a coach in a classroom setting, learners usually use books or videos to practice on their own. However, since turning is a frequent movement in TCC, learners cannot watch a tutorial and practice TCC at the same time. Furthermore, it is difficult for users to determine whether their postures are correct. We propose an augmented reality TCC practice tool with pose evaluation to help people practice TCC on their own. The tool consists of an optical see-through head-mounted display, external cameras, digital compasses, and a server. Users learn TCC movements from surrounding virtual coaches in augmented reality and determine whether their postures are correct via an evaluation module. Study results show that the proposed tool provides a helpful learning environment for TCC and that the pose estimation and evaluation are robust and reliable.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115003978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Machine Translation Enhancement by Fusing Multimodal-attention and Fine-grained Image Features","authors":"Lin Li, Turghun Tayir","doi":"10.1109/MIPR51284.2021.00050","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00050","url":null,"abstract":"With recent development of the multimodal machine translation (MMT) network architectures, recurrent models have effectively been replaced by attention mechanism and the translation results have been enhanced with the assistance of fine-grained image information. Although attention is a powerful and ubiquitous mechanism, different number of attention heads and granularity image features aligned by attention have an impact on the quality of multimodal machine translation. In order to address above problems, this paper proposes a multimodal machine translation enhancement by fusing multimodal-attention and fine-grained image features method which builds some submodels by introducing different granularity of image features to the multimodal-attention mechanism with different number of heads. Moreover, these sub-models are randomly fused and fusion models are obtained. The experimental results on the Multi30k dataset that the pruned attention heads lead to the improvement of translation results. Finally, our fusion model obtained the best results according to the automatic evaluation metrics BLEU compared with sub-models and some baselines.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126693184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Layout Structure Assisted Indoor Image Generation","authors":"Zhijie Qin, Wei Zhong, Fei Hu, Xinyan Yang, Long Ye, Qin Zhang","doi":"10.1109/MIPR51284.2021.00061","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00061","url":null,"abstract":"The existing methods can generate images in accord with scene graph, but the obtained images may appear blurs at the edges and disorders in the structure, due to the lacks of the structure information. In this paper, by considering the indoor images contain more layout structures than outdoor ones, we focus on the indoor image generation assisted with the layout structures. In the proposed method, through fusing the scene graph features together with the layout structure, the graph convolutional network is employed to convert the fused semantic information into the feature representation of scenes. Subsequently, a refined encoder-decoder network is also used for generating the final images. In the experiments, we compare the proposed method with the existing works on the indoor image dataset in terms of subjective and objective evaluations. The experimental results show that our method can achieve better IoU metric, and the visualized results also illustrate that the proposed approach can generate more clear indoor images with better layout structures.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125900744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text Style Transfer With Decorative Elements","authors":"Yuting Ma, Fan Tang, Weiming Dong, Changsheng Xu","doi":"10.1109/MIPR51284.2021.00062","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00062","url":null,"abstract":"The text rendered by special effects can give a rich visual experience. Text stylization can help users migrate their favorite styles to specified texts, improving production efficiency and saving design cost. This paper proposes a novel text stylization framework, which can transfer mixed text styles, including font glyph and fine decorations, to user-specified texts. The transfer of decorative elements is difficult due to the text is obscured by decorative elements to a certain extent. Our method is divided into three stages: firstly, the position of decorative elements in the image is extracted and retained; secondly, the effects of font glyph and textures other than decorative elements are migrated; finally, a structure-aware strategy is used to reorganize the decorative elements to complete the entire stylization process. Experiments on open source text data sets demonstrated the advantages of our approach over other state- of-the-art style migration methods.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126036887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Trajectories via Sparse Auto-encoders","authors":"Xiaofeng Wu, Rui Zhang, Lin Li","doi":"10.1109/MIPR51284.2021.00049","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00049","url":null,"abstract":"With the development of satellite navigation, communication and positioning technology, more and more trajectory data are collected and stored. Exploring such trajectory data can help us understand human mobility. A typical task of group-level mobility modeling is trajectory clustering. However, trajectories usually vary in length and shape, also contain noises. These exert a negative influence on trajectory representation and thus hinder trajectory clustering. Therefore, this paper proposes a U-type robust sparse autoencoder model(uRSAA), which is robust against noise and form variety. Specifically, a sparsity penalty is applied to constrain the output to decrease the effect of noise. By introducing skip connections, our model can strengthen the data exchange and preserve the information. Experiments are conducted on both synthetic datasets and real datasets, and the results show that our model outperforms the existing models.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128426284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}