{"title":"Multi-Style Transfer Generative Adversarial Network for Text Images","authors":"Honghui Yuan, Keiji Yanai","doi":"10.1109/MIPR51284.2021.00017","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00017","url":null,"abstract":"In recent years, neural style transfer has shown impressive results in deep learning. In particular, for text style transfer, recent research has successfully completed the transition from the text font domain to the text style domain. However, for text style transfer, multiple style transfer often requires learning many models, and generating multiple styles of text images in a single model remains an unsolved problem. In this paper, we propose a multiple style transformation network for text style transfer, which can generate multiple styles of text images in a single model and control the style of texts in a simple way. The main idea is to add conditions to the transfer network so that all the styles can be trained effectively in the network, and to control the generation of each text style through the conditions. We also optimize the network so that the conditional information can be transmitted effectively in the network. The advantage of the proposed network is that multiple styles of text can be generated with only one model and that it is possible to control the generation of text styles. We have tested the proposed network on a large number of texts, and have demonstrated that it works well when generating multiple styles of text at the same time.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116114927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformer based Neural Network for Fine-Grained Classification of Vehicle Color","authors":"Yingjin Wang, Chuanming Wang, Yuchao Zheng, Huiyuan Fu, Huadong Ma","doi":"10.1109/MIPR51284.2021.00025","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00025","url":null,"abstract":"The development of vehicle color recognition technology is of great significance for vehicle identification and the development of the intelligent transportation system. However, the small variety of colors and the influence of the illumination in the environment make fine-grained vehicle color recognition a challenging task. Insufficient training data and few color categories in previous datasets cause low recognition accuracy and inflexibility in practical use. Meanwhile, inefficient feature learning also leads to poor recognition performance in previous methods. Therefore, we collect a rear shooting dataset from vehicle bayonet monitoring for fine-grained vehicle color recognition. Its images can be divided into 11 main categories and 75 color subcategories according to the proposed labeling algorithm, which can eliminate the influence of illumination and assign the color annotation for each image. We propose a novel recognition model which can effectively identify the vehicle colors. We integrate the Transformer into the recognition model to enhance the feature learning capacity of conventional neural networks, and specially design a hierarchical loss function through in-depth analysis of the proposed dataset. We evaluate the designed recognition model on the dataset, and it achieves an accuracy of 97.77%, which is superior to the traditional approaches.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121564860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrated Cloud-based System for Endangered Language Documentation and Application","authors":"Min Chen, Jignasha Borad, Mizuki Miyashita, James Randall","doi":"10.1109/MIPR51284.2021.00044","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00044","url":null,"abstract":"Nearly half of the world's languages are considered endangered and need to be documented, analyzed, and revitalized. However, existing linguistics tools lack the accessibility to effectively analyze languages such as Blackfoot in which relative pitch movement is significant, e.g., words with the same sound sequence convey different meanings when their pitch changes. To address this issue, we present a novel form of audio analysis with perceptual scale, and develop a consolidated and interactive toolset called MeTILDA (Melodic Transcription in Language Documentation and Analysis) to effectively capture perceived changes in pitch movement and to host other existing desktop-based linguistic tools on the cloud to enable collaboration, data-sharing, and data reuse among multiple linguistic tools.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129088140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Human Behavior with Transformer Considering the Mutual Relationship between Categories and Regions","authors":"Ryoichi Osawa, Keiichi Suekane, Ryoko Nakamura, Aozora Inagaki, T. Takagi, Isshu Munemasa","doi":"10.1109/MIPR51284.2021.00029","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00029","url":null,"abstract":"Recently, studies on human behavior have been frequently conducted. Predicting human mobility is one area of interest. However, it is difficult since human activities are the result of various factors such as periodicity, changes of preferences, and geographical effects. When predicting human mobility, it is essential to capture these factors. Humans may go to particular areas to visit a store of a desired category. Also, since stores of a particular category tend to open in specific areas, trajectories of visited geographical regions are helpful in understanding the purpose of visits. Therefore, the purposes of visiting stores of a desired category and of visiting a region affect each other. Capturing this mutual dependency enables prediction with higher accuracy than modeling only the superficial trajectory sequence. To capture it, a mechanism that can dynamically adjust the important categories depending on region was necessary, but the conventional methods, which can only perform static operations, have structural limitations. In the proposed model, we used the Transformer to address this problem. However, since a default Transformer can only capture unidirectional relationships, the proposed model uses mutually connected Transformers to capture the mutual relationships between categories and regions. Furthermore, most human activities have a weekly periodicity, and it is highly possible that only a part of a trajectory is important to predict human mobility. Therefore, we propose an encoder that captures the periodicity of human mobility and an attention mechanism to extract the important part of the trajectory. In our experiments, we predict whether a user will visit stores in specific categories and regions taking the trajectory sequence as input. By comparing our model with existing models, we show that the model outperforms state-of-the-art (SOTA) models in similar tasks in this experimental setup.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129276660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kyoto Sightseeing Map 2.0 for User-Experience Oriented Tourism","authors":"Jing Xu, Junjie Sun, Taishan Li, Qiang Ma","doi":"10.1109/MIPR51284.2021.00045","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00045","url":null,"abstract":"We present Kyoto sightseeing map 2.0, a web-based application, for user-experience oriented tourism through discovering and exploring sightseeing resources from User Generated Content (UGC). It focuses on adapting the massive content analysis of information from UGC, to give an additional source of information from user experience to travelers in their information search process. It decreases and bridges the information gap of sightseeing resources, especially Points of Interest (POIs), caused by the map provided by government or tourism firms from the perspective of publicity and marketing. On the one hand, Kyoto sightseeing map 2.0 offers the aesthetics quality results of photos taken in tourist spots over time in Kyoto based on the UGC by aesthetics quality assessment (AQA) with Multi-level Spatially-Pooled (MLSP) to tourists. On the other hand, the user can also use two sets of POI photos generated by the user data displayed on the map as a reference. Our application, for user-experience oriented tourism, helps travelers make well-informed decisions about their trip based on UGC.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129277787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Socially Aware Multimodal Deep Neural Networks for Fake News Classification","authors":"Saed Rezayi, Saber Soleymani, H. Arabnia, Sheng Li","doi":"10.1109/MIPR51284.2021.00048","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00048","url":null,"abstract":"The importance of fake news detection and classification on Online Social Networks (OSN) has recently increased and drawn attention. Training machine learning models for this task requires different types of attributes or modalities for the target OSN. Existing methods mainly rely on social media text, which carries rich semantic information and can roughly explain the discrepancy between normal and multiple fake news types. However, the structural characteristics of OSNs are overlooked. This paper aims to exploit such structural characteristics and further boost the fake news classification performance on OSN. Using deep neural networks, we build a novel multimodal classifier that incorporates relaying features, textual features, and network feature concatenated with each other in a late fusion manner. Experimental results on benchmark datasets demonstrate that our socially aware architecture outperforms existing models on fake news classification.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129773955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Local Geometry Capture in 3D Point Cloud Classification","authors":"Shivanand Venkanna Sheshappanavar, C. Kambhamettu","doi":"10.1109/MIPR51284.2021.00031","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00031","url":null,"abstract":"With the advent of PointNet, the popularity of deep neural networks has increased in point cloud analysis. PointNet’s successor, PointNet++, partitions the input point cloud and recursively applies PointNet to capture local geometry. The PointNet++ model uses ball querying for local geometry capture in its set abstraction layers. Several models based on single scale grouping of PointNet++ continue to use ball querying with a fixed-radius ball. Due to its uniform scale in all directions, a ball lacks orientation and is ineffective in capturing complex local neighborhoods. A few recent models replace a fixed-sized ball with a fixed-sized ellipsoid or a fixed-sized cuboid to capture local neighborhoods. However, these methods are still not fully effective in capturing varying geometry proportions from different local neighborhoods on the object surface. We propose a novel technique of dynamically oriented and scaled ellipsoid based on unique local information to capture the local geometry better. We also propose ReducedPointNet++, a single set abstraction based single scale grouping model. Our model, along with dynamically oriented and scaled ellipsoid querying, achieves 92.1% classification accuracy on the ModelNet40 dataset. We achieve state-of-the-art 3D classification results on all six variants of the real-world ScanObjectNN dataset with an accuracy of 82.0% on the most challenging variant.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"383 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134147919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Brain-Machine-Ratio Model for Designer and AI Collaboration","authors":"Ling Fan, Yifang Bao, Shuyu Gong, Sida Yan, Harry J. Wang","doi":"10.1109/MIPR51284.2021.00058","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00058","url":null,"abstract":"Recently, artificial intelligence is profoundly changing design practice. The relationship between designers and applied artificial intelligence urgently needs a framework and theory to describe and measure. Thus, this article establishes the Brain-Machine-Ratio (BMR) model, which examines the collaborative relationship between the designers and artificial intelligence with the ratio of human and machine labor in the process of design work. The core approach is modeling the proportion of human and AI in seven design tasks on the time dimension. Based on both qualitative and quantitative evaluation, we proposed the concept and statistics of the Brain-Machine-Ratio model and deduced the further collaborative relationship between designers and artificial intelligence.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129736961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Introduction to the JPEG Fake Media Initiative","authors":"F. Temmermans, Deepayan Bhowmik, Fernando Pereira, T. Ebrahimi","doi":"10.1109/MIPR51284.2021.00075","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00075","url":null,"abstract":"Recent advances in media creation and modification allow the production of near-realistic media assets that are almost indistinguishable from original assets to the human eye. These developments open opportunities for creative production of new media in the entertainment and art industry. However, the intentional or unintentional spread of manipulated media, i.e., modified media with the intention to induce misinterpretation, also poses risks such as social unrest, spread of rumours for political gain or encouraging hate crimes. The clear and transparent annotation of media modifications is considered to be a crucial element in many usage scenarios bringing trust to the users. This has already triggered various organizations to develop mechanisms that can detect and annotate modified media assets when they are shared. However, these annotations should be attached to the media in a secure way to prevent them from being compromised. In addition, to achieve a wide adoption of such an annotation ecosystem, interoperability is essential and this clearly calls for a standard. This paper presents an initiative by the JPEG Committee called JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall support usage scenarios that are in good faith as well as those with malicious intent. This paper gives an overview of the current state of this initiative and introduces already identified use cases and requirements.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114900662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Topic-Enhanced Memory Networks: Time-series Behavior Prediction based on Changing Intrinsic Consciousnesses","authors":"Ryoko Nakamura, Hirofumi Sano, Aozora Inagaki, Ryoichi Osawa, T. Takagi, Isshu Munemasa","doi":"10.1109/MIPR51284.2021.00035","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00035","url":null,"abstract":"In the field of behavior prediction, methods have been developed to predict the state of the user by using the previous state or time-series of recorded behavior histories. However, so far, there has been no effort to capture time series reflecting the intrinsic consciousnesses and changes thereof of users. Here, we propose a model that captures changes in intrinsic consciousnesses of the user, called Dynamic Topic-Enhanced Memory Networks (DTEMN), for location-based advertising. In comparative experiments, we used DTEMN to predict places where users will visit in the future. The results show capturing changes in intrinsic consciousnesses using DTEMN is effective in improving prediction performance. In addition, we show an improvement in interpretability when simultaneously learning topics expressed as multiple intrinsic consciousnesses.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"90 30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129849069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}