2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR): Latest Articles

Multi-Style Transfer Generative Adversarial Network for Text Images
Honghui Yuan, Keiji Yanai
DOI: https://doi.org/10.1109/MIPR51284.2021.00017 (published 2021-09-01)
Abstract: In recent years, neural style transfer has shown impressive results in deep learning. In particular, for text style transfer, recent research has successfully achieved the transition from the text font domain to the text style domain. However, multi-style transfer often requires training many models, and generating text images in multiple styles with a single model remains an unsolved problem. In this paper, we propose a multi-style transformation network for text style transfer that can generate multiple styles of text images with a single model and control the style of the text in a simple way. The main idea is to add conditions to the transfer network so that all styles can be trained effectively within it, and to control the generation of each text style through those conditions. We also optimize the network so that the conditional information is transmitted effectively. The advantage of the proposed network is that multiple text styles can be generated with only one model and that the generation of text styles can be controlled. We have tested the proposed network on a large number of texts and demonstrated that it works well when generating multiple styles of text at the same time.
Citations: 2
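The abstract's core idea is conditioning a single transfer network on a style code so one model covers all styles. The paper does not specify how the condition enters the network; a common approach, sketched here as an assumption, is to broadcast a one-hot style code over the spatial grid and concatenate it to the content feature map:

```python
import numpy as np

def style_condition_input(content_feat, style_id, num_styles):
    """Broadcast a one-hot style code over the spatial grid and
    concatenate it to a channels-first content feature map, so a
    single generator can be steered between styles."""
    c, h, w = content_feat.shape
    onehot = np.zeros(num_styles, dtype=content_feat.dtype)
    onehot[style_id] = 1.0
    # (num_styles, 1, 1) -> (num_styles, h, w) constant planes
    cond_planes = np.broadcast_to(onehot[:, None, None], (num_styles, h, w))
    return np.concatenate([content_feat, cond_planes], axis=0)

# With 4 content channels and 3 styles, selecting style 1 yields a
# (4 + 3, h, w) tensor whose style-1 plane is all ones.
feat = np.zeros((4, 2, 2))
conditioned = style_condition_input(feat, 1, 3)
```

In an actual GAN the concatenated tensor would feed the generator's convolutional stack; the one-hot encoding is illustrative, not taken from the paper.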
Transformer based Neural Network for Fine-Grained Classification of Vehicle Color
Yingjin Wang, Chuanming Wang, Yuchao Zheng, Huiyuan Fu, Huadong Ma
DOI: https://doi.org/10.1109/MIPR51284.2021.00025 (published 2021-09-01)
Abstract: The development of vehicle color recognition technology is of great significance for vehicle identification and intelligent transportation systems. However, the small variety of colors and the influence of illumination in the environment make fine-grained vehicle color recognition a challenging task. Insufficient training data and the small number of color categories in previous datasets cause low recognition accuracy and inflexibility in practical use. Meanwhile, inefficient feature learning also leads to the poor recognition performance of previous methods. Therefore, we collect a rear-shot dataset from vehicle bayonet monitoring for fine-grained vehicle color recognition. Its images can be divided into 11 main categories and 75 color subcategories according to the proposed labeling algorithm, which eliminates the influence of illumination and assigns a color annotation to each image. We propose a novel recognition model that can effectively identify vehicle colors: we interpolate a Transformer into the recognition model to enhance the feature learning capacity of conventional neural networks, and design a hierarchical loss function through in-depth analysis of the proposed dataset. We evaluate the designed recognition model on the dataset; it achieves an accuracy of 97.77%, which is superior to traditional approaches.
Citations: 0
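The abstract mentions a hierarchical loss over 11 main categories and 75 subcategories without giving its form. One plausible construction, shown purely as a sketch, is subcategory cross-entropy plus a weighted cross-entropy on main categories, where each main-category probability is the sum of its subcategories' probabilities:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def hierarchical_loss(logits, sub_label, sub_to_main, num_main, lam=0.5):
    """Cross-entropy on color subcategories plus a weighted
    cross-entropy on main categories. Main-category probabilities are
    obtained by summing the probabilities of their subcategories."""
    p_sub = softmax(logits)
    sub_ce = -np.log(p_sub[sub_label] + 1e-12)
    p_main = np.zeros(num_main)
    np.add.at(p_main, sub_to_main, p_sub)  # aggregate sub probs per main class
    main_ce = -np.log(p_main[sub_to_main[sub_label]] + 1e-12)
    return sub_ce + lam * main_ce

# Toy setup: 4 subcategories mapped onto 2 main categories.
logits = np.array([5.0, 0.0, 0.0, 0.0])
sub_to_main = np.array([0, 0, 1, 1])
loss_correct = hierarchical_loss(logits, 0, sub_to_main, 2)
loss_wrong = hierarchical_loss(logits, 2, sub_to_main, 2)
```

A label matching the confident prediction should incur a much smaller loss than a mismatched one; the weighting `lam` and the aggregation rule are assumptions, not details from the paper.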
Integrated Cloud-based System for Endangered Language Documentation and Application
Min Chen, Jignasha Borad, Mizuki Miyashita, James Randall
DOI: https://doi.org/10.1109/MIPR51284.2021.00044 (published 2021-09-01)
Abstract: Nearly half of the world's languages are considered endangered and need to be documented, analyzed, and revitalized. However, existing linguistic tools lack the accessibility to effectively analyze languages such as Blackfoot, in which relative pitch movement is significant; e.g., words with the same sound sequence convey different meanings when pitch changes. To address this issue, we present a novel form of audio analysis with a perceptual scale, and develop a consolidated, interactive toolset called MeTILDA (Melodic Transcription in Language Documentation and Analysis) to effectively capture perceived changes in pitch movement and to host other existing desktop-based linguistic tools on the cloud, enabling collaboration, data sharing, and data reuse among multiple linguistic tools.
Citations: 0
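The abstract's "perceptual scale" for relative pitch movement is not specified here. A standard choice in pitch analysis, given only as an illustrative assumption about what such a scale could look like, is to express frequency in semitones relative to a reference:

```python
import math

def hz_to_semitones(freq_hz, ref_hz=100.0):
    """Map a pitch value in Hz to semitones above a reference frequency,
    a common perceptual scale for relative pitch movement: equal ratios
    in Hz become equal distances in semitones."""
    return 12.0 * math.log2(freq_hz / ref_hz)
```

Doubling the frequency (one octave) always maps to 12 semitones, so a contour plotted on this scale reflects perceived pitch movement rather than raw Hz differences. The reference of 100 Hz is arbitrary.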
Predicting Human Behavior with Transformer Considering the Mutual Relationship between Categories and Regions
Ryoichi Osawa, Keiichi Suekane, Ryoko Nakamura, Aozora Inagaki, T. Takagi, Isshu Munemasa
DOI: https://doi.org/10.1109/MIPR51284.2021.00029 (published 2021-09-01)
Abstract: Recently, studies on human behavior have been conducted frequently, and predicting human mobility is one area of interest. It is difficult, however, since human activities result from various factors such as periodicity, changing preferences, and geographical effects; when predicting human mobility, it is essential to capture these factors. Humans may go to particular areas to visit a store of a desired category. Also, since stores of a particular category tend to open in specific areas, trajectories of visited geographical regions are helpful in understanding the purpose of visits. Therefore, the purpose of visiting stores of a desired category and that of visiting a region affect each other. Capturing this mutual dependency enables prediction with higher accuracy than modeling only the superficial trajectory sequence. Capturing it requires a mechanism that can dynamically adjust the important categories depending on the region, but conventional methods, which can only perform static operations, have structural limitations. The proposed model uses the Transformer to address this problem. However, since a default Transformer can only capture unidirectional relationships, the proposed model uses mutually connected Transformers to capture the mutual relationships between categories and regions. Furthermore, most human activities have a weekly periodicity, and it is highly possible that only part of a trajectory is important for predicting human mobility. Therefore, we propose an encoder that captures the periodicity of human mobility and an attention mechanism that extracts the important part of the trajectory. In our experiments, we predict whether a user will visit stores in specific categories and regions, taking the trajectory sequence as input. By comparing our model with existing models, we show that it outperforms state-of-the-art (SOTA) models on similar tasks in this experimental setup.
Citations: 0
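The "mutually connected Transformers" idea amounts to two streams, categories and regions, each attending to the other. The paper's exact architecture is not given here; a minimal, unparameterized sketch of one mutual update step (single head, no learned projections) could look like:

```python
import numpy as np

def softmax_rows(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(q_seq, kv_seq):
    """Scaled dot-product cross-attention: queries from one stream
    attend to keys/values taken from the other stream."""
    d = q_seq.shape[-1]
    scores = q_seq @ kv_seq.T / np.sqrt(d)
    return softmax_rows(scores) @ kv_seq

def mutual_update(cat_seq, reg_seq):
    """One mutual step: each stream is enriched (residually) with
    context from the other, as in two cross-connected branches, so
    category importance can vary with the region and vice versa."""
    new_cat = cat_seq + cross_attention(cat_seq, reg_seq)
    new_reg = reg_seq + cross_attention(reg_seq, cat_seq)
    return new_cat, new_reg

rng = np.random.default_rng(0)
cat = rng.normal(size=(5, 8))   # 5 category tokens, dim 8
reg = rng.normal(size=(7, 8))   # 7 region tokens, dim 8
new_cat, new_reg = mutual_update(cat, reg)
```

A real implementation would add learned query/key/value projections, multiple heads, and layer normalization; only the bidirectional wiring is the point here.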
Kyoto Sightseeing Map 2.0 for User-Experience Oriented Tourism
Jing Xu, Junjie Sun, Taishan Li, Qiang Ma
DOI: https://doi.org/10.1109/MIPR51284.2021.00045 (published 2021-09-01)
Abstract: We present Kyoto Sightseeing Map 2.0, a web-based application for user-experience-oriented tourism that discovers and explores sightseeing resources from User-Generated Content (UGC). It applies large-scale content analysis to UGC in order to give travelers an additional, experience-based source of information during their search process. It narrows the information gap around sightseeing resources, especially Points of Interest (POIs), left by maps that governments or tourism firms provide for publicity and marketing. On the one hand, Kyoto Sightseeing Map 2.0 offers tourists aesthetic-quality scores, computed by aesthetics quality assessment (AQA) with Multi-level Spatially-Pooled (MLSP) features, for UGC photos taken at tourist spots in Kyoto over time. On the other hand, users can consult two sets of POI photos generated from user data and displayed on the map as a reference. Our application helps travelers make well-informed decisions about their trip based on UGC.
Citations: 1
Socially Aware Multimodal Deep Neural Networks for Fake News Classification
Saed Rezayi, Saber Soleymani, H. Arabnia, Sheng Li
DOI: https://doi.org/10.1109/MIPR51284.2021.00048 (published 2021-09-01)
Abstract: The importance of fake news detection and classification on Online Social Networks (OSNs) has recently grown and drawn attention. Training machine learning models for this task requires different types of attributes, or modalities, from the target OSN. Existing methods mainly rely on social media text, which carries rich semantic information and can roughly explain the discrepancy between normal news and multiple fake news types. However, the structural characteristics of OSNs are overlooked. This paper aims to exploit such structural characteristics to further boost fake news classification performance on OSNs. Using deep neural networks, we build a novel multimodal classifier that incorporates relaying features, textual features, and network features, concatenated with each other in a late-fusion manner. Experimental results on benchmark datasets demonstrate that our socially aware architecture outperforms existing models on fake news classification.
Citations: 3
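Late fusion, as described in the abstract, means each modality is encoded separately and the resulting feature vectors are concatenated before a shared classification head. A minimal sketch of that final stage (the feature extractors themselves are out of scope) might be:

```python
import numpy as np

def late_fusion_logits(features, weights, bias):
    """Late fusion: per-modality feature vectors (e.g. relaying, text,
    and network embeddings) are concatenated into one vector and fed
    to a single linear classification head."""
    fused = np.concatenate(features)
    return weights @ fused + bias

# Three hypothetical modality embeddings of dims 4, 3, and 2,
# fused into a 9-dim vector and mapped to 2 class logits.
relay_feat = np.ones(4)
text_feat = np.zeros(3)
graph_feat = np.ones(2)
W = np.ones((2, 9))
b = np.zeros(2)
logits = late_fusion_logits([relay_feat, text_feat, graph_feat], W, b)
```

The alternative, early fusion, would merge raw inputs before feature extraction; fusing late lets each modality keep its own specialized encoder.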
Dynamic Local Geometry Capture in 3D Point Cloud Classification
Shivanand Venkanna Sheshappanavar, C. Kambhamettu
DOI: https://doi.org/10.1109/MIPR51284.2021.00031 (published 2021-09-01)
Abstract: With the advent of PointNet, deep neural networks have become popular in point cloud analysis. PointNet's successor, PointNet++, partitions the input point cloud and recursively applies PointNet to capture local geometry. The PointNet++ model uses ball querying for local geometry capture in its set abstraction layers, and several models based on the single-scale grouping of PointNet++ continue to use ball querying with a fixed-radius ball. Due to its uniform scale in all directions, a ball lacks orientation and is ineffective at capturing complex local neighborhoods. A few recent models replace the fixed-size ball with a fixed-size ellipsoid or cuboid, but these methods are still not fully effective at capturing the varying geometry proportions of different local neighborhoods on the object surface. We propose a novel technique: an ellipsoid dynamically oriented and scaled from unique local information, to better capture the local geometry. We also propose ReducedPointNet++, a single-scale grouping model based on a single set abstraction. Our model, with dynamically oriented and scaled ellipsoid querying, achieves 92.1% classification accuracy on the ModelNet40 dataset. We achieve state-of-the-art 3D classification results on all six variants of the real-world ScanObjectNN dataset, with an accuracy of 82.0% on the most challenging variant.
Citations: 10
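The geometric test behind ellipsoid querying is straightforward even though the paper's orientation/scaling rules are not given here: project each offset onto the ellipsoid's principal axes and compare against per-axis radii. A sketch (with the axes and radii supplied by the caller, standing in for the paper's dynamically computed ones):

```python
import numpy as np

def ellipsoid_query(points, center, axes, radii):
    """Return indices of points inside an oriented ellipsoid.

    `axes` is a 3x3 orthonormal matrix whose rows are the ellipsoid's
    principal directions; `radii` holds the semi-axis lengths. A point
    is inside when sum((local_coord / radius)^2) <= 1, which reduces
    to ball querying when all radii are equal and axes = identity."""
    local = (points - center) @ axes.T          # coords in ellipsoid frame
    inside = ((local / radii) ** 2).sum(axis=1) <= 1.0
    return np.nonzero(inside)[0]

# Axis-aligned ellipsoid stretched along x: (1.5, 0, 0) is inside
# (radius 2 along x) while (0, 1.5, 0) is outside (radius 1 along y).
pts = np.array([[1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
idx = ellipsoid_query(pts, np.zeros(3), np.eye(3), np.array([2.0, 1.0, 1.0]))
```

In the paper's setting, `axes` and `radii` would be derived per neighborhood from local point statistics; that derivation is what makes the querying "dynamic."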
The Brain-Machine-Ratio Model for Designer and AI Collaboration
Ling Fan, Yifang Bao, Shuyu Gong, Sida Yan, Harry J. Wang
DOI: https://doi.org/10.1109/MIPR51284.2021.00058 (published 2021-09-01)
Abstract: Recently, artificial intelligence has been profoundly changing design practice, and the relationship between designers and applied artificial intelligence urgently needs a framework and theory to describe and measure it. This article therefore establishes the Brain-Machine-Ratio (BMR) model, which examines the collaborative relationship between designers and artificial intelligence through the ratio of human to machine labor in design work. The core approach is to model the proportion of human and AI effort in seven design tasks along the time dimension. Based on both qualitative and quantitative evaluation, we propose the concept and statistics of the Brain-Machine-Ratio model and deduce the further collaborative relationship between designers and artificial intelligence.
Citations: 0
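The abstract defines BMR as a ratio of human to machine labor measured in time across design tasks. The paper's exact statistics are not reproduced here; a trivial illustrative computation of an overall ratio from per-task minutes would be:

```python
def brain_machine_ratio(human_minutes, machine_minutes):
    """Overall Brain-Machine-Ratio: total human time divided by total
    machine time across the modeled design tasks. A value above 1.0
    means the human contributes more labor time than the machine."""
    return sum(human_minutes) / sum(machine_minutes)

# Hypothetical two-task example: 40 human-minutes vs 20 machine-minutes.
bmr = brain_machine_ratio([30, 10], [10, 10])
```

Per-task ratios, or a time-weighted breakdown over the seven tasks the paper models, would follow the same arithmetic.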
An Introduction to the JPEG Fake Media Initiative
F. Temmermans, Deepayan Bhowmik, Fernando Pereira, T. Ebrahimi
DOI: https://doi.org/10.1109/MIPR51284.2021.00075 (published 2021-09-01)
Abstract: Recent advances in media creation and modification make it possible to produce near-realistic media assets that are almost indistinguishable from original assets to the human eye. These developments open opportunities for the creative production of new media in the entertainment and art industries. However, the intentional or unintentional spread of manipulated media, i.e., media modified with the intention to induce misinterpretation, also poses risks such as social unrest, the spread of rumours for political gain, or the encouragement of hate crimes. The clear and transparent annotation of media modifications is considered a crucial element in many usage scenarios, bringing trust to users. This has already prompted various organizations to develop mechanisms that can detect and annotate modified media assets when they are shared. However, these annotations should be attached to the media in a secure way to prevent them from being compromised. In addition, wide adoption of such an annotation ecosystem requires interoperability, which clearly calls for a standard. This paper presents an initiative by the JPEG Committee called JPEG Fake Media, whose scope is the creation of a standard that facilitates secure and reliable annotation of media asset creation and modification. The standard shall support usage scenarios in good faith as well as those with malicious intent. This paper gives an overview of the current state of the initiative and introduces the use cases and requirements identified so far.
Citations: 0
Dynamic Topic-Enhanced Memory Networks: Time-series Behavior Prediction based on Changing Intrinsic Consciousnesses
Ryoko Nakamura, Hirofumi Sano, Aozora Inagaki, Ryoichi Osawa, T. Takagi, Isshu Munemasa
DOI: https://doi.org/10.1109/MIPR51284.2021.00035 (published 2021-09-01)
Abstract: In the field of behavior prediction, methods have been developed to predict a user's state from their previous state or from a time series of recorded behavior histories. So far, however, there has been no effort to capture time series that reflect users' intrinsic consciousnesses and the changes therein. Here, we propose a model that captures changes in a user's intrinsic consciousnesses, called Dynamic Topic-Enhanced Memory Networks (DTEMN), for location-based advertising. In comparative experiments, we used DTEMN to predict places users will visit in the future. The results show that capturing changes in intrinsic consciousnesses with DTEMN is effective in improving prediction performance. In addition, we show an improvement in interpretability when simultaneously learning topics expressed as multiple intrinsic consciousnesses.
Citations: 0