2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR): Latest Papers, Page 4

Multi-Style Transfer Generative Adversarial Network for Text Images

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2021-09-01 DOI: 10.1109/MIPR51284.2021.00017

Honghui Yuan, Keiji Yanai

Abstract: In recent years, neural style transfer has shown impressive results in deep learning. In particular, recent research on text style transfer has successfully completed the transition from the text font domain to the text style domain. However, transferring multiple styles often requires training many models, and generating text images in multiple styles with a single model remains an unsolved problem. In this paper, we propose a multiple style transformation network for text style transfer, which can generate multiple styles of text images in a single model and control the style of texts in a simple way. The main idea is to add conditions to the transfer network so that all the styles can be trained effectively in the network, and to control the generation of each text style through the conditions. We also optimize the network so that the conditional information can be transmitted effectively in the network. The advantage of the proposed network is that multiple styles of text can be generated with only one model and that the generation of text styles can be controlled. We have tested the proposed network on a large number of texts and have demonstrated that it works well when generating multiple styles of text at the same time.
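Illustrative sketch (not taken from the paper): the abstract says conditions are added to the transfer network but does not say how. One common way to condition a generator, assumed here, is to broadcast a one-hot style vector over the spatial grid and concatenate it to the feature map as extra channels. The names `one_hot`, `concat_condition`, `style_index`, and `num_styles` are hypothetical.

```python
def one_hot(index, num_classes):
    """Return a one-hot list of length num_classes with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def concat_condition(feature_map, style_index, num_styles):
    """Append one constant channel per style to a feature map.

    feature_map is a list of channels, each an H x W nested list.
    This is an assumed conditioning scheme, not the authors' design.
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    # Each style becomes an H x W channel filled with 0.0 or 1.0.
    cond_channels = [
        [[value] * width for _ in range(height)]
        for value in one_hot(style_index, num_styles)
    ]
    return feature_map + cond_channels


features = [[[0.1, 0.2], [0.3, 0.4]]]  # 1 channel, 2 x 2
conditioned = concat_condition(features, style_index=2, num_styles=3)
```

With three styles, the single-channel input gains three condition channels, and only the channel for the selected style is filled with ones. Changing `style_index` at inference time is what would select the output style under this assumption.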

Citations: 2

Transformer based Neural Network for Fine-Grained Classification of Vehicle Color

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2021-09-01 DOI: 10.1109/MIPR51284.2021.00025

Yingjin Wang, Chuanming Wang, Yuchao Zheng, Huiyuan Fu, Huadong Ma

Abstract: The development of vehicle color recognition technology is of great significance for vehicle identification and the development of intelligent transportation systems. However, the small variety of colors and the influence of environmental illumination make fine-grained vehicle color recognition a challenging task. Insufficient training data and the few color categories in previous datasets cause low recognition accuracy and limit practical use. Meanwhile, inefficient feature learning also leads to poor recognition performance in previous methods. Therefore, we collect a rear shooting dataset from vehicle bayonet monitoring for fine-grained vehicle color recognition. Its images can be divided into 11 main categories and 75 color subcategories according to the proposed labeling algorithm, which can eliminate the influence of illumination and assign a color annotation to each image. We propose a novel recognition model that can effectively identify vehicle colors. We insert the Transformer into the recognition model to enhance the feature learning capacity of conventional neural networks, and design a hierarchical loss function through in-depth analysis of the proposed dataset. We evaluate the designed recognition model on the dataset, where it achieves an accuracy of 97.77%, which is superior to traditional approaches.
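Illustrative sketch (not taken from the paper): the abstract mentions a hierarchical loss over main categories and subcategories without giving its form. One plausible form, assumed here, adds a main-category term to the usual subcategory cross-entropy, where the main-category probability is the sum of its subcategories' probabilities. The function names, `sub_to_main`, and `main_weight` are hypothetical, and the toy example uses 2 main categories and 4 subcategories rather than the paper's 11 and 75.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_loss(sub_logits, sub_label, sub_to_main, main_weight=0.5):
    """Subcategory cross-entropy plus a weighted main-category term.

    sub_to_main[k] is the main category of subcategory k (assumed mapping).
    """
    probs = softmax(sub_logits)
    target_main = sub_to_main[sub_label]
    # Probability mass assigned to any subcategory of the correct main category.
    main_prob = sum(p for p, m in zip(probs, sub_to_main) if m == target_main)
    return -math.log(probs[sub_label]) - main_weight * math.log(main_prob)


sub_to_main = [0, 0, 1, 1]       # subcategories 0,1 -> main 0; 2,3 -> main 1
logits = [2.0, 1.0, 0.0, 0.0]    # the model favors subcategory 0
near_miss = hierarchical_loss(logits, 1, sub_to_main)  # truth shares main 0
far_miss = hierarchical_loss(logits, 2, sub_to_main)   # truth is in main 1
```

Here the same prediction is scored against two ground-truth labels: confusing two shades within one main color (`near_miss`) is penalized less than predicting the wrong main color altogether (`far_miss`). Setting `main_weight=0.0` reduces the function to plain cross-entropy.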

Citations: 0

Integrated Cloud-based System for Endangered Language Documentation and Application

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2021-09-01 DOI: 10.1109/MIPR51284.2021.00044

Min Chen, Jignasha Borad, Mizuki Miyashita, James Randall

Citations: 0

Predicting Human Behavior with Transformer Considering the Mutual Relationship between Categories and Regions

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2021-09-01 DOI: 10.1109/MIPR51284.2021.00029

Ryoichi Osawa, Keiichi Suekane, Ryoko Nakamura, Aozora Inagaki, T. Takagi, Isshu Munemasa

Abstract: Recently, studies on human behavior have been frequently conducted. Predicting human mobility is one area of interest. However, it is difficult because human activities are the result of various factors such as periodicity, changes of preferences, and geographical effects. When predicting human mobility, it is essential to capture these factors. Humans may go to particular areas to visit a store of a desired category. Also, since stores of a particular category tend to open in specific areas, trajectories of visited geographical regions are helpful in understanding the purpose of visits. Therefore, the purposes of visiting stores of a desired category and of visiting a region affect each other. Capturing this mutual dependency enables prediction with higher accuracy than modeling only the superficial trajectory sequence. Capturing it requires a mechanism that can dynamically adjust the important categories depending on the region, but conventional methods, which can only perform static operations, have structural limitations. In the proposed model, we use the Transformer to address this problem. However, since a default Transformer can only capture unidirectional relationships, the proposed model uses mutually connected Transformers to capture the mutual relationships between categories and regions. Furthermore, most human activities have a weekly periodicity, and it is highly possible that only a part of a trajectory is important for predicting human mobility. Therefore, we propose an encoder that captures the periodicity of human mobility and an attention mechanism to extract the important part of the trajectory. In our experiments, we predict whether a user will visit stores in specific categories and regions, taking the trajectory sequence as input. By comparing our model with existing models, we show that the model outperforms state-of-the-art (SOTA) models in similar tasks in this experimental setup.
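Illustrative sketch (not taken from the paper): the abstract describes an encoder that captures weekly periodicity but gives no details. A simple stand-in, assumed here, maps each visit timestamp to a point on a circle with a one-week period, so that visits exactly one week apart receive identical features. The name `weekly_encoding` and the hour-of-week granularity are hypothetical choices.

```python
import math
from datetime import datetime

HOURS_PER_WEEK = 7 * 24


def weekly_encoding(ts):
    """Map a datetime to (sin, cos) of its position within the week.

    Monday 00:00 maps to angle 0; the encoding repeats every 7 days.
    """
    hour_of_week = ts.weekday() * 24 + ts.hour + ts.minute / 60.0
    angle = 2.0 * math.pi * hour_of_week / HOURS_PER_WEEK
    return math.sin(angle), math.cos(angle)


first_visit = weekly_encoding(datetime(2021, 9, 1, 12, 0))
week_later = weekly_encoding(datetime(2021, 9, 8, 12, 0))
```

The two timestamps fall on the same weekday and hour, so `first_visit` and `week_later` are equal. Using both sine and cosine keeps Sunday night close to Monday morning, which a raw hour-of-week integer would not.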

Citations: 0

Kyoto Sightseeing Map 2.0 for User-Experience Oriented Tourism

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2021-09-01 DOI: 10.1109/MIPR51284.2021.00045

Jing Xu, Junjie Sun, Taishan Li, Qiang Ma

Citations: 1

Socially Aware Multimodal Deep Neural Networks for Fake News Classification

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2021-09-01 DOI: 10.1109/MIPR51284.2021.00048

Saed Rezayi, Saber Soleymani, H. Arabnia, Sheng Li

Citations: 3

Dynamic Local Geometry Capture in 3D Point Cloud Classification

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2021-09-01 DOI: 10.1109/MIPR51284.2021.00031

Shivanand Venkanna Sheshappanavar, C. Kambhamettu

Abstract: With the advent of PointNet, the popularity of deep neural networks has increased in point cloud analysis. PointNet's successor, PointNet++, partitions the input point cloud and recursively applies PointNet to capture local geometry. The PointNet++ model uses ball querying for local geometry capture in its set abstraction layers. Several models based on the single scale grouping of PointNet++ continue to use ball querying with a fixed-radius ball. Due to its uniform scale in all directions, a ball lacks orientation and is ineffective in capturing complex local neighborhoods. A few recent models replace the fixed-sized ball with a fixed-sized ellipsoid or a fixed-sized cuboid to capture local neighborhoods. However, these methods are still not fully effective in capturing varying geometry proportions from different local neighborhoods on the object surface. We propose a novel technique of dynamically oriented and scaled ellipsoids based on unique local information to capture the local geometry better. We also propose ReducedPointNet++, a single set abstraction based single scale grouping model. Our model, along with dynamically oriented and scaled ellipsoid querying, achieves 92.1% classification accuracy on the ModelNet40 dataset. We achieve state-of-the-art 3D classification results on all six variants of the real-world ScanObjectNN dataset, with an accuracy of 82.0% on the most challenging variant.
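Illustrative sketch (not taken from the paper): the contrast between ball querying and ellipsoid querying can be shown with a few points. The ball test uses one radius in every direction, while the ellipsoid test projects each offset onto three axes and scales each projection by its own semi-axis length. How the paper derives the orientation and scale from local information is not stated in the abstract, so `axes` and `semi_axes` are simply passed in as hypothetical inputs.

```python
def ball_query(center, points, radius):
    """Indices of points within `radius` of `center` (same scale everywhere)."""
    r2 = radius * radius
    return [
        i for i, p in enumerate(points)
        if sum((a - b) ** 2 for a, b in zip(p, center)) <= r2
    ]


def ellipsoid_query(center, points, axes, semi_axes):
    """Indices of points inside an oriented ellipsoid around `center`.

    axes: three orthonormal direction vectors (assumed to be given).
    semi_axes: semi-axis length along each of those directions.
    """
    selected = []
    for i, p in enumerate(points):
        offset = [a - b for a, b in zip(p, center)]
        # Sum of squared, per-axis normalized projections; <= 1 means inside.
        q = sum(
            (sum(o * u for o, u in zip(offset, axis)) / length) ** 2
            for axis, length in zip(axes, semi_axes)
        )
        if q <= 1.0:
            selected.append(i)
    return selected


points = [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0), (0.0, 0.8, 0.0), (0.0, 0.0, 0.3)]
center = (0.0, 0.0, 0.0)
axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
in_ball = ball_query(center, points, 0.5)
in_ellipsoid = ellipsoid_query(center, points, axes, (1.0, 0.5, 0.5))
```

The ball of radius 0.5 misses the point at distance 0.8 along x, while an ellipsoid stretched along x picks it up without also admitting the point at 0.8 along y. That directional selectivity is the property the abstract attributes to ellipsoid querying.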

Citations: 10

The Brain-Machine-Ratio Model for Designer and AI Collaboration

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2021-09-01 DOI: 10.1109/MIPR51284.2021.00058

Ling Fan, Yifang Bao, Shuyu Gong, Sida Yan, Harry J. Wang

Citations: 0

An Introduction to the JPEG Fake Media Initiative

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2021-09-01 DOI: 10.1109/MIPR51284.2021.00075

F. Temmermans, Deepayan Bhowmik, Fernando Pereira, T. Ebrahimi

Abstract: Recent advances in media creation and modification make it possible to produce near-realistic media assets that are almost indistinguishable from original assets to the human eye. These developments open opportunities for creative production of new media in the entertainment and art industry. However, the intentional or unintentional spread of manipulated media, i.e., modified media with the intention to induce misinterpretation, also imposes risks such as social unrest, spread of rumours for political gain, or encouraging hate crimes. The clear and transparent annotation of media modifications is considered to be a crucial element in many usage scenarios, bringing trust to users. This has already triggered various organizations to develop mechanisms that can detect and annotate modified media assets when they are shared. However, these annotations should be attached to the media in a secure way to prevent them from being compromised. In addition, to achieve wide adoption of such an annotation ecosystem, interoperability is essential, and this clearly calls for a standard. This paper presents an initiative by the JPEG Committee called JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall support usage scenarios that are in good faith as well as those with malicious intent. This paper gives an overview of the current state of this initiative and introduces already identified use cases and requirements.

Citations: 0

Passenger Flow Estimation with Bipartite Matching on Bus Surveillance Cameras

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2021-09-01 DOI: 10.1109/MIPR51284.2021.00038

Shunta Komatsu, Ryosuke Furuta, Y. Taniguchi

Abstract: To formulate the schedules and routes of buses, bus companies monitor and gather data on the number of passengers and the boarding sections for each passenger several days a year. The problem, however, is that this monitoring is currently performed manually and requires a great deal of human effort. To solve this problem, recent proposals analyze the images taken by the surveillance cameras installed in most modern Japanese buses. The previous methods make it possible to identify the boarding sections regardless of the payment method (such as IC cards) by matching people in the images obtained from different surveillance cameras. In this paper, we propose an improved method for estimating boarding sections. It uses minimum weight perfect matching on a bipartite graph, under the assumption that there is a one-to-one correspondence between people appearing in two surveillance camera images. In addition, the proposed method takes into account the boarding direction estimates output by person detection and tracking. To further improve the estimation accuracy, we employ a time constraint to handle the restricted movement of passengers on a bus. To confirm the effectiveness of the proposed method, we conduct experiments on images taken by actual bus surveillance cameras. The results show that the proposed method achieves significantly better results than the previous method.
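Illustrative sketch (not taken from the paper): minimum weight perfect matching on a bipartite graph pairs every person seen by one camera with exactly one person seen by the other so that the summed pairing cost is as small as possible. The brute-force search below tries every one-to-one assignment, which is only practical for a handful of people; a real system would use a polynomial-time assignment solver. The cost values are made up, and `cost[i][j]` is a hypothetical appearance dissimilarity between person `i` in camera A and person `j` in camera B.

```python
from itertools import permutations


def min_weight_perfect_matching(cost):
    """Return (total_cost, assignment) for a square cost matrix.

    assignment[i] is the column matched to row i. Exhaustive search,
    so the running time grows factorially with the number of people.
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


cost = [
    [0.9, 0.2, 0.7],
    [0.1, 0.8, 0.6],
    [0.5, 0.4, 0.3],
]
total, assignment = min_weight_perfect_matching(cost)
```

For this matrix the best pairing matches person 0 to 1, person 1 to 0, and person 2 to 2. Greedy per-person matching can assign two people to the same candidate, which the one-to-one constraint rules out. One way to approximate a time constraint, assumed here rather than taken from the paper, would be to give implausible pairs a very large cost.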

Citations: 1