{"title":"Supplementing Omitted Named Entities in Cooking Procedural Text with Attached Images","authors":"Yixin Zhang, Yoko Yamakata, Keishi Tajima","doi":"10.1109/MIPR51284.2021.00037","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00037","url":null,"abstract":"In this research, we aim to supplement named entities, such as food, that are omitted in the procedural text of recipe data. This helps users understand the recipe and is also necessary for machines to understand recipe data automatically. The contributions of this research are as follows. (1) We construct a dataset of Chinese recipes consisting of 12,548 recipes. To detect sentences in which food entities are omitted, we label named entities such as food, tool, and cooking actions in the procedural text by using the automatic recipe named entity recognition method. (2) We propose a method of recognizing food from the attached images. A procedural text of recipe data is often associated with an image, and the attached image often contains the food even when it is omitted in the procedural text. Tool entities in images in recipe data can be identified with high accuracy by conventional general object recognition techniques. On the other hand, the general object recognition methods in the literature, which assume that the properties of an object are constant, do not perform well for food in recipe image data because food states change during cooking procedures. To solve this problem, we propose a method of obtaining food entity candidates from other steps that are similar to the target step, both in sentence similarity and image feature similarity. Among all the 246,195 procedural steps in our dataset, there are 16,593 steps in which the food entity is omitted in the procedural text. Our method is applied to supplement the food entities in these steps and achieves an accuracy of 67.55%.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130162051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Scale Context Interaction Learning network for Medical Image Segmentation","authors":"Wenhao Fang, X. Han, Xu Qiao, Huiyan Jiang, Yenwei Chen","doi":"10.1109/MIPR51284.2021.00036","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00036","url":null,"abstract":"Semantic segmentation methods based on deep learning have achieved state-of-the-art performance in recent years. Based on deep learning, many Convolutional Neural Network (CNN) models have been proposed. Among them, U-Net, with its simple encoder-decoder structure, can learn multi-scale features with various context information and has become one of the most popular neural network architectures for medical image segmentation. To reuse the features with detailed image structure in the encoder path, U-Net uses a skip-connection structure to simply copy the low-level features in the encoder to the decoder, and therefore cannot explore the correlations between the two paths or across different scales. This study proposes a multi-scale context interaction learning network (MCIU-net) for medical image segmentation. First, to effectively fuse the features with detailed structure in the encoder path and the more semantic information in the decoder path, we conduct interaction learning on the corresponding scale via the bi-directional ConvLSTM (BConvLSTM) unit. Second, interaction learning among all blocks of the decoder path is also employed to dynamically merge multi-scale contexts. We validate our proposed interaction learning network on three medical image datasets: retinal blood vessel segmentation, skin lesion segmentation, and lung segmentation, and demonstrate promising results compared with the state-of-the-art methods.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126720384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practice-Oriented Real-time Person Occurrence Search System","authors":"S. Yamazaki, Hui Lam Ong, Jianquan Liu, Wei Jian Peh, Hong Yen Ong, Qinyu Huang, Xinlai Jiang","doi":"10.1109/MIPR51284.2021.00040","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00040","url":null,"abstract":"Face recognition is a promising technology for realizing a Person Occurrence Search (POS) application, which retrieves all occurrences of a target person over multiple cameras. From the industry perspective, such a POS application requires a practice-oriented system that can respond to search requests in seconds, return search results with almost no false positives, and handle variations in face angle and illumination across camera views. In this paper, we demonstrate a real-time person occurrence search system that adopts person re-identification for person occurrence tracking to achieve extremely low false positives. Our proposed system performs face detection and face clustering in an online manner to drastically reduce the response time of search requests from users. To retrieve person occurrence count and duration quickly, we design a process called Logical Occurrences that utilizes the maximum interval of detected time of faces to efficiently compute the occurrence count. Such a process can reduce the online computational complexity from O(M^2) to O(M) by pre-computing elapsed time during the online face clustering. The proposed system is evaluated on a real data set which contains about 1 million detected faces for search. In the experiments, our system responds to search requests within 2 seconds on average, and achieves 99.9% precision of search results over more than 200 actual search requests.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116847436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Violent Scene Detection of Film Videos Based on Multi-Task Learning of Temporal-Spatial Features","authors":"Z. Zheng, Wei Zhong, Long Ye, Li Fang, Qin Zhang","doi":"10.1109/MIPR51284.2021.00067","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00067","url":null,"abstract":"In this paper, we propose a new framework for violent scene detection in film videos based on multi-task learning of temporal-spatial features. In the proposed framework, for the violent behavior representation of film clips, we employ a temporal excitation and aggregation network to extract temporal-spatial deep features in the visual modality. On the other hand, a recurrent neural network with local attention is used to extract the utterance-level representation for affective analysis in the audio modality. In the process of feature mapping, we consider the task of violent scene detection together with that of affective analysis and then propose a multi-task learning strategy to effectively predict the violent scenes of film clips. To evaluate the effectiveness of the proposed method, experiments are conducted on the Violent Scenes Detection 2015 task. The experimental results show that our model outperforms most of the state-of-the-art methods, which validates the benefit of considering the task of violent scene detection jointly with violence emotion analysis.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116775365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardness Prediction for More Reliable Attribute-based Person Re-identification","authors":"Lucas Florin, Andreas Specker, Arne Schumann, J. Beyerer","doi":"10.1109/MIPR51284.2021.00077","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00077","url":null,"abstract":"Recognition of person attributes in surveillance camera imagery is often used as an auxiliary cue in person re-identification approaches. Additionally, increasingly more attention is being paid to the cross-modal task of person re-identification based purely on attribute queries. In both of these settings, the reliability of attribute predictions is crucial for success. However, the task of attribute recognition is affected by several non-trivial challenges. These include common aspects, such as degraded image quality due to low resolution, motion blur, lighting conditions and similar factors. Another important factor in the context of attribute recognition, however, is the lack of visibility due to occlusion by scene objects, other persons or self-occlusion, or simply due to mis-cropped person detections. All these factors make attribute prediction challenging and the resulting detections anything but reliable. In order to improve their applicability to person re-identification, we propose to apply hardness prediction models and provide an additional hardness score with each attribute that measures the likelihood that the actual prediction is reliable. We investigate several key aspects of hardness prediction in the context of attribute recognition and compare our resulting hardness predictor to several alternatives. Finally, we incorporate the hardness prediction into an attribute-based re-identification task and show improvements in the resulting accuracy. Our code is available at https://github.com/Lucas-Florin/hardness-predictor-for-par.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125256411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-domain Person Re-Identification with Identity-preserving Style Transfer","authors":"Shixing Chen, Caojin Zhang, Mingtao Dong, Chengcui Zhang","doi":"10.1109/MIPR51284.2021.00008","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00008","url":null,"abstract":"Although great successes have been achieved recently in person re-identification (re-ID), there are still two major obstacles restricting its real-world performance: a large variety of camera styles and a limited number of samples for each identity. In this paper, we propose an efficient and scalable framework for cross-domain re-ID tasks. Single-model style transfer and pairwise comparison are seamlessly integrated in our framework through adversarial training. Moreover, we propose a novel identity-preserving loss to replace the content loss in style transfer and mathematically show that its minimization guarantees that the generated images have the same conditional distributions (conditioned on identity) as the real ones, which is critical for cross-domain person re-ID. Our model achieved state-of-the-art results in challenging cross-domain re-ID tasks.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126068778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Development of an Intelligent Pet-Type Quadruped Robot","authors":"Feng Gao, Chengjia Lei, Xingguo Long, Jin Wang, Peiheng Song","doi":"10.1109/MIPR51284.2021.00068","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00068","url":null,"abstract":"Inspired by the assistance that artificial intelligence offers to artistic creation, we apply AI technology to create the Open Monster C class number 01 (OM-C01), a quadruped robot dog as lifelike as an artwork. OM-C01 adopts a 2-DoF five-bar parallel mechanism to realize the thigh and shank bionic structure. We combine a visual learning system based on few-shot learning and incremental learning with the GPT-2 pre-trained language model to endow OM-C01 with the same learning ability as a pet. OM-C01 can make decisions based on facial expressions as well as its own emotional state, and shapes a unique personality by updating its Q-table. Meanwhile, we implement a digital twin simulation environment for OM-C01 based on .NET WPF, which is convenient for designing various actions.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"334 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123183775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of Walkability on Rental Prices in Tokyo","authors":"A. Bramson, Megumi Hori","doi":"10.1109/MIPR51284.2021.00054","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00054","url":null,"abstract":"In order to measure the role of walkability in determining the perceived quality of an area, and also to determine which kinds of amenities contribute the most to enhancing walkability, we perform a hedonic regression of rental prices on 23 categories of establishments within various walking ranges from each station in central Tokyo. Using an integrated walking network, we collect the reachable nodes within various isochrones (<5min, <10min, <15min, 5-10min, 10-15min) from each station, and then, by buffering the traversed edges, we identify the reachable stores for each one. We also collect selected similar rental properties within 15 minutes of each station to estimate variations in value for each area. Our regression model aims to uncover how much of the price variation can be explained by walkability, and also which kinds of establishments contribute the most to walkability’s benefit. We find that the number of convenience stores is a reliable indicator of neighborhood quality, but the relationships of other establishments to walkability depend on distance from the station and often have counter-intuitive effects.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130051359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart Portable Musical Simulation System Based on Unified Temperament","authors":"Lin Gan, Li Lv, Cuicui Wang, Mu Zhang","doi":"10.1109/MIPR51284.2021.00069","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00069","url":null,"abstract":"This study builds a digital system for a portable musical instrument based on Unified Temperament. The system uses Equal-temperament, which integrates different modes of playing on the Musical Pad. With this visualized and digitalized system, people without musical training will be able to give a musical performance. The Musical Pad simulates different musical instruments, including keyboard, woodwind, string, and other orchestral instruments. Therefore, music lovers can cooperate to play a variety of parts in polyphonic music. The system is suitable for general music education for non-arts students in primary and middle schools. With this new form of music teaching and appreciation, students can participate more actively.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134455874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A probabilistic and random method for the generation of Bai nationality music fragments","authors":"Pengcheng Shang, Shan Ni, Li Zhou","doi":"10.1109/MIPR51284.2021.00057","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00057","url":null,"abstract":"Based on the theory of Chinese folk music, this paper analyzes the characteristics of Chinese Bai nationality music works, applies probabilistic and random methods to generate music fragments with Bai nationality style, and conducts expert interviews on the generated melodies. The interview results show that, to some extent, the generation method of Bai nationality music fragments based on probability and randomness is effective for creating melodies with Bai nationality style, consistent with the characteristics of Bai nationality music. This method can also serve as a reference for the intelligent protection and inheritance of Chinese folk music.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130827563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}