{"title":"Multi-Pose Learning based Head-Shoulder Re-identification","authors":"Jia Li, Yunpeng Zhai, Yaowei Wang, Yemin Shi, Yonghong Tian","doi":"10.1109/MIPR.2018.00057","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00057","url":null,"abstract":"The whole body of person is probably invisible in video surveillance because of occlusion and view angles (such as in crowded public places), on which occasion conventional person re-identification (i.e., whole-body based Re-ID) approaches may not work. To address this problem, we propose a novel deep pairwise model based on multi-pose learning (MPL) which aims at head-shoulder part instead of the whole body. The proposed method explicitly tackles pose variations by learning an ensemble verification conditional probability distribution about relationship among multiple poses. To facilitate the research on this problem, we contribute three head-shoulder datasets based on CUHK03, CUHK01 and VIPeR. Experiments on these datasets demonstrate that our proposed method achieves the state-of-the-art performance.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128187966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Bring Your Own Device Trend in an Oil and Gas Sector","authors":"I. Ekpo, Sheila D. Fournier-Bonilla","doi":"10.1109/MIPR.2018.00052","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00052","url":null,"abstract":"In trying to improve worker efficiencies and organizational productivity, organizations leverage on tools that promote the consumerization of Information Technology. Among these, is the growing trend of implementing Bring Your Own Device (BYOD) which allows employees to use their own devices for work related activities. Strategies and policies are put in place to guide the usage of such implementations and protect the organization from associated risks. User’s may however not find such policies user friendly and may be inclined to resist the BYOD implementation. The project studies and seeks to understand how the implementation of such BYOD policies by an Organization brings about concerning factors otherwise viewed as perceived threats that may bring about a resistance to user’s adoption of the technology. It further goes to determine to what extent such concerns influence employee’s decisions to either embrace a BYOD program or reject the BYOD program.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"13 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123682914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Thermal Images and Physiological Features to Model Human Behavior: A Survey","authors":"Christian Hessler, M. Abouelenien","doi":"10.1109/MIPR.2018.00064","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00064","url":null,"abstract":"Physiological signals provide a reliable method to identify the physical and mental state of a person at any given point in time. Multiple techniques are used to extract physiological signals from the human body. However, these techniques require contact and cooperation of the individual as well as human effort for connecting the devices and collecting the needed measurement. Thermal imaging provides a non-contact approach for acquiring these signals. New applications are exploring ways to utilize physiological features extracted from thermal images to detect subtle changes in human physiology. In this paper, we provide a review of applications, which propose a variety of innovative techniques to model human behavior by analyzing thermal videos.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128788693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Standalone Demo for Quiz Game \"Describe and Guess\"","authors":"Yikun Sheng, Xiaoshan Yang, Changsheng Xu","doi":"10.1109/MIPR.2018.00046","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00046","url":null,"abstract":"The popular quiz game \"Describe and Guess\" can only be played by at least two players which limits its utilization. In this paper, we propose a standalone demo with a multimodal interface which is much more convenient for a single player. To deduce the target object based on user-provided attributes and sketches, we propose attributeimproved neural networks (AINN) which are joint learned for both the single-label object classification and the multilabel attribute prediction. Finally, we build a real-world system which can be effectively played on mobile devices.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121621229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Logo Detection and Recognition with Minimal Labeling","authors":"D. M. Montserrat, Qian Lin, J. Allebach, E. Delp","doi":"10.1109/MIPR.2018.00034","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00034","url":null,"abstract":"In this paper we describe a new approach to detecting and locating brand logos in an image using machine learning methods and synthetic training data. Deep learning methods, particularly the use of Convolutional Neural Networks (CNN), have been very popular for extracting visual information, such as image shapes and objects, from images. A CNN has parameters and configuration information that are learned from training images. To obtain good accuracy usually a large amount of labeled (groundtruthed) images are required for training. Collecting the training images and labeling them can be expensive and time consuming. Methods that include data augmentation, image synthesis, and bootstrapping techniques provide useful alternatives to creating training images. In this paper, we present a logo detection method that requires minimum labeled images. First, we use synthetic images to train a CNN to detect logos. Then, this CNN is used to automatically detect and localize logos from images extracted from the web. Finally, these images are used to train a logo classifier. The combination of the logo detector and the classifier allows us to locate and classify multiple logos in a scene. 
While existing methods rely on manually labeled images, our method is fully trained with images obtained in an automated manner with minimal human supervision.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126906755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Coherence in Visual Explanations","authors":"Malihe Alikhani, Matthew Stone","doi":"10.1109/MIPR.2018.00063","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00063","url":null,"abstract":"A wide range of communicative artifacts—perhaps the majority—involve the coordinated presentation of visual and linguistic information. We envisage computer systems that support access to information by using rich representations of the interpretation of such multimodal presentations. This paper advocates organizing such representations in terms of coherence relations [2, 19], a fundamental construct from the theory of natural language discourse that is often invoked to explain the integrated interpretation of the diverse communicative actions in face-to-face conversation [9, 25, 35]. Coherence relations come in constrained classes, such as the Explanation, Narration and Parallel relations, each of which establishes specific kinds of structural, logical, and intentional relationships among communicative actions. Representing these relationships can therefore provide a scaffold for organizing, disambiguating and integrating the interpretation of communication across modalities. 
This paper uses a case study of instructions presented using text and pictures to motivate and describe an analysis of multimodal discourse interpretation in terms of coherence relations and to sketch a roadmap for operationalizing the approach in computer systems.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133429989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPEG CDVS Feature Trajectories for Action Recognition in Videos","authors":"R. Dasari, Chang Wen Chen","doi":"10.1109/MIPR.2018.00069","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00069","url":null,"abstract":"Visual Action Recognition on mobile phones is a challenging problem. Mobile and wearable devices deal with power, memory, computational and hardware constraints, which mandate robust and lightweight algorithmic implementations for sophisticated vision applications, like action recognition. Compact Descriptors for Visual Search (CDVS) is an MPEG7 standard for an accelerated visual search on mobiles. In our work, we propose a mobile action recognition framework which classifies actions by tracking CDVS feature trajectories of human subjects. The proposed method capitalizes on the sparse, salient and memory efficient properties of CDVS features. Although our recognition accuracies on standard action datasets KTH, UCF50, and HMDB is not superior to the CNN based methods, our work explores and proves the feasibility of using CDVS features for action recognition.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129864696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedding User Behavioral Aspect in TF-IDF Like Representation","authors":"Ligaj Pradhan, Chengcui Zhang, Steven Bethard, Xin Chen","doi":"10.1109/MIPR.2018.00061","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00061","url":null,"abstract":"Term Frequency – Inverse Document Frequency (TF-IDF) computes weight for each word in a document which increases proportionally to the number of times the word appears in a specific document but is counterbalanced by the number of times it occurs in the collection of documents. TF-IDF is the state-of-the-art for computing relevancy scores between documents. However, it is based on statistical learning alone and doesn’t directly capture the conceptual contents of the text or the behavioral aspects of the writer. Hence, in this work we show how relatively low dimensional user behavioral vectors extracted from the same text, from which TF-IDF vectors are extracted, can be used to enrich the performance of TF-IDF. We extract User-Concerns embedded in user reviews and append them to TF-IDF vectors to train a deep rating prediction model. Our experiments show that adding such conceptual knowledge to TF-IDF vectors can significantly enhance the performance of TF-IDF vectors by only adding very little complexity.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129935683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting Typical Domain Keywords from Annual Reports of Listed Company","authors":"Zhaohui Chao, Lin Li","doi":"10.1109/MIPR.2018.00047","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00047","url":null,"abstract":"Keywords usually serve as indicators of important information contained in documents. In the financial analysis work, annual report is an essential basis for auditors to obtain financial information about the company. In this paper, we propose a novel system that recognizes typical domain keywords automatically by analyzing annual reports. In addition, the system could customize search keywords, recommend domain words. Auditors would work effectively with the help of the system.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"28 Pt 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124700461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ontology-Driven Hierarchical Deep Learning for Fashion Recognition","authors":"Zhenzhong Kuang, Jun Yu, Zhou Yu, Jianping Fan","doi":"10.1109/MIPR.2018.00012","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00012","url":null,"abstract":"We present an automatic approach for large-scale fashion recognition, given an image without any kind of annotation. We formulate the problem as a hierarchical deep learning (HDL) algorithm which can: (i) integrate the deep CNNs to learn more discriminative high-level features for fashion image representations of both coarse-grained and fine-grained classes at different levels of the fashion ontology tree; (ii) leverage multi-task learning and inter-task relationship constraint to train more discriminative classifiers for the nodes on the fashion ontology; (iii) use back propagation to simultaneously refine both the relevant node classifiers and the deep CNNs according to a joint objective function; and (iv) accelerate the fashion retrieval process via path-based classification. The experimental results have verified the effectiveness and efficiency of our proposed algorithm on both classification and retrieval performance.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121134630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}