{"title":"Multimodal Image Captioning for Marketing Analysis","authors":"Philipp Harzig, Stephan Brehm, R. Lienhart, Carolin Kaiser, René Schallner","doi":"10.1109/MIPR.2018.00035","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00035","url":null,"abstract":"Automatically captioning images with natural language sentences is an important research topic. State of the art models are able to produce human-like sentences. These models typically describe the depicted scene as a whole and do not target specific objects of interest or emotional relationships between these objects in the image. However, marketing companies require to describe these important attributes of a given scene. In our case, objects of interest are consumer goods, which are usually identifiable by a product logo and are associated with certain brands. From a marketing point of view, it is desirable to also evaluate the emotional context of a trademarked product, i.e., whether it appears in a positive or a negative connotation. We address the problem of finding brands in images and deriving corresponding captions by introducing a modified image captioning network. We also add a third output modality, which simultaneously produces real-valued image ratings. Our network is trained using a classification-aware loss function in order to stimulate the generation of sentences with an emphasis on words identifying the brand of a product. We evaluate our model on a dataset of images depicting interactions between humans and branded products. The introduced network improves mean class accuracy by 24.5 percent. Thanks to adding the third output modality, it also considerably improves the quality of generated captions for images depicting branded products.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116745141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative Sensing: Transforming Unreliable Sensor Data for Reliable Recognition","authors":"Lina Karam, Tejas S. Borkar, Yu Cao, J. Chae","doi":"10.1109/MIPR.2018.00025","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00025","url":null,"abstract":"This paper introduces a deep learning enabled generative sensing framework which integrates low-end sensors with computational intelligence to attain a high recognition accuracy on par with that attained with high-end sensors. The proposed generative sensing framework aims at transforming low-end, low-quality sensor data into higher quality sensor data in terms of achieved classification accuracy. The low-end data can be transformed into higher quality data of the same modality or into data of another modality. Different from existing methods for image generation, the proposed framework is based on discriminative models and targets to maximize the recognition accuracy rather than a similarity measure. This is achieved through the introduction of selective feature regeneration in a deep neural network (DNN). The proposed generative sensing will essentially transform low-quality sensor data into high-quality information for robust perception. Results are presented to illustrate the performance of the proposed framework.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"274 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123365489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Malware Detection Accuracy by Extracting Icon Information","authors":"Pedro Silva, Sepehr Akhavan-Masouleh, Li Li","doi":"10.1109/MIPR.2018.00088","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00088","url":null,"abstract":"Detecting PE malware files is now commonly approached using statistical and machine learning models. While these models commonly use features extracted from the structure of PE files, we propose that icons from these files can also help better predict malware. We propose a new machine learning approach to extract information from icons. Our proposed approach consists of two steps: 1) extracting icon features using summary statics, a histogram of gradients (HOG), and a convolutional autoencoder, 2) clustering icons based on the extracted icon features. Using publicly available data and by using machine learning experiments, we show our proposed icon clusters significantly boost the efficacy of malware prediction models. In particular, our experiments show an average accuracy increase of 10 percent when icon clusters are used in the prediction model.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131034875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can You Tell a Face from a HEVC Bitstream?","authors":"Saeed Ranjbar Alvar, Hyomin Choi, I. Bajić","doi":"10.1109/MIPR.2018.00060","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00060","url":null,"abstract":"Image and video analytics are being increasingly used on a massive scale. Not only is the amount of data growing, but the complexity of the data processing pipelines is also increasing, thereby exacerbating the problem. It is becoming increasingly important to save computational resources wherever possible. We focus on one of the poster problems of visual analytics – face detection – and approach the issue of reducing the computation by asking: Is it possible to detect a face without full image reconstruction from the High Efficiency Video Coding (HEVC) bitstream? We demonstrate that this is indeed possible, with accuracy comparable to conventional face detection, by training a Convolutional Neural Network on the output of the HEVC entropy decoder.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"15 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124252029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Do They All Look the Same? Deciphering Chinese, Japanese and Koreans by Fine-Grained Deep Learning","authors":"Yu Wang, Yang Feng, Haofu Liao, Jiebo Luo, Xiangyang Xu","doi":"10.1109/MIPR.2018.00015","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00015","url":null,"abstract":"We study to what extend Chinese, Japanese and Korean faces can be classified and which facial attributes offer the most important cues. First, we propose a novel way of ob- taining large numbers of facial images with nationality la- bels. Then we train state-of-the-art neural networks with these labeled images. We are able to achieve an accuracy of 75.03% in the classification task, with chances being 33.33% and human accuracy 49% . Further, we train mul- tiple facial attribute classifiers to identify the most distinc- tive features for each group. We find that Chinese, Japanese and Koreans do exhibit substantial differences in certain at- tributes, such as bangs, smiling, and bushy eyebrows. Along the way, we uncover several gender-related cross-country patterns as well. Our work, which complements existing APIs such as Microsoft Cognitive Services and Face++, could find potential applications in tourism, e-commerce, social media marketing, criminal justice and even counter- terrorism.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133494172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}