REXplore: A Sketch Based Interactive Explorer for Real Estates Using Building Floor Plan Images
Divya Sharma, Nitin Gupta, C. Chattopadhyay, S. Mehta
2018 IEEE International Symposium on Multimedia (ISM), December 2018. DOI: 10.1109/ISM.2018.00018 (https://doi.org/10.1109/ISM.2018.00018)
Abstract: The increasing trend of using online platforms for real estate rent and sale makes automatic retrieval of similar floor plans a key requirement for architects and buyers alike. Although sketch-based image retrieval has been explored in the multimedia community, the problem of hand-drawn floor plan retrieval has received little attention. In this paper, we propose REXplore (Real Estate eXplore), a novel framework that uses a sketch-based query mode to retrieve similar floor plan images from a repository, using Cyclic Generative Adversarial Networks (Cyclic GANs) to map between the sketch and image domains. The key contributions of our approach are: (1) a novel sketch-based floor plan retrieval framework with an intuitive and convenient sketch query mode; and (2) a combination of Cyclic GANs and Convolutional Neural Networks (CNNs) for hand-drawn floor plan image retrieval. Extensive experimentation and comparison with baseline results support our claims.
Spectrum Enhancement of Singing Voice Using Deep Learning
Ryuka Nanzaka, T. Kitamura, T. Takiguchi, Yuji Adachi, Kiyoto Tai
2018 IEEE International Symposium on Multimedia (ISM), December 2018. DOI: 10.1109/ISM.2018.00-18 (https://doi.org/10.1109/ISM.2018.00-18)
Abstract: In this paper, we propose a novel singing-voice enhancement system that makes an amateur's singing voice sound closer to that of a professional opera singer. The amateur's voice is emphasized using a professional opera singer's voice on the frequency band that carries the professional singer's most distinctive characteristics. Moreover, because the proposed singing-voice enhancement is based on highway networks, it can convert any song, including songs the professional opera singer has never sung. Our experiments show that the amateur's singing voice was emphasized in the middle-high frequency range, which contains many of the frequency components that affect perceived glossiness, while speaker characteristics were maintained.
A Novel Relative Camera Motion Estimation Algorithm with Applications to Visual Odometry
Yue Jiang, Mun-Cheon Kang, M. Fan, Sung-Ho Chae, S. Ko
2018 IEEE International Symposium on Multimedia (ISM), December 2018. DOI: 10.1109/ISM.2018.000-4 (https://doi.org/10.1109/ISM.2018.000-4)
Abstract: In this paper, we propose a novel method to estimate the relative camera motions of three consecutive images. Given a set of point correspondences in three views, the proposed method determines the fundamental matrix representing the geometric relationship between the first two views using the eight-point algorithm. Then, by minimizing the proposed cost function with the fundamental matrix, the relative camera motions over the three views are precisely estimated. Experimental results show that the proposed method outperforms conventional two-view and three-view geometry-based methods in terms of accuracy.
Fast Line-Based Intra Prediction for Video Coding
Santiago De-Luxán-Hernández, H. Schwarz, D. Marpe, T. Wiegand
2018 IEEE International Symposium on Multimedia (ISM), December 2018. DOI: 10.1109/ISM.2018.00032 (https://doi.org/10.1109/ISM.2018.00032)
Abstract: Intra prediction plays a very important role in current video coding technologies such as the H.265/High Efficiency Video Coding (HEVC) standard, the Joint Exploration Test Model (JEM), and the upcoming Versatile Video Coding (VVC) standard. In previous work we proposed a Line-Based Intra Prediction algorithm to improve the state-of-the-art coding performance of HEVC and JEM. This method divides a block into lines (horizontally or vertically) and then codes each line individually in a sequential manner. At the encoder side, however, an optimal combination of intra mode and 1-D split type must be selected in a rate-distortion sense. Since testing all possible combinations of these two parameters for every block would significantly increase encoder complexity, this paper proposes several fast algorithms to reduce the number of tests and improve the overall trade-off between complexity and gain. The experimental results show a reduction of the encoder run-time from 322% to 166% in exchange for a loss of 0.34% in the All Intra configuration, and from 151% to 116% for a loss of 0.15% in the Random Access case.
Eye-Controlled Region of Interest HEVC Encoding
Joose Sainio, A. Ylä-Outinen, Marko Viitanen, Jarno Vanne, T. Hämäläinen
2018 IEEE International Symposium on Multimedia (ISM), December 2018. DOI: 10.1109/ISM.2018.00-12 (https://doi.org/10.1109/ISM.2018.00-12)
Abstract: This paper presents a demonstrator setup for real-time HEVC encoding with gaze-based region of interest (ROI) detection. This proof-of-concept system is built on the Kvazaar open-source HEVC encoder and Pupil eye tracking glasses. The gaze data is used to extract the ROI from live video and the ROI is encoded with higher quality than non-ROI regions. This demonstration illustrates that performing HEVC encoding with non-uniform quality reduces bit rate by 40-90% and complexity by 10-35% over that of the conventional approaches with negligible to minor deterioration in subjective quality.
{"title":"A Burn-in Potential Region Detection Method for the OLED panel displays","authors":"M. Kim, S.-H. Chae, J.-S. Kim","doi":"10.1109/ISM.2018.00-14","DOIUrl":"https://doi.org/10.1109/ISM.2018.00-14","url":null,"abstract":"Organic light emitting diode (OLED) displays consist of organic compounds that emit light in response to electric current. OLED displays have been widely adopted to various multimedia devices due to their excellent performance. However, when a high luminance is repeatedly output in a specific region, the pixels within the region are seriously degraded as compared with the surrounding area. Such cumulative non-uniform use of pixels can cause screen burn-in, which is a noticeable color drift on the OLED display over time. In this paper, we propose a novel method to detect a burn-in potential region (BPR) as a preprocessing to prevent the burnin problem. In the proposed method, the lifetime of each pixel of the OLED display is estimated by accumulating the amount of consumed charge. If the discoloration due to the difference in the remaining lifetime between some particular pixels being outputting the high luminance and their surrounding pixels being outputting the low luminance is close to the user’s perceptible level, those particular pixels are selected as the BPR. The experimental results demonstrate that the proposed method detects the BPR with superior effectiveness compared with the conventional method.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123489996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open framework for error-compensated gaze data collection with eye tracking glasses
Kari Siivonen, Joose Sainio, Marko Viitanen, Jarno Vanne, T. Hämäläinen
2018 IEEE International Symposium on Multimedia (ISM), December 2018. DOI: 10.1109/ISM.2018.00067 (https://doi.org/10.1109/ISM.2018.00067)
Abstract: Eye tracking is nowadays the primary method for collecting training data for neural networks in Human Visual System modelling. We recommend collecting eye tracking data from videos with eye tracking glasses, which are more affordable and applicable to more diverse test conditions than the conventionally used screen-based eye trackers. Eye tracking glasses are prone to moving during gaze data collection, but our experiments show that the resulting displacement error accumulates fairly linearly and can be compensated automatically by the proposed framework. This paper describes how our framework can be used in practice with videos of up to 4K resolution. The proposed framework and the data collected during our sample experiment are made publicly available.
Gaze-Inspired Learning for Estimating the Attractiveness of a Food Photo
Akinori Sato, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, I. Ide, Daisuke Deguchi, H. Murase
2018 IEEE International Symposium on Multimedia (ISM), December 2018. DOI: 10.1109/ISM.2018.00015 (https://doi.org/10.1109/ISM.2018.00015)
Abstract: The number of food photos posted to the Web has been increasing. Most users prefer to post delicious-looking food photos; however, their photos do not always look delicious. A previous work proposed a method for estimating the attractiveness of a food photo, that is, the degree to which it looks delicious, as an assistive technology for taking delicious-looking food photos. That method extracted image features from the entire food photo to evaluate its impression. In our work, we conduct a preference experiment in which subjects compare pairs of food photos while their gaze is measured. The proposed method extracts image features from local regions selected based on the gaze information and estimates the attractiveness of a food photo by learning regression parameters. Experimental results showed that extracting image features from outside the gaze regions is more effective than extracting them from inside.
MyLipper: A Personalized System for Speech Reconstruction using Multi-view Visual Feeds
Yaman Kumar Singla, Rohit Jain, Khwaja Mohd. Salik, R. Shah, Roger Zimmermann, Yifang Yin
2018 IEEE International Symposium on Multimedia (ISM), December 2018. DOI: 10.1109/ISM.2018.00-19 (https://doi.org/10.1109/ISM.2018.00-19)
Abstract: Lipreading is the task of looking at, perceiving, and interpreting spoken symbols. It has a wide range of applications, such as surveillance, Internet telephony, speech reconstruction for silent movies, and aids for people with speech or hearing impairments. However, most work in the lipreading literature has been limited to classifying speech videos into text classes formed of phrases, words, and sentences, and even this has been based on a highly constrained lexicon that restricts the total number of classes (i.e., phrases, words, and sentences) considered for the classification task. Recently, research has ventured into generating speech (audio) from silent video sequences. Although non-frontal views have shown the potential to enhance the performance of speech reading and reconstruction systems, there have been no developments in using multiple camera feeds for this purpose. To this end, this paper presents a multi-view speech reading and reconstruction system. The major contribution of this paper is MyLipper, a vocabulary- and language-agnostic, real-time model that handles a variety of speaker poses. The model leverages silent video feeds from multiple cameras recording a subject to generate intelligible speech for that speaker, making it a personalized speech reconstruction model. It uses a deep learning based STCNN+BiGRU architecture to achieve this goal. The results obtained with MyLipper show an improvement of over 20% in the reconstructed speech's intelligibility (as measured by PESQ) when using multiple views compared with a single-view visual feed, confirming the importance of exploiting multiple views in building an efficient speech reconstruction system. The paper further shows the optimal placement of cameras that leads to the maximum intelligibility of speech. In addition, we demonstrate the reconstructed audio overlaid on the corresponding videos obtained from MyLipper for a variety of videos from the dataset.