2018 IEEE International Symposium on Multimedia (ISM): Latest Publications

[Title page iii]
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ism.2018.00002
Citations: 0

REXplore: A Sketch Based Interactive Explorer for Real Estates Using Building Floor Plan Images
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00018
Divya Sharma, Nitin Gupta, C. Chattopadhyay, S. Mehta
Abstract: The increasing use of online platforms for real-estate rental and sale makes automatic retrieval of similar floor plans a key requirement for architects and buyers alike. Although sketch-based image retrieval has been explored in the multimedia community, hand-drawn floor plan retrieval has received little attention. In this paper, we propose REXplore (Real Estate eXplore), a novel framework that uses a sketch-based query mode to retrieve similar floor plan images from a repository, using Cyclic Generative Adversarial Networks (Cyclic GANs) to map between the sketch and image domains. The key contributions of our approach are: (1) a novel sketch-based floor plan retrieval framework with an intuitive and convenient sketch query mode; and (2) a combination of Cyclic GANs and Convolutional Neural Networks (CNNs) for hand-drawn floor plan image retrieval. Extensive experimentation and comparison with baseline results substantiate our claims.
Citations: 5

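A minimal sketch of the retrieval pipeline the abstract outlines: a CycleGAN-style generator translates the hand-drawn query into the floor-plan image domain, and a CNN embedding then ranks repository images by cosine similarity. The architectures and sizes below, and the omission of cycle-consistency training, are illustrative simplifications, not the authors' actual models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchToPlanGenerator(nn.Module):
    """Toy encoder-decoder standing in for the sketch->image generator G."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

class PlanEmbedder(nn.Module):
    """Toy CNN that embeds a floor-plan image into a retrieval vector."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)
    def forward(self, x):
        return F.normalize(self.fc(self.conv(x).flatten(1)), dim=1)

def retrieve(sketch, repository, G, E, k=5):
    """Map the sketch into the image domain, then rank repository plans."""
    with torch.no_grad():
        query = E(G(sketch))                      # (1, dim)
        gallery = E(repository)                   # (N, dim)
        scores = gallery @ query.t()              # cosine similarity
        return scores.squeeze(1).topk(k).indices  # indices of top-k plans

G, E = SketchToPlanGenerator(), PlanEmbedder()
sketch = torch.randn(1, 1, 64, 64)        # one hand-drawn query
repository = torch.randn(100, 1, 64, 64)  # 100 stored floor plans
print(retrieve(sketch, repository, G, E))
```
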
Spectrum Enhancement of Singing Voice Using Deep Learning
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00-18
Ryuka Nanzaka, T. Kitamura, T. Takiguchi, Yuji Adachi, Kiyoto Tai
Abstract: In this paper, we propose a novel singing-voice enhancement system that makes an amateur's singing voice resemble that of a professional opera singer: the amateur's voice is emphasized, using a professional opera singer's voice, in the frequency band that carries the professional singer's most distinctive characteristics. Moreover, the proposed enhancement, based on highway networks, can convert any song, including ones the professional opera singer has not sung. In our experiments, the amateur's singing voice was emphasized in the mid-high frequency range, which contains many of the frequency components that affect perceived glossiness, while the speaker's characteristics were preserved.
Citations: 3

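The abstract credits the conversion to highway networks. Below is a minimal, generic highway layer (gated mixing of a transformed and a carried input, y = T(x)·H(x) + (1−T(x))·x) of the kind such a model could stack over spectral frames; the layer sizes and the 513-bin spectrum are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)   # H(x): candidate update
        self.gate = nn.Linear(dim, dim)        # T(x): carry/transform gate
        # Bias the gate negative so the layer initially passes input through.
        nn.init.constant_(self.gate.bias, -1.0)

    def forward(self, x):
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))
        return t * h + (1.0 - t) * x           # y = T*H(x) + (1-T)*x

# A toy enhancer: map 513-bin amateur spectrum frames toward the target.
enhancer = nn.Sequential(*[HighwayLayer(513) for _ in range(4)])
frames = torch.randn(8, 513)                   # batch of spectral frames
print(enhancer(frames).shape)                  # torch.Size([8, 513])
```
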
A Novel Relative Camera Motion Estimation Algorithm with Applications to Visual Odometry
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.000-4
Yue Jiang, Mun-Cheon Kang, M. Fan, Sung-Ho Chae, S. Ko
Abstract: In this paper, we propose a novel method to estimate the relative camera motions across three consecutive images. Given a set of point correspondences in three views, the proposed method determines the fundamental matrix, which represents the geometric relationship between the first two views, using the eight-point algorithm. Then, by minimizing the proposed cost function with this fundamental matrix, the relative camera motions over the three views are precisely estimated. Experimental results show that the proposed method outperforms conventional two-view and three-view geometry-based methods in terms of accuracy.
Citations: 3

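The first stage of the method is the classical eight-point estimate of the fundamental matrix between the first two views. The snippet below reproduces just that stage with OpenCV on synthetic correspondences from a known two-view geometry; the paper's three-view cost minimization is not reproduced here.

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (20, 3)) + np.array([0, 0, 5.0])    # 3D points
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])  # intrinsics
R, _ = cv2.Rodrigues(np.array([0.0, 0.1, 0.0]))            # small rotation
t = np.array([[0.5], [0.0], [0.0]])                        # translation

# Project the points into both views.
pts1 = (K @ X.T).T
pts1 = (pts1[:, :2] / pts1[:, 2:]).astype(np.float32)
pts2 = (K @ (R @ X.T + t)).T
pts2 = (pts2[:, :2] / pts2[:, 2:]).astype(np.float32)

# Eight-point estimate of the fundamental matrix (least squares over 20 pts).
F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)

# Epipolar constraint check: x2^T F x1 should be close to zero.
x1 = np.hstack([pts1, np.ones((20, 1))])
x2 = np.hstack([pts2, np.ones((20, 1))])
print(np.abs(np.einsum('ij,jk,ik->i', x2, F, x1)).max())
```
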
Fast Line-Based Intra Prediction for Video Coding
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00032
Santiago De-Luxán-Hernández, H. Schwarz, D. Marpe, T. Wiegand
Abstract: Intra prediction plays a very important role in current video coding technologies such as the H.265/High Efficiency Video Coding (HEVC) standard, the Joint Exploration Test Model (JEM), and the upcoming Versatile Video Coding (VVC) standard. In previous work, we proposed a line-based intra prediction algorithm to improve the state-of-the-art coding performance of HEVC and the JEM. This method divides a block (horizontally or vertically) into lines and then codes each of them individually, in sequence. At the encoder side, however, it is necessary to select an optimal combination of intra mode and 1-D split type in a rate-distortion sense. Since testing all possible combinations of these two parameters for every block would significantly increase encoder complexity, this paper proposes several fast algorithms that reduce the number of tests and improve the overall trade-off between complexity and gain. The experimental results show a reduction of the encoder run-time from 322% to 166% in exchange for a loss of 0.34% in the All Intra configuration, and from 151% to 116% for a loss of 0.15% in the Random Access case.
Citations: 10

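The encoder-side problem the abstract describes is picking the (intra mode, 1-D split type) pair with minimum rate-distortion cost without testing the full cross product. The pruning rule below (rank modes on the unsplit block, then try line splits only for the top few) is a generic stand-in for the paper's fast algorithms, whose actual heuristics are not detailed in the abstract; toy_rd_cost is a hypothetical placeholder for the encoder's D + lambda*R measurement.

```python
def toy_rd_cost(block, mode, split):
    # Toy stand-in for D + lambda*R; a real encoder measures both.
    return (hash((mode, split)) % 1000) / 1000.0

def fast_mode_split_search(block, modes, splits=("horizontal", "vertical"),
                           keep=3, rd=toy_rd_cost):
    # Pass 1: rank intra modes on the unsplit block only.
    ranked = sorted(modes, key=lambda m: rd(block, m, None))
    best = (ranked[0], None, rd(block, ranked[0], None))
    # Pass 2: test the 1-D split types only for the most promising modes,
    # instead of the full |modes| x |splits| cross product.
    for mode in ranked[:keep]:
        for split in splits:
            cost = rd(block, mode, split)
            if cost < best[2]:
                best = (mode, split, cost)
    return best  # (mode, split, cost)

# 35 candidate intra modes, as in HEVC (planar + DC + 33 angular).
print(fast_mode_split_search(block=None, modes=range(35)))
```
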
Eye-Controlled Region of Interest HEVC Encoding
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00-12
Joose Sainio, A. Ylä-Outinen, Marko Viitanen, Jarno Vanne, T. Hämäläinen
Abstract: This paper presents a demonstrator setup for real-time HEVC encoding with gaze-based region-of-interest (ROI) detection. The proof-of-concept system is built on the Kvazaar open-source HEVC encoder and Pupil eye-tracking glasses. The gaze data is used to extract the ROI from live video, and the ROI is encoded with higher quality than non-ROI regions. The demonstration illustrates that HEVC encoding with non-uniform quality reduces bit rate by 40-90% and complexity by 10-35% compared with conventional approaches, with negligible to minor deterioration in subjective quality.
Citations: 3

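The core mechanism is turning the tracked gaze point into a per-block quality map, spending lower QP (higher quality) near the gaze and higher QP in the periphery. The Gaussian falloff, CTU granularity, and QP range below are illustrative assumptions and do not reflect Kvazaar's actual ROI interface.

```python
import numpy as np

def gaze_qp_map(gaze_xy, frame_wh, ctu=64, qp_base=32, qp_drop=8,
                sigma_ctus=3.0):
    """One QP value per 64x64 CTU: lowest at the gaze, qp_base far away."""
    w, h = frame_wh
    cols, rows = (w + ctu - 1) // ctu, (h + ctu - 1) // ctu
    cx, cy = gaze_xy[0] / ctu, gaze_xy[1] / ctu      # gaze in CTU units
    ys, xs = np.mgrid[0:rows, 0:cols]
    d2 = (xs + 0.5 - cx) ** 2 + (ys + 0.5 - cy) ** 2
    # Full qp_drop at the gaze, decaying toward qp_base in the periphery.
    qp = qp_base - qp_drop * np.exp(-d2 / (2 * sigma_ctus ** 2))
    return np.round(qp).astype(int)

print(gaze_qp_map(gaze_xy=(960, 540), frame_wh=(1920, 1080)))
```
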
A Burn-in Potential Region Detection Method for the OLED panel displays
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00-14
M. Kim, S.-H. Chae, J.-S. Kim
Abstract: Organic light-emitting diode (OLED) displays consist of organic compounds that emit light in response to electric current. They have been widely adopted in various multimedia devices due to their excellent performance. However, when high luminance is repeatedly output in a specific region, the pixels within that region degrade severely compared with the surrounding area. Such cumulative, non-uniform use of pixels can cause screen burn-in, a noticeable color drift on the OLED display over time. In this paper, we propose a novel method to detect a burn-in potential region (BPR) as a preprocessing step to prevent the burn-in problem. In the proposed method, the lifetime of each pixel of the OLED display is estimated by accumulating the amount of consumed charge. If the discoloration caused by the difference in remaining lifetime between particular pixels that output high luminance and their surrounding pixels that output low luminance approaches the user's perceptible level, those pixels are selected as the BPR. The experimental results demonstrate that the proposed method detects the BPR more effectively than the conventional method.
Citations: 3

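A sketch of the detection principle stated in the abstract: accumulate per-pixel consumed charge over time and flag pixels whose cumulative wear exceeds that of their neighborhood by a perceptibility threshold. The luminance-to-charge model, window size, and threshold below are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np
from scipy.ndimage import uniform_filter

class BurnInMonitor:
    def __init__(self, shape, threshold=0.15, window=31):
        self.charge = np.zeros(shape)       # cumulative consumed charge
        self.threshold = threshold          # perceptible wear difference
        self.window = window                # neighborhood size in pixels

    def accumulate(self, frame, dt=1.0):
        # Assume consumed charge grows with displayed luminance (0..1).
        self.charge += frame * dt

    def burn_in_potential_region(self):
        local_mean = uniform_filter(self.charge, size=self.window)
        wear_gap = self.charge - local_mean  # excess wear vs. surroundings
        return wear_gap > self.threshold * (local_mean + 1e-9)

monitor = BurnInMonitor((1080, 1920))
for _ in range(100):                        # a static bright logo region
    frame = np.zeros((1080, 1920))
    frame[40:120, 40:300] = 1.0
    monitor.accumulate(frame)
print(monitor.burn_in_potential_region().sum(), "pixels flagged")
```
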
Open framework for error-compensated gaze data collection with eye tracking glasses
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00067
Kari Siivonen, Joose Sainio, Marko Viitanen, Jarno Vanne, T. Hämäläinen
Abstract: Eye tracking is nowadays the primary method for collecting training data for neural networks in human visual system modelling. We recommend collecting eye-tracking data from videos with eye-tracking glasses, which are more affordable and applicable to more diverse test conditions than the conventionally used screen-based eye trackers. Eye-tracking glasses are prone to moving during gaze data collection, but our experiments show that the resulting displacement error accumulates fairly linearly and can be compensated automatically by the proposed framework. This paper describes how the framework can be used in practice with videos of up to 4K resolution. The proposed framework and the data collected in our sample experiment are made publicly available.
Citations: 2

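The compensation the abstract reports relies on the displacement error growing roughly linearly over a session, so it can be removed by interpolating between offsets measured at the start and end of a recording. The data layout and marker-based offset measurements below are illustrative assumptions about how such a correction could look, not the framework's actual API.

```python
import numpy as np

def compensate_linear_drift(gaze, t, drift_start, drift_end, t0, t1):
    """Subtract a displacement that interpolates linearly from drift_start
    (measured at time t0) to drift_end (measured at time t1)."""
    alpha = np.clip((t - t0) / (t1 - t0), 0.0, 1.0)[:, None]
    drift = (1 - alpha) * drift_start + alpha * drift_end
    return gaze - drift

t = np.linspace(0, 300, 1000)                  # 5-minute recording
gaze = np.random.rand(1000, 2) * [1920, 1080]  # raw gaze samples (px)
# Offsets measured against known markers before and after the session:
corrected = compensate_linear_drift(gaze, t, np.array([0.0, 0.0]),
                                    np.array([24.0, -10.0]), 0.0, 300.0)
print(corrected.shape)
```
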
Gaze-Inspired Learning for Estimating the Attractiveness of a Food Photo
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00015
Akinori Sato, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, I. Ide, Daisuke Deguchi, H. Murase
Abstract: The number of food photos posted to the Web has been increasing, and most users prefer to post delicious-looking ones; their photos, however, do not always look delicious. A previous work proposed a method for estimating the attractiveness of food photos, that is, the degree to which a food photo looks delicious, as an assistive technology for taking delicious-looking food photos. That method extracted image features from the entire photo to evaluate the impression. In our work, we conduct a preference experiment in which subjects are asked to compare pairs of food photos while their gaze is measured. The proposed method extracts image features from local regions selected based on the gaze information and estimates the attractiveness of a food photo by learning regression parameters. Experimental results showed that extracting image features from outside the gaze regions is more effective than extracting them from inside.
Citations: 0

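A sketch of the estimation pipeline as described: select local regions using gaze data, extract image features there, and fit a regressor to preference-derived attractiveness scores. The toy color/contrast features and ridge regressor are illustrative assumptions; the mask inversion reflects the paper's reported finding that regions outside the gaze area were the more informative ones.

```python
import numpy as np
from sklearn.linear_model import Ridge

def region_features(img, mask):
    """Toy features (mean color + contrast) from pixels where mask is True."""
    pix = img[mask]
    return np.concatenate([pix.mean(axis=0), pix.std(axis=0)])

rng = np.random.default_rng(1)
photos = rng.random((50, 64, 64, 3))            # 50 toy food photos
gaze_mask = np.zeros((64, 64), bool)
gaze_mask[16:48, 16:48] = True                  # where subjects looked
X = np.stack([region_features(p, ~gaze_mask) for p in photos])  # outside gaze
y = rng.random(50)                              # preference-derived scores
model = Ridge().fit(X, y)
print(model.predict(X[:3]))
```
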
MyLipper: A Personalized System for Speech Reconstruction using Multi-view Visual Feeds
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00-19
Yaman Kumar Singla, Rohit Jain, Khwaja Mohd. Salik, R. Shah, Roger Zimmermann, Yifang Yin
Abstract: Lipreading is the task of looking at, perceiving, and interpreting spoken symbols. It has a wide range of applications, such as surveillance, Internet telephony, speech reconstruction for silent movies, and aids for people with speech or hearing impairments. However, most work in the lipreading literature has been limited to classifying speech videos into text classes formed of phrases, words, and sentences. Even this has relied on a highly constrained lexicon, which in turn restricts the total number of classes (i.e., phrases, words, and sentences) considered for the classification task. Recently, research has ventured into generating speech (audio) from silent video sequences. Although non-frontal views have shown the potential to enhance the performance of speech reading and reconstruction systems, no prior work has used multiple camera feeds for this purpose. To this end, this paper presents a multi-view speech reading and reconstruction system. The major contribution of this paper is MyLipper, a vocabulary- and language-agnostic, real-time model that handles a variety of speaker poses. The model leverages silent video feeds from multiple cameras recording a subject to generate intelligible speech for that speaker, making it a personalized speech reconstruction model. It uses a deep-learning-based STCNN+BiGRU architecture to achieve this goal. The results obtained with MyLipper show an improvement of over 20% in the reconstructed speech's intelligibility (as measured by PESQ) when using multiple views compared with a single-view visual feed, confirming the importance of exploiting multiple views in building an efficient speech reconstruction system. The paper further shows the optimal placement of cameras for maximum speech intelligibility. We also demonstrate the reconstructed audios overlaid on the corresponding videos obtained from MyLipper, using a variety of videos from the dataset.
Citations: 16

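A minimal sketch of the STCNN+BiGRU shape named in the abstract: a 3-D convolutional stack over each camera view's frame sequence, per-view features fused by concatenation, and a bidirectional GRU emitting per-frame speech features (e.g., spectrogram bins). All layer sizes, the fusion scheme, and the output representation are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiViewSpeechNet(nn.Module):
    def __init__(self, n_views=3, audio_bins=128):
        super().__init__()
        self.stcnn = nn.Sequential(                 # shared across views
            nn.Conv3d(3, 32, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.Conv3d(32, 64, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),     # keep time, pool space
        )
        self.bigru = nn.GRU(64 * n_views, 128, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(256, audio_bins)      # per-frame spectrum

    def forward(self, views):                       # list of (B,3,T,H,W)
        feats = [self.stcnn(v).flatten(2).transpose(1, 2) for v in views]
        x, _ = self.bigru(torch.cat(feats, dim=2))  # (B,T,64*n_views)
        return self.head(x)                         # (B,T,audio_bins)

net = MultiViewSpeechNet()
views = [torch.randn(2, 3, 25, 64, 64) for _ in range(3)]  # 3 camera feeds
print(net(views).shape)                             # torch.Size([2, 25, 128])
```
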