2018 IEEE International Symposium on Multimedia (ISM): Latest Publications

[Title page iii]
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ism.2018.00002
Citations: 0

REXplore: A Sketch Based Interactive Explorer for Real Estates Using Building Floor Plan Images
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00018
Divya Sharma, Nitin Gupta, C. Chattopadhyay, S. Mehta
Abstract: The increasing use of online platforms for real-estate rental and sale makes automatic retrieval of similar floor plans a key requirement for architects and buyers alike. Although sketch-based image retrieval has been explored in the multimedia community, hand-drawn floor plan retrieval has received little attention. In this paper, we propose REXplore (Real Estate eXplore), a novel framework that uses a sketch-based query mode to retrieve similar floor plan images from a repository, using Cyclic Generative Adversarial Networks (Cyclic GANs) to map between the sketch and image domains. The key contributions of our approach are: (1) a novel sketch-based floor plan retrieval framework with an intuitive and convenient sketch query mode; and (2) a combination of Cyclic GANs and Convolutional Neural Networks (CNNs) for hand-drawn floor plan image retrieval. Extensive experimentation and comparison with baseline results substantiate our claims.
Citations: 5

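A minimal sketch of the retrieval pipeline the abstract outlines: a CycleGAN-style generator translates the hand-drawn query into the floor-plan image domain, and a CNN embedding then ranks repository images by cosine similarity. The architectures and sizes below, and the omission of cycle-consistency training, are illustrative simplifications, not the authors' actual models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchToPlanGenerator(nn.Module):
    """Toy encoder-decoder standing in for the sketch->image generator G."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

class PlanEmbedder(nn.Module):
    """Toy CNN that embeds a floor-plan image into a retrieval vector."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)
    def forward(self, x):
        return F.normalize(self.fc(self.conv(x).flatten(1)), dim=1)

def retrieve(sketch, repository, G, E, k=5):
    """Map the sketch into the image domain, then rank repository plans."""
    with torch.no_grad():
        query = E(G(sketch))                      # (1, dim)
        gallery = E(repository)                   # (N, dim)
        scores = gallery @ query.t()              # cosine similarity
        return scores.squeeze(1).topk(k).indices  # indices of top-k plans

G, E = SketchToPlanGenerator(), PlanEmbedder()
sketch = torch.randn(1, 1, 64, 64)        # one hand-drawn query
repository = torch.randn(100, 1, 64, 64)  # 100 stored floor plans
print(retrieve(sketch, repository, G, E))
```
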
Spectrum Enhancement of Singing Voice Using Deep Learning
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00-18
Ryuka Nanzaka, T. Kitamura, T. Takiguchi, Yuji Adachi, Kiyoto Tai
Abstract: In this paper, we propose a novel singing-voice enhancement system that makes an amateur's singing voice resemble that of a professional opera singer: the amateur's voice is emphasized, using a professional opera singer's voice, in the frequency band that carries the professional singer's most distinctive characteristics. Moreover, the proposed enhancement, based on highway networks, can convert any song, including ones the professional opera singer has not sung. In our experiments, the amateur's singing voice was emphasized in the mid-high frequency range, which contains many of the frequency components that affect perceived glossiness, while the speaker's characteristics were preserved.
Citations: 3

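The abstract credits the conversion to highway networks. Below is a minimal, generic highway layer (gated mixing of a transformed and a carried input, y = T(x)·H(x) + (1−T(x))·x) of the kind such a model could stack over spectral frames; the layer sizes and the 513-bin spectrum are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)   # H(x): candidate update
        self.gate = nn.Linear(dim, dim)        # T(x): carry/transform gate
        # Bias the gate negative so the layer initially passes input through.
        nn.init.constant_(self.gate.bias, -1.0)

    def forward(self, x):
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))
        return t * h + (1.0 - t) * x           # y = T*H(x) + (1-T)*x

# A toy enhancer: map 513-bin amateur spectrum frames toward the target.
enhancer = nn.Sequential(*[HighwayLayer(513) for _ in range(4)])
frames = torch.randn(8, 513)                   # batch of spectral frames
print(enhancer(frames).shape)                  # torch.Size([8, 513])
```
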
A Novel Relative Camera Motion Estimation Algorithm with Applications to Visual Odometry
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.000-4
Yue Jiang, Mun-Cheon Kang, M. Fan, Sung-Ho Chae, S. Ko
Abstract: In this paper, we propose a novel method to estimate the relative camera motions across three consecutive images. Given a set of point correspondences in three views, the proposed method determines the fundamental matrix, which represents the geometric relationship between the first two views, using the eight-point algorithm. Then, by minimizing the proposed cost function with this fundamental matrix, the relative camera motions over the three views are precisely estimated. Experimental results show that the proposed method outperforms conventional two-view and three-view geometry-based methods in terms of accuracy.
Citations: 3

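The first stage of the method is the classical eight-point estimate of the fundamental matrix between the first two views. The snippet below reproduces just that stage with OpenCV on synthetic correspondences from a known two-view geometry; the paper's three-view cost minimization is not reproduced here.

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (20, 3)) + np.array([0, 0, 5.0])    # 3D points
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])  # intrinsics
R, _ = cv2.Rodrigues(np.array([0.0, 0.1, 0.0]))            # small rotation
t = np.array([[0.5], [0.0], [0.0]])                        # translation

# Project the points into both views.
pts1 = (K @ X.T).T
pts1 = (pts1[:, :2] / pts1[:, 2:]).astype(np.float32)
pts2 = (K @ (R @ X.T + t)).T
pts2 = (pts2[:, :2] / pts2[:, 2:]).astype(np.float32)

# Eight-point estimate of the fundamental matrix (least squares over 20 pts).
F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)

# Epipolar constraint check: x2^T F x1 should be close to zero.
x1 = np.hstack([pts1, np.ones((20, 1))])
x2 = np.hstack([pts2, np.ones((20, 1))])
print(np.abs(np.einsum('ij,jk,ik->i', x2, F, x1)).max())
```
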
Fast Line-Based Intra Prediction for Video Coding
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00032
Santiago De-Luxán-Hernández, H. Schwarz, D. Marpe, T. Wiegand
Abstract: Intra prediction plays a very important role in current video coding technologies such as the H.265/High Efficiency Video Coding (HEVC) standard, the Joint Exploration Test Model (JEM), and the upcoming Versatile Video Coding (VVC) standard. In previous work, we proposed a line-based intra prediction algorithm to improve the state-of-the-art coding performance of HEVC and the JEM. This method divides a block (horizontally or vertically) into lines and then codes each of them individually, in sequence. At the encoder side, however, it is necessary to select an optimal combination of intra mode and 1-D split type in a rate-distortion sense. Since testing all possible combinations of these two parameters for every block would significantly increase encoder complexity, this paper proposes several fast algorithms that reduce the number of tests and improve the overall trade-off between complexity and gain. The experimental results show a reduction of the encoder run-time from 322% to 166% in exchange for a loss of 0.34% in the All Intra configuration, and from 151% to 116% for a loss of 0.15% in the Random Access case.
Citations: 10

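The encoder-side problem the abstract describes is picking the (intra mode, 1-D split type) pair with minimum rate-distortion cost without testing the full cross product. The pruning rule below (rank modes on the unsplit block, then try line splits only for the top few) is a generic stand-in for the paper's fast algorithms, whose actual heuristics are not detailed in the abstract; toy_rd_cost is a hypothetical placeholder for the encoder's D + lambda*R measurement.

```python
def toy_rd_cost(block, mode, split):
    # Toy stand-in for D + lambda*R; a real encoder measures both.
    return (hash((mode, split)) % 1000) / 1000.0

def fast_mode_split_search(block, modes, splits=("horizontal", "vertical"),
                           keep=3, rd=toy_rd_cost):
    # Pass 1: rank intra modes on the unsplit block only.
    ranked = sorted(modes, key=lambda m: rd(block, m, None))
    best = (ranked[0], None, rd(block, ranked[0], None))
    # Pass 2: test the 1-D split types only for the most promising modes,
    # instead of the full |modes| x |splits| cross product.
    for mode in ranked[:keep]:
        for split in splits:
            cost = rd(block, mode, split)
            if cost < best[2]:
                best = (mode, split, cost)
    return best  # (mode, split, cost)

# 35 candidate intra modes, as in HEVC (planar + DC + 33 angular).
print(fast_mode_split_search(block=None, modes=range(35)))
```
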
Eye-Controlled Region of Interest HEVC Encoding
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00-12
Joose Sainio, A. Ylä-Outinen, Marko Viitanen, Jarno Vanne, T. Hämäläinen
Abstract: This paper presents a demonstrator setup for real-time HEVC encoding with gaze-based region-of-interest (ROI) detection. The proof-of-concept system is built on the Kvazaar open-source HEVC encoder and Pupil eye-tracking glasses. The gaze data is used to extract the ROI from live video, and the ROI is encoded with higher quality than non-ROI regions. The demonstration illustrates that HEVC encoding with non-uniform quality reduces bit rate by 40-90% and complexity by 10-35% compared with conventional approaches, with negligible to minor deterioration in subjective quality.
Citations: 3

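The core mechanism is turning the tracked gaze point into a per-block quality map, spending lower QP (higher quality) near the gaze and higher QP in the periphery. The Gaussian falloff, CTU granularity, and QP range below are illustrative assumptions and do not reflect Kvazaar's actual ROI interface.

```python
import numpy as np

def gaze_qp_map(gaze_xy, frame_wh, ctu=64, qp_base=32, qp_drop=8,
                sigma_ctus=3.0):
    """One QP value per 64x64 CTU: lowest at the gaze, qp_base far away."""
    w, h = frame_wh
    cols, rows = (w + ctu - 1) // ctu, (h + ctu - 1) // ctu
    cx, cy = gaze_xy[0] / ctu, gaze_xy[1] / ctu      # gaze in CTU units
    ys, xs = np.mgrid[0:rows, 0:cols]
    d2 = (xs + 0.5 - cx) ** 2 + (ys + 0.5 - cy) ** 2
    # Full qp_drop at the gaze, decaying toward qp_base in the periphery.
    qp = qp_base - qp_drop * np.exp(-d2 / (2 * sigma_ctus ** 2))
    return np.round(qp).astype(int)

print(gaze_qp_map(gaze_xy=(960, 540), frame_wh=(1920, 1080)))
```
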
A Burn-in Potential Region Detection Method for the OLED panel displays
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00-14
M. Kim, S.-H. Chae, J.-S. Kim
Abstract: Organic light-emitting diode (OLED) displays consist of organic compounds that emit light in response to electric current. They have been widely adopted in various multimedia devices due to their excellent performance. However, when high luminance is repeatedly output in a specific region, the pixels within that region degrade severely compared with the surrounding area. Such cumulative, non-uniform use of pixels can cause screen burn-in, a noticeable color drift on the OLED display over time. In this paper, we propose a novel method to detect a burn-in potential region (BPR) as a preprocessing step to prevent the burn-in problem. In the proposed method, the lifetime of each pixel of the OLED display is estimated by accumulating the amount of consumed charge. If the discoloration caused by the difference in remaining lifetime between particular pixels that output high luminance and their surrounding pixels that output low luminance approaches the user's perceptible level, those pixels are selected as the BPR. The experimental results demonstrate that the proposed method detects the BPR more effectively than the conventional method.
Citations: 3

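A sketch of the detection principle stated in the abstract: accumulate per-pixel consumed charge over time and flag pixels whose cumulative wear exceeds that of their neighborhood by a perceptibility threshold. The luminance-to-charge model, window size, and threshold below are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np
from scipy.ndimage import uniform_filter

class BurnInMonitor:
    def __init__(self, shape, threshold=0.15, window=31):
        self.charge = np.zeros(shape)       # cumulative consumed charge
        self.threshold = threshold          # perceptible wear difference
        self.window = window                # neighborhood size in pixels

    def accumulate(self, frame, dt=1.0):
        # Assume consumed charge grows with displayed luminance (0..1).
        self.charge += frame * dt

    def burn_in_potential_region(self):
        local_mean = uniform_filter(self.charge, size=self.window)
        wear_gap = self.charge - local_mean  # excess wear vs. surroundings
        return wear_gap > self.threshold * (local_mean + 1e-9)

monitor = BurnInMonitor((1080, 1920))
for _ in range(100):                        # a static bright logo region
    frame = np.zeros((1080, 1920))
    frame[40:120, 40:300] = 1.0
    monitor.accumulate(frame)
print(monitor.burn_in_potential_region().sum(), "pixels flagged")
```
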
Open framework for error-compensated gaze data collection with eye tracking glasses
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00067
Kari Siivonen, Joose Sainio, Marko Viitanen, Jarno Vanne, T. Hämäläinen
Abstract: Eye tracking is nowadays the primary method for collecting training data for neural networks in human visual system modelling. We recommend collecting eye-tracking data from videos with eye-tracking glasses, which are more affordable and applicable to more diverse test conditions than the conventionally used screen-based eye trackers. Eye-tracking glasses are prone to moving during gaze data collection, but our experiments show that the resulting displacement error accumulates fairly linearly and can be compensated automatically by the proposed framework. This paper describes how the framework can be used in practice with videos of up to 4K resolution. The proposed framework and the data collected in our sample experiment are made publicly available.
Citations: 2

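The compensation the abstract reports relies on the displacement error growing roughly linearly over a session, so it can be removed by interpolating between offsets measured at the start and end of a recording. The data layout and marker-based offset measurements below are illustrative assumptions about how such a correction could look, not the framework's actual API.

```python
import numpy as np

def compensate_linear_drift(gaze, t, drift_start, drift_end, t0, t1):
    """Subtract a displacement that interpolates linearly from drift_start
    (measured at time t0) to drift_end (measured at time t1)."""
    alpha = np.clip((t - t0) / (t1 - t0), 0.0, 1.0)[:, None]
    drift = (1 - alpha) * drift_start + alpha * drift_end
    return gaze - drift

t = np.linspace(0, 300, 1000)                  # 5-minute recording
gaze = np.random.rand(1000, 2) * [1920, 1080]  # raw gaze samples (px)
# Offsets measured against known markers before and after the session:
corrected = compensate_linear_drift(gaze, t, np.array([0.0, 0.0]),
                                    np.array([24.0, -10.0]), 0.0, 300.0)
print(corrected.shape)
```
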
Gaze-Inspired Learning for Estimating the Attractiveness of a Food Photo
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00015
Akinori Sato, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, I. Ide, Daisuke Deguchi, H. Murase
Abstract: The number of food photos posted to the Web has been increasing, and most users prefer to post delicious-looking ones; their photos, however, do not always look delicious. A previous work proposed a method for estimating the attractiveness of food photos, that is, the degree to which a food photo looks delicious, as an assistive technology for taking delicious-looking food photos. That method extracted image features from the entire photo to evaluate the impression. In our work, we conduct a preference experiment in which subjects are asked to compare pairs of food photos while their gaze is measured. The proposed method extracts image features from local regions selected based on the gaze information and estimates the attractiveness of a food photo by learning regression parameters. Experimental results showed that extracting image features from outside the gaze regions is more effective than extracting them from inside.
Citations: 0

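A sketch of the estimation pipeline as described: select local regions using gaze data, extract image features there, and fit a regressor to preference-derived attractiveness scores. The toy color/contrast features and ridge regressor are illustrative assumptions; the mask inversion reflects the paper's reported finding that regions outside the gaze area were the more informative ones.

```python
import numpy as np
from sklearn.linear_model import Ridge

def region_features(img, mask):
    """Toy features (mean color + contrast) from pixels where mask is True."""
    pix = img[mask]
    return np.concatenate([pix.mean(axis=0), pix.std(axis=0)])

rng = np.random.default_rng(1)
photos = rng.random((50, 64, 64, 3))            # 50 toy food photos
gaze_mask = np.zeros((64, 64), bool)
gaze_mask[16:48, 16:48] = True                  # where subjects looked
X = np.stack([region_features(p, ~gaze_mask) for p in photos])  # outside gaze
y = rng.random(50)                              # preference-derived scores
model = Ridge().fit(X, y)
print(model.predict(X[:3]))
```
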
MyLipper: A Personalized System for Speech Reconstruction using Multi-view Visual Feeds
2018 IEEE International Symposium on Multimedia (ISM) Pub Date: 2018-12-01 DOI: 10.1109/ISM.2018.00-19
Yaman Kumar Singla, Rohit Jain, Khwaja Mohd. Salik, R. Shah, Roger Zimmermann, Yifang Yin
Abstract: Lipreading is the task of looking at, perceiving, and interpreting spoken symbols. It has a wide range of applications, such as surveillance, Internet telephony, speech reconstruction for silent movies, and aids for people with speech or hearing impairments. However, most work in the lipreading literature has been limited to classifying speech videos into text classes formed of phrases, words, and sentences. Even this has relied on a highly constrained lexicon, which in turn restricts the total number of classes (i.e., phrases, words, and sentences) considered for the classification task. Recently, research has ventured into generating speech (audio) from silent video sequences. Although non-frontal views have shown the potential to enhance the performance of speech reading and reconstruction systems, no prior work has used multiple camera feeds for this purpose. To this end, this paper presents a multi-view speech reading and reconstruction system. The major contribution of this paper is MyLipper, a vocabulary- and language-agnostic, real-time model that handles a variety of speaker poses. The model leverages silent video feeds from multiple cameras recording a subject to generate intelligible speech for that speaker, making it a personalized speech reconstruction model. It uses a deep-learning-based STCNN+BiGRU architecture to achieve this goal. The results obtained with MyLipper show an improvement of over 20% in the reconstructed speech's intelligibility (as measured by PESQ) when using multiple views compared with a single-view visual feed, confirming the importance of exploiting multiple views in building an efficient speech reconstruction system. The paper further shows the optimal placement of cameras for maximum speech intelligibility. We also demonstrate the reconstructed audios overlaid on the corresponding videos obtained from MyLipper, using a variety of videos from the dataset.
Citations: 16

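A minimal sketch of the STCNN+BiGRU shape named in the abstract: a 3-D convolutional stack over each camera view's frame sequence, per-view features fused by concatenation, and a bidirectional GRU emitting per-frame speech features (e.g., spectrogram bins). All layer sizes, the fusion scheme, and the output representation are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiViewSpeechNet(nn.Module):
    def __init__(self, n_views=3, audio_bins=128):
        super().__init__()
        self.stcnn = nn.Sequential(                 # shared across views
            nn.Conv3d(3, 32, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.Conv3d(32, 64, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),     # keep time, pool space
        )
        self.bigru = nn.GRU(64 * n_views, 128, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(256, audio_bins)      # per-frame spectrum

    def forward(self, views):                       # list of (B,3,T,H,W)
        feats = [self.stcnn(v).flatten(2).transpose(1, 2) for v in views]
        x, _ = self.bigru(torch.cat(feats, dim=2))  # (B,T,64*n_views)
        return self.head(x)                         # (B,T,audio_bins)

net = MultiViewSpeechNet()
views = [torch.randn(2, 3, 25, 64, 64) for _ in range(3)]  # 3 camera feeds
print(net(views).shape)                             # torch.Size([2, 25, 128])
```
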