{"title":"MID: A Novel Contrast Metric for the MSER Detector","authors":"Martin Oelsch, Başak Güleçyüz, E. Steinbach","doi":"10.1109/ISM.2018.00014","DOIUrl":"https://doi.org/10.1109/ISM.2018.00014","url":null,"abstract":"This paper presents a novel contrast measure for MSER region selection, termed Mean Intensity Difference (MID). The proposed metric is computed between the pixels of an MSER region and its surrounding pixel set. In this work we consider the complementary pixels within the bounding box of a region and alternatively the pixels of the first contour layer as surroundings. To evaluate the proposed contrast metric, a location retrieval task is performed. To this end, SURF descriptors are computed and the Bag-of-Words representation is used as global signature for each image. For the evaluation we use the Devon Island dataset, which is said to have one of the most Mars-like environments on Earth and which comes with GPS ground-truth data. We further integrate the contrast-based methods with the approach of Grid Adaptation. The experimental results show that our contrast metric outperforms state-of-the-art metrics, such as Perceptual Divergence, and yields better performance compared to random region selection. In this work we also evaluate the computational complexity of the methods.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132145953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
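The bounding-box variant of the contrast measure described above is simple enough to sketch. The following Python function is an illustrative reconstruction from the abstract alone, not the authors' implementation: it takes a grayscale image and a binary MSER region mask and returns the absolute difference between the mean intensity of the region and that of the complementary pixels inside its bounding box.

```python
import numpy as np

def mean_intensity_difference(image, region_mask):
    """Sketch of the MID contrast measure (bounding-box variant):
    absolute difference between the mean intensity of an MSER region
    and the mean intensity of the complementary pixels inside the
    region's bounding box."""
    ys, xs = np.nonzero(region_mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    box = image[y0:y1, x0:x1].astype(float)       # crop to bounding box
    mask = region_mask[y0:y1, x0:x1].astype(bool)
    return abs(box[mask].mean() - box[~mask].mean())
```

The abstract's alternative surrounding set (the first contour layer) would replace `~mask` with a one-pixel ring around the region instead of the whole bounding-box complement.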
{"title":"Efficient Live and on-Demand Tiled HEVC 360 VR Video Streaming","authors":"Mattis Jeppsson, H. Espeland, T. Kupka, Ragnar Langseth, Andreas Petlund, Peng Qiaoqiao, Chuansong Xue, Konstantin Pogorelov, M. Riegler, Dag Johansen, C. Griwodz, P. Halvorsen","doi":"10.1109/ISM.2018.00022","DOIUrl":"https://doi.org/10.1109/ISM.2018.00022","url":null,"abstract":"With 360° panoramic video technology becoming commonplace, the need for efficient streaming methods for such videos arises. We go beyond existing on-demand solutions and present a live streaming system that trades off bandwidth usage against the video quality in the user's field-of-view. We have created an architecture that combines RTP and DASH to deliver 360° VR content to a Huawei set-top box and a Samsung Galaxy S7. Our system multiplexes a single HEVC hardware decoder to provide faster quality switching than at the traditional GOP boundaries. We demonstrate the performance and illustrate the trade-offs through real-world experiments, where we report bandwidth savings comparable to existing on-demand approaches, but with faster quality switches when the field-of-view changes.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134370280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audio Feature Extraction Based on Sub-Band Signal Correlations for Music Genre Classification","authors":"Takuya Kobayashi, Akira Kubota, Yusuke Suzuki","doi":"10.1109/ISM.2018.00-15","DOIUrl":"https://doi.org/10.1109/ISM.2018.00-15","url":null,"abstract":"We present novel low-level audio features based on correlations between sub-band audio signals decomposed by an undecimated wavelet transform. With an SVM used for classifier learning, experimental results on the GTZAN dataset show that the proposed method achieves the best accuracy, 81.5%, outperforming conventional methods.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123847952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
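Since the features above are pairwise correlations between sub-band signals, the idea can be illustrated compactly. In the sketch below, an undecimated Haar filter pair stands in for the paper's undecimated wavelet transform (the actual wavelet and decomposition depth are not given in the abstract); the feature vector is the correlation coefficient of every pair of sub-bands.

```python
import numpy as np

def subband_correlation_features(signal, levels=3):
    """Illustrative stand-in for sub-band correlation features:
    decompose the signal with undecimated Haar filters and return
    the pairwise correlation coefficients of the sub-bands."""
    lo = np.array([0.5, 0.5])    # Haar low-pass (unnormalized)
    hi = np.array([0.5, -0.5])   # Haar high-pass
    bands, approx = [], np.asarray(signal, dtype=float)
    for _ in range(levels):
        bands.append(np.convolve(approx, hi, mode="same"))  # detail band
        approx = np.convolve(approx, lo, mode="same")       # coarser approximation
    bands.append(approx)
    # Feature vector: correlation coefficient of every sub-band pair.
    return np.array([np.corrcoef(bands[i], bands[j])[0, 1]
                     for i in range(len(bands))
                     for j in range(i + 1, len(bands))])
```

With `levels=3` this yields four sub-bands and therefore six pairwise correlations per frame.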
{"title":"Malignancy Classification of Lung Nodule Based on Accumulated Multi Planar Views and Canonical Correlation Analysis","authors":"S. A. Abdelrahman, M. Abdelwahab, M. Sayed","doi":"10.1109/ISM.2018.00012","DOIUrl":"https://doi.org/10.1109/ISM.2018.00012","url":null,"abstract":"The appearance of a small round or oval shape in a Computed Tomography (CT) scan of the lung raises suspicion of lung cancer. To avoid misdiagnosis of lung cancer at an early stage, Computer Aided Diagnosis (CAD) assists oncologists in classifying pulmonary nodules as malignant (cancerous) or benign (noncancerous). This paper introduces a novel approach to pulmonary nodule classification employing three accumulated views (top, front, and side) of CT slices and Canonical Correlation Analysis (CCA). Each nodule is extracted from a 2D CT slice to obtain a Region of Interest (ROI) patch. All patches from sequential slices are accumulated for the three different views. The vector representation of each view is correlated with two training sets, malignant and benign, employing CCA in the spatial and Radon Transform (RT) domains. According to the correlation coefficients, each view is classified, and the final classification decision is made by a priority-based decision rule. For training and testing, scans of 1010 patients are downloaded from the Lung Image Database Consortium (LIDC). The final results show that the proposed method achieves the best performance, with an accuracy of 90.93%, compared with existing methods.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128653858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
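The per-view decision described in the abstract can be outlined as follows. This is a deliberately simplified sketch: plain Pearson correlation stands in for CCA, and since the abstract does not specify the priority rule, ties here default to "malignant" as the clinically safer choice.

```python
import numpy as np

def classify_nodule(views, malignant_set, benign_set):
    """Simplified sketch of the multi-view decision: each accumulated
    view vector is correlated against the malignant and benign training
    sets (Pearson correlation standing in for CCA), each view votes for
    the class with the higher best correlation, and the final label is
    the majority vote, with ties resolved as malignant."""
    def best_corr(v, train):
        return max(np.corrcoef(v, t)[0, 1] for t in train)
    votes = sum(1 if best_corr(v, malignant_set) >= best_corr(v, benign_set)
                else -1 for v in views)
    return "malignant" if votes >= 0 else "benign"
```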
{"title":"Player Types in Mobile Learning Games – Playing Patterns and Motivation","authors":"Florian Schimanke, R. Mertens, Bettina Sophie Huck","doi":"10.1109/ISM.2018.00035","DOIUrl":"https://doi.org/10.1109/ISM.2018.00035","url":null,"abstract":"This paper presents results from an analysis of player behavior in the popular mobile learning game \"Where is that\". Playing data from nearly 24,000 unique users were gathered over a period of three months and subsequently analyzed in order to gain better insight into how the games are played. The results will further be used to compare learning outcomes with a spaced-repetition approach. Our analysis revealed four distinct clusters of learner types, which can be categorized as Learners, Confirmers, Leisure Players and Sporadic Players. The data shows the player types' playing patterns and indicates what motivates them to play. It can thus give valuable hints for the design of player interaction in learning games as well as for content selection.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121980022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving HEVC Encoding of Rendered Video Data Using True Motion Information","authors":"Christian Herglotz, D. Muller, Andreas Weinlich, F. Bauer, M. Ortner, M. Stamminger, André Kaup","doi":"10.1109/ISM.2018.00063","DOIUrl":"https://doi.org/10.1109/ISM.2018.00063","url":null,"abstract":"This paper shows that motion vectors representing the true motion of an object in a scene can be exploited to improve the encoding of computer-generated video sequences. For this purpose, a set of sequences is presented for which the true motion vectors of the corresponding objects were generated on a per-pixel basis during the rendering process. In addition to conventional motion estimation methods, it is proposed to exploit the computer-generated motion vectors to enhance the rate-distortion performance. To this end, a motion vector mapping method including disocclusion handling is presented. It is shown that mean rate savings of 3.78% can be achieved.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123460424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HTTP/2-Based Streaming Solutions for Tiled Omnidirectional Videos","authors":"Mariem Ben Yahia, Yannick Le Louédec, G. Simon, L. Nuaymi","doi":"10.1109/ISM.2018.00023","DOIUrl":"https://doi.org/10.1109/ISM.2018.00023","url":null,"abstract":"360° video streaming faces two major technical challenges: network resource consumption and Quality of Experience (QoE). Dynamically adapting the content delivery process to the user's behavior is a promising approach to achieving both substantial network resource savings and a satisfying experience. In this paper, we propose to leverage HTTP Adaptive Streaming (HAS), tile-based 360° video encoding and the HTTP/2 protocol to implement this dynamic content delivery process. The 360° video stream is spatially encoded into tiles and temporally divided into segments. The client executes two viewport predictions for each segment, one before and one during its delivery. Upon every prediction, it decides on a priority and a quality level for each tile of the video segment; tiles overlapping with the predicted viewport get higher priorities and quality levels. It then exploits the priority and stream termination features of the HTTP/2 protocol to enforce its decisions. We compare our proposed solution with four alternative schemes on a set of 360° video streaming sessions covering various types of videos, user behaviors and network conditions. Our solution provides better performance: higher quality for the viewport pixels, a lower ratio of unreceived viewport pixels in bandwidth-constrained networks, and a reduction in bandwidth consumption of up to 12% compared to the alternative schemes that also exploit two viewport predictions per video segment.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129638705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
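The per-segment decision step described above (higher priority and quality for tiles overlapping the predicted viewport) can be sketched as follows. The two-level quality ladder and the HTTP/2 priority weights are illustrative assumptions, not the paper's exact scheme.

```python
def overlap(a, b):
    """Area of intersection of two (x0, y0, x1, y1) rectangles."""
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    return w * h

def assign_tiles(tiles, viewport):
    """Sketch of the per-segment tile decision: tiles that overlap the
    predicted viewport get a higher quality level and a higher HTTP/2
    priority weight (weights are illustrative; HTTP/2 weights range
    from 1 to 256)."""
    plan = {}
    for name, rect in tiles.items():
        if overlap(rect, viewport) > 0:
            plan[name] = {"quality": "high", "h2_weight": 255}
        else:
            plan[name] = {"quality": "low", "h2_weight": 16}
    return plan
```

The second prediction during delivery would simply re-run `assign_tiles` and use HTTP/2 stream termination to cancel tiles whose decisions changed.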
{"title":"Single-Channel Speech Separation Based on Gaussian Process Regression","authors":"Nguyen-Khang Le, Sih-Huei Chen, Tzu-Chiang Tai, Jia-Ching Wang","doi":"10.1109/ISM.2018.00040","DOIUrl":"https://doi.org/10.1109/ISM.2018.00040","url":null,"abstract":"The Gaussian process (GP) is a flexible kernel-based learning method that has found widespread application in signal processing. In this paper, a supervised approach is proposed to handle the single-channel speech separation (SCSS) problem. We focus on modeling a nonlinear mapping between mixed and clean speech based on GP regression, in which the reconstructed audio signal is estimated by the predictive mean of the GP model. The nonlinear conjugate gradient method is utilized to perform the hyper-parameter optimization. Experiments on a subset of the TIMIT speech dataset confirm the validity of the proposed approach.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130964072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NR-GVQM: A No Reference Gaming Video Quality Metric","authors":"Saman Zadtootaghaj, Nabajeet Barman, Steven Schmidt, M. Martini, S. Möller","doi":"10.1109/ISM.2018.00031","DOIUrl":"https://doi.org/10.1109/ISM.2018.00031","url":null,"abstract":"Gaming has recently expanded into live streaming services. Live gaming video streaming is not limited to cloud gaming services, such as GeForce Now, but also includes passive streaming, where players' gameplay is streamed both live and on-demand over services such as Twitch.tv and YouTube Gaming. So far, typical video quality assessment methods have been used for gaming videos; however, their performance remains quite unsatisfactory. In this paper, we present a new No Reference (NR) gaming video quality metric called NR-GVQM with performance comparable to state-of-the-art Full Reference (FR) metrics. NR-GVQM is designed by training a Support Vector Regression (SVR) model with a Gaussian kernel, using nine frame-level indexes such as naturalness and blockiness as input features and Video Multimethod Assessment Fusion (VMAF) scores as the ground truth. Our results, based on a publicly available dataset of gaming videos, show a correlation of 0.98 with VMAF and 0.89 with MOS scores. We further present two approaches to reduce the computational complexity.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115641016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
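At its core, NR-GVQM is a regression from frame-level features to VMAF scores. A minimal sketch with scikit-learn is given below; the synthetic features and labels are placeholders for the paper's nine frame-level indexes and real VMAF scores, and the hyper-parameters are illustrative defaults, not the paper's.

```python
import numpy as np
from sklearn.svm import SVR

# Placeholder training data: 200 frames x 9 frame-level indexes
# (naturalness, blockiness, ...) with VMAF-like scores as the target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 9))
vmaf = 100.0 * X.mean(axis=1)  # synthetic stand-in for real VMAF labels

# SVR with a Gaussian (RBF) kernel, as described in the abstract;
# C and gamma would normally be tuned by cross-validation.
model = SVR(kernel="rbf", C=10.0, gamma="scale")
model.fit(X, vmaf)
predicted = model.predict(X)
```

At inference time, the trained model scores a video from its frame features alone, which is what makes the metric no-reference.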
{"title":"Geometry-Based Motion Vector Scaling for Omnidirectional Video Coding","authors":"R. G. Youvalari, A. Aminlou","doi":"10.1109/ISM.2018.00030","DOIUrl":"https://doi.org/10.1109/ISM.2018.00030","url":null,"abstract":"Virtual reality (VR) applications make use of 360° omnidirectional video content to create an immersive experience for the user. In order to utilize current 2D video compression standards, such content must be projected onto a 2D image plane. However, the projection from the spherical to the 2D domain introduces deformations in the projected content due to the different sampling characteristics of the 2D plane. Such deformations are not favorable for the motion models of current video coding standards; consequently, omnidirectional video is not efficiently compressible with current codecs. In this work, a geometry-based motion vector scaling method is proposed to compress the motion information of omnidirectional content efficiently. The proposed method applies a scaling, based on the location in the 360° video, to the motion information of neighboring blocks in order to provide uniform motion behavior in a given part of the content. This uniform motion behavior yields suitable candidates for efficiently predicting the motion vectors of the current block. The conducted experiments show that the proposed method provides up to 2.2% bitrate reduction, and on average around 1%, for content with high motion characteristics in the VTM test model of the Versatile Video Coding (H.266/VVC) standard.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"303 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123664188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
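The geometric intuition behind such scaling can be sketched for the common equirectangular projection (ERP), where horizontal sampling density varies with cos(latitude): a horizontal motion vector borrowed from a neighboring block at one row is rescaled before being used as a predictor at another row. The cos-latitude rule below is standard ERP geometry used purely for illustration; the paper's exact scaling function may differ.

```python
import math

def scale_mv_for_erp(mv, src_y, dst_y, height):
    """Illustrative geometry-based MV scaling for ERP video: a motion
    of fixed tangential speed on the sphere maps to a horizontal pixel
    displacement proportional to 1/cos(latitude), so a horizontal MV
    from row src_y is rescaled by cos(lat_src)/cos(lat_dst) before
    serving as a predictor at row dst_y. The vertical component is
    left unchanged in this sketch."""
    def lat(y):  # row index -> latitude in radians (+pi/2 at the top)
        return (0.5 - (y + 0.5) / height) * math.pi
    mvx, mvy = mv
    scale = math.cos(lat(src_y)) / math.cos(lat(dst_y))
    return (mvx * scale, mvy)
```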