Title: Effect of Number of Stimuli on Users Perception of Different Speech Degradations. A Crowdsourcing Case Study
Authors: Rafael Zequeira Jiménez, Gabriel Mittag, S. Möller
DOI: 10.1109/ISM.2018.00-16
Published in: 2018 IEEE International Symposium on Multimedia (ISM), December 2018
Abstract: Crowdsourcing (CS) has become established as a powerful tool for collecting human input for data acquisition and labeling. However, questions remain about the validity of the data collected on a CS platform: workers sometimes work carelessly or try to game the system to maximize their profit. This paper reports on whether the number of speech stimuli presented to listeners has an impact on their perception of certain degradation conditions applied to the speech signal. To this end, a crowdsourcing study was conducted with 209 listeners, divided into three non-overlapping groups, each of which was presented with tasks containing a different number of stimuli: 10, 20, or 40. Listeners were asked to rate the speech stimuli with respect to their overall quality, and the ratings were collected on a 5-point scale in accordance with ITU-T Rec. P.800. Workers assessed the speech stimuli of database 501 from ITU-T Rec. P.863. Additionally, the influence of certain speech signal characteristics, such as interruptions and bandwidth, on the workers' quality perception was investigated.
{"title":"Deep Reinforcement Learning with Parameterized Action Space for Object Detection","authors":"Zheng Wu, N. Khan, Lei Gao, L. Guan","doi":"10.1109/ISM.2018.00025","DOIUrl":"https://doi.org/10.1109/ISM.2018.00025","url":null,"abstract":"Object detection is a fundamental task in computer vision. With the remarkable progress made in big visual data analytics and deep learning, Reinforcement Learning (RL) is becoming a promising framework to model the object detection problem since the detection procedure can be cast as a Markov decision process (MDP). We propose a Reinforcement Learning system with parameterized action space for image object detection. The proposed system uses an active agent exploring in a scene to identify the location of a target object, and learns a policy to refine the geometry of the agent by taking simple actions in parameterized space, which integrates the discrete actions and its corresponding continuous parameters. We then optimize the representation of the generated region proposals with the discriminative multiple canonical correlation analysis (DMCCA) [11] in preparation for classification with Fast R-CNN. Experiments on PASCAL VOC 2007 and 2012 datasets show the effectiveness of the proposed method.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129933420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Convolutional Network Based Foreground Feature Fusion","authors":"Hanjian Song, Lihua Tian, Chen Li","doi":"10.1109/ISM.2018.00036","DOIUrl":"https://doi.org/10.1109/ISM.2018.00036","url":null,"abstract":"with explosion of videos, action recognition has become an important research subject. This paper makes a special effort to investigate and study 3D Convolutional Network. Focused on the problem of ConvNet dependence on multiple large scale dataset, we propose a 3D ConvNet structure which incorporate the original 3D-ConvNet features and foreground 3D-ConvNet features fused by static object and motion detection. Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, experimental results demonstrate that with merely 50% pixels utilization, foreground ConvNet achieves satisfying performance as same as origin. With feature fusion, we achieve 83.7% accuracy on UCF-101 exceeding original ConvNet.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132524734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: Mobile Scanner for Protein Crystallization Plates
Authors: Ashok Shrestha, Truong X. Tran, R. S. Aygün, M. Pusey
DOI: 10.1109/ISM.2018.000-5
Published in: 2018 IEEE International Symposium on Multimedia (ISM), December 2018
Abstract: A protein crystallization well plate is a rectangular platform containing wells that are usually organized in a grid structure. Crystallization conditions are studied through a screening process that sets up trial conditions in the well plate. Traditionally, an expert evaluates the trial wells for crystal growth either by manually viewing the plate under a microscope or by using a high-throughput plate imaging and analysis system. The first method is tedious and cumbersome, while the second requires a significant financial investment. Recently, a few approaches have collected images using smartphones, enabling low-cost automatic scoring (classification) of well images. Nevertheless, these methods do not detect which well on the plate has been captured. With a smartphone, the user may capture or scan any well simply by moving the phone to the corresponding well. In this paper, we propose a mobile scanner that identifies the well by using a coded template placed under the well plate. The mobile scanner provides two modes: image and video. Image mode is used for single-well analysis, whereas video mode is used to scan the complete plate. In video mode, the mobile scanner app generates a tile map of the plate.

Title: Acoustic Scene Classification Using Reduced MobileNet Architecture
Authors: Jun-Xiang Xu, Tzu-Ching Lin, Tsai-Ching Yu, Tzu-Chiang Tai, P. Chang
DOI: 10.1109/ISM.2018.00038
Published in: 2018 IEEE International Symposium on Multimedia (ISM), December 2018
Abstract: Sounds are ubiquitous in our daily lives, from vehicles to conversations between people, which makes it easy to collect soundtracks and categorize them into different groups. These assets can then be used to recognize the scene. Acoustic scene classification achieves this by training a model that can be deployed on devices such as smartphones. Our goal is to maximize the validation accuracy of our machine learning results while optimizing hardware usage. We train on the dataset from the IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) challenge; the DCASE 2017 data contains 15 different kinds of outdoor audio recordings, including beach, bus, restaurant, etc. In this work, we use two signal processing techniques, log-mel features and HPSS (harmonic-percussive sound separation), and we modify and reduce the MobileNet structure for training. We also make use of fine-tuning and late fusion to improve accuracy. With this structure, we reach a validation rate of 75.99%, approximately the seventh highest among the groups in the DCASE 2017 Challenge, with less computational complexity than entries with higher accuracy. We deem this a worthwhile trade-off.
{"title":"[Copyright notice]","authors":"","doi":"10.1109/ism.2018.00003","DOIUrl":"https://doi.org/10.1109/ism.2018.00003","url":null,"abstract":"","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121629904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: Live Demonstration: Kvazzup 4K HEVC Video Call
Authors: Joni Rasanen, Marko Viitanen, Jarno Vanne, T. Hämäläinen
DOI: 10.1109/ISM.2018.00-13
Published in: 2018 IEEE International Symposium on Multimedia (ISM), December 2018
Abstract: This paper describes a demonstration setup for an end-to-end 4K video call with the Kvazzup open-source HEVC video call application. The Kvazzup clients are installed on a desktop and a laptop computer powered by a 22-core Intel Xeon and a 4-core Intel i7 processor, respectively. The proposed two-way peer-to-peer video call setup is shown to support a 2160p30 video stream from the desktop to the laptop and a 720p30 stream in the reverse direction.

Title: Discriminative Robust Gaze Estimation Using Kernel-DMCCA Fusion
Authors: Salah Rabba, M. Kyan, Lei Gao, A. Quddus, A. S. Zandi, L. Guan
DOI: 10.1109/ISM.2018.00064
Published in: 2018 IEEE International Symposium on Multimedia (ISM), December 2018
Abstract: The proposed framework employs discriminative analysis for gaze estimation using kernel discriminative multiple canonical correlation analysis (K-DMCCA), representing different feature vectors that account for variations in head pose, illumination, and occlusion. The feature extraction component of the framework includes spatial indexing as well as statistical and geometrical elements. Gaze estimation is constructed by aggregating features and transforming them into a higher-dimensional space using an RBF kernel with a spread factor. The output of features fused through K-DMCCA is robust to illumination and occlusion and is calibration-free. Our algorithm is validated on the MPII, CAVE, ACS, and EYEDIAP datasets. The two main contributions of the framework are enhancing the performance of DMCCA with the kernel and introducing the quadtree as an iris region descriptor. Spatial indexing using the quadtree is a robust, calibration-free method for detecting in which quadrant the iris is situated and for detecting the iris boundary, and it complements the statistical and geometrical indexing. Our method achieves gaze estimation accuracies of 4.8° on CAVE, 4.6° on MPII, 5.1° on ACS, and 5.9° on EYEDIAP. The proposed framework provides insight into the methodology of multi-feature fusion for gaze estimation.

Title: Extraction of Movie Trailer Biases Based on Editing Features for Trailer Generation
Authors: Honoka Kakimoto, Yuanyuan Wang, Yukiko Kawai, K. Sumiya
DOI: 10.1109/ISM.2018.000-6
Published in: 2018 IEEE International Symposium on Multimedia (ISM), December 2018
Abstract: Currently, movie trailers are edited using various methods. However, the length of each trailer is at most several minutes, and the scenes used for editing and the types of effects are limited because a trailer is created for a certain target audience. It is therefore difficult to edit a trailer that caters to the differing preferences of various users, and potential audience members may be lost if the trailer is not enticing enough. To solve this problem, we define seven editing biases that occur when movies are summarized and edited into trailers, and we investigate whether these biases can be used to generate a movie trailer catering to various viewer preferences.

Title: Neural Networks Based Fractional Pixel Motion Estimation for HEVC
Authors: Ehab M. Ibrahim, Emad Badry, A. Abdelsalam, I. Abdalla, M. Sayed, Hossam Shalaby
DOI: 10.1109/ISM.2018.00027
Published in: 2018 IEEE International Symposium on Multimedia (ISM), December 2018
Abstract: High Efficiency Video Coding (HEVC) provides more compression than its predecessors. One of the modules that contributes to higher compression rates is motion estimation, which consists of integer- and fractional-pixel motion estimation. The fractional motion estimation (FME) process performs interpolations to find sample values at fractional-pixel locations, which can be computationally demanding. In this paper, we propose an interpolation-free method for FME based on artificial neural networks (ANNs). Our proposed method is implemented in the HEVC reference software (HM-16.9). According to our results, ANNs can accomplish the FME task with an average increase of 2.6% in BD-Rate and an average reduction of 0.09 dB in BD-PSNR.