{"title":"Can You All Look Here? Towards Determining Gaze Uniformity In Group Images","authors":"Omkar N. Kulkarni, Vikram Patil, Shivam B. Parikh, Shashank Arora, P. Atrey","doi":"10.1109/ISM.2020.00024","DOIUrl":"https://doi.org/10.1109/ISM.2020.00024","url":null,"abstract":"Since the advent of the smartphone, the number of group images taken every day is rising exponentially. The photographers' struggle is to make sure everyone looks at the camera while taking the picture. More specifically, in a group image, if everybody is not looking in the same direction, then the image's aesthetic quality and utility are depreciated. The photographer usually discards the image, and then subsequently, several images are taken to mitigate this issue. Usually, users have to manually check if the image is uniformly gazed, which is tedious and time-consuming. This paper proposes a method for classifying a given group image as uniformly gazed or nonuniformly gazed by calculating the Gaze Uniformity Index. We evaluate the proposed method on a subset of the ‘Images of Groups' dataset. The proposed method achieved an accuracy of 67%.","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"76 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116310016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Effective Rotational Invariant Key-point Detector for Image Matching","authors":"Thanh Hong-Phuoc, L. Guan","doi":"10.1109/ISM.2020.00043","DOIUrl":"https://doi.org/10.1109/ISM.2020.00043","url":null,"abstract":"Traditional detectors e.g. Harris, SIFT, SFOP... are known inflexible in different contexts as they solely target corners, blobs, junctions or other specific human-designed structures. To account for this inflexibility and additionally their unreliability under non-uniform lighting change, recently, a Sparse Coding based Key-point detector (SCK) relying on no human-designed structures and invariant to non-uniform illumination change was proposed. Yet, geometric transformations such as rotation are not considered in SCK. Thus, a novel Rotational Invariant SCK called RI-SCK is proposed in this paper. To make SCK rotational invariant, an effective use of multiple rotated versions of the original dictionary in the sparse coding step of SCK is proposed. A novel strength measure is also introduced for comparison of key-points across image pyramid levels if scale invariance is required. Experimental results on three public datasets have confirmed that significant gains in repeatability and matching score could be achieved by the proposed detector.","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116841336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time Spatio-Temporal Action Localization in 360 Videos","authors":"Bo Chen, A. Ali-Eldin, P. Shenoy, K. Nahrstedt","doi":"10.1109/ISM.2020.00018","DOIUrl":"https://doi.org/10.1109/ISM.2020.00018","url":null,"abstract":"Spatio-temporal action localization of human actions in a video has been a popular topic over the past few years. It tries to localize the bounding boxes, the time span and the class of one action, which summarizes information in the video and helps humans understand it. Though many approaches have been proposed to solve this problem, these efforts have only focused on perspective videos. Unfortunately, perspective videos only cover a small field-of-view (FOV), which limits the capability of action localization. In this paper, we develop a comprehensive approach to real-time spatio-temporallocalization that can be used to detect actions in 360 videos. We create two datasets named UCF-101-24-360 and JHMDB-21-360 for our evaluation. Our experiments show that our method consistently outperforms other competing approaches and achieves a real-time processing speed of 15fps for 360 videos.","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"601 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116177255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Classification of Emotions in Latin Music","authors":"L. G. Catharin, Rafael P. Ribeiro, C. Silla, Yandre M. G. Costa, V. D. Feltrim","doi":"10.1109/ISM.2020.00038","DOIUrl":"https://doi.org/10.1109/ISM.2020.00038","url":null,"abstract":"In this study we classified the songs of the Latin Music Mood Database (LMMD) according to their emotion using two approaches: single-step classification, which consists of classifying the songs by emotion, valence, arousal and quadrant; and multistep classification, which consists of using the predictions of the best valence and arousal classifiers to classify quadrants and the best valence, arousal and quadrant predictions as features to classify emotions. Our hypothesis is that breaking the emotion classification in smaller problems would reduce complexity and improve results. Our best single-step emotion and valence classifiers used multimodal sets of features extracted from lyrics and audio. Our best arousal classifier used features extracted from lyrics and SMOTE to mitigate the dataset imbalance. The proposed multistep emotion classifier, which uses the predictions of a multistep quadrant classifier, improved the single-step classifier performance, reaching 0.605 of mean f-measure. These results show that using valence, arousal, and consequently, quadrant information can improve the prediction of specific emotions.","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130585477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SEAWARE: Semantic Aware View Prediction System for 360-degree Video Streaming","authors":"Jounsup Park, Mingyuan Wu, Kuan-Ying Lee, Bo Chen, K. Nahrstedt, M. Zink, R. Sitaraman","doi":"10.1109/ISM.2020.00016","DOIUrl":"https://doi.org/10.1109/ISM.2020.00016","url":null,"abstract":"Future view prediction for a 360-degree video streaming system is important to save the network bandwidth and improve the Quality of Experience (QoE). Historical view data of a single viewer and multiple viewers have been used for future view prediction. Video semantic information is also useful to predict the viewer's future behavior. However, extracting video semantic information requires powerful computing hardware and large memory space to perform deep learning-based video analysis. It is not a desirable condition for most of client devices, such as small mobile devices or Head Mounted Display (HMD). Therefore, we develop an approach where video semantic analysis is executed on the media server, and the analysis results are shared with clients via the Semantic Flow Descriptor (SFD) and View-Object State Machine (VOSM). SFD and VOSM become new descriptive additions of the Media Presentation Description (MPD) and Spatial Relation Description (SRD) to support 360-degree video streaming. Using the semantic-based approach, we design the Semantic-Aware View Prediction System (SEAWARE) to improve the overall view prediction performance. The evaluation results of 360-degree videos and real HMD view traces show that the SEAWARE system improves the view prediction performance and streams high-quality video with limited network bandwidth.","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131166589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deriving Strategies for the Evaluation of Spaced Repetition Learning in Mobile Learning Applications from Learning Analytics","authors":"Florian Schimanke, R. Mertens","doi":"10.1109/ISM.2020.00049","DOIUrl":"https://doi.org/10.1109/ISM.2020.00049","url":null,"abstract":"Evaluating the success of learning technologies with respect to improvement in the learners' abilities and knowledge is not an easy task. The problem is formed by the existence of many different definitions with different perspectives like grades on the one hand and workplace performance on the other. This paper reviews definitions from the literature with the aim to find a suitable definition for the evaluation of learning success in spaced repetition based mobile learning for knowledge improvement. It also borrows approaches from learning analytics to tackle the fact that learner groups are heterogeneous which leads to the need of analyzing learning success differently in different groups of learners.","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127861740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two types of flows admission control method for maximizing all user satisfaction considering seek-bar operation","authors":"Keisuke Ode, S. Miyata","doi":"10.1109/ISM.2020.00048","DOIUrl":"https://doi.org/10.1109/ISM.2020.00048","url":null,"abstract":"In recent years, the available network bandwidth is decreased by increasing the mobile devices such as a smartphones or tablets. Quality of Service (QoS) control is required to guarantee the communication quality for users of the network. As one of QoS control techniques, which judges a new arrival streaming application (=flow) can be accommodate in the network has been proposed. Conventional admission control methods that focus on the user's cooperative behaviour have been proposed. In general, some users use the video navigation tools like a seek-bar to jump to different time position because they want to watch some specific scenes. However, this conventional method does not assume the user's behaviour. In this paper, we propose an admission control to maximize user satisfaction by considering user behaviour to reduce the video time. We evaluate our proposed method by numerical analysis using queuing theory and show the effectiveness of our proposed method.","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115576733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structured Pruning of LSTMs via Eigenanalysis and Geometric Median for Mobile Multimedia and Deep Learning Applications","authors":"Nikolaos Gkalelis, V. Mezaris","doi":"10.1109/ISM.2020.00028","DOIUrl":"https://doi.org/10.1109/ISM.2020.00028","url":null,"abstract":"In this paper, a novel structured pruning approach for learning efficient long short-term memory (LSTM) network architectures is proposed. More specifically, the eigenvalues of the covariance matrix associated with the responses of each LSTM layer are computed and utilized to quantify the layers' redundancy and automatically obtain an individual pruning rate for each layer. Subsequently, a Geometric Median based (GM-based) criterion is used to identify and prune in a structured way the most redundant LSTM units, realizing the pruning rates derived in the previous step. The experimental evaluation on the Penn Treebank text corpus and the large-scale YouTube-8M audio-video dataset for the tasks of word-level prediction and visual concept detection, respectively, shows the efficacy of the proposed approach1.","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114617084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SumBot: Summarize Videos Like a Human","authors":"Hongxiang Gu, Stefano Petrangeli, Viswanathan Swaminathan","doi":"10.1109/ISM.2020.00044","DOIUrl":"https://doi.org/10.1109/ISM.2020.00044","url":null,"abstract":"Video currently accounts for 70% of all internet traffic and this number is expected to continue to grow. Each minute, more than 500 hours worth of videos are uploaded on YouTube. Generating engaging short videos out of the raw captured content is often a time-consuming and cumbersome activity for content creators. Existing ML- based video summarization and highlight generation approaches often neglect the fact that many summarization tasks require specific domain knowledge of the video content, and that human editors often follow a semistructured template when creating the summary (e.g. to create the highlights for a sport event). We therefore address in this paper the challenge of creating domain-specific summaries, by actively leveraging this editorial template. Particularly, we present an Inverse Reinforcement Learning (IRL)-based framework that can automatically learn the hidden structure or template followed by a human expert when generating a video summary for a specific domain. Particularly, we propose to formulate the video summarization task as a Markov Decision Process, where each state is a combination of the features of the video shots added to the summary, and the possible actions are to include/remove a shot from the summary or leave it as is. Using a set of domain-specific human-generated video highlights as examples, we employ a Maximum Entropy IRL algorithm to learn the implicit reward function governing the summary generation process. The learned reward function is then used to train an RL-agent that can produce video summaries for a specific domain, closely resembling what a human expert would create. Learning from expert demonstrations allows our approach to be applicable to any domain or editorial styles. To demonstrate the superior performance of our approach, we employ it to the task of soccer games highlight generation and show that it outperforms other state-of-the-art methods, both quantitatively and qualitatively.","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"330 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122831691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2020 IEEE International Symposium on Multimedia ISM 2020","authors":"","doi":"10.1109/ism.2020.00001","DOIUrl":"https://doi.org/10.1109/ism.2020.00001","url":null,"abstract":"","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126224254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}