Automatic Identification of Keywords in Lecture Video Segments
Raga Shalini Koka, Farah Naz Chowdhury, Mohammad Rajiur Rahman, T. Solorio, J. Subhlok
2020 IEEE International Symposium on Multimedia (ISM), December 2020. DOI: 10.1109/ISM.2020.00035

Abstract: Lecture video is an increasingly important learning resource. However, the challenge of quickly finding the content of interest in a long lecture video is a critical limitation of this format. This paper introduces automatic discovery of keywords (or tags) for lecture video segments to improve navigation. A lecture video is divided into topical segments based on the frame-to-frame similarity of content. A user navigates the lecture video assisted by visual summaries and keywords for the segments. Keywords provide an overview of the content discussed in the segment to improve navigation. The input to the keyword identification algorithm is the text from the video frames extracted by OCR. Automatically discovering keywords is challenging, as the suitability of an N-gram to be a keyword depends on a variety of factors, including frequency in a segment, relative frequency in reference to the full video, font size, time on screen, and presence in domain and language dictionaries. This paper explores how these factors are quantified and combined to identify good keywords. The key scientific contribution of this paper is the design, implementation, and evaluation of a keyword selection algorithm for lecture video segments. Evaluation is performed by comparing the keywords generated by the algorithm with the tags chosen by experts on 121 segments of 11 videos from STEM courses.
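The abstract names the scoring factors but not how they are combined. Below is a minimal, hypothetical sketch of such a factor-based N-gram scorer; the weights, normalizations, and helper names are illustrative assumptions, not the paper's algorithm.

```python
from dataclasses import dataclass

@dataclass
class NgramStats:
    seg_freq: int        # occurrences in the segment's OCR text
    video_freq: int      # occurrences in the full video's OCR text
    max_font_px: float   # largest observed font size (OCR box height)
    screen_secs: float   # total time the N-gram was visible on screen
    in_dictionary: bool  # appears in a domain or language dictionary

def keyword_score(s: NgramStats, weights=(1.0, 1.0, 0.5, 0.5, 1.0)) -> float:
    """Combine the factors named in the abstract into one score (assumed weights)."""
    w_rel, w_freq, w_font, w_time, w_dict = weights
    # Relative frequency: prominent in the segment vs. the whole video.
    rel = s.seg_freq / max(s.video_freq, 1)
    return (w_rel * rel
            + w_freq * s.seg_freq
            + w_font * s.max_font_px / 40.0   # assumed typical title height
            + w_time * s.screen_secs / 60.0   # assumed segment-scale norm
            + w_dict * (1.0 if s.in_dictionary else 0.0))

stats = {"dynamic programming": NgramStats(7, 9, 38.0, 95.0, True),
         "for example": NgramStats(5, 40, 18.0, 20.0, False)}
ranked = sorted(stats, key=lambda k: keyword_score(stats[k]), reverse=True)
print(ranked)  # the domain term should outrank the generic phrase
```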
Between the Frames - Evaluation of Various Motion Interpolation Algorithms to Improve 360° Video Quality
S. Fremerey, Frank Hofmeyer, Steve Göring, Dominik Keller, A. Raake
2020 IEEE International Symposium on Multimedia (ISM), December 2020. DOI: 10.1109/ISM.2020.00017

Abstract: With the increasing availability of 360° video content, it becomes important to provide smoothly playing videos of high quality for end users. For this reason, we compare the influence of different Motion Interpolation (MI) algorithms on 360° video quality. After conducting a pre-test with 12 video experts in [3], we found that MI is a useful tool to increase the QoE (Quality of Experience) of omnidirectional videos. As a result of the pre-test, we selected three suitable MI algorithms, namely ffmpeg Motion Compensated Interpolation (MCI), Butterflow, and Super-SloMo. Subsequently, we interpolated 15 entertaining and real-world omnidirectional videos with a duration of 20 seconds from 30 fps (original framerate) to 90 fps, which is the native refresh rate of the HMD used, the HTC Vive Pro. To assess QoE, we conducted two subjective tests with 24 and 27 participants. In the first test we used a Modified Paired Comparison (M-PC) method, and in the second test the Absolute Category Rating (ACR) approach. In the M-PC test 45 stimuli were used, and in the ACR test 60. Results show that for most of the 360° videos, the interpolated versions obtained significantly higher quality scores than the lower-framerate source videos, validating our hypothesis that motion interpolation can improve the overall video quality for 360° video. As expected, it was observed that the relative comparisons in the M-PC test result in larger differences in terms of quality. Generally, the ACR method led to similar results, while reflecting a more realistic viewing situation. In addition, we compared the different MI algorithms and can conclude that with sufficient available computing power, Super-SloMo should be preferred for interpolation of omnidirectional videos, while MCI also shows good performance.
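One of the three interpolators compared, ffmpeg's motion-compensated interpolation, is available off the shelf as the stock minterpolate filter. A minimal sketch of the 30 to 90 fps conversion described above; the filter settings and file names are assumptions, not the study's exact configuration.

```python
import subprocess

def interpolate_mci(src: str, dst: str, target_fps: int = 90) -> None:
    """Interpolate a video to target_fps with ffmpeg's MCI mode."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", f"minterpolate=fps={target_fps}:mi_mode=mci",
        dst,
    ], check=True)

# Example (assumed file names):
# interpolate_mci("omni_30fps.mp4", "omni_90fps.mp4")
```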
On Subpicture-based Viewport-dependent 360-degree Video Streaming using VVC
Maryam Homayouni, A. Aminlou, M. Hannuksela
2020 IEEE International Symposium on Multimedia (ISM), December 2020. DOI: 10.1109/ISM.2020.00021

Abstract: Virtual reality applications create an immersive experience using 360° video with high resolution and frame rate. However, since the user only views a portion of the 360° video according to his/her current viewport, streaming the whole content at high resolution wastes bandwidth. To address this issue, viewport-dependent approaches have been proposed such that only the part of the video which falls within the user's current viewport is transmitted in high quality, while the rest of the content is transmitted in lower quality. The selection of high- and low-quality parts is constantly adapted according to the user's head motion, which requires frequent intra-coded frames at switching points, leading to an increase in the overall streaming bitrate. In this paper, a viewport-adaptive streaming scheme is introduced which avoids intra frames at switching points by introducing a long intra period for non-changing parts of the content during head motion. This scheme has been realized by taking advantage of the mixed Video Coding Layer (VCL) Network Abstraction Layer (NAL) unit feature of the Versatile Video Coding (VVC) standard. This method reduces bitrate significantly, especially for sequences with either no or only slow camera motion, which is common for 360° video capturing.
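The client-side selection step in such schemes can be illustrated as follows: subpictures overlapping the viewport are fetched in high quality, the rest in low quality. The tile grid, viewport model, and margins below are assumptions, and the VVC mixed-NAL-unit merging itself is not shown.

```python
def select_subpictures(yaw_deg, pitch_deg, hfov=100, vfov=70, cols=8, rows=4):
    """Return per-tile quality for an assumed equirectangular cols x rows tiling."""
    picks = {}
    for r in range(rows):
        for c in range(cols):
            tile_yaw = (c + 0.5) * 360 / cols - 180        # tile center, degrees
            tile_pitch = 90 - (r + 0.5) * 180 / rows
            dyaw = (tile_yaw - yaw_deg + 180) % 360 - 180  # wrap-around distance
            in_view = (abs(dyaw) <= hfov / 2 + 180 / cols and
                       abs(tile_pitch - pitch_deg) <= vfov / 2 + 90 / rows)
            picks[(r, c)] = "high" if in_view else "low"
    return picks

quality = select_subpictures(yaw_deg=30, pitch_deg=0)
print(sum(q == "high" for q in quality.values()), "of", len(quality), "tiles in high quality")
```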
Better Look Twice - Improving Visual Scene Perception Using a Two-Stage Approach
Christopher B. Kuhn, M. Hofbauer, G. Petrovic, E. Steinbach
2020 IEEE International Symposium on Multimedia (ISM), December 2020. DOI: 10.1109/ISM.2020.00013

Abstract: Accurate visual scene perception plays an important role in fields such as medical imaging or autonomous driving. Recent advances in computer vision allow for accurate image classification, object detection, and even pixel-wise semantic segmentation. Human vision has repeatedly been used as an inspiration for developing new machine vision approaches. In this work, we propose to adapt the "zoom lens model" from psychology for semantic scene segmentation. According to this model, humans first distribute their attention evenly across the entire field of view at low processing power. Then, they follow visual cues to look at a few smaller areas with increased attention. By looking twice, it is possible to refine the initial scene understanding without requiring additional input. We propose to perform semantic segmentation the same way. To obtain visual cues for deciding where to look twice, we use a failure region prediction approach based on a state-of-the-art failure prediction method. Then, the second, focused look is performed by a dedicated classifier that reclassifies the most challenging patches. Finally, pixels predicted to be errors are updated in the original semantic prediction. While focusing only on areas with the highest predicted failure probability, we achieve a classification accuracy of over 63% for the predicted failure regions. After updating the initial semantic prediction of 4000 test images from a large-scale driving data set, we reduce the absolute pixel-wise error of 232 road participants by 10% or more.
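The two-stage pipeline can be sketched schematically as below. All three models are stand-in stubs with random outputs; in the paper they are a semantic segmentation network, a failure-region predictor, and a dedicated patch classifier, none of which are reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, CLASSES, PATCH = 64, 64, 5, 16

def segment(img):                 # stub for the first-pass segmentation network
    return rng.integers(0, CLASSES, size=(H, W))

def failure_prob(img, seg):       # stub for the failure-region predictor
    return rng.random((H, W))

def reclassify_patch(img_patch):  # stub for the dedicated patch classifier
    return rng.integers(0, CLASSES)

img = rng.random((H, W, 3))
seg = segment(img)                       # first look: full field of view
fail = failure_prob(img, seg)

# Second look: revisit only the patches with the highest predicted failure.
patch_scores = fail.reshape(H // PATCH, PATCH, W // PATCH, PATCH).mean((1, 3))
worst = np.dstack(np.unravel_index(np.argsort(patch_scores, axis=None)[-4:],
                                   patch_scores.shape))[0]
for py, px in worst:
    y, x = py * PATCH, px * PATCH
    label = reclassify_patch(img[y:y + PATCH, x:x + PATCH])
    # Update only pixels predicted to be errors within the revisited patch.
    mask = fail[y:y + PATCH, x:x + PATCH] > 0.5
    seg[y:y + PATCH, x:x + PATCH][mask] = label
```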
Bonsai Style Classification: a new database and baseline results
Guilherme H. S. Nakahata, A. A. Constantino, Yandre M. G. Costa
2020 IEEE International Symposium on Multimedia (ISM), December 2020. DOI: 10.1109/ISM.2020.00025

Abstract: Bonsai is an ancient art aimed at mimicking a tree in miniature. Despite originating and being most popular in Asia, Bonsai has become widespread in several parts of the world. There are many techniques for styling the plants, which classify them into different patterns widely known by people who appreciate this art. In this work, we introduce a new database specially created for the development of research on Bonsai style classification. The database is composed of 700 samples, equally distributed among the seven following classes: Formal Upright, Informal Upright, Slanting, Cascade, Semi Cascade, Literati, and Wind Swept. The classes selected to compose the database were chosen considering the five basic styles plus two more styles that have characteristics distinct from the others. The database was created by the authors themselves, using images available on the Pinterest platform, which were subjected to a pre-processing step to remove similar photos and resize them. The baseline results presented here were obtained using deep models (CNN architectures) successfully used to address image classification tasks in different application domains: VGG, Xception, DenseNet, and InceptionV3. These models were trained on ImageNet, and we used transfer learning to adapt them to the current proposal. In order to avoid overfitting, data augmentation was performed during training, along with the dropout method. Experimental results showed that the VGG19 model obtained the highest accuracy rate, reaching 89%. In addition, we used the DeconvNet and Deep Taylor methods to find a proper explanation for the obtained results. It was noted that the VGG19 model better captured the most important aspects for the classification task investigated here, with a better ability to discriminate and generalize patterns in the task of classifying Bonsai styles.
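The training recipe described (ImageNet-pretrained VGG19, transfer learning, data augmentation, dropout) maps onto a short Keras sketch. Input size, head layers, dropout rate, and optimizer settings are assumptions, not the paper's values; VGG-style input preprocessing is assumed to happen in the data pipeline.

```python
import tensorflow as tf

# Reuse ImageNet features from VGG19; the convolutional base stays frozen here.
base = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),        # data augmentation
    tf.keras.layers.RandomRotation(0.1),
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),                    # dropout, per the abstract
    tf.keras.layers.Dense(7, activation="softmax"),  # the seven Bonsai styles
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=30)  # datasets assumed
```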
Measuring Driver Situation Awareness Using Region-of-Interest Prediction and Eye Tracking
M. Hofbauer, Christopher B. Kuhn, Lukas Püttner, G. Petrovic, E. Steinbach
2020 IEEE International Symposium on Multimedia (ISM), December 2020. DOI: 10.1109/ISM.2020.00022

Abstract: With increasing progress in autonomous driving, the human does not have to be in control of the vehicle for the entire drive. A human driver takes control of the vehicle in case of an autonomous system failure or when the vehicle encounters an unknown traffic situation it cannot handle on its own. A critical part of this transition to human control is ensuring sufficient driver situation awareness. Currently, no direct method to explicitly estimate driver awareness exists. In this paper, we propose a novel system to explicitly measure the situation awareness of the driver. Our approach is inspired by methods used in aviation. However, in contrast to aviation, situation awareness in driving is determined by the detection and understanding of dynamically changing and previously unknown situation elements. Our approach uses machine learning to define the best possible situation awareness. We also propose to measure the actual situation awareness of the driver using eye tracking. Comparing the actual awareness to the target awareness allows us to accurately assess the driver's awareness of the current traffic situation. To test our approach, we conducted a user study. We measured the situation awareness score of our model for 8 unique traffic scenarios. The results experimentally validate the accuracy of the proposed driver awareness model.
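The comparison of target awareness against gaze data might look like the following sketch. The ROI format and the relevance-weighted scoring rule are assumptions for illustration, not the paper's model.

```python
def awareness_score(target_rois, fixations):
    """target_rois: {name: (x, y, w, h, relevance)}; fixations: [(x, y)] gaze points."""
    def fixated(box):
        x, y, w, h, _ = box
        return any(x <= fx <= x + w and y <= fy <= y + h for fx, fy in fixations)
    total = sum(box[4] for box in target_rois.values())
    seen = sum(box[4] for box in target_rois.values() if fixated(box))
    return seen / total if total else 1.0

# Hypothetical scene: driver fixated the lead car but missed the cyclist.
rois = {"lead_car": (400, 300, 120, 80, 1.0), "cyclist": (700, 320, 40, 90, 0.8)}
gaze = [(450, 340), (455, 335), (210, 500)]
print(f"awareness: {awareness_score(rois, gaze):.2f}")  # < 1.0: cyclist missed
```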
Automatic Sparsity-Aware Recognition for Keypoint Detection
Yurui Xie, L. Guan
2020 IEEE International Symposium on Multimedia (ISM), December 2020. DOI: 10.1109/ISM.2020.00029

Abstract: We present a novel Sparsity-Aware Keypoint detector (SAKD) to localize a set of discriminative keypoints via optimization of group-sparse coding. Unlike most current handcrafted keypoint detectors, which are limited by manually defined local structures, the proposed method has the capacity to exploit diverse structures through combinations of visual atoms from a vocabulary. Another key attribute is that its group-sparsity nature concentrates on jointly discovering shareable structural patterns across keypoints within an image. This merit facilitates localizing repeatable keypoints and resisting distractors when an image undergoes various transformations. Extensive experiments on four challenging benchmark datasets demonstrate that the proposed method achieves favorable performance compared with the state of the art in the literature.
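Group-sparse coding itself can be illustrated with a generic proximal-gradient (ISTA) solver using a group soft-threshold; the dictionary, the grouping, and the paper's keypoint-specific objective are not reproduced here, so everything below is a textbook sketch.

```python
import numpy as np

def group_sparse_code(x, D, groups, lam=0.1, iters=200):
    """Minimize 0.5*||x - D z||^2 + lam * sum_g ||z_g||_2 via proximal gradient."""
    z = np.zeros(D.shape[1])
    t = 1.0 / np.linalg.norm(D, 2) ** 2      # step size from a Lipschitz bound
    for _ in range(iters):
        z = z - t * D.T @ (D @ z - x)        # gradient step on the data term
        for g in groups:                     # group soft-threshold (the prox)
            n = np.linalg.norm(z[g])
            z[g] = 0.0 if n == 0 else max(0.0, 1.0 - t * lam / n) * z[g]
    return z

rng = np.random.default_rng(1)
D = rng.standard_normal((32, 12))            # 12 atoms in 4 groups of 3
groups = [slice(i, i + 3) for i in range(0, 12, 3)]
x = D[:, 0:3] @ np.array([1.0, -0.5, 0.8])   # signal built from the first group
z = group_sparse_code(x, D, groups, lam=0.05)
print(np.round(z, 2))                        # the first group should dominate
```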
A comparative study of RTC applications
A. Nisticò, Dena Markudova, Martino Trevisan, M. Meo, G. Carofiglio
2020 IEEE International Symposium on Multimedia (ISM), December 2020. DOI: 10.1109/ISM.2020.00007

Abstract: Real-Time Communication (RTC) applications have become ubiquitous and are nowadays fundamental for people to communicate with friends and relatives, as well as for enterprises to allow remote working and save travel costs. Countless competing platforms differ in ease of use, the features they implement, supported user equipment, and targeted audience (consumer or business). However, there is no standard protocol or interoperability mechanism. This picture complicates traffic management, making it hard to isolate RTC traffic for prioritization or blocking. Moreover, undocumented operation could result in the traffic being blocked at firewalls or middleboxes. In this paper, we analyze 13 popular RTC applications, from widespread consumer apps, like Skype and WhatsApp, to business platforms dedicated to enterprises, such as Microsoft Teams and Webex Teams. We collect packet traces under different conditions and illustrate similarities and differences in their use of the network. We find that most applications employ the well-known RTP protocol, but we observe a few cases of different (and even undocumented) approaches. The majority of applications allow peer-to-peer communication during calls with only two participants. Six of them send redundant data for Forward Error Correction or encode the user video at different bitrates. In addition, we notice that many of them are easy to identify by looking at the destination servers or the domain names resolved via DNS. The packet traces we collected, along with the metadata we extract, are made available to the community.
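A common way to spot RTP flows of the kind the study identifies is a header heuristic over UDP payloads: the sketch below checks the fixed RTP v2 header layout (RFC 3550) and excludes the payload-type range that collides with RTCP under RFC 5761 demultiplexing. The thresholds and policy are generic assumptions, not the paper's classifier.

```python
def looks_like_rtp(payload: bytes) -> bool:
    """Heuristic RTP check on a raw UDP payload."""
    if len(payload) < 12:                  # RTP fixed header is 12 bytes
        return False
    version = payload[0] >> 6              # top two bits must be 2 (RTP v2)
    payload_type = payload[1] & 0x7F       # 7-bit payload type
    return version == 2 and not (64 <= payload_type <= 95)  # 64-95: RTCP range

# Minimal synthetic header: V=2, PT=96 (dynamic), seq=1, ts=0, ssrc=0.
pkt = bytes([0x80, 0x60, 0, 1]) + bytes(8)
print(looks_like_rtp(pkt))  # True
```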
Computational Method for Optimal Attack Play Consisting of Run Plays and Hand-pass Plays for Seven-a-side Rugby
Kotaro Yashiro, Yohei Nakada
2020 IEEE International Symposium on Multimedia (ISM), December 2020. DOI: 10.1109/ISM.2020.00031

Abstract: Providing explanatory information during broadcasts of team sports is becoming important to make rules, plays, tactics, and game developments easier to understand for viewers, particularly beginners. Against this background, this paper presents a computational method for selecting the optimal attack play for scoring a try, considering run and hand-pass plays. In this method, attack plays consisting of runs and hand-passes are simulated from the current player position and speed data, based on motion models for each player and the ball. We then evaluate the simulated attack plays. The optimal attack play is obtained using the branch-and-bound algorithm based on the evaluation results. In this study, the proposed method is validated on four synthetic formation examples of seven-a-side rugby as an initial validation. Displaying the optimal attack plays computed using the proposed method can help viewers understand developments in games more easily.
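The branch-and-bound search structure can be illustrated with a toy sketch over short run/pass sequences. The motion models and the evaluation function here are placeholders, not the paper's formulation; only the prune-with-optimistic-bound pattern carries over.

```python
ACTIONS = ("run", "pass")

def evaluate(seq):                # stub reward: runs score more the later they come
    return sum((i + 1) if a == "run" else 1 for i, a in enumerate(seq))

def upper_bound(seq, depth):      # optimistic completion: all remaining are runs
    return evaluate(seq) + sum(i + 1 for i in range(len(seq), depth))

def branch_and_bound(depth=4):
    best_val, best_seq = float("-inf"), ()
    stack = [()]
    while stack:
        seq = stack.pop()
        if upper_bound(seq, depth) <= best_val:
            continue              # prune: this branch cannot beat the incumbent
        if len(seq) == depth:
            if evaluate(seq) > best_val:
                best_val, best_seq = evaluate(seq), seq
            continue
        stack.extend(seq + (a,) for a in ACTIONS)
    return best_seq, best_val

print(branch_and_bound())  # (('run', 'run', 'run', 'run'), 10)
```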