"Personalized Indexing of Attention in Lectures -- Requirements and Concept"
Sebastian Pospiech, N. Birnbaum, L. Knipping, R. Mertens
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.44
Abstract: Web lectures can be employed in a variety of didactic scenarios, ranging from an add-on for a live lecture to stand-alone learning content. In all of these scenarios, though less so in the stand-alone one, indexing and navigation are crucial for real-world usability. As a consequence, many approaches have been devised, such as slide-based indexing, transcript-based indexing, collaborative manual indexing, and individual or social indexing based on viewing behavior. The approach proposed in this paper takes individual indexing based on viewing behavior two steps further in that it (a) indexes the recording at production time in the lecture hall and (b) actively analyzes the student's attention focus instead of passively recording viewing time as done in conventional footprinting. In order to track student attention during the lecture, it is necessary to record and analyze the student's behavior in parallel to the lecture and to synchronize both data streams. This paper discusses the architecture required for personalized attention-based indexing, possible problems, and strategies to tackle them.
"Employing Sensors and Services Fusion to Detect and Assess Driving Events"
Seyed Vahid Hosseinioun, Hussein Al Osman, Abdulmotaleb El Saddik
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.121
Abstract: With the remarkable increase in the use of sensors in our daily lives, various methods have been devised to detect events in a driving environment using smartphones, as they provide two main advantages: they eliminate the need for dedicated hardware in vehicles and they are widely accessible. Since rewarding safe driving is an important issue for insurance companies, some companies are implementing Usage-Based Insurance (UBI) as opposed to traditional history-based plans. The collection of driving events, such as acceleration and turning, is a prerequisite for the adoption of such plans. Mobile phone sensors are capable of detecting whether a car is accelerating or braking, while through service fusion we can detect other events such as speeding or instances of severe weather. We propose a new and robust hybrid classification algorithm that detects acceleration-based events with an F1-score of 0.9304 and turn events with an F1-score of 0.9038. We further propose a method for measuring a driving performance index using the detected events.
{"title":"Exploring the Complementarity of Audio-Visual Structural Regularities for the Classification of Videos into TV-Program Collections","authors":"G. Sargent, P. Hanna, H. Nicolas, F. Bimbot","doi":"10.1109/ISM.2015.133","DOIUrl":"https://doi.org/10.1109/ISM.2015.133","url":null,"abstract":"This article proposes to analyze the structural regularities from the audio and video streams of TV-programs and explore their potential for the classification of videos into program collections. Our approach is based on the spectral analysis of distance matrices representing the short-and long-term dependancies within the audio and visual modalities of a video. We propose to compare two videos by their respective spectral features. We appreciate the benefits brought by the two modalities on the performances in the context of a K-nearest neighbor classification, and we test our approach in the context of an unsupervised clustering algorithm. These evaluations are performed on two datasets of French and Italian TV programs.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116119635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"A Novel Two-Pass Rate Control Scheme for Variable Bit Rate Video Streaming"
M. VenkataPhaniKumar, K. C. R. C. Varma, S. Mahapatra
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.32
Abstract: In this paper, a novel two-pass rate control scheme is proposed to achieve consistent visual quality for variable bit rate (VBR) video streaming. The rate-distortion (RD) characteristics of each frame are used to establish a frame complexity model, which is later used along with statistics collected in the first pass to derive an optimal quantization parameter for encoding the frame in the second pass. The experimental results demonstrate that the proposed rate control scheme significantly outperforms the existing rate control mechanism in the Joint Model (JM) reference software in terms of Peak Signal-to-Noise Ratio (PSNR) and consistent perceptual visual quality while achieving the target bit rate. Further, the proposed scheme is validated through implementation on a miniature test-bed.
"Improvement of Image and Video Matting with Multiple Reliability Maps"
Takahiro Hayashi, Masato Ishimori, N. Ishii, K. Abe
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.28
Abstract: In this paper, we propose a framework for extending existing matting methods to achieve more reliable alpha estimation. The key idea of the framework is the integration of multiple alpha maps based on their reliabilities. In the proposed framework, the given input image is converted into multiple grayscale images with various luminance appearances. Alpha maps are then generated for these grayscale images using an existing matting method. At the same time, reliability maps (single-channel images visualizing the reliabilities of the estimated alpha values) are generated. Finally, by combining the alpha maps with the highest reliabilities in each local region, one reliable alpha map is produced. The experimental results show that reliable alpha estimation can be achieved with the proposed framework.
{"title":"Joint Video and Sparse 3D Transform-Domain Collaborative Filtering for Time-of-Flight Depth Maps","authors":"T. Hach, Tamara Seybold, H. Böttcher","doi":"10.1109/ISM.2015.112","DOIUrl":"https://doi.org/10.1109/ISM.2015.112","url":null,"abstract":"This paper proposes a novel strategy for depth video denoising in RGBD camera systems. Today's depth map sequences obtained by state-of-the-art Time-of-Flight sensors suffer from high temporal noise. All high-level RGB video renderings based on the accompanied depth map's 3D geometry like augmented reality applications will have severe temporal flickering artifacts. We approached this limitation by decoupling depth map upscaling from the temporal denoising step. Thereby, denoising is processed on raw pixels including uncorrelated pixel-wise noise distributions. Our denoising methodology utilizes joint sparse 3D transform-domain collaborative filtering. Therein, we extract RGB texture information to yield a more stable and accurate highly sparse 3D depth block representation for the consecutive shrinkage operation. We show the effectiveness of our method on real RGBD camera data and on a publicly available synthetic data set. The evaluation reveals that our method is superior to state-of-the-art methods. Our method delivers improved flicker-free depth video streams for future applications, which are especially sensitive to temporal noise and arbitrary depth artifacts.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122908200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Efficient Multi-training Framework of Image Deep Learning on GPU Cluster"
Chun-Fu Chen, G. Lee, Yinglong Xia, Wan-Yi Sabrina Lin, T. Suzumura, Ching-Yung Lin
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.119
Abstract: In this paper, we develop a pipelining scheme for image deep learning on a GPU cluster to distribute the heavy workload of the training procedure. In addition, it is usually necessary to train multiple models to obtain a good deep learning model, owing to the limited a priori knowledge of the right deep neural network structure. Adopting parallel and distributed computing therefore appears to be an obvious path forward, but the mileage varies depending on how amenable a deep network is to parallelization and on the availability of rapid prototyping capabilities with a low cost of entry. In this work, we propose a framework that organizes the training procedures of multiple deep learning models into a pipeline on a GPU cluster, where each stage is handled by a particular GPU with a partition of the training dataset. Instead of frequently migrating data among the disks, CPUs, and GPUs, our framework only moves partially trained models, reducing bandwidth consumption and exploiting the full computation capability of the cluster. We deploy the proposed framework on popular image recognition tasks, and the experiments show that the proposed method reduces overall training time by up to dozens of hours compared to the baseline method.
{"title":"Interactive Crowd Content Generation and Analysis Using Trajectory-Level Behavior Learning","authors":"Sujeong Kim, Aniket Bera, Dinesh Manocha","doi":"10.1109/ISM.2015.89","DOIUrl":"https://doi.org/10.1109/ISM.2015.89","url":null,"abstract":"We present an interactive approach for analyzing crowd videos and generating content for multimedia applications. Our formulation combines online tracking algorithms from computer vision, non-linear pedestrian motion models from computer graphics, and machine learning techniques to automatically compute the trajectory-level pedestrian behaviors for each agent in the video. These learned behaviors are used to detect anomalous behaviors, perform crowd replication, augment crowd videos with virtual agents, and segment the motion of pedestrians. We demonstrate the performance of these tasks using indoor and outdoor crowd video benchmarks consisting of tens of human agents, moreover, our algorithm takes less than a tenth of a second per frame on a multi-core PC. The overall approach can handle dense and heterogeneous crowd behaviors and is useful for realtime crowd scene analysis applications.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129745317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distortion Estimation Using Structural Similarity for Video Transmission over Wireless Networks","authors":"Arun Sankisa, A. Katsaggelos, P. Pahalawatta","doi":"10.1109/ISM.2015.88","DOIUrl":"https://doi.org/10.1109/ISM.2015.88","url":null,"abstract":"Efficient streaming of video over wireless networks requires real-time assessment of distortion due to packet loss, especially because predictive coding at the encoder can cause inter-frame propagation of errors and impact the overall quality of the transmitted video. This paper presents an algorithm to evaluate the expected receiver distortion on the source side by utilizing encoder information, transmission channel characteristics and error concealment. Specifically, distinct video transmission units, Group of Blocks (GOBs), are iteratively built at the source by taking into account macroblock coding modes and motion-compensated error concealment for three different combinations of packet loss. Distortion of these units is then calculated using the structural similarity (SSIM) metric and they are stochastically combined to derive the overall expected distortion. The proposed model provides a more accurate estimate of the distortion that closely models quality as perceived through the human visual system. When incorporated into a content-aware utility function, preliminary experimental results show improved packet ordering & scheduling efficiency and overall video signal at the receiver.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116018657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Automatic Video Content Summarization Using Geospatial Mosaics of Aerial Imagery"
R. Viguier, Chung-Ching Lin, H. Aliakbarpour, F. Bunyak, Sharath Pankanti, G. Seetharaman, K. Palaniappan
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.124
Abstract: It is estimated that less than five percent of videos are currently analyzed to any degree. In addition to petabyte-sized multimedia archives, continuing innovations in optics, imaging sensors, camera arrays, (aerial) platforms, and storage technologies indicate that existing and new applications will continue to generate enormous volumes of video imagery for the foreseeable future. Contextual video summarization and activity maps offer one innovative direction for tackling this Big Data problem in computer vision. The goal of this work is to develop semi-automatic exploitation algorithms and tools that increase utility, dissemination, and usage potential by providing quick dynamic overview geospatial mosaics and motion maps. We present a framework to summarize (multiple) video streams from unmanned aerial vehicles (UAVs) or drones, which have very different characteristics from the structured commercial and consumer videos analyzed in the past. Using the geospatial metadata of the video combined with fast low-level image-based algorithms, the proposed method first generates mini-mosaics that can then be combined into geo-referenced meta-mosaic imagery. These geospatial maps enable rapid assessment of hours-long videos with arbitrary spatial coverage from multiple sensors by generating quick-look imagery, composed of multiple mini-mosaics, that summarizes spatiotemporal dynamics such as coverage, dwell time, and activity. The overall summarization pipeline was tested on several DARPA Video and Image Retrieval and Analysis Tool (VIRAT) datasets. We evaluate the effectiveness of the proposed video summarization framework using metrics such as compression and hours of viewing time.