{"title":"An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content","authors":"Benjamin Elizalde, Howard Lei, G. Friedland","doi":"10.1109/ISM.2013.27","DOIUrl":"https://doi.org/10.1109/ISM.2013.27","url":null,"abstract":"Audio-based video event detection (VED) on user-generated content (UGC) aims to find videos that show an observable event, such as a wedding ceremony or birthday party, rather than a sound, such as music, clapping or singing. The difficulty of video content analysis on UGC lies in the acoustic variability and lack of structure of the data. The UGC task has been explored mainly through computer vision, but can benefit from the use of audio. The i-vector system is state-of-the-art in speaker verification and outperforms a conventional Gaussian Mixture Model (GMM)-based approach. The system compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC. This paper employs the i-vector-based system for audio-based VED on UGC and expands the understanding of the system on the task. It also includes a performance comparison with the conventional GMM-based and state-of-the-art Random Forest (RF)-based systems. The i-vector system aids audio-based event detection by addressing UGC audio characteristics. It outperforms the GMM-based system, is competitive with the RF-based system in terms of the Missed Detection (MD) rate at 4% and 2.8% False Alarm (FA) rates, and complements the RF-based system, with the combination showing a slight improvement over the standalone systems.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"61 1","pages":"114-117"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82634096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two-Phase Generation Model for Automatic Image Annotation","authors":"Liang Xie, Peng Pan, Yansheng Lu, Shixun Wang, Tong Zhu, Haijiao Xu, Deng Chen","doi":"10.1109/ISM.2013.33","DOIUrl":"https://doi.org/10.1109/ISM.2013.33","url":null,"abstract":"Automatic image annotation is an important task for multimedia retrieval. By allocating relevant words to un-annotated images, these images can be retrieved in response to textual queries. There has been much research on the problem of image annotation, and most of it constructs models based on the joint probability or posterior probabilities of words. In this paper we estimate the probabilities that words generate images, and propose a two-phase model for the generation procedure. Each word first generates its related words, then these words generate an un-annotated image, and the relation between the words and the un-annotated image is given by the probability of the two-phase generation. Textual words usually contain more semantic information than the visual content of images, so the probabilities that words generate images are more reliable than the probabilities that images generate words. As a result, our model estimates more reliable probabilities than other probabilistic methods for image annotation. Another advantage of our model is that the relations between words are taken into consideration. Experimental results on Corel 5K and MIR Flickr demonstrate that our model performs better than previous methods. Two-phase generation, which considers word relations for annotation, outperforms one-phase generation, which only considers the relation between words and images. Moreover, methods that estimate the generative probability obtain better performance than SVMs, which estimate the posterior probability.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"15 1","pages":"155-162"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82759233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive HTTP Streaming Utilizing Temporal Sub-layers of High Efficiency Video Coding (HEVC)","authors":"S. Deshpande","doi":"10.1109/ISM.2013.73","DOIUrl":"https://doi.org/10.1109/ISM.2013.73","url":null,"abstract":"The newly approved High Efficiency Video Coding (HEVC) standard includes a temporal sub-layering feature that provides temporal scalability. Two types of pictures, Temporal Sub-layer Access pictures and Step-wise Temporal Sub-layer Access pictures, are provided for this purpose. This paper utilizes the temporal scalability of HEVC to provide bandwidth-adaptive HTTP streaming to clients. We describe our HTTP streaming algorithm, which is media-timeline aware. Temporal sub-layers are switched dynamically on the server side. We performed subjective tests to determine user perception of acceptable frame rates when using the temporal scalability of HEVC. These results are used to control the algorithm's temporal switching behavior to provide a good quality of experience to the user. We applied Internet and 3GPP error-delay patterns to validate the performance of our algorithm.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"13 1","pages":"384-390"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86585977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constraint Satisfaction Programming for Video Summarization","authors":"Sid-Ahmed Berrani, Haykel Boukadida, P. Gros","doi":"10.1109/ISM.2013.38","DOIUrl":"https://doi.org/10.1109/ISM.2013.38","url":null,"abstract":"This paper addresses the problem of automatic video summarization. The proposed solution relies on constraint satisfaction programming (CSP). Summary generation rules are expressed as constraints and the summary is created using the CSP solver given the input video, its audio-visual features and possibly user parameters (like the desired duration). The solution clearly separates production rules from the generation algorithm, which in practice allows users to easily express their constraints and preferences and also to modify them w.r.t. the target application. The solution is extensively evaluated in the context of tennis match summarization.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"29 1","pages":"195-202"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89376710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development and Preliminary Evaluation of an Interactive System to Support CAD Teaching","authors":"S. Akhtar, S. Warburton, W. Xu","doi":"10.1109/ISM.2013.92","DOIUrl":"https://doi.org/10.1109/ISM.2013.92","url":null,"abstract":"It has been a goal for many researchers to make education more enjoyable, attractive and effective through the use of multimedia technology [1]. Achieving this goal requires rich interactive communication between students and tutors and a clear understanding of the educational environment. Despite the availability of a wide range of commercial systems that can support multimedia within the classroom, there remains a gap for an innovative system that can provide a blended approach to supporting live teaching sessions. This paper introduces Surrey Connect, a bespoke system designed to enhance the teaching and learning experience in large classroom settings. It provides lecture recording with selective replay, implicit and explicit responses, and support for multiple tutors. In addition to its interactive user interface, Surrey Connect acts as an early warning system by monitoring learners' in-class behaviour, presenting it to the tutor in an interactive dashboard and suggesting interventions in the light of rule-based programmable knowledge. The system has been tested in a real classroom environment, a Computer Aided Design course with more than 150 students, and received over 60% positive feedback from the students.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"36 1","pages":"480-485"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85687627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent and Selective Video Frames Discarding Policies for Improving Video Quality over Wired/Wireless Networks","authors":"Khalid A. Darabkh, Abeer M. Awad, A. Khalifeh","doi":"10.1109/ISM.2013.57","DOIUrl":"https://doi.org/10.1109/ISM.2013.57","url":null,"abstract":"Although IEEE 802.11 Wireless LAN (WLAN) is of great interest nowadays, it lacks efficient support for real-time streaming, mainly due to the contention-based nature of wireless media. In this paper, we extend our earlier work on improving video traffic over wireless networks by studying the dependencies between video frames and their implications on overall network performance. In other words, we propose efficient and novel algorithms that aim to minimize the cost of possible losses by intelligently and selectively discarding frames based on their contribution to picture quality, namely, partial and intelligent-partial frame discarding policies that consider the dependencies between video frames. The performance metrics employed to evaluate the proposed algorithms are the rate of non-decodable frames and the peak signal-to-noise ratio (PSNR). Our results are promising and show significant improvements in perceived video quality over comparable approaches in the current literature.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"17 1","pages":"297-300"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80183755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Recombination of Evolving Guitar Sounds (DREGS): A Genetic Algorithm Approach to Guitar Synthesizer Control","authors":"Timothy M. Walker, Sean Whalen","doi":"10.1109/ISM.2013.47","DOIUrl":"https://doi.org/10.1109/ISM.2013.47","url":null,"abstract":"A system is described which integrates multiple hardware interfaces and software packages in order to control the parameters of a guitar synthesizer in real time. An interactive genetic algorithm is developed in order to create and explore parameter settings, and a mobile device wirelessly sets the fitness values. The synthesizer parameters are represented as genes within an individual, and individuals dynamically interact within a population as the user rates the resulting sounds by changing orientation.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"73 9 1","pages":"248-254"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76381438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Longitudinal Characterization of Breast Morphology during Reconstructive Surgery","authors":"Lijuan Zhao, Shishir K. Shah, F. Merchant","doi":"10.1109/ISM.2013.79","DOIUrl":"https://doi.org/10.1109/ISM.2013.79","url":null,"abstract":"Quantitative analysis of breast morphology facilitates pre-operative planning and post-operative outcome assessment in breast reconstruction. Our project is developing algorithms to quantify changes in local breast morphology occurring over time. The project encompasses three topics: (1) three-dimensional (3D) image registration, (2) breast contour detection, and (3) quantitative analysis of local breast morphology changes. We developed a semi-automated 3D image registration algorithm. We have also developed an approach to directly compute the breast contour on 3D images. In the future, we will improve the existing algorithms and develop additional ones to fulfill our project goals.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"9 1","pages":"407-408"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85869288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VideoTopic: Content-Based Video Recommendation Using a Topic Model","authors":"Qiusha Zhu, M. Shyu, Haohong Wang","doi":"10.1109/ISM.2013.41","DOIUrl":"https://doi.org/10.1109/ISM.2013.41","url":null,"abstract":"Most video recommender systems limit the content to the metadata associated with the videos, which could lead to poor results since metadata is not always available or correct. Meanwhile, the visual information of videos is typically not fully explored, which is especially important for recommending new items with limited metadata information. In this paper, a novel content-based video recommendation framework, called VideoTopic, that utilizes a topic model is proposed. It decomposes the recommendation process into video representation and recommendation generation. It aims to capture user interests in videos by using a topic model to represent the videos, and then generates recommendations by finding those videos that best fit the topic distribution of the user interests. Experimental results on the MovieLens dataset validate the effectiveness of VideoTopic by evaluating each of its components and the whole framework.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"1 1","pages":"219-222"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86446020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextualized Privacy Filters in Video Surveillance Using Crowd Density Maps","authors":"H. Fradi, A. Melle, J. Dugelay","doi":"10.1109/ISM.2013.23","DOIUrl":"https://doi.org/10.1109/ISM.2013.23","url":null,"abstract":"The widespread growth in the adoption of digital video surveillance systems emphasizes the need for privacy-preserving video analytics techniques. While these privacy aspects have attracted considerable interest in recent years, little importance has been given to the concept of context-aware privacy protection filters. In this paper, we specifically focus on the dependency between privacy preservation and crowd density. We show that additional information about the crowd density in the scene can be used to adjust the level of privacy protection according to local needs. This additional information cue consists of modeling the time-varying dynamics of the crowd density using local features as an observation of a probabilistic crowd function. It also involves a feature tracking step which enables excluding feature points on the background. This process is favourable for the later density function estimation since the influence of features irrelevant to the underlying crowd density is removed. The protection level of personal privacy in videos is then adapted according to the crowd density. Afterwards, a framework for objective evaluation of the contextualized protection filters is proposed. The effectiveness of the proposed context-aware privacy filters has been demonstrated by assessing the intelligibility vs. privacy trade-off using videos from different crowd datasets.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"1 1","pages":"92-99"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88620978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}