{"title":"Deftpack: A Robust Piece-Picking Algorithm for Scalable Video Coding in P2P Systems","authors":"R. Petrocco, Michael Eberhard, J. Pouwelse, D. Epema","doi":"10.1109/ISM.2011.52","DOIUrl":"https://doi.org/10.1109/ISM.2011.52","url":null,"abstract":"The volume of Internet video is growing, and is expected to exceed 57 percent of global consumer Internet traffic by 2014. Peer-to-Peer technology can help deliver this massive volume of traffic in a cost-efficient, scalable, and reliable manner. However, single-bit-rate streaming is not sufficient given today's device and network connection diversity. A possible solution to this problem is provided by layered coding techniques, such as Scalable Video Coding, which address this diversity by providing content in various qualities within a single bit stream. In this paper we propose a new self-adapting piece-picking algorithm for downloading layered video streams, called Deftpack. Our algorithm significantly reduces the number of stalls, minimizes the frequency of quality changes during playback, and maximizes the effective usage of the available bandwidth. Deftpack is the first algorithm specifically crafted to take all three of these quality dimensions into account simultaneously, thus increasing the overall quality of experience. Additionally, Deftpack can be integrated into BitTorrent-based P2P systems and thus has the potential for large-scale deployment. Our results from realistic swarm simulations show that Deftpack significantly outperforms previously proposed algorithms for retrieving layered content when all three quality dimensions are taken into account.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130074725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Masking Effect of Out-of-sync News Content with Different Distractors","authors":"S. Buchinger","doi":"10.1109/ISM.2011.86","DOIUrl":"https://doi.org/10.1109/ISM.2011.86","url":null,"abstract":"The major aim of this paper is to detect possible masking effects across different media types in a realistic scenario. For this purpose, we presented out-of-sync news clips containing none, one, or several additional content elements, such as a speaker only, or a narrator complemented by a picture, a video, a ticker, background music, or noise, to several evaluators. Consistent with several previous studies, we observed that out-of-sync errors are perceived much earlier when audio presentation precedes video. Furthermore, users prefer simple formats, i.e., a news speaker only, as long as the synchronization errors are low enough not to be noticed consciously. As soon as the time difference between the playback of the visual and audible streams becomes perceivable, viewers can be distracted by using a large number of different content elements. Using only one distraction element is not sufficient to produce a masking effect.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116515907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audio Quality Assessment Improvement via Circular and Flexible Overlap","authors":"Mengyao Zhu, Jia Zheng, Xiaoqing Yu, W. Wan","doi":"10.1109/ISM.2011.17","DOIUrl":"https://doi.org/10.1109/ISM.2011.17","url":null,"abstract":"This paper proposes an improved audio quality metric via circular and flexible overlap. Based on power spectrum estimation via circular overlap, we use a novel circular-overlap sub-frame to assess highly impaired audio. The relationship between the fraction of overlap and the audio quality metric is also examined, and it is shown that the accuracy of audio quality assessment increases with the fraction of overlap. By integrating circular and flexible overlap into ITU-R BS.1387, also known as Perceptual Evaluation of Audio Quality, our method can be applied to the quality assessment of highly impaired audio.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128317276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regions of Interest Extraction Based on Visual Saliency in Compressed Domain","authors":"L. Sui, Jing Zhang, L. Zhuo, Yuncong Yang","doi":"10.1109/ISM.2011.107","DOIUrl":"https://doi.org/10.1109/ISM.2011.107","url":null,"abstract":"Recently, the bag-of-words (BoW) model, which has been widely used in textual information processing, has been extended to many tasks in the visual domain, such as image classification, scene analysis, image annotation, and image retrieval, in the form of the bag-of-visual-words (BoVW) model. It is therefore essential to create an effective visual vocabulary. Most existing approaches create visual vocabularies from images in the pixel domain, which requires extra processing time to decompress images, since most images are stored in compressed formats. In this paper we propose to create a visual vocabulary based on the Scale Invariant Feature Transform (SIFT) descriptor in the compressed domain with the following three steps: (1) constructing low-resolution images in the compressed domain, (2) extracting SIFT descriptors from the low-resolution images, and (3) creating a visual vocabulary based on the extracted SIFT descriptors. In order to evaluate the performance of the visual words, experiments have been conducted on identifying pornographic images. Experimental results indicate that the proposed method can recognize pornographic images accurately with much reduced computational time.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126991307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Utterance Rate Feedback for Enhancing Mealtime Communication","authors":"Kyohei Ogawa, Toshiki Takeuchi, Kunihiro Nishimura, T. Tanikawa, M. Hirose","doi":"10.1109/ISM.2011.67","DOIUrl":"https://doi.org/10.1109/ISM.2011.67","url":null,"abstract":"The purpose of this research is to support users' mealtime communication by changing their utterance rate during meals. By logging and analyzing the utterance rate, or the ratio of utterance length to mealtime length, we determined the relationship between the utterance rate and the distribution of the community to which each meal companion belonged. Using this relationship, we developed a real-time feedback system that presents a pre-estimated utterance rate together with the current rate during a meal. This utterance rate is estimated from logged utterance rate data and input data about the members at the table. To evaluate our system, we asked six users to use it. We found that users almost always wanted to increase their utterance rate, and that their utterance rates increased as desired after using our feedback system.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115600244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A QoE Evaluation Methodology for HD Video Streaming Using Social Networking","authors":"B. Gardlo, M. Ries, M. Rupp, R. Jarina","doi":"10.1109/ISM.2011.43","DOIUrl":"https://doi.org/10.1109/ISM.2011.43","url":null,"abstract":"A novel methodology for QoE evaluation in the social network environment is proposed. It provides high applicability for subjective testing of multimedia services with respect to real usage scenarios. The social network environment also provides significant demographic data and the ability to reach a very large number of test subjects, while allowing specific social groups to be targeted or filtered. QoE results for HD Internet video services are presented, followed by a discussion of their statistical significance.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124112072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TSF-Slider: Combining Time- and Structure-Based Media Navigation in One Navigation Component","authors":"Sebastian Pospiech, R. Mertens, Martin E. Muller, M. Ketterl","doi":"10.1109/ISM.2011.59","DOIUrl":"https://doi.org/10.1109/ISM.2011.59","url":null,"abstract":"Most state-of-the-art interfaces for multimedia browsing come with two inherently different navigation components: a time-based slider interface and a structure-based overview. This demonstration introduces the Tempo-Structural-Fisheye-Slider (TSF-Slider). The TSF-Slider combines time- and structure-based navigation in one single navigation component. To merge time- and structure-based navigation information, a fisheye-based approach is used. The idea of fisheye visualization is extended to rescale time-based information so that it can be merged with structural information while maintaining the general advantages of fisheye visualizations: focusing on one area of interest while still maintaining a general overview. The main motivation for the TSF-Slider is, however, that it brings together the advantages of time- and structure-based navigation. The TSF-Slider is implemented in the context of the virtPresenter web lecture framework.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122990231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Similarity-Based Visualization for Image Browsing Revisited","authors":"Klaus Schöffmann, David Ahlström","doi":"10.1109/ISM.2011.76","DOIUrl":"https://doi.org/10.1109/ISM.2011.76","url":null,"abstract":"We investigate whether users' visual search performance in a commonly used grid-like arrangement of images (i.e., a storyboard) can be improved by using a similarity-based sorting of images. We propose a simple but efficient algorithm for sorting images based on their color similarity. The algorithm generates an intuitive arrangement of images and allows for general application with several different layouts (e.g., storyboard, simple row/column, 3D globe/cylinder). In contrast to previous work, which rarely presents results from user studies, we perform a fair user study and compare an interface with color-sorted images to an interface with images positioned in a random order. Both interfaces use exactly the same screen real estate and interaction means. Results show that users are 20% faster with the sorted interface.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115423652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Event Detection in User Generated Videos","authors":"Francesco Cricri, Kostadin Dabov, I. Curcio, Sujeet Mate, M. Gabbouj","doi":"10.1109/ISM.2011.49","DOIUrl":"https://doi.org/10.1109/ISM.2011.49","url":null,"abstract":"Nowadays, most camera-enabled electronic devices contain various auxiliary sensors such as accelerometers, gyroscopes, compasses, GPS receivers, etc. These sensors are often used during media acquisition to limit camera degradations such as shake, and also to provide basic tagging information such as the location used in geo-tagging. Surprisingly, exploiting the sensor-recording modality for high-level event detection has been a subject of rather limited research, further constrained to highly specialized acquisition setups. In this work, we show how these sensor modalities, alone or in combination with content-based analysis, allow inferring information about the video content. In addition, we consider a multi-camera scenario, where multiple user-generated recordings of a common scene (e.g., music concerts, public events) are available. In order to understand some higher-level semantics of the recorded media, we jointly analyze the individual video recordings and sensor measurements of the multiple users. The detected semantics include generic interesting events and some more specific events. The detection exploits correlations in the camera motion and in the audio content of multiple users. We show that the proposed multimodal analysis methods perform well on various recordings obtained in real live music performances.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122878055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Bird Species Identification for Large Number of Species","authors":"Marcelo Teider Lopes, Lucas L. Gioppo, Thiago T. Higushi, Celso A. A. Kaestner, C. Silla, Alessandro Lameiras Koerich","doi":"10.1109/ISM.2011.27","DOIUrl":"https://doi.org/10.1109/ISM.2011.27","url":null,"abstract":"In this paper we focus on the automatic identification of bird species from their recorded songs. Bird monitoring is important for several tasks, such as evaluating the quality of their living environment or monitoring dangerous situations caused by birds near airports. We deal with the bird species identification problem using signal processing and machine learning techniques. First, features are extracted from the recorded bird songs using specific audio processing; next, identification is performed according to a classical machine learning scenario, where a labeled database of previously known bird songs is employed to create a decision procedure that is used to predict the species of a new bird song. Experiments are conducted on a dataset of recorded songs of bird species that appear in a specific region. The experimental results compare the performance obtained in different situations, encompassing the complete audio signals, as recorded in the field, and short audio segments (pulses) obtained from the signals by a split procedure. The influence of the number of classes (bird species) on the identification accuracy is also evaluated.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114787808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}