Stroke-based creation of depth maps
M. Gerrits, B. Decker, Cosmin Ancuti, Tom Haber, C. Ancuti, T. Mertens, P. Bekaert
2011 IEEE International Conference on Multimedia and Expo. DOI: https://doi.org/10.1109/ICME.2011.6012006

Abstract: Depth information opens up many possibilities for meaningful editing of photographs. So far, it has only been possible to acquire depth information by using additional hardware, making restrictive scene assumptions, or relying on extensive manual input. We developed a novel user-assisted technique for creating adequate depth maps with an intuitive stroke-based user interface. Starting from absolute depth constraints as well as surface normal constraints, we optimize for a feasible depth map over the image. We introduce a suitable smoothness constraint that respects image edges and accounts for slanted surfaces. We illustrate the usefulness of our technique with several applications, such as depth-of-field reduction and advanced compositing.
{"title":"LipActs: Efficient representations for visual speakers","authors":"E. Zavesky","doi":"10.1109/ICME.2011.6012102","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012102","url":null,"abstract":"Video-based lip activity analysis has been successfully used for assisting speech recognition for almost a decade. Surprisingly, this same capability has not been heavily used for near real-time visual speaker retrieval and verification, due to tracking complexity, inadequate or difficult feature determination, and the need for a large amount of pre-labeled data for model training. This paper explores the performance of several solutions using modern histogram of oriented gradients (HOG) features, several quantization techniques, and analyzes the benefits of temporal sampling and spatial partitioning to derive a representation called LipActs. Two datasets are used for evaluation: one with 81 participants derived from varying quality YouTube content and one with 3 participants derived from a forward-facing mobile video camera with 10 varied lighting and capture angle environments. Over these datasets, LipActs with a moderate number of pooled temporal frames and multi-resolution spatial quantization, offer an improvement of 37–73% over raw features when optimizing for lowest equal error rate (EER).","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"41 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121710096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learn-pads: A mathematical exergaming system for children's physical and mental well-being
Ali Karime, Hussein Al Osman, W. Gueaieb, J. Jaam, Abdulmotaleb El Saddik
2011 IEEE International Conference on Multimedia and Expo. DOI: https://doi.org/10.1109/ICME.2011.6011852

Abstract: Child obesity is one of the major challenges facing modern societies, especially in developed countries. Exergaming tools are considered an effective means of reducing obesity among kids because they require children to exert physical effort while playing. However, most existing exergaming tools focus on the physical well-being of their users and largely neglect the mental aspect. In this paper, we present an exergaming system that combines both aspects by promoting not only entertainment but also learning through physical activity. The system consists of a set of footpads that allow the user to interact with multimedia-enriched video games aimed at enhancing children's math knowledge. Our study shows that the system created an atmosphere of fun among the children and engaged them in learning.
{"title":"Application-layer error resilience for wireless IP-based video broadcasting","authors":"Sheau-Ru Tong, Yuan-Tse Yu, C. Chen","doi":"10.1109/ICME.2011.6011938","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011938","url":null,"abstract":"Wireless IP-based video broadcast suffers from heavy packet losses caused by multipath fading and interference variations in the wireless channels. This paper addresses this issue by proposing an application-layer error-resilience scheme, called the replicate multiple descriptor coding scheme (RMD). In principle, RMD extends the conventional multiple descriptor transmission strategy with two new features, the selective-frame-based replication and a time-shifted descriptor transmission. We show that with these two features, we are able to exploit the time diversity in a time-sharing channel to mitigate the damage impact and provide more efficient protection for the video. The simulation results confirm that when the packet loss rate is heavy (e.g., 15%–35%), RMD outperforms other schemes in terms of PSNR improved, while only requiring moderate data overheads.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122838498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Commercial detection by mining maximal repeated sequence in audio stream","authors":"Jiansong Chen, Teng Li, Lei Zhu, Peng Ding, Bo Xu","doi":"10.1109/ICME.2011.6012115","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012115","url":null,"abstract":"Efficient detection of commercial is an important topic for many applications such as commercial monitoring, market investigation. This paper reports an unsupervised technique of discovering commercial by mining repeated sequence in audio stream. Compared with previous work, we focus on solving practical problems by introducing three principles of commercial: repetition principle, independence principle and equivalence principle. Based on these principles, we detect the commercials by first mining maximal repeated sequences (MRS) and then post-processing the MRS pairs based on independence principle and equivalence principle for final result. In addition, a coarse-to-fine scheme is adopted in the acoustic matching stage to save computational cost. Extensive experiments both on simulated data and real broadcast data demonstrate the effectiveness of our method.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129619049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient, robust video fingerprinting system","authors":"R. Cook","doi":"10.1109/ICME.2011.6012135","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012135","url":null,"abstract":"An efficient, robust system for machine identification of file and stream-based video content is presented. Efficiency is achieved through easily computed features, simple comparisons, and careful selection of robust indices that lead to fast searches. Robustness is achieved by selection of features that reflect the time structure of the content—a measure of how the visual content changes over time, perhaps the quintessential aspect of video. These features, primarily the overall luminance and interframe luminance differences, are unlikely to change as the underlying signal is distorted by typical video processing, both benevolent and otherwise. Feature extraction, indexing, and matching are discussed.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129555268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classified quadtree-based adaptive loop filter
Qian Chen, Yunfei Zheng, P. Yin, X. Lu, J. Solé, Qian Xu, E. François, D. Wu
2011 IEEE International Conference on Multimedia and Expo. DOI: https://doi.org/10.1109/ICME.2011.6012172

Abstract: In this paper, we propose a classified quadtree-based adaptive loop filter (CQALF) for video coding. Pixels in a picture are classified into two categories according to the impact of the deblocking filter: pixels that are modified by it and pixels that are not. A Wiener filter is carefully designed for each category, and the filter coefficients are transmitted to the decoder. For the pixels modified by the deblocking filter, the filter is estimated at the encoder by minimizing the mean square error between the original input frame and a combined frame, a weighted average of the reconstructed frames before and after deblocking. For the pixels the deblocking filter does not modify, the filter is estimated by minimizing the mean square error between the original frame and the reconstructed frame. The proposed algorithm is implemented on top of the KTA software and is compatible with the quadtree-based adaptive loop filter. Compared with the kta2.6r1 anchor, CQALF achieves 10.05%, 7.55%, and 6.19% BD bitrate reduction on average for intra-only, IPPP, and HB coding structures, respectively.
{"title":"Photo identity tag suggestion using only social network context on large-scale web services","authors":"Chi-Yao Tseng, Ming-Syan Chen","doi":"10.1109/ICME.2011.6012061","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012061","url":null,"abstract":"Recently, uploading photos and adding identity tags on social network services are prevalent. Although some researchers have considered leveraging context to facilitate the process of tagging, these approaches still rely mainly on face recognition techniques that use visual features of photos. However, since the computational and storage costs of these approaches are generally high, they cannot be directly applicable to large-scale web services. To resolve this problem, we explore using only social network context to generate the top-k list of photo identity tag suggestion. The proposed method is based on various co-occurrence contexts that are related to the question of who may appear in this photo. An efficient ranking algorithm is designed to satisfy the real-time needs of this application. We utilize public album data of 400 volunteers from Facebook to verify that our approach can efficiently provide accurate suggestions with less additional storage requirement.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130835451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic voice disorder classification using vowel formants
Muhammad Ghulam, M. Alsulaiman, A. Mahmood, Z. Ali
2011 IEEE International Conference on Multimedia and Expo. DOI: https://doi.org/10.1109/ICME.2011.6012187

Abstract: In this paper, we propose an automatic voice disorder classification system based on the first two formants of vowels. Five types of voice disorder, namely cyst, GERD, paralysis, polyp, and sulcus, are used in the experiments. Spoken Arabic digits recorded from people with voice disorders serve as input. The first and second formants are extracted from the vowels [Fatha] and [Kasra], which occur in Arabic digits. These four features are then used to classify the voice disorder with two types of classifiers: vector quantization (VQ) and neural networks. In the experiments, the neural network performs better than VQ. For female and male speakers, the classification rates are 67.86% and 52.5%, respectively, using neural networks. The best classification rate, 78.72%, is obtained for female sulcus disorder.
{"title":"Temporal-spatial face recognition using multi-atlas and Markov process model","authors":"Gaopeng Gou, Rui Shen, Yunhong Wang, A. Basu","doi":"10.1109/ICME.2011.6012063","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012063","url":null,"abstract":"Although video-based face recognition algorithms can provide more information than image-based algorithms, their performance is affected by subjects' head poses, expressions, illumination and so on. In this paper, we present an effective video-based face recognition algorithm. Multi-atlas is employed to efficiently represent faces of individual persons under various conditions, such as different poses and expressions. The Markov process model is used to propagate the temporal information between adjacent video frames. The combination of multi-atlas and Markov model provides robust face recognition by taking both spatial and temporal information into account. The performance of our algorithm was evaluated on three standard test databases: the Honda/UCSD video database, the CMU Motion of Body database, and the multi-modal VidTIMIT database. Experimental results demonstrate that our video-based face recognition algorithm outperforms other methods on all three test databases.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121915024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}