{"title":"Measurement of Human Sensitivity across the Vertical-Temporal Video Spectrum for Interlacing Filter Specification","authors":"K. Noland","doi":"10.1109/ICME.2012.35","DOIUrl":"https://doi.org/10.1109/ICME.2012.35","url":null,"abstract":"Good quality conversion from progressive to interlaced video is highly relevant to today's broadcast systems, in which interlaced content is still common. The interlacing process is a form of down-sampling, and hence requires an anti-alias filter. For best results the anti-alias filter should be matched to the reconstruction filter, which comprises the display and the human visual system. Additionally, it must meet the technical requirements of the down-sampling process. In this paper we present a novel method of measuring the combined response to interlacing artefacts that is simple and powerful. We use the results to derive an optimal anti-alias filter template, using a new region-growing technique that is specifically designed to match the measured response whilst keeping to the technical constraints of an interlaced sampling structure. Our results provide support for an existing, heuristically-defined filter, and show that the same filter could be used for a range of viewing distances.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116531634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Large Scale Experiment for Mood-Based Classification of TV Programmes","authors":"J. Eggink, Denise Bland","doi":"10.1109/ICME.2012.68","DOIUrl":"https://doi.org/10.1109/ICME.2012.68","url":null,"abstract":"We present results from a large study with 200 participants who watched short excerpts from TV programmes and assigned mood labels. The agreement between labellers was evaluated, showing that an overall consensus exists. Multiple mood terms could be reduced to two principal dimensions, the first relating to the seriousness or light-heartedness of programmes, the second describing the perceived pace. Automatic classification of both mood dimensions was possible to a high degree of accuracy, reaching more than 95% for programmes with very clear moods. The influence of existing human generated genre labels was evaluated, showing that they were closely related to the first mood dimension and helped to distinguish serious from humorous programmes. The pace of programmes, however, could be more accurately classified when features based on audio and video signal processing were used.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128858734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Super-Resolution by Finer Sub-Pixel Motion Prediction and Bilateral Filtering","authors":"Damith J. Mudugamuwa, Xiangjian He, W. Jia","doi":"10.1109/ICME.2012.103","DOIUrl":"https://doi.org/10.1109/ICME.2012.103","url":null,"abstract":"Super-resolution reconstruction produces high-resolution images from a set of low-resolution images of the same scene. In the last two and a half decades, many super-resolution algorithms have been proposed. These algorithms are very sensitive to their assumed models of motion and noise, and computationally expensive for many practical applications. In this paper we adopt a previously reported fast prediction-based sub-pixel motion estimation method and a novel interpolation scheme based on the bilateral filter to produce a fast colour super-resolution reconstruction that can accommodate arbitrary local motion patterns. The proposed algorithm exploits photometric proximity and available finer fractional motion information in the high-resolution grid to reconstruct enhanced super-resolved image frames. Experiments show a PSNR performance comparable to the state-of-the-art but at a fraction of their computational cost.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114297207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Joint Texture/Depth Edge-Directed Up-sampling Algorithm for Depth Map Coding","authors":"Huiping Deng, Li Yu, Jinbo Qiu, Juntao Zhang","doi":"10.1109/ICME.2012.2","DOIUrl":"https://doi.org/10.1109/ICME.2012.2","url":null,"abstract":"Depth edge preservation is important for improving synthesized view quality in depth map coding. A joint texture/depth edge-directed up-sampling algorithm for depth map coding is proposed in this paper. The depth up-sampling algorithm takes into account the edge similarity between the depth map and the corresponding texture image, and the structural similarity between the low-resolution and high-resolution depth maps. Based on these characteristics, the optimal MMSE up-sampling coefficients are estimated from the local covariance coefficients of the down-sampled depth map and the corresponding texture image. Experimental results show that the proposed up-sampling algorithm for depth map coding improves both depth map coding efficiency and synthesized view quality.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116037336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Image Content Authentication with Tamper Location","authors":"Li Weng, Geert Braeckman, A. Dooms, B. Preneel, P. Schelkens","doi":"10.1109/ICME.2012.163","DOIUrl":"https://doi.org/10.1109/ICME.2012.163","url":null,"abstract":"We propose a novel image authentication system by combining perceptual hashing and robust watermarking. An image is divided into blocks. Each block is represented by a compact hash value. The hash value is embedded in the block. The authenticity of the image can be verified by re-computing hash values and comparing them with the ones extracted from the image. The system can tolerate a wide range of incidental distortion, and locate tampered areas as small as 1/64 of an image. In order to have minimal interference, we design both the hash and the watermark algorithms in the wavelet domain. The hash is formed by the sign bits of wavelet coefficients. The lattice-based QIM watermarking algorithm ensures a high payload while maintaining the image quality. Extensive experiments confirm the good performance of the proposal, and show that our proposal significantly outperforms a state-of-the-art algorithm.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"213 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132247149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing Blocking Artifacts in Compressed Images via Transform-Domain Non-local Coefficients Estimation","authors":"Xinfeng Zhang, Ruiqin Xiong, Siwei Ma, Wen Gao","doi":"10.1109/ICME.2012.159","DOIUrl":"https://doi.org/10.1109/ICME.2012.159","url":null,"abstract":"Block transform coding using the discrete cosine transform is the most popular approach for image compression. However, annoying blocking artifacts arise because the transform coefficients of each block are coarsely quantized independently. This paper proposes an effective blocking artifacts reduction method that estimates the transform coefficients from their quantized version. In the proposed scheme, we estimate the transform coefficients based on an image statistic model and non-local similarity among blocks in the transform domain. The parameters used in our proposed scheme are discussed and adaptively selected. Extensive experimental results show that our proposed method significantly reduces blocking artifacts and improves the subjective and the objective quality of block transform coded images.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"186 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132371407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Who's Who in a Sports Video? An Individual Level Sports Video Indexing System","authors":"Shih-Wei Sun, Wen-Huang Cheng, Yao-Ling Hung, I. Fan, Chris Liu, Jacqueline Hung, Chia-Kai Lin, H. Liao","doi":"10.1109/ICME.2012.59","DOIUrl":"https://doi.org/10.1109/ICME.2012.59","url":null,"abstract":"Sports video analysis has attracted great attention in recent years. In the past decade, numerous sports video indexing approaches have been proposed at different semantic levels. In this paper, an individual level sports video indexing (ILSVI) scheme is proposed. The individual level refers to the indexing of a sports video on a player basis, i.e. to recognize each player in a multi-player game. Since the jersey number is always \"worn\" by a player as the player's identity in a game, it is feasible to recognize jersey numbers for individual level indexing in sports videos. To solve the jersey number recognition problem, a principal-axis based contour descriptor is proposed. Compared to the state-of-the-art approaches, the proposed descriptor achieves a higher recognition rate while consuming much less computation power. In addition, we developed an interactive system to realize individual level sports video indexing (ILSVI). This interactive system includes player detection and jersey number detection sub-systems, and helps complete the individual level sports video indexing task. We shall use basketball game videos as the basis to develop real-world systems.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134568912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Local Temporal Context-Based Approach for TV News Story Segmentation","authors":"Emilie Dumont, G. Quénot","doi":"10.1109/ICME.2012.3","DOIUrl":"https://doi.org/10.1109/ICME.2012.3","url":null,"abstract":"Users are often interested in retrieving only a particular passage on a topic of interest to them. It is therefore necessary to split videos into shorter segments corresponding to appropriate retrieval units. We propose here a method based on a local temporal context for the segmentation of TV news videos into stories. First, we extract multiple descriptors which are complementary and give good insights about story boundaries. Once extracted, these descriptors are expanded with a local temporal context and combined by an early fusion process. The story boundaries are then predicted using machine learning techniques. We evaluate the system through experiments using the TRECVID 2003 data and protocol of the story boundary detection task, and show that extending multimodal descriptors with a local temporal context improves results and that our method outperforms the state of the art.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133721616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expert Talk for Time Machine Session: Affective Multimedia Analysis: Introduction, Background and Perspectives","authors":"M. Soleymani","doi":"10.1109/ICME.2012.107","DOIUrl":"https://doi.org/10.1109/ICME.2012.107","url":null,"abstract":"The term \"affective computing\" was coined by Rosalind Picard in 1995. She presented her ideas about how to use affect for interaction with and analysis of multimedia, and those ideas have inspired studies and applications in affective multimedia analysis over the past decade. In this talk, the initial ideas and their development to the current state, as well as challenges and perspectives, are presented.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133789727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Storyboards for Interactive Visual Search","authors":"Klaus Schöffmann, David Ahlström, L. Böszörményi","doi":"10.1109/ICME.2012.62","DOIUrl":"https://doi.org/10.1109/ICME.2012.62","url":null,"abstract":"Interactive image and video search tools typically use a grid-like arrangement of thumbnails for preview purposes. Such a display, commonly known as a storyboard, provides limited flexibility for interactive search and does not optimally exploit the available screen real estate. In this paper we design and evaluate alternatives to the common two-dimensional storyboard. We take advantage of 3D graphics to present image thumbnails in cylindrical arrangements. Through a user study we evaluate the performance of these interfaces in terms of visual search time and subjective performance.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133555953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}