{"title":"High Performance Fractional Motion Estimation and Mode Decision for H.264/AVC","authors":"Chao-Yang Kao, Huang-Chih Kuo, Y. Lin","doi":"10.1109/ICME.2006.262762","DOIUrl":"https://doi.org/10.1109/ICME.2006.262762","url":null,"abstract":"We propose a high performance architecture for fractional motion estimation and Lagrange mode decision in H.264/AVC. Instead of time-consuming fractional-pixel interpolation and secondary search, our fractional motion estimator employees a mathematical model to estimate SADs at quarter-pixel position. Both computation time and memory access requirements are greatly reduced without significant quality degradation. We propose a novel cost function for mode decision that leads to much better performance than traditional low complexity method. Synthesized into a TSMC 0.13 mum CMOS technology, our design takes 56 k gates at 100 MHz and is sufficient to process QUXGA (3200times2400) video sequences at 30 frames per second (fps). Compared with a state-of-the-art design operating under the same frequency, ours is 30% smaller and has 18 times more throughput at the expense of only 0.05 db in PSNR difference","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132680272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor-Based Multiple Object Trajectory Indexing and Retrieval","authors":"Xiang Ma, F. Bashir, A. Khokhar, D. Schonfeld","doi":"10.1109/ICME.2006.262468","DOIUrl":"https://doi.org/10.1109/ICME.2006.262468","url":null,"abstract":"This paper presents novel tensor-based object trajectory modelling techniques for simultaneous representation of multiple objects motion trajectories in a content based indexing and retrieval framework. Three different tensor decomposition techniques-PARAFAC, HOSVD and multiple-SVD-are explored to achieve this goal with the aim of using a minimum set of coefficients and data-dependant bases. These tensor decompositions have been applied to represent full as well as segmented trajectories. Our simulation results show that the PARAFAC-based representation provides higher compression ratio, superior precision-recall metrics, and smaller query processing time compared to the other tensor-based approaches","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123161332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Potential of Incorporating Knowledge of Human Visual Attention into Cbir Systems","authors":"Oge Marques, Liam M. Mayron, G. Borba, H. Gamba","doi":"10.1109/ICME.2006.262953","DOIUrl":"https://doi.org/10.1109/ICME.2006.262953","url":null,"abstract":"Content-based image retrieval (CBIR) systems have been actively investigated over the past decade. Several existing CBIR prototypes claim to be designed based on perceptual characteristics of the human visual system, but even those who do are far from recognizing that they could benefit further by incorporating ongoing research in vision science. This paper explores the inclusion of human visual perception knowledge into the design and implementation of CBIR systems. Particularly, it addresses the latest developments in computational modeling of human visual attention. This fresh way of revisiting concepts in CBIR based on the latest findings and open questions in vision science research has the potential to overcome some of the challenges faced by CBIR systems","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131422303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Automatic Classification System Applied in Medical Images","authors":"B. Qiu, Chang Xu, Q. Tian","doi":"10.1109/ICME.2006.262713","DOIUrl":"https://doi.org/10.1109/ICME.2006.262713","url":null,"abstract":"In this paper, a multi-class classification system is developed for medical images. We have mainly explored ways to use different image features, and compared two classifiers: principle component analysis (PCA) and supporting vector machines (SVM) with RBF (radial basis functions) kernels. Experimental results showed that SVM with a combination of the middle-level blob feature and low-level features (down-scaled images and their texture maps) achieved the highest recognition accuracy. Using the 9000 given training images from ImageCLEFOS, our proposed method has achieved a recognition rate of 88.9% in a simulation experiment. And according to the evaluation result from the ImageCLEFOS organizer, our method has achieved a recognition rate of 82% over its 1000 testing images","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127400596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Segmentation of Documentary Video using Music Breaks","authors":"Aijuan Dong, Honglin Li","doi":"10.1109/ICME.2006.262908","DOIUrl":"https://doi.org/10.1109/ICME.2006.262908","url":null,"abstract":"Many documentary videos use background music to help structure the content and communicate the semantic. In this paper, we investigate semantic segmentation of documentary video using music breaks. We first define video semantic units based on the speech text that a video/audio contains, and then propose a three-step procedure for semantic video segmentation using music breaks. Since the music breaks of a documentary video are of different semantic levels, we also study how different speech/music segment lengths correlate with the semantic level of a music break. Our experimental results show that music breaks can effectively segment a continuous documentary video stream into semantic units with an average F-score of 0.91 and the lengths of combined segments (speech segment plus the music segment that follows) strongly correlate with the semantic levels of music breaks","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"195 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124339961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Multimedia Retrieval using Lexical Query Expansion and Model-Based Reranking","authors":"A. Haubold, A. Natsev, M. Naphade","doi":"10.1109/ICME.2006.262892","DOIUrl":"https://doi.org/10.1109/ICME.2006.262892","url":null,"abstract":"We present methods for improving text search retrieval of visual multimedia content by applying a set of visual models of semantic concepts from a lexicon of concepts deemed relevant for the collection. Text search is performed via queries of words or fully qualified sentences, and results are returned in the form of ranked video clips. Our approach involves a query expansion stage, in which query terms are compared to the visual concepts for which we independently build classifier models. We leverage a synonym dictionary and WordNet similarities during expansion. Results over each query are aggregated across the expanded terms and ranked. We validate our approach on the TRECVID 2005 broadcast news data with 39 concepts specifically designed for this genre of video. We observe that concept models improve search results by nearly 50% after model-based re-ranking of text-only search. We also observe that purely model-based retrieval significantly outperforms text-based retrieval on non-named entity queries","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114833969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Resynchronization Method for Scalable Video Over Wireless Channel","authors":"Yu Wang, Lap-Pui Chau, Kim-Hui Yap","doi":"10.1109/ICME.2006.262869","DOIUrl":"https://doi.org/10.1109/ICME.2006.262869","url":null,"abstract":"A scalable video coder generates scalable compressed bit-stream, which can provide different types of scalability depend on different requirements. This paper proposes a novel resynchronization method for the scalable video with combined temporal and quality (SNR) scalability. The main purpose is to improve the robustness of the transmitted video. In the proposed scheme, the video is encoded into scalable compressed bit-stream with combined temporal and quality scalability. The significance of each enhancement layer unit is estimated properly. A novel resynchronization method is proposed where joint group of picture (GOP) level and picture level insertion of resynchronization marker approach is applied to insert different amount of resynchronization markers in different enhancement layer units for reliable transmission of the video over error-prone channels. It is demonstrated from the experimental results that the proposed method can perform graceful degradation under a variety of error conditions and the improvement can be up to 1 dB compared with the conventional method","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"564 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116285907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Implementation of a Multimedia Personalized Service Over Large Scale Networks","authors":"Xiaorong Li, T. Hung, B. Veeravalli","doi":"10.1109/ICME.2006.262554","DOIUrl":"https://doi.org/10.1109/ICME.2006.262554","url":null,"abstract":"In this paper, we proposed to setup a distributed multimedia system which aggregates the capacity of multiple servers to provide customized multimedia services in a cost-effective way. Such a system enables clients to customize their services by specifying the service delay or the viewing times. We developed an experimental prototype in which media servers can cooperate in streams caching, replication and distribution. We applied a variety of stream distribution algorithms to the system and studied their performance under the real-life situations with limited network resources and varying request arrival pattern. The results show such a system can provide cost-effective services and be applied to practical environments","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116353231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conversation Scene Analysis with Dynamic Bayesian Network Basedon Visual Head Tracking","authors":"K. Otsuka, Junji Yamato, Y. Takemae, H. Murase","doi":"10.1109/ICME.2006.262677","DOIUrl":"https://doi.org/10.1109/ICME.2006.262677","url":null,"abstract":"A novel method based on a probabilistic model for conversation scene analysis is proposed that can infer conversation structure from video sequences of face-to-face communication. Conversation structure represents the type of conversation such as monologue or dialogue, and can indicate who is talking/listening to whom. This study assumes that the gaze directions of participants provide cues for discerning the conversation structure, and can be identified from head directions. For measuring head directions, the proposed method newly employs a visual head tracker based on sparse-template condensation. The conversation model is built on a dynamic Bayesian network and is used to estimate the conversation structure and gaze directions from observed head directions and utterances. Visual tracking is conventionally thought to be less reliable than contact sensors, but experiments confirm that the proposed method achieves almost comparable performance in estimating gaze directions and conversation structure to a conventional sensor-based method","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123445965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Imagewatermarking Scheme Based on the Alpha-Beta Space","authors":"P. Martins, P. Carvalho","doi":"10.1109/ICME.2006.262847","DOIUrl":"https://doi.org/10.1109/ICME.2006.262847","url":null,"abstract":"A robust image watermarking scheme relying on an affine invariant embedding domain is presented. The invariant space is obtained by triangulating the image using affine invariant interest points as vertices and performing an invariant triangle representation with respect to affine transformations based on the barycentric coordinates system. The watermark is encoded via quantization index modulation with an adaptive quantization step","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122041362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}