{"title":"Video and Audio Editing for Mobile Applications","authors":"Ari Hourunranta, A. Islam, F. Chebil","doi":"10.1109/ICME.2006.262778","DOIUrl":"https://doi.org/10.1109/ICME.2006.262778","url":null,"abstract":"Video content creation and consumption have been increasingly available for the masses with the emergence of handheld devices capable of shooting, downloading, and playing videos. Video editing is a natural and necessary operation that is most commonly employed by users for finalizing and organizing their video content. With the constraints in processing power and memory, conventional spatial domain video editing is not a solution for mobile applications. In this paper, we present a complete video editing system for efficiently editing video content on mobile phones using compressed domain editing algorithms. A critical factor from usability point of view is the processing speed of the editing application. We show that with the proposed compressed domain editing system, typical video editing operations can be performed much faster than real-time on today's S60 phones","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117095129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximating Optimal Visual Sensor Placement","authors":"E. Hörster, R. Lienhart","doi":"10.1109/ICME.2006.262766","DOIUrl":"https://doi.org/10.1109/ICME.2006.262766","url":null,"abstract":"Many novel multimedia applications use visual sensor arrays. In this paper we address the problem of optimally placing multiple visual sensors in a given space. Our linear programming approach determines the minimum number of cameras needed to cover the space completely at a given sampling frequency. Simultaneously it determines the optimal positions and poses of the visual sensors. We also show how to account for visual sensors with different properties and costs if more than one kind is available, and report performance results.","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125920006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Protection Processor for MPEG-21 Players","authors":"P. Nesi, D. Rogai, A. Vallotti","doi":"10.1109/ICME.2006.262790","DOIUrl":"https://doi.org/10.1109/ICME.2006.262790","url":null,"abstract":"The design and implementation of MPEG-21 players and authoring tools presents several critical points to be solved. One of the most relevant is the security level and protection processing in the players. This paper presents a solution for the realization of components in charge of enforcing Digital Rights Management in AXMEDIS tools for MPEG-21 digital content. The proposed architecture provides functionalities to create both trusted environment on the client side and dynamic protection and unprotection of digital content including digital resources and their organization and metadata. The same solution can be used to achieve the desired security level in any other MPEG-21 player or authoring tool. The architecture presented hereinafter has been adopted to enforce protection on authoring and player tools developed for the AXMEDIS IST FP6 R&D European Commission project","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124705768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Media Synchronization Method for Video Hypermedia Application Based on Extended Event Model","authors":"Hironobu Abe, H. Shigeno, Ken-ichi Okada","doi":"10.1109/ICME.2006.262774","DOIUrl":"https://doi.org/10.1109/ICME.2006.262774","url":null,"abstract":"This paper describes a proposal of an extended event model using a media synchronization method for video hypermedia applications. In this extended event model, video and metadata are synchronized by periodically inserting event information in the video multiplex. We considered the following design policies: 1) a model that is independent of the video format and delivery method, 2) the synchronization accuracy can be tuned depending on the purpose and use of the metadata. We designed the extended event model based on the above design policies, and implemented this model as an encode/decode library for Windows Media. Based on this model we developed a video hypermedia system prototype and performed evaluation experiments. The evaluation results of real time synchronization performance of the system prototype showed that in the case of sports video content a synchronization accuracy of 100 msec between video and metadata makes our method effective for use in video hypermedia applications","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129492504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Dual AK-D Tree Search Algorithm for ICP Registration Applications","authors":"Jiann-Der Lee, Shih-Sen Hsieh, Chung-Hsien Huang, Li-Chang Liu, Cheien-Tsai Wu, Shin-Tseng Lee, Jyi-Feng Chen","doi":"10.1109/ICME.2006.262598","DOIUrl":"https://doi.org/10.1109/ICME.2006.262598","url":null,"abstract":"An algorithm for finding coupling points plays an important role in the iterative closest point (ICP) algorithm, which is widely used in registration applications in medical and 3-D architecture areas. In recent research on finding coupling points, the approximate K-D tree search algorithm (AK-D tree) is an efficient nearest neighbor search algorithm with comparable results. We propose an adaptive dual AK-D tree search algorithm (ADAK-D tree) for searching and synthesizing coupling points as significant control points to improve the registration accuracy in ICP registration applications. ADAK-D tree utilizes AK-D tree twice in different geometrical projection orders to reserve true nearest neighbor points used in later ICP stages. An adaptive threshold in ADAK-D tree is used to reserve sufficient coupling points for a smaller alignment error. Experimental results show that the registration accuracy obtained with ADAK-D tree is better than that obtained with AK-D tree, and that the computation time is acceptable","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129865263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audiovisual Anchorperson Detection for Topic-Oriented Navigation in Broadcast News","authors":"M. Haller, Hyoung‐Gook Kim, T. Sikora","doi":"10.1109/ICME.2006.262906","DOIUrl":"https://doi.org/10.1109/ICME.2006.262906","url":null,"abstract":"This paper presents a content-based audiovisual video analysis technique for anchorperson detection in broadcast news. For topic-oriented navigation in newscasts, a segmentation of the topic boundaries is needed. As the anchorperson gives a strong indication for such boundaries, the presented technique automatically determines that high-level information for video indexing from MPEG-2 videos and stores the results in an MPEG-7 conformant format. The multimodal analysis process is carried out separately in the auditory and visual modality, and the decision fusion forms the final anchorperson segments","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128581438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiential Sampling based Foreground/Background Segmentation for Video Surveillance","authors":"P. Atrey, Vinay Kumar, Anurag Kumar, M. Kankanhalli","doi":"10.1109/ICME.2006.262904","DOIUrl":"https://doi.org/10.1109/ICME.2006.262904","url":null,"abstract":"Segmentation of foreground and background has been an important research problem arising out of many applications including video surveillance. A method commonly used for segmentation is \"background subtraction\" or thresholding the difference between the estimated background image and current image. Adaptive Gaussian mixture based background modelling has been proposed by many researchers for increasing the robustness against environmental changes. However, all these methods, being computationally intensive, need to be optimized for efficient and real-time performance especially at a higher image resolution. In this paper, we propose an improved foreground/background segmentation method which uses experiential sampling technique to restrict the computational efforts in the region of interest. We exploit the fact that the region of interest in general is present only in a small part of the image, therefore, the attention should only be focused in those regions. The proposed method shows a significant gain in processing speed at the expense of minor loss in accuracy. We provide experimental results and detailed analysis to show the utility of our method","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128642314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactions and Integrations of Multiple Sensory Channels in Human Brain","authors":"S. Nishida","doi":"10.1109/ICME.2006.262437","DOIUrl":"https://doi.org/10.1109/ICME.2006.262437","url":null,"abstract":"This paper describes a couple of new principles with regard to interactions and integrations of multiple sensory channels in the human brain. First, as opposed to the general belief that the perception of shape and that of color are relatively independent of motion processing, the human visual system integrates shape and color signals along the perceived motion trajectory in order to improve the visibility of the shape and color of moving objects. Second, when the human sensory system binds the outputs of different sensory channels (including audio-visual signals) based on their temporal synchrony, it uses only sparse salient features rather than the time courses of full sensory signals. We believe these principles are potentially useful for the development of effective audiovisual processing and presentation devices","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129376748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reuse of Motion Processing for Camera Stabilization and Video Coding","authors":"Bao Lei, R. K. Gunnewiek, P. D. With","doi":"10.1109/ICME.2006.262479","DOIUrl":"https://doi.org/10.1109/ICME.2006.262479","url":null,"abstract":"The low bit rate of existing video encoders relies heavily on the accuracy of estimating actual motion in the input video sequence. In this paper, we propose a video stabilization and encoding (ViSE) system to achieve a higher coding efficiency through a preceding motion processing stage (to the compression), of which the stabilization part should compensate for vibrating camera motion. The improved motion prediction is obtained by differentiating between the temporal coherent motion and a more noisy motion component which is orthogonal to the coherent one. The system compensates the latter undesirable motion, so that it is eliminated prior to video encoding. To reduce the computational complexity of integrating a digital stabilization algorithm with video encoding, we propose a system that reuses the already evaluated motion vector from the stabilization stage in the compression. As compared to H.264, our system shows a 14% reduction in bit rate yet obtaining an increase of about 0.5 dB in SNR","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127197875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Memory Construction Scheme for an Arbitrary Side Growing Huffman Table","authors":"Sung-Wen Wang, Shang-Chih Chuang, Chih-Chieh Hsiao, Yi-Shin Tung, Ja-Ling Wu","doi":"10.1109/ICME.2006.262589","DOIUrl":"https://doi.org/10.1109/ICME.2006.262589","url":null,"abstract":"By grouping the common prefix of a Huffman tree, instead of the commonly used single-side growing Huffman tree (SGH-tree), we construct a memory efficient Huffman table on the basis of an arbitrary-side growing Huffman tree (AGH-tree) to speed up the Huffman decoding. Simulation results show that, in Huffman decoding, an AGH-tree based Huffman table is 2.35 times faster than Hashemian's method (an SGH-tree based one) and needs only one-fifth the corresponding memory size. In summary, a novel Huffman table construction scheme is proposed in this paper which provides better performance than existing construction schemes in both decoding speed and memory usage","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127303039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}