{"title":"A Novel Edge Detection Framework by Component Tree Construction","authors":"Zhijun Dai, Yihong Wu, Youji Feng","doi":"10.1109/ICMEW.2012.99","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.99","url":null,"abstract":"This paper proposes a new edge detection framework based on component tree construction. This open framework is efficient for edge property computation and convenient for subsequent image processing. We detect edges according to properties customized by framework rules. Experiments using the framework for a new, efficient implementation of the Canny edge detector are reported. The results demonstrate that the tree construction is efficient and the framework is flexible.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129855348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Pose Estimation of Front Vehicle Towards a Better Driver Assistance System","authors":"Yu Peng, Jesse S. Jin, S. Luo, Min Xu, Yue Cui","doi":"10.1109/ICMEW.2012.97","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.97","url":null,"abstract":"Driver assistance systems enhance traffic safety and efficiency. An accurate 3D pose of the front vehicle can help the driver make the right decisions on the road. We propose a novel real-time system to estimate the 3D pose of the front vehicle. The system consists of two parallel threads: vehicle rear tracking and mapping. The vehicle rear is first identified in the video captured by an on-board camera, after license plate localization and foreground extraction. A 3D pose estimation technique is then applied to the extracted vehicle rear. Most 3D pose estimation techniques require prior models or a stereo initialization with user cooperation. It is extremely difficult to obtain prior models due to the varied appearances of vehicle rears; moreover, it is unsafe to ask for the driver's cooperation while the vehicle is moving. In our system, two initial key frames for the stereo algorithm are extracted automatically by vehicle rear detection and tracking. Map points are defined as a collection of point features extracted from the vehicle rear together with their 3D information. These map points relate 2D features detected in subsequent vehicle rears to the 3D world. The relative 3D pose between the current vehicle rear and the on-board camera is then estimated by matching map points with the current point features. We demonstrate the capabilities of our system through augmented reality, which requires accurate, real-time 3D pose estimation.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129968414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lossless Compression of Stereo Disparity Maps for 3D","authors":"M. Zamarin, Søren Forchhammer","doi":"10.1109/ICMEW.2012.113","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.113","url":null,"abstract":"Efficient compression of disparity data is important for accurate view synthesis in multi-view communication systems based on the \"texture plus depth\" format, including the stereo case. In this paper a novel technique for lossless compression of stereo disparity images is presented. The coding algorithm is based on bit-plane coding, disparity prediction via disparity warping, and context-based arithmetic coding exploiting predicted disparity data. Experimental results show that the proposed compression scheme achieves average compression factors of about 48:1 for high-resolution disparity maps of stereo pairs and outperforms several standard solutions for lossless still image compression. Moreover, it provides a progressive representation of disparity data as well as a parallelizable structure.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121935318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Summarization with Global and Local Features","authors":"Genliang Guan, Zhiyong Wang, Kaimin Yu, Shaohui Mei, Mingyi He, D. Feng","doi":"10.1109/ICMEW.2012.105","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.105","url":null,"abstract":"Video summarization has become crucial for effective and efficient access to video content due to the ever-increasing amount of video data. Most existing key-frame-based summarization approaches represent individual frames with global features, which neglects the local details of visual content. Considering that a video generally depicts a story through a number of scenes with different temporal orders and shooting angles, we formulate scene summarization as identifying a set of frames that best covers the key point pool constructed from the scene. Our approach is therefore a two-step process: identifying scenes and selecting representative content for each scene. Global features are utilized to identify scenes through clustering, due to the visual similarity among video frames of the same scene, and local features are used to summarize each scene. We develop a key-point-based key frame selection method to identify the representative content of a scene, which allows users to flexibly tune the summarization length. Our preliminary results indicate that the proposed approach is very promising and potentially robust to clustering-based scene identification.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116383070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Pruning Method Based on the Number of States Possessed by Hypotheses","authors":"Junyao Shao, Gang Liu, Zhiyuan Guo, Baoxiang Li, Yueming Lu","doi":"10.1109/ICMEW.2012.106","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.106","url":null,"abstract":"This paper presents an improved pruning method that takes into account the number of states possessed by hypotheses in certain frames. With conventional pruning strategies, hypotheses with a low score or a bad ranking are discarded. However, this neglects the fact that hypotheses several states ahead of or behind the right hypothesis in the prefix tree, which should be discarded, have scores and rankings similar to those of the right hypothesis. If a state is part of a partial path hypothesis, we say it is possessed by that hypothesis. In a given speech frame, we can thus deduce that the hypotheses possessing the most states and those possessing the fewest states have little chance of being the right hypothesis. The proposed method analyzes the range of the number of states possessed by the hypotheses and discards the hypotheses that possess too many or too few states. Experiments show that this method effectively improves ASR performance.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116469018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"L-infinite Coding of 3D Representations of Human Affect","authors":"Ruxandra-Marina Florea, Leon Denis, J. Lievens, P. Schelkens, A. Munteanu","doi":"10.1109/ICMEW.2012.14","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.14","url":null,"abstract":"Off-line scanning, coding, transmission, and remote animation of human affect represent a possible processing pipeline for providing 3D immersion in virtual worlds. In this paper we target applications that make use of compact and scalable 3D representations of human affect and require close control over the local error introduced by lossy coding of the mesh geometry. To satisfy this requirement, we propose a novel L-infinite wavelet-based semi-regular mesh coding system. The system stands in contrast to classical mesh coding approaches, which make use of the L-2 distortion metric. Specifically, in contrast to an L-2-driven implementation, the proposed system provides a bound on the local error at each vertex resulting from scalar embedded quantization of the wavelet coefficients. The experiments show that the proposed system provides scalability in the L-infinite sense and that it outperforms the state of the art in L-infinite mesh coding.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127188528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid Coded Block Patterns Based Fast Mode Decision in H.264/AVC","authors":"Zhiru Shi, W. Fernando, A. Kondoz","doi":"10.1109/ICMEW.2012.10","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.10","url":null,"abstract":"Although the video coding standard H.264/AVC offers higher coding efficiency than previous standards, it incurs high computational complexity due to motion estimation over various block sizes for multi-mode decision. In this paper, a hybrid inter-mode decision algorithm is presented, combining the coded block pattern (CBP) with motion activity and Rate-Distortion (RD) cost. In this algorithm, CBP and CBP4×4, which indicate non-zero coefficient blocks, are used to determine candidate modes at the macroblock and sub-macroblock levels. Further, normalized motion activity is used to identify the homogeneity of a block and narrow the candidate mode set more accurately. An early termination is also applied between the macroblock and sub-macroblock levels by comparing RD costs. The experimental results show that the proposed algorithm achieves approximately 60% savings in computational complexity in terms of encoding time, with negligible quality degradation, compared to the conventional method in H.264/AVC.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127423267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Area of Interest Management for Large-Scale Immersive Video Conferencing","authors":"Pedram Pourashraf, F. Safaei, D. Franklin","doi":"10.1109/ICMEW.2012.31","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.31","url":null,"abstract":"Although video conferencing and its related applications have grown into a significant research area, the limited scalability of conference size is still a major problem. In this paper, a range of strategies for real-time area of interest (AOI) management in a 3D immersive video conference (IVC) are evaluated with the objective of minimising the required video transmission capacity and hence maximising the number of concurrent users. The paper shows that with judicious application of these techniques, the download capacity requirements of clients can be reduced by as much as 90% in a crowded virtual space.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114889971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vehicle Type Classification Using PCA with Self-Clustering","authors":"Yu Peng, Jesse S. Jin, S. Luo, Min Xu, Yue Cui","doi":"10.1109/ICMEW.2012.73","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.73","url":null,"abstract":"Varying conditions, such as occlusions, changes of lighting, shadows, and rotations, make vehicle type classification a challenging task, especially for real-time applications. Most existing methods rely on assumptions about certain conditions, such as lighting conditions and special camera settings. However, these assumptions usually do not hold for real-world applications. In this paper, we propose a robust vehicle type classification method based on adaptive multi-class Principal Component Analysis (PCA). We treat car images captured in daytime and at night-time separately. The vehicle front is extracted by examining its width and the location of the license plate. Then, after generating eigenvectors to represent the extracted vehicle fronts, we propose a PCA method with self-clustering to classify vehicle type. Comparison experiments with state-of-the-art methods and real-time evaluations demonstrate the promising performance of our proposed method. Moreover, as we did not find any public database containing a sufficient number of suitable images, we built our own online database of 4924 high-resolution vehicle front-view images for further research on this topic.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133536192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-User Interaction System Based on Kinect and Wii Remote","authors":"Yihua Lou, Wenjun Wu, Hui Zhang, Haikuo Zhang, Yongquan Chen","doi":"10.1109/ICMEW.2012.123","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.123","url":null,"abstract":"We will demonstrate a multi-user interaction system that uses Kinect and Wii Remote for manipulating windows in both desktop and wall-sized environments. The system combines gesture information collected by Kinect with other sensor information, such as acceleration from the Wii Remote, thereby providing more accurate control and a more natural experience for users.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128499643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}