{"title":"Calibration between depth and color sensors for commodity depth cameras","authors":"Cha Zhang, Zhengyou Zhang","doi":"10.1109/ICME.2011.6012191","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012191","url":null,"abstract":"Commodity depth cameras have created many interesting new applications in the research community recently. These applications often require the calibration information between the color and the depth cameras. Traditional checkerboard based calibration schemes fail to work well for the depth camera, since its corner features cannot be reliably detected in the depth image. In this paper, we present a maximum likelihood solution for the joint depth and color calibration based on two principles. First, in the depth image, points on the checker-board shall be co-planar, and the plane is known from color camera calibration. Second, additional point correspondences between the depth and color images may be manually specified or automatically established to help improve calibration accuracy. Uncertainty in depth values has been taken into account systematically. The proposed algorithm is reliable and accurate, as demonstrated by extensive experimental results on simulated and real-world examples.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130337394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A middleware platform for real-time processing of multiple video streams based on the data-flow paradigm","authors":"P. Foggia, M. Vento","doi":"10.1109/ICME.2011.6012142","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012142","url":null,"abstract":"In this paper we introduce a new software platform for the realization of intelligent video-surveillance applications and, more generally, of real-time video stream processing systems. The platform is implemented as a middleware, providing general purpose services, and a collection of dynamically loaded modules carrying out domain-specific tasks. The architecture of the platform follows a data-flow paradigm, where the application is organized as a processing network whose nodes are activated by the middleware as soon as their inputs are available and a processor is ready. This architecture is beneficial both with respect to the development process, simplifying the module implementation and favoring the reuse of software components, and with respect to the performance, since the middleware can automatically parallelize the processing using the available processors or cores. The platform has been validated by converting an existing video surveillance application, demonstrating both the improvement in the development process and the performance increment.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115540929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A real-time crosstalk canceller on a notebook GPU","authors":"J. A. Belloch, Alberto González, F. Martínez-Zaldívar, A. Vidal","doi":"10.1109/ICME.2011.6012072","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012072","url":null,"abstract":"Crosstalk cancellation is one of the main applications in multichannel acoustic signal processing. This field has experienced major development in recent years because of the increase in the number of sound sources used in playback applications available to users. Developing these applications requires high computing capabilities because of the large number of operations involved. Graphics Processing Units (GPUs), highly parallel commodity programmable co-processors, offer the possibility of parallelizing these operations. This makes it possible to obtain results in a much shorter time and to free up CPU resources, which can then be used for other tasks. One important aspect is the possibility of overlapping data transfers between the CPU and GPU with computation, in order to support real-time applications. Thus, this work focuses on two main points: describing an efficient GPU implementation of crosstalk cancellation and incorporating it into a real-time application.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115619520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A context-aware e-health framework for students with moderate intellectual and learning disabilities","authors":"Rajwa Alharthi, Rania Albalawi, Mahmud Abdo, Abdulmotaleb El Saddik","doi":"10.1109/ICME.2011.6012218","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012218","url":null,"abstract":"In this paper, we address the challenges of adding context-awareness to e-Health systems for students with moderate intellectual and learning disabilities. Our proposed framework provides a personalized e-Health environment containing context-aware learning media, services, and a user interface tailored to individuals with disabilities. The framework dynamically adapts accessibility as well as the user interface based on the user's disability. We present the detailed design and implementation of the framework.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114256974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual object recognition using DAISY descriptor","authors":"Chao Zhu, Charles-Edmond Bichot, Liming Chen","doi":"10.1109/ICME.2011.6011957","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011957","url":null,"abstract":"Visual content description is a key issue for the task of machine-based visual object categorization (VOC). A good visual descriptor should be both discriminative enough and computationally efficient while possessing some properties of robustness to viewpoint changes and lighting condition variations. The recent literature has featured local image descriptors, e.g. SIFT, as the main trend in VOC. However, it is well known that SIFT is computationally expensive, especially when the number of objects/concepts and learning data increase significantly. In this paper, we investigate the DAISY, which is a new fast local descriptor introduced for wide baseline matching problem, in the context of VOC. We carefully evaluate and compare the DAISY descriptor with SIFT both in terms of recognition accuracy and computation complexity on two standard image benchmarks - Caltech 101 and PASCAL VOC 2007. The experimental results show that DAISY outperforms the state-of-the-art SIFT while using shorter descriptor length and operating 3 times faster. When displaying a similar recognition accuracy to SIFT, DAISY can operate 12 times faster.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"17 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114088478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Realistic audio in immersive video conferencing","authors":"S. Mehrotra, Weig-Ge Chen, Zhengyou Zhang, P. Chou","doi":"10.1109/ICME.2011.6012065","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012065","url":null,"abstract":"With increasing computation power, network bandwidth, and improvements in display and capture technologies, fully immersive conferencing and tele-immersion is becoming ever closer to reality. Beyond video, one of the key components needed is high-quality spatialized audio. This paper presents an implementation of a relatively low-complexity, simple solution which allows realistic audio spatialization of arbitrary positions in a 3D video conference. When combined with pose tracking, it also allows the audio to change based on the position on the screen at which the viewer is looking.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114216873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient motion vector outlier removal for global motion estimation","authors":"T. Dinh, Gueesang Lee","doi":"10.1109/ICME.2011.6011881","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011881","url":null,"abstract":"Because motion vector based global motion estimation methods have much lower complexity than pixel based ones, they are widely used in the compressed domain to estimate the camera motion in video sequences. However, the accuracy of these motion vector based methods largely depends on the quality of the input motion vector field. In real applications, many outlier motion vectors are present because of noise or foreground objects. In this paper, a novel tensor voting based motion vector outlier removal method is proposed to improve the quality of the input motion vector field. First, motion vectors are encoded by second order tensors. A 2-D voting process is then used to smooth the motion vector field. Finally, the smoothed motion vector field is compared to the input one to detect outliers. The experimental results on synthetic and real data show the effectiveness of the proposed method.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117088499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mobile augmented reality for books on a shelf","authors":"David M. Chen, Sam S. Tsai, Cheng-Hsin Hsu, J. Singh, B. Girod","doi":"10.1109/ICME.2011.6012171","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012171","url":null,"abstract":"Retrieving information about books on a bookshelf by snapping a photo of book spines with a mobile device is very useful for book-stores, libraries, offices, and homes. In this paper, we develop a new mobile augmented reality system for book spine recognition. Our system achieves very low recognition delays, around 1 second, to support real-time augmentation on a mobile device's viewfinder. We infer user interest by analyzing the motion of objects seen in the viewfinder. Our system initiates a query during each low-motion interval. This selection mechanism eliminates the need to press a button and avoids using degraded motion-blurred query frames during high-motion intervals. The viewfinder is augmented with a book's identity, prices from different vendors, average user rating, location within the enclosing bookshelf, and a digital compass marker. We present a new tiled search strategy for finding the location in the bookshelf with improved accuracy in half the time as in a previous state-of-the-art system. Our AR system has been implemented on an Android smartphone.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115385337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rotation invariant texture feature extraction based on Sorted Neighborhood Differences","authors":"K. Saipullah, Deok‐Hwan Kim, Seok-Lyong Lee","doi":"10.1109/ICME.2011.6011907","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011907","url":null,"abstract":"Rotation invariant texture descriptors play an important role in texture-based object classification. However, classification accuracy may decrease due to the inconsistent performance of a texture descriptor across various rotation angles. In this paper, we propose a consistent rotation invariant texture descriptor named Sorted Neighborhood Differences (SND). SND is derived from the integration of sorted neighborhoods and binary patterns. Experimental results show that the overall texture classification accuracy of SND with respect to different rotations using the OUTEX TC 0010 texture database is 91.81%, whereas those of LBPriu and LBP-HF are 86.42% and 88.28%, respectively. The texture and coin classification accuracies of SND are also consistent across various rotation angles and illumination levels.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123207862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High performance computing of line of sight viewshed","authors":"Ligang Lu, B. Paulovicks, M. Perrone, V. Sheinin","doi":"10.1109/ICME.2011.6011875","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011875","url":null,"abstract":"In this paper we present our recent research and development work on multicore computing of Line of Sight (LoS) on the Cell Broadband Engine (CBE) processor. LoS computation is found in many applications where real-time high performance computing is required. We describe an efficient LoS multicore parallel computing algorithm, including the data partition and computation load allocation strategies that fully utilize the CBE's computational resources for efficient LoS viewshed parallel computing. In addition, we illustrate a successive fast transpose algorithm that prepares the input data for efficient Single-Instruction-Multiple-Data (SIMD) operations. Furthermore, we describe a data input and output (I/O) management scheme that reduces the I/O latency of Direct Memory Access (DMA) data fetching and storing operations. The performance evaluation of our LoS viewshed computing scheme over an area of interest (AOI) with more than 4.19 million points has shown that our parallel computing algorithm on the CBE takes less than 25.5 ms, which is several orders of magnitude faster than the available commercial systems.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123324359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}