{"title":"Sequential Monte Carlo fusion of sound and vision for speaker tracking","authors":"J. Vermaak, Michel Gangnet, A. Blake, P. Pérez","doi":"10.1109/ICCV.2001.937600","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937600","url":null,"abstract":"Video telephony could be considerably enhanced by provision of a tracking system that allows freedom of movement to the speaker while maintaining a well-framed image, for transmission over limited bandwidth. Already commercial multi-microphone systems exist which track speaker direction in order to reject background noise. Stereo sound and vision are complementary modalities in that sound is good for initialisation (where vision is expensive) whereas vision is good for localisation (where sound is less precise). Using generative probabilistic models and particle filtering, we show that stereo sound and vision can indeed be fused effectively, to make a system more capable than with either modality on its own.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129861118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards real-time multi-modality 3-D medical image registration","authors":"T. Netsch, P. Rösch, A. V. Muiswinkel, J. Weese","doi":"10.1109/ICCV.2001.937595","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937595","url":null,"abstract":"Intensity value-based registration is a widely used technique for the spatial alignment of medical images. Generally, the registration transformation is determined by iteratively optimizing a similarity measure calculated from the grey values of both images. However, such algorithms may have high computational costs, especially in the case of multi-modality registration, which makes their integration into systems difficult. At present, registration based on mutual information (MI) still requires computation times of the order of several minutes. In this contribution we focus on a new similarity measure based on local correlation (LC) which is well-suited for numerical optimization. We show that LC can be formulated as a least-squares criterion which allows the use of dedicated methods. Thus, it is possible to register MR neuro perfusion time-series (128/sup 2//spl times/30 voxel, 40 images) on a moderate workstation in real-time: the registration of an image takes about 500 ms and is therefore several times faster than image acquisition time. For the registration of CT-MR images (512/sup 2//spl times/87 CT 256/sup 2//spl times/128 MR) a multiresolution framework is used. On top of the decomposition, which requires 47 s of computation time, the optimization with an algorithm based on Ml previously described in the literature takes 97 s. In contrast, the proposed approach only takes 13 s, corresponding to a speedup about a factor of 7. Furthermore, we demonstrate that the superior computational performance of LC is not gained at the expense of accuracy. In particular experiments with dual contrast MR images providing ground truth for the registration show a comparable sub-voxel accuracy of LC and MI similarity.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128734750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tele-graffiti: a pen and paper-based remote sketching system","authors":"Naoya Takao, Jianbo Shi, S. Baker, I. Matthews, Bart C. Nabbe","doi":"10.1109/ICCV.2001.937712","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937712","url":null,"abstract":"Tele-Graffiti is a system allowing two.or more users to communicate remotely via hand-drawn sketches. What one person writes at one site is captured using a video camera, transmitted to the other site(s), and displayed there using an LCD projector. The advantage of our system over other intelligent desktops and white-boards is that the users are free to move the pieces of paper on which they are writing. In Tele-Graffiti, paper detection and tracking is based on real-time paper boundary detection.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127865043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic learning and modelling of object dynamics for tracking","authors":"T. Tay, K. Sung","doi":"10.1109/ICCV.2001.937580","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937580","url":null,"abstract":"The problem of tracking can be decomposed and independently addressed in two steps, namely the prediction step and the verification step. In this paper we present a new approach of addressing the prediction step that is based on modelling joint probability densities of successive states of tracked objects. This approach has the advantage that it is conceptually general such that given sufficient training data, it is capable of modelling a wide range of complex dynamics. Furthermore, we show that this conceptual prediction framework can be implemented in a tractable manner using a Gaussian mixture representation which allows predictions to be generated efficiently. We then descibe experiments that demonstrate these benefits.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124588758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On cosine-fourth and vignetting effects in real lenses","authors":"Manoj Aggarwal, H. Hua, N. Ahuja","doi":"10.1109/ICCV.2001.937554","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937554","url":null,"abstract":"This paper has been prompted by observations of disparities between the observed fall-off in irradiance for off-axis points and that accounted for by the cosine-fourth and vignetting effects. A closer examination of the image formation process for real lenses revealed that even in the absence of vignetting a point light source does not uniformly illuminate the aperture, an effect known as pupil aberration. For example, we found the variation for a 16 mm lens to be as large as 31% for a field angle of 10/spl deg/. In this paper, we critically evaluate the roles of cosine-fourth and vignetting effects and demonstrate the significance of the pupil aberration on the fall-off in irradiance away from image center. The pupil aberration effect strongly depends on the aperture size and shape and this dependence has been demonstrated through two sets of experiments with three real lenses. The effect of pupil aberration is thus a third important cause of fall in irradiance away from the image center in addition to the familiar cosine-fourth and vignetting effects, that must be taken into account in applications that rely heavily on photometric variation such as shape from shading and mosaicing.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126250088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Example-based facial sketch generation with non-parametric sampling","authors":"Hong Chen, Ying-Qing Xu, H. Shum, Song-Chun Zhu, Nanning Zheng","doi":"10.1109/ICCV.2001.937657","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937657","url":null,"abstract":"In this paper, we present an example-based facial sketch system. Our system automatically generates a sketch from an input image, by learning from example sketches drawn with a particular style by an artist. There are two key elements in our system: a non-parametric sampling method and a flexible sketch model. Given an input image pixel and its neighborhood, the conditional distribution of a sketch point is computed by querying the examples and finding all similar neighborhoods. An \"expected sketch image\" is then drawn from the distribution to reflect the drawing style. Finally, facial sketches are obtained by incorporating the sketch model. Experimental results demonstrate the effectiveness of our techniques.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127452374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human tracking with mixtures of trees","authors":"Sergey Ioffe, D. Forsyth","doi":"10.1109/ICCV.2001.937589","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937589","url":null,"abstract":"Tree-structured probabilistic models admit simple, fast inference. However they are not well suited to phenonena such as occlusion, where multiple components of an object may disappear simultaneously. We address this problem with mixtures of trees, and demonstrate an efficient and compact representation of this mixture, which admits simple learning and inference algorithms. We use this method to build an automated tracker for Muybridge sequences of a variety of human activities. Tracking is difficult, because the temporal dependencies rule out simple inference methods. We show how to use our model for efficient inference, using a method that employs alternate spatial and temporal inference. The result is a cracker that (a) uses a very loose motion model, and so can track many different activities at a variable frame rate and (b) is entirely, automatic.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126921759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stereo matching by compact windows via minimum ratio cycle","authors":"O. Veksler","doi":"10.1109/ICCV.2001.937563","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937563","url":null,"abstract":"Window size and shape selection is a difficult problem in area based stereo. We propose an algorithm which chooses an appropriate window shape by optimizing over a large class of \"compact\" windows. We call them compact because their ratio of perimeter to area tends to be small. We believe that this is the first window matching algorithm which can explicitly construct non-rectangular windows. Efficient optimization over the compact window class is achieved via the minimum ratio cycle algorithm. In practice it takes time linear in the size of the largest window in our class. Still the straightforward approach to find the optimal window for each pixel-disparity pair is too slow. We develop pruning heuristics which gave practically the same results while reducing running time from minutes to seconds. Our experiments show that unlike fixed window algorithms, our method avoids blurring disparity boundaries as well as constructs large windows in low textured areas. The algorithm has few parameters which are easy to choose, and the same parameters work well for different image pairs.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125459969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure and motion from silhouettes","authors":"Kwan-Yee Kenneth Wong, R. Cipolla","doi":"10.1109/ICCV.2001.937627","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937627","url":null,"abstract":"This paper addresses the problem of recovering structure and motion from silhouettes. Silhouettes are projections of contour generators which are viewpoint dependent, and hence do not readily provide point correspondences for exploitation in motion estimation. Previous works have exploited correspondences induced by epipolar tangencies, and a successful solution has been developed in the special case of circular motion (turnable sequences). However, the main drawbacks are (1) new views cannot be added easily at a later time, and (2) part of the structure will always remain invisible under circular motion. In this paper we overcome the above problems by incorporating arbitrary general views and estimating the camera poses using silhouettes alone. We present a complete and practical system which produces high quality 3D models from 2D uncalibrated silhouettes. The 3D models thus obtained can be refined incrementally by adding new arbitrary views and estimating their poses. Experimental results on various objects are presented, demonstrating the quality of the reconstructions.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131154559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Depth from defocus in presence of partial self occlusion","authors":"S. Bhasin, S. Chaudhuri","doi":"10.1109/ICCV.2001.937556","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937556","url":null,"abstract":"Contrary to the normal belief we show that self occlusion is present in any real aperture image and we present a method on how we can take care of the occlusion while recovering the depth using the defocus as the cue. The space-variant blur is modeled as an MRF and the MAP estimates are obtained for both the depth map and the everywhere focused intensity image. The blur kernel is adjusted in the regions where occlusion is present, particularly at the regions of discontinuities in the scene. The performance of the proposed algorithm is tested over synthetic data and the estimates are found to be better than the earlier schemes where such subtle effects were ignored.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132368844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}