Estimating mixture models of images and inferring spatial transformations using the EM algorithm

Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149). Pub Date: 1999-06-23. DOI: 10.1109/CVPR.1999.786972

B. Frey, N. Jojic


Citations: 94

Abstract

Mixture modeling and clustering algorithms are effective, simple ways to represent images using a set of data centers. However, in situations where the images include background clutter and transformations such as translation, rotation, shearing and warping, these methods extract data centers that include clutter and represent different transformations of essentially the same data. Taking face images as an example, it would be more useful for the different clusters to represent different poses and expressions, instead of cluttered versions of different translations, scales and rotations. By including clutter and transformation as unobserved, latent variables in a mixture model, we obtain a new "transformed mixture of Gaussians", which is invariant to a specified set of transformations. We show how a linear-time EM algorithm can be used to fit this model by jointly estimating a mixture model for the data and inferring the transformation for each image. We show that this algorithm can jointly align images of a human head and learn different poses. We also find that the algorithm performs better than k-nearest neighbors and mixtures of Gaussians on handwritten digit recognition.
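The idea of treating the transformation as a latent variable inside EM can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes 1-D signals, cyclic shifts as the only transformations, a uniform prior over shifts, isotropic noise with one shared variance, and no clutter model. The function name `fit_shift_invariant_mixture` and all parameter names are hypothetical.

```python
import numpy as np


def fit_shift_invariant_mixture(X, n_clusters, n_iters=30, seed=0):
    """EM for a mixture of templates observed under unknown cyclic shifts.

    Simplified model (an assumption, not the paper's exact formulation):
    x = roll(mu_c, l) + isotropic Gaussian noise, with cluster c and
    shift l both unobserved.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    shifts = np.arange(d)

    # Initialise templates from randomly chosen data points.
    mu = X[rng.choice(n, n_clusters, replace=False)].copy()
    var = X.var() + 1e-6
    log_pi = np.full(n_clusters, -np.log(n_clusters))

    # Each image undone by every candidate shift: shape (n, L, d).
    unshifted = np.stack([np.roll(X, -l, axis=1) for l in shifts], axis=1)

    def sq_dist(templates):
        # Squared distance from each image to each shifted template: (n, C, L).
        shifted = np.stack([np.roll(templates, l, axis=1) for l in shifts], axis=1)
        return ((X[:, None, None, :] - shifted[None]) ** 2).sum(-1)

    r = None
    for _ in range(n_iters):
        # E-step: joint posterior over (cluster, shift) for each image.
        log_p = -0.5 * sq_dist(mu) / var + log_pi[None, :, None]
        log_p -= log_p.max(axis=(1, 2), keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=(1, 2), keepdims=True)

        # M-step: templates are responsibility-weighted averages of the
        # un-shifted images, so the shift is averaged out of each centre.
        nk = r.sum(axis=(0, 2))
        mu = np.einsum("ncl,nld->cd", r, unshifted) / (nk[:, None] + 1e-12)
        var = max((r * sq_dist(mu)).sum() / (n * d), 1e-6)
        log_pi = np.log(nk / n + 1e-12)

    # r holds the responsibilities from the final E-step.
    return mu, np.exp(log_pi), var, r
```

Each iteration costs time proportional to the number of candidate shifts. The most probable shift for image `i` can be read off as the shift index of the largest entry in `r[i]`, which is the sense in which the images are aligned while the cluster centres are learned.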

