视觉语音识别中协方差路径的率不变比较

2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG) Pub Date : 2013-12-01 DOI:10.1109/NCVPRIPG.2013.6776200

Jingyong Su, Anuj Srivastava, F. Souza, Sudeep Sarkar

{"title":"视觉语音识别中协方差路径的率不变比较","authors":"Jingyong Su, Anuj Srivastava, F. Souza, Sudeep Sarkar","doi":"10.1109/NCVPRIPG.2013.6776200","DOIUrl":null,"url":null,"abstract":"An important problem in speech, and generally activity, recognition is to develop analyses that are invariant to the execution rates. We introduce a theoretical framework that provides a parametrization-invariant metric for comparing parametrized paths on Riemannian manifolds. Treating instances of activities as parametrized paths on a Riemannian manifold of covariance matrices, we apply this framework to the problem of visual speech recognition from image sequences. We represent each sequence as a path on the space of covariance matrices, each covariance matrix capturing spatial variability of visual features in a frame, and perform simultaneous pairwise temporal alignment and comparison of paths. This removes the temporal variability and helps provide a robust metric for visual speech classification. We evaluated this idea on the OuluVS database and the rank-1 nearest neighbor classification rate improves from 32% to 57% due to temporal alignment.","PeriodicalId":436402,"journal":{"name":"2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Rate-invariant comparisons of covariance paths for visual speech recognition\",\"authors\":\"Jingyong Su, Anuj Srivastava, F. Souza, Sudeep Sarkar\",\"doi\":\"10.1109/NCVPRIPG.2013.6776200\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An important problem in speech, and generally activity, recognition is to develop analyses that are invariant to the execution rates. We introduce a theoretical framework that provides a parametrization-invariant metric for comparing parametrized paths on Riemannian manifolds. Treating instances of activities as parametrized paths on a Riemannian manifold of covariance matrices, we apply this framework to the problem of visual speech recognition from image sequences. We represent each sequence as a path on the space of covariance matrices, each covariance matrix capturing spatial variability of visual features in a frame, and perform simultaneous pairwise temporal alignment and comparison of paths. This removes the temporal variability and helps provide a robust metric for visual speech classification. We evaluated this idea on the OuluVS database and the rank-1 nearest neighbor classification rate improves from 32% to 57% due to temporal alignment.\",\"PeriodicalId\":436402,\"journal\":{\"name\":\"2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG)\",\"volume\":\"93 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NCVPRIPG.2013.6776200\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NCVPRIPG.2013.6776200","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

语音识别和一般活动识别的一个重要问题是开发对执行率不变的分析。我们引入了一个理论框架，提供了一个参数化不变度量来比较黎曼流形上的参数化路径。将活动实例视为协方差矩阵黎曼流形上的参数化路径，我们将该框架应用于图像序列的视觉语音识别问题。我们将每个序列表示为协方差矩阵空间上的路径，每个协方差矩阵捕获一帧中视觉特征的空间变异性，并同时进行成对的时间对齐和路径比较。这消除了时间的可变性，并有助于为视觉语音分类提供一个健壮的度量。我们在OuluVS数据库上评估了这个想法，由于时间对齐，排名1的最近邻分类率从32%提高到57%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Rate-invariant comparisons of covariance paths for visual speech recognition

An important problem in speech, and generally activity, recognition is to develop analyses that are invariant to the execution rates. We introduce a theoretical framework that provides a parametrization-invariant metric for comparing parametrized paths on Riemannian manifolds. Treating instances of activities as parametrized paths on a Riemannian manifold of covariance matrices, we apply this framework to the problem of visual speech recognition from image sequences. We represent each sequence as a path on the space of covariance matrices, each covariance matrix capturing spatial variability of visual features in a frame, and perform simultaneous pairwise temporal alignment and comparison of paths. This removes the temporal variability and helps provide a robust metric for visual speech classification. We evaluated this idea on the OuluVS database and the rank-1 nearest neighbor classification rate improves from 32% to 57% due to temporal alignment.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG)

自引率

0.00%

发文量