CO-LDA: A Semi-supervised Approach to Audio-Visual Person Recognition

Xuran Zhao, N. Evans, J. Dugelay
{"title":"CO-LDA:一种半监督的视听人物识别方法","authors":"Xuran Zhao, N. Evans, J. Dugelay","doi":"10.1109/ICME.2012.14","DOIUrl":null,"url":null,"abstract":"Client models used in Automatic Speaker Recognition (ASR) and Automatic Face Recognition (AFR) are usually trained with labelled data acquired in a small number of menthol sessions. The amount of training data is rarely sufficient to reliably represent the variation which occurs later during testing. Larger quantities of client-specific training data can always be obtained, but manual collection and labelling is often cost-prohibitive. Co-training, a paradigm of semi-supervised machine learning, which can exploit unlabelled data to enhance weakly learned client models. In this paper, we propose a co-LDA algorithm which uses both labelled and unlabelled data to capture greater intersession variation and to learn discriminative subspaces in which test examples can be more accurately classified. The proposed algorithm is naturally suited to audio-visual person recognition because vocal and visual biometric features intrinsically satisfy the assumptions of feature sufficiency and independency which guarantee the effectiveness of co-training. When tested on the MOBIO database, the proposed co-training system raises a baseline identification rate from 71% to 99% while in a verification task the Equal Error Rate (EER) is reduced from 18% to about 1%. To our knowledge, this is the first successful application of co-training in audio-visual biometric systems.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"CO-LDA: A Semi-supervised Approach to Audio-Visual Person Recognition\",\"authors\":\"Xuran Zhao, N. Evans, J. Dugelay\",\"doi\":\"10.1109/ICME.2012.14\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Client models used in Automatic Speaker Recognition (ASR) and Automatic Face Recognition (AFR) are usually trained with labelled data acquired in a small number of menthol sessions. The amount of training data is rarely sufficient to reliably represent the variation which occurs later during testing. Larger quantities of client-specific training data can always be obtained, but manual collection and labelling is often cost-prohibitive. Co-training, a paradigm of semi-supervised machine learning, which can exploit unlabelled data to enhance weakly learned client models. In this paper, we propose a co-LDA algorithm which uses both labelled and unlabelled data to capture greater intersession variation and to learn discriminative subspaces in which test examples can be more accurately classified. The proposed algorithm is naturally suited to audio-visual person recognition because vocal and visual biometric features intrinsically satisfy the assumptions of feature sufficiency and independency which guarantee the effectiveness of co-training. When tested on the MOBIO database, the proposed co-training system raises a baseline identification rate from 71% to 99% while in a verification task the Equal Error Rate (EER) is reduced from 18% to about 1%. 
To our knowledge, this is the first successful application of co-training in audio-visual biometric systems.\",\"PeriodicalId\":273567,\"journal\":{\"name\":\"2012 IEEE International Conference on Multimedia and Expo\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-07-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE International Conference on Multimedia and Expo\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICME.2012.14\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE International Conference on Multimedia and Expo","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2012.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7

Abstract

Client models used in Automatic Speaker Recognition (ASR) and Automatic Face Recognition (AFR) are usually trained with labelled data acquired in a small number of enrolment sessions. The amount of training data is rarely sufficient to reliably represent the variation which occurs later during testing. Larger quantities of client-specific training data can always be obtained, but manual collection and labelling is often cost-prohibitive. Co-training is a paradigm of semi-supervised machine learning which can exploit unlabelled data to enhance weakly learned client models. In this paper, we propose a co-LDA algorithm which uses both labelled and unlabelled data to capture greater inter-session variation and to learn discriminative subspaces in which test examples can be more accurately classified. The proposed algorithm is naturally suited to audio-visual person recognition because vocal and visual biometric features intrinsically satisfy the assumptions of feature sufficiency and independence which guarantee the effectiveness of co-training. When tested on the MOBIO database, the proposed co-training system raises the baseline identification rate from 71% to 99%, while in a verification task the Equal Error Rate (EER) is reduced from 18% to about 1%. To our knowledge, this is the first successful application of co-training in audio-visual biometric systems.
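
To make the co-training mechanism concrete, the following is a minimal sketch of a two-view co-training loop built on scikit-learn's LinearDiscriminantAnalysis, in the spirit of the co-LDA algorithm described in the abstract. It is not the paper's implementation: the audio and visual features are synthetic stand-ins, and the confidence threshold, number of rounds, and verification scoring scheme are illustrative assumptions.

# Minimal two-view co-training sketch with LDA classifiers. Everything here
# is synthetic and illustrative: feature dimensions, confidence threshold,
# round count and scoring are assumptions, not the paper's configuration.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_clients, per_client, d_audio, d_visual = 5, 4, 20, 30

# Synthetic client "templates" for the two views (stand-ins for vocal and
# visual biometric feature vectors extracted from the same sessions).
mu_a = rng.normal(scale=3.0, size=(n_clients, d_audio))
mu_v = rng.normal(scale=3.0, size=(n_clients, d_visual))

def sample(mu, n):
    """Draw n noisy examples per client around each template; returns (X, y)."""
    X = np.vstack([m + rng.normal(size=(n, mu.shape[1])) for m in mu])
    y = np.repeat(np.arange(len(mu)), n)
    return X, y

# Small labelled enrolment set and a larger unlabelled pool; row i of the
# audio and visual matrices describes the same (synthetic) session.
Xa_l, y_l = sample(mu_a, per_client)
Xv_l, _ = sample(mu_v, per_client)
Xa_u, _ = sample(mu_a, 40)
Xv_u, _ = sample(mu_v, 40)

for _ in range(3):  # illustrative number of co-training rounds
    lda_a = LinearDiscriminantAnalysis().fit(Xa_l, y_l)
    lda_v = LinearDiscriminantAnalysis().fit(Xv_l, y_l)
    if len(Xa_u) == 0:
        break
    # Each view labels the shared pool; a session is promoted when either
    # view is confident, and the more confident view supplies the label.
    pa = lda_a.predict_proba(Xa_u).max(axis=1)
    pv = lda_v.predict_proba(Xv_u).max(axis=1)
    keep = np.maximum(pa, pv) > 0.9  # illustrative confidence threshold
    if not keep.any():
        break
    pseudo = np.where(pa >= pv, lda_a.predict(Xa_u), lda_v.predict(Xv_u))
    Xa_l = np.vstack([Xa_l, Xa_u[keep]])
    Xv_l = np.vstack([Xv_l, Xv_u[keep]])
    y_l = np.concatenate([y_l, pseudo[keep]])
    Xa_u, Xv_u = Xa_u[~keep], Xv_u[~keep]

# Identification: classify held-out audio examples with the enlarged model.
Xa_t, y_t = sample(mu_a, 10)
lda_a = LinearDiscriminantAnalysis().fit(Xa_l, y_l)
print(f"identification rate: {(lda_a.predict(Xa_t) == y_t).mean():.2%}")

def eer(genuine, impostor):
    """Equal error rate: operating point where false-accept == false-reject."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

# Verification: score a claimed identity by its class posterior; impostor
# trials claim a (deliberately wrong) neighbouring identity.
probs = lda_a.predict_proba(Xa_t)
genuine = probs[np.arange(len(y_t)), y_t]
impostor = probs[np.arange(len(y_t)), (y_t + 1) % n_clients]
print(f"verification EER: {eer(genuine, impostor):.2%}")

Each view pseudo-labels the shared unlabelled pool, and confidently labelled sessions are promoted into both views' training sets, mirroring the feature-sufficiency and independence assumptions the abstract relies on; in the paper the labelled set is instead grown with real unlabelled audio-visual recordings before re-estimating the discriminative subspaces.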