Ning Gao, Gregory Sell, Douglas W. Oard, Mark Dredze
Leveraging side information for speaker identification with the Enron conversational telephone speech collection
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), published 2017-12-01
DOI: https://doi.org/10.1109/ASRU.2017.8268988
Citations: 4
Abstract
Speaker identification experiments typically focus on acoustic signals, but conversational speech often occurs in settings where additional useful side information may be available. This paper introduces a new distributable speaker identification test collection based on recorded telephone calls of Enron energy traders. Experiments with these recordings demonstrate that social network features and recording channel metadata can be used to reduce error rates in speaker identification below those achieved using acoustic evidence alone. Social network features from the parallel Enron email collection (37 of the 41 speakers in the telephone recordings sent or received emails in the collection) improve speaker identification, as do social network features computed using lightly supervised techniques to estimate a social network from more than one thousand unlabeled recordings.