Scott Novotney, D. Karakos, J. Silovský, R. Schwartz
BBN technologies' OpenSAD system
2016 IEEE Spoken Language Technology Workshop (SLT), published 2016-12-01
DOI: 10.1109/SLT.2016.7846238 (https://doi.org/10.1109/SLT.2016.7846238)
Citation count: 2
Abstract
We describe our submission to the NIST OpenSAD evaluation of speech activity detection (SAD) on noisy audio generated by the DARPA RATS program. Because this audio contains frequent transmission degradation, channel interference, and other added noise, simple energy thresholds perform poorly at SAD. The evaluation measured performance on both in-training and novel channels. Our approach combined feed-forward neural networks with bidirectional LSTM recurrent neural networks. System combination and unsupervised adaptation provided further gains on novel channels that lack training data. Together, these improvements led to a 26% relative improvement on novel channels over simple decoding. Our system achieved the lowest error rate on the in-training channels and the second lowest on the out-of-training channels.
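For readers unfamiliar with the baseline the abstract mentions, the following is a minimal sketch of an energy-threshold speech activity detector. It is not the authors' system; the function names, the 160-sample frame length, and the 20 dB margin are illustrative assumptions that do not come from the paper.

```python
import math


def frame_energies_db(samples, frame_len):
    """Return the log energy (in dB) of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on all-zero frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_sad(samples, frame_len=160, margin_db=20.0):
    """Mark a frame as speech if its energy is within margin_db of the
    loudest frame. frame_len and margin_db are assumed values."""
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    threshold = max(energies) - margin_db
    return [e >= threshold for e in energies]


# Toy signal: two quiet frames, two loud frames, two quiet frames.
quiet = [0.001 * math.sin(0.3 * n) for n in range(320)]
loud = [0.5 * math.sin(0.3 * n) for n in range(320)]
labels = energy_sad(quiet + loud + quiet)
print(labels)
```

A detector of this kind labels any sufficiently loud frame as speech, so channel interference or a noise burst would be marked as speech as well, which is consistent with the abstract's remark that energy thresholds perform poorly on this audio.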