Random forest algorithm for improving the performance of speech/non-speech detection

Sincy V. Thambi, K. T. Sreekumar, C. S. Kumar, P. Raj
Published in: 2014 First International Conference on Computational Systems and Communications (ICCSC), 2014-12-01
DOI: 10.1109/COMPSC.2014.7032615 (https://doi.org/10.1109/COMPSC.2014.7032615)
Citations: 14

Abstract

Speech/non-speech detection (SND) distinguishes between speech and non-speech segments in recorded audio and video documents. SND systems can help reduce the storage space required when only the speech segments of an audio document are needed, for example in content analysis or spoken language identification. In this work, we experimented with time domain, frequency domain and cepstral domain features computed over short frames of 20 ms, along with their mean and standard deviation over segments of 200 ms. We then analysed whether selecting a subset of the features can improve the performance of the SND system. To this end, we experimented with different feature selection algorithms and observed that correlation-based feature selection gave the best results. We also experimented with different decision tree classification algorithms and found that the random forest algorithm outperformed the other decision tree algorithms. We further improved the SND system performance by smoothing the decisions over 5 segments of 200 ms each. Our baseline system has 272 features and a classification accuracy of 94.45 %; the final system has 8 features and a classification accuracy of 97.80 %.
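The two windowing steps in the abstract (frame features summarised per 200 ms segment, then decisions smoothed over 5 segments) can be sketched as follows. This is an illustrative sketch and not the authors' implementation. It assumes non-overlapping 20 ms frames (so 10 frames per segment), population standard deviation, binary labels with 1 meaning speech, and majority voting over a centred window as the smoothing rule; the abstract specifies none of these details. The function names `segment_stats` and `smooth_decisions` are hypothetical.

```python
from statistics import mean, pstdev

# Assumption: 20 ms frames do not overlap, so a 200 ms segment holds 10 frames.
FRAMES_PER_SEGMENT = 10


def segment_stats(frame_features, frames_per_segment=FRAMES_PER_SEGMENT):
    """Collapse per-frame feature vectors into one vector per segment.

    Each output vector holds the per-feature means followed by the
    per-feature (population) standard deviations. Trailing frames that
    do not fill a whole segment are dropped.
    """
    usable = len(frame_features) - len(frame_features) % frames_per_segment
    segments = []
    for start in range(0, usable, frames_per_segment):
        block = frame_features[start:start + frames_per_segment]
        columns = list(zip(*block))  # one tuple per feature dimension
        means = [mean(col) for col in columns]
        stds = [pstdev(col) for col in columns]
        segments.append(means + stds)
    return segments


def smooth_decisions(labels, window=5):
    """Majority-vote smoothing of 0/1 segment decisions (1 = speech).

    Assumption: a centred window, truncated at the edges of the sequence.
    On a tie (possible only in a truncated, even-sized window) the
    original decision is kept.
    """
    half = window // 2
    smoothed = []
    for i, label in enumerate(labels):
        neighbourhood = labels[max(0, i - half): i + half + 1]
        speech_votes = sum(neighbourhood)
        if 2 * speech_votes > len(neighbourhood):
            smoothed.append(1)
        elif 2 * speech_votes < len(neighbourhood):
            smoothed.append(0)
        else:
            smoothed.append(label)
    return smoothed
```

In a full pipeline, the vectors returned by `segment_stats` would be passed to the feature selection and classification stages, and `smooth_decisions` would be applied to the classifier's per-segment outputs. With a window of 5, an isolated misclassified segment surrounded by segments of the other class is flipped to agree with its neighbours.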