基于长期时间变化的紧急语音邮件检测

Hosana Kamiyama, Atsushi Ando, Ryo Masumura, Satoshi Kobashikawa, Y. Aono
{"title":"基于长期时间变化的紧急语音邮件检测","authors":"Hosana Kamiyama, Atsushi Ando, Ryo Masumura, Satoshi Kobashikawa, Y. Aono","doi":"10.1109/APSIPAASC47483.2019.9023034","DOIUrl":null,"url":null,"abstract":"This paper proposes a effective urgent speech detection for voicemails focused on speech rhythm. Previous techniques use short-term features with millisecond scale (such as fundamental frequency, loudness and spectral features), and conventional techniques for urgent speech detection use also features obtained from entire speech (such as average speech rate). However, the features obtained from entire speech are too over-smoothed to explain the difference between urgent and nonurgent speech. We found that there was a difference between urgent and non-urgent speech in temporal variability related to speech rhythm. To handle the temporal variability of speech rhythm, the proposal extracts long-term temporal features. The long-term temporal features are envelope modulation spectrum and temporal statistics of Mel-frequency cepstrum coefficient with 1 sec scale. To use both features with different time scales, the proposed method integrates the long-term temporal features and the short-term features on neural networks. Our proposal yields better accuracy than the conventional methods (which uses e features obtained from entire speech); it achieves a 50.0% reduction in the error rate.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Urgent Voicemail Detection Focused on Long-term Temporal Variation\",\"authors\":\"Hosana Kamiyama, Atsushi Ando, Ryo Masumura, Satoshi Kobashikawa, Y. Aono\",\"doi\":\"10.1109/APSIPAASC47483.2019.9023034\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a effective urgent speech detection for voicemails focused on speech rhythm. Previous techniques use short-term features with millisecond scale (such as fundamental frequency, loudness and spectral features), and conventional techniques for urgent speech detection use also features obtained from entire speech (such as average speech rate). However, the features obtained from entire speech are too over-smoothed to explain the difference between urgent and nonurgent speech. We found that there was a difference between urgent and non-urgent speech in temporal variability related to speech rhythm. To handle the temporal variability of speech rhythm, the proposal extracts long-term temporal features. The long-term temporal features are envelope modulation spectrum and temporal statistics of Mel-frequency cepstrum coefficient with 1 sec scale. To use both features with different time scales, the proposed method integrates the long-term temporal features and the short-term features on neural networks. Our proposal yields better accuracy than the conventional methods (which uses e features obtained from entire speech); it achieves a 50.0% reduction in the error rate.\",\"PeriodicalId\":145222,\"journal\":{\"name\":\"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/APSIPAASC47483.2019.9023034\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSIPAASC47483.2019.9023034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

本文提出了一种基于语音节奏的语音紧急检测方法。以前的技术使用毫秒尺度的短期特征(如基频、响度和频谱特征),而传统的紧急语音检测技术也使用从整个语音中获得的特征(如平均语音率)。然而,从整个语音中获得的特征过于平滑,无法解释紧急和非紧急语音之间的区别。我们发现紧急和非紧急言语在与言语节奏相关的时间变异性上存在差异。为了处理语音节奏的时间变异性,该方法提取了语音节奏的长期时间特征。长期的时间特征是包络调制频谱和mel频率倒谱系数的时间统计量。为了同时使用不同时间尺度的特征,该方法将神经网络上的长期时间特征和短期特征相结合。我们的建议比传统方法(使用从整个语音中获得的e个特征)产生更好的准确性;它使错误率降低了50.0%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Urgent Voicemail Detection Focused on Long-term Temporal Variation
This paper proposes a effective urgent speech detection for voicemails focused on speech rhythm. Previous techniques use short-term features with millisecond scale (such as fundamental frequency, loudness and spectral features), and conventional techniques for urgent speech detection use also features obtained from entire speech (such as average speech rate). However, the features obtained from entire speech are too over-smoothed to explain the difference between urgent and nonurgent speech. We found that there was a difference between urgent and non-urgent speech in temporal variability related to speech rhythm. To handle the temporal variability of speech rhythm, the proposal extracts long-term temporal features. The long-term temporal features are envelope modulation spectrum and temporal statistics of Mel-frequency cepstrum coefficient with 1 sec scale. To use both features with different time scales, the proposed method integrates the long-term temporal features and the short-term features on neural networks. Our proposal yields better accuracy than the conventional methods (which uses e features obtained from entire speech); it achieves a 50.0% reduction in the error rate.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信