Evaluating target utterance identification method using practical free conversation

Naoto Kosaka, Yumi Wakita
Published in: 2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), 2020-09-26
DOI: 10.1109/IICAIET49801.2020.9257852

Abstract

We are developing a conversation support system for public community spaces. Our premise is that supporting elderly people's active lives by assisting human-to-human conversation is more effective than providing a spoken dialogue system. To use a conversation support system in an actual restaurant or lounge, the conversation of the target speakers near the microphone must be separated from the ambient noise. We previously proposed a method that identifies whether an utterance was spoken near the microphone or far from it, using the standard deviation of the fundamental frequency (SD-F0) and the standard deviation of the speech power level (SD-SP) of each utterance. In this paper, we evaluate the effectiveness of this identification method on actual free conversation using a Support Vector Machine (SVM). The precision for utterances near the microphone is 87.8%. This suggests that identification based on the standard deviations of the fundamental frequency and speech power can be effective even in real environments. However, performance depends on the utterance length, the stability of the F0 values in the part of the utterance above the threshold, and the position of the microphones. Future evaluation should use more speakers and a wider variety of situations to define a suitable system specification.
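The two per-utterance features named in the abstract, and the precision figure it reports, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the convention that an F0 of 0.0 marks an unvoiced frame, the exclusion of unvoiced frames, and the use of the population standard deviation are all assumptions made for illustration, since the abstract does not specify them. The SVM training step itself is omitted. The resulting (SD-F0, SD-SP) pairs would serve as the two-dimensional input to whichever SVM implementation is used.

```python
from statistics import pstdev


def utterance_features(f0_hz, power_db):
    """Return (SD-F0, SD-SP) for one utterance.

    f0_hz:    per-frame F0 estimates in Hz; 0.0 marks an unvoiced frame
              (assumed convention, not stated in the abstract).
    power_db: per-frame speech power level in dB.
    """
    # Assumption: only voiced frames contribute to the F0 statistics.
    voiced = [f for f in f0_hz if f > 0.0]
    # Assumption: population standard deviation; fall back to 0.0
    # when there are too few frames to measure any spread.
    sd_f0 = pstdev(voiced) if len(voiced) >= 2 else 0.0
    sd_sp = pstdev(power_db) if len(power_db) >= 2 else 0.0
    return sd_f0, sd_sp


def precision(y_true, y_pred, positive="near"):
    """Fraction of utterances predicted as `positive` that truly are."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical frame-level values for a single utterance.
example_f0 = [0.0, 100.0, 120.0, 110.0, 130.0, 0.0]
example_power = [60.0, 62.0, 64.0, 62.0, 60.0, 64.0]
example_features = utterance_features(example_f0, example_power)
```

Under this reading, the reported 87.8% corresponds to `precision(...)` with the near-microphone class as the positive label: of the utterances the classifier marks as near the microphone, that fraction were actually spoken near it.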