Difference in speech analysis results by compression

Y. Omiya, Naoki Hagiwara, Takeshi Takano, Shuji Shinohara, M. Nakamura, Masakazu Higuchi, S. Mitsuyoshi, S. Tokuno
2017 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), November 2017. DOI: 10.1109/ICIIBMS.2017.8279713
Cited by: 3

Abstract

Mental health disorders have become a problem in many developed countries, and screening technologies that help detect depression and stress are being sought to cope with them. In a previous study, the authors investigated estimating health status from the voice and developed MIMOSYS (Mind Monitoring System). Recorded voice may be compressed for efficient transmission or storage, so the sound quality degradation introduced by coding could affect the results of MIMOSYS analysis. Sound quality degradation due to audio compression is usually assessed with general signal quality measures such as peak signal-to-noise ratio (PSNR) and mean opinion score (MOS). However, the impact on health indicators derived from voice features must be evaluated separately. The purpose of this study is to verify how voice quality degradation caused by compression affects voice-based health state evaluation. In the experiment, we used recordings of 979 subjects reading 17 fixed phrases and applied AAC, MP3, and WMA coding to simulate compression at recording and archiving time. The average waveform PSNR between the original WAV-format files and the AAC-, MP3-, and WMA-coded files was 29.58 dB, 55.96 dB, and 29.58 dB, respectively. The audio before and after compression was analyzed, and the estimated degree of health was compared by correlation analysis. The results show a strong correlation between the results obtained before and after compression, suggesting that compressed audio may be usable for voice-based health state evaluation.
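The two measurements the abstract relies on, waveform PSNR between original and decoded audio and the correlation of health results before and after compression, can be sketched in plain Python. This is a minimal illustration, not the authors' MIMOSYS pipeline. It assumes 16-bit PCM samples (peak amplitude 32767), waveforms that are already time-aligned and of equal length, and Pearson's r as the correlation measure; the function names `psnr_db` and `pearson_r` and all sample values are hypothetical.

```python
import math
from typing import Sequence


def psnr_db(original: Sequence[float], decoded: Sequence[float], peak: float = 32767.0) -> float:
    """PSNR in dB between two time-aligned waveforms of equal length.

    Assumption: 16-bit PCM samples, so the peak amplitude defaults to 32767.
    """
    if len(original) != len(decoded) or not original:
        raise ValueError("waveforms must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: no coding error
    return 10.0 * math.log10(peak ** 2 / mse)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical 16-bit samples; real use would decode the AAC/MP3/WMA file
    # back to PCM and trim encoder delay/padding so the samples line up.
    original = [0.0, 1000.0, -1000.0, 500.0]
    decoded = [10.0, 990.0, -1005.0, 505.0]
    print(f"PSNR: {psnr_db(original, decoded):.2f} dB")

    # Hypothetical per-subject health scores before and after compression.
    before = [0.42, 0.55, 0.61, 0.38, 0.70]
    after = [0.40, 0.57, 0.60, 0.39, 0.68]
    print(f"Pearson r: {pearson_r(before, after):.3f}")
```

A higher PSNR means the decoded waveform is closer to the original, while an r near 1 means the per-subject results barely change under compression; the two are reported separately because a codec can lower PSNR without disturbing the features a voice-based indicator depends on.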