An Efficient Method to Recognize and Separate Patient’s Audio from Recorded Data

Arjita Choubey, M. Pandey, Ashwani Kumar Dubey
{"title":"An Efficient Method to Recognize and Separate Patient’s Audio from Recorded Data","authors":"Arjita Choubey, M. Pandey, Ashwani Kumar Dubey","doi":"10.1109/AIST55798.2022.10065116","DOIUrl":null,"url":null,"abstract":"Separation of two voices along with silences and noise is one of the important parts of audio data pre-processing. This pre-processing increases the accuracy of any function. Removal of silence and unwanted voice is especially important in case of health care where doctors’ voice is not required. The proposed Patient’s Audio Recognition and Segmentation Model (PARSM) elaborates an end-to-end methodology for removing silence as well as voice of the virtual interviewer from DIAC-WOZ dataset. This model not only ensures creation of new audio file but also checks for eligibility of audio for being segmentable on the basis of close proximity of voices. In the dataset the volume levels of voice of interviewer and a patient is distinguishable. This fact is utilized in the model as it uses Short Time Energy as a feature. The binary classification is done using Support Vector Machine (SVM). After the calculation of STE, the signal is classified as either low energy or high energy signals. High energy signals, which depict voice of the patient, are then concatenated together to get desired output audio signal. Also, the weight factor can also be varied for each audio manually depending upon the requirement of strictness of segmentation for each audio.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"657 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AIST55798.2022.10065116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Separating two voices, along with silences and noise, is an important part of audio data pre-processing, and it improves the accuracy of any downstream task. Removing silence and unwanted voices is especially important in health care, where the clinician's voice is not required. The proposed Patient's Audio Recognition and Segmentation Model (PARSM) is an end-to-end methodology for removing both silence and the voice of the virtual interviewer from the DIAC-WOZ dataset. The model not only produces a new audio file but also checks whether a recording is eligible for segmentation, based on how close the two voices are in level. In this dataset the volume levels of the interviewer's and the patient's voices are distinguishable, and the model exploits this by using Short Time Energy (STE) as its feature. Binary classification is performed with a Support Vector Machine (SVM): after the STE is computed, each segment of the signal is classified as either low energy or high energy. The high-energy segments, which correspond to the patient's voice, are then concatenated to produce the desired output audio signal. A weight factor can also be adjusted manually for each recording, depending on how strict the segmentation needs to be.
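The pipeline described in the abstract (frame the recording, compute the Short Time Energy of each frame, classify frames with an SVM, and concatenate the high-energy frames) can be sketched roughly as follows. The frame length, hop size, file names, and the small set of labelled training frames are assumptions made for illustration, not values taken from the paper.

```python
# Minimal sketch of STE-based patient/interviewer separation, assuming
# frame sizes, file names, and labelled training frames not given in the paper.
import numpy as np
import soundfile as sf
from sklearn.svm import SVC

FRAME_LEN = 1024      # samples per frame (assumed)
HOP = 512             # hop between successive frames (assumed)

def short_time_energy(signal, frame_len=FRAME_LEN, hop=HOP):
    """Short Time Energy of each frame: the sum of squared samples."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sum(f.astype(np.float64) ** 2) for f in frames])

# Load a recorded session (hypothetical file name).
audio, sr = sf.read("session_301.wav")
if audio.ndim > 1:                        # mix down to mono if needed
    audio = audio.mean(axis=1)

ste = short_time_energy(audio).reshape(-1, 1)

# The paper classifies low-energy frames (silence / distant interviewer)
# versus high-energy frames (patient) with an SVM. A handful of manually
# labelled frame indices stand in for the training data here, purely as an
# assumption for this sketch.
labelled_idx = np.array([0, 1, 2, 50, 51, 52])
labels = np.array([0, 0, 0, 1, 1, 1])     # 0 = low energy, 1 = high energy
clf = SVC(kernel="linear")
clf.fit(ste[labelled_idx], labels)

frame_class = clf.predict(ste)

# Concatenate only the high-energy (patient) frames into the output file.
keep = [audio[i * HOP:i * HOP + FRAME_LEN]
        for i, c in enumerate(frame_class) if c == 1]
if keep:
    sf.write("session_301_patient_only.wav", np.concatenate(keep), sr)
```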