{"title":"An Efficient Method to Recognize and Separate Patient’s Audio from Recorded Data","authors":"Arjita Choubey, M. Pandey, Ashwani Kumar Dubey","doi":"10.1109/AIST55798.2022.10065116","DOIUrl":null,"url":null,"abstract":"Separation of two voices along with silences and noise is one of the important parts of audio data pre-processing. This pre-processing increases the accuracy of any function. Removal of silence and unwanted voice is especially important in case of health care where doctors’ voice is not required. The proposed Patient’s Audio Recognition and Segmentation Model (PARSM) elaborates an end-to-end methodology for removing silence as well as voice of the virtual interviewer from DIAC-WOZ dataset. This model not only ensures creation of new audio file but also checks for eligibility of audio for being segmentable on the basis of close proximity of voices. In the dataset the volume levels of voice of interviewer and a patient is distinguishable. This fact is utilized in the model as it uses Short Time Energy as a feature. The binary classification is done using Support Vector Machine (SVM). After the calculation of STE, the signal is classified as either low energy or high energy signals. High energy signals, which depict voice of the patient, are then concatenated together to get desired output audio signal. Also, the weight factor can also be varied for each audio manually depending upon the requirement of strictness of segmentation for each audio.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"657 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AIST55798.2022.10065116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Separating two voices, along with silences and noise, is an important part of audio data pre-processing, and it improves the accuracy of any downstream task. Removing silence and unwanted voices is especially important in health care, where the doctor's voice is not required. The proposed Patient's Audio Recognition and Segmentation Model (PARSM) is an end-to-end methodology for removing both silence and the voice of the virtual interviewer from the DAIC-WOZ dataset. The model not only creates a new audio file but also checks whether an audio file is eligible for segmentation, based on how close in level the two voices are. In this dataset, the volume levels of the interviewer's and the patient's voices are distinguishable, and the model exploits this fact by using Short-Time Energy (STE) as its feature. Binary classification is performed with a Support Vector Machine (SVM): after the STE is computed, each segment of the signal is classified as either low energy or high energy. The high-energy segments, which correspond to the patient's voice, are then concatenated to produce the desired output audio signal. In addition, a weight factor can be varied manually for each audio file, depending on how strict the segmentation needs to be.
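The sketch below illustrates the kind of pipeline the abstract describes: frame the recording, compute Short-Time Energy per frame, classify frames as low or high energy with a linear SVM, and concatenate the high-energy (patient) frames. It is a minimal illustration, not the paper's implementation; the frame size, the toy two-speaker signal, and the hand-labelled training frames are all assumptions made for the example.

```python
# Hypothetical sketch of an STE + SVM segmentation pipeline (not the
# paper's code). Frame length, toy signal, and labels are assumptions.
import numpy as np
from sklearn.svm import SVC

FRAME_LEN = 400  # 25 ms at 16 kHz; non-overlapping frames for simplicity


def frame_signal(signal, frame_len=FRAME_LEN):
    """Split the signal into non-overlapping frames, dropping the tail."""
    n = len(signal) // frame_len
    return signal[: n * frame_len].reshape(n, frame_len)


def short_time_energy(frames):
    """STE per frame: sum of squared sample amplitudes."""
    return np.sum(frames.astype(np.float64) ** 2, axis=1)


# Toy 16 kHz signal: one quiet second (interviewer-like) followed by one
# loud second (patient-like), mimicking the distinguishable volume levels.
rng = np.random.default_rng(0)
audio = np.concatenate([0.01 * rng.standard_normal(16000),
                        0.5 * rng.standard_normal(16000)]).astype(np.float32)

frames = frame_signal(audio)
ste = short_time_energy(frames).reshape(-1, 1)  # one STE feature per frame

# Hypothetical annotation: pretend a few frames were labelled by hand
# (0 = low energy / interviewer or silence, 1 = high energy / patient).
train_idx = np.r_[0:5, 75:80]
labels = np.r_[np.zeros(5), np.ones(5)]
clf = SVC(kernel="linear").fit(ste[train_idx], labels)

# Classify every frame, then keep and concatenate the high-energy ones.
is_patient = clf.predict(ste).astype(bool)
patient_audio = frames[is_patient].reshape(-1)
print(f"kept {is_patient.sum()} of {len(frames)} frames "
      f"({len(patient_audio)} samples)")
```

The manually tunable weight factor mentioned in the abstract has no direct analogue in this sketch; with a plain energy threshold in place of the SVM, it would simply scale the decision boundary to make the segmentation stricter or looser.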