{"title":"An Efficient Method to Recognize and Separate Patient’s Audio from Recorded Data","authors":"Arjita Choubey, M. Pandey, Ashwani Kumar Dubey","doi":"10.1109/AIST55798.2022.10065116","DOIUrl":null,"url":null,"abstract":"Separation of two voices along with silences and noise is one of the important parts of audio data pre-processing. This pre-processing increases the accuracy of any function. Removal of silence and unwanted voice is especially important in case of health care where doctors’ voice is not required. The proposed Patient’s Audio Recognition and Segmentation Model (PARSM) elaborates an end-to-end methodology for removing silence as well as voice of the virtual interviewer from DIAC-WOZ dataset. This model not only ensures creation of new audio file but also checks for eligibility of audio for being segmentable on the basis of close proximity of voices. In the dataset the volume levels of voice of interviewer and a patient is distinguishable. This fact is utilized in the model as it uses Short Time Energy as a feature. The binary classification is done using Support Vector Machine (SVM). After the calculation of STE, the signal is classified as either low energy or high energy signals. High energy signals, which depict voice of the patient, are then concatenated together to get desired output audio signal. Also, the weight factor can also be varied for each audio manually depending upon the requirement of strictness of segmentation for each audio.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"657 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AIST55798.2022.10065116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Separating two voices, along with silences and noise, is an important part of audio data pre-processing, and it improves the accuracy of any downstream task. Removing silence and unwanted voices is especially important in health care, where the doctor's voice is not required. The proposed Patient's Audio Recognition and Segmentation Model (PARSM) is an end-to-end methodology for removing both silence and the voice of the virtual interviewer from the DAIC-WOZ dataset. The model not only creates a new audio file but also checks whether an audio file is eligible for segmentation, based on how close in level the two voices are. In this dataset, the volume levels of the interviewer's and the patient's voices are distinguishable, and the model exploits this fact by using Short-Time Energy (STE) as its feature. Binary classification is performed with a Support Vector Machine (SVM): after the STE is computed, each segment of the signal is classified as either low energy or high energy. The high-energy segments, which correspond to the patient's voice, are then concatenated to produce the desired output audio signal. In addition, a weight factor can be varied manually for each audio file, depending on how strict the segmentation needs to be.
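The sketch below illustrates the kind of pipeline the abstract describes: frame the recording, compute Short-Time Energy per frame, classify frames as low or high energy with a linear SVM, and concatenate the high-energy (patient) frames. It is a minimal illustration, not the paper's implementation; the frame size, the toy two-speaker signal, and the hand-labelled training frames are all assumptions made for the example.

```python
# Hypothetical sketch of an STE + SVM segmentation pipeline (not the
# paper's code). Frame length, toy signal, and labels are assumptions.
import numpy as np
from sklearn.svm import SVC

FRAME_LEN = 400  # 25 ms at 16 kHz; non-overlapping frames for simplicity


def frame_signal(signal, frame_len=FRAME_LEN):
    """Split the signal into non-overlapping frames, dropping the tail."""
    n = len(signal) // frame_len
    return signal[: n * frame_len].reshape(n, frame_len)


def short_time_energy(frames):
    """STE per frame: sum of squared sample amplitudes."""
    return np.sum(frames.astype(np.float64) ** 2, axis=1)


# Toy 16 kHz signal: one quiet second (interviewer-like) followed by one
# loud second (patient-like), mimicking the distinguishable volume levels.
rng = np.random.default_rng(0)
audio = np.concatenate([0.01 * rng.standard_normal(16000),
                        0.5 * rng.standard_normal(16000)]).astype(np.float32)

frames = frame_signal(audio)
ste = short_time_energy(frames).reshape(-1, 1)  # one STE feature per frame

# Hypothetical annotation: pretend a few frames were labelled by hand
# (0 = low energy / interviewer or silence, 1 = high energy / patient).
train_idx = np.r_[0:5, 75:80]
labels = np.r_[np.zeros(5), np.ones(5)]
clf = SVC(kernel="linear").fit(ste[train_idx], labels)

# Classify every frame, then keep and concatenate the high-energy ones.
is_patient = clf.predict(ste).astype(bool)
patient_audio = frames[is_patient].reshape(-1)
print(f"kept {is_patient.sum()} of {len(frames)} frames "
      f"({len(patient_audio)} samples)")
```

The manually tunable weight factor mentioned in the abstract has no direct analogue in this sketch; with a plain energy threshold in place of the SVM, it would simply scale the decision boundary to make the segmentation stricter or looser.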