A Robust Pipeline based Deep Learning Approach to Detect Speech Attribution
Shreya Chakravarty, R. Khandelwal
2023 IEEE 8th International Conference for Convergence in Technology (I2CT), 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126219
Today's "thinking machines" bring the benefit of reducing human effort alongside the drawback of being easily misused. Among the many applications of automation, speech recognition is one of the most popular: automated systems can now be controlled by voice commands and can respond in human-like ways, in both appearance and communication media such as speech. Audio, however, is not always captured in ideal surroundings. This increases the likelihood that human-system interaction will involve audio aberrations, raising serious forensic concerns about the authenticity and source of a given recording that remain a challenge to resolve. This paper presents thorough augmentation of audio data as part of a robust pipeline that removes anomalies from audio. We propose analysing the spectrogram representation of an audio signal to determine a mask that separates noise from the pure signal, yielding a signal suitable for speech recognition, and we further build a deep neural network on this pipeline that achieves an accuracy of 95.87%.
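The abstract does not describe how the paper's mask is actually derived, so the sketch below only illustrates the general idea of spectrogram masking: estimate a per-frequency noise floor from leading frames assumed to be noise-only, keep only the time-frequency bins that rise well above it, and resynthesise by overlap-add. The function name, parameters, and the fixed-threshold rule are assumptions for illustration, not the authors' method.

```python
import numpy as np

def spectral_mask_denoise(signal, frame_len=512, hop=256,
                          noise_frames=10, threshold=2.0):
    """Illustrative spectrogram masking: estimate a per-bin noise floor
    from the first `noise_frames` frames (assumed noise-only), keep only
    bins whose magnitude exceeds `threshold` times that floor, and
    resynthesise the waveform by windowed overlap-add."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop

    # Short-time Fourier transform: one row per windowed frame
    stft = np.array([
        np.fft.rfft(window * signal[i * hop : i * hop + frame_len])
        for i in range(n_frames)
    ])
    mag, phase = np.abs(stft), np.angle(stft)

    # Per-frequency-bin noise floor from the leading (noise-only) frames
    noise_floor = mag[:noise_frames].mean(axis=0)

    # Binary mask: True where a bin rises well above the noise floor
    mask = mag > threshold * noise_floor
    cleaned = mag * mask * np.exp(1j * phase)

    # Overlap-add inverse STFT with window-power normalisation
    out = np.zeros(n_frames * hop + frame_len)
    wsum = np.zeros_like(out)
    for i, frame in enumerate(cleaned):
        sl = slice(i * hop, i * hop + frame_len)
        out[sl] += np.fft.irfft(frame, frame_len) * window
        wsum[sl] += window ** 2
    return out[: len(signal)] / np.maximum(wsum[: len(signal)], 1e-8)
```

In the paper the denoised signal then feeds a deep neural network; this sketch stops at the masking step and uses a hand-set threshold purely to make the mask concept concrete.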