An Integrated Usage of Bidirectional LSTM and Computer-based Cognitive Attention to Categorize Speech Stutters
Krishna Basak, Vineet Sharma, Sarangh Ramesh Kv, Nilamadhab Mishra
2023 Third International Conference on Artificial Intelligence and Smart Energy (ICAIS), 2023-02-02
DOI: 10.1109/ICAIS56108.2023.10073818 (https://doi.org/10.1109/ICAIS56108.2023.10073818)
Stuttering is a neurodevelopmental speech disorder in which the flow of speech is disrupted by involuntary pauses and repetitions of sounds, impairing the ability to speak fluently. Stuttering can be treated by identifying the type of stutter and providing appropriate speech guidance. Many approaches, including Deep Learning models, have been proposed for computer-aided classification of stuttered speech. However, most of them rely heavily on a large number of manually extracted audio features. In addition, many earlier works use the UCLASS dataset, which is considerably older and of lower quality. This paper proposes a Deep Learning model that uses a Bidirectional LSTM with Attention to classify five types of stuttering events (Block, Prolongation, Word Repetition, Sound Repetition and Interjection) using only Mel-spectrogram audio features. The model is trained and tested on SEP-28k and the latest annotations of the FluencyBank dataset, and achieves an overall accuracy of 75%.
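The abstract does not give architectural details, so the following is only a minimal sketch of the general idea: a bidirectional LSTM reads a sequence of Mel-spectrogram frames, an attention layer pools the per-frame states into one vector, and a linear layer scores the five event types. Everything beyond that outline is an assumption made for illustration: the dot-product attention with a single query vector, the softmax output, the omission of bias terms in the LSTM, the layer sizes, the random (untrained) weights, and the random frames standing in for a real Mel-spectrogram. All function and parameter names are hypothetical.

```python
import math
import random

LABELS = ["Block", "Prolongation", "Word Repetition", "Sound Repetition", "Interjection"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def lstm_pass(frames, W, hidden):
    """Run one LSTM direction. W maps [x; h] to the four stacked gates.
    Bias terms are omitted to keep the sketch short (assumption)."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    outputs = []
    for x in frames:
        z = matvec(W, x + h)  # pre-activations: input, forget, candidate, output
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        g = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]
        o = [sigmoid(v) for v in z[3 * hidden:]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs.append(h)
    return outputs

def bilstm(frames, W_fwd, W_bwd, hidden):
    """Concatenate forward and time-aligned backward states for each frame."""
    fwd = lstm_pass(frames, W_fwd, hidden)
    bwd = lstm_pass(frames[::-1], W_bwd, hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

def attention_pool(states, query):
    """Dot-product attention with one query vector (assumed form)."""
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
    return context, weights

def init_params(rng, n_mels, hidden):
    return {
        "hidden": hidden,
        "W_fwd": rand_matrix(rng, 4 * hidden, n_mels + hidden),
        "W_bwd": rand_matrix(rng, 4 * hidden, n_mels + hidden),
        "query": [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)],
        "W_out": rand_matrix(rng, len(LABELS), 2 * hidden),
        "b_out": [0.0] * len(LABELS),
    }

def predict(frames, params):
    states = bilstm(frames, params["W_fwd"], params["W_bwd"], params["hidden"])
    context, weights = attention_pool(states, params["query"])
    logits = [l + b for l, b in zip(matvec(params["W_out"], context), params["b_out"])]
    return softmax(logits), weights

rng = random.Random(0)
n_mels, hidden, n_frames = 8, 4, 12
# Random values stand in for a real Mel-spectrogram of shape (n_frames, n_mels).
frames = [[rng.random() for _ in range(n_mels)] for _ in range(n_frames)]
params = init_params(rng, n_mels, hidden)
probs, weights = predict(frames, params)
print(LABELS[probs.index(max(probs))])
```

Because the weights are random, the printed label carries no meaning; the point is the data flow. The attention weights sum to one over the frames, so they can be read as how much each frame contributes to the clip-level decision. In practice a framework implementation with trained parameters would replace these hand-written loops.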