{"title":"弥合语言鸿沟:NLP和语音识别在英语口语教学中的作用","authors":"Parul Dubey , Pushkar Dubey , Rohit Raja , Sapna Singh Kshatri","doi":"10.1016/j.mex.2025.103359","DOIUrl":null,"url":null,"abstract":"<div><div>The Natural Language Processing (NLP) and speech recognition have transformed language learning by providing interactive and real-time feedback, enhancing oral English proficiency. These technologies facilitate personalized and adaptive learning, making pronunciation and fluency improvement more efficient. Traditional methods lack real-time speech assessment and individualized feedback, limiting learners' progress. Existing speech recognition models struggle with diverse accents, variations in speaking styles, and computational efficiency, reducing their effectiveness in real-world applications. This study utilizes three datasets—including a custom dataset of 882 English teachers, the CMU ARCTIC corpus, and LibriSpeech Clean—to ensure generalizability and robustness. The methodology integrates Hidden Markov Models for speech recognition, NLP-based text analysis, and computer vision-based lip movement detection to create an adaptive multimodal learning system. The novelty of this study lies in its real-time Bayesian feedback mechanism and multimodal integration of audio, visual, and textual data, enabling dynamic and personalized oral instruction. Performance is evaluated using recognition accuracy, processing speed, and statistical significance testing. The continuous HMM model achieves up to 97.5 % accuracy and significantly outperforms existing models such as MLP-LSTM and GPT-3.5-turbo (<em>p</em> < 0.05) across all datasets. 
Developed a multimodal system that combines speech, text, and visual data to enhance real-time oral English learning.<ul><li><span>•</span><span><div>Collected and annotated a diverse dataset of English speech recordings from teachers across various accents and speaking styles.</div></span></li><li><span>•</span><span><div>Designed an adaptive feedback framework to provide learners with immediate, personalized insights into their pronunciation and fluency.</div></span></li></ul></div></div>","PeriodicalId":18446,"journal":{"name":"MethodsX","volume":"14 ","pages":"Article 103359"},"PeriodicalIF":1.6000,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bridging language gaps: The role of NLP and speech recognition in oral english instruction\",\"authors\":\"Parul Dubey , Pushkar Dubey , Rohit Raja , Sapna Singh Kshatri\",\"doi\":\"10.1016/j.mex.2025.103359\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The Natural Language Processing (NLP) and speech recognition have transformed language learning by providing interactive and real-time feedback, enhancing oral English proficiency. These technologies facilitate personalized and adaptive learning, making pronunciation and fluency improvement more efficient. Traditional methods lack real-time speech assessment and individualized feedback, limiting learners' progress. Existing speech recognition models struggle with diverse accents, variations in speaking styles, and computational efficiency, reducing their effectiveness in real-world applications. This study utilizes three datasets—including a custom dataset of 882 English teachers, the CMU ARCTIC corpus, and LibriSpeech Clean—to ensure generalizability and robustness. 
The methodology integrates Hidden Markov Models for speech recognition, NLP-based text analysis, and computer vision-based lip movement detection to create an adaptive multimodal learning system. The novelty of this study lies in its real-time Bayesian feedback mechanism and multimodal integration of audio, visual, and textual data, enabling dynamic and personalized oral instruction. Performance is evaluated using recognition accuracy, processing speed, and statistical significance testing. The continuous HMM model achieves up to 97.5 % accuracy and significantly outperforms existing models such as MLP-LSTM and GPT-3.5-turbo (<em>p</em> < 0.05) across all datasets. Developed a multimodal system that combines speech, text, and visual data to enhance real-time oral English learning.<ul><li><span>•</span><span><div>Collected and annotated a diverse dataset of English speech recordings from teachers across various accents and speaking styles.</div></span></li><li><span>•</span><span><div>Designed an adaptive feedback framework to provide learners with immediate, personalized insights into their pronunciation and fluency.</div></span></li></ul></div></div>\",\"PeriodicalId\":18446,\"journal\":{\"name\":\"MethodsX\",\"volume\":\"14 \",\"pages\":\"Article 103359\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-05-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"MethodsX\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2215016125002055\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MULTIDISCIPLINARY 
SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"MethodsX","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2215016125002055","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Bridging language gaps: The role of NLP and speech recognition in oral english instruction
Natural Language Processing (NLP) and speech recognition have transformed language learning by providing interactive, real-time feedback that enhances oral English proficiency. These technologies enable personalized, adaptive learning, making pronunciation and fluency improvement more efficient. Traditional methods lack real-time speech assessment and individualized feedback, limiting learners' progress. Existing speech recognition models struggle with diverse accents, variations in speaking style, and computational efficiency, reducing their effectiveness in real-world applications. This study uses three datasets (a custom dataset of 882 English teachers, the CMU ARCTIC corpus, and LibriSpeech Clean) to ensure generalizability and robustness. The methodology integrates Hidden Markov Models for speech recognition, NLP-based text analysis, and computer-vision-based lip movement detection into an adaptive multimodal learning system. The novelty of this study lies in its real-time Bayesian feedback mechanism and its multimodal integration of audio, visual, and textual data, enabling dynamic, personalized oral instruction. Performance is evaluated using recognition accuracy, processing speed, and statistical significance testing. The continuous HMM model achieves up to 97.5% accuracy and significantly outperforms existing models such as MLP-LSTM and GPT-3.5-turbo (p < 0.05) across all datasets.

• Developed a multimodal system that combines speech, text, and visual data to enhance real-time oral English learning.
• Collected and annotated a diverse dataset of English speech recordings from teachers across various accents and speaking styles.
• Designed an adaptive feedback framework to provide learners with immediate, personalized insights into their pronunciation and fluency.
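The recognition step the abstract attributes to Hidden Markov Models can be illustrated with a minimal Viterbi decoder. This is a toy sketch, not the paper's model: the two hidden states, the discretized acoustic symbols (`"lowF1"`, `"highF1"`), and all probability values below are invented for illustration, and a real continuous-HMM recognizer would use Gaussian emission densities over acoustic features rather than a discrete emission table.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most-likely hidden-state path for an observation sequence (log domain)."""
    # Initialize with the start-state and first-emission log-probabilities.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state by accumulated log-probability.
            best_prev, best_lp = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1],
            )
            V[t][s] = best_lp + math.log(emit_p[s][obs[t]])
            back[t][s] = best_prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Hypothetical two-state model: vowel vs. consonant, observed via a
# coarsely discretized first-formant (F1) symbol per frame.
states = ["vowel", "consonant"]
start_p = {"vowel": 0.5, "consonant": 0.5}
trans_p = {"vowel": {"vowel": 0.6, "consonant": 0.4},
           "consonant": {"vowel": 0.7, "consonant": 0.3}}
emit_p = {"vowel": {"lowF1": 0.2, "highF1": 0.8},
          "consonant": {"lowF1": 0.9, "highF1": 0.1}}

print(viterbi(["highF1", "lowF1", "highF1"], states, start_p, trans_p, emit_p))
# → ['vowel', 'consonant', 'vowel']
```

In a pronunciation-feedback setting, the decoded state path would be compared against the expected phoneme sequence for the target utterance, and mismatches would drive the kind of per-segment feedback the highlights describe.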