{"title":"弥合语言鸿沟:NLP和语音识别在英语口语教学中的作用","authors":"Parul Dubey , Pushkar Dubey , Rohit Raja , Sapna Singh Kshatri","doi":"10.1016/j.mex.2025.103359","DOIUrl":null,"url":null,"abstract":"<div><div>The Natural Language Processing (NLP) and speech recognition have transformed language learning by providing interactive and real-time feedback, enhancing oral English proficiency. These technologies facilitate personalized and adaptive learning, making pronunciation and fluency improvement more efficient. Traditional methods lack real-time speech assessment and individualized feedback, limiting learners' progress. Existing speech recognition models struggle with diverse accents, variations in speaking styles, and computational efficiency, reducing their effectiveness in real-world applications. This study utilizes three datasets—including a custom dataset of 882 English teachers, the CMU ARCTIC corpus, and LibriSpeech Clean—to ensure generalizability and robustness. The methodology integrates Hidden Markov Models for speech recognition, NLP-based text analysis, and computer vision-based lip movement detection to create an adaptive multimodal learning system. The novelty of this study lies in its real-time Bayesian feedback mechanism and multimodal integration of audio, visual, and textual data, enabling dynamic and personalized oral instruction. Performance is evaluated using recognition accuracy, processing speed, and statistical significance testing. The continuous HMM model achieves up to 97.5 % accuracy and significantly outperforms existing models such as MLP-LSTM and GPT-3.5-turbo (<em>p</em> < 0.05) across all datasets. 
Developed a multimodal system that combines speech, text, and visual data to enhance real-time oral English learning.<ul><li><span>•</span><span><div>Collected and annotated a diverse dataset of English speech recordings from teachers across various accents and speaking styles.</div></span></li><li><span>•</span><span><div>Designed an adaptive feedback framework to provide learners with immediate, personalized insights into their pronunciation and fluency.</div></span></li></ul></div></div>","PeriodicalId":18446,"journal":{"name":"MethodsX","volume":"14 ","pages":"Article 103359"},"PeriodicalIF":1.6000,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bridging language gaps: The role of NLP and speech recognition in oral english instruction\",\"authors\":\"Parul Dubey , Pushkar Dubey , Rohit Raja , Sapna Singh Kshatri\",\"doi\":\"10.1016/j.mex.2025.103359\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The Natural Language Processing (NLP) and speech recognition have transformed language learning by providing interactive and real-time feedback, enhancing oral English proficiency. These technologies facilitate personalized and adaptive learning, making pronunciation and fluency improvement more efficient. Traditional methods lack real-time speech assessment and individualized feedback, limiting learners' progress. Existing speech recognition models struggle with diverse accents, variations in speaking styles, and computational efficiency, reducing their effectiveness in real-world applications. This study utilizes three datasets—including a custom dataset of 882 English teachers, the CMU ARCTIC corpus, and LibriSpeech Clean—to ensure generalizability and robustness. 
The methodology integrates Hidden Markov Models for speech recognition, NLP-based text analysis, and computer vision-based lip movement detection to create an adaptive multimodal learning system. The novelty of this study lies in its real-time Bayesian feedback mechanism and multimodal integration of audio, visual, and textual data, enabling dynamic and personalized oral instruction. Performance is evaluated using recognition accuracy, processing speed, and statistical significance testing. The continuous HMM model achieves up to 97.5 % accuracy and significantly outperforms existing models such as MLP-LSTM and GPT-3.5-turbo (<em>p</em> < 0.05) across all datasets. Developed a multimodal system that combines speech, text, and visual data to enhance real-time oral English learning.<ul><li><span>•</span><span><div>Collected and annotated a diverse dataset of English speech recordings from teachers across various accents and speaking styles.</div></span></li><li><span>•</span><span><div>Designed an adaptive feedback framework to provide learners with immediate, personalized insights into their pronunciation and fluency.</div></span></li></ul></div></div>\",\"PeriodicalId\":18446,\"journal\":{\"name\":\"MethodsX\",\"volume\":\"14 \",\"pages\":\"Article 103359\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-05-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"MethodsX\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2215016125002055\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MULTIDISCIPLINARY 
SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"MethodsX","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2215016125002055","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Bridging language gaps: The role of NLP and speech recognition in oral english instruction
Natural Language Processing (NLP) and speech recognition have transformed language learning by providing interactive, real-time feedback that enhances oral English proficiency. These technologies enable personalized, adaptive learning, making pronunciation and fluency improvement more efficient. Traditional methods lack real-time speech assessment and individualized feedback, limiting learners' progress. Existing speech recognition models struggle with diverse accents, variations in speaking style, and computational efficiency, reducing their effectiveness in real-world applications. This study uses three datasets (a custom dataset of 882 English teachers, the CMU ARCTIC corpus, and LibriSpeech Clean) to ensure generalizability and robustness. The methodology integrates Hidden Markov Models for speech recognition, NLP-based text analysis, and computer-vision-based lip movement detection into an adaptive multimodal learning system. The novelty of this study lies in its real-time Bayesian feedback mechanism and its multimodal integration of audio, visual, and textual data, enabling dynamic, personalized oral instruction. Performance is evaluated using recognition accuracy, processing speed, and statistical significance testing. The continuous HMM model achieves up to 97.5% accuracy and significantly outperforms existing models such as MLP-LSTM and GPT-3.5-turbo (p < 0.05) across all datasets.

• Developed a multimodal system that combines speech, text, and visual data to enhance real-time oral English learning.
• Collected and annotated a diverse dataset of English speech recordings from teachers across various accents and speaking styles.
• Designed an adaptive feedback framework to provide learners with immediate, personalized insights into their pronunciation and fluency.
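The recognition step the abstract attributes to Hidden Markov Models can be illustrated with a minimal Viterbi decoder. This is a toy sketch, not the paper's model: the two hidden states, the discretized acoustic symbols (`"lowF1"`, `"highF1"`), and all probability values below are invented for illustration, and a real continuous-HMM recognizer would use Gaussian emission densities over acoustic features rather than a discrete emission table.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most-likely hidden-state path for an observation sequence (log domain)."""
    # Initialize with the start-state and first-emission log-probabilities.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state by accumulated log-probability.
            best_prev, best_lp = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1],
            )
            V[t][s] = best_lp + math.log(emit_p[s][obs[t]])
            back[t][s] = best_prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Hypothetical two-state model: vowel vs. consonant, observed via a
# coarsely discretized first-formant (F1) symbol per frame.
states = ["vowel", "consonant"]
start_p = {"vowel": 0.5, "consonant": 0.5}
trans_p = {"vowel": {"vowel": 0.6, "consonant": 0.4},
           "consonant": {"vowel": 0.7, "consonant": 0.3}}
emit_p = {"vowel": {"lowF1": 0.2, "highF1": 0.8},
          "consonant": {"lowF1": 0.9, "highF1": 0.1}}

print(viterbi(["highF1", "lowF1", "highF1"], states, start_p, trans_p, emit_p))
# → ['vowel', 'consonant', 'vowel']
```

In a pronunciation-feedback setting, the decoded state path would be compared against the expected phoneme sequence for the target utterance, and mismatches would drive the kind of per-segment feedback the highlights describe.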