Automated Scoring of Spontaneous Speech from Young Learners of English Using Transformers
Xinhao Wang, Keelan Evanini, Yao Qian, Matthew David Mulholland
2021 IEEE Spoken Language Technology Workshop (SLT), published 2021-01-19
DOI: 10.1109/SLT48900.2021.9383553
This study explores the use of Transformer-based models for the automated assessment of children’s non-native spontaneous speech. Traditional approaches for this task have relied heavily on delivery features (e.g., fluency), whereas the goal of the current study is to build automated scoring models based solely on transcriptions in order to see how well they capture additional aspects of speaking proficiency (e.g., content appropriateness, vocabulary, and grammar) despite the high word error rate (WER) of automatic speech recognition (ASR) on children’s non-native spontaneous speech. Transformer-based models are built using both manual transcriptions and ASR hypotheses, and versions of the models that incorporated the prompt text were investigated in order to more directly measure content appropriateness. Two baseline systems were used for comparison, including an attention-based Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) and a Support Vector Regressor (SVR) with manually engineered content-related features. Experimental results demonstrate the effectiveness of the Transformer-based models: the automated prompt-aware model using ASR hypotheses achieves a Pearson correlation coefficient (r) with holistic proficiency scores provided by human experts of 0.835, outperforming both the attention-based RNN-LSTM baseline (r = 0.791) and the SVR baseline (r = 0.767).
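The headline comparison in the abstract rests on the Pearson correlation coefficient (r) between automated scores and human holistic proficiency scores. As a minimal sketch of that evaluation metric (the scores below are invented for illustration, not the paper's data):

```python
import math

def pearson_r(xs, ys):
    # Pearson r: covariance of the two score lists, normalized by the
    # product of their standard deviations. Ranges from -1 to 1.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical example: human holistic scores vs. automated model scores
# for five responses (values made up for illustration).
human = [3.0, 4.0, 2.0, 5.0, 3.5]
model = [2.8, 4.2, 2.1, 4.7, 3.6]
print(round(pearson_r(human, model), 3))
```

An r of 0.835, as reported for the prompt-aware Transformer model on ASR hypotheses, indicates a strong linear agreement between the automated and human scores; the same metric makes the baselines (r = 0.791 and r = 0.767) directly comparable.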