{"title":"Evaluating an AI speaking assessment tool: Score accuracy, perceived validity, and oral peer feedback as feedback enhancement","authors":"Xu Jared Liu , Jingwen Wang , Bin Zou","doi":"10.1016/j.jeap.2025.101505","DOIUrl":null,"url":null,"abstract":"<div><div>Artificial Intelligence (AI) has significantly transformed language learning approaches and outcomes. However, research on AI-assisted English for Academic Purposes (EAP) speaking classrooms remains sparse. This study evaluates \"EAP Talk\", an AI-assisted speaking assessment tool, examining its effectiveness in two contexts: controlled tasks (Reading Aloud) that elicit non-spontaneous speech, and uncontrolled tasks (Presentation) that generate spontaneous speech. The research assessed accuracy and validity of EAP Talk scores through analysing 20 Reading Aloud and 20 Presentation recordings randomly selected from a pool of 64 undergraduate students. These recordings were graded by five experienced EAP teachers using Adaptive Comparative Judgment (ACJ) – a comparative scoring method – and the traditional rubric rating approach. Acknowledging the limitation of EAP Talk in providing scores without detailed feedback, the study further investigated its perceived validity and examined oral peer feedback as a complementary enhancement strategy. Semi-structured interviews with four students were conducted to investigate their perceptions of the AI-assisted assessment process, focusing on the benefits of EAP Talk in enhancing learning, its limitations, and the effectiveness of oral peer feedback. Scoring concordance analysis shows that EAP Talk performs well in the controlled task but less so in the uncontrolled one. Content analysis on the interview data reveals that EAP Talk facilitates student confidence and positively shapes learning styles, while oral peer feedback markedly improves speaking skills through effective human-computer collaboration. 
The study calls for more precise AI assessments in uncontrolled tasks and proposes pedagogical strategies to better integrate AI into EAP speaking contexts.</div></div>","PeriodicalId":47717,"journal":{"name":"Journal of English for Academic Purposes","volume":"75 ","pages":"Article 101505"},"PeriodicalIF":3.1000,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of English for Academic Purposes","FirstCategoryId":"98","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1475158525000360","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0
Abstract
Artificial Intelligence (AI) has significantly transformed language learning approaches and outcomes. However, research on AI-assisted English for Academic Purposes (EAP) speaking classrooms remains sparse. This study evaluates "EAP Talk", an AI-assisted speaking assessment tool, examining its effectiveness in two contexts: controlled tasks (Reading Aloud) that elicit non-spontaneous speech, and uncontrolled tasks (Presentation) that generate spontaneous speech. The research assessed the accuracy and validity of EAP Talk scores by analysing 20 Reading Aloud and 20 Presentation recordings randomly selected from a pool of 64 undergraduate students. These recordings were graded by five experienced EAP teachers using Adaptive Comparative Judgment (ACJ) – a comparative scoring method – and the traditional rubric rating approach. Acknowledging the limitation of EAP Talk in providing scores without detailed feedback, the study further investigated its perceived validity and examined oral peer feedback as a complementary enhancement strategy. Semi-structured interviews with four students were conducted to investigate their perceptions of the AI-assisted assessment process, focusing on the benefits of EAP Talk in enhancing learning, its limitations, and the effectiveness of oral peer feedback. Scoring concordance analysis shows that EAP Talk performs well in the controlled task but less so in the uncontrolled one. Content analysis of the interview data reveals that EAP Talk fosters student confidence and positively shapes learning styles, while oral peer feedback markedly improves speaking skills through effective human-computer collaboration. The study calls for more precise AI assessments in uncontrolled tasks and proposes pedagogical strategies to better integrate AI into EAP speaking contexts.
Journal description:
The Journal of English for Academic Purposes provides a forum for the dissemination of information and views that enables practitioners and researchers in EAP to keep current with developments in their field and to contribute to its continued development. JEAP publishes articles, book reviews, conference reports, and academic exchanges on the linguistic, sociolinguistic, and psycholinguistic description of English as it occurs in the contexts of academic study and scholarly exchange itself.