Integrating move analysis and sentence reconstruction in automated writing evaluation for L2 academic writers

IF 5.5, Zone 1 (Literature), Q1 EDUCATION & EDUCATIONAL RESEARCH

Assessing Writing, Pub Date: 2025-10-01, DOI: 10.1016/j.asw.2025.100984

Bo-Ren Mau, Hui-Hsien Feng

Volume 66, Article 100984

Citations: 0

Abstract

Artificial intelligence has been widely utilized to assist L2 writers through automated writing evaluation (AWE) systems, which offer grammatical feedback. However, for English academic writing, such feedback is apparently insufficient to address the complexities of academic discourse. While genre-based AWE systems employ move analysis, they provide move detection as corrective feedback (CF) without addressing language-use issues, and they are developed using limited datasets. Additionally, general-purpose large language models (LLMs; e.g., ChatGPT) may lack specialized mechanisms for accurately identifying rhetorical moves and providing genre-specific feedback in academic writing contexts. To address these limitations, this study proposes GURUS, a genre-based AWE system grounded in second language acquisition theories. It provides indirect CF by classifying moves with probabilistic scores and direct CF through sentence reconstruction. GURUS is implemented as a web-based application using an ensemble learning model and transformer-based LLMs. By offering both indirect and direct CF, GURUS promotes learner-machine interaction, prompting learners to notice discrepancies between their writing and the reconstructed sentences. GURUS was trained on over one million sentences labeled with OMRC moves. Its classification performance was assessed using F1-score and Brier score; furthermore, semantic and rhetorical production were evaluated using BERTScore and human assessment. The results show that GURUS sufficiently classifies sentence moves and reconstructs sentences while retaining semantic integrity. Given that GURUS holds promise in academic writing instruction, this study also discusses how it can be implemented to bolster learners' genre awareness and proficiency in move-based abstract writing.
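The abstract states that GURUS attaches probabilistic scores to each sentence's move and evaluates classification with F1-score and Brier score, but it does not say how the ensemble combines its members or which F1 averaging was used. The Python sketch below is a minimal illustration under stated assumptions, not the authors' code: ensemble members are combined by unweighted soft voting (averaging their probability vectors), F1 is macro-averaged over the four OMRC labels, and the Brier score is the multiclass form (squared distance to the one-hot gold label, averaged over sentences). The names `soft_vote`, `predict`, `macro_f1`, and `brier`, and the toy probabilities, are hypothetical.

```python
from typing import List, Sequence

# Move labels as abbreviated in the abstract (OMRC). Their order fixes the
# column order of every probability vector below.
MOVES: List[str] = ["O", "M", "R", "C"]


def soft_vote(member_probs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average per-sentence move probabilities across ensemble members.

    member_probs[m][i][k] is member m's probability that sentence i has move
    MOVES[k]. Unweighted averaging is an assumption, not GURUS's documented rule.
    """
    n_members = len(member_probs)
    n_sentences = len(member_probs[0])
    return [
        [sum(member[i][k] for member in member_probs) / n_members
         for k in range(len(MOVES))]
        for i in range(n_sentences)
    ]


def predict(probs: Sequence[Sequence[float]]) -> List[str]:
    """Pick the most probable move for each sentence (first one wins ties)."""
    return [MOVES[max(range(len(p)), key=lambda k: p[k])] for p in probs]


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-move F1; a move absent from gold and pred scores 0."""
    scores = []
    for move in MOVES:
        tp = sum(g == move and p == move for g, p in zip(gold, pred))
        fp = sum(g != move and p == move for g, p in zip(gold, pred))
        fn = sum(g == move and p != move for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def brier(gold: Sequence[str], probs: Sequence[Sequence[float]]) -> float:
    """Multiclass Brier score; lower is better, 0.0 is a perfect, confident forecast."""
    total = 0.0
    for g, p in zip(gold, probs):
        target = [1.0 if move == g else 0.0 for move in MOVES]
        total += sum((pk - tk) ** 2 for pk, tk in zip(p, target))
    return total / len(gold)


if __name__ == "__main__":
    # Toy probabilities from two hypothetical ensemble members for four sentences.
    member_a = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1],
                [0.1, 0.2, 0.6, 0.1], [0.1, 0.1, 0.5, 0.3]]
    member_b = [[0.5, 0.3, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1],
                [0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.2, 0.6]]
    gold = ["O", "M", "R", "R"]
    probs = soft_vote([member_a, member_b])
    pred = predict(probs)
    print(pred)                            # ['O', 'M', 'R', 'C']
    print(round(macro_f1(gold, pred), 3))  # 0.667
    print(round(brier(gold, probs), 3))    # 0.299
```

F1 only checks whether the top-ranked move is correct, whereas the Brier score also penalizes probabilities that are over- or under-confident, which is why the two metrics complement each other when the scores themselves are shown to learners.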
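The abstract also reports BERTScore as a check that reconstructed sentences retain the meaning of the originals. As a rough illustration of what that metric computes (not GURUS's evaluation code), the sketch below implements BERTScore-style greedy matching between two sentences: each token is matched to its most similar token in the other sentence by cosine similarity, recall averages over reference tokens, precision averages over candidate tokens, and F1 is their harmonic mean. It assumes the token embeddings have already been produced by a BERT-style encoder, omits the optional IDF weighting and baseline rescaling, and uses a hypothetical function name `bertscore` with toy 2-D vectors.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _unit(v: Vector) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def bertscore(ref: Sequence[Vector], cand: Sequence[Vector]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using BERTScore-style greedy matching.

    ref and cand hold one contextual embedding per token (assumed precomputed by
    a BERT-style encoder). No IDF weighting or baseline rescaling is applied.
    """
    r = [_unit(v) for v in ref]
    c = [_unit(v) for v in cand]
    # sim[i][j]: cosine similarity of reference token i and candidate token j
    # (a plain dot product, because both vectors are unit length).
    sim = [[sum(a * b for a, b in zip(ri, cj)) for cj in c] for ri in r]
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(row) for row in sim) / len(r)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(sim[i][j] for i in range(len(r))) for j in range(len(c))) / len(c)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy "embeddings": the candidate covers only the first reference token.
    reference = [[1.0, 0.0], [0.0, 1.0]]
    candidate = [[1.0, 0.0]]
    p, r, f = bertscore(reference, candidate)
    print(p, r, round(f, 3))  # 1.0 0.5 0.667
```

In the toy example the candidate is fully supported by the reference (precision 1.0) but conveys only half of it (recall 0.5), which is the kind of meaning loss a semantic-integrity check is meant to expose.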

Source journal: Assessing Writing

CiteScore: 6.00

Self-citation rate: 17.90%

Journal description: Assessing Writing is a refereed international journal providing a forum for ideas, research and practice on the assessment of written language. Assessing Writing publishes articles, book reviews, conference reports, and academic exchanges concerning writing assessments of all kinds, including traditional (direct and standardised forms of) testing of writing, alternative performance assessments (such as portfolios), workplace sampling and classroom assessment. The journal focuses on all stages of the writing assessment process, including needs evaluation, assessment creation, implementation, validation, and test development.