Assessing Writing, Volume 66, Article 100973 | Pub Date: 2025-07-12 | DOI: 10.1016/j.asw.2025.100973
Zhiyun Huang, Guangyao Chen, Zhanhao Jiang
Title: Assessing L2 writing formality using syntactic complexity indices: A fuzzy evaluation approach
Abstract: Addressing the ambiguity in formality standards, this study introduces a Multi-dimensional Connection Cloud Model (MCCM) that leverages syntactic complexity indices to build a fuzzy assessment model for formality in L2 writing. Elastic Net Regression (ENR) revealed that four large-grained indices (mean length of sentence, mean length of T-unit, complex nominals per T-unit, and complex nominals per clause) and one fine-grained index (average number of dependents per direct object) were significant predictors of the level of formality in L2 writing. To evaluate the model’s predictive power, 45 essays were used as a validation set. The MCCM achieved a prediction accuracy of 91.1% (41 of 45 cases) in matching human ratings, with connection degrees effectively capturing classification uncertainty and boundary transitions. This framework navigates the complexities and variable distributions of the indicators, offering a more objective solution than conventional expert evaluations and introducing a novel methodological approach to assessing formality in academic writing.
Assessing Writing, Volume 66, Article 100957 | Pub Date: 2025-06-26 | DOI: 10.1016/j.asw.2025.100957
Ruth Trüb, Jens Möller, Julian Lohmann, Thorben Jansen, Stefan D. Keller
Title: Judgment accuracy in primary school EFL writing assessment: Do text characteristics matter?
Abstract: Assessing the writing competence of pupils learning English as a foreign language (EFL) at primary school is challenging. This study examined a largely unexplored topic, namely the role of text characteristics in writing assessment, and analysed judgment accuracy differentiated by nine aspects of text quality (communicative effect, level of detail, coherence, cohesion, complexity of syntax and grammar, correctness of syntax and grammar, vocabulary, orthography, and punctuation). Two hundred pre-service teachers assessed four randomly assigned texts from learners in grade six. Their assessments were compared to existing ratings of two experts from a previous study. We found a relative judgment accuracy between r = .34 and .60 for the nine assessment criteria, with vocabulary being assessed significantly more accurately than almost all other criteria. Orthography, complexity and correctness of syntax and grammar, and punctuation were rated significantly more accurately than cohesion, level of detail, communicative effect, and coherence. The pre-service teachers assessed most criteria more strictly and with higher variability than the experts. The results suggest that teacher education should offer pre-service teachers concrete opportunities to practise writing assessment, implement activities to strengthen the assessment of content- and structure-related criteria, and help them adjust their assessment rigour.
Assessing Writing, Volume 66, Article 100962 | Pub Date: 2025-06-25 | DOI: 10.1016/j.asw.2025.100962
Miseong Kim, Phil Hiver
Title: The effect of metacognitive instruction with indirect written corrective feedback on secondary students’ engagement and functional adequacy in L2 writing
Abstract: This study explored how metacognitive instruction (MI) combined with indirect written corrective feedback (WCF) influences students’ engagement with WCF and their functional adequacy (FA) in L2 writing. Fifty-four intermediate-level Korean secondary school students participated, divided into a treatment group (WCF + MI) and a comparison group (WCF only). Over 13 weeks, students completed five argumentative writing tasks, receiving WCF after each task. They also completed a self-report survey on their engagement with WCF. Results from the pretest, immediate posttest, and delayed posttest revealed that students in the treatment group showed increased behavioral engagement over time, although this pattern was inconsistent across all engagement dimensions. Overall, FA scores improved significantly across time points, but no significant differences were observed between groups. Furthermore, engagement with WCF did not significantly predict FA performance in either group at either posttest. These findings suggest that pairing MI with WCF may encourage behavioral engagement, but its impact on writing quality remains inconclusive. While preliminary, the results highlight the potential of MI as a tool in the feedback process and suggest the need for further research using broader engagement measures and longer instructional periods to better understand how MI and WCF can jointly support L2 writing development.
Assessing Writing, Volume 66, Article 100961 | Pub Date: 2025-06-24 | DOI: 10.1016/j.asw.2025.100961
Yingying Liu, Xiaofei Lu, Huilei Qi
Title: Comparing GPT-based approaches in automated writing evaluation
Abstract: Large language models (LLMs) like OpenAI’s GPT models show significant promise in automated writing evaluation (AWE). However, recent research has mainly focused on non-fine-tuned GPT models, with limited attention to fine-tuned models and to factors potentially influencing performance, such as model type, prompting strategy, and dataset characteristics. This study compares six GPT-based approaches for evaluating TOEFL argumentative writing: GPT-3.5 zero-shot, GPT-3.5 few-shot, GPT-4 zero-shot, GPT-4 few-shot, and two fine-tuning methods. We assess the impact of model type (GPT-3.5 vs. GPT-4), prompting strategy (zero-shot vs. few-shot), fine-tuning, class imbalance, and dataset shift on performance. Our findings reveal that fine-tuned GPT models consistently outperform non-fine-tuned GPT-4 models, which in turn outperform GPT-3.5 models. Few-shot prompting does not show clear advantages over zero-shot prompting in this study. Additionally, class imbalance and dataset shift negatively affect model accuracy and reliability. These results offer valuable insights into the effectiveness of different GPT-based approaches and the factors that influence their performance in AWE.
Assessing Writing, Volume 66, Article 100960 | Pub Date: 2025-06-19 | DOI: 10.1016/j.asw.2025.100960
Siyu Zhu, Qingyang Li, Yuan Yao, Jialin Li, Xinhua Zhu
Title: Improving writing feedback quality and self-efficacy of pre-service teachers in Gen-AI contexts: An experimental mixed-method design
Abstract: The rapid advancement of Generative AI (Gen-AI), such as ChatGPT, presents both opportunities and challenges for teacher education. For pre-service teachers (PSTs), Gen-AI offers new tools to enhance the efficiency and quality of writing feedback. However, it also raises concerns, as many PSTs lack classroom experience, confidence in giving feedback, and knowledge of how to effectively integrate AI-generated content into instructional practice. To address these issues, this study adopted a pre-post experimental design to examine the effects of targeted training on PSTs’ provision of writing feedback, with a focus on feedback quality, self-efficacy, and their relationship in ChatGPT-supported contexts. Over a two-week training program with 30 PSTs, Wilcoxon signed-rank test results from the content analysis showed significant improvements in feedback quality and self-efficacy. Semi-structured interviews with eight participants identified cognitive changes and enhanced ChatGPT operational skills as key drivers of these improvements. We reaffirmed that mastery and vicarious experiences are crucial for enhancing teacher self-efficacy. Furthermore, a reciprocal relationship was observed between feedback quality and self-efficacy in providing ChatGPT-assisted feedback. This study contributes to the broader discourse on ChatGPT in education and offers specific strategies for effectively incorporating new technology into teacher training.
Assessing Writing, Volume 66, Article 100959 | Pub Date: 2025-06-19 | DOI: 10.1016/j.asw.2025.100959
Ge Lan, Yi Li, Jie Yang, Xuanzi He
Title: Investigating a customized generative AI chatbot for automated essay scoring in a disciplinary writing task
Abstract: The release of ChatGPT in 2022 has greatly influenced research on language assessment. Recently, there has been a burgeoning trend of investigating whether Generative (Gen) AI tools can be used for automated essay scoring (AES); however, most of this research has focused on common academic genres or exam writing tasks. To respond to the call for further investigations in recent studies, our study investigated the relationship between writing scores from a GenAI tool and scores from English teachers at a university in Hong Kong. We built a chatbot to imitate the exact procedures English teachers need to follow before marking student writing. The chatbot was applied to score 254 technical progress reports produced by engineering students in a disciplinary English course. We then conducted correlation tests to examine the relationships between the chatbot and English teachers on their total scores and four analytical scores (i.e., task fulfillment, language, organization, and formatting). The findings show a positive and moderate correlation on the total score (r = 0.424). For the analytical scores, the correlations differ across the four analytical domains, with stronger correlations on language (r = 0.364) and organization (r = 0.316) and weaker correlations on task fulfillment (r = 0.275) and formatting (r = 0.186). The findings indicate that GenAI has limited capacity for automated assessment as a whole, but also that a customized chatbot has greater potential for assessing the language and organization domains than the task fulfillment and formatting domains. Implications are also provided for similar future research.
Assessing Writing, Volume 66, Article 100958 | Pub Date: 2025-06-16 | DOI: 10.1016/j.asw.2025.100958
John Elwood Romig, Amanda A. Olsen, Elizabeth Medina, Anna Tulloh
Title: Criterion validity evidence and alternate form reliability of curriculum-based measures of written expression for eighth grade students
Abstract: A significant majority of students in the secondary grades struggle to meet grade-level expectations for writing. Progress monitoring with curriculum-based measurement (CBM) is one possible strategy for shaping instruction toward improved student outcomes. However, relatively little research has examined curriculum-based measures of writing with students in the secondary grades. This study included 89 eighth-grade participants who completed one curriculum-based measurement writing task weekly for 11 weeks and completed the Test of Written Language – 4 in the 12th week. Spearman’s rank correlations were calculated to determine the alternate form reliability and criterion validity evidence of the curriculum-based measurement tasks. We found alternate form reliability and criterion validity evidence to be weaker than established thresholds in the field but approaching what has been found with other writing assessments. Educators should use caution when interpreting results of CBM in writing and consider alternative writing assessments for screening purposes.
Assessing Writing, Volume 65, Article 100954 | Pub Date: 2025-06-03 | DOI: 10.1016/j.asw.2025.100954
Scott A. Crossley, Perpetual Baffour, L. Burleigh, Jules King
Title: A large-scale corpus for assessing source-based writing quality: ASAP 2.0
Abstract: This paper introduces ASAP 2.0, a dataset of ∼25,000 source-based argumentative essays from U.S. secondary students. The corpus addresses the shortcomings of the original ASAP corpus by including demographic data, consistent scoring rubrics, and source texts. ASAP 2.0 aims to support the development of unbiased, sophisticated Automatic Essay Scoring (AES) systems that can foster improved educational practices by providing summative feedback to students. The corpus is designed for broad accessibility in the hope of facilitating research into writing quality and AES system biases.