Aligning linguistic complexity with the difficulty of English texts for L2 learners based on CEFR levels

IF 4.9 | Zone 1 (Literature) | JCR Q1 (Linguistics)
Xiaopeng Zhang, Xiaofei Lu
Journal: Studies in Second Language Acquisition
Publication date: 2025-09-03
Publication type: Journal Article
DOI: https://doi.org/10.1017/s0272263125101125
Citation count: 0

Abstract

Selecting appropriate texts for second language (L2) learners is essential for effective education. However, current text difficulty models often inadequately classify materials for L2 learners by proficiency levels. This study addresses this deficiency by employing the Common European Framework of Reference for Languages (CEFR) as its foundational framework. A cohort of expert English-L2 educators classified 1,181 texts from the CommonLit Ease of Readability corpus into CEFR levels. A random forest model was then trained using 24 linguistic complexity features to predict the CEFR levels of English texts for L2 learners. The model achieved 62.6% exact-level accuracy across the six granular CEFR levels and 82.6% across the three overarching levels, outperforming a baseline model based on three existing readability formulas. Additionally, it identified shared and unique linguistic features across different CEFR levels, highlighting the necessity to adjust text classification models to accommodate the distinct linguistic profiles of low- and high-proficiency readers.
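
The abstract reports accuracy at two granularities: exact agreement over the six CEFR levels and agreement over the three overarching levels. The sketch below shows one plausible way to relate the two figures, by collapsing each six-level label (A1 to C2) to its band (A, B, or C) before scoring. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say whether the three-level result came from collapsed predictions or from a separately trained model, the helper names `to_band` and `accuracy` are hypothetical, and the labels are made-up toy data rather than results from the study.

```python
# Illustrative sketch only: toy labels, hypothetical helper names.
# None of the values below come from the study.

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def to_band(level: str) -> str:
    """Collapse a six-level CEFR label to its overarching band (A, B, or C)."""
    if level not in CEFR_LEVELS:
        raise ValueError(f"unknown CEFR level: {level!r}")
    return level[0]


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Share of items whose predicted label exactly matches the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    hits = sum(g == p for g, p in zip(gold, predicted))
    return hits / len(gold)


# Toy expert labels and toy model predictions for eight texts (assumed data).
gold = ["A1", "A2", "B1", "B2", "C1", "C2", "B1", "A2"]
pred = ["A1", "A1", "B1", "B1", "C2", "C2", "B2", "B1"]

# Exact-level accuracy over the six granular levels.
exact_acc = accuracy(gold, pred)

# Band-level accuracy after collapsing both label lists to A, B, or C.
band_acc = accuracy([to_band(g) for g in gold], [to_band(p) for p in pred])

print(f"six-level accuracy:   {exact_acc:.3f}")
print(f"three-level accuracy: {band_acc:.3f}")
```

In this toy data most errors are confusions between adjacent levels within the same band (for example A2 predicted as A1), so they count as misses at six levels but as hits at three levels. Under the collapsing assumption, that pattern would be one explanation for why a three-level figure can be noticeably higher than a six-level one.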

Source journal
CiteScore: 8.00
Self-citation rate: 9.80%
Articles published: 52
Journal description: Studies in Second Language Acquisition is a refereed journal of international scope devoted to the scientific discussion of acquisition or use of non-native and heritage languages. Each volume (five issues) contains research articles of either a quantitative, qualitative, or mixed-methods nature in addition to essays on current theoretical matters. Other rubrics include shorter articles such as Replication Studies, Critical Commentaries, and Research Reports.