What is the best predictor of word difficulty? A case of data mining using random forest

Hung Tan Ha, Duyen Thi Bich Nguyen, Tim Stoeckel
Language Testing (impact factor 2.2; category: Language & Linguistics), published 2024-07-30
DOI: https://doi.org/10.1177/02655322241263628

Abstract

Word frequency has a long history of being considered the most important predictor of word difficulty and has served as a guideline for several aspects of second language vocabulary teaching, learning, and assessment. However, recent empirical research has challenged the supremacy of frequency as a predictor of word difficulty. Accordingly, applied linguists have questioned the use of frequency as the principal criterion in the development of wordlists and vocabulary tests. Despite being informative, previous studies on the topic have been limited in the way the researchers measured word difficulty and the statistical techniques they employed for exploratory data analysis. In the current study, meaning recall was used as a measure of word difficulty, and random forest was employed to examine the importance of various lexical sophistication metrics in predicting word difficulty. The results showed that frequency was not the most important predictor of word difficulty. Due to the limited scope, research findings are only generalizable to Vietnamese learners of English.
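The abstract's method, using a random forest to rank predictors by importance, can be illustrated with a small from-scratch sketch. This is not the authors' analysis: the data are synthetic, the feature names (`metric_a`, `metric_b`, `metric_c`) are hypothetical stand-ins for lexical sophistication metrics, and the settings (30 trees, depth 4, permutation importance on held-out items) are assumptions chosen only to keep the example short. The outcome is deliberately constructed so that `metric_b` drives it, which says nothing about the study's actual findings.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def grow_tree(X, y, rng, n_try=2, depth=0, max_depth=4, min_leaf=5):
    """Regression tree; each split considers n_try randomly chosen features."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return sum(y) / len(y)  # leaf: mean outcome
    best = None  # (score, feature index, cut point)
    for f in rng.sample(range(len(X[0])), n_try):
        # Try up to 10 randomly chosen observed values as cut points.
        for cut in rng.sample([row[f] for row in X], min(10, len(X))):
            left = [t for row, t in zip(X, y) if row[f] < cut]
            right = [t for row, t in zip(X, y) if row[f] >= cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, cut)
    if best is None:
        return sum(y) / len(y)
    _, f, cut = best
    lo = [(row, t) for row, t in zip(X, y) if row[f] < cut]
    hi = [(row, t) for row, t in zip(X, y) if row[f] >= cut]
    children = [
        grow_tree([r for r, _ in part], [t for _, t in part],
                  rng, n_try, depth + 1, max_depth, min_leaf)
        for part in (lo, hi)
    ]
    return (f, cut, children[0], children[1])

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are floats."""
    while isinstance(node, tuple):
        f, cut, left, right = node
        node = left if row[f] < cut else right
    return node

def fit_forest(X, y, n_trees=30, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def mse(forest, X, y):
    """Mean squared error of the averaged tree predictions."""
    preds = [sum(predict_tree(t, row) for t in forest) / len(forest) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def permutation_importance(forest, X, y, seed=1):
    """Increase in MSE after shuffling one feature column at a time."""
    rng = random.Random(seed)
    base = mse(forest, X, y)
    scores = []
    for f in range(len(X[0])):
        col = [row[f] for row in X]
        rng.shuffle(col)
        shuffled = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, col)]
        scores.append(mse(forest, shuffled, y) - base)
    return scores

# Synthetic data (assumption): metric_b dominates, metric_a is weak, metric_c is noise.
names = ["metric_a", "metric_b", "metric_c"]
data_rng = random.Random(42)
X = [[data_rng.random() for _ in names] for _ in range(150)]
y = [0.5 * row[0] + 4.0 * row[1] + data_rng.gauss(0, 0.3) for row in X]

forest = fit_forest(X[:100], y[:100])                        # train on 100 items
importances = permutation_importance(forest, X[100:], y[100:])  # score on 50 held-out items
for name, score in zip(names, importances):
    print(f"{name}: {score:.3f}")
```

Permutation importance is used here because it is easy to compute by hand and is measured on held-out items; the study may have used a different importance measure. In practice an established implementation (for example, a statistics package's random forest routine) would replace this toy version.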