Challenging lexical coverage conventions: Evaluating the vocabulary demands of family-genre film and television

Brett Milliner, Geoffrey Pinchbeck
Research Methods in Applied Linguistics, vol. 4, no. 3, Article 100230. Published 2025-06-19. DOI: 10.1016/j.rmal.2025.100230
https://www.sciencedirect.com/science/article/pii/S2772766125000515

Abstract

The contribution of lexical coverage studies to the field of applied linguistics cannot be overstated. Lexical coverage research has helped establish which vocabulary knowledge is most essential for second language (L2) comprehension and has elevated the importance of acquiring high-frequency vocabulary. Approaches to lexical coverage research have, however, come under closer scrutiny in recent studies, with some experts questioning the accuracy of coverage estimates. Recognizing these limitations, the current study applies an alternative approach to evaluating the lexical knowledge required to comprehend the OPUS-family-genre corpus, a collection of closed captions from 1597 family-genre films and television programs (10,744,767 tokens). In contrast to previous conventions that used band-based (1000-word) predictions of lexical coverage, this study evaluates coverage at the individual word-unit level. It compares the coverage provided by four word lists: (1) a lemma list derived from tagging the OPUS-family-genre corpus, (2) a flemma list, and two word-family lists, (3) the BNC and (4) the BNC/COCA. The study also models how a part-of-speech lexical tagger (TagAnt) can be used to evaluate lemma-based lexical coverage. The analysis revealed that English language learners will know 90%, 95%, and 98% of the running words appearing in family-genre films and television if they know the first 855, 2005, and 4393 flemmas, respectively, from the attached word lists. More simply, knowing the first 900 words from our supplementary word frequency lists would enable English language learners to start viewing family-genre films and television.
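To make the word-unit-level idea concrete, the following is a minimal sketch of a cumulative coverage calculation. It is not the authors' procedure: it assumes the corpus has already been reduced to a flat list of word units (for example, lemmas or flemmas produced by a tagger), and the function name `units_needed` and the toy data are hypothetical. It ranks units by frequency and reports how many top-ranked units are needed to reach each coverage target, rather than rounding to 1000-word bands.

```python
from collections import Counter
from itertools import accumulate


def units_needed(tokens, targets=(90, 95, 98)):
    """Map each coverage target (in percent) to the number of
    top-frequency word units needed to cover that share of tokens.

    `tokens` is assumed to be a list of word units (e.g. lemmas);
    how tokens are mapped to units is outside this sketch.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    # Token counts of the word units, most frequent first.
    ranked = [count for _, count in counts.most_common()]
    # Running total of tokens covered by the top 1, 2, 3, ... units.
    cumulative = list(accumulate(ranked))
    result = {}
    for target in targets:
        for rank, covered in enumerate(cumulative, start=1):
            # Integer arithmetic avoids floating-point edge cases.
            if covered * 100 >= target * total:
                result[target] = rank
                break
    return result


# Toy data (invented for illustration, not from the OPUS-family-genre corpus).
toy_tokens = (
    ["the"] * 50 + ["a"] * 30 + ["dog"] * 10
    + ["cat"] * 5 + ["run"] * 3 + ["leap"] * 2
)
print(units_needed(toy_tokens))
```

On real data, the same cumulative curve would be computed once per word list (lemma, flemma, or word family), so that the number of units needed for a given coverage level can be compared across lists.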