Challenging lexical coverage conventions: Evaluating the vocabulary demands of family-genre film and television
Brett Milliner, Geoffrey Pinchbeck
Research Methods in Applied Linguistics, 4(3), Article 100230 (published 2025-06-19). DOI: 10.1016/j.rmal.2025.100230
The contribution of lexical coverage studies to applied linguistics cannot be overstated. Lexical coverage research has helped establish the vocabulary knowledge most essential for second language (L2) comprehension and has underscored the importance of acquiring high-frequency vocabulary. In recent studies, however, approaches to lexical coverage research have begun to come under closer scrutiny, with some experts questioning the accuracy of coverage estimates. Mindful of these limitations, the current study applies an alternative approach to evaluating the lexical knowledge required to comprehend the OPUS-family-genre corpus, a collection of closed captions from 1,597 family-genre films and television programs (10,744,767 tokens). Whereas previous studies conventionally predicted lexical coverage in 1,000-word frequency bands, this study evaluates coverage at the level of individual word units. It compares the coverage provided by four word lists: (1) a lemma list derived from tagging the OPUS-family-genre corpus, (2) a flemma list, and two word-family lists, (3) the BNC list and (4) the BNC/COCA list. The study also demonstrates how a part-of-speech tagger (TagAnt) can be used to evaluate lemma-based lexical coverage. The analysis revealed that English language learners who know the first 855, 2,005, and 4,393 flemmas on the attached word lists will know 90%, 95%, and 98%, respectively, of the running words in family-genre films and television. Put simply, knowing the first 900 words on our supplementary word frequency lists would enable English language learners to start viewing family-genre films and television.
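The difference between band-based and unit-level coverage can be made concrete with a short calculation. The Python sketch below is illustrative only and does not reproduce the authors' pipeline. It assumes the running words have already been tagged and lemmatized (for example, by a tagger such as TagAnt) into hypothetical (headword, part-of-speech) pairs, and every function name in it is hypothetical. It counts how many top-ranked word units are needed to cover 90%, 95%, and 98% of running words. It also contrasts POS-sensitive lemmas with POS-blind flemmas. As a simplified stand-in for band-based reporting, it rounds a unit-level count up to its enclosing 1,000-word band.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence

# Hypothetical input format: each running word is a (headword, part_of_speech)
# pair produced by an upstream tagger/lemmatizer (not reproduced here).
Token = tuple[str, str]


def lemma_units(tokens: list[Token]) -> list[Token]:
    """Lemma units keep part of speech, so run (VERB) and run (NOUN) differ."""
    return list(tokens)


def flemma_units(tokens: list[Token]) -> list[str]:
    """Flemma units ignore part of speech, so all uses of 'run' are pooled."""
    return [headword for headword, _ in tokens]


def frequency_list(units: Iterable[Hashable]) -> list[Hashable]:
    """Rank word units by descending frequency. Ties are broken by the
    unit's string form so that the order is deterministic."""
    counts = Counter(units)
    return sorted(counts, key=lambda u: (-counts[u], str(u)))


def units_needed(word_list: Sequence[Hashable], corpus: Counter,
                 thresholds_pct: Iterable[int]) -> dict[int, int]:
    """For each coverage threshold (in percent), return the smallest number of
    units from the top of word_list whose tokens cover that share of the
    corpus. word_list could also be an external list (e.g. a BNC-based one)."""
    total = sum(corpus.values())
    if total == 0:
        return {}
    pending = sorted(thresholds_pct)
    needed: dict[int, int] = {}
    covered = 0
    for rank, unit in enumerate(word_list, start=1):
        covered += corpus[unit]  # Counter yields 0 for units absent from the corpus
        # Integer comparison avoids floating-point edge cases; one unit can
        # push coverage past several thresholds at once.
        while pending and covered * 100 >= pending[0] * total:
            needed[pending.pop(0)] = rank
        if not pending:
            break
    return needed


def band_estimate(n_units: int, band_size: int = 1000) -> int:
    """Simplified band-based view: report the band boundary at which a
    threshold is first crossed, i.e. round the unit count up to the band."""
    return -(-n_units // band_size) * band_size


if __name__ == "__main__":
    toy = ([("the", "DET")] * 5 + [("go", "VERB")] * 3
           + [("run", "VERB")] + [("run", "NOUN")])
    targets = (90, 95, 98)
    lemmas, flemmas = lemma_units(toy), flemma_units(toy)
    print("lemma: ", units_needed(frequency_list(lemmas), Counter(lemmas), targets))
    print("flemma:", units_needed(frequency_list(flemmas), Counter(flemmas), targets))
    print("band:  ", [band_estimate(n) for n in (855, 2005, 4393)])
```

On this toy data, the lemma list needs four units to reach 95% coverage, but the flemma list needs only three, because pooling run (VERB) and run (NOUN) concentrates tokens in fewer units. Applied to the abstract's flemma counts, the simplified band rounding gives 1,000, 3,000, and 5,000. This illustrates how reporting at band boundaries can be much coarser than unit-level counts.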