Introducing and comparing two techniques for key lexical bundles analysis
Tove Larsson, Taehyeong Kim, Jesse Egbert
Research Methods in Applied Linguistics, 4(3), Article 100245. Published 2025-07-27.
DOI: 10.1016/j.rmal.2025.100245
https://www.sciencedirect.com/science/article/pii/S2772766125000667
Abstract
Multiword units, specifically lexical bundles, have been found to be important building blocks in language production and processing. We also know that using the text rather than the full corpus as the unit of analysis increases the linguistic validity of the results, given that written language is produced through texts (e.g., Egbert & Biber, 2019). However, researchers wishing to look at which bundles are characteristic of, or key to, a population (e.g., students from a specific first-language background) are currently out of luck if they are interested in using the text as the unit of analysis. The present paper introduces two methods designed for looking at key lexical bundles using texts as the unit of analysis: text dispersion keyness and mean text frequency keyness. We subsequently compare the results from these methods to existing whole-corpus frequency keyness. The results show that the techniques produce similar lists, but that mean text frequency keyness produced the largest number of content generalizable bundles (i.e., bundles that can be generalized across texts in the corpus). By contrast, text dispersion keyness helped us obtain the largest number of content distinctive bundles (i.e., bundles that clearly distinguish the target corpus from the reference corpus). Text dispersion keyness also produced the highest number of bundles that were both content generalizable and distinctive. Researchers may therefore wish to make a choice among these methods based on the objectives of their analysis.
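To make the contrast between the three units of analysis concrete, here is a minimal Python sketch of the quantities each approach could start from. The abstract does not state the keyness statistics themselves, so everything here is an assumption for illustration: the function names, the default bundle length of four words, the per-1,000-word normalization, and the plain target-minus-reference difference, which stands in for whatever statistic the authors actually use.

```python
from collections import Counter


def bundles(tokens, n=4):
    """Return every contiguous n-word sequence in a list of tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_measures(texts, n=4, per=1000):
    """Compute three per-bundle measures for a corpus (a list of token lists).

    Returns a dict mapping bundle -> (whole_corpus_rate, text_dispersion,
    mean_text_rate), where
      whole_corpus_rate: pooled count / pooled token count * per
      text_dispersion:   proportion of texts containing the bundle
      mean_text_rate:    mean of the per-text normalized rates
    These definitions are illustrative assumptions, not the paper's formulas.
    """
    total_tokens = sum(len(t) for t in texts)
    pooled = Counter()      # bundle counts summed over the whole corpus
    texts_with = Counter()  # number of texts in which each bundle occurs
    rate_sums = Counter()   # sum of per-text normalized rates
    for tokens in texts:
        counts = Counter(bundles(tokens, n))
        pooled.update(counts)
        for bundle, count in counts.items():
            texts_with[bundle] += 1
            rate_sums[bundle] += count / len(tokens) * per
    n_texts = len(texts)
    return {
        bundle: (
            pooled[bundle] / total_tokens * per,
            texts_with[bundle] / n_texts,
            rate_sums[bundle] / n_texts,
        )
        for bundle in pooled
    }


def contrast(target, reference, n=4, per=1000):
    """Target-minus-reference difference on each measure.

    A placeholder contrast only; a real keyness analysis would apply a
    proper statistic to these quantities.
    """
    t = corpus_measures(target, n, per)
    r = corpus_measures(reference, n, per)
    zero = (0.0, 0.0, 0.0)
    return {
        bundle: tuple(tv - rv for tv, rv in zip(t[bundle], r.get(bundle, zero)))
        for bundle in t
    }
```

A bundle repeated many times in one long text gets a high whole-corpus rate but a low text dispersion, which is the kind of case where text-based measures and the whole-corpus measure would be expected to rank bundles differently.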