Structural Factor Analysis of Lexical Complexity Constructs and Measures: A Quantitative Measure-Testing Process on Specialised Academic Texts

IF 0.7 · Journal ranking: Tier 2 (Literature) · LANGUAGE & LINGUISTICS
Maryam Nasseri, Philip McCarthy
{"title":"Structural Factor Analysis of Lexical Complexity Constructs and Measures: A Quantitative Measure-Testing Process on Specialised Academic Texts","authors":"Maryam Nasseri, Philip McCarthy","doi":"10.1080/09296174.2023.2258782","DOIUrl":null,"url":null,"abstract":"ABSTRACTThis study evaluates 22 lexical complexity measures that represent the three constructs of density, diversity and sophistication. The selection of these measures stems from an extensive review of the SLA linguistics literature. All measures were subjected to qualitative screening for indicators/predictors of lexical proficiency/development and criterion validity based on the body of scholarship. This study’s measure-testing process begins by dividing the selected measures into two groups, similarly calculated and dissimilarly calculated, based on their quantification methods and the results of correlation tests. Using a specialized corpus of postgraduate academic texts, a Structural Factor Analysis (SFA) comprising a Confirmatory Factor Analysis (CFA) and Exploratory Factor Analysis (EFA) is then conducted. The purpose of SFA is to 1) verify and examine the lexical classifications proposed in the literature, 2) evaluate the relationship between various lexical constructs and their representative measures, 3) identify the indices that best represent each construct and 4) detect possible new structures/dimensions. Based on the analysis of the corpus, the study discusses the construct-distinctiveness of lexical complexity constructs, as well as strong indicators of each conceptual/mathematical group among the measures. Finally, a unique and smaller set of measures representative of each construct is suggested for future studies that require measure selection. AcknowledgmentsWe would like to thank the two anonymous reviewers for their valuable suggestions and comments.Disclosure statementNo potential conflict of interest was reported by the author(s).Credit authorship contribution statementMaryam Nasseri: Conceptualization, Data curation, Methodology, Data analysis and evaluation of findings, Project administration, Visualization, Writing: original draft, Writing: critical review & editing, Funding acquisition.Philip McCarthy: Measure-selection, Writing: critical review & editing, Funding acquisition.Notes1. The lexical sophistication measures in LCA-AW are filtered through the BAWE (British Academic Written English) corpus and its most-frequently-used academic writing words used in linguistics and language studies as well as the general English frequency word lists based on the BNC (the British National Corpus) or ANC (American National Corpus).2. LCA-AW and TAALED calculate the indices based on lemma forms while Coh-Metrix calculates the vocd-D index based on word forms. In the latter case, lemmatized files can be used as the input to Coh-Metrix.3. The R packages used in this study include psych (version 1.8.12, Revelle, Citation2018), lavaan (version 0.5–18, Rosseel, Citation2012) and corrplot (version 0.84, Wei & Simko, Citation2017).Additional informationFundingThis study is part of the “Lexical Proficiency Grading for Academic Writing (FRG23-C-S66)” comprehensive research granted by the American University of Sharjah (AUS).Notes on contributorsMaryam NasseriMaryam Nasseri received her doctoral degree from the University of Birmingham (UK), where she worked on the application of statistical modelling, NLP, and corpus linguistics methods on lexical and syntactic complexity. 
She has received multiple awards and grants, including the ISLE 2020 grant for syntactic complexification in academic texts and the AUS 2023-26 research grant for statistical modelling and designing software for lexical proficiency grading of academic writing. She has published in journals such as System, Journal of English for Academic Purposes (JEAP), and Assessing Writing and reviewed multiple articles and books for Taylor & Francis, Assessing Writing, and Journal of Language and Education (JLE).Philip McCarthyPhilip McCarthy is an Associate Professor and discourse scientist specializing in software design and corpus analysis. His major interest is analyzing the English writings of students. His articles have been published in journals such as Discourse Processes, The Modern Language Journal, Written Communication, and Applied Psycholinguistics. McCarthy has been a teacher for 30 years, working in locations such as Turkiye, Japan, Britain, the US and the UAE. He is currently the principal investigator of a project on lexical proficiency grading of academic writing funded by the American University of Sharjah (AUS).","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"24 6","pages":"0"},"PeriodicalIF":0.7000,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Quantitative Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/09296174.2023.2258782","RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}
引用次数: 0

Abstract

This study evaluates 22 lexical complexity measures that represent the three constructs of density, diversity and sophistication. The selection of these measures stems from an extensive review of the SLA linguistics literature. All measures were subjected to qualitative screening as indicators/predictors of lexical proficiency/development and for criterion validity, based on the body of scholarship. The study's measure-testing process begins by dividing the selected measures into two groups, similarly calculated and dissimilarly calculated, based on their quantification methods and the results of correlation tests. Using a specialized corpus of postgraduate academic texts, a Structural Factor Analysis (SFA) comprising a Confirmatory Factor Analysis (CFA) and an Exploratory Factor Analysis (EFA) is then conducted. The purpose of the SFA is to 1) verify and examine the lexical classifications proposed in the literature, 2) evaluate the relationship between the various lexical constructs and their representative measures, 3) identify the indices that best represent each construct, and 4) detect possible new structures/dimensions. Based on the analysis of the corpus, the study discusses the construct-distinctiveness of the lexical complexity constructs, as well as strong indicators of each conceptual/mathematical group among the measures. Finally, a unique, smaller set of measures representative of each construct is suggested for future studies that require measure selection.

Acknowledgments
We would like to thank the two anonymous reviewers for their valuable suggestions and comments.

Disclosure statement
No potential conflict of interest was reported by the author(s).

CRediT authorship contribution statement
Maryam Nasseri: Conceptualization, Data curation, Methodology, Data analysis and evaluation of findings, Project administration, Visualization, Writing: original draft, Writing: critical review & editing, Funding acquisition.
Philip McCarthy: Measure selection, Writing: critical review & editing, Funding acquisition.

Notes
1. The lexical sophistication measures in LCA-AW are filtered through the BAWE (British Academic Written English) corpus and its most frequently used academic writing words in linguistics and language studies, as well as general English frequency word lists based on the BNC (British National Corpus) or the ANC (American National Corpus).
2. LCA-AW and TAALED calculate their indices on lemma forms, while Coh-Metrix calculates the vocd-D index on word forms. In the latter case, lemmatized files can be used as the input to Coh-Metrix.
3. The R packages used in this study include psych (version 1.8.12; Revelle, 2018), lavaan (version 0.5-18; Rosseel, 2012) and corrplot (version 0.84; Wei & Simko, 2017). A minimal illustrative sketch using these packages follows the contributor notes below.

Funding
This study is part of the "Lexical Proficiency Grading for Academic Writing (FRG23-C-S66)" comprehensive research project funded by the American University of Sharjah (AUS).

Notes on contributors
Maryam Nasseri received her doctoral degree from the University of Birmingham (UK), where she worked on the application of statistical modelling, NLP, and corpus linguistics methods to lexical and syntactic complexity. She has received multiple awards and grants, including the ISLE 2020 grant for syntactic complexification in academic texts and the AUS 2023-26 research grant for statistical modelling and software design for lexical proficiency grading of academic writing. She has published in journals such as System, Journal of English for Academic Purposes (JEAP), and Assessing Writing, and has reviewed multiple articles and books for Taylor & Francis, Assessing Writing, and the Journal of Language and Education (JLE).
Philip McCarthy is an Associate Professor and discourse scientist specializing in software design and corpus analysis. His major interest is analyzing the English writing of students. His articles have been published in journals such as Discourse Processes, The Modern Language Journal, Written Communication, and Applied Psycholinguistics. McCarthy has been a teacher for 30 years, working in locations such as Turkiye, Japan, Britain, the US, and the UAE. He is currently the principal investigator of a project on lexical proficiency grading of academic writing funded by the American University of Sharjah (AUS).
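Illustrative analysis sketch
The measure-testing workflow summarized in the abstract (correlation screening, then CFA and EFA) can be pictured with a short R sketch built on the packages cited in Note 3. This is a minimal illustration under stated assumptions, not the authors' actual model specification: the data frame lex and the indicator names (LD, CWD, MTLD, vocd_D, MATTR, LS1, LS2) are hypothetical placeholders standing in for the 22 measures, and the factor assignments simply mirror the three constructs named in the abstract.

# Minimal illustrative sketch of the pipeline described in the abstract.
# Assumes a data frame `lex` with one text per row and one column per lexical
# complexity measure; all column names used here are hypothetical placeholders.

library(psych)     # exploratory factor analysis (fa)
library(lavaan)    # confirmatory factor analysis (cfa)
library(corrplot)  # visualizing the correlation matrix

# 1) Correlation screening: see which measures behave similarly / dissimilarly.
r_mat <- cor(lex, use = "pairwise.complete.obs")
corrplot(r_mat, method = "color", type = "upper")

# 2) Confirmatory factor analysis: test the three-construct structure
#    (density, diversity, sophistication) proposed in the literature.
cfa_model <- '
  density        =~ LD + CWD
  diversity      =~ MTLD + vocd_D + MATTR
  sophistication =~ LS1 + LS2
'
cfa_fit <- cfa(cfa_model, data = lex, std.lv = TRUE)
summary(cfa_fit, fit.measures = TRUE, standardized = TRUE)

# 3) Exploratory factor analysis: check for possible new structures/dimensions.
#    (varimax keeps the sketch dependency-free; an oblique rotation such as
#    oblimin may be preferable when the constructs are expected to correlate.)
efa_fit <- fa(lex, nfactors = 3, rotate = "varimax", fm = "ml")
print(efa_fit$loadings, cutoff = 0.4)

In a specification of this kind, the CFA fit indices speak to whether the literature-based three-construct classification holds, while the EFA loadings can reveal alternative groupings or new dimensions among the measures.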
Source journal metrics
CiteScore: 2.90 · Self-citation rate: 7.10% · Articles published: 7
Journal description: The Journal of Quantitative Linguistics is an international forum for the publication and discussion of research on the quantitative characteristics of language and text in an exact mathematical form. This approach, which is of growing interest, opens up important and exciting theoretical perspectives, as well as solutions for a wide range of practical problems such as machine learning or statistical parsing, by introducing into linguistics the methods and models of advanced scientific disciplines such as the natural sciences, economics, and psychology.