Text length effects on the reliability of syntactic complexity indices
Authors: Hyun-Bin Hwang, Charlene Polio
Journal: Research Methods in Applied Linguistics, volume 2, issue 3, Article 100085
Publication type: Journal Article
Published: 2023-10-07
DOI: 10.1016/j.rmal.2023.100085
URL: https://www.sciencedirect.com/science/article/pii/S2772766123000459
Citation count: 0
Abstract
Automated tools are widely used to assess syntactic complexity in second language (L2) writing studies; however, the effects of text length on syntactic complexity indices remain unclear. This can pose a challenge when studying underrepresented populations (e.g., young learners, adults with limited literacy skills), as their lower proficiency may result in less text production. To address this issue, we investigated the minimum text length threshold at which automated measures of syntactic complexity become the most reliable while considering L2 proficiency and prompt topic. Essays from the ICNALE corpus, a dataset of 5,200 essays with four proficiency levels, were used to create a dataset of texts of varying lengths (50, 100, 150, and 200 words). Mixed-effects regression models showed that seven out of 14 indices were not affected by text length regardless of learner proficiency and prompt topic. The other seven differed only between the 50- and 200-word texts within intermediate levels. We suggest a minimum of 100 words as a conservative threshold for the reliability of syntactic complexity indices. Finally, we emphasize the importance of transparent reporting practice regarding text length information.
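The length manipulation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the shorter texts were derived from each essay, so this example assumes the simplest option: keeping the first N whitespace-delimited words. The helper names (`truncate_to_words`, `build_length_variants`, `mean_sentence_length`) are hypothetical, and mean sentence length is only a toy stand-in for the 14 indices the study computed with automated tools.

```python
import re

# Word counts named in the abstract.
TARGET_LENGTHS = (50, 100, 150, 200)


def truncate_to_words(text, n_words):
    """Return the first n_words whitespace-delimited tokens.

    Returns None when the text is shorter than n_words.
    ASSUMPTION: truncation from the start of the essay; the abstract
    does not state how the shorter texts were actually produced.
    """
    tokens = text.split()
    if len(tokens) < n_words:
        return None
    return " ".join(tokens[:n_words])


def build_length_variants(essay):
    """Map each target length to a truncated version of the essay.

    Lengths the essay cannot supply are skipped.
    """
    variants = {}
    for n in TARGET_LENGTHS:
        truncated = truncate_to_words(essay, n)
        if truncated is not None:
            variants[n] = truncated
    return variants


def mean_sentence_length(text):
    """Toy index: mean words per sentence, splitting on '.', '!' and '?'.

    ASSUMPTION: a placeholder, not one of the study's actual indices.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


if __name__ == "__main__":
    # Placeholder essay of 150 words (30 repetitions of a 5-word sentence).
    essay = "I agree with this statement. " * 30
    for n, text in build_length_variants(essay).items():
        print(n, mean_sentence_length(text))
```

Computing the same index on every length variant of the same essay gives the repeated-measures layout that a mixed-effects model needs, with essay (or writer) as a grouping factor and text length, proficiency, and prompt topic as predictors. Note that truncating at a fixed word count can cut the final sentence short, which may itself distort sentence-level indices at small N.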