{"title":"Word Length in Chinese: The Menzerath-Altmann Law is Valid After All","authors":"Tereza Motalová, Ján Mačutek, Radek Čech","doi":"10.1080/09296174.2023.2259937","DOIUrl":"https://doi.org/10.1080/09296174.2023.2259937","url":null,"abstract":"ABSTRACT According to the Menzerath-Altmann law, longer language constructs consist, on average, of shorter constituents. It is most often studied at the level of words and syllables (the mean syllable length gets shorter with increasing word length). Its validity at this level was corroborated in several languages. However, it was claimed that Chinese is an exception with respect to the validity of the Menzerath-Altmann law. We show that the law is valid if word types are considered, while the behaviour of word tokens is different. This difference can be explained by the fact that the Zipf law of abbreviation is valid not only for words but also for syllables (shorter syllables are used more frequently). KEYWORDS: word length; Menzerath-Altmann law; Chinese; syllable; Chinese characters. Acknowledgments: The work was supported by the European Regional Development Fund Project “Sinophone Borderlands – Interaction at the Edges”, CZ.02.1.01/0.0/0.0/16_019/0000791 (T. Motalová), VEGA 2/0096/21 (J. Mačutek), APVV-21-0216 (J. Mačutek), and Operational Programme Integrated Infrastructure (OPII) for the project 313011BWH2: “InoCHF – Research and development in the field of innovative technologies in the management of patients with CHF”, co-financed by the European Regional Development Fund (J. Mačutek). Disclosure statement: No potential conflict of interest was reported by the author(s). Notes: 1. A more general formula with an additional parameter c, y(x) = a x^b e^{cx}, is sometimes used; see e.g. Mačutek et al. (2019). 2. The MAL has also found its place in research areas outside of human language, such as 
music (Boroda & Altmann, 1991), animal communication (Gustison et al., 2016), and genome structure (Ferrer-I-Cancho et al., 2014). The ‘common denominator’ of these branches of science is that they study information flow (in a very general sense). 3. Syllable length was measured in moras, not in phonemes. 4. In some of the papers cited in this paragraph, the mean syllable length is expressed in the number of graphemes rather than phonemes. The mean syllable length is quite similar for both choices in languages with shallow orthographies (Coulmas, 2002). 5. Erization is an addition of the r-suffix (儿) to a syllable, e.g. 花 huā becomes 花儿 huār (‘flower’). Moreover, there are a few singular exceptions of polysyllabic characters in Chinese. Qiu (2000, pp. 26, 406) mentions 瓩 qiānwǎ ‘kilowatt’, 浬 hǎilǐ ‘nautical mile’, and 哩 yīnglǐ ‘English mile’ (none of these words occurs in our language material). 6. Xin Han-Da cidian – Das neue Chinesisch-Deutsche Wörterbuch, 1985. Commercial Press, Beijing. 7. In fact, one can speak about phonological words here, see e.g. Hall (1999) or Zsiga (2013, pp. 342–346). Thus, this approach can be considered a study of the MAL on the level of words, albeit from a slightly different perspective. 8. Lengths of stress units ranged between 1 and 18 syllables while in the case of rhythmic segm","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135584918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural Factor Analysis of Lexical Complexity Constructs and Measures: A Quantitative Measure-Testing Process on Specialised Academic Texts","authors":"Maryam Nasseri, Philip McCarthy","doi":"10.1080/09296174.2023.2258782","DOIUrl":"https://doi.org/10.1080/09296174.2023.2258782","url":null,"abstract":"ABSTRACT This study evaluates 22 lexical complexity measures that represent the three constructs of density, diversity and sophistication. The selection of these measures stems from an extensive review of the SLA linguistics literature. All measures were subjected to qualitative screening for indicators/predictors of lexical proficiency/development and criterion validity based on the body of scholarship. This study’s measure-testing process begins by dividing the selected measures into two groups, similarly calculated and dissimilarly calculated, based on their quantification methods and the results of correlation tests. Using a specialized corpus of postgraduate academic texts, a Structural Factor Analysis (SFA) comprising a Confirmatory Factor Analysis (CFA) and Exploratory Factor Analysis (EFA) is then conducted. The purpose of SFA is to 1) verify and examine the lexical classifications proposed in the literature, 2) evaluate the relationship between various lexical constructs and their representative measures, 3) identify the indices that best represent each construct and 4) detect possible new structures/dimensions. Based on the analysis of the corpus, the study discusses the construct-distinctiveness of lexical complexity constructs, as well as strong indicators of each conceptual/mathematical group among the measures. Finally, a unique and smaller set of measures representative of each construct is suggested for future studies that require measure selection. 
Acknowledgments: We would like to thank the two anonymous reviewers for their valuable suggestions and comments. Disclosure statement: No potential conflict of interest was reported by the author(s). Credit authorship contribution statement: Maryam Nasseri: Conceptualization, Data curation, Methodology, Data analysis and evaluation of findings, Project administration, Visualization, Writing: original draft, Writing: critical review & editing, Funding acquisition. Philip McCarthy: Measure-selection, Writing: critical review & editing, Funding acquisition. Notes: 1. The lexical sophistication measures in LCA-AW are filtered through the BAWE (British Academic Written English) corpus and its most-frequently-used academic writing words used in linguistics and language studies as well as the general English frequency word lists based on the BNC (the British National Corpus) or ANC (American National Corpus). 2. LCA-AW and TAALED calculate the indices based on lemma forms while Coh-Metrix calculates the vocd-D index based on word forms. In the latter case, lemmatized files can be used as the input to Coh-Metrix. 3. 
The R packages used in this study include psych (version 1.8.12, Revelle, 2018), lavaan (version 0.5–18, Rosseel, 2012) and corrplot (version 0.84, Wei & Simko, 2017). Additional information. Funding: This study is part of the “Lexical Proficiency Grading for Academic Writing (FRG23-C-S66)” comprehensive research granted by the American University of Sharjah (AUS). Notes on cont","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135935652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Words and Numbers. In Memory of Peter Grzybek (1957-2019), edited by Emmerich Kelih and Reinhard Köhler, Lüdenscheid, RAM-Verlag, 2020, 248 pp., ISBN 978-3-942303-89-7, 55,00 EUR for the paperback version","authors":"Mengge Wang","doi":"10.1080/09296174.2023.2262696","DOIUrl":"https://doi.org/10.1080/09296174.2023.2262696","url":null,"abstract":"","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136102652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lexical Features and Psychological States: A Quantitative Linguistic Approach","authors":"Xiaowei Du","doi":"10.1080/09296174.2023.2256211","DOIUrl":"https://doi.org/10.1080/09296174.2023.2256211","url":null,"abstract":"ABSTRACT In recent decades, there has been an increasing interest in the relation between lexical features and texts of psychological states. Previous studies demonstrated that some lexical features varied significantly among the texts of psychological states. However, the lexical features at the textual level have received little attention. This paper extends this work by examining the performance of quantitative linguistic indices in classifying texts of psychological issues. A large dataset of forum posts including texts of anxiety, depression, suicide ideation, and normal states was tested with Machine Learning algorithms. The results revealed that the quantitative linguistic indices with Machine Learning algorithms achieved a high level of success in identifying psychological states. Meanwhile, some quantitative linguistic indices, namely, ALT and Writer’s view, may extract adequate lexical features for classifying texts of different psychological states. The study is probably the first attempt to use quantitative linguistic indices as lexical features to detect texts of psychological states, and the findings may contribute to our understanding of how accuracy may be enhanced in the identification of various psychological states. Finally, the implications of these findings are discussed. Publisher’s Note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. 
Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher. Acknowledgments: We thank the JQL referees and the editors for their insightful comments. Their suggestions have significantly enhanced the quality of the initial manuscripts. Disclosure Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Data Availability Statement: Publicly available datasets were analysed in this study. We used AlMosaiwi and Johnstone’s (2018) dataset, which can be accessed at https://doi.org/10.6084/m9.figshare.4743547.v1. Supplemental data: Supplemental data for this article can be accessed online at https://doi.org/10.1080/09296174.2023.2256211. Notes: 1. The dataset can be accessed at https://doi.org/10.6084/m9.figshare.4743547. Additional information. Funding: This study was supported by “the Fundamental Research Funds for the Central Universities” (Grant No. 3132023331).","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135729302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effects of Word Limit on Sentence Length and Clause Length in Academic Journal Article Abstracts: A Synergetic Linguistic Perspective","authors":"Yue Li, Yuan Gao, Xiaofei Lu","doi":"10.1080/09296174.2023.2263249","DOIUrl":"https://doi.org/10.1080/09296174.2023.2263249","url":null,"abstract":"ABSTRACT Several studies have sought to characterize the syntactic features of research articles (RAs) and their part-genres. However, no study has examined the interrelation between different syntactic components (e.g. sentences and clauses) in the RA genre as a function of interacting internal and external factors (e.g. word limit) from a synergetic linguistic perspective. This study contributes to this line of research by investigating the effects of word limit (i.e. the restriction on the number of words used) on the length of sentences and clauses in RA abstracts. Our results show that RA abstracts contain significantly more longer sentences and clauses than the main body of RAs, but longer sentences in RA abstracts tend to have shorter constituting clauses, indicating that the Menzerath-Altmann Law is at play. Such an interrelation between sentence and clause length helps ensure a cognitively balanced system. Our findings have implications for the need to explore the interrelation between syntactic components emergent from the synergetic interactions of internal and external factors. KEYWORDS: Academic journal article abstract; Menzerath-Altmann Law; sentence-clause interrelation; synergetic linguistics; word limit. Acknowledgments: We appreciate the editors and anonymous reviewers for their constructive comments and suggestions. Disclosure statement: No potential conflict of interest was reported by the author(s). Notes: 1. We balanced AJAA and AJAB in terms of word tokens in this study. 
One reviewer recommended calculating the ratio of mean sentence (and clause) length for each abstract-body pair for the 26 RAs represented in the AJAB corpus and subsequently computing a mean ratio along with its 95% confidence interval. The results of this analysis are summarized in Appendix C. These results reveal similar patterns of differences as those reported in Table 2, with RA abstracts containing slightly longer sentences and slightly shorter clauses than RA bodies along with less variation, although the results appear inconclusive, possibly partially due to the relatively small number of pairs analysed and the smaller number of sentences in each abstract than in each body. 2. We balanced AJAA and AJAB in terms of word tokens in this study. One reviewer recommended running the MAL fitting analysis on the 26 abstracts and bodies of the RAs represented in AJAB for comparison purposes. Appendix D presents the mean clause length (measured in words) for sentences with different lengths in the 26 abstracts and bodies of the RAs represented in AJAB, and Appendix E presents the MAL fitting results on these abstracts and bodies. Similar to the results presented in Table 5, the coefficients of determination were larger than 0.9 for both corpora, with the RA abstracts showing a larger coefficient (0.9637 vs. 0.9380). 
Different from the results in Table 5, the F value for the RA abstracts did not reach statistical significance, and the b value for the RA abstracts was larger tha","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135829946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Structural Complexity of Chinese Words and Its Relationship with Word Frequency","authors":"Xinpei Hong, Wei Huang, Haitao Liu","doi":"10.1080/09296174.2023.2231743","DOIUrl":"https://doi.org/10.1080/09296174.2023.2231743","url":null,"abstract":"","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47269921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zipf’s Law for Speech Acts in Spoken English","authors":"Dang Qi, Hua Wang","doi":"10.1080/09296174.2023.2202470","DOIUrl":"https://doi.org/10.1080/09296174.2023.2202470","url":null,"abstract":"","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45872910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unifying Models for Word Length Distributions Based on Types and Tokens","authors":"Peter Zörnig, T. Berg","doi":"10.1080/09296174.2023.2202061","DOIUrl":"https://doi.org/10.1080/09296174.2023.2202061","url":null,"abstract":"ABSTRACT Word length studies have been one of the central issues in Quantitative Linguistics for a long time. Most models were constructed for very specific purposes, i.e. the individual models apply only to a specific language, only to token counts or only to type counts. The present paper takes up the challenge of developing unifying models which account for both type and token frequencies of a moderately large sample of languages (eight Indo-European and two non-Indo-European languages). We introduce three models which can be well fitted to all our data: the exponentiated Hyper-Poisson distribution, the generalized gamma and the Sichel distribution. We also discuss the possibility of interpreting the model parameters linguistically.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47161954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synergetic Properties of Lexical Structures in Chinese and English","authors":"Jieqiang Zhu, Jingyang Jiang","doi":"10.1080/09296174.2023.2213107","DOIUrl":"https://doi.org/10.1080/09296174.2023.2213107","url":null,"abstract":"ABSTRACT The synergetic lexical model provides a unique framework for exploration of the interrelationships between the lexical properties of languages. Previous studies concerning several properties of this lexical model have yielded many successful fitting results, but very few studies have investigated synonymy, a major property of this model. The present study uses 825 Chinese and 848 English tokens retrieved from Chinese and English corpora, dictionaries, and thesauri to conduct a contrastive study on the interrelations between four major properties of this lexical model: word length, word frequency, polysemy, and synonymy. The successful fittings of both languages demonstrate the cross-linguistic validity of the synergetic lexical model, though English belongs to the Germanic language family, while Chinese, a highly analytical language, is of the Sino-Tibetan language family. Moreover, our analysis of the parameters of the fitting results shows that, compared to English, Chinese possesses a greater resistance to shortening word length and a quicker response to semantic change.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48436195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Corpus-Based Study of the Distributions of Adnominals Across Registers and Disciplines","authors":"Yiyang Hu, Qingshun He","doi":"10.1080/09296174.2023.2209487","DOIUrl":"https://doi.org/10.1080/09296174.2023.2209487","url":null,"abstract":"ABSTRACT Adnominals are an important resource of noun modification in written registers, especially in academic writing. This study compares the frequencies of adjectival adnominals and nominal adnominals across two registers (Fiction and Academic writing) by calculating T-values and conducting Welch’s t-tests on the adnominal subtypes. It is found that the preference for nominal adnominals exists in both registers and that the mean frequencies of adjectival adnominals, premodifying nouns and postmodifying nouns increase as the register moves from Fiction to Academic writing. We further investigate the frequencies of adnominals in the research article abstracts across three disciplinary groups by conducting Welch’s ANOVA test. No significant difference is revealed in T-values in the research article abstracts across disciplines. The difference of adjectival adnominals, nouns as postmodifiers and appositive nouns lacks practical applications, while the effects of disciplines on the frequency of premodifying nouns cannot be rejected. It is the mean frequencies of premodifying nouns that show the significant difference in the research article abstracts across disciplines. 
Premodifying nouns are more prevalent in hard science texts than in soft science texts.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45089987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}