{"title":"Authorship Attribution via Occupancy-problem-type Indices","authors":"Lukun Zheng, Huiqiang Zheng, Chandra Kundu","doi":"10.1080/09296174.2022.2037276","DOIUrl":"https://doi.org/10.1080/09296174.2022.2037276","url":null,"abstract":"ABSTRACT In this paper, we propose a new methodology for authorship attribution based on a profile of indices related to the occupancy problem, called occupancy-problem indices. The occupancy problem has a long history and is an important example in standard textbooks like Feller (1971). We base our methodology on function words. We establish a testing procedure by constructing a confidence band of the occupancy-problem indices using the sampling distribution of the number of distinct function words. We validate our proposed methodology using controlled and constructed writing samples whose authorship is known. We then apply this methodology to explore the question of who wrote the 15th Oz book, whose authorship is disputed between Lyman Frank Baum (1856–1919) and his successor on the Oz series, Ruth Plumly Thompson (1891–1976).","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"30 1","pages":"27 - 41"},"PeriodicalIF":1.4,"publicationDate":"2022-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43211595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To Move or Not to Move: An Entropy-based Approach to the Informativeness of Research Article Abstracts across Disciplines","authors":"Wei Xiao, Li Li, Jin Liu","doi":"10.1080/09296174.2022.2037275","DOIUrl":"https://doi.org/10.1080/09296174.2022.2037275","url":null,"abstract":"ABSTRACT Research article (RA) abstracts succinctly and skilfully epitomize the core information of the full text and have thus attracted the attention of a number of scholars. While previous studies mainly focused on the rhetorical structures, meta-discursive features and lexico-grammatical features, few have made explorations from the perspective of information theory. To bridge this gap, the present study conducted an entropy-based analysis to explore the distribution pattern of information content across moves and the variations across disciplines. 318 RA abstracts across the natural sciences, social sciences and humanities (106 abstracts per discipline) were selected and three indices, i.e. the 1-/ 2-/ 3-gram entropies, were used to examine whether different indices yielded different features. The results show that in an RA abstract, the information content is unevenly distributed across moves; different entropy indices may reflect different linguistic properties; and both similarities and variations exist in information content across disciplines. These phenomena can be attributed to the functions of moves, the linguistic meanings of indices and disciplinary features. 
This study has implications for RA abstract writing instruction and practice, as well as for broadening the applications of quantitative linguistic methods into less touched fields.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"30 1","pages":"1 - 26"},"PeriodicalIF":1.4,"publicationDate":"2022-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48228211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Menzerath-Altmann Law in Consecutive and Simultaneous Interpreting: Insights into Varied Cognitive Processes and Load","authors":"Xinlei Jiang, Yue Jiang","doi":"10.1080/09296174.2022.2027657","DOIUrl":"https://doi.org/10.1080/09296174.2022.2027657","url":null,"abstract":"ABSTRACT Notwithstanding theoretical simulations of distinctive cognitive processes and load of consecutive (CI) and simultaneous interpreting (SI), quantitative linguistic inquiry into their outputs is needed for solid empirical evidence. As a fundamental law of quantitative linguistics, Menzerath–Altmann Law (MAL) mirrors the economic processing of linguistic information and complex dynamic language system. Given its extensive validation at various linguistic levels and predictive power of its parameters in register, language and authorship differentiation, MAL is worthy of being applied to interpreting studies. We endeavour to investigate whether interpreted languages follow the MAL and reveal varied cognitive load of CI versus SI, as manifested by different MAL fitting models. Results show that (1) both CI and SI outputs follow the MAL; (2) SI processing involves more diversified structural information and shows a greater tendency of shortening the clauses of a sentence with increased sentence length, than CI processing, expressed by significantly higher a and lower b in SI models than that in CI models. 
Our findings suggest that the disparate language representations are shaped by cognitive capacity limitations and interpreting modalities, and reveal how the language system dynamically re-regulates and reorganizes linguistic information to accommodate environmental settings from the perspective of synergetic linguistics.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"29 1","pages":"541 - 559"},"PeriodicalIF":1.4,"publicationDate":"2022-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45393907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Markov Models for Multi-state Language Change","authors":"F. Velde, Isabeau De Smet","doi":"10.1080/09296174.2021.1877004","DOIUrl":"https://doi.org/10.1080/09296174.2021.1877004","url":null,"abstract":"","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"29 1","pages":"314-338"},"PeriodicalIF":1.4,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2021.1877004","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59838234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Syntactic Complexity of Different Text Types: From the Perspective of Dependency Distance Both Linearly and Hierarchically","authors":"Ruina Chen, Sirui Deng, Haitao Liu","doi":"10.1080/09296174.2021.2005960","DOIUrl":"https://doi.org/10.1080/09296174.2021.2005960","url":null,"abstract":"ABSTRACT Dependency distance (DD) is a well-established measure of syntactic complexity. Previous studies largely focused on the linear dimension, mostly by means of mean dependency distance (MDD). In the present study, a new quantitative indicator, mean hierarchical dependency distance (MHDD), is proposed to discuss DD-related issues. Combining MHDD and MDD, the study investigates the syntactic complexity of different texts, using strictly length-controlled sentences of 12 text types from the Freiburg-Brown corpus of American English. Correlations of MHDD and MDD have been identified, and possible reasons are discussed from the mathematical and theoretical perspectives. Mathematically, one reason is that the numerator of MHDD overlaps with the denominator of MDD, both being (n-1) where n is the number of words in the sentence. The other is that the denominator of MHDD (maximum hierarchical layer: MAXHL) and the numerator of MDD (sum of DD: SOD) are positively correlated. We believe that it is the positive correlation of SOD and MAXHL that ensures the change of MDD and MHDD in the same direction. 
It is also worth noting that both MAXHL and SOD seem to be minimized at their respective data spectrum, which foreshadows the dependency distance minimization (DDM) tendency on the hierarchical dimension.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"29 1","pages":"510 - 540"},"PeriodicalIF":1.4,"publicationDate":"2021-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42565734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dependency Distance and Its Probability Distribution: Are They the Universals for Measuring Second Language Learners’ Language Proficiency?","authors":"Yuxin Hao, Xuelin Wang, Yanni Lin","doi":"10.1080/09296174.2021.1991684","DOIUrl":"https://doi.org/10.1080/09296174.2021.1991684","url":null,"abstract":"ABSTRACT Previous studies have shown that dependency distance and its probability distribution can be applied as syntactic indicators of English as interlanguage. However, the universal application of these indicators has not been verified from the perspective of language typology. The issues are addressed in the present study based on a treebank of Chinese interlanguage of English and Japanese native speakers. The findings are as follows: (1) with the improvement of L2 proficiency, the MDDs of learners with different native language backgrounds gradually approach that of the target language in different patterns, and dependency distance is of universal significance as a metric to measure the development of interlanguage’s syntactic complexity; (2) Chinese interlanguage also follows the principle of least effort, and its probability distribution of dependency distance, like those of natural languages, presents a power–law distribution, which can successfully fit the Zipf-Alekseev distribution; (3) the right truncated modified Zipf-Alekseev distribution can be used to measure Chinese interlanguage proficiency, and the fitting parameters of the probability distribution of dependency distance as a metric of interlanguage proficiency are also of universal value.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"29 1","pages":"485 - 509"},"PeriodicalIF":1.4,"publicationDate":"2021-11-17","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47417978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Zipfian Approach to Words in Contexts: The Cases of Modern English and Chinese","authors":"Jinzhou Cong","doi":"10.1080/09296174.2021.1926110","DOIUrl":"https://doi.org/10.1080/09296174.2021.1926110","url":null,"abstract":"ABSTRACT The system-level complexity of language has been thoroughly investigated in terms of Zipf’s law, whose quantitative features have proved to reflect text/language typology. This study extends the scope of Zipf’s law from the macroscopic scale of language to specific words in contexts, with the aim of examining its potential as an indicator of word typology. The focus is confined to the high-frequency words in English and Chinese as found in the FLOB and LCMC corpora. It has been found that the log–log rank-frequency distributions of contextual words of the words in question generally abide by the linear function y = ax+b. Moreover, it has been shown that an adjusted version of parameter a can help to distinguish the words in question’s classes. The contextual information as reflected by this Zipf-based index might be more important to the emergence of word classes of Chinese, which has no real inflection as a word-class indicator. From a Zipfian approach, the findings have preliminarily approved Saussure’s systems thinking regarding linguistic signs. 
Meanwhile, they may also contribute to such fields as usage-based linguistics.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"29 1","pages":"465 - 484"},"PeriodicalIF":1.4,"publicationDate":"2021-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2021.1926110","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47942307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Indicative/subjunctive Mood Alternation with Adverbs of Doubt in Spanish","authors":"Harunobu Hirota","doi":"10.1080/09296174.2021.1919376","DOIUrl":"https://doi.org/10.1080/09296174.2021.1919376","url":null,"abstract":"ABSTRACT This study aims to analyse the indicative/subjunctive mood alternation in Spanish sentences with adverbs of doubt (acaso, posiblemente, probablemente, quizá, quizás, tal vez, seguramente, a lo mejor, igual). To this end, this study statistically analysed the linguistic and social factors conditioning the mood alternation in sentences with adverbs of doubt. A total of 1278 tokens were analysed. Each datum was annotated with verb type, verb aspect, verb person, distance between the adverb and the verb, sex, age, region, and education level. To exclude confounding factors, multivariable logistic regression was conducted, and the analysis yielded significant odds ratios (ORs) for 10 items, including sex, region, education level, adverbs (posiblemente, probablemente, quizá, quizás, tal vez), aspect, and distance between the verb and the adverb. These results show that these adverbs can be divided into two groups, where posiblemente, probablemente, quizá, quizás, and tal vez are more likely to co-occur with the subjunctive than the adverbs acaso, seguramente, a lo mejor, and igual. Furthermore, this study has shown that each adverb differs in the likelihood of co-occurring with the subjunctive, and that social factors of speakers affect the mood selection. 
Thus, an analysis of mood alternations should include social and linguistic factors.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"29 1","pages":"450 - 464"},"PeriodicalIF":1.4,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2021.1919376","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46192868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Entropy of Morphological Systems in Natural Languages Is Modulated by Functional and Semantic Properties","authors":"Francesca Franzon, Chiara Zanini","doi":"10.1080/09296174.2022.2063501","DOIUrl":"https://doi.org/10.1080/09296174.2022.2063501","url":null,"abstract":"ABSTRACT In most natural languages, grammatical gender and number features encode semantic attributes concerning animacy, sex, and numerosity. Despite the likely advantage of promptly communicating about such salient attributes, inflectional systems rarely display consistently bijective correspondences between the semantic attributes and the grammatical feature values. In a study on Italian, we explored how this apparently noisy encoding depends on a trade-off between the semantic and the functional aspects of grammatical features. Using entropy metrics, we assessed the primarily functional purpose of gender and number features in the lexicon, observing a distribution of nouns that can optimally serve agreement-based parsing and prediction of words in sentences. A novel context entropy measure, introduced in this study to assess meaning specificity, revealed a semantic underspecification in masculine and singular nouns denoting animate referents. We argue that underspecification is the hallmark of the particular type of information compression occurring in inflectional systems. In binary inflectional systems, one value specifically encodes a semantic attribute, while the other value does not encode any semantic information, and surfaces as a default for functional purposes. 
By providing an information-theoretical account of the role of grammatical features, we set the basis for a scientifically informed pursuit of language inclusiveness.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"30 1","pages":"42 - 66"},"PeriodicalIF":1.4,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43283129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling the Dynamics of Language Change: Logistic Regression, Piotrowski’s Law, and a Handful of Examples in Polish","authors":"Rafal L. Górski, Maciej Eder","doi":"10.1080/09296174.2022.2151208","DOIUrl":"https://doi.org/10.1080/09296174.2022.2151208","url":null,"abstract":"ABSTRACT The study discusses modelling diachronic processes by logistic regression. The phenomenon of nonlinear changes in language was first observed by Raimund Piotrowski (hence labelled as Piotrowski’s law), even if actual linguistic evidence often speaks against using the notion of a ‘law’ in this context. In our study, we apply logistic regression models to changes which occurred between the 15th and 18th centuries in the Polish language. The attested course of the majority of these changes closely follows the expected values, which suggests that language change might indeed resemble a nonlinear phase change scenario. We also extend Piotrowski’s original approach by proposing polynomial logistic regression for those cases which can hardly be described by its standard version. Also, we propose to consider individual language change cases jointly, in order to inspect their possible collinearity or, more likely, their different dynamics as a function of time. Last but not least, we evaluate our results by testing the influence of the subcorpus size on the model’s goodness-of-fit.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"30 1","pages":"125 - 151"},"PeriodicalIF":1.4,"publicationDate":"2021-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43674217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}