{"title":"Word Length Distribution in German Texts during the 17th-19th Century","authors":"Fei Lian, Y. Li","doi":"10.1080/09296174.2019.1662536","DOIUrl":"https://doi.org/10.1080/09296174.2019.1662536","url":null,"abstract":"ABSTRACT Word length in German texts has been a frequently discussed issue in the field of quantitative linguistics. Taken as a whole, however, most existing research focuses on literary texts and private letters, and the corpus used in each study is relatively small. This paper provides a time- and genre-based analysis of word length distribution in German using 360 texts originating between the 17th and 19th centuries, aiming to find a probability distribution that captures the German word length distribution well from a diachronic perspective and to reveal the relationship between the word length distribution and boundary conditions such as the genre and the creation time of a text. Results indicate that the word length distribution in German texts written in different eras follows the 1-displaced hyper-Poisson distribution, whose parameters (a, b) are interconnected with boundary conditions. This study corroborates that the word length distribution of a given language is consistent, due to the constraint of the cognitive mechanism. 
In addition, the parameters of the probability distribution can be good indicators of the writing style as well as the creation time of a text.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"28 1","pages":"117 - 137"},"PeriodicalIF":1.4,"publicationDate":"2019-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1662536","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42201015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistics in Corpus Linguistics: A Practical Guide","authors":"Cunxin Han","doi":"10.1080/09296174.2019.1646069","DOIUrl":"https://doi.org/10.1080/09296174.2019.1646069","url":null,"abstract":"Corpus linguistics is a powerful quantitative methodology, which heavily relies on frequency data and statistical procedures. It is difficult to talk about corpus linguistics without mentioning sta...","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"27 1","pages":"379 - 383"},"PeriodicalIF":1.4,"publicationDate":"2019-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1646069","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46755901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Coding and the Origins of Zipfian Laws","authors":"R. Ferrer-i-Cancho, C. Bentz","doi":"10.1080/09296174.2020.1778387","DOIUrl":"https://doi.org/10.1080/09296174.2020.1778387","url":null,"abstract":"ABSTRACT The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding – under an arbitrary coding scheme – and show that it predicts Zipf’s law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf’s law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf’s rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost cutting considerations. 
Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws more generally as well as other linguistic laws.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"29 1","pages":"165 - 194"},"PeriodicalIF":1.4,"publicationDate":"2019-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2020.1778387","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47778723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correlations and Potential Cross-Linguistic Indicators of Writing Style","authors":"P. Juola, George K. Mikros, Sean Vinsick","doi":"10.1080/09296174.2018.1458395","DOIUrl":"https://doi.org/10.1080/09296174.2018.1458395","url":null,"abstract":"Abstract In this paper, we present preliminary results on how an individual’s writing style persists even across languages. In other words, what aspects of an individual’s writing will persist irrespective of the language in which he or she writes? We argue that cognitive and social traits are likely to persist and demonstrate this by two separate analyses of bilingual corpora using the same individuals. We show that for various measures of linguistic complexity (which we consider to be a cognitive variable) and participation in specific social conventions (a social one), the correlation between scores on the two languages studied is significantly higher than would be expected by chance. We argue that this type of correlation may permit cross-linguistic authorship attribution.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"26 1","pages":"146 - 171"},"PeriodicalIF":1.4,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1458395","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46316258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequency Effect and Neutralization of Tones in Mandarin Chinese","authors":"Huifang Kong, Shengyi Wu","doi":"10.1080/09296174.2018.1452140","DOIUrl":"https://doi.org/10.1080/09296174.2018.1452140","url":null,"abstract":"Abstract Tonal neutralization in Mandarin has long been thought to be connected with lexical frequency. But this has never been investigated quantitatively because of the methodological challenge. In this study, a production experiment was run with speakers reading disyllabic words in neutral tones with frequency estimates derived from a Frequency Dictionary. The dependent measures were three acoustic correlates: duration, F0 contour and intensity. Independent measures included the lexical frequency at three levels (low, middle and high). Regression analysis showed that neutralization of tones is directly correlated with lexical frequency independent of other factors. A regularity (the more frequent, the shorter in duration; the more frequent, the lower in pitch; the more frequent, the weaker in intensity) governs the neutralization of tones in reduced syllables. However, the exact shape of such an effect displays a different scenario in a different frequency range. Only high frequency words display a significant difference from low frequency words. 
Last but not least, an exemplar representation is proposed to express a neutral tone’s observed frequency effect naturally.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"26 1","pages":"95 - 115"},"PeriodicalIF":1.4,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1452140","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45619077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Calculation of Semantic Distances Between Words: From Synonymy to Antonymy","authors":"M. Vakulenko","doi":"10.1080/09296174.2018.1452524","DOIUrl":"https://doi.org/10.1080/09296174.2018.1452524","url":null,"abstract":"Abstract A new approach to numerically measuring the semantic distances between lexical units (words and collocations), based on geometric analogies and analytical calculation, is put forward. Having considered the cases of equal and different weights of semes, we obtained exact algebraic formulas describing different levels of meaning proximity, ranging from absolute synonymy to full antonymy. It was emphasized that absolute synonymy arises when the compared units contain equal numbers of semes that fully coincide and have equal weights in the corresponding pairs. Calculation of the seme weights helps to locate the unit more precisely on the semantic sphere. It was shown that the level of synonymy and antonymy decreases if different semes are accentuated, while the semantic distance between the units without identical semes cannot be influenced by seme boosting. It was observed that depending on the context, a word may wander over this sphere, thus modifying its lexical semantic relations with other units. As the proposed approach contributes to formalization of the unit comparison procedure, it is advisable for incorporation into relevant automatic tools, particularly into WordNet and FrameNet. 
The obtained results may be useful for various linguistic and associated studies including automatic text analysis and processing, computer lexicography, information search and retrieval, machine translation and other NLP applications that are related to the artificial intelligence problem.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"26 1","pages":"116 - 128"},"PeriodicalIF":1.4,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1452524","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47278957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Study of the Parameters in the Creation and Comparison of Feature Vectors in Distributional Semantic Models","authors":"A. Dobó, J. Csirik","doi":"10.1080/09296174.2019.1570897","DOIUrl":"https://doi.org/10.1080/09296174.2019.1570897","url":null,"abstract":"ABSTRACT Measuring the semantic similarity and relatedness of words can play a vital role in many natural language processing tasks. Distributional semantic models computing these measures can have many different parameters, such as different weighting schemes, vector similarity measures, feature transformation functions and dimensionality reduction techniques. Despite their importance there is no truly comprehensive study simultaneously evaluating the numerous parameters of such models, while also considering the interaction of these parameters with each other. We would like to address this gap with our systematic study. Taking the necessary distributional information extracted from the chosen dataset as already granted, we evaluate all important aspects of the creation and comparison of feature vectors in distributional semantic models. Testing altogether 10 parameters simultaneously, we try to find the best combination of parameter settings, with a large number of settings examined in case of some of the parameters. 
Besides evaluating the conventionally used settings for the parameters, we also propose numerous novel variants, as well as novel combinations of parameter settings, some of which significantly outperform the combinations of settings in general use, thus achieving state-of-the-art results.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"27 1","pages":"244 - 271"},"PeriodicalIF":1.4,"publicationDate":"2019-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1570897","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48204474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Systemic Dynamics Model of Text Production","authors":"Giacomo P. Figueredo, G. Figueredo","doi":"10.1080/09296174.2019.1567301","DOIUrl":"https://doi.org/10.1080/09296174.2019.1567301","url":null,"abstract":"ABSTRACT This paper introduces a quantitative model of text as it unfolds in time. The model conceptualizes text as a functional unit of language. This organization can be difficult to identify because it forms complex patterns of linguistic laws, probability and dynamics. These patterns are covert configurations and need complex methods to be investigated. One such method is to draw from qualitative frameworks derived from the quantitative properties of language. Previous studies have shown that covert configurations can be obtained through qualitative frameworks. When dynamics is considered, however, a model of text production including the variable time is needed. This paper therefore aims at addressing this research gap by proposing a dynamics model of text unfolding. It draws from systemic theory and models its categories quantitatively. Time is introduced as variation of choice. The model is applied to a sample of text. Results show how individual choices contribute to text unfolding – describing the amount of meanings at any given moment in text time. 
In addition, the dynamic accumulation indicates core characteristics of a text, which can be further explored in text behaviour and typology.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"27 1","pages":"291 - 320"},"PeriodicalIF":1.4,"publicationDate":"2019-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1567301","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59838178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative Approaches to the Russian Language","authors":"E. Kelih","doi":"10.1080/09296174.2018.1558834","DOIUrl":"https://doi.org/10.1080/09296174.2018.1558834","url":null,"abstract":"The omnibus volume under review comprises 10 individual chapters by 22 authors, thus most of the chapters are co-authored. This seems to reflect the overall interdisciplinary approach focus of the ...","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"27 1","pages":"80 - 83"},"PeriodicalIF":1.4,"publicationDate":"2019-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1558834","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43913979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the ‘Stickiness’ of Words. A Comparative Language Study Screening the Internet for English, German, French and Latin Phrases","authors":"M. Berger","doi":"10.1080/09296174.2018.1451206","DOIUrl":"https://doi.org/10.1080/09296174.2018.1451206","url":null,"abstract":"Abstract Language, one of the defining attributes of Homo sapiens, not only deploys as a chain of words. Rather, words group together in a non-random way to form phrases. Here, the world-wide web was searched for idiomatic expressions in three living and one extinct language: 1102 English, 1183 German, 1138 French and 1128 Latin phrases distributed into three categories, with high, middle and low frequencies. High-frequency phrases such as in addition to and as a matter of fact constituted 49.5% of all English phrases, but only 9.0% of the French and 2.5% of the German ones. The middle-frequency category with classical idioms such as a bitter pill or carved in stone comprised 34.9% of the English, 33.0% of the French, and 24.9% of the German phrases. Most French and German phrases were of low frequency. Latin phrases were found as often as French and more often than German ones in the world-wide web, and exhibited a frequency distribution similar to those of French and German. Frequency distributions yielded three main categories around similar maxima for all four languages, with differing relative proportions. 
The internet may prove useful for the quantitative comparison of languages.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"26 1","pages":"81 - 94"},"PeriodicalIF":1.4,"publicationDate":"2019-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1451206","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44814266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}