{"title":"Word Length Distribution in German Texts during the 17th-19th Century","authors":"Fei Lian, Y. Li","doi":"10.1080/09296174.2019.1662536","DOIUrl":null,"url":null,"abstract":"ABSTRACT Word length in German texts has been a frequently discussed issue in the field of quantitative linguistics. Taking an overall view of the existing research data, however, most of the research focuses on literary texts and private letters and the size of data corpus for each research is relatively small. This paper provides a time- and genre-based analysis of word length distribution in German using 360 texts originated between the 17th and 19th centuries, aiming to find a probability distribution that can capture well the German word length distribution from a diachronic perspective and to reveal the relationship between the word length distribution and boundary conditions such as the genre and the creation time of text. Results indicate that the word length distribution in German texts written in different eras abides by the 1-displaced hyper-Poisson distribution, whose parameters (a, b) are interconnected with boundary conditions. This study corroborates that the word length distribution of a certain language is consistent, due to the constraint of the cognitive mechanism. Besides, the parameters of probability distribution can be good indicators of the writing style as well as the creation time of text.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":0.7000,"publicationDate":"2019-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1662536","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Quantitative Linguistics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1080/09296174.2019.1662536","RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}
引用次数: 5
Abstract
ABSTRACT Word length in German texts has been a frequently discussed issue in the field of quantitative linguistics. Taking an overall view of the existing research data, however, most of the research focuses on literary texts and private letters and the size of data corpus for each research is relatively small. This paper provides a time- and genre-based analysis of word length distribution in German using 360 texts originated between the 17th and 19th centuries, aiming to find a probability distribution that can capture well the German word length distribution from a diachronic perspective and to reveal the relationship between the word length distribution and boundary conditions such as the genre and the creation time of text. Results indicate that the word length distribution in German texts written in different eras abides by the 1-displaced hyper-Poisson distribution, whose parameters (a, b) are interconnected with boundary conditions. This study corroborates that the word length distribution of a certain language is consistent, due to the constraint of the cognitive mechanism. Besides, the parameters of probability distribution can be good indicators of the writing style as well as the creation time of text.
期刊介绍:
The Journal of Quantitative Linguistics is an international forum for the publication and discussion of research on the quantitative characteristics of language and text in an exact mathematical form. This approach, which is of growing interest, opens up important and exciting theoretical perspectives, as well as solutions for a wide range of practical problems such as machine learning or statistical parsing, by introducing into linguistics the methods and models of advanced scientific disciplines such as the natural sciences, economics, and psychology.