{"title":"Using Rank-Frequency and Type-Token Statistics to Compare Morphological Typology in the Celtic Languages","authors":"Andrew Wilson, Rosie Harvey","doi":"10.1080/09296174.2018.1560122","DOIUrl":"https://doi.org/10.1080/09296174.2018.1560122","url":null,"abstract":"ABSTRACT Previous work has used Greenberg’s synthetism index to compare three of the Celtic languages – Irish, Welsh, and Breton – but not the other three languages, namely Scottish Gaelic, Manx, and Cornish. This paper extends this earlier work by comparing all six Celtic languages, including two periods of Irish (Early Modern and Present Day). The analysis is based on a random sample of 210 parallel psalm texts (30 for each language). However, Greenberg’s synthetism index is problematic because there are no operational standards for counting morphemes within words. We therefore apply a newer typological indicator (B7), which is based solely on lexical rank-frequency statistics. We also explore whether type-token counts alone can provide similar information. The B7 indicator shows that both varieties of Irish, together with Welsh and Cornish, tend more towards synthetism, whereas Manx tends more towards analytism. Breton and Scottish Gaelic do not show a clear tendency in either direction. 
Rankings using type-token statistics vary considerably and do not tell the same story.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2020-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1560122","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41466832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multifactorial Analysis of Concessive Clause Positioning","authors":"Huimyung Kang, Jiajin Xu","doi":"10.1080/09296174.2020.1737488","DOIUrl":"https://doi.org/10.1080/09296174.2020.1737488","url":null,"abstract":"ABSTRACT Previous works have identified multiple factors and their interplay that condition the positioning of the concessive adverbial clauses. This study continues this line of research by 1) focusing exclusively on the positioning of although-led concessive adverbial clauses (although-clauses hereafter) among different concessive clause relations; 2) supplementing the factor set with more linguistic features, such as sentence-initial adverbials and hedging terms; and, 3) extending and generalizing the scope of competition among semantic, discoursal and processing motivators to a higher-level competition between ‘clarity’ and ‘processability’. Data were retrieved from 1,738 concessive sentences of student argumentative essays from the BAWE and NESSIE corpora. Models were generated based on binary logistic regression and random forests. The results show that the motivator of the relationship between the although-clauses and their main clauses was the most significant variable in all models, denoting its priority in conditioning concessive clause positioning, under the Competition Model framework. Subordinate clause complexity and deranking (i.e. clauses that do not have a full verb) were the least significant among all motivating factors. 
Overall, clarity-related motivators outweigh processability-related ones, prioritizing clear meaning-conveying in competition with processing motivators.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2020-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2020.1737488","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44089175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Transitional Areas in Dialectology: Approach with Fuzzy Logic","authors":"Gotzon Aurrekoetxea, Aitor Iglesias, E. Clua, I. Usobiaga, M. Salicrú","doi":"10.1080/09296174.2020.1732765","DOIUrl":"https://doi.org/10.1080/09296174.2020.1732765","url":null,"abstract":"ABSTRACT Comparing the dialectal classifications into disjointed zones with the representation of populations in a geolectal continuum has emphasized the importance of transition regions. Identifying these regions has been the subject of study in the scientific literature, although research has not been conducted in a reliable manner. Based on the Basque ‘Bourciez’ Corpus, we have highlighted the limitations of dialectal classifications using deterministic methods along with the possibilities provided by fuzzy logic. By contributing objectivity to the analysis, the C-means classification has allowed us to retain information from the deterministic classification, identify transition regions, emphasize the geolectal continuum and minimize the artificial isolation of certain populations in the classification. Classifying the French-Basque territory into two groups has separated the populations into two nearly-disjointed dialectal zones. Classifications into three and four groups have underscored the broad overlap between adjacent linguistic zones. This paper’s contribution has provided a new explanatory dimension and consequently improves the linguistic interpretation. 
In this sense, the results are in accordance with the previous contributions described in the literature and have justified the integration of different viewpoints.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2020-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2020.1732765","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47674982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Methodology to Measure the Diachronic Language Distance between Three Languages Based on Perplexity","authors":"José Ramom Pichel, Pablo Gamallo, I. Alegria, Marco Neves","doi":"10.1080/09296174.2020.1732177","DOIUrl":"https://doi.org/10.1080/09296174.2020.1732177","url":null,"abstract":"ABSTRACT The aim of this paper is to apply a corpus-based methodology, based on the measure of perplexity, to automatically calculate the cross-lingual language distance between historical periods of three languages. The three historical corpora have been constructed and collected with the closest spelling to the original on a balanced basis of fiction and non-fiction. This methodology has been applied to measure the historical distance of Galician with respect to Portuguese and Spanish, from the Middle Ages to the end of the 20th century, both in original spelling and automatically transcribed spelling. The quantitative results are contrasted with hypotheses extracted from experts in historical linguistics. Results show that Galician and Portuguese are varieties of the same language in the Middle Ages and that Galician converges and diverges with Portuguese and Spanish since the last period of the 19th century. In this process, orthography plays a relevant role. 
It should be pointed out that the method is unsupervised and can be applied to other languages.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2020.1732177","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47311819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prose, Verse and Authorship in Dream of the Red Chamber: A Stylometric Analysis","authors":"Haoran Zhu, L. Lei, Hugh Craig","doi":"10.1080/09296174.2020.1724677","DOIUrl":"https://doi.org/10.1080/09296174.2020.1724677","url":null,"abstract":"ABSTRACT In this study, we provide a quantitative analysis of prose and verse in the classical Chinese novel, Dream of the Red Chamber (DRC), and discuss the implications for the disputed authorship of the novel. Firstly, we examine the amount of verse across the chapters of DRC, and compare the style of the verse and prose portions of DRC. Secondly, a Principal Component Analysis (PCA) of DRC is performed based on the prose portions of the novel. Lastly, we discuss the implications of our experimental results for authorship attribution as well as descriptive stylistic analysis of DRC. Our authorial analysis largely confirms the findings of some previous studies that the novel has two authors. Meanwhile, stylistic analyses of the prose portions of the novel yield new and interesting results, which demonstrates that stylometric tools can be used to facilitate descriptive studies of classical Chinese literature.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2020-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2020.1724677","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46696603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Syntactic Impairments of Chinese Alzheimer’s Disease Patients from a Language Dependency Network Perspective","authors":"Jianpeng Liu, Junhai Zhao, Xiaohui Bai","doi":"10.1080/09296174.2019.1703485","DOIUrl":"https://doi.org/10.1080/09296174.2019.1703485","url":null,"abstract":"ABSTRACT This study examined the syntactic impairments of Chinese Alzheimer’s disease patients with a dependency network approach. The dependency treebanks and dependency networks are constructed from the discourses of both the patient group and its healthy peers. By analysing the contrasts in the dependency networks of the two groups, we found that 1) the mean dependency distance (MDD) of the AD group is shorter than that of the HP group; furthermore, the MDDs of both AD and HP groups are far below the standard Chinese MDD; 2) the content words like remember, forget, know, etc. and the negative forms of the verbs like don’t know, can’t remember, can’t say, etc. show highly repetitive uncertain and negative expressions that are typical of the predicates of the clauses of AD patients; 3) the function word vertices in the AD dependency network have distinctive network parameters such as higher ‘betweenness’ centrality, closeness centrality, and clustering coefficients, etc., indicating that the syntax of AD is impaired and features more simplified stereotypes. 
These results indicate that the syntax of the AD group has been impaired from parts of speech to the whole syntactic structure.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2020-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1703485","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43008185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Normalized Dependency Distance: Proposing a New Measure","authors":"L. Lei, Matthew L. Jockers","doi":"10.1080/09296174.2018.1504615","DOIUrl":"https://doi.org/10.1080/09296174.2018.1504615","url":null,"abstract":"ABSTRACT Previous studies of dependency distance as a measure of, or a proxy for, syntactic complexity do not consider factors such as sentence length and root distance. In the present study, we propose a new algorithm, i.e. Normalized Dependency Distance (NDD), that takes sentence length and root distance into consideration. Our analysis showed that exponential distribution fit well the distribution model of NDD as it did with Mean Dependency Distance (MDD), the algorithm used in previous studies. Findings indicated that NDD is significantly less dependent on sentence length than MDD is, which suggests that the new algorithm may have, to some extent, addressed the issue of MDD’s dependency on sentence length. It is argued that NDD may serve as a measure of syntactic complexity, which is a kind of universality limited by the capacity of human working memory.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2020-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1504615","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42357852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional Role of Frequency in Word-Formation Processes: A System Theoretical Approach","authors":"Inna Uglanova","doi":"10.1080/09296174.2018.1496990","DOIUrl":"https://doi.org/10.1080/09296174.2018.1496990","url":null,"abstract":"ABSTRACT Three experimental models are reported in which verb-formation processes are used to investigate the effect of frequency on language structure. The first model examined the impact of frequency on length of a language unit. Only the smoothed data for the prototypical verb formation confirmed the hypothesis. The second model tested the dependency of frequency on the depth of a word-formation structure. Good-fitting results were found for all main verb-formation structures. The third model aimed to study the influence of frequency on productivity (number of derivatives). The results of smoothing data showed that the more frequently a unit is used, the more derivatives it has. The outcomes allow clarifying some aspects of functioning of frequency in the synergetic mechanisms of language. In particular, it was shown that the observed frequency oscillation could be considered as a dialogue between the system and its environment.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2020-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1496990","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45884280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Numerical Assessment of Orthographic Neighbourhood Size Fluctuation in Writing Using Fractal Dimension Analysis","authors":"R. Taibu, E. Cheung, Weier Ye, S. Dehipawala, V. Shekoyan, G. Tremberger, T. Cheung","doi":"10.1080/09296174.2019.1694360","DOIUrl":"https://doi.org/10.1080/09296174.2019.1694360","url":null,"abstract":"ABSTRACT The orthographic size of a targeted word, the number of new words that can be generated from a targeted word by exchanging a single letter, offers a research window where words can be transformed into numerical values. The CLEARPOND technology from Northwestern University was used for the transformation. A writing can then be modelled as a time series where the fluctuation can be further described using fractal dimension analysis. This project used the Higuchi fractal method for the computation of the fractal dimensions of time series. The proof of concept was conducted using writing examples which include Astronomy writing and English writing, the responses of Trump and Clinton in a Presidential election debate, and song lyrics. The results suggested that a high fractal dimension has an association with a high-demand cognitive task. The use of fractal dimension analysis as a writing assessment tool is discussed with relationship to the current lexical diversity computation technology.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1694360","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49612311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word Length Distribution in Zhuang Language","authors":"Aiyun Wei, Qian Lu, Haitao Liu","doi":"10.1080/09296174.2019.1678225","DOIUrl":"https://doi.org/10.1080/09296174.2019.1678225","url":null,"abstract":"ABSTRACT The present study focuses on the word length distribution (WLD) of Zhuang language. The results show that the WLDs of all texts investigated can be described by the Positive Cohen-Poisson model when the word length is measured by the syllable numbers. However, when the word length is measured by the letter numbers, they do not follow any model from the Poisson or Binomial distribution families widely observed in other languages. However, the WLDs of all the Zhuang texts investigated follow the Zipf-Alekseev function either in terms of syllable or letter numbers. Moreover, the research on the WLDs of different Zhuang genres indicates that WLD may not be a sensitive index in distinguishing different Zhuang genres but an effective one in distinguishing different Zhuang styles (spoken or written). Then, the study of the relationship between the parameters a and b in the Zipf-Alekseev function shows that the self-organizing regularity observed in other languages also exists in Zhuang. 
Finally, the study of the word length-frequency relationship of Zhuang indicates that Zhuang word length is influenced by its frequency, which can be explained by Zipf’s ‘Principle of Least Effort’ and thus follow the law of lexical synergetic subsystem in synergetic linguistics.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1678225","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43491576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}