{"title":"Calculation of Phonetic Distances between Speech Sounds","authors":"M. Vakulenko","doi":"10.1080/09296174.2019.1678709","DOIUrl":"https://doi.org/10.1080/09296174.2019.1678709","url":null,"abstract":"ABSTRACT A new formalism to numerically measure phonetic differences between speech sounds treating feature values of the compared phones as independent parameters that give rise to corresponding Euclidean distances is put forward. The articulatory and acoustic methods within this formalism were compared, where the corresponding results display good agreement. The more reliable and more universal character of the acoustic approach is provided by robust and precise acoustic parameters used therein. The theoretical model and the findings of this article comply also with the experimental phonetic results. The proposed approach contributes to formalization of the procedure of phone comparison and mapping needed for automatic text and speech processing.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1678709","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45277965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing χ2 Tables for Separability of Distribution and Effect: Meta-Tests for Comparing Homogeneity and Goodness of Fit Contingency Test Outcomes","authors":"S. Wallis","doi":"10.1080/09296174.2018.1496537","DOIUrl":"https://doi.org/10.1080/09296174.2018.1496537","url":null,"abstract":"ABSTRACT This paper describes a series of statistical meta-tests for comparing independent contingency tables for different types of significant difference. Recognizing when an experiment obtains a significantly different result and when it does not is frequently overlooked in research publication. Papers are frequently published citing ‘p values’ or test scores suggesting a ‘stronger effect’ substituting for sound statistical reasoning. This paper sets out a series of tests that together illustrate the correct approach to this question. These meta-tests permit us to evaluate whether experiments have failed to replicate on new data; whether a particular data source or subcorpus obtains a significantly different result than another; or whether changing experimental parameters obtains a stronger effect. The meta-tests are derived mathematically from the χ2 test and the Wilson score interval, and consist of pairwise ‘point’ tests, ‘homogeneity’ tests and ‘goodness of fit’ tests. Meta-tests for comparing tests with one degree of freedom (e.g. ‘2 × 1’ and ‘2 × 2’ tests) are generalized to those of arbitrary size. Finally, we compare our approach with a competing approach offered by Zar, which, while straightforward to calculate, turns out to be both less powerful and less robust. 
(Note: A spreadsheet including all the tests in this paper is publicly available at www.ucl.ac.uk/english-usage/statspapers/2x2-x2-separability.xls.)","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1496537","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44599559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Discriminativeness of Internal Syntactic Representations in Automatic Genre Classification","authors":"Mingyu Wan, A. Fang, Chu-Ren Huang","doi":"10.1080/09296174.2019.1663655","DOIUrl":"https://doi.org/10.1080/09296174.2019.1663655","url":null,"abstract":"ABSTRACT Genre characterizes a document differently from a subject that has been the focus of most document retrieval and classification applications. This work hypothesizes a close interaction between syntactic variation and genre differentiation by introspecting stylistic cues in functional and structural aspects beyond word level. It has engineered 14 syntactic feature sets of internal representations for genre classification through Machine Learning devices. Experiment results show significant superiority of fusing structural and lexical features for genre classification (F∆max. = 9.2%, sig. = 0.001), suggesting the effectiveness of incorporating syntactic cues for genre discrimination. In addition, the PCA analysis reports the noun phrases (NP) as the most principal component (66%) for genre variation and prepositional phrases (PP) the second. Particularly, noun phrases with dominant structures of prepositional complements and pronouns functioning as a subject are most effective for identifying printed texts of high formality, while prepositional phrases are useful for identifying speeches of low formality. Error analysis suggests that the phrasal features are particularly useful for classifying four groups of genre classes, i.e. 
unscripted speech, fiction, news reports, and academic writing, all distributed with distinct structural characteristics, and they demonstrate an incremental degree of formality in the continuum of language complexity.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1663655","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43498085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word Length Distribution in German Texts during the 17th-19th Century","authors":"Fei Lian, Y. Li","doi":"10.1080/09296174.2019.1662536","DOIUrl":"https://doi.org/10.1080/09296174.2019.1662536","url":null,"abstract":"ABSTRACT Word length in German texts has been a frequently discussed issue in the field of quantitative linguistics. Taking an overall view of the existing research data, however, most of the research focuses on literary texts and private letters and the size of data corpus for each research is relatively small. This paper provides a time- and genre-based analysis of word length distribution in German using 360 texts originated between the 17th and 19th centuries, aiming to find a probability distribution that can capture well the German word length distribution from a diachronic perspective and to reveal the relationship between the word length distribution and boundary conditions such as the genre and the creation time of text. Results indicate that the word length distribution in German texts written in different eras abides by the 1-displaced hyper-Poisson distribution, whose parameters (a, b) are interconnected with boundary conditions. This study corroborates that the word length distribution of a certain language is consistent, due to the constraint of the cognitive mechanism. 
Besides, the parameters of probability distribution can be good indicators of the writing style as well as the creation time of text.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1662536","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42201015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistics in Corpus Linguistics: A Practical Guide","authors":"Cunxin Han","doi":"10.1080/09296174.2019.1646069","DOIUrl":"https://doi.org/10.1080/09296174.2019.1646069","url":null,"abstract":"Corpus linguistics is a powerful quantitative methodology, which heavily relies on frequency data and statistical procedures. It is difficult to talk about corpus linguistics without mentioning sta...","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2019.1646069","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46755901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Coding and the Origins of Zipfian Laws","authors":"R. Ferrer-i-Cancho, C. Bentz","doi":"10.1080/09296174.2020.1778387","DOIUrl":"https://doi.org/10.1080/09296174.2020.1778387","url":null,"abstract":"ABSTRACT The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding – under an arbitrary coding scheme – and show that it predicts Zipf’s law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf’s law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf’s rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost cutting considerations. 
Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws more generally as well as other linguistic laws.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2020.1778387","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47778723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correlations and Potential Cross-Linguistic Indicators of Writing Style","authors":"P. Juola, George K. Mikros, Sean Vinsick","doi":"10.1080/09296174.2018.1458395","DOIUrl":"https://doi.org/10.1080/09296174.2018.1458395","url":null,"abstract":"Abstract In this paper, we present preliminary results on how an individual’s writing style persists even across languages. In other words, what aspects of an individual’s writing will persist irrespective of the language in which he or she writes? We argue that cognitive and social traits are likely to persist and demonstrate this by two separate analyses of bilingual corpora using the same individuals. We show that for various measures of linguistic complexity (which we consider to be a cognitive variable) and participation in specific social conventions (a social one), the correlation between scores on the two languages studied is significantly higher than would be expected by chance. We argue that this type of correlation may permit cross-linguistic authorship attribution.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1458395","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46316258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequency Effect and Neutralization of Tones in Mandarin Chinese","authors":"Huifang Kong, Shengyi Wu","doi":"10.1080/09296174.2018.1452140","DOIUrl":"https://doi.org/10.1080/09296174.2018.1452140","url":null,"abstract":"Abstract Tonal neutralization in Mandarin has long been thought to be connected with lexical frequency. But this has never been investigated quantitatively because of the methodological challenge. In this study, a production experiment was run with speakers reading disyllabic words in neutral tones with frequency estimates derived from a Frequency Dictionary. The dependent measures were three acoustic correlates: duration, F0 contour and intensity. Independent measures included the lexical frequency at three levels (low, middle and high). Regression analysis showed that neutralization of tones is directly correlated with lexical frequency independent of other factors. A regularity (the more frequent, the shorter in duration; the more frequent, the lower in pitch; the more frequent, the weaker in intensity) governs the neutralization of tones in reduced syllables. However, the exact shape of such an effect displays a different scenario in a different frequency range. Only high frequency words display a significant difference from low frequency words. 
Last but not least, an exemplar representation is proposed to express a neutral tone’s observed frequency effect naturally.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1452140","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45619077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Calculation of Semantic Distances Between Words: From Synonymy to Antonymy","authors":"M. Vakulenko","doi":"10.1080/09296174.2018.1452524","DOIUrl":"https://doi.org/10.1080/09296174.2018.1452524","url":null,"abstract":"Abstract A new approach to numerically measure the semantic distances between lexical units (words and collocations) based on the geometric analogies and analytical calculation, is put forward. Having considered the cases of equal and different weights of semes, we obtained exact algebraic formulas describing different levels of the meanings proximity, ranging from absolute synonymy to full antonymy. It was emphasized that absolute synonymy arises when the compared units contain equal numbers of semes that fully coincide and have equal weights in the corresponding pairs. Calculation of the semes weights helps to locate the unit more precisely on the semantic sphere. It was shown that the level of synonymy and antonymy decreases if different semes are accentuated, while the semantic distance between the units without identical semes cannot be influenced by seme boosting. It was observed that depending on the context, a word may wander over this sphere, thus modifying its lexical semantic relations with other units. As the proposed approach contributes to formalization of the units comparison procedure, it is advisable for incorporation into relevant automatic tools, particularly into WordNet and FrameNet. 
The obtained results may be useful for various linguistic and associated studies including automatic text analysis and processing, computer lexicography, information search and retrieval, machine translation and other NLP applications that are related to the artificial intelligence problem.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2018.1452524","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47278957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}