On the physical origin of linguistic laws and lognormality in speech.

IF 2.9; Zone 3, Multidisciplinary Journals; Q1, MULTIDISCIPLINARY SCIENCES
Royal Society Open Science Pub Date : 2019-08-21 eCollection Date: 2019-08-01 DOI:10.1098/rsos.191023
Iván G Torre, Bartolo Luque, Lucas Lacasa, Christopher T Kello, Antoni Hernández-Fernández
Royal Society Open Science, volume 6, issue 8, article 191023. Publication type: Journal Article.
Citations: 37

Abstract

Physical manifestations of linguistic units include sources of variability due to factors of speech production which are by definition excluded from counts of linguistic symbols. In this work, we examine whether linguistic laws hold with respect to the physical manifestations of linguistic units in spoken English. The data we analyse come from a phonetically transcribed database of acoustic recordings of spontaneous speech known as the Buckeye Speech corpus. First, we verify with unprecedented accuracy that acoustically transcribed durations of linguistic units at several scales comply with a lognormal distribution, and we quantitatively justify this 'lognormality law' using a stochastic generative model. Second, we explore the four classical linguistic laws (Zipf's Law, Herdan's Law, Brevity Law and Menzerath-Altmann's Law (MAL)) in oral communication, both in physical units and in symbolic units measured in the speech transcriptions, and find that the validity of these laws is typically stronger when using physical units than when using their symbolic counterparts. Additional results include (i) coining a Herdan's Law in physical units, (ii) a precise mathematical formulation of Brevity Law, which we show to be connected to optimal compression principles in information theory and which allows us to formulate and validate yet another law, which we call the size-rank law, and (iii) a mathematical derivation of MAL, which also highlights an additional regime where the law is inverted. Altogether, these results support the hypothesis that statistical laws in language have a physical origin.
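
As an illustration of what testing the 'lognormality law' can look like in practice, the following is a minimal sketch and not the authors' procedure. It assumes synthetic durations (not Buckeye data) and uses the fact that a variable is lognormal exactly when its logarithm is normal: the fit reduces to the mean and standard deviation of the log-durations, and the skewness of the log-durations should be close to zero. The function names and the parameters -2.5 and 0.5 are hypothetical choices made for the example.

```python
import math
import random
import statistics


def fit_lognormal(durations):
    """Maximum-likelihood lognormal fit: mean and std of the log-durations."""
    logs = [math.log(d) for d in durations]
    return statistics.fmean(logs), statistics.pstdev(logs)


def log_skewness(durations):
    """Sample skewness of the log-durations; near 0 for lognormal data."""
    logs = [math.log(d) for d in durations]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)
    return sum(((x - mu) / sigma) ** 3 for x in logs) / len(logs)


# Hypothetical durations in seconds, drawn from a lognormal with known
# parameters (assumed values, not taken from the paper).
rng = random.Random(0)
synthetic = [rng.lognormvariate(-2.5, 0.5) for _ in range(20000)]

mu_hat, sigma_hat = fit_lognormal(synthetic)
skew = log_skewness(synthetic)
print(f"mu={mu_hat:.3f} sigma={sigma_hat:.3f} log-skewness={skew:.3f}")
```

With real data, the same two functions could be applied separately to phoneme, word and breath-group durations; a log-skewness far from zero would argue against a lognormal description at that scale.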


Source journal: Royal Society Open Science (Multidisciplinary)
CiteScore: 6.00
Self-citation rate: 0.00%
Articles published: 508
Review time: 14 weeks
Journal description: Royal Society Open Science is a new open journal publishing high-quality original research across the entire range of science and mathematics on the basis of objective peer review. It allows the Society to publish all the high-quality work it receives without the usual restrictions on scope, length or impact.