Subword symmetry in natural languages.

IF 2.9 | Zone 3, Multidisciplinary Journals | JCR Q1, MULTIDISCIPLINARY SCIENCES
Royal Society Open Science | Pub Date: 2025-08-21 | eCollection Date: 2025-08-01 | DOI: 10.1098/rsos.250295
Olga Pelloni, Rob van der Goot, Peter Ranacher, Ivan Vulic, Tanja Samardzic
Royal Society Open Science, volume 12, issue 8, article 250295. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370235/pdf/

Abstract

Symmetric patterns are found in the orderly arrangements of natural structures, from proteins to the symmetry in animals' bodies. Symmetric structures are more stable and easier to describe and compress, which is why they may have been preferred as building blocks in natural selection. The idea that natural languages undergo an evolutionary process akin to the evolution of species has been pervasive in the study of language. This process might result in symmetric patterns as in other natural structures, but the notion of symmetry is rarely associated with the study of natural language. In this study, we look for symmetric patterns in text data, considering the length of subword units under a range of possible subword analyses. We study the length of subword units in 32 languages and discover that the splits of long words tend to be symmetric regardless of the segmentation method and that some automatic methods give symmetric splits at all word lengths. These results include natural language in the set of phenomena that can be described in terms of symmetry, opening a new research avenue for the empirical study of text data as a structure comparable to various other structures in the natural world.
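
The abstract does not state how the symmetry of a split is measured. Purely as an illustration, the sketch below scores a split by comparing the sequence of subword lengths with its mirror image. The function name `split_symmetry`, the scoring formula, and the example segmentations are all assumptions made here, not the authors' method or data.

```python
def split_symmetry(pieces):
    """Hypothetical symmetry score in [0, 1] for one word split into subwords.

    The score is 1.0 when the sequence of piece lengths reads the same
    left-to-right and right-to-left, and it decreases as mirrored pieces
    differ more in length. This is an illustrative measure only.
    """
    lengths = [len(p) for p in pieces]
    total = sum(lengths)
    if total == 0:
        raise ValueError("need at least one non-empty piece")
    # Sum of absolute length differences between mirrored positions.
    # Each mirrored pair is counted twice, hence the factor 2 below.
    mismatch = sum(abs(a - b) for a, b in zip(lengths, reversed(lengths)))
    return 1.0 - mismatch / (2 * total)


# Made-up segmentations, used only to exercise the function.
for pieces in (["sym", "me", "try"], ["un", "break", "able"], ["walk", "ed"]):
    print("-".join(pieces), round(split_symmetry(pieces), 3))
```

Under this assumed measure, a split with piece lengths 3-2-3 scores 1.0, while a 4-2 split scores lower. Comparing such scores across word lengths and segmentation methods is one way the kind of pattern described in the abstract could be examined.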


Source journal: Royal Society Open Science (Multidisciplinary)
CiteScore: 6.00
Self-citation rate: 0.00%
Articles published: 508
Review time: 14 weeks
About the journal: Royal Society Open Science is a new open journal publishing high-quality original research across the entire range of science on the basis of objective peer-review. The journal covers the entire range of science and mathematics and will allow the Society to publish all the high-quality work it receives without the usual restrictions on scope, length or impact.