Information-theoretic measures for mapping regularities between orthography and phonology: A comprehensive quantification and validation in the Chinese writing system.

Impact factor: 3.9 · Zone 2 (Psychology) · JCR Q1 (Psychology, Experimental)
Zhe Xiao, Huimin Xiao, Caihua Xu
Behavior Research Methods, 57(9), 232. Published 2025-07-25. Journal article. DOI: 10.3758/s13428-025-02721-3 (https://doi.org/10.3758/s13428-025-02721-3)
Citation count: 0

Abstract

Information theory has been widely applied to quantify mapping regularities between orthography and phonology in alphabetic writing systems. However, the applicability of information-theoretic measures to the Chinese writing system, which is marked by distinct mapping characteristics, remains underexplored. This study presents a comprehensive quantification of mapping regularities in the Chinese writing system using information-theoretic measures and validates their effectiveness. First, we compute three core measures (entropy, surprisal, and information gain) across multiple dimensions: mapping direction (orthography-to-phonology vs. phonology-to-orthography), frequency type (type vs. token frequency), and grain size (tonal vs. base syllable; direct vs. fundamental phonetic radical). We then assess overall system uncertainty using entropy. Second, we demonstrate that information-theoretic measures predict Chinese reading performance in a large-scale Chinese naming dataset. Third, we show that these measures capture unique aspects of character variability in naming performance that traditional measures (i.e., regularity and consistency) do not account for. Finally, by comparing information-theoretic measures across mapping directions, frequency types, and grain sizes, we highlight their flexibility in capturing unique aspects of character variability in naming performance. Critically, these findings hold across both a self-compiled database and an external corpus with a substantially larger token pool, underscoring the robustness and generalizability of the measures. In sum, we emphasize the effectiveness and adaptability of information-theoretic measures in capturing mapping regularities in a writing system that is notably distinct from alphabetic systems, and we discuss their promising applications for advancing psychological and linguistic research.
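To make the three measures named in the abstract concrete, the following is a minimal Python sketch on invented toy data. Everything in it is an assumption for illustration: the pinyin strings, the frequencies, the helper names (`distribution`, `surprisal`, `entropy`), and in particular the definition of information gain as the drop in entropy when the phonetic radical is known. The paper's exact formulations may differ.

```python
import math
from collections import Counter

def distribution(items, use_token=False, base_syllable=False):
    """Turn (pronunciation, token_frequency) pairs into a probability distribution.

    use_token=False counts each character once (type frequency);
    use_token=True weights each character by its corpus frequency.
    base_syllable=True strips the trailing tone digit (coarser grain size).
    """
    weights = Counter()
    for pron, freq in items:
        key = pron.rstrip("0123456789") if base_syllable else pron
        weights[key] += freq if use_token else 1
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def surprisal(dist, outcome):
    """Surprisal in bits of one outcome: -log2 p(outcome)."""
    return -math.log2(dist[outcome])

def entropy(dist):
    """Shannon entropy in bits: expected surprisal over all outcomes."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical characters sharing one phonetic radical (orthography-to-phonology).
radical_chars = [("qing1", 120), ("qing2", 30), ("jing1", 50), ("qing3", 10)]
# Hypothetical mini-lexicon: the characters above plus a few unrelated ones.
lexicon = radical_chars + [("ma3", 200), ("jing4", 40), ("li4", 90), ("ma1", 10)]

# Grain size: tonal syllables vs. base syllables, using type frequency.
h_tonal = entropy(distribution(radical_chars))
h_base = entropy(distribution(radical_chars, base_syllable=True))

# Frequency type: the same radical weighted by token frequency.
token_dist = distribution(radical_chars, use_token=True, base_syllable=True)

# Information gain (assumed definition): uncertainty about the base syllable
# over the whole lexicon minus uncertainty once the radical is known.
h_lexicon = entropy(distribution(lexicon, base_syllable=True))
h_radical = h_base
info_gain = h_lexicon - h_radical

print("entropy, tonal grain:", h_tonal)
print("entropy, base-syllable grain:", h_base)
print("surprisal of 'jing' given radical:",
      surprisal(distribution(radical_chars, base_syllable=True), "jing"))
print("token-weighted distribution:", token_dist)
print("information gain:", info_gain)
```

The phonology-to-orthography direction could be sketched the same way by swapping roles: group characters by pronunciation and build the distribution over the written forms that share it.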

Source journal: Behavior Research Methods
CiteScore: 10.30
Self-citation rate: 9.30%
Articles published: 266
About the journal: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.