Early Evidence of the Pareto Principle in Grammatical Distribution: Causative Situations in Chinese Conversational Discourse
Author: Danjie Su
Journal: Journal of Chinese Linguistics
Publication date: 2022-07-12
DOI: 10.1353/jcl.2022.0017 (https://doi.org/10.1353/jcl.2022.0017)
Citations: 0
Abstract
This study is an initial report on the Pareto distribution (the 80/20 rule) of grammatical constructions in natural conversation: about 20% of the types of grammatical constructions for causative situations account for about 80% of the uses in conversation. I use a data-driven approach to investigate exhaustively the grammatical constructions that Chinese L1 speakers choose in spontaneous talk show conversations to describe causative situations. I identify two specific Pareto distributional patterns. 1) The distribution of all 22 constructions for causative situations constitutes a Pareto ABC diagram. The A-class (in descending order of frequency: ba-; unmarked passive; rang-; bei-; resultative; gei-) contains 27.3% of the types but accounts for 88.8% of all 1,497 uses. The B-class likewise contains 27.3% of the types but accounts for only 8.9% of the uses, and the C-class contains nearly half of the types (45.5%) but accounts for only 2.3% of the uses. 2) Most uses of a grammatical construction come from a small set of subtypes: the full ba- construction accounts for 87.9% of all ba- uses; the reduced bei- construction accounts for 86.8% of all bei- uses; and 37.5% of rang- subtypes account for 84.2% of all rang- uses. These patterns can be explained by the Lens concept. I conclude that a few constructions account for most grammatical choices of L1 Chinese speakers in conversation. Understanding these grammatical distributions in natural discourse can improve the efficiency and efficacy of language teaching and Natural Language Processing (NLP).
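The ABC figures reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the author's procedure. It assumes the A-class holds the 6 constructions listed in the abstract, and it infers the B- and C-class type counts (6 and 10) from the reported type shares of 27.3% and 45.5% of 22 constructions. The names `CLASSES`, `type_share`, and `summarize` are hypothetical.

```python
# Figures taken from the abstract. The B and C type counts are inferred
# from the reported type shares (27.3% and 45.5% of 22), not stated directly.
TOTAL_TYPES = 22

# (class label, number of construction types, share of all uses in %)
CLASSES = [
    ("A", 6, 88.8),
    ("B", 6, 8.9),
    ("C", 10, 2.3),
]


def type_share(n_types, total_types=TOTAL_TYPES):
    """Percentage of construction types, rounded to one decimal place."""
    return round(100 * n_types / total_types, 1)


def summarize(classes):
    """Return rows of (label, type %, use %, cumulative type %, cumulative use %)."""
    rows = []
    cum_types = 0
    cum_uses = 0.0
    for label, n_types, use_share in classes:
        cum_types += n_types
        cum_uses += use_share
        rows.append((label, type_share(n_types), use_share,
                     type_share(cum_types), round(cum_uses, 1)))
    return rows


if __name__ == "__main__":
    for label, t, u, ct, cu in summarize(CLASSES):
        print(f"{label}: {t}% of types, {u}% of uses "
              f"(cumulative: {ct}% of types, {cu}% of uses)")
```

Under these assumptions the three classes cover all 22 types and their use shares sum to 100%, which is consistent with the percentages reported in the abstract.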
About the journal:
Journal of Chinese Linguistics (JCL) is an academic journal that publishes research in both general linguistics and Chinese linguistics. It is edited by a distinguished editorial board with international expertise. There are two publications: Journal of Chinese Linguistics (JCL) and Journal of Chinese Linguistics Monograph Series (JCLMS).