Bayesian Analysis of the Heterogeneity of Literary Style

JCR quartile: Q3 (Mathematics)
Martí Font, X. Puig, J. Ginebra
Journal: Revista Colombiana De Estadistica, pp. 205-227
Publication date: 2016-07-01
Publication type: Journal Article
DOI: https://doi.org/10.15446/RCE.V39N2.50151
Citations: 1

Abstract

We propose a statistical analysis of the heterogeneity of literary style in a set of texts that simultaneously uses different stylometric characteristics, such as word length and the frequency of function words. The data set consists of several tables with the same number of rows, with the i-th row of every table corresponding to the i-th text. The proposed analysis clusters the rows of all these tables simultaneously into groups with homogeneous style, based on a finite mixture of sets of multinomial models, one set for each table. Unlike the usual heuristic cluster analysis approaches, our method naturally incorporates the text size, the discrete nature of the data, and the dependence between categories. The model is checked and chosen with the help of posterior predictive checks, together with closed-form expressions for the posterior probability that each of the models considered is appropriate. This is illustrated through an analysis of the heterogeneity in Shakespeare's plays, and by revisiting the authorship attribution problem of Tirant lo Blanc.
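The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the cluster-specific category probabilities and mixture weights are already known, and it only computes the posterior probability that each text belongs to each cluster. The function name `cluster_responsibilities` and all counts and probabilities below are hypothetical toy values.

```python
import numpy as np

def cluster_responsibilities(tables, thetas, weights):
    """Posterior cluster membership for a mixture of sets of multinomials.

    tables:  list of (n_texts, n_categories_t) count arrays, one per table;
             row i of every table refers to the same text.
    thetas:  list of (n_clusters, n_categories_t) probability arrays,
             one per table (assumed known here, all entries > 0).
    weights: (n_clusters,) mixture weights.
    """
    log_post = np.log(np.asarray(weights, dtype=float))[None, :]
    for counts, theta in zip(tables, thetas):
        # Multinomial log-likelihood of each row under each cluster.
        # The multinomial coefficient is the same for every cluster,
        # so it cancels after normalization and is omitted.
        log_post = log_post + np.asarray(counts) @ np.log(theta).T
    # Normalize in log space for numerical stability.
    log_post = log_post - log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical toy data: two texts, two tables, two clusters.
word_length = np.array([[50, 30, 20], [20, 30, 50]])   # 3 length classes
function_words = np.array([[80, 20], [30, 70]])        # 2 function words
theta_length = np.array([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
theta_function = np.array([[0.8, 0.2], [0.3, 0.7]])

post = cluster_responsibilities(
    [word_length, function_words],
    [theta_length, theta_function],
    [0.5, 0.5],
)
print(post.argmax(axis=1))
```

Because the counts enter the log-likelihood directly, longer texts carry more evidence than shorter ones, which is one way to read the abstract's remark that the method incorporates text size.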
Source journal
Revista Colombiana De Estadistica (Statistics & Probability)
CiteScore: 1.20
Self-citation rate: 0.00%
Articles published: 0
Review time: >12 weeks
About the journal: The Colombian Journal of Statistics publishes original theoretical, methodological, and educational articles in any branch of Statistics. Purely theoretical papers should illustrate the techniques presented with real data, or at least with simulation experiments, to verify the usefulness of the contents. High-quality informative articles on methodologies or statistical techniques applied in different fields of knowledge are also considered. Only articles in English are considered for publication. The Editorial Committee assumes that submitted works have not been previously published, are not simultaneously under consideration elsewhere, and will not be submitted elsewhere without the prior consent of the Committee unless, as a result of the assessment, the Committee decides not to publish them in the journal. It is further assumed that authors who deliver a document for publication in the Colombian Journal of Statistics know these conditions and agree with them.