Workshop on Computational Humanities Research: Latest Papers

Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches
Workshop on Computational Humanities Research Pub Date : 2022-12-12 DOI: 10.48550/arXiv.2212.13924
Sven Najem-Meyer, Matteo Romanello
Abstract: Page layout analysis is a fundamental step in document processing that segments a page into regions of interest. With highly complex layouts and mixed scripts, scholarly commentaries are text-heavy documents which remain challenging for state-of-the-art models. Their layout varies considerably across editions, and their most important regions are mainly defined by semantic rather than graphical characteristics such as position or appearance. This setting calls for a comparison between textual, visual and hybrid approaches. We therefore assess the performance of two transformers (LayoutLMv3 and RoBERTa) and an object-detection network (YOLOv5). Although the results show a clear advantage for the latter, we also list several caveats to this finding. In addition to our experiments, we release a dataset of ca. 300 annotated pages sampled from 19th century commentaries.
Citations: 3
Boosting Word Frequencies in Authorship Attribution
Workshop on Computational Humanities Research Pub Date : 2022-11-03 DOI: 10.48550/arXiv.2211.01289
Maciej Eder
Abstract: In this paper, I introduce a simple method of computing relative word frequencies for authorship attribution and similar stylometric tasks. Rather than computing relative frequencies as the number of occurrences of a given word divided by the total number of tokens in a text, I argue that a more efficient normalization factor is the total number of relevant tokens only. The notion of relevant words includes synonyms and, usually, a few dozen other words that are in some way semantically similar to the word in question. To determine such a semantic background, a word embedding model can be used. The proposed method outperforms classical most-frequent-word approaches substantially, usually by a few percentage points depending on the input settings.
Citations: 2
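The normalization described in this abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the toy sentence, and the hand-written `neighbors` mapping are all hypothetical, and the mapping stands in for the semantic background that the abstract says would come from a word embedding model. Including the word itself in the denominator is also an assumption.

```python
from collections import Counter


def classic_relative_frequency(tokens, word):
    """Occurrences of `word` divided by the total number of tokens."""
    counts = Counter(tokens)
    return counts[word] / len(tokens) if tokens else 0.0


def boosted_relative_frequency(tokens, word, neighbors):
    """Occurrences of `word` divided by the count of 'relevant' tokens only.

    `neighbors` is a hypothetical mapping from a word to its semantically
    similar words; here it is written by hand, whereas the abstract suggests
    deriving it from a word embedding model. Assumption: the word itself
    counts as relevant.
    """
    counts = Counter(tokens)
    relevant = {word} | set(neighbors.get(word, ()))
    denominator = sum(counts[w] for w in relevant)
    return counts[word] / denominator if denominator else 0.0


# Toy input, invented for illustration only.
tokens = "the house was big and the home was large but the house was old".split()
neighbors = {"house": ["home", "building"]}

classic = classic_relative_frequency(tokens, "house")
boosted = boosted_relative_frequency(tokens, "house", neighbors)
```

In this toy text, "house" occurs twice among 14 tokens, so the classic frequency is 2/14, while the relevant-token frequency is 2/3 because only "house" and "home" contribute to the denominator.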
Reviewer Preferences and Gender Disparities in Aesthetic Judgments
Workshop on Computational Humanities Research Pub Date : 2022-06-17 DOI: 10.48550/arXiv.2206.08697
I. Lassen, Yuri Bizzoni, Telam Peura, M. Thomsen, K. Nielbo
Abstract: Aesthetic preferences are considered highly subjective, resulting in inherently noisy judgements of aesthetic objects, yet certain aspects of aesthetic judgement display convergent trends over time. This paper presents a study that uses literary reviews as a proxy for aesthetic judgement in order to identify systematic components that can be attributed to bias. Specifically, we find that judgement of literary quality in newspapers displays a gender bias in favor of male writers. Male reviewers show a same-gender preference, while female reviewers show an opposite-gender preference. While alternative accounts of this apparent gender disparity exist, we argue that it reflects a cultural gender antagonism.
Citations: 3
Measuring the Acceleration of the Social Construction of Time using the BOE (Boletin Oficial del Estado)
Workshop on Computational Humanities Research Pub Date : 2020-11-03 DOI: 10.5281/ZENODO.4357663
E. Fernández, Mirco Schönfeld, J. Pfeffer
Abstract: The Practice of Conceptual History, by Reinhart Koselleck, explores the idea that there is a direct relationship between technological advancements and an acceleration in the social construction of time. This paper will quantify this theory by measuring information density and information variety of narratives in a BOE (Boletín Oficial del Estado) dataset of thirty years (1988-2018). Using Quantitative Narrative Analysis, we will define a narrative unit as a triplet of Subject, Verb, Object (SVO), and we will define information density (ID) as the ratio of narrative units per words per year. Afterwards, we will quantify the different contexts of narratives to measure information variety (IV) by constructing a network of semantic closeness from trained word embeddings. This paper will present an increased IV and ID over the observation time, indicating more and more facts being reported. The results will show evidence of an acceleration of the social construction of time.
Citations: 0
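The information density (ID) measure defined in this abstract can be sketched as a per-year ratio. This is only one reading of "narrative units per words per year", namely the number of SVO triplets in a year divided by the number of words in that year; the function name and the toy counts are hypothetical and are not taken from the BOE dataset. Extracting the SVO triplets themselves is out of scope here.

```python
def information_density(svo_counts, word_counts):
    """Return {year: narrative units / words} for each year with a word count.

    Assumption: ID for a year is the number of Subject-Verb-Object triplets
    extracted from that year's text divided by that year's word total.
    Years with zero words are skipped to avoid division by zero.
    """
    return {
        year: svo_counts.get(year, 0) / words
        for year, words in sorted(word_counts.items())
        if words > 0
    }


# Placeholder counts invented for illustration; not results from the paper.
toy_svo_counts = {1988: 50, 2018: 120}
toy_word_counts = {1988: 1000, 2018: 1500}

density = information_density(toy_svo_counts, toy_word_counts)
```

With these placeholder counts the ratio is 0.05 for 1988 and 0.08 for 2018; a rising series of this kind is what the abstract describes as increasing ID.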
Cultural Accumulation and Improvement in Online Fan Fiction
Workshop on Computational Humanities Research Pub Date : 2020-09-24 DOI: 10.31219/osf.io/4wjnm
Federico Pianzola, Alberto Acerbi, S. Rebora
Abstract: We analyse stories in Harry Potter fan fiction published on Archive of Our Own (AO3), using concepts from cultural evolution. In particular, we focus on cumulative cultural evolution, that is, the idea that cultural systems improve with time, drawing on previous innovations. In this study we examine two features of cumulative culture: accumulation and improvement. First, we show that stories in Harry Potter’s fan fiction accumulate cultural traits (unique tags, in our analysis) through time, both globally and at the level of single stories. Second, more recent stories are also liked more by readers than earlier stories. Our research illustrates the potential of the combination of cultural evolution theory and digital literary studies, and it paves the way for the study of the effects of online digital media on cultural cumulation.
Citations: 0
Toward a Thermodynamics of Meaning
Workshop on Computational Humanities Research Pub Date : 2020-09-24 DOI: 10.5281/ZENODO.4302259
Jonathan Scott Enderle
Abstract: As language models such as GPT-3 become increasingly successful at generating realistic text, questions about what purely text-based modeling can learn about the world have become more urgent. Is text purely syntactic, as skeptics argue? Or does it in fact contain some semantic information that a sufficiently sophisticated language model could use to learn about the world without any additional inputs? This paper describes a new model that suggests some qualified answers to those questions. By theorizing the relationship between text and the world it describes as an equilibrium relationship between a thermodynamic system and a much larger reservoir, this paper argues that even very simple language models do learn structural facts about the world, while also proposing relatively precise limits on the nature and extent of those facts. This perspective promises not only to answer questions about what language models actually learn, but also to explain the consistent and surprising success of cooccurrence prediction as a meaning-making strategy in AI.
Citations: 0