Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature最新文献

筛选
英文 中文
Unsupervised Adverbial Identification in Modern Chinese Literature 中国现代文学中的无监督状语识别
Wenxiu Xie, J. Lee, Fangqiong Zhan, Xiao Han, Chi-Yin Chow
{"title":"Unsupervised Adverbial Identification in Modern Chinese Literature","authors":"Wenxiu Xie, J. Lee, Fangqiong Zhan, Xiao Han, Chi-Yin Chow","doi":"10.18653/v1/2021.latechclfl-1.10","DOIUrl":"https://doi.org/10.18653/v1/2021.latechclfl-1.10","url":null,"abstract":"In many languages, adverbials can be derived from words of various parts-of-speech. In Chinese, the derivation may be marked either with the standard adverbial marker DI, or the non-standard marker DE. Since DE also serves double duty as the attributive marker, accurate identification of adverbials requires disambiguation of its syntactic role. As parsers are trained predominantly on texts using the standard adverbial marker DI, they often fail to recognize adverbials suffixed with the non-standard DE. This paper addresses this problem with an unsupervised, rule-based approach for adverbial identification that utilizes dependency tree patterns. Experiment results show that this approach outperforms a masked language model baseline. We apply this approach to analyze standard and non-standard adverbial marker usage in modern Chinese literature.","PeriodicalId":441300,"journal":{"name":"Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130737852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
The Early Modern Dutch Mediascape. Detecting Media Mentions in Chronicles Using Word Embeddings and CRF 近代早期荷兰媒体景观。使用词嵌入和CRF检测编年史中的媒体提及
A. Lassche, R. Morante
{"title":"The Early Modern Dutch Mediascape. Detecting Media Mentions in Chronicles Using Word Embeddings and CRF","authors":"A. Lassche, R. Morante","doi":"10.18653/v1/2021.latechclfl-1.1","DOIUrl":"https://doi.org/10.18653/v1/2021.latechclfl-1.1","url":null,"abstract":"While the production of information in the European early modern period is a well-researched topic, the question how people were engaging with the information explosion that occurred in early modern Europe, is still underexposed. This paper presents the annotations and experiments aimed at exploring whether we can automatically extract media related information (source, perception, and receiver) from a corpus of early modern Dutch chronicles in order to get insight in the mediascape of early modern middle class people from a historic perspective. In a number of classification experiments with Conditional Random Fields, three categories of features are tested: (i) raw and binary word embedding features, (ii) lexicon features, and (iii) character features. Overall, the classifier that uses raw embeddings performs slightly better. However, given that the best F-scores are around 0.60, we conclude that the machine learning approach needs to be combined with a close reading approach for the results to be useful to answer history research questions.","PeriodicalId":441300,"journal":{"name":"Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114269711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Period Classification in Chinese Historical Texts 中国历史文本的时期分类
Zuoyu Tian, Sandra Kübler
{"title":"Period Classification in Chinese Historical Texts","authors":"Zuoyu Tian, Sandra Kübler","doi":"10.18653/v1/2021.latechclfl-1.19","DOIUrl":"https://doi.org/10.18653/v1/2021.latechclfl-1.19","url":null,"abstract":"In this study, we study language change in Chinese Biji by using a classification task: classifying Ancient Chinese texts by time periods. Specifically, we focus on a unique genre in classical Chinese literature: Biji (literally “notebook” or “brush notes”), i.e., collections of anecdotes, quotations, etc., anything authors consider noteworthy, Biji span hundreds of years across many dynasties and conserve informal language in written form. For these reasons, they are regarded as a good resource for investigating language change in Chinese (Fang, 2010). In this paper, we create a new dataset of 108 Biji across four dynasties. Based on the dataset, we first introduce a time period classification task for Chinese. Then we investigate different feature representation methods for classification. The results show that models using contextualized embeddings perform best. An analysis of the top features chosen by the word n-gram model (after bleaching proper nouns) confirms that these features are informative and correspond to observations and assumptions made by historical linguists.","PeriodicalId":441300,"journal":{"name":"Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115326174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Batavia asked for advice. Pretrained language models for Named Entity Recognition in historical texts. 巴达维亚向他征求意见。历史文本中命名实体识别的预训练语言模型。
S. Arnoult, L. Petram, P. Vossen
{"title":"Batavia asked for advice. Pretrained language models for Named Entity Recognition in historical texts.","authors":"S. Arnoult, L. Petram, P. Vossen","doi":"10.18653/v1/2021.latechclfl-1.3","DOIUrl":"https://doi.org/10.18653/v1/2021.latechclfl-1.3","url":null,"abstract":"Pretrained language models like BERT have advanced the state of the art for many NLP tasks. For resource-rich languages, one has the choice between a number of language-specific models, while multilingual models are also worth considering. These models are well known for their crosslingual performance, but have also shown competitive in-language performance on some tasks. We consider monolingual and multilingual models from the perspective of historical texts, and in particular for texts enriched with editorial notes: how do language models deal with the historical and editorial content in these texts? We present a new Named Entity Recognition dataset for Dutch based on 17th and 18th century United East India Company (VOC) reports extended with modern editorial notes. Our experiments with multilingual and Dutch pretrained language models confirm the crosslingual abilities of multilingual models while showing that all language models can leverage mixed-variant data. In particular, language models successfully incorporate notes for the prediction of entities in historical texts. We also find that multilingual models outperform monolingual models on our data, but that this superiority is linked to the task at hand: multilingual models lose their advantage when confronted with more semantical tasks.","PeriodicalId":441300,"journal":{"name":"Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature","volume":"2017 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125739078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Quantifying Contextual Aspects of Inter-annotator Agreement in Intertextuality Research 互文性研究中注释者间一致的量化语境因素
Enrique Manjavacas Arevalo, Laurence Mellerin, M. Kestemont
{"title":"Quantifying Contextual Aspects of Inter-annotator Agreement in Intertextuality Research","authors":"Enrique Manjavacas Arevalo, Laurence Mellerin, M. Kestemont","doi":"10.18653/v1/2021.latechclfl-1.4","DOIUrl":"https://doi.org/10.18653/v1/2021.latechclfl-1.4","url":null,"abstract":"We report on an inter-annotator agreement experiment involving instances of text reuse focusing on the well-known case of biblical intertextuality in medieval literature. We target the application use case of literary scholars whose aim is to document instances of biblical references in the ‘apparatus fontium’ of a prospective digital edition. We develop a Bayesian implementation of Cohen’s kappa for multiple annotators that allows us to assess the influence of various contextual effects on the inter-annotator agreement, producing both more robust estimates of the agreement indices as well as insights into the annotation process that leads to the estimated indices. As a result, we are able to produce a novel and nuanced estimation of inter-annotator agreement in the context of intertextuality, exploring the challenges that arise from manually annotating a dataset of biblical references in the writings of Bernard of Clairvaux. Among others, our method was able to unveil the fact that the obtained agreement depends heavily on the biblical source book of the proposed reference, as well as the underlying algorithm used to retrieve the candidate match.","PeriodicalId":441300,"journal":{"name":"Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127985718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
FrameNet-like Annotation of Olfactory Information in Texts 文本嗅觉信息的类框架标注
Sara Tonelli, S. Menini
{"title":"FrameNet-like Annotation of Olfactory Information in Texts","authors":"Sara Tonelli, S. Menini","doi":"10.18653/v1/2021.latechclfl-1.2","DOIUrl":"https://doi.org/10.18653/v1/2021.latechclfl-1.2","url":null,"abstract":"Although olfactory references play a crucial role in our cultural memory, only few works in NLP have tried to capture them from a computational perspective. Currently, the main challenge is not much the development of technological components for olfactory information extraction, given recent advances in semantic processing and natural language understanding, but rather the lack of a theoretical framework to capture this information from a linguistic point of view, as a preliminary step towards the development of automated systems. Therefore, in this work we present the annotation guidelines, developed with the help of history scholars and domain experts, aimed at capturing all the relevant elements involved in olfactory situations or events described in texts. These guidelines have been inspired by FrameNet annotation, but underwent some adaptations, which are detailed in this paper. Furthermore, we present a case study concerning the annotation of olfactory situations in English historical travel writings describing trips to Italy. An analysis of the most frequent role fillers show that olfactory descriptions pertain to some typical domains such as religion, food, nature, ancient past, poor sanitation, all supporting the creation of a stereotypical imagery related to Italy. On the other hand, positive feelings triggered by smells are prevalent, and contribute to framing travels to Italy as an exciting experience involving all senses.","PeriodicalId":441300,"journal":{"name":"Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature","volume":"11 7‐8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113977116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Data-Driven Detection of General Chiasmi Using Lexical and Semantic Features 基于词汇和语义特征的数据驱动的一般误语检测
Felix Schneider, Björn Barz, Phillip Brandes, Sophie Marshall, Joachim Denzler
{"title":"Data-Driven Detection of General Chiasmi Using Lexical and Semantic Features","authors":"Felix Schneider, Björn Barz, Phillip Brandes, Sophie Marshall, Joachim Denzler","doi":"10.18653/v1/2021.latechclfl-1.11","DOIUrl":"https://doi.org/10.18653/v1/2021.latechclfl-1.11","url":null,"abstract":"Automatic detection of stylistic devices is an important tool for literary studies, e.g., for stylometric analysis or argument mining. A particularly striking device is the rhetorical figure called chiasmus, which involves the inversion of semantically or syntactically related words. Existing works focus on a special case of chiasmi that involve identical words in an A B B A pattern, so-called antimetaboles. In contrast, we propose an approach targeting the more general and challenging case A B B’ A’, where the words A, A’ and B, B’ constituting the chiasmus do not need to be identical but just related in meaning. To this end, we generalize the established candidate phrase mining strategy from antimetaboles to general chiasmi and propose novel features based on word embeddings and lemmata for capturing both semantic and syntactic information. These features serve as input for a logistic regression classifier, which learns to distinguish between rhetorical chiasmi and coincidental chiastic word orders without special meaning. We evaluate our approach on two datasets consisting of classical German dramas, four texts with annotated chiasmi and 500 unannotated texts. Compared to previous methods for chiasmus detection, our novel features improve the average precision from 17% to 28% and the precision among the top 100 results from 13% to 35%.","PeriodicalId":441300,"journal":{"name":"Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132033803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
End-to-end style-conditioned poetry generation: What does it take to learn from examples alone? 端到端受风格限制的诗歌创作:仅从例子中学习需要什么?
Jörg Wöckener, T. Haider, Tristan Miller, The-Khang Nguyen, Thanh Tung Linh Nguyen, Minh Vu Pham, Jonas Belouadi, Steffen Eger
{"title":"End-to-end style-conditioned poetry generation: What does it take to learn from examples alone?","authors":"Jörg Wöckener, T. Haider, Tristan Miller, The-Khang Nguyen, Thanh Tung Linh Nguyen, Minh Vu Pham, Jonas Belouadi, Steffen Eger","doi":"10.18653/v1/2021.latechclfl-1.7","DOIUrl":"https://doi.org/10.18653/v1/2021.latechclfl-1.7","url":null,"abstract":"In this work, we design an end-to-end model for poetry generation based on conditioned recurrent neural network (RNN) language models whose goal is to learn stylistic features (poem length, sentiment, alliteration, and rhyming) from examples alone. We show this model successfully learns the ‘meaning’ of length and sentiment, as we can control it to generate longer or shorter as well as more positive or more negative poems. However, the model does not grasp sound phenomena like alliteration and rhyming, but instead exploits low-level statistical cues. Possible reasons include the size of the training data, the relatively low frequency and difficulty of these sublexical phenomena as well as model biases. We show that more recent GPT-2 models also have problems learning sublexical phenomena such as rhyming from examples alone.","PeriodicalId":441300,"journal":{"name":"Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127606605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
A Mixed-Methods Analysis of Western and Hong Kong–based Reporting on the 2019–2020 Protests 西方和香港对2019-2020年抗议活动报道的混合方法分析
Arya D. McCarthy, James Scharf, G. Dore
{"title":"A Mixed-Methods Analysis of Western and Hong Kong–based Reporting on the 2019–2020 Protests","authors":"Arya D. McCarthy, James Scharf, G. Dore","doi":"10.18653/v1/2021.latechclfl-1.20","DOIUrl":"https://doi.org/10.18653/v1/2021.latechclfl-1.20","url":null,"abstract":"We apply statistical techniques from natural language processing to Western and Hong Kong–based English language newspaper articles that discuss the 2019–2020 Hong Kong protests of the Anti-Extradition Law Amendment Bill Movement. Topic modeling detects central themes of the reporting and shows the differing agendas toward one country, two systems. Embedding-based usage shift (at the word level) and sentiment analysis (at the document level) both support that Hong Kong–based reporting is more negative and more emotionally charged. A two-way test shows that while July 1, 2019 is a turning point for media portrayal, the differences between western- and Hong Kong–based reporting did not magnify when the protests began; rather, they already existed. Taken together, these findings clarify how the portrayal of activism in Hong Kong evolved throughout the Movement.","PeriodicalId":441300,"journal":{"name":"Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129627267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信