Workshop on Computational Approaches to Historical Language Change最新文献

筛选
英文 中文
Using neural topic models to track context shifts of words: a case study of COVID-related terms before and after the lockdown in April 2020 使用神经主题模型跟踪单词的上下文变化:以2020年4月封锁前后的covid - 19相关术语为例研究
Workshop on Computational Approaches to Historical Language Change Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.14
Olga Kellert, M. Zaman
{"title":"Using neural topic models to track context shifts of words: a case study of COVID-related terms before and after the lockdown in April 2020","authors":"Olga Kellert, M. Zaman","doi":"10.18653/v1/2022.lchange-1.14","DOIUrl":"https://doi.org/10.18653/v1/2022.lchange-1.14","url":null,"abstract":"This paper explores lexical meaning changes in a new dataset, which includes tweets from before and after the COVID-related lockdown in April 2020. We use this dataset to evaluate traditional and more recent unsupervised approaches to lexical semantic change that make use of contextualized word representations based on the BERT neural language model to obtain representations of word usages. We argue that previous models that encode local representations of words cannot capture global context shifts such as the context shift of face masks since the pandemic outbreak. We experiment with neural topic models to track context shifts of words. We show that this approach can reveal textual associations of words that go beyond their lexical meaning representation. We discuss future work and how to proceed capturing the pragmatic aspect of meaning change as opposed to lexical semantic change.","PeriodicalId":120650,"journal":{"name":"Workshop on Computational Approaches to Historical Language Change","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134503937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Modeling the Evolution of Word Senses with Force-Directed Layouts of Co-occurrence Networks 用共现网络的力向布局来模拟词义的演化
Workshop on Computational Approaches to Historical Language Change Pub Date : 1900-01-01 DOI: 10.18653/v1/2021.lchange-1.8
T. Reke, Robert Schwanhold, Ralf Krestel
{"title":"Modeling the Evolution of Word Senses with Force-Directed Layouts of Co-occurrence Networks","authors":"T. Reke, Robert Schwanhold, Ralf Krestel","doi":"10.18653/v1/2021.lchange-1.8","DOIUrl":"https://doi.org/10.18653/v1/2021.lchange-1.8","url":null,"abstract":"Languages evolve over time and the meaning of words can shift. Furthermore, individual words can have multiple senses. However, existing language models often only reflect one word sense per word and do not reflect semantic changes over time. While there are language models that can either model semantic change of words or multiple word senses, none of them cover both aspects simultaneously. We propose a novel force-directed graph layout algorithm to draw a network of frequently co-occurring words. In this way, we are able to use the drawn graph to visualize the evolution of word senses. In addition, we hope that jointly modeling semantic change and multiple senses of words results in improvements for the individual tasks.","PeriodicalId":120650,"journal":{"name":"Workshop on Computational Approaches to Historical Language Change","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123889421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
black[LSCDiscovery shared task] GlossReader at LSCDiscovery: Train to Select a Proper Gloss in English – Discover Lexical Semantic Change in Spanish black[LSCDiscovery共享任务]:在英语中选择适当的光泽训练-发现西班牙语词汇语义变化
Workshop on Computational Approaches to Historical Language Change Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.22
M. Rachinskiy, N. Arefyev
{"title":"black[LSCDiscovery shared task] \u0000 GlossReader at LSCDiscovery: Train to Select a Proper Gloss in English – Discover Lexical Semantic Change in Spanish","authors":"M. Rachinskiy, N. Arefyev","doi":"10.18653/v1/2022.lchange-1.22","DOIUrl":"https://doi.org/10.18653/v1/2022.lchange-1.22","url":null,"abstract":"The contextualized embeddings obtained from neural networks pre-trained as Language Models (LM) or Masked Language Models (MLM) are not well suitable for solving the Lexical Semantic Change Detection (LSCD) task because they are more sensitive to changes in word forms rather than word meaning, a property previously known as the word form bias or orthographic bias. Unlike many other NLP tasks, it is also not obvious how to fine-tune such models for LSCD. In order to conclude if there are any differences between senses of a particular word in two corpora, a human annotator or a system shall analyze many examples containing this word from both corpora. This makes annotation of LSCD datasets very labour-consuming. The existing LSCD datasets contain up to 100 words that are labeled according to their semantic change, which is hardly enough for fine-tuning. To solve these problems we fine-tune the XLM-R MLM as part of a gloss-based WSD system on a large WSD dataset in English. Then we employ zero-shot cross-lingual transferability of XLM-R to build the contextualized embeddings for examples in Spanish. In order to obtain the graded change score for each word, we calculate the average distance between our improved contextualized embeddings of its old and new occurrences. For the binary change detection subtask, we apply thresholding to the same scores. Our solution has shown the best results among all other participants in all subtasks except for the optional sense gain detection subtask.","PeriodicalId":120650,"journal":{"name":"Workshop on Computational Approaches to Historical Language Change","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124183846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Using Cross-Lingual Part of Speech Tagging for Partially Reconstructing the Classic Language Family Tree Model 用跨语言词性标注部分重构经典语言谱系树模型
Workshop on Computational Approaches to Historical Language Change Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.8
Anat Samohi, Daniel Weisberg Mitelman, Kfir Bar
{"title":"Using Cross-Lingual Part of Speech Tagging for Partially Reconstructing the Classic Language Family Tree Model","authors":"Anat Samohi, Daniel Weisberg Mitelman, Kfir Bar","doi":"10.18653/v1/2022.lchange-1.8","DOIUrl":"https://doi.org/10.18653/v1/2022.lchange-1.8","url":null,"abstract":"The tree model is well known for expressing the historic evolution of languages. This model has been considered as a method of describing genetic relationships between languages. Nevertheless, some researchers question the model’s ability to predict the proximity between two languages, since it represents genetic relatedness rather than linguistic resemblance. Defining other language proximity models has been an active research area for many years. In this paper we explore a part-of-speech model for defining proximity between languages using a multilingual language model that was fine-tuned on the task of cross-lingual part-of-speech tagging. We train the model on one language and evaluate it on another; the measured performance is then used to define the proximity between the two languages. By further developing the model, we show that it can reconstruct some parts of the tree model.","PeriodicalId":120650,"journal":{"name":"Workshop on Computational Approaches to Historical Language Change","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132857260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Deconstructing destruction: A Cognitive Linguistics perspective on a computational analysis of diachronic change 解构破坏:历时变化计算分析的认知语言学视角
Workshop on Computational Approaches to Historical Language Change Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.3
Karlien Franco, Mariana Montes, K. Heylen
{"title":"Deconstructing destruction: A Cognitive Linguistics perspective on a computational analysis of diachronic change","authors":"Karlien Franco, Mariana Montes, K. Heylen","doi":"10.18653/v1/2022.lchange-1.3","DOIUrl":"https://doi.org/10.18653/v1/2022.lchange-1.3","url":null,"abstract":"In this paper, we aim to introduce a Cognitive Linguistics perspective into a computational analysis of near-synonyms. We focus on a single set of Dutch near-synonyms, vernielen and vernietigen, roughly translated as ‘to destroy’, replicating the analysis from Geeraerts (1997) with distributional models. Our analysis, which tracks the meaning of both words in a corpus of 16th-20th century prose data, shows that both lexical items have undergone semantic change, led by differences in their prototypical semantic core.","PeriodicalId":120650,"journal":{"name":"Workshop on Computational Approaches to Historical Language Change","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116272596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
“Vaderland”, “Volk” and “Natie”: Semantic Change Related to Nationalism in Dutch Literature Between 1700 and 1880 Captured with Dynamic Bernoulli Word Embeddings “Vaderland”、“Volk”和“native”:用动态伯努利词嵌入捕捉1700 - 1880年荷兰文学中与民族主义相关的语义变化
Workshop on Computational Approaches to Historical Language Change Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.13
Marije Timmermans, Eva Vanmassenhove, D. Shterionov
{"title":"“Vaderland”, “Volk” and “Natie”: Semantic Change Related to Nationalism in Dutch Literature Between 1700 and 1880 Captured with Dynamic Bernoulli Word Embeddings","authors":"Marije Timmermans, Eva Vanmassenhove, D. Shterionov","doi":"10.18653/v1/2022.lchange-1.13","DOIUrl":"https://doi.org/10.18653/v1/2022.lchange-1.13","url":null,"abstract":"Languages can respond to external events in various ways - the creation of new words or named entities, additional senses might develop for already existing words or the valence of words can change. In this work, we explore the semantic shift of the Dutch words “natie” (“nation”), “volk” (“people”) and “vaderland” (“fatherland”) over a period that is known for the rise of nationalism in Europe: 1700-1880. The semantic change is measured by means of Dynamic Bernoulli Word Embeddings which allow for comparison between word embeddings over different time slices. The word embeddings were generated based on Dutch fiction literature divided over different decades. From the analysis of the absolute drifts, it appears that the word “natie” underwent a relatively small drift. However, the drifts of “vaderland’” and “volk”’ show multiple peaks, culminating around the turn of the nineteenth century. To verify whether this semantic change can indeed be attributed to nationalistic movements, a detailed analysis of the nearest neighbours of the target words is provided. From the analysis, it appears that “natie”, “volk” and “vaderlan”’ became more nationalistically-loaded over time.","PeriodicalId":120650,"journal":{"name":"Workshop on Computational Approaches to Historical Language Change","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124074153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model 用BERT模型预测18世纪文本的可解释出版年份
Workshop on Computational Approaches to Historical Language Change Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.7
Iiro Rastas, Yann Ciarán Ryan, Iiro Tiihonen, Mohammadreza Qaraei, Liina Repo, Rohit Babbar, E. Mäkelä, M. Tolonen, Filip Ginter
{"title":"Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model","authors":"Iiro Rastas, Yann Ciarán Ryan, Iiro Tiihonen, Mohammadreza Qaraei, Liina Repo, Rohit Babbar, E. Mäkelä, M. Tolonen, Filip Ginter","doi":"10.18653/v1/2022.lchange-1.7","DOIUrl":"https://doi.org/10.18653/v1/2022.lchange-1.7","url":null,"abstract":"In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents. The ECCO dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artifacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both and able to date the works, on average, with less than 7 years absolute error. We also explore how language change over time affects the model by analyzing the features the model uses for publication year predictions as given by the Integrated Gradients model explanation method.","PeriodicalId":120650,"journal":{"name":"Workshop on Computational Approaches to Historical Language Change","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125061419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信