Workshop on Computational Approaches to Historical Language Change: Latest Papers

The Corpora They Are a-Changing: a Case Study in Italian Newspapers
Workshop on Computational Approaches to Historical Language Change Pub Date: 1900-01-01 DOI: 10.18653/v1/2021.lchange-1.3
Pierpaolo Basile, A. Caputo, Tommaso Caselli, Pierluigi Cassotti, Rossella Varvara
Abstract: The use of automatic methods for the study of lexical semantic change (LSC) has led to the creation of evaluation benchmarks. Benchmark datasets, however, are intimately tied to the corpus used for their creation questioning their reliability as well as the robustness of automatic methods. This contribution investigates these aspects showing the impact of unforeseen social and cultural dimensions. We also identify a set of additional issues (OCR quality, named entities) that impact the performance of the automatic methods, especially when used to discover LSC.
Citations: 0
Tracking Semantic Change in Cognate Sets for English and Romance Languages
Workshop on Computational Approaches to Historical Language Change Pub Date: 1900-01-01 DOI: 10.18653/v1/2021.lchange-1.9
Ana Sabina Uban, Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Simona Georgescu, Laurentiu Zoicas
Abstract: Semantic divergence in related languages is a key concern of historical linguistics. We cross-linguistically investigate the semantic divergence of cognate pairs in English and Romance languages, by means of word embeddings. To this end, we introduce a new curated dataset of cognates in all pairs of those languages. We describe the types of errors that occurred during the automated cognate identification process and manually correct them. Additionally, we label the English cognates according to their etymology, separating them into two groups: old borrowings and recent borrowings. On this curated dataset, we analyse word properties such as frequency and polysemy, and the distribution of similarity scores between cognate sets in different languages. We automatically identify different clusters of English cognates, setting a new direction of research in cognates, borrowings and possibly false friends analysis in related languages.
Citations: 4
[LSCDiscovery shared task] DeepMistake at LSCDiscovery: Can a Multilingual Word-in-Context Model Replace Human Annotators?
Workshop on Computational Approaches to Historical Language Change Pub Date: 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.18
Daniil Homskiy, N. Arefyev
Abstract: In this paper we describe our solution of the LSCDiscovery shared task on Lexical Semantic Change Discovery (LSCD) in Spanish. Our solution employs a Word-in-Context (WiC) model, which is trained to determine if a particular word has the same meaning in two given contexts. We basically try to replicate the annotation of the dataset for the shared task, but replacing human annotators with a neural network. In the graded change discovery subtask, our solution has achieved the 2nd best result according to all metrics. In the main binary change detection subtask, our F1-score is 0.655 compared to 0.716 of the best submission, corresponding to the 5th place. However, in the optional sense gain detection subtask we have outperformed all other participants. During the post-evaluation experiments we compared different ways to prepare WiC data in Spanish for fine-tuning. We have found that it helps leaving only examples annotated as 1 (unrelated senses) and 4 (identical senses) rather than using 2x more examples including intermediate annotations.
Citations: 0
Low Saxon dialect distances at the orthographic and syntactic level
Workshop on Computational Approaches to Historical Language Change Pub Date: 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.12
Jan B. Siewert, Yves Scherrer, Martijn Wieling
Abstract: We compare five Low Saxon dialects from the 19th and 21st century from Germany and the Netherlands with each other as well as with modern Standard Dutch and Standard German. Our comparison is based on character n-grams on the one hand and PoS n-grams on the other and we show that these two lead to different distances. Particularly in the PoS-based distances, one can observe all of the 21st century Low Saxon dialects shifting towards the modern majority languages.
Citations: 1
Roadblocks in Gender Bias Measurement for Diachronic Corpora
Workshop on Computational Approaches to Historical Language Change Pub Date: 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.15
Saied Alshahrani, Esma Wali, Abdullah R. Alshamsan, Yan Chen, Jeanna Neefe Matthews
Abstract: The use of word embeddings is an important NLP technique for extracting meaningful conclusions from corpora of human text. One important question that has been raised about word embeddings is the degree of gender bias learned from corpora. Bolukbasi et al. (2016) proposed an important technique for quantifying gender bias in word embeddings that, at its heart, is lexically based and relies on sets of highly gendered word pairs (e.g., mother/father and madam/sir) and a list of professions words (e.g., doctor and nurse). In this paper, we document problems that arise with this method to quantify gender bias in diachronic corpora. Focusing on Arabic and Chinese corpora, in particular, we document clear changes in profession words used over time and, somewhat surprisingly, even changes in the simpler gendered defining set word pairs. We further document complications in languages such as Arabic, where many words are highly polysemous/homonymous, especially female professions words.
Citations: 0
From qualifiers to quantifiers: semantic shift at the paradigm level
Workshop on Computational Approaches to Historical Language Change Pub Date: 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.5
Q. Feltgen
Abstract: Language change has often been conceived as a competition between linguistic variants. However, language units may be complex organizations in themselves, e.g. in the case of schematic constructions, featuring a free slot. Such a slot is filled by words forming a set or ‘paradigm’ and engaging in inter-related dynamics within this constructional environment. To tackle this complexity, a simple computational method is offered to automatically characterize their interactions, and visualize them through networks of cooperation and competition. Applying this method to the French paradigm of quantifiers, I show that this method efficiently captures phenomena regarding the evolving organization of constructional paradigms, in particular the constitution of competing clusters of fillers that promote different semantic strategies overall.
Citations: 2
Language Acquisition, Neutral Change, and Diachronic Trends in Noun Classifiers
Workshop on Computational Approaches to Historical Language Change Pub Date: 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.2
Aniket Kali, Jordan Kodner
Abstract: Languages around the world employ classifier systems as a method of semantic organization and categorization. These systems are rife with variability, violability, and ambiguity, and are prone to constant change over time. We explicitly model change in classifier systems as the population-level outcome of child language acquisition over time in order to shed light on the factors that drive change to classifier systems. Our research consists of two parts: a contrastive corpus study of Cantonese and Mandarin child-directed speech to determine the role that ambiguity and homophony avoidance may play in classifier learning and change followed by a series of population-level learning simulations of an abstract classifier system. We find that acquisition without reference to ambiguity avoidance is sufficient to drive broad trends in classifier change and suggest an additional role for adults and discourse factors in classifier death.
Citations: 0
A Multilingual Benchmark to Capture Olfactory Situations over Time
Workshop on Computational Approaches to Historical Language Change Pub Date: 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.1
S. Menini, Teresa Paccosi, Sara Tonelli, M. van Erp, I. Leemans, Pasquale Lisena, Raphael Troncy, William Tullett, Ali Hürriyetoǧlu, Ger Dijkstra, F. Gordijn, Elias Jürgens, Josephine Koopman, Aron Ouwerkerk, Sanne Steen, I. Novalija, J. Brank, Dunja Mladenić, Anja Zidar
Abstract: We present a benchmark in six European languages containing manually annotated information about olfactory situations and events following a FrameNet-like approach. The documents selection covers ten domains of interest to cultural historians in the olfactory domain and includes texts published between 1620 to 1920, allowing a diachronic analysis of smell descriptions. With this work, we aim to foster the development of olfactory information extraction approaches as well as the analysis of changes in smell descriptions over time.
Citations: 6
Caveats of Measuring Semantic Change of Cognates and Borrowings using Multilingual Word Embeddings
Workshop on Computational Approaches to Historical Language Change Pub Date: 1900-01-01 DOI: 10.18653/v1/2022.lchange-1.10
Clémentine Fourrier, Syrielle Montariol
Abstract: Cognates and borrowings carry different aspects of etymological evolution. In this work, we study semantic change of such items using multilingual word embeddings, both static and contextualised. We underline caveats identified while building and evaluating these embeddings. We release both said embeddings and a newly-built historical words lexicon, containing typed relations between words of varied Romance languages.
Citations: 2
Linguistic change and historical periodization of Old Literary Finnish
Workshop on Computational Approaches to Historical Language Change Pub Date: 1900-01-01 DOI: 10.18653/v1/2021.lchange-1.4
N. Partanen, Khalid Alnajjar, Mika Hämäläinen, Jack Rueter
Abstract: In this study, we have normalized and lemmatized an Old Literary Finnish corpus using a lemmatization model trained on texts from Agricola. We analyse the error types that occur and appear in different decades, and use word error rate (WER) and different error types as a proxy for measuring linguistic innovation and change. We show that the proposed approach works, and the errors are connected to accumulating changes and innovations, which also results in a continuous decrease in the accuracy of the model. The described error types also guide further work in improving these models, and document the currently observed issues. We also have trained word embeddings for four centuries of lemmatized Old Literary Finnish, which are available on Zenodo.
Citations: 2