The Corpora They Are a-Changing: a Case Study in Italian Newspapers

Workshop on Computational Approaches to Historical Language Change. Pub Date: 2021. DOI: 10.18653/v1/2021.lchange-1.3

Pierpaolo Basile, A. Caputo, Tommaso Caselli, Pierluigi Cassotti, Rossella Varvara



Abstract


The use of automatic methods for the study of lexical semantic change (LSC) has led to the creation of evaluation benchmarks. Benchmark datasets, however, are intimately tied to the corpus used to create them, which calls into question both their reliability and the robustness of automatic methods. This contribution investigates these aspects, showing the impact of unforeseen social and cultural dimensions. We also identify a set of additional issues (OCR quality, named entities) that affect the performance of automatic methods, especially when they are used to discover LSC.
