Linguistic change and historical periodization of Old Literary Finnish

Workshop on Computational Approaches to Historical Language Change | Pub Date: 2021 | DOI: 10.18653/v1/2021.lchange-1.4

N. Partanen, Khalid Alnajjar, Mika Hämäläinen, Jack Rueter


Citations: 2

Abstract

In this study, we have normalized and lemmatized an Old Literary Finnish corpus using a lemmatization model trained on texts from Agricola. We analyse the error types that occur and appear in different decades, and use word error rate (WER) and different error types as a proxy for measuring linguistic innovation and change. We show that the proposed approach works, and the errors are connected to accumulating changes and innovations, which also results in a continuous decrease in the accuracy of the model. The described error types also guide further work in improving these models, and document the currently observed issues. We also have trained word embeddings for four centuries of lemmatized Old Literary Finnish, which are available on Zenodo.
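As a rough illustration of the per-decade measurement described in the abstract, the sketch below groups tokens by decade and computes a word error rate for each group. It assumes that the lemmatizer output is aligned token by token with a gold lemma, so that WER reduces to the share of mismatched tokens. The function name `wer_by_decade`, the record layout, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def wer_by_decade(records):
    """Return {decade: error rate} for (year, predicted_lemma, gold_lemma) records.

    Assumption: predictions and gold lemmas are aligned one-to-one, so the
    word error rate is the fraction of tokens whose lemma is wrong.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for year, predicted, gold in records:
        decade = (year // 10) * 10  # e.g. 1548 -> 1540
        totals[decade] += 1
        if predicted != gold:
            errors[decade] += 1
    return {decade: errors[decade] / totals[decade] for decade in sorted(totals)}


# Hypothetical toy records, invented purely for illustration.
sample = [
    (1548, "olla", "olla"),
    (1549, "tulla", "tulla"),
    (1642, "olla", "olla"),
    (1645, "sanoa", "sana"),
    (1776, "kirja", "kirja"),
    (1778, "teke", "tehdä"),
    (1779, "ihmis", "ihminen"),
    (1771, "hywä", "hyvä"),
]

rates = wer_by_decade(sample)
for decade, rate in rates.items():
    print(decade, f"{rate:.2f}")
```

In this toy data the error rate rises with distance from the earliest decade, which is the kind of trend the abstract reports as a proxy for accumulating linguistic change. Real output would also need the error types to be categorized, which this sketch does not attempt.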
