Old geographical corpora: A methodology for interpretative transcription
Authors: Mihaela Plamada-Onofrei, Daniela Gîfu, Cecilia Bolea
Venue: 2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)
Publication date: 2017-07-01
DOI: 10.1109/SPED.2017.7990445 (https://doi.org/10.1109/SPED.2017.7990445)
Citation count: 1
Abstract
This paper describes a study of the evolution of the Romanian language in the 18th and 19th centuries, focusing on the geographical domain, with the aim of developing automatic recognition and interpretative transcription of printed Romanian historical heritage writings from Cyrillic into Latin script. Interpretative transcription of texts written in Cyrillic is well known to be extremely laborious, but it addresses a problem of great interest to humanities researchers who study the Romanian language in its diachronic evolution. We expect the present study to benefit humanities research, including paleography, history, archaeology, and the branches of linguistics concerned with the study of language in diachrony. It should also help researchers in computational linguistics who develop models for old language, in particular towards a diachronic POS tagger, which is needed to recover old lemmata.