{"title":"Skolēnu pārspriedumu korpusa izveide","authors":"Kristīne Levāne-Petrova, Kristīne Pokratniece","doi":"10.37384/lva.2021.106","DOIUrl":null,"url":null,"abstract":"Teaching Latvian as a mother tongue has always been of great importance. Although various teaching materials have been developed over time, corpus-based learning materials are becoming increasingly popular. Since December 2018, the Institute of Mathematics and Computer Science of the University of Latvia has carried out the Latvian State Research Programme “Latvian Language”, agreement No. VPP-IZM-2018/2-0002 (subproject “Acquisition of Latvian Language”). The “Corpus of Students’ Essays” is being created within this project. The corpus will make it possible to create new teaching materials and to carry out various studies of Latvian grammar. It will also make it possible to analyse different aspects of Latvian language acquisition by students from different Latvian schools, to find out what students find most difficult, and later to address specific language acquisition issues. The “Corpus of Students’ Essays” is a specialized language corpus, that is, a corpus whose texts are limited to one or more subject areas, domains, or topics. The corpus includes 468 essays from the grade 12 Latvian language exam (approximately 185 000 running words). The essays were written by students of Latvian-language secondary schools, minority schools, and state gymnasiums in Kurzeme, Latgale, and Riga. The corpus contains uncorrected texts, retaining all typing, punctuation, and other mistakes made by the students, and it is automatically morphologically tagged. Like other language corpora, the “Corpus of Students’ Essays” is developed in several stages. Creating any corpus begins with defining the text selection criteria, followed by selecting the texts, digitization and editing where necessary, morphological annotation, and creating and adding a metadata set. A corpus is more than a collection of texts: it also comprises an infrastructure for collecting and processing corpus data and an interface for using the corpus. This article describes the structure and development stages of the “Corpus of Students’ Essays”, as well as the problems encountered and their solutions.","PeriodicalId":231190,"journal":{"name":"Latviešu valodas apguve. XIII Starptautiskais baltistu kongress : rakstu krājums","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Latviešu valodas apguve. XIII Starptautiskais baltistu kongress : rakstu krājums","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.37384/lva.2021.106","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Teaching Latvian as a mother tongue has always been of great importance. Although various teaching materials have been developed over time, corpus-based learning materials are becoming increasingly popular. Since December 2018, the Institute of Mathematics and Computer Science of the University of Latvia has carried out the Latvian State Research Programme “Latvian Language”, agreement No. VPP-IZM-2018/2-0002 (subproject “Acquisition of Latvian Language”). The “Corpus of Students’ Essays” is being created within this project. The corpus will make it possible to create new teaching materials and to carry out various studies of Latvian grammar. It will also make it possible to analyse different aspects of Latvian language acquisition by students from different Latvian schools, to find out what students find most difficult, and later to address specific language acquisition issues. The “Corpus of Students’ Essays” is a specialized language corpus, that is, a corpus whose texts are limited to one or more subject areas, domains, or topics. The corpus includes 468 essays from the grade 12 Latvian language exam (approximately 185 000 running words). The essays were written by students of Latvian-language secondary schools, minority schools, and state gymnasiums in Kurzeme, Latgale, and Riga. The corpus contains uncorrected texts, retaining all typing, punctuation, and other mistakes made by the students, and it is automatically morphologically tagged.

Like other language corpora, the “Corpus of Students’ Essays” is developed in several stages. Creating any corpus begins with defining the text selection criteria, followed by selecting the texts, digitization and editing where necessary, morphological annotation, and creating and adding a metadata set. A corpus is more than a collection of texts: it also comprises an infrastructure for collecting and processing corpus data and an interface for using the corpus. This article describes the structure and development stages of the “Corpus of Students’ Essays”, as well as the problems encountered and their solutions.
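
To make the idea of uncorrected texts paired with a metadata set more concrete, the following is a minimal Python sketch of how one essay record might be represented and how running words could be counted per region. It is an illustration only: the abstract does not describe the corpus schema, so the class `Essay`, its field names, the whitespace-based word count, and the sample sentences are all assumptions invented for this example.

```python
from dataclasses import dataclass


# Hypothetical record layout; the article does not specify the actual schema.
@dataclass(frozen=True)
class Essay:
    essay_id: str
    school_type: str  # e.g. "secondary", "minority", "state gymnasium"
    region: str       # e.g. "Kurzeme", "Latgale", "Riga"
    text: str         # stored exactly as written, mistakes included


def running_words(essay: Essay) -> int:
    """Rough running-word count: whitespace-separated tokens (an assumption)."""
    return len(essay.text.split())


def words_by_region(essays: list[Essay]) -> dict[str, int]:
    """Sum the running words of all essays in each region."""
    totals: dict[str, int] = {}
    for essay in essays:
        totals[essay.region] = totals.get(essay.region, 0) + running_words(essay)
    return totals


# Invented sample data, not taken from the corpus. The first text keeps a
# spelling mistake on purpose, since the corpus texts are left uncorrected.
sample = [
    Essay("e1", "secondary", "Riga", "Man patīk lasīt gramatas"),
    Essay("e2", "minority", "Latgale", "Es dzīvoju Daugavpilī"),
    Essay("e3", "state gymnasium", "Riga", "Valoda ir svarīga"),
]
totals = words_by_region(sample)
print(totals)
```

For scale, the figures given above (approximately 185 000 running words in 468 essays) correspond to roughly 395 running words per essay.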