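Latviešu valodas apguvēju korpusa (LaVA) izmantošana pētniecībā un mācību uzdevumu izstrādē [Use of the Latvian Language Learners Corpus (LaVA) in Research and in the Development of Learning Exercises]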

Ilze Auziņa, Kristīne Levāne-Petrova, Roberts Darģis, Kristīne Pokratniece, Inga Kaija
Published in: Latviešu valodas apguve. XIII Starptautiskais baltistu kongress : rakstu krājums, 2021-10-08. DOI: 10.37384/lva.2021.142 (https://doi.org/10.37384/lva.2021.142)

Abstract

The Latvian Language Learners Corpus (LaVA), developed at the Institute of Mathematics and Computer Science, University of Latvia, includes more than 1000 texts written by foreign learners of Latvian who are in their first or second semester of study at Latvian higher education institutions and are reaching the A1 (possibly A2) proficiency level. The corpus contains more than 180 000 words. The morphological annotation of the texts has been checked manually, and the learners' errors have been annotated manually. In addition, each text is accompanied by metadata about its author: gender, age, native language, and knowledge of other languages. When analysing the data, this information can be used to determine how the learner's mother tongue and language skills in general affect the acquisition of Latvian.

Users of the corpus can analyse the data both on the LaVA website (see http://lava.korpuss.lv/search) and in the Sketch Engine tool, where quantitative and qualitative analysis can be performed. The quantitative approach makes it possible to identify tendencies in the use of a word, word form, or construction and to determine the frequency of the mistakes made by language learners. In addition, the objectivity of the research is ensured by looking at the learner data from different aspects and by repeating the analysis. For example, a statistical analysis of the nouns used in learners' texts shows that declension 4 nouns are used most often. Next in frequency are declension 1, 5 and 2 nouns, while declension 3 and 6 nouns and indeclinable nouns are used very rarely. Qualitative analysis reveals certain features of morphology and word formation, including aspects of syntax, based on empirical data. The erroneous use of nouns, verbs, or other parts of speech can be analysed qualitatively in order to understand which rules determine it. Examples include the use of non-reflexive verbs instead of reflexive verbs, the use of infinitives instead of finite forms (person forms), and the use of a suffix that does not fit the noun paradigm.

Exercises and tests are generated on the basis of the analysis of LaVA data, including learner error analysis. The exercises are intended to help language learners strengthen their linguistic competence in Latvian, for example, in the use of verb forms in the indicative mood, in both indefinite and perfect tense forms. Exercise creation consists of three stages: (1) analysis of LaVA errors and identification of typical errors, (2) collection of sample sentences from various corpora of the Latvian language (for example, LVK2018 and Saeima) that contain the word forms and constructions in which language learners most often make mistakes in LaVA texts, and (3) generation of different exercises using the selected sample sentences.
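The three stages of exercise creation can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LearnerError` record, the error-type labels, the function names, and the toy sentences are all invented for the example and are not taken from LaVA, LVK2018, or Saeima. Gap-fill is used here as one possible exercise type.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnerError:
    """One annotated learner error (hypothetical fields, not the LaVA schema)."""
    error_type: str   # illustrative label, e.g. "reflexive_verb"
    wrong_form: str   # form the learner wrote
    target_form: str  # corrected form
    lemma: str        # dictionary form, shown as a hint in the exercise


def typical_error_types(errors, top_n=1):
    """Stage 1: rank error types by frequency and keep the most common ones."""
    counts = Counter(e.error_type for e in errors)
    return [error_type for error_type, _ in counts.most_common(top_n)]


def collect_samples(reference_sentences, target_forms):
    """Stage 2: keep reference sentences that contain one of the target forms."""
    samples = []
    for sentence in reference_sentences:
        cores = {w.strip(".,!?").lower() for w in sentence.split()}
        if cores & target_forms:
            samples.append(sentence)
    return samples


def make_gap_fill(sentence, lemma_by_form):
    """Stage 3: blank out the first target form and give its lemma as a hint."""
    words = sentence.split()
    for i, word in enumerate(words):
        core = word.strip(".,!?")
        lemma = lemma_by_form.get(core.lower())
        if lemma is not None:
            words[i] = word.replace(core, f"____ ({lemma})", 1)
            return " ".join(words), core
    return None  # no target form found in this sentence


def build_exercises(errors, reference_sentences, top_n=1):
    """Run the three stages and return (prompt, answer) pairs."""
    typical = set(typical_error_types(errors, top_n))
    lemma_by_form = {
        e.target_form.lower(): e.lemma for e in errors if e.error_type in typical
    }
    samples = collect_samples(reference_sentences, set(lemma_by_form))
    return [make_gap_fill(s, lemma_by_form) for s in samples]


# Toy data, invented for illustration only.
errors = [
    LearnerError("reflexive_verb", "māca", "mācās", "mācīties"),
    LearnerError("reflexive_verb", "mācu", "mācos", "mācīties"),
    LearnerError("infinitive_for_finite", "dzīvot", "dzīvo", "dzīvot"),
]
reference = [
    "Es mācos latviešu valodu.",
    "Viņš dzīvo Rīgā.",
    "Viņa mācās universitātē.",
]

for prompt, answer in build_exercises(errors, reference):
    print(f"{prompt}  ->  {answer}")
```

With the toy data, the most frequent error type is the reflexive verb, so only the two sentences containing forms of "mācīties" are turned into gap-fill prompts. A real pipeline would need lemmatised, morphologically annotated input rather than plain string matching.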