Heade näitelausete automaattuvastamine eesti keele õppesõnastike jaoks (Automatic detection of good dictionary examples in Estonian learner's dictionaries)

Q2 Arts and Humanities
Kristina Koppel
DOI: 10.5128/ERYA13.04 (https://doi.org/10.5128/ERYA13.04)
Journal: Eesti Rakenduslingvistika Ühingu Aastaraamat, pp. 53-71
Published: 2017-04-19
Publication type: Journal Article
Citation count: 5

Abstract

The article focuses on the creation of version 1.4 of the Estonian module of the tool Good Dictionary Example, or GDEX (Kilgarriff et al. 2008). GDEX is a tool that helps to automatically identify corpus sentences suitable for use as dictionary examples. GDEX modules have so far been created for English, Slovene, Dutch, Portuguese, Spanish, Japanese and Estonian. The article first explains the general working principles of the tool. It then focuses on the statistical analysis of the parameters used to identify example sentences and on setting the parameter values. Drawing on an evaluation of the parameter values and a comparison of the different modules, a new version 1.4 of the Estonian module is proposed.

"Automatic detection of good dictionary examples in Estonian learner's dictionaries"

This paper explains, firstly, how a tool called Good Dictionary Example (GDEX) (Kilgarriff et al. 2008) scores corpus sentences and helps the lexicographer automatically select the best examples for dictionaries. Secondly, the training datasets containing example sentences from the Estonian Collocations Dictionary (ECD) are introduced. Thirdly, the paper focuses on different parameters of good dictionary examples. Most of the paper is based on an analysis of the training datasets and an evaluation of the previous GDEX configurations. The configurations were evaluated with the graphical user interface GDEX Editor. Based on the results of the statistical analysis and the evaluation of the different configurations, a new configuration, 1.4, is introduced. GDEX 1.4 implements 16 new parameters.

The main parameters of GDEX 1.4 are as follows: the sentence is a full sentence of 4–20 tokens, optimally 6–12 tokens; it contains a verb; and it contains no low-frequency words and no words from the blacklist. Sentences are penalized if they contain more than one adverb, pronoun, proper name, numeral, conjunction or comma, more than two verbs, or certain pronouns. The output of GDEX 1.4 can be applied to the ECD project and used to create SkELL, a web interface for learners of Estonian.
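The parameters listed in the abstract can be read as a set of hard filters plus a set of penalties. The following Python sketch illustrates that reading only; it is not the GDEX configuration syntax or the author's implementation. The function name `gdex_score`, the tag names, the penalty weight, the frequency threshold and the choice to treat low-frequency and blacklisted words as hard filters are all assumptions made for illustration; the abstract gives only the numeric limits.

```python
from collections import Counter

# Limits taken from the abstract: more than this many tokens of a
# category triggers a penalty. Tag names are hypothetical.
MAX_PER_TAG = {"ADV": 1, "PRON": 1, "PROPN": 1, "NUM": 1,
               "CONJ": 1, "COMMA": 1, "VERB": 2}
PENALTY = 0.1   # assumed weight, not given in the abstract
MIN_FREQ = 5    # assumed low-frequency threshold, not given in the abstract


def gdex_score(tokens, freq, blacklist, penalized_pronouns=frozenset()):
    """Score a sentence given as a list of (form, tag) pairs.

    Returns 0.0 if a hard filter fails, otherwise a value in [0, 1].
    """
    if not tokens:
        return 0.0
    forms = [form for form, _ in tokens]
    tags = Counter(tag for _, tag in tokens)

    # Hard filters: full sentence, 4-20 tokens, at least one verb.
    is_full = forms[0][:1].isupper() and forms[-1] in {".", "!", "?"}
    if not is_full or not 4 <= len(tokens) <= 20 or tags["VERB"] == 0:
        return 0.0

    # Hard filters (assumed): no blacklisted or low-frequency words.
    for form, tag in tokens:
        if tag in ("PUNCT", "COMMA"):
            continue
        word = form.lower()
        if word in blacklist or freq.get(word, 0) < MIN_FREQ:
            return 0.0

    # Penalties: outside the optimal length, too many tokens of one
    # category, or one of the specifically penalized pronouns.
    penalties = 0
    if not 6 <= len(tokens) <= 12:
        penalties += 1
    penalties += sum(1 for tag, limit in MAX_PER_TAG.items()
                     if tags[tag] > limit)
    if any(tag == "PRON" and form.lower() in penalized_pronouns
           for form, tag in tokens):
        penalties += 1
    return max(0.0, 1.0 - PENALTY * penalties)


if __name__ == "__main__":
    sentence = [("Laps", "NOUN"), ("loeb", "VERB"), ("toas", "NOUN"),
                ("head", "ADJ"), ("raamatut", "NOUN"), (".", "PUNCT")]
    frequencies = {form.lower(): 100 for form, _ in sentence}
    print(gdex_score(sentence, frequencies, blacklist=set()))
```

Counting penalties as an integer and subtracting once keeps the arithmetic easy to follow; a real configuration would likely weight each parameter separately.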
Source journal: Eesti Rakenduslingvistika Ühingu Aastaraamat (Arts and Humanities: Language and Linguistics)
CiteScore: 0.90
Self-citation rate: 0.00%
Articles published: 19
Review time: 28 weeks