Eesti keele ühendverbide kompositsionaalsuse määramine (Detecting the compositionality of Estonian particle verbs)

Q2 Arts and Humanities
Eleri Aedmaa
DOI: 10.5128/ERYA12.01 (https://doi.org/10.5128/ERYA12.01)
Journal: Eesti Rakenduslingvistika Ühingu Aastaraamat (Estonian Papers in Applied Linguistics), pp. 5-23
Published: 2016-05-06 (journal article)

Abstract

Identifying multiword expressions is an important task in natural language processing; approaches to it have tried to classify such expressions automatically with various methods and to determine their compositionality. This article applies statistical measures of word association strength to classify Estonian particle verbs automatically according to how their meaning is formed, and examines which measure gives the best results and whether those results are good enough to base a classification of particle verbs on. The main aim of the study is to find out whether distributional semantic tools can be used to determine the degree of compositionality of Estonian multiword expressions automatically. To this end, word2vec, a software tool based on distributional semantics, is introduced and applied.

Detecting the compositionality of Estonian particle verbs

This article aims to classify Estonian particle verbs automatically and to detect their degree of compositionality. To group the particle verbs, several lexical association measures (AMs) are compared. To detect the degree of compositionality, a model based on distributional semantics is used: the experiment is carried out with the word2vec tool, using the continuous bag-of-words (CBOW) model, which predicts a word from its context. The comparison showed that none of the AMs achieves precision high enough to classify the particle verbs. This suggests that Estonian particle verbs cannot be divided cleanly into compositional and non-compositional classes but instead form a continuum between entirely compositional and entirely non-compositional expressions. Assessing the degree of compositionality of the particle verbs with a distributional semantic model proved successful.
It is shown that cosine similarity can predict the degree of compositionality of particle verbs. However, to evaluate the method introduced here, a ranking based on human judgements of semantic compositionality needs to be created for a set of particle verbs and their corresponding base verbs.
Source journal: Eesti Rakenduslingvistika Ühingu Aastaraamat (Arts and Humanities: Language and Linguistics)
CiteScore: 0.90
Self-citation rate: 0.00%
Articles published: 19
Review time: 28 weeks