{"title":"Eesti keele ühendverbide kompositsionaalsuse määramine","authors":"Eleri Aedmaa","doi":"10.5128/ERYA12.01","DOIUrl":null,"url":null,"abstract":"Keele automaattootluse jaoks on pusiuhendite tuvastamine oluline ulesanne, mille lahendamiseks on puutud uhendeid eri meetodeid rakendades automaatselt klassifitseerida ning nende kompositsionaalsust maarata. Artiklis rakendatakse sonadevahelise seose tugevuse mootmise statistilisi meetodeid eesti keele uhendverbide automaatseks klassifitseerimiseks nende tahenduse moodustamise viisi alusel ning vaadeldakse, millise meetodi tulemused on koige paremad ja kas need on piisavalt head, et uhendverbide jaotus voiks sellele meetodile tugineda. Uurimuse pohieesmark on valja selgitada, kas distributiivse semantika vahendeid rakendades on voimalik automaatselt kindlaks maarata eesti keele pusiuhendite kompositsionaalsuse taset. Selleks tutvustatakse ja rakendatakse distributiivsel semantikal pohinevat tarkvara word2vec. Detecting the compositionality of Estonian particle verbs The purposes of this article are to automatically classify Estonian particle verbs and detect their degree of compositionality. In order to group particle verbs, the lexical association measures (AMs) are compared. For the detection of the degree of compositionality of Estonian particle verbs, a model based on distributional semantics is used. The experiment is carried out with the word2vec tool, using a continuous bag-of-words model which predicts the word given its context. The analysis of the comparison of AMs revealed that none of the AMs used achieve high enough precision values to classify the particle verbs. Hence, it can be assumed that Estonian particle verbs cannot be divided cleanly into the classes of compositional and non-compositional particle verbs, but rather populate a continuum between entirely compositional and entirely non-compositional expressions. 
The experiment of assessing the degree of compositionality of the particle verbs using distributional semantic model proved successful. It is demonstrated that the value of cosine similarity can predict the degree of compositionality of particle verbs. However, in order to evaluate the method introduced here, it is important to create a ranking of human judgement on semantic compositionality for a series of particle verbs and base verbs to which they correspond.","PeriodicalId":35118,"journal":{"name":"Eesti Rakenduslingvistika Uhingu Aastaraamat","volume":"7 1","pages":"5-23"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eesti Rakenduslingvistika Uhingu Aastaraamat","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5128/ERYA12.01","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Arts and Humanities","Score":null,"Total":0}
Citation count: 0
Abstract
Detecting multiword expressions is an important task in natural language processing. To address it, researchers have tried to classify such expressions automatically with a range of methods and to determine their compositionality. This article applies statistical measures of the strength of association between words to classify Estonian particle verbs automatically according to how their meaning is formed. It examines which measure gives the best results and whether those results are good enough for a classification of particle verbs to rely on that measure. The main aim of the study is to find out whether the tools of distributional semantics can be used to determine automatically the degree of compositionality of Estonian multiword expressions. To this end, the word2vec software, which is based on distributional semantics, is introduced and applied.

Detecting the compositionality of Estonian particle verbs

The purposes of this article are to classify Estonian particle verbs automatically and to detect their degree of compositionality. To group particle verbs, lexical association measures (AMs) are compared. To detect the degree of compositionality of Estonian particle verbs, a model based on distributional semantics is used. The experiment is carried out with the word2vec tool, using a continuous bag-of-words model, which predicts a word given its context. The comparison of AMs revealed that none of the AMs used achieves precision high enough to classify the particle verbs. Hence, it can be assumed that Estonian particle verbs cannot be divided cleanly into compositional and non-compositional classes, but rather populate a continuum between entirely compositional and entirely non-compositional expressions. The experiment of assessing the degree of compositionality of particle verbs with a distributional semantic model proved successful. It is demonstrated that the value of cosine similarity can predict the degree of compositionality of particle verbs. However, in order to evaluate the method introduced here, it is important to create a ranking of human judgements on semantic compositionality for a series of particle verbs and the base verbs to which they correspond.
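
The role of cosine similarity described above can be illustrated with a minimal sketch. It is not the article's implementation: the function names, the placeholder labels and the three-dimensional toy vectors are all invented for illustration, and in a real experiment the vectors would come from a trained word2vec continuous bag-of-words model. The sketch assumes that a particle verb whose vector lies close to the vector of its base verb is more compositional, and it ranks verb pairs on that basis.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_compositionality(pairs, vectors):
    """Sort (particle_verb, base_verb) pairs from most to least similar."""
    scored = [
        (pv, bv, cosine_similarity(vectors[pv], vectors[bv]))
        for pv, bv in pairs
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical toy vectors; real ones would be learned from a corpus.
toy_vectors = {
    "base_verb_a": [1.0, 0.0, 0.0],
    "particle_verb_a": [0.9, 0.1, 0.0],  # close to its base verb
    "base_verb_b": [0.0, 1.0, 0.0],
    "particle_verb_b": [0.0, 0.1, 0.9],  # far from its base verb
}
toy_pairs = [
    ("particle_verb_b", "base_verb_b"),
    ("particle_verb_a", "base_verb_a"),
]

ranking = rank_by_compositionality(toy_pairs, toy_vectors)
for particle_verb, base_verb, score in ranking:
    print(f"{particle_verb} vs {base_verb}: {score:.3f}")
```

A ranking of this kind is what the human judgements called for in the abstract would be compared against, for example with a rank correlation coefficient.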