All-words word sense disambiguation for Turkish
Onur Açıkgöz, Ali Tunca Gürkan, Burak Ertopçu, Ozan Topsakal, Berke Özenç, Ali Bugra Kanburoglu, Ilker Çam, Begüm Avar, Gökhan Ercan, O. T. Yildiz
2017 International Conference on Computer Science and Engineering (UBMK), 2017-10-01
DOI: 10.1109/UBMK.2017.8093442
Citation count: 6
Abstract
Identifying the sense of a word in context is a challenging problem with many applications in natural language processing. This assignment problem is called word sense disambiguation (WSD). Many papers in the literature focus on the English language and English data. Our dataset consists of 1400 sentences translated into Turkish from the Penn Treebank Corpus. This paper discusses 6 different feature extraction methods and their classification performance using C4.5, Random Forests, Rocchio, Naive Bayes, KNN, and linear and multilayer perceptrons. It examines how the described features perform on a morphologically rich language (Turkish) across several classifiers.
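To make the framing of WSD as a classification task concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which six feature extraction methods are used, so the context-window feature here is an assumption, as are the toy sentences, the sense labels `hundred` and `face` for the ambiguous word "yüz", and the names `context_features` and `NaiveBayesWSD`. Naive Bayes is used only because it is one of the classifiers the abstract lists.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, index, window=2):
    """Surface forms within `window` positions of the target token.

    Assumed feature for illustration; the abstract does not specify
    the paper's actual feature extraction methods.
    """
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return [tokens[i] for i in range(lo, hi) if i != index]


class NaiveBayesWSD:
    """Multinomial Naive Bayes over context words, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        # examples: iterable of (feature list, sense label) pairs
        for features, sense in examples:
            self.sense_counts[sense] += 1
            for feature in features:
                self.feature_counts[sense][feature] += 1
                self.vocab.add(feature)
        return self

    def predict(self, features):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            # log prior plus smoothed log likelihood of each context word
            score = math.log(count / total)
            denom = sum(self.feature_counts[sense].values()) + self.alpha * vocab_size
            for feature in features:
                numer = self.feature_counts[sense][feature] + self.alpha
                score += math.log(numer / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical toy training data: (sentence, index of target word, sense label).
TRAIN = [
    ("adam yüz lira verdi", 1, "hundred"),
    ("çocuk yüz kitap aldı", 1, "hundred"),
    ("onun yüz ifadesi değişti", 1, "face"),
    ("güzel yüz hatları var", 1, "face"),
]

model = NaiveBayesWSD().fit(
    (context_features(sentence.split(), idx), sense)
    for sentence, idx, sense in TRAIN
)

prediction = model.predict(context_features("bana yüz lira ver".split(), 1))
print(prediction)
```

The sketch uses raw surface forms as features. In a morphologically rich language such as Turkish, the same lemma appears under many inflected forms, so surface-form counts become sparse, which is one reason the choice of features matters for the comparison the abstract describes.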