Stem-based PoS tagging for agglutinative languages
Authors: Necva Bölücü, Burcu Can
Published in: 2017 25th Signal Processing and Communications Applications Conference (SIU), 2017-05-15
DOI: https://doi.org/10.1109/SIU.2017.7960386
Citations: 6
Abstract
In agglutinative languages, words are formed by gluing morphemes together. The resulting large number of distinct word forms causes data sparsity, which makes part-of-speech (PoS) tagging difficult for these languages. In this paper, we present two Bayesian PoS tagging models based on Hidden Markov Models for agglutinative languages. The first model is word-based; the second is stem-based, with the stems obtained from two existing unsupervised stemmers: the HPS stemmer and Morfessor FlatCat. The results show that stemming improves PoS tagging accuracy. We report results for Turkish as an agglutinative language and English as a morphologically poor language.
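
The sparsity argument in the abstract can be illustrated with a small sketch. It is not the authors' model: the toy corpus, the suffix list, and the `toy_stem` helper are all hypothetical stand-ins for an unsupervised stemmer such as HPS or Morfessor FlatCat. The sketch only shows how mapping inflected Turkish word forms to stems shrinks the set of distinct observations an HMM tagger would have to estimate emission probabilities for.

```python
from collections import Counter

# Hypothetical toy corpus of Turkish word forms (not taken from the paper).
TOKENS = ["evler", "evde", "evden", "evlerde",
          "kitap", "kitaplar", "kitapta", "evler"]

# Hypothetical suffix list standing in for an unsupervised stemmer;
# longer suffixes are tried first so "lerde" wins over "de".
SUFFIXES = sorted(["lerde", "lar", "ler", "den", "de", "ta"],
                  key=len, reverse=True)


def toy_stem(word: str, min_stem_len: int = 2) -> str:
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


# Observation counts for a word-based model versus a stem-based model.
word_types = Counter(TOKENS)
stem_types = Counter(toy_stem(token) for token in TOKENS)

print(f"{len(word_types)} word types, {len(stem_types)} stem types")
# prints "7 word types, 2 stem types"
```

With the same eight tokens, the stem-based view has two observation types instead of seven, so each type is seen more often and its emission counts are less sparse.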