On part of speech tagger for Indonesian language
R. S. Yuwana, A. R. Yuliani, H. Pardede
2017 2nd International conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), November 2017
DOI: 10.1109/ICITISEE.2017.8285530
In this paper, we present an evaluation of six popular methods for part-of-speech (POS) tagging of the Indonesian language: the Unigram, Hidden Markov Model (HMM), TnT, Brill, Naive Bayes, and Maximum Entropy taggers. Although Indonesian is one of the most widely spoken languages in the world, very little annotated data is available for POS tagging. It is therefore worth investigating and evaluating popular POS tagging approaches under such low-resource conditions. Our experiments show that the Maximum Entropy tagger achieves the highest accuracy of all the methods, and it remains consistently better as the size of the training data is varied.
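To make the simplest of the compared methods concrete, the sketch below implements a unigram tagger, which gives each word the tag it most often received in training, together with token-level accuracy and a loop over growing training-set sizes. This is a minimal illustration under stated assumptions, not the authors' code: the toy Indonesian corpus, the tagset, and the names `train_unigram`, `tag`, and `accuracy` are hypothetical, and the abstract does not say how unknown words were handled, so the most-frequent-tag fallback is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus with an illustrative tagset.
# This is NOT the dataset used in the paper.
TRAIN = [
    [("saya", "PRP"), ("makan", "VB"), ("nasi", "NN")],
    [("dia", "PRP"), ("membaca", "VB"), ("buku", "NN")],
    [("ibu", "NN"), ("memasak", "VB"), ("nasi", "NN"), ("di", "IN"), ("dapur", "NN")],
    [("buku", "NN"), ("itu", "DT"), ("baru", "JJ")],
]
TEST = [
    [("saya", "PRP"), ("membaca", "VB"), ("buku", "NN"), ("baru", "JJ")],
    [("dia", "PRP"), ("makan", "VB"), ("roti", "NN")],  # "roti" is unseen in TRAIN
]


def train_unigram(sentences):
    """Return (lexicon, default): each word's most frequent tag, plus the
    overall most frequent tag used as a fallback for unseen words (assumption)."""
    word_tags = defaultdict(Counter)
    all_tags = Counter()
    for sent in sentences:
        for word, t in sent:
            word_tags[word][t] += 1
            all_tags[t] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    default = all_tags.most_common(1)[0][0]
    return lexicon, default


def tag(words, model):
    """Tag a list of words; unseen words receive the fallback tag."""
    lexicon, default = model
    return [lexicon.get(w, default) for w in words]


def accuracy(model, sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in sentences:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        pred = tag(words, model)
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return correct / total


if __name__ == "__main__":
    # Vary the amount of training data, mirroring the kind of experiment the
    # abstract describes (toy scale only).
    for n in range(1, len(TRAIN) + 1):
        model = train_unigram(TRAIN[:n])
        print(f"{n} training sentence(s): accuracy = {accuracy(model, TEST):.2f}")
```

Because every unseen word falls back to a single tag, a unigram tagger's accuracy on small training sets depends heavily on vocabulary coverage. The context-aware methods in the comparison (HMM, TnT, Brill, Naive Bayes, Maximum Entropy) use additional evidence beyond the word identity alone.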