Applicability of Text-representing Centroids for Thai Language Documents
Authors: Sureeporn Nualnim, Nirach Romyen, M. Sodanil
Published in: Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval
Publication date: 2019-06-28
DOI: 10.1145/3342827.3342853 (https://doi.org/10.1145/3342827.3342853)
Citations: 1
Abstract
Text-representing centroids are a recently investigated method used to categorize and compare documents written in European languages. As will be shown, Asian languages, and Thai in particular, exhibit entirely different language structures. Nevertheless, strong justification will be given that the methodology of text-representing centroids can be successfully applied to Thai documents as well. For the experiments, a corpus of 100 randomly selected articles from an offline Thai Wikipedia was used. The obtained centroids reflect the topics of those documents well, as in the original publication. In addition, the centroids are well suited to comparing any two files.