Thai text topic modeling system for discovering group interests of Facebook young adult users
Rachsuda Jiamthapthaksin
2016 2nd International Conference on Science in Information Technology (ICSITech), 2016-10-01
DOI: 10.1109/ICSITECH.2016.7852614 (https://doi.org/10.1109/ICSITECH.2016.7852614)
Citations: 2
Abstract
Facebook is the largest digital social network in the world and the most popular social network in Thailand. This paper proposes a Thai text topic modeling system that turns Facebook posts into valuable group interests of users. Latent Dirichlet Allocation (LDA) for topic modeling, if applied directly to Thai text posts, does not capture group interests well because of unique characteristics of the data, such as intentional typos. The main contributions of the paper include the integration of Thai slang from posts for extracting Thai words, insertion-word and stop-word removal, slang stemming, and the application of LDA for seed word acquisition and topic modeling enhancement. Experiments on Thai Facebook posts of undergraduate student volunteers at Assumption University demonstrate feature size reduction, model enhancement, and discovery of meaningful group interests.
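The pipeline outlined in the abstract (normalize slang, drop stop words, then fit LDA) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the posts are already split into tokens, the `SLANG_STEMS` and `STOP_WORDS` tables are tiny hypothetical stand-ins for the paper's resources, "slang stemming" is approximated by collapsing characters repeated three or more times and then looking the token up in the slang table, and the topic model is plain collapsed Gibbs sampling LDA without the seed-word enhancement the paper describes.

```python
import random
import re
from collections import Counter

# Hypothetical resources for illustration; the paper's real lists are not shown here.
SLANG_STEMS = {"จุงเบย": "จังเลย"}               # slang form -> standard form
STOP_WORDS = {"ค่ะ", "ครับ", "นี้", "ที่", "และ"}


def normalize(token):
    """Collapse runs of 3+ identical characters, then map slang to a standard form."""
    token = re.sub(r"(.)\1{2,}", r"\1", token)   # "มากกกก" -> "มาก"
    return SLANG_STEMS.get(token, token)


def preprocess(tokens):
    """Normalize each token and drop stop words."""
    cleaned = (normalize(t) for t in tokens)
    return [t for t in cleaned if t and t not in STOP_WORDS]


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Plain LDA fitted by collapsed Gibbs sampling; returns raw count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # token counts per topic
    assign = []                                           # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word, doc_topic


# Toy, pre-tokenized posts (made up for this sketch, not data from the paper).
posts = [
    ["หนัง", "สนุก", "มากกกก", "ค่ะ"],
    ["หนัง", "เรื่อง", "นี้", "สนุก", "จุงเบยยย"],
    ["อาหาร", "อร่อย", "มากกก", "ครับ"],
    ["ร้าน", "นี้", "อาหาร", "อร่อย", "จุงเบย"],
]
docs = [preprocess(p) for p in posts]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, c in counts.most_common(3) if c > 0])
```

Normalizing before modeling merges variants such as "มากกก" and "มากกกก" into one vocabulary entry, which is one plausible way to obtain the feature size reduction the abstract mentions. The run-collapsing rule is a rough heuristic; a real system would need a proper Thai word segmenter and a curated slang dictionary.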