Mobile App Review Labeling Using LDA Similarity and Term Frequency-Inverse Cluster Frequency (TF-ICF)
A. Puspaningrum, D. Siahaan, C. Fatichah
2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE), published 2018-07-01
DOI: 10.1109/ICITEED.2018.8534785 (https://doi.org/10.1109/ICITEED.2018.8534785)
User review mining has attracted many researchers, who analyze reviews and develop new models. These models give software developers technical recommendations that support decisions during software maintenance and evolution. One such recommendation is user review categorization. Commonly used categories include bug errors, feature requests, and non-informative reviews. Many methods have been used to classify user reviews; one of them is Latent Dirichlet Allocation (LDA), a topic modelling method that maps the hidden topics in a document. One technique for mapping hidden topics to categories is to calculate the term similarity between each hidden topic and a pre-defined signifier term list for each category. However, the signifier term list of each category is limited, which becomes a problem. Term Frequency-Inverse Cluster Frequency (TF-ICF), meanwhile, can extract the important terms of a cluster. This paper therefore introduces a method that combines TF-ICF with LDA clustering based on similarity (LDAS TF-ICF) to overcome this limitation. The classification results were evaluated using precision, recall, and F1-score. The results show that the method can outperform LDA. The best performance of LDAS TF-ICF occurred when a 75% expanded term list was used, giving precision, recall, and F-measure values of 0.564, 0.507, and 0.491, respectively.
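The idea of expanding a signifier term list with TF-ICF and then labeling a topic by similarity can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the weighting `tf * log(1 + N / cf)`, the Jaccard overlap used as the similarity, the reading of the expansion percentage as "the top fraction of ranked cluster terms", and the function names `tf_icf`, `expand_terms`, and `label_topic` are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def tf_icf(clusters):
    """Score terms per cluster with an ASSUMED weighting: tf * log(1 + N / cf).

    clusters maps a cluster label to a list of tokenized reviews.
    """
    n = len(clusters)
    # cf[t] = number of clusters in which term t occurs at least once
    cf = Counter(
        t for docs in clusters.values() for t in {w for d in docs for w in d}
    )
    scores = {}
    for label, docs in clusters.items():
        tf = Counter(w for d in docs for w in d)  # term frequency in the cluster
        scores[label] = {t: c * math.log(1 + n / cf[t]) for t, c in tf.items()}
    return scores


def expand_terms(seed, cluster_scores, fraction):
    """Add the top `fraction` of a cluster's terms (by score) to the seed list.

    Treating the expansion percentage this way is an assumption.
    """
    ranked = sorted(cluster_scores, key=lambda t: (-cluster_scores[t], t))
    keep = math.ceil(len(ranked) * fraction)
    return set(seed) | set(ranked[:keep])


def label_topic(topic_terms, term_lists):
    """Pick the category whose term list overlaps most with the topic terms.

    Jaccard overlap stands in for the paper's term similarity (assumption).
    """
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    return max(sorted(term_lists), key=lambda c: jaccard(topic_terms, term_lists[c]))


# Toy data, invented purely for illustration.
clusters = {
    "bug": [["app", "crash", "login"], ["crash", "error", "app"]],
    "feature": [["add", "dark", "mode", "app"], ["please", "add", "widget"]],
}
scores = tf_icf(clusters)
term_lists = {
    "bug": expand_terms({"bug"}, scores["bug"], 0.5),
    "feature": expand_terms({"feature"}, scores["feature"], 0.5),
}
print(label_topic(["crash", "error", "freeze"], term_lists))
```

In a fuller pipeline the topic terms would come from a trained LDA model rather than a hand-written list, and the predicted labels would be compared against ground truth to obtain precision, recall, and F1-score.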