Mobile App Review Labeling Using LDA Similarity and Term Frequency-Inverse Cluster Frequency (TF-ICF)

2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE) Pub Date : 2018-07-01 DOI:10.1109/ICITEED.2018.8534785

A. Puspaningrum, D. Siahaan, C. Fatichah


Citations: 11

Abstract

User review mining has attracted many researchers to analyze reviews and develop innovative models. These models provide technical recommendations that help software developers make decisions during software maintenance and software evolution. One such recommendation is user review categorization. Several categorizations are in common use, namely bug errors, feature requests, and non-informative reviews. Many methods have been applied to classify user reviews; one of them is Latent Dirichlet Allocation (LDA). LDA is a topic modelling method that is able to map the hidden topics residing in a document. One technique for mapping hidden topics to categories is to calculate the term similarity between each hidden topic and a pre-defined signifier term list. However, the limited signifier term list of each category becomes a problem. Term Frequency-Inverse Cluster Frequency (TF-ICF), meanwhile, is able to extract the important terms of a cluster. This paper therefore introduces a method that combines TF-ICF with LDA clustering based on similarity (LDAS TF-ICF) to overcome this limitation. The classification results were evaluated using precision, recall, and F1-score. The results show that the method can outperform LDA. The best performance of LDAS TF-ICF occurred when a 75% expanded term list was used, giving precision, recall, and F-measure values of 0.564, 0.507, and 0.491, respectively.
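The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a TF-ICF weight of the form tf × log(1 + N / cf), where N is the number of clusters and cf is the number of clusters containing the term. It also assumes that "expanding" a signifier list means adding a top fraction of each cluster's terms ranked by that weight, and that topic-to-category similarity is a simple overlap ratio. The function names `tf_icf`, `expand_signifiers`, and `label_topic`, and the `non-informative` fallback, are hypothetical.

```python
import math
from collections import Counter


def tf_icf(clusters):
    """Score terms per cluster. `clusters` maps label -> list of token lists.

    Assumed weighting (not taken from the paper): tf * log(1 + N / cf).
    """
    n_clusters = len(clusters)
    # Term frequency of every token inside each cluster.
    term_counts = {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in clusters.items()
    }
    # Cluster frequency: in how many clusters does each term appear?
    cluster_freq = Counter()
    for counts in term_counts.values():
        cluster_freq.update(counts.keys())
    return {
        label: {
            term: tf * math.log(1 + n_clusters / cluster_freq[term])
            for term, tf in counts.items()
        }
        for label, counts in term_counts.items()
    }


def expand_signifiers(seed_terms, scores, fraction):
    """Add the top `fraction` of each cluster's terms to its seed list.

    Interpreting the expansion percentage this way is an assumption.
    """
    expanded = {}
    for label, seeds in seed_terms.items():
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.get(label, {}).items(),
                        key=lambda kv: (-kv[1], kv[0]))
        k = math.ceil(fraction * len(ranked))
        expanded[label] = set(seeds) | {term for term, _ in ranked[:k]}
    return expanded


def label_topic(topic_terms, signifiers, fallback="non-informative"):
    """Assign a topic to the category whose term list it overlaps most.

    Similarity here is |topic ∩ signifiers| / |topic| (an assumed measure).
    Topics with no overlap at all receive the fallback label.
    """
    topic = set(topic_terms)
    if not topic:
        return fallback
    best_label, best_sim = fallback, 0.0
    for label, terms in signifiers.items():
        sim = len(topic & set(terms)) / len(topic)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label
```

In use, the top terms of each LDA topic would be passed to `label_topic` together with the expanded lists. Because of the `1 +` inside the logarithm, a term that appears in every cluster is down-weighted rather than zeroed out.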

