CHAMP: Characterizing Undesired App Behaviors from User Comments Based on Market Policies

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). Pub Date: 2021-03-01. DOI: 10.1109/ICSE43902.2021.00089

Yangyu Hu, Haoyu Wang, Tiantong Ji, Xusheng Xiao, Xiapu Luo, Peng Gao, Yao Guo


Citations: 6

Abstract

Millions of mobile apps are available through various app markets. Although most app markets enforce a number of automated or even manual mechanisms to vet each app before it is released, thousands of low-quality apps still exist in different markets, and some of them violate explicitly specified market policies. To identify these violations accurately and in a timely manner, we turn to user comments, which provide immediate feedback to app market maintainers, to identify undesired behaviors that violate market policies, including security-related user concerns. Specifically, we present the first large-scale study to detect and characterize the correlations between user comments and market policies. First, we propose CHAMP, an approach that adopts text mining and natural language processing (NLP) techniques to extract semantic rules through a semi-automated process and classifies comments into 26 pre-defined types of undesired behaviors that violate market policies. Our evaluation on real-world user comments shows that CHAMP achieves both high precision and high recall (>0.9) in classifying comments for undesired behaviors. Then, we curate a large-scale comment dataset (over 3 million user comments) from apps in Google Play and 8 popular alternative Android app markets, and apply CHAMP to understand the characteristics of undesired-behavior comments in the wild. The results confirm our speculation that user comments can be used to pinpoint suspicious apps that violate policies declared by app markets. The study also reveals that policy violations are widespread in many app markets despite their extensive vetting efforts. CHAMP can act as a whistleblower that assigns policy-violation scores to apps and identifies their most informative comments.
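To make the idea of rule-based comment classification concrete, the following is a minimal sketch, not CHAMP's actual implementation. The three behavior labels, the regular-expression rules, and the scoring formula (the fraction of comments matching at least one rule) are all assumptions introduced for illustration; the abstract states only that CHAMP extracts semantic rules semi-automatically, classifies comments into 26 behavior types, and assigns policy-violation scores.

```python
import re
from typing import Dict, List, Set

# Hypothetical rules: behavior label -> regex patterns.
# These are illustrative stand-ins, not the rules extracted by CHAMP.
RULES: Dict[str, List[str]] = {
    "ad_disruption": [
        r"\b(too many|full of|constant)\s+ads?\b",
        r"\bpop-?up ads?\b",
    ],
    "privacy_leak": [
        r"\b(steals?|leaks?|sells?)\b.*\b(data|contacts|information)\b",
    ],
    "payment_deception": [
        r"\bcharged\b.*\bwithout\b",
        r"\bhidden (fee|charge)s?\b",
    ],
}

# Compile once so that classifying many comments stays cheap.
COMPILED = {
    label: [re.compile(p, re.IGNORECASE) for p in patterns]
    for label, patterns in RULES.items()
}


def classify(comment: str) -> Set[str]:
    """Return every behavior label whose rules match the comment."""
    return {
        label
        for label, patterns in COMPILED.items()
        if any(p.search(comment) for p in patterns)
    }


def violation_score(comments: List[str]) -> float:
    """Assumed scoring scheme: share of comments matching any rule."""
    if not comments:
        return 0.0
    flagged = sum(1 for c in comments if classify(c))
    return flagged / len(comments)


if __name__ == "__main__":
    sample = [
        "This app is full of ads and crashes",
        "It steals my contacts data!",
        "Great app, love it",
        "I was charged twice without consent",
    ]
    for text in sample:
        print(sorted(classify(text)), "<-", text)
    print("score:", violation_score(sample))
```

A comment may match several labels at once, which is why `classify` returns a set rather than a single class. A real system would need rules that generalize beyond surface keywords, which is presumably where the paper's semantic rule extraction comes in.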

