Peeking into the Gray Area of Mobile World: An Empirical Study of Unlabeled Android Apps

Sen Chen, Lingling Fan, Cuiyun Gao, Fu Song, Yang Liu
Published in: 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE)
Publication date: 2021-10-01
DOI: 10.1109/ISSRE52982.2021.00065 (https://doi.org/10.1109/ISSRE52982.2021.00065)
Citation count: 1

Abstract

The real-world dataset collected by our industrial partner, Pwnzen Infotech Inc., one of the leading industrial security companies, contains a large number of unlabeled Android applications (called unlabeled apps in this paper) that, according to the industrial blacklist (i.e., signatures) and whitelist (i.e., certificates), are unlikely to belong to either known Android malware families or ordinary benign apps. Such apps have rarely been studied, yet they are important for peeking into the gray area of the mobile world. Understanding the negative characteristics of these samples is time-consuming for software analysts; these characteristics can lead to potential security or privacy threats for app users, significant negative impacts on mobile system performance, and a bad user experience. To investigate the characteristics of these industrial unlabeled apps at large scale in practice, and to provide insights to industrial software analysts and the research community, we collect a large-scale dataset of unlabeled apps (22,886 in total) from our industrial partners. Given the common perception among industrial software analysts that a high percentage of these unlabeled apps may share similar behaviors, we leverage popular community-detection techniques, based on app features widely used in malware detection, to cluster these unlabeled apps. We then investigate the common behaviors of the different clusters with substantial human effort and cross-validate the results among co-authors. By sampling apps from different clusters, our manual analysis unveils the characteristics of these unlabeled apps and discovers 11 categories, some of which have never been reported by previous grayware research. Our exploration also shows that community-based techniques are not effective enough at clustering unlabeled apps, so manual analysis is encouraged as an important first step toward studying unlabeled apps and understanding their characteristics. Finally, we highlight the lessons learned through real case studies, a comparison with existing malware/grayware research, in-depth discussions with industrial partners, and their feedback.
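The abstract does not say which community-detection algorithm or which app features were used, so the following is only a minimal sketch of the general idea under stated assumptions: apps are described by sets of requested permissions, two apps are linked when the Jaccard similarity of their sets reaches a threshold, and communities are found with a simple label-propagation pass. The app names, permission sets, threshold, and function names are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(features, threshold):
    """Link two apps when their feature sets are at least `threshold` similar."""
    graph = {app: set() for app in features}
    for u, v in combinations(sorted(features), 2):
        if jaccard(features[u], features[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def label_propagation(graph, max_rounds=20):
    """Asynchronous label propagation with deterministic tie-breaking.

    Each node repeatedly adopts the most frequent label among its
    neighbours; ties go to the lexicographically smallest label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # an isolated app keeps its own label
            counts = {}
            for neighbour in graph[node]:
                label = labels[neighbour]
                counts[label] = counts.get(label, 0) + 1
            best = max(counts.values())
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    clusters = {}
    for node, label in labels.items():
        clusters.setdefault(label, set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: permission sets for six made-up apps.
apps = {
    "app_a": {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
    "app_b": {"SEND_SMS", "READ_CONTACTS", "INTERNET", "READ_PHONE_STATE"},
    "app_c": {"SEND_SMS", "READ_CONTACTS", "READ_PHONE_STATE"},
    "app_d": {"CAMERA", "RECORD_AUDIO", "INTERNET"},
    "app_e": {"CAMERA", "RECORD_AUDIO", "ACCESS_FINE_LOCATION"},
    "app_f": {"VIBRATE"},
}

clusters = label_propagation(build_graph(apps, threshold=0.5))
print(clusters)
```

On this toy input the three SMS-related apps form one community, the two camera-related apps form another, and the remaining app stays alone. Real unlabeled apps rarely separate this cleanly, which is consistent with the abstract's finding that community-based clustering alone is not effective enough and that manual analysis of sampled apps is still needed.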