Mining framework usage graphs from app corpora

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER) Pub Date : 2018-03-01 DOI:10.1109/SANER.2018.8330216

Sergio Mover, S. Sankaranarayanan, Rhys Braginton Pettee Olsen, B. E. Chang

Pages: 277-289

Citations: 10

Abstract

We investigate the problem of mining graph-based usage patterns for large, object-oriented frameworks like Android—revisiting previous approaches based on graph-based object usage models (groums). Groums are a promising approach to represent usage patterns for object-oriented libraries because they simultaneously describe control flow and data dependencies between methods of multiple interacting object types. However, this expressivity comes at a cost: mining groums requires solving a subgraph isomorphism problem that is well known to be expensive. This cost limits the applicability of groum mining to large API frameworks. In this paper, we employ groum mining to learn usage patterns for object-oriented frameworks from program corpora. The central challenge is to scale groum mining so that it is sensitive to usages horizontally across programs from arbitrarily many developers (as opposed to simply usages vertically within the program of a single developer). To address this challenge, we develop a novel groum mining algorithm that scales on a large corpus of programs. We first use frequent itemset mining to restrict the search for groums to smaller subsets of methods in the given corpus. Then, we pose the subgraph isomorphism as a SAT problem and apply efficient pre-processing algorithms to rule out fruitless comparisons ahead of time. Finally, we identify containment relationships between clusters of groums to characterize popular usage patterns in the corpus (as well as classify less popular patterns as possible anomalies). We find that our approach scales on a corpus of over five hundred open source Android applications, effectively mining obligatory and best-practice usage patterns.
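The first two steps of the pipeline described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a simplified groum encoding (a dict of node labels plus a set of directed edges), swaps in brute-force itemset counting for a real frequent itemset miner, and uses plain backtracking where the paper uses a SAT encoding. All function names and the Cursor-style method labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumed encoding: a groum is (labels, edges), where labels maps a node id
# to a method name and edges is a set of (source, target) node-id pairs.

def frequent_method_sets(groums, min_support, max_size=3):
    """Method sets of size <= max_size that occur in >= min_support groums.
    Brute-force counting, standing in for a real frequent itemset miner."""
    counts = Counter()
    for labels, _ in groums:
        methods = sorted(set(labels.values()))
        for k in range(1, max_size + 1):
            counts.update(frozenset(c) for c in combinations(methods, k))
    return {s for s, n in counts.items() if n >= min_support}

def may_embed(pattern, target):
    """Cheap pre-check: target has at least as many nodes per label as pattern."""
    need = Counter(pattern[0].values())
    have = Counter(target[0].values())
    return not (need - have)  # empty difference means nothing is missing

def embeds(pattern, target):
    """True if pattern maps injectively into target, preserving labels and edges.
    Backtracking search; the paper poses this check as a SAT problem instead."""
    if not may_embed(pattern, target):
        return False
    p_labels, p_edges = pattern
    t_labels, t_edges = target
    p_nodes = list(p_labels)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            return True
        u = p_nodes[len(mapping)]
        for v in t_labels:
            if v in mapping.values() or t_labels[v] != p_labels[u]:
                continue
            mapping[u] = v
            # Every pattern edge between already-mapped nodes must exist in target.
            ok = all((mapping[a], mapping[b]) in t_edges
                     for a, b in p_edges if a in mapping and b in mapping)
            if ok and extend(mapping):
                return True
            del mapping[u]
        return False

    return extend({})

# Three toy groums with hypothetical method labels.
app_a = ({1: "query", 2: "moveToFirst", 3: "close"}, {(1, 2), (2, 3)})
app_b = ({1: "query", 2: "moveToFirst", 3: "getString", 4: "close"},
         {(1, 2), (2, 3), (3, 4)})
small = ({1: "query", 2: "moveToFirst"}, {(1, 2)})
corpus = [app_a, app_b, small]

print(frequent_method_sets(corpus, min_support=3))
print(embeds(small, app_a), embeds(small, app_b), embeds(app_a, app_b))
```

In this toy corpus, `small` embeds in both larger groums, while `app_a` does not embed in `app_b` even though every method of `app_a` appears there: the edge from `moveToFirst` to `close` is missing in `app_b`. This is the kind of containment relationship between groums that the final clustering step of the abstract relies on, and it shows why the comparison has to look at graph structure rather than method sets alone.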
