Unsupervised learning of API aliasing specifications

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI:10.1145/3314221.3314640

Jan Eberhardt, Samuel Steffen, Veselin Raychev, Martin T. Vechev

{"title":"Unsupervised learning of API aliasing specifications","authors":"Jan Eberhardt, Samuel Steffen, Veselin Raychev, Martin T. Vechev","doi":"10.1145/3314221.3314640","DOIUrl":null,"url":null,"abstract":"Real world applications make heavy use of powerful libraries and frameworks, posing a significant challenge for static analysis as the library implementation may be very complex or unavailable. Thus, obtaining specifications that summarize the behaviors of the library is important as it enables static analyzers to precisely track the effects of APIs on the client program, without requiring the actual API implementation. In this work, we propose a novel method for discovering aliasing specifications of APIs by learning from a large dataset of programs. Unlike prior work, our method does not require manual annotation, access to the library's source code or ability to run its APIs. Instead, it learns specifications in a fully unsupervised manner, by statically observing usages of APIs in the dataset. The core idea is to learn a probabilistic model of interactions between API methods and aliasing objects, enabling identification of additional likely aliasing relations, and to then infer aliasing specifications of APIs that explain these relations. The learned specifications are then used to augment an API-aware points-to analysis. We implemented our approach in a tool called USpec and used it to automatically learn aliasing specifications from millions of source code files. USpec learned over 2000 specifications of various Java and Python APIs, in the process improving the results of the points-to analysis and its clients.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"179 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3314221.3314640","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

Abstract

Real world applications make heavy use of powerful libraries and frameworks, posing a significant challenge for static analysis as the library implementation may be very complex or unavailable. Thus, obtaining specifications that summarize the behaviors of the library is important as it enables static analyzers to precisely track the effects of APIs on the client program, without requiring the actual API implementation. In this work, we propose a novel method for discovering aliasing specifications of APIs by learning from a large dataset of programs. Unlike prior work, our method does not require manual annotation, access to the library's source code or ability to run its APIs. Instead, it learns specifications in a fully unsupervised manner, by statically observing usages of APIs in the dataset. The core idea is to learn a probabilistic model of interactions between API methods and aliasing objects, enabling identification of additional likely aliasing relations, and to then infer aliasing specifications of APIs that explain these relations. The learned specifications are then used to augment an API-aware points-to analysis. We implemented our approach in a tool called USpec and used it to automatically learn aliasing specifications from millions of source code files. USpec learned over 2000 specifications of various Java and Python APIs, in the process improving the results of the points-to analysis and its clients.

查看原文本刊更多论文

API混叠规范的无监督学习

现实世界中的应用程序大量使用功能强大的库和框架，这对静态分析提出了重大挑战，因为库的实现可能非常复杂或不可用。因此，获得总结库行为的规范是很重要的，因为它使静态分析器能够精确地跟踪API对客户机程序的影响，而不需要实际的API实现。在这项工作中，我们提出了一种通过从大型程序数据集中学习来发现api混叠规范的新方法。与之前的工作不同，我们的方法不需要手动注释、访问库的源代码或运行其api的能力。相反，它通过静态地观察数据集中api的使用情况，以完全无监督的方式学习规范。核心思想是学习API方法和混叠对象之间交互的概率模型，从而能够识别其他可能的混叠关系，然后推断解释这些关系的API的混叠规范。然后使用学习到的规范来增强api感知点分析。我们在一个名为USpec的工具中实现了我们的方法，并使用它从数百万个源代码文件中自动学习混叠规范。USpec学习了2000多种Java和Python api的规范，在此过程中改进了point -to分析的结果及其客户端。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation

自引率

0.00%

发文量