用于查找API使用错误的贝叶斯规范学习

V. Murali, Swarat Chaudhuri, C. Jermaine
{"title":"用于查找API使用错误的贝叶斯规范学习","authors":"V. Murali, Swarat Chaudhuri, C. Jermaine","doi":"10.1145/3106237.3106284","DOIUrl":null,"url":null,"abstract":"We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and then using these specifications to statically detect anomalous, hence likely buggy, program behavior. Our key insight is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to the program. The problem of finding anomalies is now framed quantitatively, as a problem of computing a distance between a \"reference distribution\" over program behaviors that our model expects from the program, and the distribution over behaviors that the program actually produces. We implement our ideas in a system, called Salento, for finding anomalous API usage in Android programs. Salento learns specifications using a combination of a topic model and a neural network model. Our encouraging experimental results show that the system can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"08 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"46","resultStr":"{\"title\":\"Bayesian specification learning for finding API usage errors\",\"authors\":\"V. Murali, Swarat Chaudhuri, C. Jermaine\",\"doi\":\"10.1145/3106237.3106284\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and then using these specifications to statically detect anomalous, hence likely buggy, program behavior. Our key insight is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to the program. The problem of finding anomalies is now framed quantitatively, as a problem of computing a distance between a \\\"reference distribution\\\" over program behaviors that our model expects from the program, and the distribution over behaviors that the program actually produces. We implement our ideas in a system, called Salento, for finding anomalous API usage in Android programs. Salento learns specifications using a combination of a topic model and a neural network model. Our encouraging experimental results show that the system can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches.\",\"PeriodicalId\":313494,\"journal\":{\"name\":\"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering\",\"volume\":\"08 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"46\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3106237.3106284\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3106237.3106284","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 46

摘要

我们提出了一个贝叶斯框架,用于从大型非结构化代码语料库中学习概率规范,然后使用这些规范静态地检测异常,因此可能有错误的程序行为。我们的关键见解是建立一个统计模型,将隐藏在语料库中的所有规范与实现这些规范的程序的语法和观察到的行为联系起来。在分析一个特定的程序时,该模型被限制为一个后验分布,该分布优先考虑与程序相关的规范。发现异常的问题现在是定量的,作为计算我们的模型从程序中期望的程序行为的“参考分布”与程序实际产生的行为的分布之间的距离的问题。我们在一个名为Salento的系统中实现了我们的想法,用于发现Android程序中异常的API使用情况。Salento使用主题模型和神经网络模型的组合来学习规范。我们令人鼓舞的实验结果表明,该系统可以自动发现Android应用程序中的细微错误,与竞争的概率方法相比,具有较高的准确率和召回率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Bayesian specification learning for finding API usage errors
We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and then using these specifications to statically detect anomalous, hence likely buggy, program behavior. Our key insight is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to the program. The problem of finding anomalies is now framed quantitatively, as a problem of computing a distance between a "reference distribution" over program behaviors that our model expects from the program, and the distribution over behaviors that the program actually produces. We implement our ideas in a system, called Salento, for finding anomalous API usage in Android programs. Salento learns specifications using a combination of a topic model and a neural network model. Our encouraging experimental results show that the system can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信