用于训练程序的神经代理的复杂性引导数据采样

IF 2.2 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Renda, Alex, Ding, Yi, Carbin, Michael
{"title":"用于训练程序的神经代理的复杂性引导数据采样","authors":"Renda, Alex, Ding, Yi, Carbin, Michael","doi":"10.1145/3622856","DOIUrl":null,"url":null,"abstract":"Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.","PeriodicalId":20697,"journal":{"name":"Proceedings of the ACM on Programming Languages","volume":"3 1","pages":"0"},"PeriodicalIF":2.2000,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs\",\"authors\":\"Renda, Alex, Ding, Yi, Carbin, Michael\",\"doi\":\"10.1145/3622856\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.\",\"PeriodicalId\":20697,\"journal\":{\"name\":\"Proceedings of the ACM on Programming Languages\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2023-10-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM on Programming Languages\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3622856\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3622856","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

摘要

程序员和研究人员越来越多地开发程序的替代品,即给定程序的可观察行为子集的模型,以解决各种软件开发挑战。程序员通过在输入示例数据集上对程序行为的测量来训练代理。构建代理的一个关键挑战是确定使用什么训练数据来训练给定程序的代理。我们提出了一种采样数据集的方法来训练基于神经网络的程序代理。我们首先根据学习相应执行路径的代理的复杂性来表征程序输入空间的每个区域(对应于程序的不同执行路径)的样本数据比例。接下来,我们将提供一个程序分析,以确定程序中不同路径的复杂性。我们在一系列现实世界的程序中评估了这些结果,证明了复杂性引导的采样结果在准确性方面的经验改进。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs
Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages Engineering-Safety, Risk, Reliability and Quality
CiteScore
5.20
自引率
22.20%
发文量
192
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信