自动策划的数据集

2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2019-09-01 DOI:10.1109/SCAM.2019.00015

Marcus Kessel, C. Atkinson

{"title":"自动策划的数据集","authors":"Marcus Kessel, C. Atkinson","doi":"10.1109/SCAM.2019.00015","DOIUrl":null,"url":null,"abstract":"o validate hypotheses and tools that depend on the semantics of software, it is necessary to assemble, prepare and maintain (i.e. curate) large, high-quality corpora of executable software systems exhibiting certain desired behavior and/or properties. Today this is a highly tedious and laborious activity requiring significant human time and effort. In this paper we therefore present a prototype platform that supports the notion of “live data sets” where almost all aspects of the data set curation process are automated. Instead of curating data sets by hand, or writing dedicated tools to select and check software samples on a case-by-case basis, a live data set allows users to simply describe their requirements as abstract scripts written in a declarative domain specific language. After explaining the approach and the key ideas behind its implementation, in this paper we present two examples of executable corpora generated automatically from a live data set populated from Maven Central. The first illustrates a “semantics agnostic” use case where the actual behavior of the software is unimportant, while the second illustrates a “semantics specific” use case where software implementing a specific functional abstraction is selected.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Automatically Curated Data Sets\",\"authors\":\"Marcus Kessel, C. Atkinson\",\"doi\":\"10.1109/SCAM.2019.00015\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"o validate hypotheses and tools that depend on the semantics of software, it is necessary to assemble, prepare and maintain (i.e. curate) large, high-quality corpora of executable software systems exhibiting certain desired behavior and/or properties. Today this is a highly tedious and laborious activity requiring significant human time and effort. In this paper we therefore present a prototype platform that supports the notion of “live data sets” where almost all aspects of the data set curation process are automated. Instead of curating data sets by hand, or writing dedicated tools to select and check software samples on a case-by-case basis, a live data set allows users to simply describe their requirements as abstract scripts written in a declarative domain specific language. After explaining the approach and the key ideas behind its implementation, in this paper we present two examples of executable corpora generated automatically from a live data set populated from Maven Central. The first illustrates a “semantics agnostic” use case where the actual behavior of the software is unimportant, while the second illustrates a “semantics specific” use case where software implementing a specific functional abstraction is selected.\",\"PeriodicalId\":431316,\"journal\":{\"name\":\"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SCAM.2019.00015\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SCAM.2019.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

为了验证依赖于软件语义的假设和工具，有必要组装、准备和维护(即管理)大型、高质量的可执行软件系统语料库，这些语料库显示出某些期望的行为和/或属性。今天，这是一项非常繁琐和费力的活动，需要大量的人力时间和精力。因此，在本文中，我们提出了一个支持“实时数据集”概念的原型平台，其中数据集管理过程的几乎所有方面都是自动化的。而不是手工管理数据集，或者编写专门的工具来逐个选择和检查软件样本，实时数据集允许用户简单地将他们的需求描述为用声明性领域特定语言编写的抽象脚本。在解释了该方法及其实现背后的关键思想之后，本文将给出两个可执行语料库的示例，这些语料库是从Maven Central填充的实时数据集自动生成的。第一个说明了一个“语义不可知”的用例，其中软件的实际行为是不重要的，而第二个说明了一个“语义特定”的用例，其中选择了实现特定功能抽象的软件。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Automatically Curated Data Sets

o validate hypotheses and tools that depend on the semantics of software, it is necessary to assemble, prepare and maintain (i.e. curate) large, high-quality corpora of executable software systems exhibiting certain desired behavior and/or properties. Today this is a highly tedious and laborious activity requiring significant human time and effort. In this paper we therefore present a prototype platform that supports the notion of “live data sets” where almost all aspects of the data set curation process are automated. Instead of curating data sets by hand, or writing dedicated tools to select and check software samples on a case-by-case basis, a live data set allows users to simply describe their requirements as abstract scripts written in a declarative domain specific language. After explaining the approach and the key ideas behind its implementation, in this paper we present two examples of executable corpora generated automatically from a live data set populated from Maven Central. The first illustrates a “semantics agnostic” use case where the actual behavior of the software is unimportant, while the second illustrates a “semantics specific” use case where software implementing a specific functional abstraction is selected.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)

自引率

0.00%

发文量