{"title":"自动策划的数据集","authors":"Marcus Kessel, C. Atkinson","doi":"10.1109/SCAM.2019.00015","DOIUrl":null,"url":null,"abstract":"o validate hypotheses and tools that depend on the semantics of software, it is necessary to assemble, prepare and maintain (i.e. curate) large, high-quality corpora of executable software systems exhibiting certain desired behavior and/or properties. Today this is a highly tedious and laborious activity requiring significant human time and effort. In this paper we therefore present a prototype platform that supports the notion of “live data sets” where almost all aspects of the data set curation process are automated. Instead of curating data sets by hand, or writing dedicated tools to select and check software samples on a case-by-case basis, a live data set allows users to simply describe their requirements as abstract scripts written in a declarative domain specific language. After explaining the approach and the key ideas behind its implementation, in this paper we present two examples of executable corpora generated automatically from a live data set populated from Maven Central. The first illustrates a “semantics agnostic” use case where the actual behavior of the software is unimportant, while the second illustrates a “semantics specific” use case where software implementing a specific functional abstraction is selected.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Automatically Curated Data Sets\",\"authors\":\"Marcus Kessel, C. Atkinson\",\"doi\":\"10.1109/SCAM.2019.00015\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"o validate hypotheses and tools that depend on the semantics of software, it is necessary to assemble, prepare and maintain (i.e. curate) large, high-quality corpora of executable software systems exhibiting certain desired behavior and/or properties. Today this is a highly tedious and laborious activity requiring significant human time and effort. In this paper we therefore present a prototype platform that supports the notion of “live data sets” where almost all aspects of the data set curation process are automated. Instead of curating data sets by hand, or writing dedicated tools to select and check software samples on a case-by-case basis, a live data set allows users to simply describe their requirements as abstract scripts written in a declarative domain specific language. After explaining the approach and the key ideas behind its implementation, in this paper we present two examples of executable corpora generated automatically from a live data set populated from Maven Central. The first illustrates a “semantics agnostic” use case where the actual behavior of the software is unimportant, while the second illustrates a “semantics specific” use case where software implementing a specific functional abstraction is selected.\",\"PeriodicalId\":431316,\"journal\":{\"name\":\"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SCAM.2019.00015\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SCAM.2019.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
o validate hypotheses and tools that depend on the semantics of software, it is necessary to assemble, prepare and maintain (i.e. curate) large, high-quality corpora of executable software systems exhibiting certain desired behavior and/or properties. Today this is a highly tedious and laborious activity requiring significant human time and effort. In this paper we therefore present a prototype platform that supports the notion of “live data sets” where almost all aspects of the data set curation process are automated. Instead of curating data sets by hand, or writing dedicated tools to select and check software samples on a case-by-case basis, a live data set allows users to simply describe their requirements as abstract scripts written in a declarative domain specific language. After explaining the approach and the key ideas behind its implementation, in this paper we present two examples of executable corpora generated automatically from a live data set populated from Maven Central. The first illustrates a “semantics agnostic” use case where the actual behavior of the software is unimportant, while the second illustrates a “semantics specific” use case where software implementing a specific functional abstraction is selected.