Exploring Plausible Patches Using Source Code Embeddings in JavaScript

2021 IEEE/ACM International Workshop on Automated Program Repair (APR). Published: 2021-03-31. DOI: 10.1109/APR52552.2021.00010

Viktor Csuvik, Dániel Horváth, Márk Lajkó, László Vidács

Citations: 5

Abstract

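Despite the immense popularity of the Automated Program Repair (APR) field, patch validation remains an open problem. Most present-day approaches follow the so-called Generate-and-Validate scheme: a candidate patch is first generated and then validated against an oracle. The oracle, however, may not give a reliable verdict because it is imperfect; in most cases it is the project's test suite. Although (re-)running the test suite is the most readily available check, in real-world applications patches often overfit or underfit the tests, which results in inadequate patches. Efforts to tackle this problem include patch filtering, test suite expansion, careful patch generation, and more. Most approaches to date apply post-filtering that relies either on test execution traces or on some notion of similarity measured on the generated patches. Our goal is to investigate the nature of these similarity-based approaches. To do so, we trained a Doc2Vec model on an open-source JavaScript project and generated 465 patches for 10 bugs in it. These plausible patches, together with the developer fix, were then ranked by their similarity to the original program. Analyzing these similarity lists, we found that plain document embeddings may lead to misclassification, as they fail to capture nuanced code semantics. Nevertheless, in some cases they also provided useful information, thus helping to better understand the area of Automated Program Repair.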
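The ranking step described above can be sketched as follows. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation: in the study the document vectors come from a Doc2Vec model trained on the project (with gensim, for instance, a vector for a new document could be obtained via `Doc2Vec.infer_vector`), whereas here a plain bag-of-tokens count vector is swapped in so the example runs without a trained model. The tokenizer, the function names (`tokenize`, `embed`, `cosine`, `rank_by_similarity`), the toy `max` function and the three candidate patches are all hypothetical. Each patch is scored by the cosine similarity between its vector and the vector of the original (buggy) program, and the patches are sorted from most to least similar.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical buggy JavaScript function: it is meant to return the maximum
# of a and b but actually returns the minimum.
ORIGINAL = "function max(a, b) { return a < b ? a : b; }"

# Hypothetical plausible patches, i.e. candidates assumed to pass the test suite.
PATCHES: Dict[str, str] = {
    # The fix a developer might commit.
    "developer_fix": "function max(a, b) { return a > b ? a : b; }",
    # Also correct: the operands of the comparison are swapped instead.
    "patch_1": "function max(a, b) { return b < a ? a : b; }",
    # Overfitting: passes only if the tests never call max with a > b.
    "patch_2": "function max(a, b) { return b; }",
}


def tokenize(source: str) -> List[str]:
    """Split source into identifiers, integer literals and single non-space characters."""
    return re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S", source)


def embed(source: str) -> Counter:
    """Stand-in for a Doc2Vec document vector: a sparse bag-of-tokens count vector."""
    return Counter(tokenize(source))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def rank_by_similarity(original: str, patches: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank patches by similarity to the original program, most similar first."""
    reference = embed(original)
    scored = [(name, cosine(reference, embed(src))) for name, src in patches.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_by_similarity(ORIGINAL, PATCHES):
        print(f"{name}: {score:.3f}")
```

Because this stand-in ignores token order, `patch_1` (which only swaps the operands of the comparison) gets exactly the same vector as the buggy original, so it scores 1.0 and is ranked above the developer fix (29/30, about 0.967), with the overfitting `patch_2` last (about 0.878). This is a deliberately crude illustration of how a purely lexical view of code can misjudge semantically different programs; a trained Doc2Vec model would produce different scores.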
