Shipwright: A Human-in-the-Loop System for Dockerfile Repair

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). Pub Date: 2021-02-22. DOI: 10.1109/ICSE43902.2021.00106

Jordan Henkel, Denini Silva, Leopoldo Teixeira, Marcelo d’Amorim, T. Reps


Citations: 13

Abstract

Docker is a tool for lightweight OS-level virtualization. Docker images are created by performing a build, controlled by a source-level artifact called a Dockerfile. We studied Dockerfiles on GitHub and, to our great surprise, found that over a quarter of the examined Dockerfiles failed to build (and thus to produce images). To address this problem, we propose SHIPWRIGHT, a human-in-the-loop system for finding repairs to broken Dockerfiles. SHIPWRIGHT uses a modified version of the BERT language model to embed build logs and to cluster broken Dockerfiles. Using these clusters and a search-based procedure, we were able to design 13 rules for making automated repairs to Dockerfiles. With the aid of SHIPWRIGHT, we submitted 45 pull requests (with a 42.2% acceptance rate) to GitHub projects with broken Dockerfiles. Furthermore, in a "time-travel" analysis of broken Dockerfiles that were later fixed, we found that SHIPWRIGHT proposed repairs that were equivalent to human-authored patches in 22.77% of the cases we studied. Finally, we compared our work with recent, state-of-the-art static Dockerfile analyses and found that, while static tools detected possible build-failure-inducing issues in 20.6–33.8% of the files we examined, SHIPWRIGHT was able to detect possible issues in 73.25% of the files and, additionally, provide automated repairs for 18.9% of the files.
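The idea of embedding build logs and clustering broken Dockerfiles by their failure output can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a simple token-count vector as a stand-in for the modified BERT embedding, and a greedy threshold-based grouping as a stand-in for whatever clustering procedure the authors use. The function names (`embed`, `cosine`, `cluster_logs`), the threshold value, and the sample log lines are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(log: str) -> Counter:
    """Stand-in embedding: lowercase token counts (NOT the paper's BERT model)."""
    return Counter(re.findall(r"[a-z_]+", log.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_logs(logs: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedy grouping: a log joins the first cluster whose representative
    (the cluster's first member) is at least `threshold` similar to it."""
    clusters: list[tuple[Counter, list[int]]] = []
    for index, log in enumerate(logs):
        vector = embed(log)
        for representative, members in clusters:
            if cosine(representative, vector) >= threshold:
                members.append(index)
                break
        else:
            # No sufficiently similar cluster exists, so start a new one.
            clusters.append((vector, [index]))
    return [members for _, members in clusters]


# Simplified, illustrative log lines (not taken from the paper's data).
logs = [
    "E: Unable to locate package python-pip",
    "E: Unable to locate package libfoo-dev",
    "npm ERR! code ENOENT: no such file or directory, open package.json",
]
clusters = cluster_logs(logs)
print(clusters)
```

In this toy setting the two package-lookup failures land in one group and the unrelated failure in another, which is the property that lets a human inspect one cluster at a time and write a single repair rule for it.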

