Jordan Henkel, Denini Silva, Leopoldo Teixeira, Marcelo d’Amorim, T. Reps
{"title":"Shipwright: A Human-in-the-Loop System for Dockerfile Repair","authors":"Jordan Henkel, Denini Silva, Leopoldo Teixeira, Marcelo d’Amorim, T. Reps","doi":"10.1109/ICSE43902.2021.00106","DOIUrl":null,"url":null,"abstract":"Docker is a tool for lightweight OS-level virtualization. Docker images are created by performing a build, controlled by a source-level artifact called a Dockerfile. We studied Dockerfiles on GitHub, and-to our great surprise-found that over a quarter of the examined Dockerfiles failed to build (and thus to produce images). To address this problem, we propose SHIPWRIGHT, a human-in-the-loop system for finding repairs to broken Dockerfiles. SHIPWRIGHT uses a modified version of the BERT language model to embed build logs and to cluster broken Dockerfiles. Using these clusters and a search-based procedure, we were able to design 13 rules for making automated repairs to Dockerfiles. With the aid of SHIPWRIGHT, we submitted 45 pull requests (with a 42.2% acceptance rate) to GitHub projects with broken Dockerfiles. Furthermore, in a \"time-travel\" analysis of broken Dockerfiles that were later fixed, we found that SHIPWRIGHT proposed repairs that were equivalent to human-authored patches in 22.77% of the cases we studied. Finally, we compared our work with recent, state-of-the-art, static Dockerfile analyses, and found that, while static tools detected possible build-failure-inducing issues in 20.6–33.8% of the files we examined, SHIPWRIGHT was able to detect possible issues in 73.25% of the files and, additionally, provide automated repairs for 18.9% of the files.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"58 6","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE43902.2021.00106","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
Docker is a tool for lightweight OS-level virtualization. Docker images are created by performing a build, controlled by a source-level artifact called a Dockerfile. We studied Dockerfiles on GitHub, and-to our great surprise-found that over a quarter of the examined Dockerfiles failed to build (and thus to produce images). To address this problem, we propose SHIPWRIGHT, a human-in-the-loop system for finding repairs to broken Dockerfiles. SHIPWRIGHT uses a modified version of the BERT language model to embed build logs and to cluster broken Dockerfiles. Using these clusters and a search-based procedure, we were able to design 13 rules for making automated repairs to Dockerfiles. With the aid of SHIPWRIGHT, we submitted 45 pull requests (with a 42.2% acceptance rate) to GitHub projects with broken Dockerfiles. Furthermore, in a "time-travel" analysis of broken Dockerfiles that were later fixed, we found that SHIPWRIGHT proposed repairs that were equivalent to human-authored patches in 22.77% of the cases we studied. Finally, we compared our work with recent, state-of-the-art, static Dockerfile analyses, and found that, while static tools detected possible build-failure-inducing issues in 20.6–33.8% of the files we examined, SHIPWRIGHT was able to detect possible issues in 73.25% of the files and, additionally, provide automated repairs for 18.9% of the files.