{"title":"OpenAI的Codex能修复bug吗?:对QuixBugs的评价","authors":"Julian Aron Prenner, Hlib Babii, R. Robbes","doi":"10.1145/3524459.3527351","DOIUrl":null,"url":null,"abstract":"OpenAI's Codex, a GPT-3like model trained on a large code corpus, has made headlines in and outside of academia. Given a short user-provided description, it is capable of synthesizing code snippets that are syntactically and semantically valid in most cases. In this work, we want to investigate whether Codex is able to localize and fix bugs, two important tasks in automated program repair. Our initial evaluation uses the multi-language QuixBugs benchmark (40 bugs in both Python and Java). We find that, despite not being trained for APR, Codex is surprisingly effective, and competitive with recent state of the art techniques. Our results also show that Codex is more successful at repairing Python than Java, fixing 50% more bugs in Python.","PeriodicalId":131481,"journal":{"name":"2022 IEEE/ACM International Workshop on Automated Program Repair (APR)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"51","resultStr":"{\"title\":\"Can OpenAI's Codex Fix Bugs?: An evaluation on QuixBugs\",\"authors\":\"Julian Aron Prenner, Hlib Babii, R. Robbes\",\"doi\":\"10.1145/3524459.3527351\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"OpenAI's Codex, a GPT-3like model trained on a large code corpus, has made headlines in and outside of academia. Given a short user-provided description, it is capable of synthesizing code snippets that are syntactically and semantically valid in most cases. In this work, we want to investigate whether Codex is able to localize and fix bugs, two important tasks in automated program repair. Our initial evaluation uses the multi-language QuixBugs benchmark (40 bugs in both Python and Java). We find that, despite not being trained for APR, Codex is surprisingly effective, and competitive with recent state of the art techniques. Our results also show that Codex is more successful at repairing Python than Java, fixing 50% more bugs in Python.\",\"PeriodicalId\":131481,\"journal\":{\"name\":\"2022 IEEE/ACM International Workshop on Automated Program Repair (APR)\",\"volume\":\"74 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"51\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE/ACM International Workshop on Automated Program Repair (APR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3524459.3527351\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM International Workshop on Automated Program Repair (APR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3524459.3527351","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Can OpenAI's Codex Fix Bugs?: An evaluation on QuixBugs
OpenAI's Codex, a GPT-3like model trained on a large code corpus, has made headlines in and outside of academia. Given a short user-provided description, it is capable of synthesizing code snippets that are syntactically and semantically valid in most cases. In this work, we want to investigate whether Codex is able to localize and fix bugs, two important tasks in automated program repair. Our initial evaluation uses the multi-language QuixBugs benchmark (40 bugs in both Python and Java). We find that, despite not being trained for APR, Codex is surprisingly effective, and competitive with recent state of the art techniques. Our results also show that Codex is more successful at repairing Python than Java, fixing 50% more bugs in Python.