Learning to Learn to Predict Performance Regressions in Production at Meta

2023 IEEE/ACM International Conference on Automation of Software Test (AST) | Pub Date: 2022-08-08 | DOI: 10.1109/AST58925.2023.00010

M. Beller, Hongyu Li, V. Nair, V. Murali, Imad Ahmad, Jürgen Cito, Drew Carlson, Gareth Ari Aye, Wes Dyer


Citations: 2

Abstract

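Catching and attributing code-change-induced performance regressions in production is hard; predicting them beforehand is even harder. Intended as a primer on automatically learning to predict performance regressions in software, this article recounts the experience we gained while researching and deploying an ML-based regression prediction pipeline at Meta. We report on a comparative study of four ML models of increasing complexity: (1) a code-opaque model, (2) a Bag of Words model, (3) an off-the-shelf Transformer-based model, and (4) a bespoke Transformer-based model we call SuperPerforator. Our investigation shows the inherent difficulty of the performance prediction problem, which is characterized by a large imbalance between benign and regressing changes. Our results also call into question the general applicability of Transformer-based architectures for performance prediction: an off-the-shelf CodeBERT-based approach performed surprisingly poorly, and even the highly customized SuperPerforator architecture achieved offline results only on par with the simpler Bag of Words models. It began to significantly outperform them only in downstream use cases in an online setting. To gain further insight into SuperPerforator, we explored it through a series of experiments that compute counterfactual explanations. These highlight which parts of a code change the model deems important, thereby validating it. SuperPerforator's ability to transfer to an application with few training examples afforded an opportunity to deploy it in practice at Meta: it can act as a pre-filter that sorts out changes unlikely to introduce a regression, shrinking the space of changes in which to search for a regression by up to 43%, a 45x improvement over a random baseline.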
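The pre-filtering idea can be illustrated with a minimal sketch. This is not the paper's pipeline: it assumes some trained model has already assigned each candidate change a predicted regression probability, and the function names (`prefilter`, `search_space_reduction`, `culprit_recall`), the threshold, and the scores below are hypothetical, made-up values chosen only for illustration.

```python
from typing import List, Sequence


def prefilter(scores: Sequence[float], threshold: float) -> List[int]:
    """Keep indices of changes whose (hypothetical) predicted regression
    probability is at least `threshold`; the rest are filtered out as
    unlikely to have introduced a regression."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def search_space_reduction(n_changes: int, kept: Sequence[int]) -> float:
    """Fraction of candidate changes removed from the regression search."""
    if n_changes == 0:
        raise ValueError("no candidate changes")
    return 1.0 - len(kept) / n_changes


def culprit_recall(kept: Sequence[int], culprits: Sequence[int]) -> float:
    """Fraction of known regressing changes that survive the filter."""
    if not culprits:
        raise ValueError("no known culprits")
    kept_set = set(kept)
    return sum(1 for c in culprits if c in kept_set) / len(culprits)


if __name__ == "__main__":
    # Hypothetical scores for 8 candidate changes; change 1 is the culprit.
    scores = [0.02, 0.91, 0.10, 0.45, 0.03, 0.60, 0.01, 0.30]
    kept = prefilter(scores, threshold=0.25)
    print(kept)                                       # [1, 3, 5, 7]
    print(search_space_reduction(len(scores), kept))  # 0.5
    print(culprit_recall(kept, culprits=[1]))         # 1.0
```

Raising the threshold shrinks the search space further but risks discarding the true culprit, which is why a sketch like this tracks the reduction together with the recall of known regressing changes.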



