Bayesian propensity score matching in automotive embedded software engineering

2021 28th Asia-Pacific Software Engineering Conference (APSEC). Pub date: 2021-09-26. DOI: 10.1109/APSEC53868.2021.00031

Yuchu Liu, D. I. Mattos, J. Bosch, H. H. Olsson, Jonn Lantz

Citations: 4

Abstract

Randomised field experiments, such as A/B testing, have long been the gold standard for evaluating the value that new software brings to customers. However, running randomised field experiments is not always desirable, possible, or even ethical in the development of automotive embedded software. In the face of such restrictions, we propose using the Bayesian propensity score matching technique for causal inference in observational studies in the automotive domain. In this paper, we present a method based on the Bayesian propensity score matching framework, applied in the unique setting of automotive software engineering. The method generates balanced control and treatment groups from an observational online evaluation and estimates the causal treatment effects of software changes, even when the treatment group contains only a few samples. We illustrate the method with a proof of concept in the automotive domain. In the example, a larger control fleet of cars (Nc = 1100) runs the current software, and a small treatment fleet (Nt = 38) receives a new software variant. We demonstrate a scenario in which shipping new software to all users is restricted and, as a result, a fully randomised experiment could not be conducted. We therefore used the Bayesian propensity score matching method with 14 observed covariates as inputs. The results show that matching yields more balanced groups, suitable for estimating causal treatment effects from the collected observational data. We describe the method in detail and share our configuration. Furthermore, we discuss how such a method can be used for online evaluation of new software with small groups of samples.
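The abstract does not state the propensity model, priors, sampler, or matching rule. The code below is therefore only a minimal sketch of one common formulation of Bayesian propensity score matching, not the authors' configuration. It makes the following assumptions:

* The propensity model is a logistic regression with independent Normal priors (standard deviation 2.5).
* The model is sampled with a hand-written random-walk Metropolis algorithm.
* Matching is 1:1 nearest-neighbour on the propensity score, with replacement.
* Matching is repeated under each thinned posterior draw.

Every function name (`log_posterior`, `sample_posterior`, `propensity`, `match_nearest`, `smd`, `bayesian_psm_att`) and every default parameter is hypothetical. `X` holds one row of covariates per car (14 in the paper's example). `t` is 1 for the treatment fleet and 0 for the control fleet. `y` is the outcome metric.

```python
import math
import random
from statistics import mean, stdev


def _linear(beta, x):
    # Intercept beta[0] plus one coefficient per covariate.
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def log_posterior(beta, X, t, prior_sd=2.5):
    """Unnormalised log posterior of a logistic propensity model with
    independent Normal(0, prior_sd^2) priors (an assumption, not from the paper)."""
    lp = -sum(b * b for b in beta) / (2.0 * prior_sd ** 2)
    for x, ti in zip(X, t):
        z = _linear(beta, x)
        # Bernoulli log-likelihood t*z - log(1 + e^z), computed stably.
        lp += ti * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return lp


def sample_posterior(X, t, n_draws=1000, burn_in=1000, step=0.1, seed=0):
    """Random-walk Metropolis draws of the propensity-model coefficients."""
    rng = random.Random(seed)
    beta = [0.0] * (len(X[0]) + 1)
    current = log_posterior(beta, X, t)
    draws = []
    for i in range(burn_in + n_draws):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, t)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(list(beta))
    return draws


def propensity(beta, x):
    """P(treated | x) under one coefficient draw (numerically stable sigmoid)."""
    z = _linear(beta, x)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def match_nearest(ps_treated, ps_control, caliper=None):
    """1:1 nearest-neighbour matching on the propensity score, with replacement.
    Returns (treated_index, control_index) pairs; an optional caliper drops poor matches."""
    pairs = []
    for i, p in enumerate(ps_treated):
        j = min(range(len(ps_control)), key=lambda k: abs(ps_control[k] - p))
        if caliper is None or abs(ps_control[j] - p) <= caliper:
            pairs.append((i, j))
    return pairs


def smd(a, b):
    """Standardised mean difference of one covariate between two groups."""
    pooled = math.sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2.0)
    return (mean(a) - mean(b)) / pooled if pooled > 0 else 0.0


def bayesian_psm_att(X, t, y, n_draws=1000, thin=10, seed=0):
    """Repeat matching under thinned posterior draws and return one ATT estimate
    (mean treated-minus-matched-control outcome) per draw."""
    treated = [i for i, ti in enumerate(t) if ti == 1]
    control = [i for i, ti in enumerate(t) if ti == 0]
    atts = []
    for beta in sample_posterior(X, t, n_draws=n_draws, seed=seed)[::thin]:
        ps = [propensity(beta, x) for x in X]
        pairs = match_nearest([ps[i] for i in treated], [ps[j] for j in control])
        if pairs:
            atts.append(mean(y[treated[i]] - y[control[j]] for i, j in pairs))
    return atts
```

Each element of the returned list is the average treatment effect on the treated (ATT) under one plausible propensity model. The spread of the list therefore reflects uncertainty in the propensity scores, not in the outcomes. Covariate balance can be checked by comparing `smd` for each covariate before and after matching; |SMD| < 0.1 is a commonly used rule of thumb. For a fleet of about 1100 cars with 14 covariates, two changes would be more practical: standardise the covariates first, and replace the pure-Python sampler with a dedicated MCMC library.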
