{"title":"Experiments with Predictive Long Term Guardrail Metrics","authors":"Sri Sri Perangur","doi":"10.1145/3488560.3510014","DOIUrl":null,"url":null,"abstract":"Product experiments today need a long term view of impact to make shipping decisions truly effective. Here we will discuss the challenges in the traditional metrics used in experiment analysis and how long term forecast metrics enable better decisions. Most tech companies such as Google, Amazon, Netflix etc run thousands of experiments (also known as A/B test) a year [1]. The aim is to measure the impact new features have on core Key Predictive Indicators (KPIs) before deciding to launch it to production. Traditional A/B testing metrics will usually measure the impact of the feature on core KPIs in the short-term. However, for many lines of business (such as loyalty and memberships), this is not enough, as we want to understand the impact of the features in the mid/long term. This reality can force companies to run experiments to 6+ months duration, or use a correlated leading metric (such as user activity, engagement level) with estimated impact in the long term. Both these situations are not ideal, the first slows down the rate of innovation while the second does not account for multiple factors that define the future results. At Lyft, this reality is shared, and one that becomes a challenge for innovation as we need to know the long term impact before we decide to ship new features. As a solution we design forecasted metrics for retention and revenue at a user level that can be used to measure the impact of experiments in the long term. 
In this talk we will discuss challenges and learnings from this approach, when applied in practice.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3488560.3510014","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Product experiments today need a long-term view of impact to make shipping decisions truly effective. Here we discuss the challenges with the traditional metrics used in experiment analysis and how long-term forecast metrics enable better decisions. Most tech companies, such as Google, Amazon, and Netflix, run thousands of experiments (also known as A/B tests) a year [1]. The aim is to measure the impact new features have on core Key Performance Indicators (KPIs) before deciding to launch them to production. Traditional A/B testing metrics usually measure the impact of a feature on core KPIs in the short term. However, for many lines of business (such as loyalty and memberships), this is not enough: we want to understand the impact of features in the mid to long term. This reality can force companies to run experiments for 6+ months, or to use a correlated leading metric (such as user activity or engagement level) as an estimate of long-term impact. Neither situation is ideal: the first slows the rate of innovation, while the second does not account for the multiple factors that shape future results. At Lyft this reality is shared, and it becomes a challenge for innovation, as we need to know the long-term impact before we decide to ship new features. As a solution, we design forecasted metrics for retention and revenue at the user level that can be used to measure the long-term impact of experiments. In this talk we will discuss challenges and learnings from applying this approach in practice.
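To make the idea of a predictive guardrail metric concrete, the sketch below shows one minimal way such a metric could work: fit a forecaster of long-term revenue from short-term user signals on historical data, score each experiment user with the forecast, and compare treatment against control on the forecasted values. All data, feature names, and the linear model here are illustrative assumptions for exposition; the abstract does not specify Lyft's actual modeling approach.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical data: short-term engagement features (e.g. rides,
# sessions, tenure) mapped to observed 6-month revenue. Entirely synthetic.
n_hist = 5000
X_hist = rng.normal(size=(n_hist, 3))
true_w = np.array([4.0, 2.0, 1.0])
y_hist = X_hist @ true_w + 20.0 + rng.normal(scale=5.0, size=n_hist)

# Fit a simple linear forecaster of long-term revenue (least squares with
# an intercept). A real system would likely use a richer model.
A = np.hstack([X_hist, np.ones((n_hist, 1))])
w, *_ = np.linalg.lstsq(A, y_hist, rcond=None)

def forecast_revenue(X):
    """Predicted long-term revenue per user from short-term features."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# During the experiment, score each user's forecasted long-term revenue
# and compare treatment vs. control with a two-sample z-test on the forecasts.
n = 2000
X_ctrl = rng.normal(size=(n, 3))
X_trt = rng.normal(size=(n, 3)) + 0.1   # treatment slightly lifts engagement

f_ctrl, f_trt = forecast_revenue(X_ctrl), forecast_revenue(X_trt)
diff = f_trt.mean() - f_ctrl.mean()
se = np.sqrt(f_trt.var(ddof=1) / n + f_ctrl.var(ddof=1) / n)
z = diff / se
print(f"forecasted long-term revenue lift: {diff:.2f} (z = {z:.2f})")
```

The point of the comparison is that the decision metric is the *forecast* of the long-term outcome, available at experiment end, rather than the long-term outcome itself, which would require months of waiting; the abstract's caveat about leading metrics applies here too, since the forecast is only as trustworthy as the model behind it.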