Sushant More, Priya Kotwal, Sujith Chappidi, Dinesh Mandalapu, Chris Khawand
arXiv - ECON - Econometrics. Published 2024-09-03. DOI: arxiv-2409.02332 (https://doi.org/arxiv-2409.02332)
Double Machine Learning at Scale to Predict Causal Impact of Customer Actions
The Causal Impact (CI) of customer actions is broadly used across the industry
to inform both short- and long-term investment decisions of various types. In
this paper, we apply the double machine learning (DML) methodology to estimate
the CI values across 100s of customer actions of business interest and 100s of
millions of customers. We operationalize DML through a causal ML library based
on Spark with a flexible, JSON-driven model configuration approach to estimate
CI at scale (i.e., across hundreds of actions and millions of customers). We
outline the DML methodology and implementation, and the associated benefits over
the traditional potential-outcomes-based CI model. We show population-level as
well as customer-level CI values along with confidence intervals. The
validation metrics show a 2.2% gain over the baseline methods and a 2.5X gain
in computational time. Our contribution is to advance the scalable
application of CI, while also providing an interface that allows faster
experimentation, offers cross-platform support, eases onboarding of new use
cases, and improves accessibility of the underlying code for partner teams.
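
As a rough illustration of the DML methodology named in the abstract, the sketch below estimates one causal effect with cross-fitted residual-on-residual regression (the partially linear model) and reports a 95% confidence interval. It is not the authors' Spark library. The JSON keys, function names, simulated data, continuous treatment, and the use of simple least-squares fits in place of ML nuisance models are all assumptions made for this example.

```python
import json
import random
import statistics

# Hypothetical JSON configuration; these keys are illustrative only and are
# not the schema used by the paper's library.
CONFIG = json.loads(
    '{"n_folds": 2, "seed": 0, "n_customers": 4000, "true_effect": 2.0}'
)


def fit_line(x, y):
    """Least-squares fit of y ~ a + b*x; a stand-in for an ML nuisance model."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return lambda v: a + b * v


def simulate(n, effect, rng):
    """Simulated customers: confounder x drives both treatment t and outcome y."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)                     # confounder (customer feature)
        t = 0.8 * x + rng.gauss(0, 1)           # treatment intensity (assumed continuous)
        y = effect * t + 1.5 * x + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows


def dml_estimate(rows, n_folds):
    """Cross-fitted partialling-out estimate of the effect of t on y."""
    res_t, res_y = [], []
    for k in range(n_folds):
        train = [r for i, r in enumerate(rows) if i % n_folds != k]
        test = [r for i, r in enumerate(rows) if i % n_folds == k]
        xs = [r[0] for r in train]
        predict_t = fit_line(xs, [r[1] for r in train])  # models E[t | x]
        predict_y = fit_line(xs, [r[2] for r in train])  # models E[y | x]
        # Residuals are computed only on the held-out fold (cross-fitting).
        for x, t, y in test:
            res_t.append(t - predict_t(x))
            res_y.append(y - predict_y(x))
    n = len(rows)
    stt = sum(v * v for v in res_t)
    theta = sum(v * w for v, w in zip(res_t, res_y)) / stt
    # Heteroskedasticity-robust standard error of the residual regression.
    eps = [w - theta * v for v, w in zip(res_t, res_y)]
    num = sum((v * e) ** 2 for v, e in zip(res_t, eps)) / n
    se = (num / (stt / n) ** 2 / n) ** 0.5
    return theta, theta - 1.96 * se, theta + 1.96 * se


rng = random.Random(CONFIG["seed"])
rows = simulate(CONFIG["n_customers"], CONFIG["true_effect"], rng)
theta, ci_low, ci_high = dml_estimate(rows, CONFIG["n_folds"])
print(f"estimated effect: {theta:.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

Because the simulated outcome depends on the confounder, a naive regression of y on t alone would overstate the effect. Removing the part of both t and y that x predicts before regressing one residual on the other is what lets the estimate recover the configured effect. In a production setting the least-squares helpers would be replaced by flexible ML models, and the configuration would select models and features per customer action.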