Double Machine Learning at Scale to Predict Causal Impact of Customer Actions

arXiv - ECON - Econometrics | Pub Date: 2024-09-03 | DOI: arxiv-2409.02332

Sushant More, Priya Kotwal, Sujith Chappidi, Dinesh Mandalapu, Chris Khawand


Citations: 0

Abstract


The causal impact (CI) of customer actions is broadly used across the industry to inform both short- and long-term investment decisions of various types. In this paper, we apply the double machine learning (DML) methodology to estimate CI values across hundreds of customer actions of business interest and hundreds of millions of customers. We operationalize DML through a Spark-based causal ML library with a flexible, JSON-driven model configuration approach to estimate CI at scale (i.e., across hundreds of actions and millions of customers). We outline the DML methodology and implementation, along with its benefits over the traditional potential-outcomes-based CI model. We show population-level as well as customer-level CI values along with confidence intervals. The validation metrics show a 2.2% gain over the baseline methods and a 2.5X gain in computational time. Our contribution is to advance the scalable application of CI while also providing an interface that allows faster experimentation, offers cross-platform support, makes it possible to onboard new use cases, and improves the accessibility of the underlying code for partner teams.
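For readers unfamiliar with DML, the core idea mentioned in the abstract can be illustrated with a small, single-machine sketch. This is not the authors' Spark library or their configuration schema: the JSON keys (`n_folds`, `ridge_penalty`, `seed`), the helper names, the ridge nuisance models, the continuous treatment, and the synthetic data are all assumptions made for illustration. The sketch uses the standard partialling-out form of DML with cross-fitting: predict the outcome and the treatment from covariates on held-out folds, then regress outcome residuals on treatment residuals to obtain an effect estimate and a confidence interval.

```python
import json
import numpy as np

# Hypothetical config; the key names are illustrative, not the paper's schema.
CONFIG = json.loads('{"n_folds": 5, "ridge_penalty": 1.0, "seed": 0}')


def fit_ridge(X, y, penalty):
    """Fit ridge regression with an intercept; return a predict function."""
    Xb = np.column_stack([np.ones(len(X)), X])
    reg = penalty * np.eye(Xb.shape[1])
    reg[0, 0] = 0.0  # do not penalize the intercept
    coef = np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def dml_effect(X, t, y, config):
    """Cross-fitted partialling-out estimate of theta in y = theta*t + g(X) + noise."""
    n = len(y)
    rng = np.random.default_rng(config["seed"])
    folds = np.array_split(rng.permutation(n), config["n_folds"])
    y_res = np.empty(n)
    t_res = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Nuisance models are fit on the other folds only (cross-fitting).
        predict_y = fit_ridge(X[train], y[train], config["ridge_penalty"])
        predict_t = fit_ridge(X[train], t[train], config["ridge_penalty"])
        y_res[held_out] = y[held_out] - predict_y(X[held_out])
        t_res[held_out] = t[held_out] - predict_t(X[held_out])
    # Residual-on-residual regression gives the effect estimate.
    theta = (t_res @ y_res) / (t_res @ t_res)
    # Plug-in standard error based on the orthogonal score.
    score = t_res * (y_res - theta * t_res)
    se = np.sqrt(np.mean(score**2) / n) / np.mean(t_res**2)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)


def simulate(n=4000, true_effect=2.0, seed=1):
    """Synthetic data in which the covariates confound treatment and outcome."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    t = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)
    y = true_effect * t + X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)
    return X, t, y


X, t, y = simulate()
theta_hat, interval = dml_effect(X, t, y, CONFIG)
print(f"estimate={theta_hat:.3f}, 95% CI=({interval[0]:.3f}, {interval[1]:.3f})")
```

In this toy setup the true effect is 2.0 by construction, so the printed estimate should land close to that value. Keeping the fold count and penalty in a JSON string mirrors, in a very loose way, the idea of changing model settings without touching code.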
