On the causality-preservation capabilities of generative modelling

IF 2.1 · CAS Zone 2 (Mathematics) · JCR Q1, MATHEMATICS, APPLIED
Yves-Cédric Bauwelinckx , Jan Dhaene , Milan van den Heuvel , Tim Verdonck
Journal of Computational and Applied Mathematics, Volume 457, Article 116312. Published 2024-10-09.
DOI: 10.1016/j.cam.2024.116312
https://www.sciencedirect.com/science/article/pii/S0377042724005600
Citations: 0

Abstract

Modelling is essential in both the financial and insurance industries. The emergence of machine learning and deep learning models offers new tools for this, but they often require large datasets that are typically unavailable in business fields due to privacy and ethical concerns. This lack of data is currently one of the main hurdles in developing better models. Generative modelling, such as Generative Adversarial Networks (GANs), can address this issue by creating synthetic data that can be freely shared. While GANs are widely studied in fields like computer vision, their use in business is limited, primarily because business questions often focus on identifying causal effects, whereas GANs and neural networks typically emphasise high-dimensional correlations. This paper explores whether GANs can produce synthetic data that reliably answers causal questions by performing causal analyses on GAN-generated data under varying assumptions. The study includes cross-sectional, time series, and complete structural model scenarios. Findings show that while basic GANs replicate causal relationships in simple cross-sectional data, they struggle with more complex structural models. In contrast, CausalGAN effectively replicates the original causal model, and TimeGAN modifies the causal representation in time series data.
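The core check the abstract describes, running a causal analysis on synthetic data and comparing it with the same analysis on the original data, can be illustrated with a minimal sketch. Everything below is an assumption chosen for illustration, not the authors' setup: a hypothetical three-variable linear structural model (confounder Z, treatment X, outcome Y, true effect of X on Y equal to 2.0), and a "generator" that simply fits a multivariate Gaussian to the data. That generator is a stand-in and is not a GAN. The causal quantity compared is the OLS coefficient of X in a regression of Y on X and Z (back-door adjustment for Z).

```python
import numpy as np


def simulate_scm(n, rng):
    """Original data from a hypothetical linear structural causal model.

    Z confounds X and Y; the true causal effect of X on Y is 2.0.
    Columns are returned in the order (Z, X, Y).
    """
    z = rng.normal(size=n)
    x = 0.8 * z + rng.normal(size=n)
    y = 2.0 * x + 1.5 * z + rng.normal(size=n)
    return np.column_stack([z, x, y])


def gaussian_stand_in_generator(data, n, rng):
    """Stand-in for a trained generative model (NOT a GAN).

    Fits the mean vector and covariance matrix of the original data and
    samples from the corresponding multivariate normal, so it reproduces
    only first and second moments.
    """
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n)


def ols_coefs(y, *regressors):
    """OLS coefficients via least squares, intercept first."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def effect_of_x(data, adjust_for_z=True):
    """Estimated coefficient of X on Y, optionally adjusting for Z."""
    z, x, y = data[:, 0], data[:, 1], data[:, 2]
    if adjust_for_z:
        return ols_coefs(y, x, z)[1]  # back-door adjustment for Z
    return ols_coefs(y, x)[1]  # naive regression, confounded by Z


rng = np.random.default_rng(0)
original = simulate_scm(20_000, rng)
synthetic = gaussian_stand_in_generator(original, 20_000, rng)

est_original = effect_of_x(original)
est_synthetic = effect_of_x(synthetic)
naive_original = effect_of_x(original, adjust_for_z=False)

print(f"adjusted effect, original:  {est_original:.3f}")
print(f"adjusted effect, synthetic: {est_synthetic:.3f}")
print(f"naive effect, original:     {naive_original:.3f}")
```

Because a linear Gaussian model is fully described by its means and covariances, even this crude generator preserves the adjusted effect (both estimates land near 2.0), while the naive regression of Y on X alone is pulled upward by Z (towards roughly 2.73 in this setup). That gap is the difference between reproducing correlations and reproducing causal effects. Nonlinear relationships, time dependence, or full structural models are where, according to the abstract, basic GANs begin to struggle.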
Source journal
CiteScore: 5.40
Self-citation rate: 4.20%
Articles published: 437
Review time: 3.0 months
Journal description: The Journal of Computational and Applied Mathematics publishes original papers of high scientific value in all areas of computational and applied mathematics. The main interest of the Journal is in papers that describe and analyze new computational techniques for solving scientific or engineering problems. Improved analysis of existing methods and algorithms, including their effectiveness and applicability, is also of importance. The computational efficiency (e.g. the convergence, stability, accuracy, ...) should be proved and illustrated by nontrivial numerical examples. Papers describing only variants of existing methods, without adding significant new computational properties, are not of interest. The audience consists of applied mathematicians, numerical analysts, computational scientists and engineers.