Variational AutoEncoder for synthetic insurance data
Authors: Charlotte Jamotton, Donatien Hainaut
Journal: Intelligent Systems with Applications, volume 24, Article 200455
Publication date: 2024-10-29
DOI: 10.1016/j.iswa.2024.200455
URL: https://www.sciencedirect.com/science/article/pii/S2667305324001297
This article explores the application of Variational AutoEncoders (VAEs) to insurance data. Previous research has demonstrated the successful implementation of generative models, especially VAEs, across various domains, such as image recognition, text classification, and recommender systems. However, their application to insurance data, particularly to heterogeneous insurance portfolios with mixed continuous and discrete attributes, remains unexplored. This study introduces novel insights into utilising VAEs for unsupervised learning tasks in actuarial science, including dimension reduction and synthetic data generation. We propose a VAE model with a quantile transformation for continuous (latent) variables, a reconstruction loss that combines categorical cross-entropy and mean squared error, and a KL divergence-based regularisation term. Our VAE model’s architecture circumvents the need to pre-train and fine-tune a neural network to encode categorical variables into $n$-dimensional representative vectors within the continuous vector space $\mathbb{R}^n$. We assess our VAE’s ability to reconstruct complex insurance data and generate synthetic insurance policies using a motor portfolio. Our experimental results and analysis highlight the potential of VAEs in addressing challenges related to data availability in the insurance industry.
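The loss described in the abstract has three parts: mean squared error on continuous attributes, categorical cross-entropy on discrete attributes, and a KL divergence regulariser. A minimal sketch of how such a loss could be computed for a single policy is given below. It is an illustration under assumptions, not the authors' implementation: the function names, the weight `beta`, the unweighted sum of the two reconstruction terms, and the choice of a diagonal Gaussian posterior with a standard normal prior are all hypothetical, since the abstract does not specify them.

```python
import math


def mse(x, x_hat):
    """Mean squared error over the continuous features of one policy."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Assumption: diagonal Gaussian posterior and standard normal prior.
    """
    return -0.5 * sum(1.0 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(x_cont, x_cont_hat, cats, cat_probs, mu, log_var, beta=1.0):
    """Reconstruction loss (MSE + cross-entropy) plus beta-weighted KL term.

    x_cont / x_cont_hat: continuous features and their reconstruction.
    cats / cat_probs: one one-hot vector and one probability vector
    per categorical feature.
    beta is a hypothetical weight; the abstract does not state one.
    """
    recon = mse(x_cont, x_cont_hat)
    recon += sum(categorical_cross_entropy(t, p) for t, p in zip(cats, cat_probs))
    return recon + beta * kl_to_standard_normal(mu, log_var)


# Toy policy: two continuous features and one 3-level categorical feature.
loss = vae_loss(
    x_cont=[0.2, -1.0],
    x_cont_hat=[0.0, -0.5],
    cats=[[0, 1, 0]],
    cat_probs=[[0.2, 0.7, 0.1]],
    mu=[0.0, 0.0],
    log_var=[0.0, 0.0],
)
print(round(loss, 4))
```

In this toy call the latent posterior equals the prior, so the KL term vanishes and the loss reduces to the two reconstruction terms. In practice the continuous features would be passed through the quantile transformation first, and the loss would be averaged over a batch.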