On clustering levels of a hierarchical categorical risk factor

Impact factor: 1.0 | JCR quartile: Q3 | Category: Business, Finance

Annals of Actuarial Science | Pub Date: 2024-02-01 | DOI: 10.1017/s1748499523000283

Bavo D.C. Campo, Katrien Antonio


Citations: 0

Abstract


On clustering levels of a hierarchical categorical risk factor

Handling nominal covariates with a large number of categories is challenging for both statistical and machine learning techniques. This problem is further exacerbated when the nominal variable has a hierarchical structure. We commonly rely on methods such as the random effects approach to incorporate these covariates in a predictive model. Nonetheless, in certain situations, even the random effects approach may encounter estimation problems. We propose the data-driven Partitioning Hierarchical Risk-factors Adaptive Top-down algorithm to reduce the hierarchically structured risk factor to its essence, by grouping similar categories at each level of the hierarchy. We work top-down and engineer several features to characterize the profile of the categories at a specific level in the hierarchy. In our workers’ compensation case study, we characterize the risk profile of an industry via its observed damage rates and claim frequencies. In addition, we use embeddings to encode the textual description of the economic activity of the insured company. These features are then used as input in a clustering algorithm to group similar categories. Our method substantially reduces the number of categories and results in a grouping that is generalizable to out-of-sample data. Moreover, we obtain a better differentiation between high-risk and low-risk companies.
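The top-down grouping described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm or code: the record fields (`sector`, `industry`, `exposure`, `claims`, `loss`), the definition of the damage rate as loss per unit of exposure, the threshold-based single-linkage clustering, and the `threshold` value are all assumptions chosen for illustration. Text embeddings are omitted; in a fuller version they could be appended to each category's feature tuple.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def category_features(records, level):
    """Claim frequency and damage rate per category (assumes positive exposure)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])  # exposure, claims, loss
    for r in records:
        t = totals[r[level]]
        t[0] += r["exposure"]
        t[1] += r["claims"]
        t[2] += r["loss"]
    return {c: (n / e, l / e) for c, (e, n, l) in totals.items()}


def standardize(features):
    """Z-score each feature dimension so both are on a comparable scale."""
    cols = list(zip(*features.values()))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against zero spread
    return {k: tuple((x - m) / s for x, m, s in zip(v, mu, sd))
            for k, v in features.items()}


def cluster(features, threshold):
    """Single-linkage grouping: link categories closer than `threshold`."""
    names = sorted(features)
    root = {n: n for n in names}

    def find(n):
        while root[n] != n:
            n = root[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(features[a], features[b]) <= threshold:
                root[find(b)] = find(a)
    groups = defaultdict(list)
    for n in names:
        groups[find(n)].append(n)
    return sorted(groups.values())


def top_down(records, levels, threshold):
    """Cluster the top level first, then each lower level within merged parents."""
    records = [dict(r) for r in records]  # copy so the caller's data is untouched
    mapping, upper = {}, None
    for level in levels:
        feats = standardize(category_features(records, level))
        blocks = defaultdict(set)  # merged parent label -> child categories
        for r in records:
            blocks[r[upper] if upper else "all"].add(r[level])
        mapping[level] = {}
        for members in blocks.values():
            subset = {c: feats[c] for c in members}
            for group in cluster(subset, threshold):
                for c in group:
                    mapping[level][c] = "+".join(group)
        for r in records:  # relabel so the next level sees merged parents
            r[level] = mapping[level][r[level]]
        upper = level
    return mapping


# Hypothetical toy portfolio: (sector, industry, exposure, claims, loss)
rows = [("A", "A1", 100, 10, 1000), ("A", "A2", 100, 11, 1100),
        ("B", "B1", 100, 10, 1000), ("B", "B2", 100, 12, 1000),
        ("C", "C1", 100, 40, 5000), ("C", "C2", 100, 60, 9000)]
records = [dict(zip(("sector", "industry", "exposure", "claims", "loss"), r))
           for r in rows]
mapping = top_down(records, ["sector", "industry"], threshold=0.5)
```

On this toy data, sectors A and B have nearly identical profiles and are merged, after which their four industries are compared with one another inside the merged sector, while the two industries of sector C remain separate. The sketch assumes that category names are unique across the whole hierarchy.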


Source journal

Annals of Actuarial Science ECONOMICS-

CiteScore

3.10

Self-citation rate

5.90%

Number of publications