On clustering levels of a hierarchical categorical risk factor

IF 1.5 Q3 BUSINESS, FINANCE
Bavo D.C. Campo, Katrien Antonio
{"title":"On clustering levels of a hierarchical categorical risk factor","authors":"Bavo D.C. Campo, Katrien Antonio","doi":"10.1017/s1748499523000283","DOIUrl":null,"url":null,"abstract":"<p>Handling nominal covariates with a large number of categories is challenging for both statistical and machine learning techniques. This problem is further exacerbated when the nominal variable has a hierarchical structure. We commonly rely on methods such as the random effects approach to incorporate these covariates in a predictive model. Nonetheless, in certain situations, even the random effects approach may encounter estimation problems. We propose the data-driven Partitioning Hierarchical Risk-factors Adaptive Top-down algorithm to reduce the hierarchically structured risk factor to its essence, by grouping similar categories at each level of the hierarchy. We work top-down and engineer several features to characterize the profile of the categories at a specific level in the hierarchy. In our workers’ compensation case study, we characterize the risk profile of an industry via its observed damage rates and claim frequencies. In addition, we use embeddings to encode the textual description of the economic activity of the insured company. These features are then used as input in a clustering algorithm to group similar categories. Our method substantially reduces the number of categories and results in a grouping that is generalizable to out-of-sample data. Moreover, we obtain a better differentiation between high-risk and low-risk companies.</p>","PeriodicalId":44135,"journal":{"name":"Annals of Actuarial Science","volume":null,"pages":null},"PeriodicalIF":1.5000,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Actuarial Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1017/s1748499523000283","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BUSINESS, FINANCE","Score":null,"Total":0}
引用次数: 0

Abstract

Handling nominal covariates with a large number of categories is challenging for both statistical and machine learning techniques. This problem is further exacerbated when the nominal variable has a hierarchical structure. We commonly rely on methods such as the random effects approach to incorporate these covariates in a predictive model. Nonetheless, in certain situations, even the random effects approach may encounter estimation problems. We propose the data-driven Partitioning Hierarchical Risk-factors Adaptive Top-down algorithm to reduce the hierarchically structured risk factor to its essence, by grouping similar categories at each level of the hierarchy. We work top-down and engineer several features to characterize the profile of the categories at a specific level in the hierarchy. In our workers’ compensation case study, we characterize the risk profile of an industry via its observed damage rates and claim frequencies. In addition, we use embeddings to encode the textual description of the economic activity of the insured company. These features are then used as input in a clustering algorithm to group similar categories. Our method substantially reduces the number of categories and results in a grouping that is generalizable to out-of-sample data. Moreover, we obtain a better differentiation between high-risk and low-risk companies.

关于分层分类风险因素的聚类水平
对于统计和机器学习技术来说,处理具有大量类别的名义协变量都是一项挑战。当名义变量具有层次结构时,这一问题就会进一步恶化。我们通常依靠随机效应等方法将这些协变量纳入预测模型。然而,在某些情况下,即使是随机效应方法也会遇到估计问题。我们提出了数据驱动的分层风险因子自适应自上而下算法,通过将层级结构中每个层级的相似类别分组,将层级结构的风险因子还原到其本质。我们采用自上而下的方法,并设计了若干特征来描述层次结构中特定层级的类别特征。在我们的工人赔偿案例研究中,我们通过观测到的损失率和索赔频率来描述一个行业的风险概况。此外,我们还使用嵌入法对投保公司经济活动的文本描述进行编码。然后将这些特征作为聚类算法的输入,对类似类别进行分组。我们的方法大大减少了类别的数量,并可对样本外数据进行分组。此外,我们还能更好地区分高风险和低风险公司。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
3.10
自引率
5.90%
发文量
22
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信