A multivariate frequency-severity framework for healthcare data breaches

Hong Sun, Maochao Xu, P. Zhao
The Annals of Applied Statistics. Published 2023-03-01. DOI: 10.1214/22-aoas1625 (https://doi.org/10.1214/22-aoas1625)

Abstract

Data breaches in healthcare have become a substantial concern in recent years, and cause millions of dollars in financial losses each year. It is fundamental for government regulators, insurance companies, and stakeholders to understand the breach frequency and the number of affected individuals in each state, as these are directly related to the federal Health Insurance Portability and Accountability Act (HIPAA) and state data breach laws. However, an obstacle to studying data breaches in healthcare is the lack of suitable statistical approaches. We develop a novel multivariate frequency-severity framework to analyze breach frequency and the number of affected individuals at the state level. A mixed effects model is developed to model the square root transformed frequency, and the log-gamma distribution is proposed to capture the skewness and heavy tail exhibited by the distribution of numbers of affected individuals. We further discover a positive nonlinear dependence between the transformed frequency and the log-transformed numbers of affected individuals (i.e., severity). In particular, we propose to use a D-vine copula to capture the multivariate dependence among conditional severities given frequencies due to its inherent temporal structure and rich bivariate copula families. The rejection sampling technique is developed to simulate the predictive distributions. Both the in-sample and out-of-sample studies show that the proposed multivariate frequency-severity model that accommodates non-linear dependence has satisfactory fitting and prediction performances.
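
The abstract mentions rejection sampling for simulating predictive distributions but gives no details of the target or proposal densities used. As a minimal illustration of the general technique only (not the authors' algorithm), the sketch below draws from a toy unnormalized Beta(2, 5) density using a uniform proposal. The function name `rejection_sample`, the toy target, and the envelope constant are all assumptions chosen for illustration.

```python
import random


def rejection_sample(target, propose, proposal_density, bound, n, rng):
    """Draw n samples from the density proportional to `target`.

    Assumes target(x) <= bound * proposal_density(x) for every x the
    proposal can produce (the usual envelope condition).
    """
    samples = []
    while len(samples) < n:
        x = propose(rng)          # candidate from the proposal
        u = rng.random()          # uniform(0, 1) acceptance variable
        if u * bound * proposal_density(x) <= target(x):
            samples.append(x)     # accept; otherwise discard and retry
    return samples


def toy_target(x):
    """Unnormalized Beta(2, 5) density on [0, 1] (illustrative only)."""
    return x * (1.0 - x) ** 4


rng = random.Random(0)
draws = rejection_sample(
    target=toy_target,
    propose=lambda r: r.random(),      # uniform proposal on [0, 1]
    proposal_density=lambda x: 1.0,    # its density
    bound=0.082,  # just above max of toy_target: 0.2 * 0.8**4 = 0.08192
    n=5000,
    rng=rng,
)
sample_mean = sum(draws) / len(draws)  # Beta(2, 5) has mean 2/7
```

In a frequency-severity setting the target would instead be the conditional predictive density implied by the fitted model, and the envelope constant would have to be derived for that density; neither is specified in the abstract.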