Improvement of decision tree classifier accuracy for healthcare insurance fraud prediction by using Extreme Gradient Boosting algorithm

Nur Arifin Akbar, A. Sunyoto, M. Rudyanto Arief, W. Caesarendra
{"title":"Improvement of decision tree classifier accuracy for healthcare insurance fraud prediction by using Extreme Gradient Boosting algorithm","authors":"Nur Arifin Akbar, A. Sunyoto, M. Rudyanto Arief, W. Caesarendra","doi":"10.1109/ICIMCIS51567.2020.9354286","DOIUrl":null,"url":null,"abstract":"Fraud in the healthcare sector is prevalent and very cumbersome. Fraud generally involves intentional disappointment, and frustration or misrepresentation usually leads to an unfair benefit. Such exciting demand for insurance services has led to manipulative and inappropriate behaviour. Based on the report published by the United States Government Accountability Office, healthcare insurance fraud contributes to a 10% unexpected rise of annual health expenditure, which amounts to US$ 100 billion per year. In order to identify and avoid fraud, the scientific state of the art is applied. This paper seeks to analyze statistical modelling approaches for the assessment of fake health benefits using state-of-the-art techniques. Once the data is collected and the study of exploratory data is completed, it can use random forest regression and the classification of trees algorithm with extreme gradient boost (XGB) to determine the most efficient models. Compared to the Random Forest Method that reaches 81% accuracy with for class 1 recall, XGB Tree method of random sub-sampling was successfully achieved by 86% overall accuracy and 87% with illegitimate providers. 
Refer to the result, XGB method produce more accuracy for clean data that has been tuned with several adjustment.","PeriodicalId":441670,"journal":{"name":"2020 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIMCIS51567.2020.9354286","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 13

Abstract

Fraud in the healthcare sector is prevalent and burdensome. Fraud generally involves intentional deception or misrepresentation that leads to an unfair benefit. Strong demand for insurance services has led to manipulative and inappropriate behaviour. According to a report published by the United States Government Accountability Office, healthcare insurance fraud contributes to an unexpected 10% rise in annual health expenditure, which amounts to US$100 billion per year. This paper analyzes statistical modelling approaches for detecting fraudulent health benefit claims using state-of-the-art techniques. Once the data has been collected and exploratory data analysis is complete, a random forest and a tree classification algorithm with extreme gradient boosting (XGB) are applied to determine the most effective model. The random forest method reaches 81% accuracy (reported for class 1 recall), whereas the XGB tree method with random sub-sampling achieves 86% overall accuracy and 87% for illegitimate providers. Based on these results, the XGB method yields higher accuracy on cleaned data after several tuning adjustments.
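The abstract reports both an overall accuracy and a class 1 (illegitimate provider) figure, which are different quantities. The sketch below shows how the two are commonly computed from predicted labels. It is illustrative only: the helper names `accuracy` and `recall` and the toy labels are hypothetical and are not taken from the paper, whose exact evaluation procedure is not described in the abstract.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all providers whose label was predicted correctly."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of truly positive (class 1) providers that were flagged."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp + fn == 0:
        raise ValueError("no positive examples in y_true")
    return tp / (tp + fn)


# Hypothetical toy labels (not data from the paper):
# 1 = illegitimate provider, 0 = legitimate provider.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

print(f"overall accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"class 1 recall   = {recall(y_true, y_pred):.2f}")
```

On imbalanced fraud data a model can score well on overall accuracy while still missing many fraudulent providers, which is why class 1 recall is usually reported alongside it.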