Improvement of decision tree classifier accuracy for healthcare insurance fraud prediction by using Extreme Gradient Boosting algorithm

2020 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS). Pub Date: 2020-11-19. DOI: 10.1109/ICIMCIS51567.2020.9354286

Nur Arifin Akbar, A. Sunyoto, M. Rudyanto Arief, W. Caesarendra


Citations: 13

Abstract

Fraud in the healthcare sector is prevalent and highly burdensome. Fraud generally involves intentional deception or misrepresentation that results in an unfair benefit. The growing demand for insurance services has been accompanied by manipulative and inappropriate behaviour. According to a report published by the United States Government Accountability Office, healthcare insurance fraud causes an unexpected 10% increase in annual health expenditure, amounting to US$100 billion per year. State-of-the-art techniques are applied to identify and prevent fraud. This paper analyzes statistical modelling approaches for detecting fraudulent health benefit claims. After the data is collected and exploratory data analysis is completed, a random forest model and a tree classification algorithm with Extreme Gradient Boosting (XGB) are applied to determine the most effective model. The random forest method reaches 81% accuracy for class 1 recall. The XGB tree method with random sub-sampling achieves 86% overall accuracy and 87% for illegitimate providers. These results indicate that the XGB method yields higher accuracy on clean data that has been tuned with several adjustments.
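The abstract reports two different kinds of figures: overall accuracy and recall on class 1 (the illegitimate providers). These can diverge sharply on imbalanced fraud data. The following plain-Python sketch shows how each metric is computed from a confusion matrix. It assumes binary labels and, as the abstract appears to imply, that label 1 marks an illegitimate (fraudulent) provider. The function names and toy labels are hypothetical illustrations. They are not taken from the paper, its dataset, or any library.

```python
from typing import Dict, Sequence

# Assumption (hypothetical): label 1 = illegitimate (fraudulent) provider,
# label 0 = legitimate provider. Labels are expected to be 0 or 1.


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives, treating 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1  # fraud correctly flagged
        elif t == 0 and p == 1:
            counts["fp"] += 1  # legitimate provider wrongly flagged
        elif t == 1 and p == 0:
            counts["fn"] += 1  # fraud missed
        else:
            counts["tn"] += 1  # legitimate provider correctly passed
    return counts


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all providers classified correctly."""
    c = confusion_counts(y_true, y_pred)
    return (c["tp"] + c["tn"]) / len(y_true)


def recall_class1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly fraudulent (class 1) providers that the model flags."""
    c = confusion_counts(y_true, y_pred)
    positives = c["tp"] + c["fn"]
    return c["tp"] / positives if positives else 0.0


if __name__ == "__main__":
    # Hypothetical imbalanced toy set: 8 legitimate, 2 fraudulent providers,
    # scored by a model that labels everyone as legitimate.
    y_true = [0] * 8 + [1] * 2
    y_pred = [0] * 10
    print(accuracy(y_true, y_pred))       # → 0.8
    print(recall_class1(y_true, y_pred))  # → 0.0
```

In this toy case the trivial model looks reasonable on accuracy (0.8) yet catches no fraud at all (class 1 recall 0.0). That gap is why recall on the minority class is worth reporting alongside overall accuracy.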
