Improvement of decision tree classifier accuracy for healthcare insurance fraud prediction by using Extreme Gradient Boosting algorithm
Nur Arifin Akbar, A. Sunyoto, M. Rudyanto Arief, W. Caesarendra
2020 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), published 2020-11-19
DOI: 10.1109/ICIMCIS51567.2020.9354286
Citations: 13
Abstract
Fraud in the healthcare sector is prevalent and burdensome. Fraud generally involves intentional deception or misrepresentation, which usually leads to an unfair benefit. Strong demand for insurance services has led to manipulative and inappropriate behaviour. According to a report published by the United States Government Accountability Office, healthcare insurance fraud contributes to an unexpected 10% rise in annual health expenditure, which amounts to US$ 100 billion per year. State-of-the-art scientific methods are applied to identify and prevent such fraud. This paper analyzes statistical modelling approaches for assessing fraudulent health benefits using state-of-the-art techniques. Once the data are collected and exploratory data analysis is complete, random forest and tree classification with extreme gradient boosting (XGB) are applied to determine the most efficient model. Compared with the random forest method, which reaches 81% accuracy for class 1 recall, the XGB tree method with random sub-sampling achieved 86% overall accuracy and 87% for illegitimate providers. According to the results, the XGB method yields higher accuracy on clean data once it has been tuned with several adjustments.
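The abstract reports both an overall accuracy and a figure specific to illegitimate providers (class 1). The two are different metrics, and a minimal sketch may help show how they can diverge on the same predictions. The function names and the toy labels below are hypothetical and chosen for illustration only; they are not the paper's code or data, and the paper may have computed its metrics differently.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all providers whose label was predicted correctly."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of truly positive providers that the model flagged as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    actual_pos = sum(t == positive for t in y_true)
    if actual_pos == 0:
        raise ValueError("no positive examples in y_true")
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return true_pos / actual_pos


# Hypothetical toy labels: 1 = illegitimate provider, 0 = legitimate provider.
# These values are made up for illustration and are NOT taken from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print(f"overall accuracy: {accuracy(y_true, y_pred):.2f}")  # 8 of 10 correct
print(f"class 1 recall:   {recall(y_true, y_pred):.2f}")    # 3 of 4 frauds caught
```

In this toy case, accuracy counts every correct prediction, while class 1 recall only asks how many of the truly illegitimate providers were caught. On imbalanced fraud data the second number is often the more informative one.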