A benchmark of health insurance fraud detection using machine learning techniques
Authors: Ossama Cherkaoui, H. Anoun, A. Maizate
Journal: IAES International Journal of Artificial Intelligence (IJ-AI)
Publication type: Journal Article
Publication date: 2024-06-01
DOI: 10.11591/ijai.v13.i2.pp1925-1934 (https://doi.org/10.11591/ijai.v13.i2.pp1925-1934)
Citations: 0
Abstract
Health insurance fraud is a complex problem with a significant financial impact. With the recent availability of large volumes of data and growth in computing power, machine learning techniques have become the preferred method for fraud detection. However, the main difficulty facing researchers in this field is the lack of real data sets and the absence of reliable fraud labels. Most published studies test fraud detection algorithms on aggregated provider-level or simulated data, which may not deliver accurate results. The present study aims to provide a more accurate assessment of fraud detection methods by using real, detailed health insurance claims data to compare six of the most common supervised classification algorithms, including neural networks, along with two methods for preparing categorical features. The study was conducted under the guidance of insurance experts, who provided the fraud label inference rules and reviewed the results. This paper gives a comprehensive description of the benchmarking process and an interpretation of the results. The results show that supervised classification can be used effectively to detect health insurance fraud, improving detection accuracy by a factor of 4.2 (84% recall for a positive rate of 20%).
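
The factor of 4.2 quoted above is consistent with dividing the reported recall by the reported positive rate (0.84 / 0.20). The sketch below reproduces that arithmetic under two assumptions that the abstract does not state explicitly: that "positive rate" means the share of claims flagged by the classifier, and that the baseline is random selection, which would be expected to recover the same share of fraudulent claims as the share of claims it flags. The function name `lift_at_positive_rate` is hypothetical and is not taken from the paper.

```python
def lift_at_positive_rate(recall: float, positive_rate: float) -> float:
    """Ratio of a model's recall to the recall expected from random flagging.

    Assumption (not confirmed by the abstract): flagging a fraction
    `positive_rate` of claims at random recovers, on average, the same
    fraction of fraudulent claims, so the baseline recall equals
    `positive_rate`.
    """
    if not 0 < positive_rate <= 1:
        raise ValueError("positive_rate must be in (0, 1]")
    if not 0 <= recall <= 1:
        raise ValueError("recall must be in [0, 1]")
    return recall / positive_rate


# Figures quoted in the abstract: 84% recall at a 20% positive rate.
lift = lift_at_positive_rate(recall=0.84, positive_rate=0.20)
print(round(lift, 1))
```

Under this reading, reviewing the 20% of claims ranked most suspicious would surface 84% of the fraud, against roughly 20% for an unranked review of the same size.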