Semi-Supervised Medical Insurance Fraud Detection by Predicting Indirect Reductions Rate using Machine Learning Generalization Capability
Authors: Parvin Esmaeili Ataabadi, Behzad Soleimani Neysiani, Mohammad Zahiri Nogorani, Nazanin Mehraby
Venue: 2022 8th International Conference on Web Research (ICWR)
Published: 2022-05-11
DOI: 10.1109/ICWR54782.2022.9786251
Citation count: 2
Abstract
According to statistics published by the Insurance Research Institute of the Islamic Republic of Iran for the year 1399 (Solar Hijri calendar; 2020 in the Gregorian calendar), the fraud rate in medical insurance is about 10%, costing about 28 trillion rials (the official currency of Iran; about 320 million US dollars). This study proposes a machine learning-based technique that predicts a claim's cost from other patients' history and flags, as potential fraud or abnormal costs, the claims that differ significantly from other claims. In addition, a new data sampling approach is proposed that leads the machine learning algorithms to focus on exceptional cases. The approach is evaluated on a real-world private dataset of 700,000 claims from the RASA web portal, which is used for supplementary insurance by well-known companies such as Day. For the deduction rate, the proposed data sampling approach reduced the absolute error on exceptional cases from 35 to 23. The evaluation results show that about 0.5% of the cases in the dataset are abnormal, with an absolute error higher than 20%. The range of rates considered abnormal can be adjusted lower or higher.
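The detection rule described in the abstract (predict each claim's deduction rate from other claims, then flag claims whose absolute error exceeds 20%) can be illustrated with a minimal sketch. This is not the authors' method, and several pieces are hypothetical assumptions: the `Claim` fields, grouping claims by a `service_code`, using a leave-one-out group mean as a stand-in for the machine learning model, and reading "20% absolute error" as a difference of 0.20 between deduction rates expressed as fractions. The toy claims are invented for illustration and do not come from the RASA dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    service_code: str       # hypothetical grouping key (e.g., type of medical service)
    deduction_rate: float   # assumed: fraction of the claimed amount deducted, in [0, 1]


def predict_leave_one_out(claims):
    """Predict each claim's deduction rate as the mean rate of the *other*
    claims with the same service code (a simple stand-in for an ML regressor)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for c in claims:
        totals[c.service_code] += c.deduction_rate
        counts[c.service_code] += 1
    predictions = {}
    for c in claims:
        others = counts[c.service_code] - 1
        if others == 0:
            predictions[c.claim_id] = None  # no other claims to learn from
        else:
            predictions[c.claim_id] = (totals[c.service_code] - c.deduction_rate) / others
    return predictions


def flag_abnormal(claims, threshold=0.20):
    """Return IDs of claims whose absolute prediction error exceeds `threshold`."""
    predictions = predict_leave_one_out(claims)
    flagged = []
    for c in claims:
        pred = predictions[c.claim_id]
        if pred is not None and abs(c.deduction_rate - pred) > threshold:
            flagged.append(c.claim_id)
    return flagged


# Invented toy data: one MRI claim has an unusually high deduction rate.
claims = [
    Claim("a", "MRI", 0.10),
    Claim("b", "MRI", 0.12),
    Claim("c", "MRI", 0.08),
    Claim("d", "MRI", 0.55),
    Claim("e", "visit", 0.05),
]

print(flag_abnormal(claims))                  # ['d']
print(flag_abnormal(claims, threshold=0.10))  # ['a', 'b', 'c', 'd']
```

Excluding each claim from its own prediction mirrors the abstract's idea of predicting a claim "based on other patients' history". The `threshold` parameter plays the role of the adjustable abnormality range: lowering it flags more claims. The second call also shows a weakness of this naive stand-in, since the single outlier inflates the predictions for its neighbors.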