Assessing One-Class and Binary Classification Approaches for Identifying Medicare Fraud
Joffrey L. Leevy, John T. Hancock, T. Khoshgoftaar
2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI), published 2023-08-01
DOI: 10.1109/IRI58017.2023.00053 (https://doi.org/10.1109/IRI58017.2023.00053)
Abstract
Machine learning research on Medicare fraud detection is of national importance, primarily because of the extensive financial losses that this fraud causes. Our big data study focuses on the Medicare Part D dataset, which we use to detect healthcare fraud perpetrated by physicians. In this paper, we compare One-Class Classification (OCC) and binary classification by examining eight different classifiers. The metrics applied in this analysis are Area Under the Receiver Operating Characteristic Curve (AUC) and Area Under the Precision-Recall Curve (AUPRC). Our findings indicate that binary classification outperforms OCC in Medicare fraud detection. Furthermore, we establish that the Decision Tree-based classifiers employed in the research are the most effective, with CatBoost delivering the best performance.
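To make the two metrics named in the abstract concrete, the following minimal sketch computes AUC and one common estimate of AUPRC (average precision) from labels and classifier scores. It is an illustration only, not the authors' evaluation code: the function names `roc_auc` and `average_precision`, the toy labels, and the toy scores are all assumptions introduced here, and the paper may estimate AUPRC differently.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    Labels are assumed to be 0 (legitimate) or 1 (fraud).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Averages the precision observed at the rank of each positive example.
    Assumes no tied scores; ties would be broken by sort order here.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            precision_sum += true_pos / rank
    return precision_sum / total_pos


# Hypothetical toy data: 2 fraud cases among 10 providers (imbalanced).
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.6, 0.9, 0.35, 0.5]

print("AUC:", roc_auc(labels, scores))
print("AUPRC (average precision):", average_precision(labels, scores))
```

On imbalanced data such as this toy example, a single legitimate provider ranked above a fraud case lowers average precision more than it lowers AUC, which is one reason the two metrics are often reported together in fraud detection work.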