Data Science Techniques to Improve Accuracy of Provider Network Directory
Authors: Priya Kandasamy, Divya Raji, Arun Sundararaman
Published in: 2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW), 2018-12-01
DOI: 10.1109/HIPCW.2018.8634423 (https://doi.org/10.1109/HIPCW.2018.8634423)
Abstract
Trivial or tactical as it may appear, provider data inaccuracy remains a major challenge in the healthcare industry. With about 250 key attributes per provider and roughly 500K providers in the USA, this translates to maintaining current and correct values for a dataset of some 12.5 M attributes that is highly dynamic and volatile. Inaccuracy in this dataset has two major adverse consequences: (a) regulatory penalties ranging from a few thousand to a few million dollars, and (b) potential member attrition due to dissatisfaction triggered by longer waiting times, delays in accessing medical services, and effort wasted contacting incorrect providers. Many current solutions have limitations such as a lack of centralized storage, data latency, and non-standardized questionnaires for capturing provider updates. This paper introduces an innovative approach that addresses these limitations using Predictive Analytics and Intake Scoring techniques. Rooted in Data Science, the proposed ensemble model combines the advantages of individual prediction models such as Logistic Regression, Random Forest, Neural Network and XGBoost. This automated approach also reduces the dependency on external systems and automatically updates the database, keeping it up to date. Results from work carried out using this approach are analyzed in detail, and the paper concludes with directions for future work.
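The abstract does not say how the ensemble combines its member models. As a minimal sketch, assuming weighted soft voting (averaging each model's predicted probability that a provider record is inaccurate), the combination step might look like the following. The function names, the probabilities, and the 0.5 threshold are hypothetical illustrations, not values from the paper.

```python
from typing import Dict, Optional


def soft_vote(model_probs: Dict[str, float],
              weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-model probabilities that a record is inaccurate.

    Assumption: the ensemble uses soft voting; the paper may combine
    its models differently (e.g. stacking).
    """
    if not model_probs:
        raise ValueError("need at least one model probability")
    if weights is None:
        # Default to equal weight for every member model.
        weights = {name: 1.0 for name in model_probs}
    total = sum(weights[name] for name in model_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * weights[name] for name, p in model_probs.items()) / total


def flag_for_review(model_probs: Dict[str, float],
                    threshold: float = 0.5,
                    weights: Optional[Dict[str, float]] = None) -> bool:
    """Flag a provider record when the ensemble score reaches the threshold."""
    return soft_vote(model_probs, weights) >= threshold


# Hypothetical outputs from the four model families named in the abstract.
example = {
    "logistic_regression": 0.25,
    "random_forest": 0.5,
    "neural_network": 0.75,
    "xgboost": 0.5,
}
score = soft_vote(example)          # equal-weight average of the four scores
needs_review = flag_for_review(example)
```

In practice the per-model probabilities would come from trained classifiers, and the weights and threshold would be tuned on held-out data.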